Protein sequence viewer

A common element on several InterPro website pages is the protein sequence viewer (in the sequence search results, and on the protein and structure pages). It summarises the matches of InterPro entries (IPR, top coloured bar) and member database signatures to the protein or structure being viewed, which is represented by the grey bar at the top of the viewer. Matches are categorised by InterPro entry type.

The AlphaFold confidence track is displayed in the protein sequence viewer on the protein page and on the AlphaFold subpage when a predicted structure is available.

The Representative Domains track is displayed in the protein sequence viewer on the protein page. This representation is generated automatically from the types of the member database models, which may differ from the types of the InterPro entries. When several models overlap, the model covering the longest region of the protein is chosen as the representative domain. Be aware that for models made of multiple fragments, each fragment is considered as an individual entity during selection, so not all the fragments of a model are necessarily chosen as representative.
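The selection rule described above can be illustrated with a minimal sketch. It is not the InterPro implementation: it assumes a simple greedy strategy (longest fragment first, skipping anything that overlaps an already selected fragment), and the `Fragment` class, the `pick_representatives` function and the `SIG_*` accessions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One contiguous match of a member database model on the protein."""
    accession: str  # hypothetical signature accession
    start: int      # first residue covered (1-based, inclusive)
    end: int        # last residue covered (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Fragment, b: Fragment) -> bool:
    """True when the two fragments share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def pick_representatives(fragments: list[Fragment]) -> list[Fragment]:
    """Assumed greedy rule: keep the longest fragments, drop overlapping ones."""
    chosen: list[Fragment] = []
    # Longest first; ties are broken by start position, then accession.
    ordered = sorted(fragments, key=lambda f: (-f.length, f.start, f.accession))
    for frag in ordered:
        if not any(overlaps(frag, kept) for kept in chosen):
            chosen.append(frag)
    # Return the selection in sequence order.
    return sorted(chosen, key=lambda f: f.start)


# Each fragment is an individual candidate, even when two fragments
# belong to the same (discontinuous) model, as with SIG_C here.
fragments = [
    Fragment("SIG_A", 10, 120),
    Fragment("SIG_B", 30, 90),    # overlaps SIG_A and is shorter
    Fragment("SIG_C", 100, 150),  # first fragment of SIG_C, overlaps SIG_A
    Fragment("SIG_C", 200, 260),  # second fragment of SIG_C, no overlap
]
representatives = pick_representatives(fragments)
for frag in representatives:
    print(frag.accession, frag.start, frag.end)
```

In this made-up example only the second fragment of `SIG_C` is selected, which mirrors the remark that not all fragments of a model are necessarily chosen as representative.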

Protein sequence viewer

Various options make the viewer easy to work with (as illustrated in the figure above):

  1. Clicking on the Full screen button at the top of the viewer will switch to full screen view.

  2. The viewer can be zoomed in and out by:

     1. Clicking the two buttons (+ and -) at the top right corner.

     2. Dragging the grey scale at the top to the desired positions on both the left and right sides.

     3. Holding the [Ctrl] key while scrolling over the viewer.

  3. More options that customise the viewer are grouped under the Options dropdown.

Protein sequence viewer options
     1. Colour By changes the colours of the InterPro entry and signature bars, based on accession, member database or domain relationship.

     2. The labels on the right side of the viewer can be customised. Accession labels are shown by default. To see names and/or short names along with the accession, tick the name/short name checkboxes. To see the names/short names alone, select only the respective options.

     3. Snapshot has two options. Save as image takes a snapshot of the viewer and saves it as an image (.png).

     4. Collapse All collapses all the signature bars displayed in the viewer at once, so that only the InterPro entry bars are displayed.

  4. Tooltips are shown when hovering over each bar. They can be disabled by unchecking the Tooltip Active option.

Protein sequence viewer tooltip

Tooltip example.

  5. Residue annotations are provided by the CDD, SFLD and PIRSR databases.

  6. Clicking on the header of a category (e.g. Unintegrated) hides the bars for the entire category.

When zoomed in, panning can be achieved by either dragging the scale at the top or by dragging any bar in the desired direction (see figure below).

Protein sequence viewer panning

For some proteins, additional information is provided by resources other than the member database consortium. It is displayed under the Other features category of the viewer. Available data include:

  • Disordered regions from MobiDB

  • Transmembrane regions from Phobius and/or TMHMM

  • Coiled-coil regions from COILS

  • Cytoplasmic/non-cytoplasmic domains from Phobius

  • Signal peptide regions from SignalP and/or Phobius

  • Spurious proteins from AntiFam

  • CATH-FunFams is an automatically generated profile HMM database, with FunFams entries segregated by an entropy-based approach that distinguishes different patterns of conserved residues, corresponding to differences in functional determinants

  • Pfam-N annotations result from a deep learning methodology developed by the Google Research team led by Dr Lucy Colwell to increase the Pfam coverage of protein sequences

  • Eukaryotic linear motifs from ELM

For some proteins, we also have annotations that are fetched directly from the resource API. These annotations are displayed under the External Sources category of the viewer. Note: by default this category is collapsed. Available data include:

Protein sequence viewer External Sources for the protein O75069