Protein sequence viewer

A common element on several InterPro website pages is the protein sequence viewer (in the sequence search results and on the protein and structure pages). It summarises the InterPro entries (IPR, top coloured bar) and member database signature matches for the protein or structure being viewed, which is represented by the grey bar at the top of the viewer. The matches are categorised by InterPro entry type.

The AlphaFold confidence track is displayed in the protein sequence viewer on the protein page and on the AlphaFold subpage when a predicted structure is available.

Protein sequence viewer

Various options make the viewer easy to work with (as illustrated in the figure above):

  1. Clicking on the Full screen button at the top of the viewer will switch to full screen view.

  2. The viewer can be zoomed in and out by:

    1. Clicking the two buttons (+ and -) at the top right corner.

    2. Dragging the grey scale at the top to the desired positions on both the left and right sides.

    3. Holding the [Ctrl] key while scrolling over the viewer.

  3. More options that customise the viewer are grouped under the Options dropdown.

Protein sequence viewer options
    1. Colour By changes the colours of the InterPro entry and signature bars, based on accession, member database or domain relationship.

    2. The labels on the right side of the viewer can be customised. Accession labels are shown by default. To see names and/or short names alongside accessions, tick the name/short name checkboxes. To see names or short names alone, select only the respective options.

    3. Snapshot has two options: Save as image saves a snapshot of the viewer as an image (.png), and Print prints the viewer, which also makes it possible to save it in PDF format.

    4. Collapse All collapses all the signature bars displayed in the viewer at once, so that only the InterPro entry bars remain visible.

Protein sequence viewer collapsed

Collapsed categories view.

    5. Tooltips are shown when hovering over each bar. They can be disabled by unchecking the Tooltip Active option.

Protein sequence viewer tooltip

Tooltip example.

  4. Residue annotations are provided by the CDD, SFLD and PIRSR databases.

  5. On the protein entry page, clicking the Fetch conservation button displays conservation information based on the PANTHER signatures. The conservation scores are generated using the following process:

  • The HMM from the PANTHER database is run against the SwissProt database using hmmsearch, generating an HMM profile and a logo (a graphical representation of the amino acid conservation).

  • The conservation score for each residue is determined from the logo data using the following formula: \(\frac {\sum (height\_arr)} {max\_height\_theory} \times 10\)

  • The model is aligned against the protein sequence.
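The scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not the InterPro implementation: the function names are hypothetical, `height_arr` is assumed to hold the letter heights of a single logo column, and `max_height_theory` is assumed to be the theoretical maximum column height, taken here as log2(20) bits for a 20-letter amino acid alphabet. The source does not define either quantity more precisely.

```python
import math

# Assumption: the theoretical maximum height of a logo column is log2(20) bits
# (20-letter amino acid alphabet). The documentation does not state this value.
MAX_HEIGHT_THEORY = math.log2(20)


def conservation_score(height_arr, max_height_theory=MAX_HEIGHT_THEORY):
    """Return sum(height_arr) / max_height_theory * 10 for one logo column."""
    if max_height_theory <= 0:
        raise ValueError("max_height_theory must be positive")
    return sum(height_arr) / max_height_theory * 10


def conservation_track(logo_columns, max_height_theory=MAX_HEIGHT_THEORY):
    """Apply the formula to every column of a logo (hypothetical helper)."""
    return [conservation_score(col, max_height_theory) for col in logo_columns]
```

Under these assumptions the score runs from 0 for a column with no information to 10 for a fully conserved column. For example, `conservation_score([1.0, 0.5, 0.5], max_height_theory=4.0)` evaluates to 2.0 / 4.0 × 10.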

Protein sequence viewer conservation track
  6. Clicking on the header of a category (for example, Unintegrated) hides the bars for the entire category.

When zoomed in, panning can be achieved by either dragging the scale at the top or by dragging any bar in the desired direction (see figure below).

Protein sequence viewer panning

For some proteins, additional information is provided by resources outside the member database consortium. It is displayed under the Other features category of the viewer. Available data include:

  • Disordered regions from MobiDB

  • Transmembrane regions from Phobius and/or TMHMM

  • Coiled-coil regions from COILS

  • Cytoplasmic/non-cytoplasmic domains from Phobius

  • Signal peptide regions from SignalP and/or Phobius

  • Spurious proteins from AntiFam

  • Functional families from CATH-FunFams, an automatically generated profile HMM database in which FunFam entries are segregated by an entropy-based approach that distinguishes different patterns of conserved residues, corresponding to differences in functional determinants

  • Pfam-N annotations, produced by a deep learning method developed by the Google Research team led by Dr Lucy Colwell to increase the Pfam coverage of protein sequences

When available, 3D structure and domain predictions from the Genome3D consortium are displayed in the Predicted 3D Structures and Predicted Domains categories respectively.

Protein sequence viewer Other features and Genome3D annotations for the protein O75069
