Browsing entries in the InterPro website

You can get to entry pages in InterPro in lots of different ways. Commonly this will involve clicking on a link to an entry from one of the search methods. This section describes the different types of entries and what you will find for each of their pages.

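There are 7 categories of entry pages in InterPro:

  • InterPro entry page

  • Member database page

  • Protein entry page

  • Structure entry page

  • Taxonomy entry page

  • Proteome entry page

  • Set/Clan entry page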

Entry data tabs are displayed when the corresponding data is available. Each tab is described in detail under the first entry page in which it appears; most are described under the InterPro entry page.

InterPro entry page

An InterPro entry represents a unique protein homologous superfamily, family, domain, repeat or important site based on one or more signatures provided by the InterPro member databases.

InterPro entry page for IPR000562.

InterPro entry pages give a brief description of the entry, name and unique InterPro identifier. The InterPro entry type (homologous superfamily, family, domain, repeat or site) is also indicated by an icon (e.g. a D with a green background for a domain).

Clicking on the star symbol next to the entry name saves the entry as a Favourite. The full list of saved entries is available in the Favourites Entries component on the homepage.

On the right hand side of the page, the Add your annotation button allows the user to suggest updates to the InterPro annotation, and the member databases contributing signatures to the entry are shown in a box.

Overlapping homologous superfamilies and/or Relationships to other entries are indicated where available.

More information about the data provided in an InterPro entry page can be found in the InterPro Entries: essential information section of the documentation.

Additional tabs in the left-hand side menu provide further information about the entry, and are displayed when the data is available. Types of data that may be available in the menu of an InterPro entry page include: Proteins, Domain architectures, Taxonomy, Proteomes, Structures, AlphaFold, Pathways and Interactions.

Proteins

List of proteins that are included in this entry, displayed in a table. There is an option to display only proteins that have been manually curated in UniProtKB (reviewed), only proteins that have been automatically annotated (unreviewed), or all proteins (both, default).
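The same reviewed/unreviewed choice can be expressed when retrieving the protein list programmatically. The sketch below only builds a request URL and does not contact any server. It assumes the public InterPro REST API lives under `https://www.ebi.ac.uk/interpro/api` and follows the path pattern `/protein/<source>/entry/interpro/<accession>/`; both should be checked against the API documentation before use. The function name `protein_list_url` and the mapping `PROTEIN_SOURCES` are hypothetical.

```python
# Assumed base URL of the InterPro REST API; verify against the API docs.
API_BASE = "https://www.ebi.ac.uk/interpro/api"

# Website filter label -> protein source segment assumed in the API path.
PROTEIN_SOURCES = {
    "reviewed": "reviewed",      # manually curated proteins
    "unreviewed": "unreviewed",  # automatically annotated proteins
    "both": "uniprot",           # all proteins (the default view)
}


def protein_list_url(entry_accession: str, curation: str = "both") -> str:
    """Build the (assumed) API URL listing proteins matched by an InterPro entry."""
    try:
        source = PROTEIN_SOURCES[curation]
    except KeyError:
        raise ValueError(f"unknown curation filter: {curation!r}") from None
    return f"{API_BASE}/protein/{source}/entry/interpro/{entry_accession}/"


print(protein_list_url("IPR000562", "reviewed"))
```

Keeping the filter as a small lookup table makes it easy to correct the path segments if the API uses different names.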

Domain architectures

Provides information about the different domain arrangements of the proteins matching this entry, based on Pfam signatures. For InterPro entries, it shows where the domain is located in protein sequences and what, if any, combinations arise with other domains. Domain architectures can be downloaded in JSON and TSV formats through the Export button.

Taxonomy

List of species whose proteins match this entry, based on data from UniProt taxonomy. The information can be displayed in 4 different ways through the view options menu:

Taxonomy subpage view options
  • Table with the list of all the species the proteins matching this entry are found in.

  • Taxonomy tree of all the species the proteins matching this entry are found in.

  • Sunburst view displays the taxonomy distribution of the proteins matching the entry, from the least specific at the centre to more specific going towards the outside.

  • Table with the number of proteins found for key species. These are 12 model organisms commonly used in scientific research: Oryza sativa subsp. japonica, Arabidopsis thaliana, Homo sapiens, Danio rerio, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli, Escherichia virus T4, Halobacterium salinarum.

Sunburst is the default view of the subpage. A range of options can be selected to customise the view:

  • The segment size can be adjusted based on the number of sequences matching a taxon (default) or by the number of species per taxon.

  • The sunburst depth can be adjusted between 2 and 8 rings.
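The two sizing options and the depth limit can be illustrated with a small tree traversal. This is a minimal sketch, not the website's implementation: the `TaxonNode` class, its fields, the choice to count the root as the first ring, and the counts in the example tree are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonNode:
    """Hypothetical taxonomy node; not the website's internal data model."""
    name: str
    sequences: int = 0        # sequences matched directly at this node
    is_species: bool = False
    children: list["TaxonNode"] = field(default_factory=list)


def weight(node: TaxonNode, by: str = "sequences") -> int:
    """Segment size: matching sequences (default) or species under the node."""
    if by == "sequences":
        own = node.sequences
    elif by == "species":
        own = 1 if node.is_species else 0
    else:
        raise ValueError(f"unknown sizing option: {by!r}")
    return own + sum(weight(child, by) for child in node.children)


def rings(root: TaxonNode, depth: int, by: str = "sequences") -> list[list[tuple[str, int]]]:
    """Return one list of (name, weight) per ring, from the centre outwards."""
    if not 2 <= depth <= 8:
        raise ValueError("depth must be between 2 and 8")
    result, level = [], [root]
    for _ in range(depth):
        if not level:
            break
        result.append([(node.name, weight(node, by)) for node in level])
        level = [child for node in level for child in node.children]
    return result


# Illustrative tree with invented counts.
tree = TaxonNode("root", children=[
    TaxonNode("Bacteria", children=[TaxonNode("Escherichia coli", 5, True)]),
    TaxonNode("Eukaryota", children=[
        TaxonNode("Homo sapiens", 3, True),
        TaxonNode("Mus musculus", 2, True),
    ]),
])
print(rings(tree, depth=2))
print(rings(tree, depth=2, by="species"))
```

Switching `by` changes only the weights, not the shape of the tree, which mirrors how the two sizing options redraw the same hierarchy with different segment sizes.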

Taxonomy sunburst view for PF00120

In the table views, for each organism, the taxonomy identifier and protein count information are provided. The ACTIONS column offers the possibility to:

  • View all the protein matches in the Proteins tab

  • Download a FASTA file of the protein matches

  • View the taxonomy information in the Taxonomy entry page

Proteomes

List of proteomes that include proteins matching this entry. A proteome is the set of proteins of an organism whose genome has been fully sequenced. A given taxonomy node may have one or more proteomes, for example, to reflect different assemblies of a genome. Proteome data is imported from UniProt proteomes. For each proteome, the same actions are available as in Taxonomy, except that the last action shows the proteome information in the Proteome entry page instead of the taxonomy information.

Structures

List of structures from the PDBe database that match protein sequences included in this entry.

AlphaFold

AlphaFold protein structure predictions are generated by DeepMind [4].

At the top of the page, a 3D viewer (powered by Mol*) shows an interactive view of the predicted structure for one of the proteins matching the InterPro entry. The structure is coloured by per-residue pLDDT score and can be zoomed and rotated. Clicking on a residue zooms in and displays contacts with surrounding residues; clicking on the blank area around the structure zooms out.

The protein accession and organism are displayed on the left hand side, together with links to the corresponding AlphaFold and UniProt websites. The model confidence colour scale, determined using the pLDDT score, is also displayed, varying from dark blue (very high confidence) to orange (very low confidence).
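The colour scale can be read as a mapping from score ranges to confidence bands. The sketch below assumes the thresholds (90, 70, 50) and the intermediate band names and colours used in the AlphaFold Protein Structure Database legend; only the two extremes (dark blue and orange) are stated in this documentation. The function name `plddt_band` is hypothetical.

```python
def plddt_band(score: float) -> tuple[str, str]:
    """Map a per-residue pLDDT score (0-100) to (confidence label, colour name)."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT scores range from 0 to 100")
    # Thresholds assumed from the AlphaFold database legend.
    if score > 90:
        return ("very high", "dark blue")
    if score > 70:
        return ("confident", "light blue")
    if score > 50:
        return ("low", "yellow")
    return ("very low", "orange")


for example_score in (95.2, 81.0, 63.5, 32.1):
    print(example_score, plddt_band(example_score))
```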

The data can be downloaded in PDB or mmCIF format, by clicking on the corresponding buttons below the 3D viewer.

AlphaFold structure predictions tab for IPR000562, UniProt O60449.

On an InterPro entry page, below the 3D viewer, a table containing the list of UniProt accessions matching the InterPro entry for which structure predictions have been generated is shown. For each protein it is possible to:

  • Access the Protein entry page by clicking on the UniProt accession or name

  • Access the Taxonomy entry page by clicking on the species

  • Display the structure prediction on the current page by clicking on the Show prediction button

On a protein entry page, below the 3D viewer, the protein sequence viewer displays the member database signatures and InterPro entries matching the protein. Hovering over a match highlights the corresponding section in the predicted structure 3D view.

Pathways

List of pathways identified for protein sequences included in this entry. This information is provided by the MetaCyc Metabolic Pathway Database and the Reactome database.

Interactions

List of experimentally supported protein:protein interactions in which proteins matching this entry are involved.

Member database page

InterPro provides entry pages for each signature that a member database holds. This includes signatures that have not yet been, or can’t be, integrated into InterPro (unintegrated signatures).

Member database signature entries provide information about which database the signature is from, the signature identifier, the type of entry as defined by the member database (e.g. family, domain or site), and the short name given to the entry by the member database.

Some member databases provide a description giving information about the function of the family, domain or site. When this is not the case and the signature is integrated in an InterPro entry, the InterPro description is displayed.

To address the absence of annotations for certain member database signatures, we’ve employed AI to automatically generate descriptions by extracting information from Swiss-Prot. It’s important to note that these descriptions have not undergone curator review, and we advise regarding them as preliminary sources of information. Read more on AI-generated descriptions.

Some member databases create groups of families that are evolutionarily related. Pfam calls them clans, CDD uses the term superfamily and, for PIRSF and PANTHER, the concept is associated with the parent families of their hierarchy. We use the umbrella term Clan to refer to Pfam groups and Set to refer to the other groups. When available, the set/clan to which the signature belongs is indicated.

The right hand side of the page provides links to the InterPro entry in which this signature has been integrated, and an external link to the signature on the member database’s website when available. For Pfam signatures, the Add your annotation button allows the user to suggest updates to the Pfam annotation.

For signatures provided by the Pfam member database, a short extract of the Wikipedia page is also displayed, when available, to complete the description.

InterPro member database page for Pfam signature PF00040.

In addition to the Proteins, Taxonomy, Proteomes and Structures tabs, member database pages may also display information in the following additional tabs: Domain architectures, AlphaFold, Signature, Alignment and Curation.

Signature

The signature representing the model that defines the entry is visualised in this page as a logo, using Skylign. The logo data is displayed for the NCBIfam, Pfam, PANTHER, PIRSF, and SFLD member databases.

The visualisation displays the amino acid conservation for each residue in the model. To navigate large logos, the user can drag the rendered area to a desired position. Alternatively, the user can input a residue number to be viewed. When selecting a particular residue in the logo, the probabilities of each amino acid are displayed in the bottom part.
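One common way to turn the per-residue amino acid probabilities into a single conservation value is the information content of the column. The sketch below uses a uniform background over 20 amino acids, which is an assumption made for simplicity; the letter heights drawn by Skylign may be calculated differently. The function name `information_content` is hypothetical.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def information_content(probabilities: list[float]) -> float:
    """Information content (bits) of one column against a uniform background."""
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # Shannon entropy of the column; terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(AMINO_ACIDS) - entropy


conserved = [1.0] + [0.0] * 19   # one residue always observed
uniform = [1 / 20] * 20          # no preference at all
print(information_content(conserved), information_content(uniform))
```

A fully conserved position reaches the maximum of log2(20) bits, while a position with no preference carries essentially none.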

Member database signature tab

Alignment

This section allows users to view and download any available alignment file that is associated with the current member database signature. Currently, the alignment files are only available for the Pfam member database, but hopefully we will be able to include alignments for other member databases in the future.

First, one of the available alignments has to be selected. For example in the image below the user has selected the “seed” alignment. If the selected alignment has more than 1000 sequences, a warning message appears to inform users that big alignments can cause memory issues in the browser. A compressed file (gzip) of the current alignment is available by clicking on the Download button.
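The size check can also be done on a downloaded file before opening it in another tool. This is a minimal sketch that assumes the gzip file contains a Stockholm-format alignment; the function name `count_stockholm_sequences` and the in-memory sample are illustrative only.

```python
import gzip
import io

LARGE_ALIGNMENT_THRESHOLD = 1000  # sequence count above which the website warns


def count_stockholm_sequences(handle) -> int:
    """Count distinct sequence names in a Stockholm-format alignment."""
    names = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and markup/annotation lines
        if line == "//":
            break     # end of the alignment
        names.add(line.split()[0])  # interleaved blocks repeat names
    return len(names)


# Tiny made-up alignment, compressed in memory to stand in for a download.
sample = gzip.compress(
    b"# STOCKHOLM 1.0\n"
    b"#=GF ID Example\n"
    b"seqA/1-5 ACDEF\n"
    b"seqB/1-5 AC-EF\n"
    b"//\n"
)
with gzip.open(io.BytesIO(sample), "rt", encoding="utf-8") as handle:
    n_sequences = count_stockholm_sequences(handle)
print(n_sequences, n_sequences > LARGE_ALIGNMENT_THRESHOLD)
```

For a real download, the `io.BytesIO(sample)` argument would be replaced by the path of the saved file.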

Interacting with the grey navigation bar over the sequences allows users to navigate the alignment; dragging the left and right limits of the navigation bar zooms to a particular position or adjusts the zoom level. Alternatively, the zoom level can be set by scrolling up/down while holding the [ctrl] key. Scrolling up/down moves other sequences in the alignment into the visible area of the viewer.

Member database alignment tab

Curation

This section provides information about the curation of the signature. Currently, it is only available for the Pfam member database. It is divided into 2 subsections:

  • Curation: details about Pfam curators and Sequence ontology

  • HMM information: displays the HMM building command used and offers the possibility to download the HMM profile defining the signature

Member database curation tab

Subfamilies

This section provides a list of subfamilies derived from the signature and a link to more information on the member database website. Currently, this list is available for the PANTHER and CATH-Gene3D member databases. For PANTHER subfamilies, the GO terms associated with them are also displayed.

Protein entry page

The Protein entry page contains information on a specific protein provided by UniProt. Protein pages can be accessed either by entering a UniProt accession or identifier in a Text search or by clicking on a protein accession from the Proteins tab in an entry page.

The protein page provides the protein accession, the short name (identifier) given to the protein by UniProt, the length of the protein sequence, the species in which the protein is found, the proteome it belongs to, the gene encoding the protein and a brief description of the protein’s function where known. All the InterPro family entries this protein matches are listed under “Protein family membership”. The right hand side of the page provides an external link to the protein entry in UniProt, an export of the matches in TSV format, and options to perform a HMMER search or an InterProScan search.

Protein entry page for O00167.

The protein entry page also displays the protein sequence viewer to show the associated domains, sites etc.

When available, different isoforms of the protein can be selected to compare their InterPro matches with the consensus protein sequence. When an isoform is selected, a new protein sequence viewer corresponding to the selection is displayed and the URL is updated to reflect the change. The isoform matches can also be viewed side by side with the consensus protein sequence by clicking on the split icon after selecting an isoform.

When available, GO terms associated with InterPro entries and PANTHER families are displayed at the bottom of the page. GO terms provide information about Biological processes, Molecular function and Cellular components.

The following tabs may be available: Entries, Structures, Sequence, Similar proteins and AlphaFold.

Entries

List of InterPro entries that include this entity. The results can be filtered by member databases using the dropdown box located on the left side of the header of the result table. This functionality is available for all the tables presenting InterPro entries in the website.

InterPro matches corresponding to the protein

Sequence

This tab shows the protein FASTA sequence. The full sequence or part of the sequence (by selecting the region of interest) can be used to perform two types of search, available on the right side of the screen: InterProScan search or HMMER search, which redirects to the corresponding pages.
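Selecting a region of interest amounts to taking a slice of the FASTA sequence. The sketch below assumes 1-based, inclusive residue positions, as is conventional for protein coordinates; the helper names and the short sequence are invented for illustration.

```python
def read_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def select_region(sequence: str, start: int, end: int) -> str:
    """Extract residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region outside the sequence")
    return sequence[start - 1:end]


# Made-up record; a real one would come from the Sequence tab.
header, sequence = read_fasta(">demo_protein\nMKTAY\nIAKQR\n")
region = select_region(sequence, 3, 7)
print(header, len(sequence), region)
```

The extracted string is what would be submitted to an InterProScan or HMMER search in place of the full sequence.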

Similar proteins

List of proteins that have the same domain architecture as this protein, including the Pfam/InterPro accession for each domain. The list can be filtered to either show all the protein matches or only the reviewed proteins from UniProt.
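The notion of "same domain architecture" can be sketched as comparing the ordered list of domain accessions along each sequence. This is an illustration under simplifying assumptions (domains ordered by start position, no handling of overlaps or repeats collapsing), not a description of how InterPro computes architectures. All protein names, domain labels and coordinates below are made up.

```python
from collections import defaultdict

# A match is (domain accession, start, end); a hypothetical minimal shape.
Match = tuple[str, int, int]


def architecture(matches: list[Match]) -> tuple[str, ...]:
    """Order domain accessions by start position to form an architecture key."""
    return tuple(acc for acc, _start, _end in sorted(matches, key=lambda m: m[1]))


def similar_proteins(query: str, proteins: dict[str, list[Match]]) -> list[str]:
    """Return the other proteins whose architecture equals that of `query`."""
    groups = defaultdict(list)
    for accession, matches in proteins.items():
        groups[architecture(matches)].append(accession)
    return sorted(a for a in groups[architecture(proteins[query])] if a != query)


proteins = {
    "protA": [("domain_X", 10, 50), ("domain_Y", 60, 120)],
    "protB": [("domain_Y", 80, 140), ("domain_X", 5, 40)],   # X then Y once sorted
    "protC": [("domain_Y", 5, 60), ("domain_X", 70, 110)],   # Y then X
}
print(similar_proteins("protA", proteins))
```

Using the ordered tuple as a dictionary key means domain order matters, so a protein with the same domains in a different order is not reported as similar.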

Structure entry page

InterPro provides entries for all the structures available in the Protein Data Bank in Europe (PDBe). A structure entry page can be reached by clicking on a structure in a results list, by entering the structure identifier in the Quick search box (magnifying glass symbol), or by performing a Text search.

At the top of the structure page, general information about the structure is displayed: the structure’s accession number (PDB ID), resolution, release date, the method used to determine the structure (e.g. “Xray”) and the chains composing the structure. External links to PDBe, RCSB PDB, PDBsum, CATH, SCOP, ECOD and Proteopedia are provided on the right hand side of the page.

Following the general information section, a 3D viewer (powered by Mol*) shows an interactive view of the 3D structure. Hovering over a residue displays the name of the entry, the chain and residue information below the viewer. Clicking on a residue in the viewer zooms in and displays contacts with surrounding residues; clicking on the blank area around the structure zooms out. Below the viewer, the protein sequence viewer with the InterPro matches is displayed for each chain. It has an extra category representing the secondary structure information. Hovering over one of the tracks highlights the corresponding region of the protein structure in the 3D structure viewer.

Structure entry page for 1t2v.

More information is available on the corresponding train online section.

The following tabs may be available: Entries and Proteins.

Taxonomy entry page

Taxonomy pages display the name, taxonomy ID, lineage and child nodes for a particular taxon. Any reference to this taxon from another page throughout the website will link to this page.

The overview also includes a graphical representation of the lineage of the selected taxon. The nodes in the visualisation are also links, so you can jump to the page of a particular taxon of interest.

Taxonomy entry page for Caenorhabditis elegans.

The following tabs may be available: Entries, Proteins, Structures and Proteomes.

Proteome entry page

The proteome entry page displays general information provided by UniProt: its ID, strain, and a link to the related species.

The following tabs may be available: Entries, Proteins and Structures.

Proteome entry page for UP000001940.

The image shows the proteome page for C. elegans, whose proteome ID is UP000001940. As the counters in the tabs show, it has 10K related InterPro entries, 27K proteins and 469 structures. Links to the corresponding proteome pages in UniProt and Rfam can be found on the right hand side, and a description of the organism (provided by UniProt) is displayed below. Note that these figures are for InterPro version 99.0 and are used here only as an example.

Selecting the Entries tab displays a list of the InterPro entries matching any sequence in the proteome. To list the entries of a member database instead, click the box at the top of the list and select the database in the dropdown list that appears. A database can only be selected if it contains at least one matching entry.

Set/Clan entry page

Some InterPro member databases create groups of families that are evolutionarily related, called sets/clans. This page offers an overview of a specific set/clan provided by a member database. It includes a short description and an interactive view of the signatures included in the set/clan. For the interactive view, different label types can be chosen through the Label Content menu: Accession, Name and Short name. For clans provided by the Pfam member database, an additional section provides literature references, when available.

Set entry page for cl00011 (CDD)

The following tabs may be available: Entries, Proteins, Structures, Taxonomy, Proteomes and alignment_clan.

Entries

Provides the list of signatures included in the set/clan (accession, name and short name).

For Pfam clans, the Entries tab contains the list of Pfam entries included in the clan and links to each entry’s SEED alignment and domain architectures pages.