How to search the InterPro website?¶
A search can be performed on the InterPro homepage using the Search box component, by clicking on the Search tab in the navigation menu, or by clicking on the magnifying glass in the navigation banner. There are five different types of search available in InterPro:
The magnifying glass in the navigation banner allows a quick search for a specified keyword. A search can be triggered by entering some text and pressing the enter/return key or clicking the magnifying glass. If the keyword is text, the results will be displayed as described in the Text search. If the keyword entered is an accession, it automatically redirects to the corresponding InterPro page under the Browse tab in the navigation menu.
A sequence or a batch of sequences can be submitted in FASTA format in the dedicated text area or by uploading a FASTA file. The “Advanced options” allow users to select the InterPro member databases of interest to search against (by default, all are selected). The sequence search is performed using the InterProScan software. While the sequence search is running, the user can continue to navigate through the website, other browser tabs or applications, and will get a pop-up notification when the job has been completed (this requires browser notifications to be allowed).
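As an illustration of the FASTA input described above, the following minimal sketch splits a batch of sequences into header/sequence pairs before submission. The function name `parse_fasta` and the example sequences are hypothetical, and the website may accept input that this sketch rejects.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-sequence batch, for illustration only.
batch = """>seq1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>seq2
MSDNE
"""

for header, sequence in parse_fasta(batch):
    print(header, len(sequence))
```

Sequence lines belonging to the same record are joined, so a sequence wrapped over several lines is treated as one sequence.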
Sequence search results¶
Results of a protein sequence search are available under the Results tab in the navigation menu, in the Your InterProScan Searches section. This page displays the protein sequence searches you have performed in the last seven days, with the most recent one displayed at the top. The status column indicates whether the search has completed (green tick symbol / searching), whether the search has been saved locally (the results will still be available even after the seven-day limit set on InterPro servers), or whether the results have been imported (file symbol). Clicking on the job ID or on the text in the results column opens a page where the results are summarised in a protein sequence viewer (more detailed information is provided in the Protein sequence viewer section).
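The listing rules described above (seven-day limit, locally saved searches kept, most recent first) can be sketched as follows. This is only a model of the behaviour, not the website's code; the job dictionaries and their keys (`id`, `submitted`, `saved_locally`) are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # the seven-day server limit described above


def visible_jobs(jobs, now):
    """Return jobs still listed: saved locally or within RETENTION, newest first."""
    kept = [
        job for job in jobs
        if job["saved_locally"] or now - job["submitted"] <= RETENTION
    ]
    return sorted(kept, key=lambda job: job["submitted"], reverse=True)


# Hypothetical jobs, for illustration only.
now = datetime(2024, 1, 10, 12, 0)
jobs = [
    {"id": "job-a", "submitted": datetime(2024, 1, 9), "saved_locally": False},
    {"id": "job-b", "submitted": datetime(2024, 1, 1), "saved_locally": False},
    {"id": "job-c", "submitted": datetime(2024, 1, 2), "saved_locally": True},
]
print([job["id"] for job in visible_jobs(jobs, now)])  # ['job-a', 'job-c']
```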
Previously run searches can be imported either by typing the job ID in the Import text box (for searches performed on our servers in the last seven days) or by uploading an InterProScan output file in JSON format. The imported job is then added to the Results table. If the second option is chosen and InterProScan was run using nucleotide sequences, a job result is created for each Open Reading Frame (ORF), and ORFs from the same nucleotide sequence are grouped accordingly. This import feature is useful for users who need InterProScan graphical output for publications and other uses.
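To illustrate how ORFs from the same nucleotide sequence can be grouped, here is a minimal sketch operating on a small inline JSON document. The field names (`results`, `crossReferences`, `openReadingFrames`) are assumptions about the layout of an InterProScan JSON file and should be checked against a real output file; the sample data is invented.

```python
import json

# Invented sample; real InterProScan output contains many more fields.
sample = json.loads("""
{
  "results": [
    {
      "crossReferences": [{"id": "nt_seq_1"}],
      "openReadingFrames": [
        {"id": "orf1", "start": 1, "end": 300, "strand": "SENSE"},
        {"id": "orf2", "start": 350, "end": 800, "strand": "ANTISENSE"}
      ]
    },
    {
      "crossReferences": [{"id": "nt_seq_2"}],
      "openReadingFrames": [
        {"id": "orf1", "start": 10, "end": 400, "strand": "SENSE"}
      ]
    }
  ]
}
""")


def group_orfs(document):
    """Map each nucleotide sequence ID to the IDs of its ORFs."""
    groups = {}
    for result in document.get("results", []):
        xrefs = result.get("crossReferences", [])
        # Assumption: the first cross-reference identifies the submitted sequence.
        seq_id = xrefs[0]["id"] if xrefs else "unknown"
        orf_ids = [orf["id"] for orf in result.get("openReadingFrames", [])]
        groups.setdefault(seq_id, []).extend(orf_ids)
    return groups


print(group_orfs(sample))
```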
When a search has been run using a previous version of InterProScan, it can be re-run using the latest version of the software. When a batch of sequences has been submitted, group actions allow the user to Delete All, Re-run All, or Download All the submitted sequences at once.
On the search results page, some general information on the submitted sequence is provided, followed by the predicted InterPro protein family membership when available (shown in the figure above). The search can be saved by clicking on the Save in Browser button. The status will be changed to “Imported file”. This means that the results will remain available beyond the usual seven-day limit in the browser and on the machine where the save was made, and will only be deleted if the user deletes the job by clicking on the bin icon.
The submitted sequence is shown in its full length at the top of the protein sequence viewer (grey bar). The purple/grey bar below indicates the predicted hydrophobicity of the sequence residues. Below this are the InterPro entries and signature matches, displayed in categories classified by InterPro entry type. Each coloured bar represents a domain, protein family, or important site that has been matched to part or all of the length of the submitted protein sequence.
The top coloured bar represents the InterPro entry [4a, 5a].
Directly below the InterPro entry, additional coloured bars display the member database signatures that contributed to that InterPro entry [4b, 5b].
In the example above, four InterPro entries (1 family and 3 domain entries) have been found matching the submitted sequence. The first InterPro entry is for a protein family [4a], containing one member database signature, in this case from Prosite (PR01022) [4b]. The following three InterPro matches are domains. The top InterPro domain entry [5a] contains signatures from 3 member databases (Pfam, CDD and Prosite) [5b] which all represent the same domain. The remaining two InterPro domains contain one member database signature.
In addition to the InterPro matches, the GO terms associated with the InterPro entries matching the protein are displayed below the sequence viewer when available. These GO terms are assigned manually to InterPro entries using the Gene Ontology and reflect the Biological process, Molecular function or Cellular component the protein may be associated with.
The text search is available by selecting the “text search” section under the Search tab in the website menu. The text search will search the following information in the database:
InterPro, protein, protein structure or member database signature accession
Entering a name or keywords retrieves a list of all the InterPro entries and InterPro member database signatures that contain the searched words in their title or description. By default, the search term is highlighted in the results list and the description is shortened. Clicking on the symbol to the left of the Export button removes the highlight and shows the full description text. The setting is saved and also applied to other text searches throughout the website.
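The keyword behaviour described above can be modelled with a short sketch: entries are kept when the keyword occurs in their name or description, and in highlight mode the description is shortened and matches are marked. The function `search_entries`, the square-bracket highlight, the 60-character cut-off and the example entries are all assumptions made for illustration.

```python
import re


def search_entries(entries, query, max_len=60, highlight=True):
    """Return (accession, name, description) for entries matching the query."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for entry in entries:
        name, description = entry["name"], entry["description"]
        if not (pattern.search(name) or pattern.search(description)):
            continue
        if highlight:
            # Highlight mode: shorten the description, then mark the matches.
            if len(description) > max_len:
                description = description[:max_len] + "..."
            name = pattern.sub(lambda m: f"[{m.group(0)}]", name)
            description = pattern.sub(lambda m: f"[{m.group(0)}]", description)
        hits.append((entry["accession"], name, description))
    return hits


# Invented entries with placeholder accessions.
entries = [
    {"accession": "EX0001", "name": "Kinase domain",
     "description": "A catalytic domain found in protein kinases."},
    {"accession": "EX0002", "name": "Zinc finger",
     "description": "A small DNA-binding motif."},
]

print(search_entries(entries, "kinase"))
print(search_entries(entries, "kinase", highlight=False))
```

Passing `highlight=False` corresponds to toggling the symbol next to the Export button: the full, unmarked description is returned.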
Entering an accession number or an identifier (e.g. IPR020422 (InterPro), O00167 (UniProt), PF02932 (member database), GO:0007165 (GO term), 1t2v (structure), UP000005640 (proteome), cl00011 (set), A4 (gene)) gives an exact match and a quick access to the corresponding InterPro page. It also displays the list of the InterPro entries and any member database signatures linked to that accession number/identifier.
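A rough way to tell the accession types listed above apart from free text is to test the query against one pattern per type. The patterns below are assumptions inferred from the example accessions in this section (the UniProt pattern follows the accession format published by UniProt); they are not the rules the website itself applies, and short keywords that happen to look like a structure identifier would be misclassified.

```python
import re

# Assumed patterns, checked in order; the first full match wins.
ACCESSION_PATTERNS = [
    ("interpro", re.compile(r"IPR\d{6}")),
    ("pfam", re.compile(r"PF\d{5}")),
    ("go", re.compile(r"GO:\d{7}")),
    ("proteome", re.compile(r"UP\d{9}")),
    ("set", re.compile(r"cl\d{5}")),
    ("uniprot", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
    ("structure", re.compile(r"[0-9][A-Za-z0-9]{3}")),
]


def classify_query(query):
    """Return the assumed accession type of a query, or 'text' if none matches."""
    query = query.strip()
    for kind, pattern in ACCESSION_PATTERNS:
        if pattern.fullmatch(query):
            return kind
    return "text"


for example in ["IPR020422", "O00167", "PF02932", "GO:0007165",
                "1t2v", "UP000005640", "cl00011", "A4"]:
    print(example, "->", classify_query(example))
```

A gene name such as A4 matches none of these patterns and falls through to `text`, which mirrors the idea that anything not recognised as an accession is handled as a keyword.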
Selecting the accession number or name of any entry in the list of entries opens the corresponding InterPro page (e.g. member database signature, InterPro entry) under the Browse tab in the navigation menu. An overview of the entry is provided, and tabs in the left-hand side menu allow specific information for the entry to be viewed, for example the species in which a protein has been found, or structures matching an entry. More information is available in the browsing an InterPro page section.
Domain architecture search¶
This search option allows the retrieval of protein sequences that contain specific Pfam/InterPro domains in a particular arrangement referred to as a “domain architecture”. For example, protein sequences containing both an SH2 domain and an SH3 domain can be retrieved. Domains that the proteins should or should not contain can be included in or excluded from the domain architecture, respectively. Selecting “Order of domain matters” makes it possible to arrange the domains in a particular order. Selecting “Exact match” restricts the search to proteins containing the selected domains only (no extra domain in the proteins). Domains can be selected by entering a domain name, a Pfam accession, or an InterPro accession if a Pfam entry is integrated in it.
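The options described above can be modelled as a predicate over a protein's ordered list of domains. This is a sketch of one possible interpretation, not the search implementation used by InterPro: `matches_architecture` is a hypothetical function, "order matters" is taken to mean that the included domains appear in the given order (other domains may sit in between), and "exact match" is taken to mean that no domain outside the selection is present.

```python
def matches_architecture(protein_domains, include, exclude=(),
                         ordered=False, exact=False):
    """Check a protein's ordered domain list against a domain architecture query."""
    if any(domain in protein_domains for domain in exclude):
        return False
    if exact:
        if ordered:
            return list(protein_domains) == list(include)
        return set(protein_domains) == set(include)
    if ordered:
        # Subsequence test: each 'in' consumes the iterator up to the match,
        # so later domains must be found after earlier ones.
        remaining = iter(protein_domains)
        return all(domain in remaining for domain in include)
    return all(domain in protein_domains for domain in include)


# Invented proteins; domain names are used as labels instead of accessions.
proteins = {
    "protA": ["SH3", "SH2", "Kinase"],
    "protB": ["SH2", "SH3"],
    "protC": ["SH2"],
}
query = ["SH2", "SH3"]

for name, domains in proteins.items():
    print(
        name,
        matches_architecture(domains, query),
        matches_architecture(domains, query, ordered=True),
        matches_architecture(domains, query, exact=True),
    )
```

In this example only protB satisfies all three variants: protA has the domains in the opposite order and carries an extra domain, and protC lacks SH3.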
Once a search is performed, the corresponding results are displayed below the search component and show the number of proteins followed by the corresponding domain architecture. For each domain architecture, the domain size is displayed based on the real length of the domain, using a reference protein. When hovering over a domain, more details are available in a tooltip, including the domain’s position. Clicking on the number of proteins redirects to the Browse tab in the navigation menu under the protein section, showing the list of proteins, which can be filtered to a specific member database, if required, as described in the browse feature.
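Drawing domains in proportion to their real length amounts to a linear scaling from residue coordinates to display units. The sketch below shows that arithmetic under stated assumptions: the function `scale_domains`, the 400-unit track width and the coordinates are invented for illustration and do not describe how the website renders its graphics.

```python
def scale_domains(domains, protein_length, track_width=400):
    """Convert 1-based residue ranges to (name, offset, width) in display units."""
    scale = track_width / protein_length
    return [
        (name, round((start - 1) * scale), round((end - start + 1) * scale))
        for name, start, end in domains
    ]


# Invented reference protein of 200 residues with two domains.
layout = scale_domains([("SH3", 11, 60), ("SH2", 81, 180)], protein_length=200)
print(layout)  # [('SH3', 20, 100), ('SH2', 160, 200)]
```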
By default, Pfam entries are shown in the results. This can be changed to show InterPro entries by toggling the Pfam checkbox to InterPro and vice versa.
Using the Browse feature to search and filter InterPro¶
The browse search page can be accessed by clicking on the Browse tab in the navigation menu. The browse search makes it possible to select subsets of the data available in InterPro by choosing filters according to the results required. For example, this page can be used to browse all entries which have a contributing signature from a particular member database, e.g. HAMAP, or to retrieve all proteins from a certain taxon, e.g. Escherichia coli, that contain a specific domain, e.g. the OmpA-like domain.
Below we describe how to use the browse search feature:
Select a data type
The browse page opens with 7 data types to allow browsing of InterPro entries, Member database signatures, Proteins, Structures, Taxonomies, Proteomes or Sets.
Select any additional filters
The filter options displayed for each data type will vary as appropriate.
Member database filter¶
The “Select your database” option is available when Browsing by Member DB, Protein, Structure, Taxonomy and Set. It allows results to be retrieved from all or a selection of InterPro member databases. Only the databases that contain signatures for the chosen data type are displayed as options. By default all the member databases are selected, except when Browsing by Member DB, where Pfam is the default option selected.
The “Search entries” box allows results to be filtered to match the text entered. For example, the text could be a keyword that might be found in entry names. It also allows specific protein names or taxa to be entered. By default, the search term is highlighted in yellow in the results list. This can be disabled by clicking on the symbol that appears between the text box and the Export button once the search has started. The setting is saved and also applied to other text searches throughout the website.
Data-type specific filters¶
InterPro entry filters¶
When Browse by InterPro is selected, two filter types can be applied:
InterPro Type: limits the data in the data views to the selected InterPro entry types.
GO Terms: filters by selected GO terms from InterPro2GO.
Member database filters¶
When Browse by Member DB is selected and a member database has been chosen, subsequent filters can be applied:
Member Database Entry Type: select the types of signatures required. This is dependent on the database type selected. For example, if a database contains both domains and family signatures you can filter the results for a specific type.
InterPro state: select all signatures from the selected database or only those signatures that have been integrated into InterPro.
Just as with the Member DB data type, Protein filters change based on the selection in the member database filter component. The basic filters are displayed irrespective of the selection made, and an extra filter is displayed when the “All Proteins” option is selected.
If a member database has been selected, the following filters are displayed:
UniProt Curation: UniProtKB is split into two sections. The reviewed set (Swiss-Prot) is manually curated, and the unreviewed set (TrEMBL) is derived from public databases and automatically integrated into UniProt.
Taxonomy: this filter allows the displayed list of proteins to be limited to certain organisms.
Sequence Status: this filter allows proteins to be limited to complete proteins or fragments.
In addition to the filters mentioned above, when the “All Proteins” option is selected in the member database filter component, the Matching Entries filter is displayed. This filter allows the selection of proteins which do or do not contain matches to entries in the InterPro dataset.
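The protein filters above combine as independent, optional conditions. The following sketch models that on an in-memory list; the `Protein` class, its field names and the example records are hypothetical and are not part of InterPro.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    accession: str
    reviewed: bool            # UniProt Curation: reviewed vs unreviewed
    taxon: str                # Taxonomy
    is_fragment: bool         # Sequence Status: fragment vs complete
    entries: list = field(default_factory=list)  # matching InterPro entries


def filter_proteins(proteins, reviewed=None, taxon=None,
                    is_fragment=None, has_matches=None):
    """Apply each filter only when its argument is not None."""
    selected = []
    for protein in proteins:
        if reviewed is not None and protein.reviewed != reviewed:
            continue
        if taxon is not None and protein.taxon != taxon:
            continue
        if is_fragment is not None and protein.is_fragment != is_fragment:
            continue
        if has_matches is not None and bool(protein.entries) != has_matches:
            continue
        selected.append(protein)
    return selected


# Invented records with placeholder accessions.
proteins = [
    Protein("EX_P1", True, "Escherichia coli", False, ["OmpA-like domain"]),
    Protein("EX_P2", False, "Escherichia coli", True, []),
    Protein("EX_P3", True, "Homo sapiens", False, ["SH2 domain"]),
]

hits = filter_proteins(proteins, reviewed=True, taxon="Escherichia coli")
print([p.accession for p in hits])  # ['EX_P1']
```

Leaving an argument at `None` corresponds to not applying that filter, so calling `filter_proteins(proteins)` returns every protein.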
Structure filters do not vary depending on which option has been selected in the member database filter component.
Experiment Type: this filter allows selection of structures based on the type of experimental data the structure is based on.
Resolution: this filter allows structures to be selected based on the resolution of the structure.
Data Display Options¶
The data display is the main part of the results section in the browse page and shows the data selected in the data type menu. The actual details shown will also be dependent on the selected data type.
The tabular view is the default view and is available for all InterPro data types. The table view icon formats data into a tabular view composed of rows representing individual entities. The table header describes the contents of each column. Clicking on one of the rows redirects to the corresponding InterPro page.
The grid view is available for all InterPro data types. It displays a series of cards summarising details of the entities being viewed. Clicking on one of the cards redirects to the corresponding InterPro page.
The tree view is currently only enabled for taxonomy data. The tree view icon is only shown where a tree view is possible. The taxonomy tree viewer can be navigated by clicking on nodes or using keyboard arrow keys. This component is also used in the Taxonomy entry page.