
PIRSF Classification System

Page 1: PIRSF Classification System

PIRSF Classification System

PIRSF: Evolutionary relationships of proteins from super- to sub-families

Homeomorphic Family: Homologous proteins sharing full-length similarity and common domain architecture

Significance

• Improve sensitivity of protein identification and functional inference
• Detect and correct genome annotation errors systematically
• Provide basis for evolutionary and comparative genomics research
• Provide basis for automated annotation of protein features: annotate generic biochemical and specific biological functions

Protein Classification and Functional Annotation

Discovery of New Knowledge by Using Information Embedded within Families of Homologous Sequences and Their Structures

Page 2: PIRSF Classification System

A protein may be assigned to only one homeomorphic family, which may have zero or more child nodes and zero or more parent nodes. Each homeomorphic family may have as many domain superfamily parents as its members have domains.
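The membership rule above can be sketched as a small data structure. This is only an illustration of the stated constraints, not the way PIRSF stores its data: the class names, level labels and identifiers such as `HFAM_A` or `DOMAIN_SF_1` are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the classification graph (illustrative names only)."""
    node_id: str
    level: str  # e.g. "domain_superfamily", "homeomorphic", "subfamily"
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


class Classification:
    """Nodes may have any number of parents and children;
    a protein may sit in at most one homeomorphic family."""

    def __init__(self):
        self.nodes = {}
        self.family_of = {}  # protein -> homeomorphic family id

    def add_node(self, node_id, level):
        self.nodes[node_id] = Node(node_id, level)

    def link(self, parent_id, child_id):
        # Record the edge on both ends so it can be walked in either direction.
        self.nodes[parent_id].children.add(child_id)
        self.nodes[child_id].parents.add(parent_id)

    def assign(self, protein, family_id):
        if self.nodes[family_id].level != "homeomorphic":
            raise ValueError(f"{family_id} is not a homeomorphic family")
        current = self.family_of.get(protein)
        if current not in (None, family_id):
            raise ValueError(f"{protein} already belongs to {current}")
        self.family_of[protein] = family_id


# Hypothetical example: a two-domain family with two domain superfamily parents.
net = Classification()
net.add_node("DOMAIN_SF_1", "domain_superfamily")
net.add_node("DOMAIN_SF_2", "domain_superfamily")
net.add_node("HFAM_A", "homeomorphic")
net.add_node("HFAM_B", "homeomorphic")
net.add_node("SUBFAM_A1", "subfamily")
net.link("DOMAIN_SF_1", "HFAM_A")
net.link("DOMAIN_SF_2", "HFAM_A")
net.link("HFAM_A", "SUBFAM_A1")
net.assign("protein_1", "HFAM_A")
```

In this sketch, a second call such as `net.assign("protein_1", "HFAM_B")` raises `ValueError`, which is how the one-family-per-protein rule is enforced.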

Page 3: PIRSF Classification System

Creation and Curation of PIRSFs

[Workflow diagram; labels recovered from the slide]

Curation levels:

• Computer-Generated (Uncurated) Clusters
• Preliminary Curation: Membership, Signature Domains
• Full Curation: Family Name, Description, Bibliography
• PIRSF Name Rules

Protein sets:

• UniProtKB proteins
• Preliminary Homeomorphic Families
• Orphans
• Curated Homeomorphic Families
• Final Homeomorphic Families
• New proteins
• Unassigned proteins

Procedures:

• Automatic clustering
• Computer-assisted Manual Curation
• Automatic Procedure
• Automatic placement

Curation steps:

• Add/remove members
• Merge/split clusters
• Map domains on Families
• Name, refs, abstract, domain arch.
• Create hierarchies (superfamilies/subfamilies)
• Protein name rule/site rule
• Build and test HMMs

Page 4: PIRSF Classification System

PIRSF family classification system

http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml

Page 5: PIRSF Classification System

PIRSF Text Search

Add extra input boxes for advanced search

Select field

Ways to get to PIRSF text search

Page 6: PIRSF Classification System

PIRSF Text Search Result (I)

Things you can do from the result table:

1. Add search terms or start the search over
2. Customize the table columns
3. Save your results in table or FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Follow links to PIRSF records, the PIRSF hierarchy, and protein domains (Pfam)

Page 7: PIRSF Classification System

PIRSF Text Search Result (II)

2. How to customize the table columns. Example: display the KEGG pathway ID column.

a- Select KEGGPathway ID in the “Fields not in display” box.

b- Use the > button to add the item to the “Fields in display” box.

c- The KEGG ID should now be in the “Fields in display” box. Press the Apply button for the changes to take effect.

Page 8: PIRSF Classification System

PIRSF Text Search Result (III)

3. Save your results in table or FASTA format

a- Select entries using the check boxes in the PIRSF column. To select all, check the box in the column heading.

b- Click on “Save Result As: Table” to store the information in the result table. The saved file can be opened in Excel.

c- Click on FASTA to save the protein sequences.
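Once sequences have been saved in FASTA format, they can be read back with a few lines of code. The following is a minimal sketch of a generic FASTA reader. It makes no assumption about how PIRSF lays out the header line, and the two records in the example are made-up placeholders rather than real PIRSF output.

```python
def read_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA text lines."""
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up records for illustration only.
example = [
    ">protein_1 hypothetical example",
    "MKTLLVAG",
    "AILSGC",
    "",
    ">protein_2 hypothetical example",
    "MSDNE",
]
sequences = read_fasta(example)
lengths = {name: len(seq) for name, seq in sequences.items()}
```

To read a saved file instead, pass an open file handle, for example `read_fasta(open("saved_result.fasta"))`, where the file name is whatever was chosen when saving.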

Page 9: PIRSF Classification System

PIRSF Text Search Result (IV)

4. Select entries using check boxes and perform analysis using tool bar options

a- Select families using the check boxes in the PIRSF ID column. To select all, check the box in the column heading. Then select a tool, e.g., Taxonomy Distribution.

The tool displays the taxonomic distribution for the selected families. In this case, PIRSF001501 and PIRSF017318 contain members of the AroQ class from prokaryotes and eukaryotes, respectively, which is also reflected in the family names.

Page 10: PIRSF Classification System

PIRSF Text Search Result (V)

4. Note on selecting families for analysis with Multiple Alignment and Domain Display:

• If more than one family is selected, the chosen tool will perform the operation on representative members of the selected families. Example: multiple alignment of PIRSF001501, PIRSF500251, PIRSF026640 and PIRSF029775.

• If one family is selected, the chosen tool will perform the operation on the seed members. Example: multiple alignment of PIRSF001501.
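The selection rule in the two bullets above can be written as a short function. This is a sketch of the rule as stated on the slide, not of the PIRSF tool itself: the lookup tables and the member names (`seed_1`, `rep_1`, and so on) are hypothetical.

```python
def members_for_analysis(selected, seed_members, representatives):
    """Pick the sequences a tool would work on for the selected families.

    selected        -- list of family IDs chosen in the result table
    seed_members    -- {family ID: list of seed members}
    representatives -- {family ID: list of representative members}
    """
    if not selected:
        raise ValueError("select at least one family")
    if len(selected) == 1:
        # One family: operate on its seed members.
        return list(seed_members[selected[0]])
    # Several families: operate on representative members of each.
    chosen = []
    for family in selected:
        chosen.extend(representatives[family])
    return chosen


# Hypothetical member lists for two of the families named above.
seeds = {
    "PIRSF001501": ["seed_1", "seed_2", "seed_3"],
    "PIRSF500251": ["seed_4", "seed_5"],
}
reps = {
    "PIRSF001501": ["rep_1"],
    "PIRSF500251": ["rep_2"],
}
single = members_for_analysis(["PIRSF001501"], seeds, reps)
several = members_for_analysis(["PIRSF001501", "PIRSF500251"], seeds, reps)
```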

Page 11: PIRSF Classification System

PIRSF Text Search Result (VI)

5. The result table contains summarized information about family size, domain architecture, and level of curation. Additional data can be viewed by using the Display Option.

PIRSF Name: The names assigned to PIRSFs predominantly reflect the membership. The main source of PIRSF names is the literature. Fully curated families have a name accompanied, in most cases, by an evidence tag:

• [Validated]: at least one member of the family has an experimentally determined function.
• [Predicted]: the function of the family is inferred computationally, based on sequence similarity and/or functional associative analysis.
• [Tentative]: the experimental evidence is not decisive.

Curation Status: Indicates the level of manual curation of the PIRSF.

• Uncurated: Computer-generated protein clusters with no manual curation. The clusters are computationally defined using both pairwise-based parameters (% sequence identity, sequence length ratio and overlap length ratio) and cluster-based parameters (% matched members, distance to neighboring clusters and overall domain arrangement).
• Preliminary: Computer-generated clusters are manually curated for membership (do proteins belong to the assigned cluster?) and domain architecture (Pfam domains listed from N- to C-termini).
• Full/Full (with description): A name is assigned to the protein family, and accompanying references are listed when available. In many cases, brief descriptions are also provided.

Hfam/Superfam/Subfam: Indicates the hierarchical level of the PIRSF: homeomorphic family, superfamily or subfamily, respectively. Selecting the button will show the PIRSF hierarchy in a DAG view with Pfam as the top node.
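The three pairwise parameters named under Curation Status (% sequence identity, sequence length ratio and overlap length ratio) can be illustrated with a small calculation. The slide gives neither formulas nor cutoffs, so everything below is an assumption made for illustration: the length ratio is taken as shorter length over longer length, the overlap ratio as aligned length over longer length, and the cutoff values are hypothetical.

```python
def pairwise_parameters(len_a, len_b, aligned_length, identical_positions):
    """Compute the three pairwise measures under the assumed definitions."""
    longer = max(len_a, len_b)
    shorter = min(len_a, len_b)
    return {
        "percent_identity": 100.0 * identical_positions / aligned_length,
        "length_ratio": shorter / longer,         # assumed definition
        "overlap_ratio": aligned_length / longer,  # assumed definition
    }


# Hypothetical cutoffs; the real values are not stated in the source.
HYPOTHETICAL_CUTOFFS = {
    "percent_identity": 30.0,
    "length_ratio": 0.8,
    "overlap_ratio": 0.8,
}


def passes(params, cutoffs=HYPOTHETICAL_CUTOFFS):
    """True when every measure reaches its cutoff."""
    return all(params[name] >= cutoffs[name] for name in cutoffs)


# Two made-up sequence pairs: similar lengths versus very different lengths.
similar = pairwise_parameters(300, 280, aligned_length=270, identical_positions=135)
fragment = pairwise_parameters(300, 120, aligned_length=110, identical_positions=66)
similar_ok = passes(similar)
fragment_ok = passes(fragment)
```

Under these assumed definitions, the second pair fails on length ratio even though its identity is higher, which is the kind of case a full-length similarity requirement is meant to exclude.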

Page 12: PIRSF Classification System

5. PIRSF hierarchy in DAG view (cont.)

Pfam level

Hfam level

Subfam level

Page 13: PIRSF Classification System

PIRSF Family Report (I): Curated Protein Family Information

Level of manual curation

Hierarchy with Pfam domain at the highest node

Taxonomic distribution of PIRSF can be used to infer evolutionary history of the proteins in the PIRSF

Phylogenetic tree and alignment view allows further sequence analysis

See graphical display of Pfam domains assigned with high confidence

Page 14: PIRSF Classification System

PIRSF Family Report (II)

Integrated value-added information from other databases

Mapping to other protein classification databases

Page 15: PIRSF Classification System

PIRSF: Batch Retrieval

Retrieve PIRSF families by selecting a specific identifier or a combination of identifiers.

List IDs

Define IDs

Display the list of query/PIRSF matches

Page 16: PIRSF Classification System

PIRSF SCAN (sequence search)

Page 17: PIRSF Classification System

PIRSF SCAN (sequence search)

UniProtKB sequence Q8Y5X7 is automatically classified as chorismate mutase of the AroH class (PIRSF005965)

Returns only matches to fully curated PIRSFs