Automate Function Prediction

  • Upload
    kalkin

Page 1: Automate Function Prediction

Automate Function Prediction

Page 2: Automate Function Prediction

Outline

• Goal
• How function is defined
• Why Gene Ontology
• Methods for protein function prediction
• End points

Page 3: Automate Function Prediction

GOAL

• A) You find a new protein
• B) You sequence the whole genome of your favorite organism
• The obtained gene(s) should be annotated
• A can be solved manually. B needs automatic tools

Page 4: Automate Function Prediction

How function is defined

• Functional description as text
• Linking a gene to keywords (Uniprot)
• Linking a gene to Gene Ontology
• Linking a gene to signalling pathways or biochemical pathways (KEGG)

Page 5: Automate Function Prediction

Why Gene Ontology (GO)

• GO is currently a popular standard in gene annotation
• GO defines categories that represent gene function
• Creates a union for genes in the same process
• Easy summary for genes with similar function

Page 6: Automate Function Prediction

Why Gene Ontology (GO)

• 3 sub-parts: Biological Process, Molecular Function, Cellular Component (cellular localization)
  – Molecular Function => chemical activity
  – Biological Process => biology, cellular process
  – Cellular Component => location of the gene product
• Hierarchical structure
  – Categories with very precise function
  – Categories with less precise function
  – Categories with very broad function
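The hierarchy means that a precise annotation also implies every broader category above it. A minimal sketch of that idea follows, assuming a hand-written toy hierarchy: the parent links below are simplified for illustration (the real GO graph has more intermediate terms), and the names `PARENTS`, `ancestors` and `propagate` are hypothetical helpers, not part of any GO tool.

```python
# Simplified, illustrative hierarchy: child term -> list of parent terms.
# The real GO graph contains more intermediate terms than shown here.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand precise annotations with all of their broader ancestor terms."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term, parents)
    return expanded


expanded = propagate({"protein kinase activity"})
```

Here a gene annotated only with the most precise term ends up in all four categories, which is what makes the broad categories useful as summaries.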

Page 7: Automate Function Prediction

How GO helps

• End user: Summary categories for genes with various functions

• Computer programs: Classifier algorithms can be taught to predict the categories for genes

Page 8: Automate Function Prediction

Understanding GO

• AmiGO server (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi)

Page 9: Automate Function Prediction

Function Prediction: What can we use to predict function

• Sequence homology (BLAST result list)
• Phylogenetic tree of sequences
• Protein domains (PFAM domains)
• Short sequence patterns – motifs
• Sequence features (secondary structure, low-complexity regions)

Page 10: Automate Function Prediction

Sequence Homology Methods

• Do a BLAST search with a query sequence
• Collect GO classes for the genes in the BLAST hit list
• Give a weight to each BLAST hit – often -log(E-value), so that more significant hits weigh more
• Combine the scores from the genes that belong to the same GO class
• Report the top-scoring / significant GO classes
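The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the hit list and annotation table are made-up toy data, the GO labels are placeholders, the weight is taken as -log10(E-value), and the scores are combined by simple summation. The function name `score_go_classes` is hypothetical and does not correspond to any of the programs mentioned in these slides.

```python
import math
from collections import defaultdict


def score_go_classes(hits, annotations, top_n=3):
    """Rank GO classes by summed -log10(E-value) over the hits carrying them.

    hits        : list of (hit_id, e_value) pairs from a BLAST result list
    annotations : dict mapping hit_id -> set of GO class labels
    """
    scores = defaultdict(float)
    for hit_id, e_value in hits:
        # Clamp the E-value so that a reported 0.0 does not break the logarithm.
        weight = -math.log10(max(e_value, 1e-200))
        for go_class in annotations.get(hit_id, ()):
            scores[go_class] += weight
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy data, invented for illustration only.
hits = [("hitA", 1e-50), ("hitB", 1e-20), ("hitC", 1e-5)]
annotations = {
    "hitA": {"GO_X", "GO_Y"},
    "hitB": {"GO_X"},
    "hitC": {"GO_Z"},
}
ranked = score_go_classes(hits, annotations)
```

With this toy input, `GO_X` ranks first because two strong hits support it. Summation is only one possible way to combine the weights; taking the maximum or a mean are equally plausible choices.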

Page 11: Automate Function Prediction

Sequence Homology Methods

• Simple methods
• Programs
  – BLAST2GO (http://www.blast2go.com/b2ghome)
  – GOTCHA (http://www.compbio.dundee.ac.uk/gotcha/gotcha.php)
  – ARGOT (http://www.medcomp.medicina.unipd.it/Argot2/form.php)
  – PFP (http://kiharalab.org/web/pfp.php)

Page 12: Automate Function Prediction

Phylogenetic tree methods

• Compute the pairwise distances for the set of genes
• Do a hierarchical clustering of the genes
• Map the known GO functions to the cluster tree
• Look for unknown genes in a cluster with many genes from the same GO class
• Report the top-scoring / significant GO classes
• More => http://genome.cshlp.org/content/8/3/163.full
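As a rough illustration of the clustering steps listed above, the sketch below runs a plain average-linkage agglomerative clustering on a toy distance table and reports the GO classes found in the query's cluster once it contains enough annotated genes. All of it is an assumption made for the example: the distances, gene names and GO labels are invented, `min_annotated` is a hypothetical threshold, and real phylogenetic methods build proper trees rather than this simple clustering.

```python
from collections import Counter
from itertools import combinations


def predict_from_clustering(distances, known_go, query, min_annotated=2):
    """Merge closest clusters until the query's cluster holds enough annotated genes.

    distances : dict mapping frozenset({gene_a, gene_b}) -> pairwise distance
    known_go  : dict mapping gene -> GO class label (annotated genes only)
    query     : name of the unannotated gene
    """
    genes = sorted({g for pair in distances for g in pair})
    clusters = [frozenset([g]) for g in genes]

    def linkage(a, b):
        # Average distance over all cross-cluster gene pairs.
        total = sum(distances[frozenset([x, y])] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merged = a | b
        clusters = [c for c in clusters if c != a and c != b] + [merged]
        if query in merged:
            labels = Counter(known_go[g] for g in merged if g in known_go)
            if sum(labels.values()) >= min_annotated:
                # GO classes in the query's cluster, most common first.
                return labels.most_common()
    return []


# Toy data, invented for illustration only.
pairs = {
    ("q", "g1"): 0.10, ("q", "g2"): 0.20, ("g1", "g2"): 0.15,
    ("q", "g3"): 0.90, ("g1", "g3"): 0.90, ("g2", "g3"): 0.90,
}
distances = {frozenset(pair): d for pair, d in pairs.items()}
known_go = {"g1": "GO_X", "g2": "GO_X", "g3": "GO_Y"}
result = predict_from_clustering(distances, known_go, "q")
```

In this toy case the query ends up clustered with `g1` and `g2`, so `GO_X` is reported, while the distant gene `g3` never influences the prediction.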

Page 13: Automate Function Prediction

Phylogenetic tree methods

• These should outperform sequence homology methods (CAFA 2011?)
• Require a set of related genes
• Often much heavier calculations
• Programs:
  – Sifter (http://genome.cshlp.org/content/early/2011/07/22/gr.104687.109)

Page 14: Automate Function Prediction

Prediction with Protein domains

• Look at which protein domains are present in the query protein (PFAM)
• Map the functions that are linked to the domains to your query sequence
  – PFAM2GO
• Programs: InterProScan + PFAM2GO
• Drawbacks:
  – This mapping is the same in plant, mammal, bacteria
  – Many domains to specific function
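The domain-based lookup described above amounts to a table join between the domains found in the query and a domain-to-GO mapping. A minimal sketch follows, assuming a hypothetical in-memory mapping table. The domain names and GO labels are placeholders, not real PFAM or GO identifiers, and `annotate_from_domains` is an illustrative helper rather than part of InterProScan or PFAM2GO.

```python
def annotate_from_domains(domains, domain_to_go):
    """Collect the GO classes linked to each domain found in the query protein."""
    predicted = {}
    for domain in domains:
        for go_class in domain_to_go.get(domain, ()):
            # Remember which domain(s) support each predicted GO class.
            predicted.setdefault(go_class, []).append(domain)
    return predicted


# Hypothetical mapping table and domain hits, for illustration only.
domain_to_go = {
    "DOMAIN_A": ["GO_X"],
    "DOMAIN_B": ["GO_X", "GO_Y"],
}
query_domains = ["DOMAIN_A", "DOMAIN_B", "DOMAIN_UNMAPPED"]
prediction = annotate_from_domains(query_domains, domain_to_go)
```

Note that the lookup ignores the source organism entirely, which is exactly the first drawback listed: the same mapping is applied whether the query comes from a plant, a mammal or a bacterium.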

Page 15: Automate Function Prediction

Prediction with Protein domains

• Benefits:
  – Can create annotation from separate domains
  – Similar sequences do not have to be in the database
• Programs (?): InterProScan (http://www.ebi.ac.uk/InterProScan/)
• Drawbacks:
  – The mapping is the same in plant, mammal, bacteria
  – Many domains to specific function

Page 16: Automate Function Prediction

Prediction with patterns and motifs

• Same principle as before, but we look at sequence patterns and motifs
• Map the functions that are linked to the patterns to your query sequence
• Programs:
  – InterProScan
  – IBM BioDictionary (http://cbcsrv.watson.ibm.com/Tpa.html)
• Drawbacks and benefits approximately the same as before

Page 17: Automate Function Prediction

Prediction with sequence features

• Again the same principle as before
• We look at sequence features (see picture)
• These are given as input to a classifier algorithm (Support Vector Machine)
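To make "sequence features as classifier input" concrete, the sketch below turns a protein sequence into a small fixed-length numeric vector of the kind a classifier could consume. The specific features, the residue groupings in `HYDROPHOBIC` and `CHARGED`, and the crude low-complexity proxy are all assumptions chosen for illustration; they are not the feature set used by FFPred or any other tool named here, and the classifier itself is left out.

```python
# Illustrative residue groupings; other groupings are equally defensible.
HYDROPHOBIC = set("AVILMFWC")
CHARGED = set("DEKR")


def sequence_features(sequence, window=5):
    """Turn a protein sequence into a small fixed-length numeric feature vector."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)
    hydrophobic = sum(residue in HYDROPHOBIC for residue in seq) / length
    charged = sum(residue in CHARGED for residue in seq) / length
    # Crude low-complexity proxy: fraction of windows built from at most
    # two distinct residue types.
    windows = [seq[i:i + window] for i in range(length - window + 1)]
    if windows:
        low_complexity = sum(len(set(w)) <= 2 for w in windows) / len(windows)
    else:
        low_complexity = 0.0
    return [float(length), hydrophobic, charged, low_complexity]


features = sequence_features("AAAAAKKKKK")
```

Because every sequence maps to a vector of the same length, sequences with no detectable similarity to each other can still be compared by the classifier.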

Page 18: Automate Function Prediction

Prediction with sequence features

Page 19: Automate Function Prediction

Prediction with sequence features

• Benefits:
  – No actual sequence similarity needed
  – Info collected from vague similarities
  – Use of classifier => feature weighting
• Program: FFPred (http://bioinf.cs.ucl.ac.uk/ffpred/)
• Drawbacks:
  – Calculations probably quite heavy
  – No use of nearby sequence similarities (domains etc.)

Page 20: Automate Function Prediction

Our contribution: PANNZER

• Use the BLAST result list
• Add taxonomic information
• Score GO classes using a score that takes the frequency of the GO class in the sequence DB into account
• Method is used to predict:
  – GO classes
  – Description line
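The slides do not give the scoring formula, so the following is not PANNZER's method. It is only a generic sketch of what "taking the database frequency of a GO class into account" can mean: a log-ratio between how often a class appears among the BLAST hits and how often it appears in the whole database. The function name `enrichment_scores`, the toy counts and the background frequencies are all hypothetical.

```python
import math


def enrichment_scores(hit_counts, n_hits, background_freq):
    """Log2 ratio of a GO class's frequency among hits to its database frequency.

    hit_counts      : dict mapping GO class -> number of BLAST hits carrying it
    n_hits          : total number of BLAST hits considered
    background_freq : dict mapping GO class -> fraction of database sequences
                      annotated with that class
    """
    scores = {}
    for go_class, count in hit_counts.items():
        observed = count / n_hits
        expected = background_freq[go_class]
        scores[go_class] = math.log2(observed / expected)
    return scores


# Toy numbers, invented for illustration only.
hit_counts = {"GO_COMMON": 5, "GO_RARE": 5}
background_freq = {"GO_COMMON": 0.5, "GO_RARE": 0.01}
scores = enrichment_scores(hit_counts, n_hits=10, background_freq=background_freq)
```

Both classes appear in half of the hits, but only the rare class scores above zero: a class that half of the database carries is expected among the hits anyway, so seeing it there says little about the query.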

Page 21: Automate Function Prediction

Our contribution: PANNZER

• Benefits:
  – Taking the species taxonomy into account
  – Improved use of statistics
• Not public yet

Page 22: Automate Function Prediction

Our contribution: No Name Yet

• Take PFAM domain predictions, BLAST similarities and Taxonomic information

• Feed this to feature selection and to classifier algorithm

• …Wait…
• Method is used to predict GO classes
• Not public + testing is ongoing

Page 23: Automate Function Prediction

Conclusion

• These methods are increasingly needed
• Some methods exist
• Unfortunately no clear evaluation (my opinion)
• Remember: These are predictions. No certain info until they are tested in the wet lab…