Automated Function Prediction

Outline

• Goal
• How function is defined
• Why Gene Ontology
• Methods for protein function prediction
• End points

Goal

• A) You find a new protein
• B) You sequence the whole genome of your favorite organism
• Obtained gene(s) should be annotated
• A can be solved manually; B needs automatic tools

How function is defined

• Functional description as text
• Linking gene to keywords (UniProt)
• Linking gene to Gene Ontology
• Linking gene to signalling pathways or biochemical pathways (KEGG)

Why Gene Ontology (GO)

• GO is currently a popular standard in gene annotation
• GO provides categories that represent gene function
• Creates a union for genes in the same process
• Easy summary for genes with similar function

Why Gene Ontology (GO)

• 3 sub-parts: Biological Process, Molecular Function, Cellular Component (localization)
  – Molecular Function => chemical activity
  – Biological Process => biology, cellular process
  – Cellular Component => location of the gene product
• Hierarchical structure
  – Categories with very precise function
  – Categories with less precise function
  – Categories with very broad function
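One practical consequence of the hierarchy is that a gene annotated with a precise category also belongs to every broader category above it. A minimal sketch of that propagation, assuming a toy child-to-parent table whose category names are hypothetical labels rather than real GO identifiers:

```python
# Toy is-a hierarchy: child category -> list of parent categories.
# The category names are hypothetical labels, not real GO identifiers.
PARENTS = {
    "very_precise_function": ["less_precise_function"],
    "less_precise_function": ["very_broad_function"],
    "very_broad_function": [],
}


def with_ancestors(category, parents):
    """Return the category plus every broader category reachable from it."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


annotation = with_ancestors("very_precise_function", PARENTS)
```

Here `annotation` holds all three toy categories, while starting from the broadest category returns only that category.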

How GO helps

• End user: Summary categories for genes with various functions

• Computer programs: Classifier algorithms can be taught to predict the categories for genes

Understanding GO

• AmiGO server (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi)

Function Prediction: What can we use to predict function

• Sequence homology (BLAST result list)
• Phylogenetic tree of sequences
• Protein domains (PFAM domains)
• Short sequence patterns – motifs
• Sequence features (secondary structure, low-complexity regions)

Sequence Homology Methods

• Do a BLAST search with a query sequence
• Collect GO classes for the genes in the BLAST result hits
• Give a weight to each BLAST hit, often based on log(E-value), e.g. -log(E-value) so that stronger hits weigh more
• Combine the scores from the genes that belong to the same GO class
• Report the top / significant GO classes
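The steps above can be sketched in a few lines. This is only an illustration: the input layout (a list of E-value and GO-class pairs), the `-log10` weighting, the clamping value and the placeholder GO labels are all assumptions, not the behaviour of any specific tool.

```python
import math
from collections import defaultdict


def score_go_classes(hits, min_evalue=1e-180):
    """Sum a -log10(E-value) weight per GO class over all BLAST hits.

    `hits` is a list of (evalue, go_classes) pairs; this layout is an
    assumption for the sketch, not the output format of a specific tool.
    """
    scores = defaultdict(float)
    for evalue, go_classes in hits:
        # Clamp so that an E-value reported as 0.0 does not break the log.
        weight = -math.log10(max(evalue, min_evalue))
        # Hits with an E-value of 1 or more carry no positive weight; skip them.
        if weight <= 0:
            continue
        for go_class in set(go_classes):
            scores[go_class] += weight
    # Best-scoring GO classes first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder GO labels, not real accessions.
hits = [
    (1e-50, ["GO:A", "GO:B"]),
    (1e-20, ["GO:A"]),
    (1e-5, ["GO:C"]),
]
ranked = score_go_classes(hits)
```

In this toy input `GO:A` ranks first because two hits support it. A real method would also need a significance cut-off for the final report.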

Sequence Homology Methods

• Simple methods
• Programs
  – BLAST2GO (http://www.blast2go.com/b2ghome)
  – GOTCHA (http://www.compbio.dundee.ac.uk/gotcha/gotcha.php)
  – ARGOT (http://www.medcomp.medicina.unipd.it/Argot2/form.php)
  – PFP (http://kiharalab.org/web/pfp.php)

Phylogenetic tree methods

• Create the pair-wise distances for the set of genes
• Do a hierarchical clustering of genes
• Map the known GO functions to the cluster tree
• Look for unknown genes in a cluster with many genes from the same GO class
• Report the top / significant GO classes
• More => http://genome.cshlp.org/content/8/3/163.full
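A rough sketch of the clustering idea, under strong simplifications: genes are grouped by cutting a single-linkage tree at a fixed distance threshold, and each unannotated gene receives the most common GO class in its cluster. The toy distances, the threshold and the GO labels are hypothetical, and real phylogenetic methods build and use the tree far more carefully.

```python
from collections import Counter


def single_linkage_clusters(genes, distance, threshold):
    """Group genes linked by a chain of pair-wise distances <= threshold.

    This equals cutting a single-linkage hierarchical tree at `threshold`.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if distance[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def predict_from_clusters(clusters, known_go):
    """Give each unannotated gene the most common GO class in its cluster."""
    predictions = {}
    for cluster in clusters:
        counts = Counter(go for g in cluster for go in known_go.get(g, []))
        if not counts:
            continue  # no annotated gene in this cluster
        top_class, _ = counts.most_common(1)[0]
        for g in cluster:
            if g not in known_go:
                predictions[g] = top_class
    return predictions


# Toy data: genes placed on a line so that distances are easy to follow.
positions = {"g1": 0.0, "g2": 0.1, "g3": 0.2, "query": 0.15, "g4": 5.0}
genes = list(positions)
distance = {
    frozenset((a, b)): abs(positions[a] - positions[b])
    for a in genes for b in genes if a != b
}
known_go = {"g1": ["GO:A"], "g2": ["GO:A"], "g3": ["GO:B"], "g4": ["GO:C"]}

clusters = single_linkage_clusters(genes, distance, threshold=0.5)
predictions = predict_from_clusters(clusters, known_go)
```

The unannotated `query` gene falls in the cluster dominated by `GO:A`, so that class is reported for it.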

Phylogenetic tree methods

• These should outperform sequence homology methods (CAFA 2011?)

• Require a set of related genes
• Often much heavier calculations
• Programs:
  – SIFTER (http://genome.cshlp.org/content/early/2011/07/22/gr.104687.109)

Prediction with Protein domains

• Look at which protein domains the query protein contains (PFAM)
• Map the functions that are linked to the domains to your query sequence
  – PFAM2GO
• Programs: InterProScan + PFAM2GO
• Drawbacks:
  – The mapping is the same in plant, mammal, bacteria
  – Many domains to specific function
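The mapping step is essentially a table lookup. A minimal sketch, assuming the domains of the query are already known and using a hypothetical domain-to-GO table with placeholder identifiers (not real PFAM or GO accessions):

```python
# Hypothetical domain-to-GO table in the spirit of PFAM2GO; the identifiers
# below are placeholders, not real PFAM or GO accessions.
DOMAIN_TO_GO = {
    "DOMAIN_1": {"GO:A", "GO:B"},
    "DOMAIN_2": {"GO:C"},
}


def go_from_domains(domains, domain_to_go):
    """Union of the GO classes linked to each domain found in the query."""
    predicted = set()
    for domain in domains:
        # Domains without a mapping contribute nothing.
        predicted |= domain_to_go.get(domain, set())
    return predicted


query_domains = ["DOMAIN_1", "DOMAIN_2", "DOMAIN_UNMAPPED"]
predicted = go_from_domains(query_domains, DOMAIN_TO_GO)
```

Because the table does not depend on the organism, the same domains give the same GO classes in any species, which is the first drawback listed above.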

Prediction with Protein domains

• Benefits:
  – Can create annotation from separate domains
  – Similar sequences do not have to be in the database
• Programs (?): InterProScan (http://www.ebi.ac.uk/InterProScan/)
• Drawbacks:
  – The mapping is the same in plant, mammal, bacteria
  – Many domains to specific function

Prediction with patterns and motifs

• Same principle as before, but we look for sequence patterns and motifs
• Map the functions that are linked to the patterns to your query sequence
• Programs:
  – InterProScan
  – IBM BioDictionary (http://cbcsrv.watson.ibm.com/Tpa.html)
• Drawbacks and benefits approximately the same as before
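As a small illustration of pattern-based annotation, a motif can be written as a regular expression and scanned along the query. The sketch below assumes a one-entry motif table built for this example; it uses the N-glycosylation consensus (N, then any residue except P, then S or T, then any residue except P) and does not reproduce how the programs above work internally.

```python
import re

# Motif written as a regular expression -> function label.
# N[^P][ST][^P] is the N-glycosylation consensus; the one-entry table
# itself is only an illustration.
MOTIF_TO_FUNCTION = {
    "N[^P][ST][^P]": "N-glycosylation site",
}


def scan_motifs(sequence, motif_to_function):
    """Return (start, matched_text, function) for every motif occurrence."""
    found = []
    for pattern, function in motif_to_function.items():
        # The lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            found.append((match.start(), match.group(1), function))
    return sorted(found)


motif_hits = scan_motifs("MKNGSAPNPTL", MOTIF_TO_FUNCTION)
```

The toy sequence contains one match, `NGSA` at position 2 (0-based). The later `NPTL` stretch is rejected because P follows the N.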

Prediction with sequence features

• Again the same principle as before
• We look at sequence features (see picture)
• These are given as input to a classifier algorithm (Support Vector Machine)
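To make "sequence features" concrete, here is a sketch that turns a protein sequence into a small numeric vector of the kind a classifier could take as input. The chosen features (length, fraction of charged residues, fraction of low-entropy windows), the window size and the entropy threshold are assumptions made for this example; they are not the feature set of any particular program, and the classifier training itself is left out.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_fraction(sequence, window=12, max_entropy=2.2):
    """Fraction of windows whose entropy is at or below `max_entropy`."""
    if len(sequence) < window:
        return 0.0
    windows = [sequence[i:i + window] for i in range(len(sequence) - window + 1)]
    low = sum(1 for w in windows if shannon_entropy(w) <= max_entropy)
    return low / len(windows)


def feature_vector(sequence):
    """Toy fixed-length feature vector for a protein sequence."""
    charged = sum(sequence.count(aa) for aa in "DEKR")
    return [
        float(len(sequence)),
        charged / len(sequence),
        low_complexity_fraction(sequence),
    ]


# A deliberately repetitive toy sequence: M, then 12 K, then 12 A.
features = feature_vector("M" + "K" * 12 + "A" * 12)
```

Vectors like this, computed for many annotated proteins, would form the training input of the classifier; no sequence alignment is needed to compute them.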

Prediction with sequence features

• Benefits:
  – No actual sequence similarity needed
  – Info collected from vague similarities
  – Use of classifier => feature weighting
• Program: FFPred (http://bioinf.cs.ucl.ac.uk/ffpred/)
• Drawbacks:
  – Calculations probably quite heavy
  – No use of nearby sequence similarities (domains etc.)

Our contribution: PANNZER

• Use BLAST result list
• Add taxonomic information
• Score GO classes using a score that takes the frequency of the GO class in the sequence database into account
• Method is used to predict:
  – GO classes
  – Description line
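One generic way to take the database frequency of a GO class into account is an over-representation test: a class seen in many hits is less convincing if it is also common in the whole database. The sketch below uses a hypergeometric upper tail for this. It is an assumption chosen for illustration and is not the actual PANNZER scoring function; all counts are made up.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n sequences are drawn from N, of which K carry the class.

    k: hits carrying the GO class, n: hits in the BLAST list,
    K: database sequences carrying the class, N: database size.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Made-up counts: a rare class seen in 5 of 20 hits versus a very common
# class seen in 10 of 20 hits, in a database of 1000 sequences.
p_rare = hypergeom_upper_tail(k=5, n=20, K=10, N=1000)
p_common = hypergeom_upper_tail(k=10, n=20, K=500, N=1000)
```

A smaller value means the class is more over-represented among the hits. Here the rare class scores far better than the common one even though it appears in fewer hits.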

Our contribution: PANNZER

• Benefits:
  – Taking the species taxonomy into account
  – Improved use of statistics
• Not public yet

Our contribution: No Name Yet

• Take PFAM domain predictions, BLAST similarities and taxonomic information
• Feed these to feature selection and to a classifier algorithm
• …Wait…
• Method is used to predict GO classes
• Not public + testing is ongoing

Conclusion

• These methods are increasingly needed
• Some methods exist
• Unfortunately no clear evaluation (my opinion)
• Remember: these are predictions. No certain info until they are tested in the wet lab…
