Lecture 3

Functional Annotation Strategies


Blast2GO is a flexible framework for functional annotation. Many parameters are involved in the B2G annotation rule:

- Blast vs. InterPro approach
- Annotation Score threshold
- Abstraction
- Evidence Code Weights
- ...
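
To make the interplay of these parameters concrete, here is a minimal sketch of an annotation score. It assumes the additive form usually described for the B2G annotation rule: a direct term (best hit similarity weighted by the evidence code weight, ECw) plus an abstraction term (number of additional GO terms unified at the node times the GO weight, GOw). The class and function names, the default values (threshold 55, GOw 5) and the example numbers are illustrative assumptions; they are not taken from these slides or from the Blast2GO source code.

```python
from dataclasses import dataclass


@dataclass
class HitEvidence:
    """Evidence for one candidate GO term coming from one Blast hit."""
    similarity: float   # hit similarity in percent (0-100)
    evidence_code: str  # evidence code of the source annotation, e.g. "IEA"


def annotation_score(evidence, ec_weights, n_unified_terms=1, go_weight=5.0):
    """Assumed additive score: direct term + abstraction term.

    direct term      = max(similarity * ECw) over all hits supporting the term
    abstraction term = (n_unified_terms - 1) * GOw
    """
    direct = max(e.similarity * ec_weights.get(e.evidence_code, 0.0) for e in evidence)
    abstraction = max(n_unified_terms - 1, 0) * go_weight
    return direct + abstraction


def is_annotated(score, threshold=55.0):
    """Keep a candidate term when its score reaches the threshold (55 is an assumed default)."""
    return score >= threshold


# One candidate term supported only by an electronic (IEA) annotation at 90% similarity.
evidence = [HitEvidence(similarity=90.0, evidence_code="IEA")]

strict = annotation_score(evidence, {"IEA": 0.0})   # electronic evidence ignored
default = annotation_score(evidence, {"IEA": 0.7})  # electronic evidence down-weighted
print(is_annotated(strict), is_annotated(default))  # False True
```

With ECw(IEA) = 0 the term is lost entirely, while any sufficiently high weight lets it pass; lowering the threshold or raising GOw makes the rule more permissive in the same way.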

Different annotation strategies can be envisaged. How do they behave? Which is the “best”?

Evaluation Strategy

32 annotation strategies

8 datasets

EVALUATE

- Annotation Intensity
- Annotation Accuracy
- cis-annotation
- Impact on Functional Genomics

Annotation Guidelines

Different Annotation Strategies



DataSets evaluated

Different EST and Protein DataSets

Results I: Blast result

The key factor for annotation success is a successful Blast result.

Results II: Length effect

Annotation success depends on sequence length: results are best for sequences > 400 nt (length correlates with the chance of obtaining a positive Blast result).

Result III: Number of annotations

The number of annotated sequences increases with more permissive annotation styles. But when adding InterPro, the differences decrease.

Annex does not increase # sequences


Result III: Number of annotations

The # GO/sequence also increases from strict to permissive annotation styles. Annex increases # GO/sequence.
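
The two intensity measures used in these slides (number of annotated sequences and # GO/sequence) can be computed from a sequence-to-GO table. The following is a minimal sketch, assuming a tab-separated layout with the sequence ID in the first column and the GO ID in the second; the function name and the sample records are invented for illustration.

```python
from collections import defaultdict


def annotation_intensity(annot_lines, n_sequences_in_dataset):
    """Return the annotated fraction and the mean number of GO terms per annotated sequence."""
    go_by_seq = defaultdict(set)
    for line in annot_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and lines whose second column is not a GO ID.
        if len(fields) < 2 or not fields[1].startswith("GO:"):
            continue
        go_by_seq[fields[0]].add(fields[1])

    n_annotated = len(go_by_seq)
    n_terms = sum(len(terms) for terms in go_by_seq.values())
    return {
        "annotated_fraction": n_annotated / n_sequences_in_dataset,
        "go_per_annotated_seq": n_terms / n_annotated if n_annotated else 0.0,
    }


# Invented example: 2 of 4 sequences annotated, 3 GO terms in total.
sample = [
    "seq1\tGO:0005524",
    "seq1\tGO:0016301",
    "seq2\tGO:0005524",
]
stats = annotation_intensity(sample, n_sequences_in_dataset=4)
print(stats)  # {'annotated_fraction': 0.5, 'go_per_annotated_seq': 1.5}
```

Comparing these two numbers across annotation styles separates the effect on coverage (how many sequences get any term) from the effect on depth (how many terms each sequence gets).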

Result III: Number of annotations

Once electronic annotations are enabled (ECw for IEA >0.7), annotation styles stabilize

[Figure panels: ECw(IEA) = 0; ECw(IEA) > 0.7]

Result III: Number of annotations

Mean GO level stays practically the same throughout all styles.

The abstraction term GOw has a small but significant effect on GO level.

Results IV: InterPro & Annex

The more restrictive the Blast strategy, the stronger the augmentation by InterPro.

Augmentation by Annex is practically constant

Result V: Manual Curation

- Default parameters give the best accuracy
- Strict annotation is less informative (fewer GO terms)
- Generous and all-mapping annotation is more informative but also more error-prone
- InterPro alone annotates fewer sequences, and with fewer GO terms

Results VI: Functional Genomics Enrichment analysis for 3 datasets

The more GO terms are annotated, the more enriched GO terms are found. Semantically, however, there were not many differences, and the number of branches within each dataset was similar across annotation styles.
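
To illustrate what "enriched" means for a single GO term, here is a minimal sketch of a one-sided enrichment test. It assumes a Fisher's exact test computed as a hypergeometric tail, which is a common choice for GO enrichment; the slides do not state which test was used, and the function name and the numbers are invented.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: sequences in the reference set
    K: reference sequences annotated with the GO term
    n: sequences in the test set (a subset of the reference set)
    k: test-set sequences annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 3 of 10 reference sequences carry the term,
# and all 3 sequences of the test set carry it.
p = enrichment_p_value(k=3, n=3, K=3, N=10)
print(p)  # 1/120
```

Because the test is run once per GO term, a more permissive annotation style adds more terms to test and therefore more opportunities for a term to come out as enriched.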

Results VII: cis-annotation

By varying the %hit filter at the annotation step, one can control possible cis-annotation errors.

The major effect of setting a high %hit filter when annotating is a dramatic reduction in the number of annotated sequences, while the annotations of the sequences that are still annotated change little.
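
The idea behind such a filter can be sketched as follows. This is a minimal example under the assumption that the %hit filter is the percentage of the hit sequence spanned by the aligned region, so that hits aligned only over a small part of their length (whose GO terms may belong to the unaligned part) are discarded. The field names, function names and coordinates are hypothetical.

```python
def hit_coverage_percent(hit_start, hit_end, hit_length):
    """Percentage of the hit sequence spanned by the alignment (1-based, inclusive coordinates)."""
    aligned = abs(hit_end - hit_start) + 1
    return 100.0 * aligned / hit_length


def filter_hits(hits, min_hit_coverage):
    """Keep only hits whose aligned region covers at least min_hit_coverage percent of the hit."""
    return [
        h for h in hits
        if hit_coverage_percent(h["hit_start"], h["hit_end"], h["hit_length"]) >= min_hit_coverage
    ]


# Invented hits: the first aligns to 10% of a long multi-domain protein,
# the second aligns to almost the whole hit.
hits = [
    {"id": "hitA", "hit_start": 101, "hit_end": 200, "hit_length": 1000},
    {"id": "hitB", "hit_start": 1, "hit_end": 300, "hit_length": 320},
]
kept = filter_hits(hits, min_hit_coverage=80.0)
print([h["id"] for h in kept])  # ['hitB']
```

A sequence whose hits are all of the first kind loses every annotation source at a high cutoff, which is consistent with the observed drop in the number of annotated sequences.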

Take-home messages

- A positive Blast result and the inclusion of electronic evidence are the key factors for successful annotation.
- Sequence length and quality are very important!!!
- InterPro and Annex can increase annotation by 10-15%.
- B2G default parameters are in general good and equivalent to what one would obtain with a computationally reviewed annotation procedure.
- The effect of annotation stringency on functional genomics tests is difficult to predict, but the more GO terms you have, the more enriched terms you can find. A core functional message was always found.
- Do not worry too much about erroneous cis-annotation!!

Our recommended annotation strategy

- First use the B2G default settings.
- If many green sequences remain, annotate these with threshold 45.
- Add InterPro and Annex (in this order).
- Check some protein families you might be interested in by keyword searching.
- Improve the annotation of these sequences manually (you can use the merge .annot function for this).
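
Outside the tool, combining an automatic annotation with manually curated additions amounts to a union of sequence-to-GO pairs. The following is a minimal sketch, assuming tab-separated lines with the sequence ID in the first column and the GO ID in the second; it is not the Blast2GO merge function itself, and the function name and records are invented.

```python
def merge_annotations(*annot_sources):
    """Return the union of (sequence ID, GO ID) pairs, in first-seen order, without duplicates."""
    seen = set()
    merged = []
    for lines in annot_sources:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip malformed lines
            pair = (fields[0], fields[1])
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


# Invented example: one automatic annotation plus one manual addition and one duplicate.
automatic = ["seq1\tGO:0005524", "seq2\tGO:0016301"]
manual = ["seq2\tGO:0016301", "seq3\tGO:0005524"]
merged = merge_annotations(automatic, manual)
print(merged)
# [('seq1', 'GO:0005524'), ('seq2', 'GO:0016301'), ('seq3', 'GO:0005524')]
```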
