Modeling Functional Genomics Datasets
CVM8890-101
Lesson 2
13 June 2007
Teresia Buza
Lesson 2: Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.
1. Introduction to Functional Annotation
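[Figure: "Genomic hypothesis". The central dogma (gene, mRNA transcript, protein) beside its genome-scale counterparts (genome, transcriptome, proteome) and the new technologies that measure them: genome sequencing, transcript profiling, protein quantification. Shown with the example sequence ATGTCCTATCCATGTCGTACAGATTGACGAGAT.]
What next?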
Where are we?
Functional annotation
Structural annotation
What is all this?
Genome Annotation
Biologists refer to both the annotation of the genome and functional annotation of gene products:
“Structural” Annotation & “Functional” Annotation
Structural annotation
Identification of genomic elements.
• ORFs predicted during genome assembly
• Location of ORFs
• Gene structure
• Coding regions
• Location of regulatory motifs etc.
Functional annotation
Attaching biological information to genomic elements.
• Biochemical function
• Biological function
• Regulation and interactions involved
• Expression etc.
These steps may involve both biological experiments and in silico analysis.
Structural & Functional Annotation
http://en.wikipedia.org/wiki/Genome_annotation#Genome_annotation (with modifications)
Why Functional Annotation?
Enables you to take large “laundry lists” of genes/proteins and turn them into a biologically useful model
• Annotation of gene products = Gene Ontology (GO) annotation
• Initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid, but quantity vs. quality?)
• Functional literature exists for many genes/proteins prior to genome sequencing (slow, but provides high-quality annotations)
• GO annotation does not rely on a completed genome sequence!
Functional Annotation
Types of Functional annotation
Based on direct experimental evidence of function
Experiments in the same ORGANISM, for example:
• Enzyme assays
• Binding experiments
• Pathway analysis
• Synthetic lethals
• Functional complementation
• Gene mutations
• RNAi
• 2-hybrid interactions etc.
Indirect evidence of function
• Expression analysis
• Structure analysis
• Sequence analysis
Problem:
• Many genes/proteins have no annotation
• Some have unknown functions
Challenge:
• We want to get the maximum functional annotation for modeling our data
Solution:
• Read papers (PubMed etc.)
• Search for homologs/orthologs of known function
• Homologs and orthologs help assign function…
Functional Annotation
2. Finding Function: orthologs and homologs
What are Homologs, Orthologs, Paralogs?
Homolog
Homology is the relationship between genes separated by an event of speciation or genetic duplication.
Ortholog
Orthologs are homologous genes in different species that evolved from a common ancestral gene by speciation. Normally (not always), orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
Paralog
Paralogs are homologous genes related by duplication within a genome. Paralogs can evolve new functions, even if these are related to the original one.
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
http://www.ensembl.org/info/data/compara/tree_example1.jpg
[Figure: Orthologs & Paralogs. Example tree with the orthologs and paralogs labelled.]
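The definitions above can be turned into a small rule: two homologous genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication event. The sketch below applies that rule to a toy tree. The tree, the gene names and the function names are all hypothetical and only illustrate the idea.

```python
# Hypothetical gene tree. Internal nodes are (event, left, right);
# leaves are gene names. An ancestral gene duplicated into alpha and
# beta copies, then a speciation separated human from mouse.
TREE = ("duplication",
        ("speciation", "human_alpha", "mouse_alpha"),
        ("speciation", "human_beta", "mouse_beta"))

def leaves(node):
    """Return every gene name below this node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def classify_pairs(node, out=None):
    """Label each gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node last shared an ancestor here.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out

pairs = classify_pairs(TREE)
for pair, label in sorted((sorted(p), lab) for p, lab in pairs.items()):
    print(pair, label)
```

Under this event-based rule, human_alpha and mouse_beta come out as paralogs even though they sit in different species, because the duplication predates the speciation.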
How to search for Orthology?
BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
• Sequence alignment search tool
• Uses a heuristic algorithm
MPsrch: http://www.ebi.ac.uk/MPsrch/
• Sequence comparison tool
• Implements the Smith-Waterman algorithm
• Uses an exhaustive algorithm
Domain analysis: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
• Analysis of regions of sequence homology among sets of proteins that are not all full-length homologs.
• Homology domains often, but not always, correspond to recognizable protein folding domains.
Protein family databases (e.g. COGs & KOGs)
• Superfamily: complete set of proteins having sequence homology over essentially their full length.
• Subfamilies: incomplete set of homologous proteins which yet encompass proteins of diverse function.
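To make the "exhaustive" side of the comparison concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. It fills the full dynamic-programming table rather than using a heuristic shortcut. The scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration; production tools typically use substitution matrices and affine gap penalties, and this is not the MPsrch implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the table,
    so it reports the score but not the alignment itself.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared core "ACG"
```

Every cell of the table is visited, so the cost grows with the product of the two sequence lengths. That cost is what heuristic tools such as BLAST trade away for speed.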
Systems for Functional Annotation
1. Clusters of Orthologous Groups (COGs): prokaryotes
2. euKaryote Orthologous Groups (KOGs): eukaryotes
3. Gene Ontology (GO)
COGs & KOGs
• Both are based on orthology.
• Genes are assigned to broad categories (A-Z).
• Each COG corresponds to an ancient conserved domain.
• COGs: prokaryotes. KOGs: eukaryotes.
COGs have 25 functional categories (A-Z) in four broad groups:
1. Information storage and processing
2. Cellular processes and signaling
3. Metabolism
4. Poorly characterized
Text search:
Clusters of Orthologous Groups (COGs): http://www.ncbi.nlm.nih.gov/COG/
COGs Categories
ftp://ftp.ncbi.nih.gov/pub/COG/COG/fun.txt
INFORMATION STORAGE AND PROCESSING
• [J] Translation, ribosomal structure and biogenesis
• [A] RNA processing and modification
• [K] Transcription
• [L] Replication, recombination and repair
• [B] Chromatin structure and dynamics
CELLULAR PROCESSES AND SIGNALING
• [D] Cell cycle control, cell division, chromosome partitioning
• [Y] Nuclear structure
• [V] Defense mechanisms
• [T] Signal transduction mechanisms
• [M] Cell wall/membrane/envelope biogenesis
• [N] Cell motility
• [Z] Cytoskeleton
• [W] Extracellular structures
• [U] Intracellular trafficking, secretion, and vesicular transport
• [O] Posttranslational modification, protein turnover, chaperones
METABOLISM
• [C] Energy production and conversion
• [G] Carbohydrate transport and metabolism
• [E] Amino acid transport and metabolism
• [F] Nucleotide transport and metabolism
• [H] Coenzyme transport and metabolism
• [I] Lipid transport and metabolism
• [P] Inorganic ion transport and metabolism
• [Q] Secondary metabolites biosynthesis, transport and catabolism
POORLY CHARACTERIZED
• [R] General function prediction only
• [S] Function unknown
Tatusov et al., 2000: The COG database: a tool for genome-scale analysis of protein functions and evolution
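Classifying a gene list by COG category, as in the examples that follow, comes down to a lookup and a count. The sketch below encodes the 25 category letters and their four broad groups as listed in the COG categories and tallies a set of assignments. The gene names and their letters are hypothetical.

```python
from collections import Counter

# Broad group -> category letters, following the COG category list.
COG_GROUPS = {
    "INFORMATION STORAGE AND PROCESSING": "JAKLB",
    "CELLULAR PROCESSES AND SIGNALING": "DYVTMNZWUO",
    "METABOLISM": "CGEFHIPQ",
    "POORLY CHARACTERIZED": "RS",
}
LETTER_TO_GROUP = {
    letter: group for group, letters in COG_GROUPS.items() for letter in letters
}

def tally_by_group(assignments):
    """Count category letters per broad group.

    A gene assigned to several categories (e.g. "EH") is counted once
    per letter; unknown letters are collected under "UNASSIGNED".
    """
    counts = Counter()
    for letters in assignments.values():
        for letter in letters:
            counts[LETTER_TO_GROUP.get(letter, "UNASSIGNED")] += 1
    return counts

# Hypothetical gene -> COG category letter(s).
example = {"geneA": "J", "geneB": "C", "geneC": "EH", "geneD": "S", "geneE": "K"}
counts = tally_by_group(example)
for group, n in counts.most_common():
    print(f"{n:3d}  {group}")
```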
Classification of COGs by functional categories
Example 1
Effects of Antibiotics on Pasteurella multocida transcriptome
Nanduri et al 2006
Example 2
[Figure: three bar charts (AMX, CTC, ENR) showing Decrease and Increase by COG category (-, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V); axis from 0 to 40.]
The Gene Ontology (GO)
• The Gene Ontology (GO) is the de facto standard for functional annotation
• GO functional annotation is based on orthology AND direct experimental evidence
• GO terms allow much more detailed functional analysis (> 24,000 terms) than COGs & KOGs (25 broad terms)
• GO is a controlled vocabulary of terms split into three related ontologies covering basic areas of molecular biology:
  • molecular function: 8,123 terms
  • biological process: 13,960 terms
  • cellular component: 2,071 terms
GO Report 2007-04
[Figure: Cellular Component. Bar chart of number of GO terms (axis 0 to 350) per category: Nucleus, Cell, Cytoplasm, Mitochondrion, Plasma membrane, Cytosol, Cytoskeleton, Extracellular matrix, Nucleoplasm, Endoplasmic, Golgi apparatus, Intracellular, Endosome, Cytoplasmic, Chromosome, Nucleolus, Lysosome, Nuclear envelope, Extracellular space, Extracellular region, Cellular_component, Cilium, Nuclear chromosome, Ribosome, Peroxisome, Microtubule, Vacuole, Unlocalized protein.]
Functional Annotation of Chicken Proteomic data
Example 3
Use GO for…
• Modeling function in high-throughput datasets (arrays!), started by Fly, Yeast, Mouse (Ashburner et al 2000, 2001)
• Grouping gene products by biological function
• Determining which classes of gene products are over-represented or under-represented
• Focusing on particular biological pathways and functions (hypothesis-driven)
• Relating a protein’s location to its function
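One common way to decide whether a GO term is over-represented in a gene list is a hypergeometric test: given how many genes in the background carry the term, how likely is it to see at least the observed number in the list by chance? The sketch below is one possible formulation using only the standard library. The function name and the example numbers are assumptions, not taken from the course material.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a draw of n genes from a background of N genes,
    K of which are annotated with the GO term of interest."""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible combinations drop out of the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical numbers: 1000 background genes, 50 carry the term,
# 100 genes in the list of interest, 15 of those carry the term.
p = hypergeom_upper_tail(k=15, K=50, n=100, N=1000)
print(f"P(X >= 15) = {p:.3g}")
```

Under-representation can be assessed the same way with the lower tail, summing from 0 up to the observed count.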
Annotating to the GO
• Need to show the type of evidence of function:
  • Literature curation: read and interpret reviewed literature (IDA, IGI, IMP, IPI, IEP) (TAS, NAS)
  • Computational analysis (RCA, ISS, IEA)
http://www.geneontology.org/GO.evidence.shtml
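When modeling data it is often useful to separate annotations by how they were derived. The sketch below groups evidence codes into three sets and filters a list of annotations. The experimental set uses the five codes the deck lists for manual annotation from literature (IDA, IMP, IPI, IGI, IEP). The group labels, the gene names and the helper functions are assumptions for illustration, and the GO evidence code documentation remains the authority on the full code list.

```python
# Evidence code -> broad group (labels are illustrative).
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IGI", "IMP", "IPI", "IEP"},
    "author statement": {"TAS", "NAS"},
    "computational": {"RCA", "ISS", "IEA"},
}

def evidence_group(code):
    """Return the broad group for an evidence code, or 'other'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "other"

def literature_based(annotations):
    """Keep annotations derived from reading the literature."""
    keep = {"experimental", "author statement"}
    return [a for a in annotations if evidence_group(a[2]) in keep]

# Hypothetical (gene, GO term, evidence code) annotations.
annotations = [
    ("geneA", "GO:0005634", "IDA"),
    ("geneA", "GO:0003677", "ISS"),
    ("geneB", "GO:0005737", "IEA"),
    ("geneC", "GO:0006355", "TAS"),
]
for gene, term, code in literature_based(annotations):
    print(gene, term, code, evidence_group(code))
```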
4. How to find functional annotation for your species
How to find functional annotation
For a quick search you need to know:
• Name of your species (e.g. Sus scrofa, Aspergillus flavus)
• Taxonomy ID (e.g. 9823 for S. scrofa, 5059 for A. flavus)
• Database to look in (e.g. NCBI, UniProt, EBI-GOA, GOC, AgBase)
Not all functional annotation for a species will be in one database!
Not very many species have a broad coverage of GO annotation…
BUT do not worry:
• Searching for their homologs might help
• You may rely on manual annotation from literature (refer to the manual annotation course by Fiona McCarthy)
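Once an annotation file has been downloaded from one of these databases, the Taxonomy ID is what selects the rows for a species. The sketch below assumes the tab-separated GO annotation file (GAF) layout, in which column 2 holds the object ID, column 5 the GO ID, column 7 the evidence code and column 13 the taxon. The sample rows, the "ExampleDB" source and the gene IDs are made up.

```python
from collections import defaultdict

def gaf_row(obj_id, go_id, evidence, aspect, taxon):
    """Build one 15-column GAF-style row; unused columns stay empty."""
    cols = [""] * 15
    cols[0], cols[1], cols[4] = "ExampleDB", obj_id, go_id
    cols[6], cols[8], cols[12] = evidence, aspect, f"taxon:{taxon}"
    return "\t".join(cols)

# Hypothetical file content: one header line and three annotations.
SAMPLE = "\n".join([
    "!example header line",
    gaf_row("GENE_A", "GO:0005634", "IDA", "C", 9823),
    gaf_row("GENE_A", "GO:0003677", "ISS", "F", 9823),
    gaf_row("GENE_B", "GO:0005737", "IEA", "C", 5059),
])

def go_terms_for_taxon(gaf_text, taxon_id):
    """Map object ID -> set of (GO ID, evidence code) for one taxon."""
    wanted = f"taxon:{taxon_id}"
    terms = defaultdict(set)
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):  # skip blanks and headers
            continue
        cols = line.split("\t")
        if len(cols) < 15:
            continue
        # The taxon column may hold "taxon:A|taxon:B"; the first entry
        # is the organism that encodes the gene product.
        if cols[12].split("|")[0] == wanted:
            terms[cols[1]].add((cols[4], cols[6]))
    return dict(terms)

pig = go_terms_for_taxon(SAMPLE, 9823)
print(pig)
```

Running the same function over files from several databases and merging the results is one way to handle the point that not all annotation for a species sits in one place.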
Functional annotation
[Flowchart. Decision points and box labels:]
• Are the genes/proteins in GenBank? Check by Taxon ID (Yes / No)
• Known? NM_, NP_ (Yes / No)
• UniProtKB
• UniParc/IPI
• No GO
• GOA make GO annotations (IEA) using automated methods
• Manual annotations from literature (IDA, IMP, IPI, IGI, IEP codes)
• Annotate by structural/sequence similarity: ORTHOLOGS (ISS code)
• Fill in GO association file
• Submit to AgBase (agricultural species)
• AgBase maintains annotation file
• GOA collect all GO annotations and submit to GOC
• GOA maintain annotation file
• GOC maintain annotation files: unfiltered GOA, filtered GOA
Demonstration