Genomics of Microbial Eukaryotes
Igor Grigoriev, Fungal Genomics Program Head
US DOE Joint Genome Institute, Walnut Creek, CA
Large and Complex Eukaryotes
Outline
Eukaryotic Genome Annotation
Fungal Genomics Program
MycoCosm
Started with Human Genome Project
genome.jgi.doe.gov
IMG
MycoCosm
150+ annotated eukaryotic genomes
Annotation Pipeline
Input: genomic assembly and ESTs
Annotation:
• Gene predictions
• Protein annotations
• Reference data mapping
• Repeat masking
• Manual curation (optional)
Validations
Analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.
Eukaryotic Gene Prediction
• Protein-based methods build CDS exons around alignments of known proteins, such as GenBank proteins (Fgenesh+, GeneWise).
• Transcript-based methods map or assemble transcripts, such as EST contigs, on the genome, including UTRs (EST_map, CombEST).
• Ab initio methods are trained on known genes and use their structure to predict start codons, stop codons, and splice sites, in the CDS only (Fgenesh, GeneMark).
[Figure: gene model structure: promoter, 5' UTR, coding exons from the ATG start codon to the TGA stop codon, introns bounded by GT...AG splice sites, 3' UTR, polyA]
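A toy Python sketch of the signals an ab initio predictor looks for. It is not Fgenesh or GeneMark, which score these signals with trained probabilistic models. Here `splice` removes introns only if they have canonical GT...AG boundaries, and `find_orfs` scans the spliced sequence for ATG-to-stop open reading frames on the forward strand. Both function names, the half-open coordinate convention, and the `min_codons` cutoff are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(genomic, introns):
    """Remove sorted, half-open intron intervals; require canonical GT...AG ends."""
    parts, pos = [], 0
    for start, end in introns:
        intron = genomic[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        parts.append(genomic[pos:start])  # exon before this intron
        pos = end
    parts.append(genomic[pos:])  # last exon
    return "".join(parts)

def find_orfs(seq, min_codons=3):
    """Return half-open (start, end) ORFs from ATG through an in-frame stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))  # include the stop codon
                        i = j  # resume scanning after this stop
                        break
                else:
                    break  # no in-frame stop: ORF runs off the end of the sequence
            i += 3
    return sorted(orfs)

# Hypothetical two-exon gene: exon 1, a GT...AG intron, exon 2.
genomic = "ATGAAA" + "GTCCCAG" + "CCCTGA"
cds = splice(genomic, [(6, 13)])
print(cds, find_orfs(cds))  # ATGAAACCCTGA [(0, 12)]
```

Real predictors also model the reverse strand, splice-site context beyond the GT/AG dinucleotides, and codon statistics; this sketch only shows what "start, stop, and splice sites" mean on a sequence.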
More Gene Prediction
• Use ESTs/cDNAs to extend, correct, or predict gene models (ESTEXT).
[Figure: a predicted model (ATG to TGA) combined with overlapping ESTs into an extended model with 5' and 3' UTRs]
• Detect orthologs with poor alignments and refine them with synteny-based methods (FGENESH2).
[Figure: FGENESH predictions compared between Genome A and Genome B]
Representative set: a non-redundant gene set is built from "the best" model at each locus (from FGENESH, GENEWISE, and external models) according to homology and EST support, followed by manual curation.
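The representative-set step can be sketched as follows, assuming each candidate model carries two hypothetical evidence scores (the fraction aligned to its best protein hit and the fraction covered by ESTs) and that a locus is simply a cluster of overlapping models on one strand. The `GeneModel` class and the equal-weight `score` are illustrative assumptions; the actual JGI ranking rules are not given on the slide, and manual curation is not modeled.

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    name: str
    source: str          # e.g. "FGENESH", "GENEWISE", "EXTERNAL"
    start: int           # genomic start (0-based, half-open)
    end: int
    homology_cov: float  # fraction of the model aligned to its best protein hit
    est_cov: float       # fraction of the model covered by ESTs

def score(model):
    # Hypothetical equal weighting of homology and EST evidence.
    return model.homology_cov + model.est_cov

def group_into_loci(models):
    """Cluster models whose genomic spans overlap (single strand assumed)."""
    loci = []
    for m in sorted(models, key=lambda m: (m.start, m.end)):
        if loci and m.start < loci[-1]["end"]:
            loci[-1]["models"].append(m)
            loci[-1]["end"] = max(loci[-1]["end"], m.end)
        else:
            loci.append({"models": [m], "end": m.end})
    return [locus["models"] for locus in loci]

def representative_set(models):
    """Keep the highest-scoring model at each locus."""
    return [max(locus, key=score) for locus in group_into_loci(models)]

candidates = [
    GeneModel("fgenesh_1", "FGENESH", 100, 900, 0.6, 0.5),
    GeneModel("genewise_1", "GENEWISE", 120, 880, 0.9, 0.5),
    GeneModel("external_1", "EXTERNAL", 2000, 2600, 0.0, 0.8),
    GeneModel("fgenesh_2", "FGENESH", 2100, 2500, 0.3, 0.2),
]
print([m.name for m in representative_set(candidates)])  # ['genewise_1', 'external_1']
```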
Combine Gene Predictors for Better Quality
Heterobasidion annosum v1.0
Metric | EuGene | GeneMark | Fgenesh | JGI Pipe
Number of gene models | 11,547 | 9,609 | 8,409 | 12,270
Models with partial EST support | 5,544 | 3,829 | 4,567 | 5,248
Models with full-length EST support | 2,538 | 1,182 | 2,896 | 3,073
EST coverage per gene | 77.7% | 68.2% | 80.8% | 79.1%
Supported splice sites | 41,581 | 40,808 | 45,498 | 47,671
Models with homology support | 6,758 | 6,043 | 5,750 | 7,214
Models with strong homology support (80+% identity, 80+% coverage) | 112 | 109 | 174 | 187
Model coverage | 64% | 60% | 68% | 69%
Models with homology and EST support | 2,894 | 2,172 | 2,720 | 2,953
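One way to read the table is to ask which predictor leads each evidence metric. The sketch below copies the table's values, leaves out the raw model count (more models is not better by itself), and assumes that a higher value means more support for every remaining row.

```python
from collections import Counter

PREDICTORS = ["EuGene", "GeneMark", "Fgenesh", "JGI Pipe"]

# Values copied from the Heterobasidion annosum v1.0 table; the raw count of
# gene models is left out on purpose.
SUPPORT_METRICS = {
    "models with partial EST support": [5544, 3829, 4567, 5248],
    "models with full-length EST support": [2538, 1182, 2896, 3073],
    "EST coverage per gene (%)": [77.7, 68.2, 80.8, 79.1],
    "supported splice sites": [41581, 40808, 45498, 47671],
    "models with homology support": [6758, 6043, 5750, 7214],
    "models with strong homology support": [112, 109, 174, 187],
    "model coverage (%)": [64, 60, 68, 69],
    "models with homology and EST support": [2894, 2172, 2720, 2953],
}

def leader(values):
    """Name of the predictor with the highest value for one metric."""
    return PREDICTORS[max(range(len(values)), key=values.__getitem__)]

wins = Counter(leader(v) for v in SUPPORT_METRICS.values())
print(wins["JGI Pipe"], "of", len(SUPPORT_METRICS))  # 6 of 8
```

JGI Pipe leads 6 of the 8 support rows, while EuGene leads partial EST support and Fgenesh leads EST coverage per gene. No single predictor wins everywhere, which is the case for combining them.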
Re-annotation Using Comparative Genomics
Metric | MAKER | JGI pipeline | Re-annot
Number of predicted gene models | 9,940 | 12,290 | 12,802
With SwissProt hits | 6,521 | 7,356 | 7,900
With non-repeat Pfam domains | 5,365 | 6,010 | 6,353
With EST support | 9,252 | 10,796 | 11,105
With >90% EST support | 7,729 | 9,178 | 9,444
Number of unique Pfam domains | 2,207 | 2,245 | 2,322
EST coverage per gene | 93.0% | 93.3% | 93.3%
Number of EST-supported splice sites | 99,627 | 102,200 | 104,246
Asaf Salamov
Protein Annotation
For each predicted protein:
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralogs (BLASTP + MCL)
• Domains (InterPro, TMHMM)
• Signal peptide (SignalP)
Higher-order assignments:
• Gene Ontology terms
• EC numbers → KEGG pathways
• Gene families, with and without other species
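The paralog step pairs all-vs-all BLASTP hits with MCL (Markov clustering). A toy NumPy re-implementation of the MCL iteration (expansion by matrix power, inflation by element-wise power, then column re-normalization) shows how a hit graph becomes gene families. It assumes a small dense 0/1 hit matrix; the real `mcl` program works on sparse weighted graphs (typically with BLAST-derived weights) and prunes small entries, and this is not the JGI pipeline code.

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Toy Markov clustering of a symmetric protein-similarity matrix."""
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)  # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along paths
        m = m ** inflation                         # inflation: favor strong flows
        m /= m.sum(axis=0, keepdims=True)          # re-normalize columns
        if np.allclose(m, prev, atol=tol):
            break
    # Each attractor row lists the proteins (columns) that flow into it.
    clusters = []
    for row in m:
        members = frozenset(np.flatnonzero(row > tol).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters

proteins = ["p1", "p2", "p3", "p4", "p5"]
# Hypothetical BLASTP hits between proteins (1 = significant hit, 0 = none).
hits = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
families = [sorted(proteins[i] for i in c) for c in mcl(hits)]
print(families)  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

The inflation parameter controls granularity: larger values split the graph into smaller, tighter families.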
Validation with Transcriptomics
Transformation of EST sequencing
[Figure: share of gene models supported by ESTs vs. other genes for Sanger, 454, and Illumina data; an EST profile from the old Sanger days compared with RNA-Seq processed with CombEST]
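EST support for a gene model can be computed from alignment coordinates. The sketch below assumes models and EST alignments are half-open `(start, end)` intervals on one strand and ignores exon structure; `covered_bases`, `est_coverage`, and `fraction_supported` are hypothetical helpers, and the exact definition behind the "EST coverage per gene" rows in the tables may differ.

```python
def covered_bases(model, ests):
    """Count bases of a model span [start, end) covered by the union of EST alignments."""
    start, end = model
    clipped = sorted((max(s, start), min(e, end)) for s, e in ests if s < end and e > start)
    total, cur_s, cur_e = 0, None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:  # gap: close the previous merged block
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:  # overlap: extend the current merged block
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total

def est_coverage(model, ests):
    """Fraction of the model span covered by ESTs."""
    return covered_bases(model, ests) / (model[1] - model[0])

def fraction_supported(models, ests):
    """Share of models overlapped by at least one EST alignment."""
    supported = sum(1 for m in models if covered_bases(m, ests) > 0)
    return supported / len(models) if models else 0.0

models = [(0, 100), (150, 180), (400, 500)]
ests = [(10, 30), (20, 50), (90, 120), (200, 300)]
print(est_coverage(models[0], ests))     # 0.5
print(fraction_supported(models, ests))  # 0.3333333333333333
```

Merging overlapping ESTs before counting keeps deep RNA-Seq coverage from being double-counted, which matters more as read volume grows from Sanger to Illumina.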
Validation with Proteomics
Wright et al., BMC Genomics (2009)
Gene Cluster Analysis
Comparative analysis
Genome Portal Framework
Many Genes of Eco-responsive Daphnia pulex
• First crustacean sequenced: an aquatic animal and a new model organism
• 30,940 predicted D. pulex genes in a ~200 Mb genome
• 85% supported by one or more lines of evidence
Colbourne et al., Science, 2011
Half of Daphnia Genes: No Homologs, Expressed Under Environmental Stress
With Evgeny Zdobnov’s group (Univ. Genève)
* Of 716 highly conserved single-copy orthologs, Daphnia is missing only two
Colbourne et al., 2011
Outline
Eukaryotic Genome Annotation
Fungal Genomics Program
MycoCosm
Fungal Genomics for Energy & Environment
[Figure: bio-refinery cycle: Grow (plant symbionts and pathogens), Degrade (lignocellulose degradation), Ferment (sugar fermentation)]
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications
[Figure: fungal genome projects by sequencing center: DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ., and other; slice sizes 41%, 31%, 13%, 10%, and 5%]
GOLD (October 2011): 758 fungal projects
Genomic Encyclopedia of Fungi
• Chapter 1: Plant health
  • Symbiosis
  • Plant pathogenicity
  • Biocontrol
• Chapter 2: Biorefinery
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms
• Chapter 3: Diversity
  • Phylogenetics
  • Ecology
Genome-Centric View
Comparative View
http://jgi.doe.gov/fungi: 100+ fungal genomes, 5000+ visitors/month
Comparative Genome Analysis
Strategy: 1000 Fungal Genomes
Goal: Sequencing 1000 fungal genomes from across the Fungal Tree of Life will provide references for research on plant-microbe interactions and environmental metagenomics.
[Figure: pie chart by phylum (Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Zygomycota, unknown); the two largest slices are 68% and 23%]
Strategy: Fungal Systems
• Model fungi: S. commune, T. terrestris
• Simple systems: lichen (alga + fungus), ECM (plant + fungus)
• Complex environments: forest soil metagenomes
Model Mushroom Development
Ohm et al., 2010
[Figure: Sequence → Function → Model for S. commune: wild-type (WT) genome, transcriptomics and gene knock-outs, modeling of regulatory cascades]
Summary
Eukaryotic Annotation Recipe: combine gene predictors, experimental data, and community expertise.
Fungal Genomics: we aim to scale up sequencing and comparative analysis of fungi relevant for energy and environment (jgi.doe.gov/fungi).
Enjoy Algae as well!
http://genome.jgi.doe.gov/Algae
Acknowledgements
JGI Staff
Our Users