Genomics of Microbial Eukaryotes Igor Grigoriev, Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA <[email protected]>


Genomics of Microbial Eukaryotes

Igor Grigoriev, Fungal Genomics Program Head

US DOE Joint Genome Institute, Walnut Creek, CA

<[email protected]>


Large and Complex Eukaryotes


Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm


Started with Human Genome Project

genome.jgi.doe.gov

IMG

MycoCosm

150+ annotated eukaryotic genomes

Annotation Pipeline

Genomic assembly and ESTs

Annotation:
• Gene predictions
• Protein annotations
• Reference data mapping
• Repeat masking
• Validations
• Manual curation (optional)

Analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.

Eukaryotic Gene Prediction

Gene model: Promoter, 5’UTR, exons and introns (ATG start, GT...AG intron boundaries, TGA stop), 3’UTR, PolyA

Ab initio methods use knowledge of known genes’ structures to predict start, stop, and splice sites in CDS only (Fgenesh, GeneMark).
Diagram: train on known genes, then predict model

Protein-based methods build CDS exons around known protein alignments (Fgenesh+, GeneWise).
Diagram: GenBank protein, then predict model

Transcript-based methods map or assemble transcripts on the genome, including UTRs (EST_map, Combest).
Diagram: EST contig
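The sequence signals named on this slide (ATG start, GT/AG intron boundaries, a stop codon such as TGA) can be illustrated with a toy scan. This is a minimal sketch and not how Fgenesh or GeneMark work: real ab initio predictors score such signals with models trained on known genes. The function names and the forward-strand-only simplification are assumptions made for illustration.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) for each forward-strand ATG that reaches an in-frame stop.

    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    # The lookahead finds every ATG, including overlapping ones.
    for match in re.finditer("(?=ATG)", seq):
        start = match.start()
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end is exclusive and includes the stop
                break
    return orfs

def candidate_introns(seq, min_len=4):
    """Return (donor_start, acceptor_end) for every GT...AG span of at least min_len bases."""
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptor_ends = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_len]
```

A scan like this reports far more candidates than there are real genes, which is why the predictors on this slide rely on training data, protein alignments, or transcripts to choose among them.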

More Gene Prediction

Use ESTs/cDNAs to extend, correct or predict gene models
• ESTEXT
Diagram: predicted model (ATG to TGA) plus ESTs gives an extended model with 5’UTR and 3’UTR

Detect orthologs with poor alignments and refine with synteny-based methods
• FGENESH2
Diagram: Genome A, Genome B

Non-redundant gene set is built from “the best” models from each locus according to homology and ESTs, followed by manual curation
Diagram: FGENESH, GENEWISE, and EXTERNAL MODELS feed the representative set
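The idea of keeping "the best" model per locus can be sketched as a simple ranking. The slide does not give the scoring rule, so the equal-weight sum of homology and EST coverage below is an assumption, as are the `GeneModel` fields and the locus identifiers; treat it as an illustration of the selection step only.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GeneModel:
    name: str
    locus: str           # hypothetical identifier shared by overlapping models
    source: str          # e.g. "FGENESH", "GENEWISE", "EXTERNAL"
    homology_cov: float  # fraction of the model covered by protein alignments (0..1)
    est_cov: float       # fraction of the model covered by ESTs (0..1)

def score(model, w_homology=1.0, w_est=1.0):
    """Assumed scoring rule: weighted sum of the two lines of evidence."""
    return w_homology * model.homology_cov + w_est * model.est_cov

def representative_set(models):
    """Pick the highest-scoring model at each locus."""
    by_locus = defaultdict(list)
    for model in models:
        by_locus[model.locus].append(model)
    return {locus: max(group, key=score) for locus, group in by_locus.items()}

if __name__ == "__main__":
    candidates = [
        GeneModel("m1", "locus_1", "FGENESH", 0.5, 0.9),
        GeneModel("m2", "locus_1", "GENEWISE", 0.9, 0.2),
        GeneModel("m3", "locus_2", "EXTERNAL", 0.1, 0.1),
    ]
    for locus, best in representative_set(candidates).items():
        print(locus, best.name, best.source)
```

In practice the chosen models would still go to manual curation, as the slide notes.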

Combine Gene Predictors for Better Quality

Heterobasidion annosum v1.0

                                        Eugene    Genemark  Fgenesh   JGI Pipe
Number of gene models                   11,547    9,609     8,409     12,270
Models with partial EST support         5,544     3,829     4,567     5,248
  with full length EST support          2,538     1,182     2,896     3,073
  EST coverage per gene                 77.7%     68.2%     80.8%     79.1%
  supported splice sites                41,581    40,808    45,498    47,671
Models with homology support            6,758     6,043     5,750     7,214
  with strong homology support*         112       109       174       187
  model coverage                        64%       60%       68%       69%
Models with homology and EST support    2,894     2,172     2,720     2,953

* 80+% identity, 80+% coverage
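The counts in the Heterobasidion annosum table can be turned into per-predictor support rates, which makes gene sets of different sizes easier to compare. The sketch below copies the counts from the table; the derived percentages are not shown on the slide, and the dictionary layout and function names are assumptions.

```python
# Counts copied from the Heterobasidion annosum v1.0 table.
COUNTS = {
    "Eugene":   {"models": 11547, "est": 5544, "homology": 6758, "both": 2894},
    "Genemark": {"models": 9609,  "est": 3829, "homology": 6043, "both": 2172},
    "Fgenesh":  {"models": 8409,  "est": 4567, "homology": 5750, "both": 2720},
    "JGI Pipe": {"models": 12270, "est": 5248, "homology": 7214, "both": 2953},
}

def percent(part, whole):
    """Share of `whole` represented by `part`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)

def support_rates(counts):
    """Percentage of each predictor's models with EST, homology, or both kinds of support."""
    return {
        predictor: {
            key: percent(row[key], row["models"])
            for key in ("est", "homology", "both")
        }
        for predictor, row in counts.items()
    }

if __name__ == "__main__":
    for predictor, rates in support_rates(COUNTS).items():
        print(f"{predictor:<9} EST {rates['est']}%  homology {rates['homology']}%  both {rates['both']}%")
```

Rates and absolute counts answer different questions, so both are worth reading side by side when comparing predictors.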

Re-annotation Using Comparative Genomics

                                MAKER         JGI pipeline  Re-annot
# of predicted gene models      9,940         12,290        12,802
with SwissProt hits             6,521         7,356         7,900
with non-repeat PFAM domains    5,365         6,010         6,353
with EST support                9,252         10,796        11,105
with >90% EST support           7,729         9,178         9,444
# of unique PFAM domains        2,207         2,245         2,322
EST coverage per gene           93.0%         93.3%         93.3%
# EST-supported splice sites    99,627        102,200       104,246

Asaf Salamov

Protein Annotation

Predicted protein:
• Domain (InterPro, TMHMM)
• Signal peptide (SignalP)
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralogs (Blastp+MCL)

Higher order assignments:
• Gene Ontology terms
• EC numbers --> KEGG pathways
• Gene families, with and without other species
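The "Blastp+MCL" step groups proteins into families by running Markov clustering on an all-against-all similarity matrix. The following is a toy, pure-Python sketch of the MCL iteration (expansion, inflation, column normalisation), not the MCL program itself. The matrix values, the default inflation of 2.0, and the fixed iteration count are assumptions, and real inputs would be derived from Blastp scores.

```python
def normalise_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mcl(similarity, inflation=2.0, iterations=30, eps=1e-6):
    """Cluster items from a symmetric similarity matrix with a toy MCL loop."""
    n = len(similarity)
    # Add self-loops so that every column has a non-zero sum.
    m = [[1.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: matrix squaring spreads flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power strengthens strong flows and weakens weak ones.
        inflated = [[value ** inflation for value in row] for row in expanded]
        m = normalise_columns(inflated)
    # Each non-empty row lists the members attracted to that row's node.
    clusters = []
    for row in m:
        members = frozenset(j for j, value in enumerate(row) if value > eps)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

if __name__ == "__main__":
    # Hypothetical similarities: proteins 0-2 hit each other, as do proteins 3-4.
    sim = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print([sorted(cluster) for cluster in mcl(sim)])
```

Higher inflation values tend to give smaller, tighter clusters, so the parameter is usually tuned against known gene families.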

Validation with Transcriptomics

Transformation of EST sequencing
[Chart: share of genes supported by ESTs versus other genes, 0% to 100%, for Sanger, 454, and Illumina; data labels as extracted: 5531, 34]

Old Sanger Days
[Figure: EST profile]

Processing RNA-Seq with CombEST
[Figure: models and ESTs]


Validation with Proteomics

Wright et al, BMC Genomics (2009)


Gene Cluster Analysis

Comparative analysis


Genome Portal Framework

Many Genes of Eco-responsive Daphnia pulex

First crustacean, aquatic animal sequenced, new model organism
30,940 predicted D. pulex genes in ~200 Mb genome
85% supported by 1+ lines of evidence
Colbourne et al, Science, 2011

Half of Daphnia Genes: No Homologs, Expressed Under Environmental Stress

With Evgeny Zdobnov’s group (Univ. Genève)

* Of 716 highly conserved single copy orthologs, Daphnia is missing only two

Colbourne et al, 2011


Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm

Fungal Genomics for Energy & Environment

Grow, Degrade, Ferment

Lignocellulose degradation
Plant symbionts and pathogens
Sugar fermentation

Bio-refinery

GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications

GOLD (October 2011): 758 fungal projects

[Pie chart: share of fungal projects by sequencing center. Legend: DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ, other. Slice labels as extracted: 13%, 10%, 31%, 5%, 41%]

Genomic Encyclopedia of Fungi

• Chapter 1: Plant health
  • Symbiosis
  • Plant Pathogenicity
  • Biocontrol

• Chapter 2: Biorefinery
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms

• Chapter 3: Diversity
  • Phylogenetics
  • Ecology


Genome-Centric View

Comparative View

http://jgi.doe.gov/fungi

100+ fungal genomes

5000+ visitors/month


Comparative Genome Analysis


Strategy: 1000 Fungal Genomes

Goal: Sequencing 1000 fungal genomes from across the Fungal Tree of Life will provide references for research on plant-microbe interactions and environmental metagenomics.

[Pie chart by phylum. Slice labels as extracted: 68%, 23%. Legend: Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Unknown, Zygomycota]

Strategy: Fungal Systems

Model fungi: S. commune, T. terrestris

Simple systems: Lichen (alga + fungus), ECM (plant + fungus)

Complex environments: Forest soil metagenomes


Model Mushroom Development

Ohm et al, 2010

SEQUENCE FUNCTION MODEL

WT

S. commune

<Transcriptomics>

Gene knock-outs

Modeling regulatory cascades


Summary

Eukaryotic Annotation Recipe:

Combine gene predictors, experimental data, and community expertise

Fungal Genomics: we aim to scale up sequencing & comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)


Enjoy Algae as well!

http://genome.jgi.doe.gov/Algae

Acknowledgements

JGI Staff

Our Users


Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm