Phylogenomic Approaches to Functional Prediction Automated Function Prediction SIG ISMB 2012 July 13, 2012 Jonathan A. Eisen University of California, Davis @phylogenomics Saturday, July 14, 12

Page 1: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylogenomic Approaches to Functional Prediction

Automated Function Prediction SIG, ISMB 2012

July 13, 2012

Jonathan A. Eisen, University of California, Davis

@phylogenomics

Page 2: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP

Automated Function Prediction SIG, ISMB 2012

Page 3: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP

AFP SIG, ISMB 2012

Page 4: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP AFP SIG ISMB 2012

Page 5: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Acknowledgements

• $$$
– DOE
– NSF
– GBMF
– Sloan
– DARPA
– DSMZ
– DHS

• People, places
– DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides
– UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang
– Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk, Phil Hanawalt

Page 6: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylogenomics of Novelty

Mechanisms of Origin of New

Functions

Species Evolution

Variation in Mechanisms:

Patterns, Causes and Effects


Page 7: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Origin of Novelty

• How does novelty originate?

• What are the constraints on evolvability?

• What leads to variation within the genome and within and between species in evolvability?

• This information helps interpret the past, understand the present and (maybe) predict the future

Page 8: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

History


Page 9: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Whatever the History: Trying to Incorporate it is Critical

from Lake et al. doi: 10.1098/rstb.2009.0035


Page 10: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP AFP SIG ISMB 2012 I:

Predicting Functions with Evolutionary Trees


Page 11: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

SNF2 Family of Proteins (1995)

• SNF2 family defined by presence of conserved DNA-dependent ATPase domain

• 100s of proteins

• Diversity of functions:
– transcriptional activation (SNF2)
– transcriptional repression (MOT1)
– recombination (RAD54)
– transcription-coupled repair (CSB)
– post-replication repair (RAD5)
– chromosome segregation (lodestar)
– many with unknown functions

• Some species have 15+ representatives

Bork and Koonin 1993

Page 12: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

SNF2 Alignment

[Figure: alignment of SNF2 family proteins (BRM, hBRM, hBRG1, mBRG1, STH1, SNF2, YB95, F37A4, ISWI, SNF2L, CHD1, SYGP, ETL1, FUN30, MOT1, ERCC6, RAD26, YB53, DNRPPX, hNUCP, mNUCP, RAD5, spRAD8, HIP116, RAD16, LODE, NPH42, HepA, B. cereus ORF) showing helicase motifs I, Ia, Ib, II, III, IV, V and VI, with proteins grouped by subfamily (CHD1, SNF2, SNF2L, ETL1, MOT1, RAD16, ERCC6, RAD54); scale bar 0 to 500 aa]

Page 13: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB


Page 14: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

SNF2 Subfamilies

Subfamily   Function
SNF2        Transcription activation (Swi/Snf complex)
SNF2L       Transcription activation (NURF complex)
CHD1        Chromatin remodelling
ETL1        Unknown
MOT1        Transcription repression
CSB         Transcription-coupled repair
Rad54       Recombinational repair
Rad16       Chromatin access for DNA repair
HepA        Bacterial RNA polymerase subunit

Page 15: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

SNF2 Tree and F(x) Prediction

• Function conserved within but not between subfamilies/orthology groups

• Therefore, assignment of genes to subfamilies can be used to predict functions of unknowns

• Grouping into subfamilies helps identify motifs conserved within groups

• Phylogeny recovers subfamilies better than similarity searches
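Note: the second bullet can be sketched as a simple lookup, assuming a gene has already been placed in a subfamily by a tree. The table is transcribed from the "SNF2 Subfamilies" slide; the function name and the example assignment of RAD26 to the CSB subfamily are illustrative assumptions, not output from the talk.

```python
# Subfamily -> function, transcribed from the "SNF2 Subfamilies" slide.
SUBFAMILY_FUNCTION = {
    "SNF2": "Transcription activation (Swi/Snf complex)",
    "SNF2L": "Transcription activation (NURF complex)",
    "CHD1": "Chromatin remodelling",
    "ETL1": "Unknown",
    "MOT1": "Transcription repression",
    "CSB": "Transcription-coupled repair",
    "Rad54": "Recombinational repair",
    "Rad16": "Chromatin access for DNA repair",
    "HepA": "Bacterial RNA polymerase subunit",
}


def predict_function(subfamily):
    """Predict a gene's function from the subfamily the tree places it in."""
    # A subfamily that is not in the table gives no prediction at all.
    return SUBFAMILY_FUNCTION.get(subfamily, "Unknown")


# Hypothetical assignment for illustration: an uncharacterized gene
# that groups with the CSB subfamily in the tree.
assignments = {"RAD26": "CSB"}
for gene, subfamily in assignments.items():
    print(gene, "->", predict_function(subfamily))
```

The prediction is only as good as the subfamily assignment, which is why the slide argues for trees over similarity searches.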


Page 18: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

MutL??

From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html


Page 19: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylogenetic Tree of MutS Family

[Figure: phylogenetic tree of MutS family homologs; tips are labeled with species abbreviations (e.g., Ecoli, Bacsu, Yeast, Human)]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Page 20: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

MutS Subfamilies

[Figure: MutS family tree with clades labeled by subfamily: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Page 21: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Overlaying Functions onto Tree

[Figure: MutS family tree with known functions overlaid; clades labeled by subfamily: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Page 22: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

MutS Subfamilies

• MutS1 Bacterial MMR
• MSH1 Euk - mitochondrial MMR
• MSH2 Euk - all MMR in nucleus
• MSH3 Euk - loop MMR in nucleus
• MSH6 Euk - base:base MMR in nucleus

• MutS2 Bacterial - function unknown
• MSH4 Euk - meiotic crossing-over
• MSH5 Euk - meiotic crossing-over

Page 23: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Functional Prediction Using Tree

[Figure: MutS family tree with a predicted function for each subfamily]

• MSH1 - Mitochondrial Repair
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MSH3 - Nuclear Repair of Loops
• MSH6 - Nuclear Repair of Mismatches
• MutS1 - Bacterial Mismatch and Loop Repair
• MSH4 - Meiotic Crossing Over
• MSH5 - Meiotic Crossing Over
• MutS2 - Unknown Functions

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Page 24: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Ancient MutS Duplication


Page 25: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Table 3. Presence of MutS Homologs in Complete Genome Sequences

Species                                    # of MutS Homologs   Which Subfamilies?   MutL Homologs

Bacteria
Escherichia coli K12                       1                    MutS1                1
Haemophilus influenzae Rd KW20             1                    MutS1                1
Neisseria gonorrhoeae                      1                    MutS1                1
Helicobacter pylori 26695                  1                    MutS2                -
Mycoplasma genitalium G-37                 -                    -                    -
Mycoplasma pneumoniae M129                 -                    -                    -
Bacillus subtilis 169                      2                    MutS1,MutS2          1
Streptococcus pyogenes                     2                    MutS1,MutS2          1
Mycobacterium tuberculosis                 -                    -                    -
Synechocystis sp. PCC6803                  2                    MutS1,MutS2          1
Treponema pallidum Nichols                 1                    MutS1                1
Borrelia burgdorferi B31                   2                    MutS1,MutS2          1
Aquifex aeolicus                           2                    MutS1,MutS2          1
Deinococcus radiodurans R1                 2                    MutS1,MutS2          1

Archaea
Archaeoglobus fulgidus VC-16, DSM4304      -                    -                    -
Methanococcus jannaschii DSM 2661          -                    -                    -
Methanobacterium thermoautotrophicum ΔH    1                    MutS2                -

Eukaryotes
Saccharomyces cerevisiae                   6                    MSH1-6               3+
Homo sapiens                               5                    MSH2-6               3+

MutS1,2 vs MutL
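Note: the "MutS1,2 vs MutL" comparison can be checked directly against the prokaryote rows of Table 3. The sketch below transcribes those rows and asks whether a MutS subfamily is present in exactly the genomes that encode a MutL homolog. The function name is an assumption for illustration.

```python
# Prokaryote rows from Table 3:
# (species, MutS subfamilies found, MutL homolog found)
ROWS = [
    ("Escherichia coli K12", {"MutS1"}, True),
    ("Haemophilus influenzae Rd KW20", {"MutS1"}, True),
    ("Neisseria gonorrhoeae", {"MutS1"}, True),
    ("Helicobacter pylori 26695", {"MutS2"}, False),
    ("Mycoplasma genitalium G-37", set(), False),
    ("Mycoplasma pneumoniae M129", set(), False),
    ("Bacillus subtilis 169", {"MutS1", "MutS2"}, True),
    ("Streptococcus pyogenes", {"MutS1", "MutS2"}, True),
    ("Mycobacterium tuberculosis", set(), False),
    ("Synechocystis sp. PCC6803", {"MutS1", "MutS2"}, True),
    ("Treponema pallidum Nichols", {"MutS1"}, True),
    ("Borrelia burgdorferi B31", {"MutS1", "MutS2"}, True),
    ("Aquifex aeolicus", {"MutS1", "MutS2"}, True),
    ("Deinococcus radiodurans R1", {"MutS1", "MutS2"}, True),
    ("Archaeoglobus fulgidus VC-16, DSM4304", set(), False),
    ("Methanococcus jannaschii DSM 2661", set(), False),
    ("Methanobacterium thermoautotrophicum ΔH", {"MutS2"}, False),
]


def cooccurs_with_mutl(rows, subfamily):
    """True if the subfamily is present in exactly the genomes that encode MutL."""
    return all(
        (subfamily in subfamilies) == has_mutl
        for _species, subfamilies, has_mutl in rows
    )


print("MutS1 tracks MutL:", cooccurs_with_mutl(ROWS, "MutS1"))
print("MutS2 tracks MutL:", cooccurs_with_mutl(ROWS, "MutS2"))
```

In these rows MutS1 and MutL are always found together, while MutS2 occurs without MutL (e.g., in Helicobacter pylori), consistent with MutS2 having a different, unknown function.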


Page 26: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB


Page 27: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PHYLOGENETIC PREDICTION OF GENE FUNCTION

[Figure: flowchart of the method worked through for two examples (EXAMPLE A and EXAMPLE B), using genes 1-6 and paralogs 1A-3A and 1B-3B from Species 1, 2 and 3. The actual evolution is assumed to be unknown; inferred duplications are marked on the gene trees and one prediction is marked as ambiguous.]

METHOD (steps in order)

• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST

Based on Eisen, 1998 Genome Res 8: 163-167.
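Note: the last two steps of the method (overlay known functions, infer the function of the gene of interest) can be sketched on a toy tree. This is a simplified illustration, not the procedure from the paper: it takes the smallest clade around the query that contains any annotated gene and reports a function only if those annotations agree. The tree, gene names and function labels are hypothetical.

```python
def leaves(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def infer_function(tree, known, query):
    """Infer the query's function from the smallest annotated clade around it.

    Returns the shared function, "ambiguous" if annotations in that clade
    disagree, or None if no gene in the tree is annotated.
    """
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not in the tree")
    best = None
    node = tree
    # Walk from the root toward the query; each step is a smaller clade.
    while not isinstance(node, str):
        functions = {
            known[leaf] for leaf in leaves(node)
            if leaf != query and leaf in known
        }
        if functions:
            best = functions
        node = next(child for child in node if query in leaves(child))
    if best is None:
        return None
    return next(iter(best)) if len(best) == 1 else "ambiguous"


# Two paralog groups (A and B) sampled from three species.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A",
         "1B": "function B", "3B": "function B"}
print(infer_function(tree, known, "2A"))
print(infer_function(tree, known, "2B"))
```

A query that branches outside both annotated groups would see conflicting annotations in its smallest clade and come back as "ambiguous", which mirrors the ambiguous case in the figure.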


Page 28: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Evolutionary Rate Variation

[Figure: gene tree with tips 1-6 illustrating evolutionary rate variation]

Page 30: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

A single tree with everything?

Phylogenetic Challenge


Page 31: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylosift/ pplacer


Page 32: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

rRNA Phylotyping

[Figure: rRNA phylotyping workflow. DNA extraction; PCR, which makes lots of copies of the rRNA genes in the sample; sequencing of the rRNA genes; sequence alignment (= data matrix) of sample sequences rRNA1-rRNA4 with E. coli, human and yeast sequences; phylogenetic tree placing rRNA1-rRNA4 relative to E. coli, humans and yeast]

Page 33: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Eisen et al. 2002

Eisen et al. 1992


Page 34: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP AFP SIG ISMB 2012 II:

Every gene family is unique ...


Page 35: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB


Page 36: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Steps in Phylogenomics

• Create database of genes of interest

• Presence/absence of homologs in complete genomes

• Phylogenetic trees of each gene family

• Infer evolutionary events (gene origin, duplication, loss and transfer)

• Refine presence/absence (orthologs, paralogs, subfamilies)

• Functional predictions and functional evolution

• Analysis of pathways


Page 37: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Photoreactivation/Photolyases

• All photoreactivation is carried out by enzymes in the photolyase family

• Two main classes of photolyases – class I and class II – are distantly related to each other and likely the result of an ancient duplication

• PhrI and PhrII missing from most species for which complete genomes are available.

• Many cases of functional change (e.g., CPD -> 6-4) and some are not even involved in DNA repair

• Many of the eukaryotic proteins appear to be of an organellar ancestry


Page 38: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Photoreactivation

• All known enzymes that perform photoreactivation are part of a single large photolyase gene family

• Some members of the family do not function as photolyases, but instead work as blue-light receptors

• If a species does not encode a member of the photolyase gene family, it likely does not have photoreactivation capability

• If a species encodes a photolyase, one cannot conclude it has photolyase activity

• Position of photolyase homologs within photolyase tree helps predict what activities they have


Page 39: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Alkyltransferases

• All known alkyltransferases are members of a single gene family

• Found in most but not all species

• Likely present in LUCA

• Ada protein in E. coli originated by fusion between an alkyltransferase and a transcription-regulatory domain

• Gram-positive bacteria have the Ada domain fused to an alkylation glycosylase instead of alkyltransferase


Page 40: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

BER Glycosylases

• Distribution patterns highly uneven but some glycosylases have been found in all species

• Some are ancient enzymes, probably present in LUCA (e.g., MutY-Nth), others more recent (e.g., TagI).

• Many families are distantly related to each other (e.g., Ogg, AlkA, MutY-Nth)

• Many cases of gene duplication, loss and possibly transfer, especially from organellar genomes to nucleus

• Orthologs frequently have different specificity


Page 41: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

AP Endonucleases

• All species encode either Nfo or Xth homologs. Some encode both.

• Only Nfo: mycoplasmas, Aquifex, M. jannaschii, yeast

• Only Xth: many bacteria, A. fulgidus, humans (so far)

• Both: E. coli, B. subtilis, M. tuberculosis, M. thermoautotrophicum

• Both Nfo and Xth are likely ancient.

• Many cases of gene loss of one or the other, but never both


Page 42: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Uracil Glycosylase

• Many non-homologous proteins have uracil-DNA glycosylase activity (Ung, GAPDH, MUG, cyclin)

• Therefore, absence of homologs of these genes should not be used to infer likely absence of activity

• However, presence of homologs of Ung and MUG genes can be used to indicate presence of activity because all homologs of these genes have this activity
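Note: the asymmetry in these bullets (presence of Ung or MUG homologs supports activity, absence supports nothing) can be written as a small decision rule. A minimal sketch; the function name and return strings are illustrative assumptions.

```python
# Per the slide, all homologs of these genes have uracil-DNA glycosylase
# activity, so finding either one supports a positive call.
ACTIVITY_IMPLIED_BY = {"Ung", "MUG"}


def uracil_glycosylase_call(homologs_found):
    """Call uracil-DNA glycosylase activity from the homologs found in a genome."""
    if ACTIVITY_IMPLIED_BY & set(homologs_found):
        return "activity likely present"
    # Non-homologous proteins can carry the same activity, so absence of
    # Ung and MUG homologs is not evidence of absence.
    return "no conclusion"


print(uracil_glycosylase_call(["Ung"]))
print(uracil_glycosylase_call([]))
```

The photolyase slides describe the opposite asymmetry: there, absence of a family member is informative and presence is not.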


Page 43: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Not Open Access


Page 44: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB


Page 45: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP AFP SIG ISMB 2012 III:

When phylogeny is not enough ...


Page 46: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

But ...

• Many powerful and automated similarity based methods for assigning genes to protein families
– COGs
– PFAM HMM searches

• Some limitations of similarity based methods can be overcome by phylogenetic approaches

• Automated methods now available
– Sean Eddy
– Steven Brenner
– Kimmen Sjölander

• But …

Page 47: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Example: Recent Changes

[Figure: neighbor-joining (NJ) tree of homologs from E. coli, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis and V. cholerae, including a long run of V. cholerae genes; gene identifiers and asterisk annotations omitted]

• Phylogenomic functional prediction may not work well for very newly evolved functions

• Can use understanding of origin of novelty to better interpret these cases?

• Screen genomes for genes that have changed recently

– Pseudogenes and gene loss
– Contingency loci
– Acquisition (e.g., LGT)
– Unusual dS/dN ratios
– Rapid evolutionary rates
– Recent duplications

Page 48: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Non-Homology Predictions: Phylogenetic Profiling

• Step 1: Search all genes in organisms of interest against all other genomes

• Ask: Yes or No, is each gene found in each other species

• Cluster genes by distribution patterns (profiles)

Pellegrini et al. 1999. PNAS 96: 4285.
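Note: the three steps above can be sketched as follows, assuming the homology searches have already been run and summarized as, for each gene, the set of genomes where a homolog was found. Grouping only genes with identical profiles is a simplification chosen for brevity; gene and genome names are hypothetical.

```python
from collections import defaultdict


def cluster_by_profile(presence, genomes):
    """Group genes that share the same presence/absence profile.

    presence: dict mapping gene -> set of genomes where a homolog was found.
    genomes:  ordered list of genomes defining the profile positions.
    """
    groups = defaultdict(list)
    for gene, found_in in presence.items():
        # Yes/No answer for each genome, in a fixed order.
        profile = tuple(genome in found_in for genome in genomes)
        groups[profile].append(gene)
    return dict(groups)


genomes = ["sp1", "sp2", "sp3"]
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp2"},
    "geneC": {"sp1", "sp3"},
    "geneD": {"sp1", "sp2", "sp3"},
}
for profile, genes in cluster_by_profile(presence, genomes).items():
    print(profile, genes)
```

Here geneA and geneC share a profile and would be candidates for a shared pathway or process; a real analysis would also tolerate near-identical profiles.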

Page 49: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Correlated gain/loss of genes

• Microbial genes are lost rapidly when not maintained by selection

• Genes can be acquired by lateral transfer

• Frequently gain and loss occurs for entire pathways/processes

• Thus might be able to use correlated presence/absence information to identify genes with similar functions

Page 50: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Carboxydothermus hydrogenoformans

• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
• Produces hydrogen gas
• Low GC Gram + (Firmicute)
• Genome determined

Wu et al. 2005 PLoS Genetics 1: e65.

Page 54: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PG Profiling Works Better with Families


Page 55: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

PAFP AFP SIG ISMB 2012 IV:

Knowing What You Don’t Know


Page 56: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 57: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Most genomes from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 58: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Most genomes from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002

Page 59: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Most genomes from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 60: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Most genomes from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Page 61: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

TIGR TOL 2002


Page 63: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA Lesson 1: Improves genome annotation

• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes

• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary” based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction

Page 64: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA Lesson 2: Metadata Important

Page 65: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA Lesson 3: Improves discovering new genetic diversity

Page 66: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Protein Family Rarefaction

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
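Note: the plotting step can be sketched as a cumulative count, assuming the family identification (e.g., by MCL) has already produced, for each genome, the set of protein family IDs it contains. The clustering itself is not shown; genome and family names are hypothetical.

```python
import random


def rarefaction_curve(genome_families, order=None, seed=0):
    """Cumulative number of distinct protein families as genomes are added.

    genome_families: dict mapping genome -> set of protein family IDs.
    order: optional explicit genome order; otherwise a seeded random order.
    """
    if order is None:
        names = list(genome_families)
        random.Random(seed).shuffle(names)
    else:
        names = list(order)
    seen = set()
    curve = []
    for name in names:
        seen |= genome_families[name]
        curve.append(len(seen))  # y value for x = number of genomes so far
    return curve


genome_families = {
    "g1": {"fam1", "fam2"},
    "g2": {"fam2", "fam3"},
    "g3": {"fam1"},
}
print(rarefaction_curve(genome_families, order=["g1", "g2", "g3"]))
```

Because the curve depends on the order in which genomes are added, averaging over many random orders (different seeds) gives a smoother curve for comparing genome sets.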


Page 73: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Families/PD not uniform


Page 74: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Structural Novelty

• Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)

• Structural modeling suggests many are structurally novel too (D'haeseleer)

• 372 being crystallized by the PSI (Kerfeld)


Page 75: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Needed Reference Tree


Page 76: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA Lesson 4: Much diversity untouched

Page 77: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.


Page 80: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylogenetic Diversity: Isolates

From Wu et al. 2009 Nature 462, 1056-1060

Page 81: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Haloarchaeal GEBA-like


Page 82: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Phylogenetic Diversity: All

From Wu et al. 2009 Nature 462, 1056-1060

Page 83: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Uncultured Lineages:

• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification

Page 84: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA uncultured

Number of SAGs from Candidate Phyla

Site                                    OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent               4     1      -     -
Site B: Gold Mine                       6     13     2     -
Site C: Tropical gyres (Mesopelagic)    -     -      -     2
Site D: Tropical gyres (Photic zone)    1     -      -     -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz

Page 85: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

RecA, RpoB in GOS

[Figure: panels labeled GOS 1 through GOS 5]

Wu et al PLoS One 2011

Page 86: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

GEBA Lesson 6: Experimental diversity

Page 87: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 88: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 89: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002

Page 90: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 91: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Page 92: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions; scale bar 0.1]

Need experimental studies from across the tree too

Based on Hugenholtz, 2002

Page 93: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

[Figure: phylogenetic tree of bacterial phyla and candidate divisions; scale bar 0.1]

Adopt a Microbe

Based on Hugenholtz, 2002

Page 94: Jonathan Eisen talk "Phylogneomic approaches to functional prediction"a #AFP2012 #ISMB

Acknowledgements

• $$$
– DOE
– NSF
– GBMF
– Sloan
– DARPA
– DSMZ
– DHS

• People, places
– DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides
– UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang
– Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk