
Nothing in (computational) biology makes sense except in the light of evolution



after Theodosius Dobzhansky (1970)

Comparative genomics, genome context and genome annotation


Genome context analysis and genome annotation

Using information other than homologous relationships between individual genes/proteins for functional prediction (guilt by association)

Types of context analysis:

• phyletic patterns
• domain fusion (“Rosetta Stone” proteins)
• gene order conservation
• co-expression
• ….


Goals:
• Using gene sets from complete genomes, delineate families of orthologs and paralogs: Clusters of Orthologous Groups (of genes) (COGs)
• Using COGs, develop an engine for functional annotation of new genomes
• Apply COGs for analysis of phylogenetic patterns


COG:

- group of homologous proteins such that all proteins from different species are orthologs (all proteins from the same species in a COG are paralogs)
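The definition above fixes how every pair inside a COG is to be read. A minimal sketch of that reading, assuming a COG is stored as a mapping from protein ID to species (the function name and the protein IDs are hypothetical):

```python
from itertools import combinations

def classify_cog_pairs(cog):
    """Label every pair of COG members as orthologs or paralogs.

    `cog` maps a protein ID to the species it comes from. Following the
    COG definition, members from different species are orthologs and
    members from the same species are paralogs.
    """
    labels = {}
    # Sorting by protein ID makes each pair key deterministic.
    for (p1, s1), (p2, s2) in combinations(sorted(cog.items()), 2):
        labels[(p1, p2)] = "paralogs" if s1 == s2 else "orthologs"
    return labels

# Hypothetical COG: two E. coli paralogs and one B. subtilis member.
example = {"ecoA": "E. coli", "ecoB": "E. coli", "bsuA": "B. subtilis"}
pairs = classify_cog_pairs(example)
print(pairs)
```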


CONSTRUCTION OF COGs FOR 8 COMPLETE GENOMES

Complete set of proteins from the analyzed genomes

1. FULL SELF-COMPARISON (BLASTPGP, no cut-off)
2. Collapse obvious paralogs
3. Detect all interspecies Best Hits (BeTs) between individual proteins or groups of paralogs
4. Detect all triangles of consistent BeTs
5. Merge triangles with common edges
6. Detect groups with multidomain proteins and isolate domains; REPEAT STEPS 3-5

Result: COGs
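Steps 4 and 5 can be sketched as a small graph procedure. This is a minimal sketch under simplifying assumptions: BeTs are supplied as undirected protein pairs (the actual procedure works with best hits between proteins or collapsed paralog groups), a triangle must link proteins from three different genomes, and triangles sharing an edge are merged with a union-find structure. The function name and the toy data are hypothetical.

```python
from itertools import combinations

def build_cogs(bets, species):
    """Find triangles of BeTs and merge triangles that share an edge.

    bets: iterable of (protein, protein) pairs, treated as undirected links.
    species: dict protein -> genome; a triangle must span three genomes.
    Returns a list of protein sets (candidate COGs).
    """
    adj = {}
    for a, b in bets:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Step 4: every closed triangle of BeTs linking three different genomes.
    triangles = set()
    for a in adj:
        for b, c in combinations(sorted(adj[a]), 2):
            if c in adj[b] and len({species[a], species[b], species[c]}) == 3:
                triangles.add(tuple(sorted((a, b, c))))
    triangles = sorted(triangles)

    # Step 5: union-find over triangles; a shared edge joins two triangles.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edge_owner = {}
    for i, tri in enumerate(triangles):
        for edge in combinations(tri, 2):  # tri is sorted, so edges are too
            if edge in edge_owner:
                parent[find(i)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = i

    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return list(groups.values())

# Toy data: genomes E, B, M, H; e2-b2 is a lone BeT that forms no triangle.
species = {"e1": "E", "b1": "B", "m1": "M", "h1": "H", "e2": "E", "b2": "B"}
bets = [("e1", "b1"), ("b1", "m1"), ("e1", "m1"),
        ("e1", "h1"), ("b1", "h1"), ("e2", "b2")]
cogs = build_cogs(bets, species)
print(cogs)
```

The two triangles (e1, b1, m1) and (e1, b1, h1) share the edge e1-b1 and collapse into one four-member group, which mirrors the "minimal COG" and "merged triangles" slides that follow.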


A TRIANGLE OF BeTs IS A MINIMAL, ELEMENTARY COG


A RELATIVELY SIMPLE COG PRODUCED BY MERGING ADJACENT TRIANGLES


A COMPLEX COG WITH MULTIPLE PARALOGS


Current status of the COGs

Prokaryotes: 11 Archaea + 1 unicellular eukaryote + 46 bacteria = 58 complete genomes; 149,321 proteins, of which 105,861 (71%) are in 4,075 COGs

Eukaryotes: 4 animals + 1 plant + 2 fungi + 1 microsporidium = 8 complete genomes; 142,498 proteins, of which 74,093 (52%) are in 4,822 COGs
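The coverage percentages on this slide follow directly from the protein counts. A small check (function names are illustrative) recomputes them and also derives the average COG size, which the slide does not state but which follows from the same numbers:

```python
def cog_coverage(in_cogs, total):
    """Percentage of all proteins that fall into COGs, to the nearest percent."""
    return round(100 * in_cogs / total)

def mean_cog_size(in_cogs, n_cogs):
    """Average number of proteins per COG."""
    return in_cogs / n_cogs

prok = cog_coverage(105_861, 149_321)  # 58 genomes, including 1 unicellular eukaryote
euk = cog_coverage(74_093, 142_498)    # 8 eukaryotic genomes
print(prok, euk)
print(round(mean_cog_size(105_861, 4_075), 1), round(mean_cog_size(74_093, 4_822), 1))
```

This reproduces the 71% and 52% on the slide and gives roughly 26 versus 15 proteins per COG.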


COGnitor...


…IN ACTION
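The slides show COGnitor only by name. As a rough illustration of how a new protein could be placed into an existing COG, the sketch below uses a simple voting rule: each genome contributes the query's best hit, and the query joins the COG that collects best hits from at least `min_votes` genomes. The voting rule, the threshold of 2, and the names `assign_to_cog`, `best_hits` and `protein_to_cog` are assumptions for illustration, not COGnitor's documented procedure.

```python
from collections import Counter

def assign_to_cog(best_hits, protein_to_cog, min_votes=2):
    """Assign a query protein to a COG by voting on its best hits.

    best_hits: dict genome -> ID of the query's best-scoring hit in that genome
    protein_to_cog: dict protein ID -> COG ID for already classified proteins
    Returns the COG supported by best hits from at least `min_votes`
    genomes, or None if nothing qualifies or the top two COGs tie.
    """
    votes = Counter(protein_to_cog[hit] for hit in best_hits.values()
                    if hit in protein_to_cog)
    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] < min_votes:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Hypothetical best hits of one new protein in three genomes.
protein_to_cog = {"ecoA": "COG_A", "bsuA": "COG_A", "mjaX": "COG_B"}
query_hits = {"E. coli": "ecoA", "B. subtilis": "bsuA", "M. jannaschii": "mjaX"}
print(assign_to_cog(query_hits, protein_to_cog))  # COG_A
```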


The Universal COGs
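If "universal" is read as "present in every genome of the set" (our reading; the slide gives only the title), universal COGs can be picked out by encoding each phyletic pattern as a bitmask, one bit per genome. The function name and the patterns below are hypothetical.

```python
def universal_cogs(patterns, n_genomes):
    """Return the COGs whose phyletic pattern covers every genome.

    patterns: dict COG ID -> bitmask; bit i is set when genome i has
    at least one member of the COG.
    """
    full = (1 << n_genomes) - 1
    return sorted(cog for cog, mask in patterns.items() if (mask & full) == full)

# Hypothetical phyletic patterns over four genomes.
patterns = {"COG_A": 0b1111, "COG_B": 0b1011, "COG_C": 0b1111}
print(universal_cogs(patterns, 4))  # ['COG_A', 'COG_C']
```

Bitmasks turn each pattern test into one integer comparison, which is convenient when scanning thousands of COGs.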


Search for genomic determinants of hyperthermophily
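A search of this kind is a phyletic-pattern query: COGs present in the hyperthermophiles and absent from the mesophiles. The sketch below assumes presence/absence sets per COG and adds two tolerance parameters for a relaxed version of the query; the function name, parameters, genome labels and patterns are all hypothetical.

```python
def pattern_search(presence, include, exclude, min_frac_in=1.0, max_frac_out=0.0):
    """Find COGs present in the `include` genomes and absent from `exclude`.

    presence: dict COG ID -> set of genomes with at least one member.
    min_frac_in: fraction of `include` genomes that must have the COG.
    max_frac_out: largest tolerated fraction of `exclude` genomes with it.
    The defaults give the strict pattern (all in, none out).
    """
    hits = []
    for cog, genomes in presence.items():
        frac_in = len(genomes & include) / len(include)
        frac_out = len(genomes & exclude) / len(exclude)
        if frac_in >= min_frac_in and frac_out <= max_frac_out:
            hits.append(cog)
    return sorted(hits)

hyper = {"aero", "mjan", "pyro", "tmar", "aqua"}
meso = {"ecoli", "bsub", "hinf"}
presence = {  # hypothetical phyletic patterns
    "COG_X": {"aero", "mjan", "pyro", "tmar", "aqua"},
    "COG_Y": {"aero", "mjan", "pyro", "tmar", "aqua", "ecoli"},
    "COG_Z": {"aero", "mjan", "pyro", "tmar"},
}
print(pattern_search(presence, hyper, meso))                   # ['COG_X']
print(pattern_search(presence, hyper, meso, min_frac_in=0.8))  # ['COG_X', 'COG_Z']
```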


Search for unique archaeo-eukaryotic genes


A complementary pattern: search for unique bacterial genes
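Both searches can be phrased at the level of lineages rather than individual genomes: collapse each COG's phyletic pattern to the set of lineages it spans, then ask for exactly {Archaea, Eukaryota} or exactly {Bacteria}. A minimal sketch; the genome labels, lineage assignments used as input, COG patterns and function names are hypothetical.

```python
def lineage_profile(genomes, lineage):
    """Collapse a phyletic pattern to the set of lineages it spans."""
    return frozenset(lineage[g] for g in genomes)

def unique_to(presence, lineage, lineages):
    """COGs with members in each of `lineages` and in no other lineage."""
    target = frozenset(lineages)
    return sorted(cog for cog, genomes in presence.items()
                  if lineage_profile(genomes, lineage) == target)

lineage = {"mjan": "Archaea", "aful": "Archaea", "yeast": "Eukaryota",
           "ecoli": "Bacteria", "bsub": "Bacteria"}
presence = {  # hypothetical phyletic patterns
    "COG_P": {"mjan", "aful", "yeast"},
    "COG_Q": {"ecoli", "bsub"},
    "COG_R": {"mjan", "ecoli", "yeast"},
}
print(unique_to(presence, lineage, {"Archaea", "Eukaryota"}))  # ['COG_P']
print(unique_to(presence, lineage, {"Bacteria"}))              # ['COG_Q']
```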


Essential function… but holes in the phyletic pattern

Strict complementary pattern


Relaxed complementary pattern


Relaxed complementary pattern with extra restrictions
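The reasoning behind these slides is that an essential function missing from some genomes is presumably carried there by an unrelated COG, so two COGs whose phyletic patterns complement each other are candidates for the same function. A sketch of the pair search; `max_overlap`, `max_gaps` and `min_size` are hypothetical stand-ins for the "relaxed" criteria and the "extra restrictions", whose exact definitions the slides do not give.

```python
def complementary_pairs(presence, genomes, max_overlap=0, max_gaps=0, min_size=1):
    """Find pairs of COGs whose phyletic patterns complement each other.

    Strict pattern (defaults): every genome has exactly one of the two COGs.
    Relaxed pattern: up to `max_overlap` genomes may have both and up to
    `max_gaps` genomes may have neither. `min_size` is an extra restriction:
    each COG must be present in at least that many genomes.
    """
    everyone = set(genomes)
    cogs = sorted(c for c, g in presence.items() if len(g) >= min_size)
    pairs = []
    for i, a in enumerate(cogs):
        for b in cogs[i + 1:]:
            both = presence[a] & presence[b]
            neither = everyone - (presence[a] | presence[b])
            if len(both) <= max_overlap and len(neither) <= max_gaps:
                pairs.append((a, b))
    return pairs

genomes = ["g1", "g2", "g3", "g4", "g5"]
presence = {  # hypothetical phyletic patterns
    "COG_A": {"g1", "g2", "g3"},
    "COG_B": {"g4", "g5"},
    "COG_C": {"g4"},
}
print(complementary_pairs(presence, genomes))              # strict
print(complementary_pairs(presence, genomes, max_gaps=1))  # relaxed
print(complementary_pairs(presence, genomes, max_gaps=1, min_size=2))
```

Relaxing the pattern admits the pair COG_A/COG_C, which leaves one genome uncovered; the extra size restriction then removes it again.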


Conservation of gene order in bacterial species of the same genus

[Figure: gene-order dot plot, M. genitalium vs M. pneumoniae; axes: gene positions along each genome, numbered in steps of 100]


Conservation of gene order in closely related bacterial genera

[Figure: gene-order dot plot, C. trachomatis vs C. pneumoniae; axes: gene positions along each genome, numbered in steps of 100]


Lack of gene order conservation - even in “closely related” bacteria of the same Proteobacterial subdivision

[Figure: gene-order dot plot, P. aeruginosa (paer) vs E. coli (ecoli); axes: gene positions along each genome, numbered in steps of 100; legend classes: <0.3, 0.3-0.8, 0.8-1.3, >1.3]
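In dot plots like these, conserved gene order appears as dots lying along diagonals. One crude way to put a number on it is to count ortholog pairs that are neighbors in both genomes. A sketch assuming a precomputed one-to-one ortholog mapping by gene position; the function name and the dots are hypothetical.

```python
def conserved_adjacencies(orthologs):
    """Count adjacent gene pairs in genome A whose orthologs are also adjacent in B.

    orthologs: dict position in genome A -> position of its ortholog in genome B
    (1-based gene indices, one-to-one), i.e. the dots of a dot plot.
    Adjacency in B may be in either orientation; chromosome circularity
    is ignored for simplicity.
    """
    count = 0
    for pos_a, pos_b in orthologs.items():
        nxt = orthologs.get(pos_a + 1)
        if nxt is not None and abs(nxt - pos_b) == 1:
            count += 1
    return count

# Hypothetical dots: genes 1-3 keep their order, genes 5 and 6 do not.
dots = {1: 10, 2: 11, 3: 12, 5: 40, 6: 2}
print(conserved_adjacencies(dots))  # 2
```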


Genome Alignments - Method

Protein sets from complete genomes

BLAST cross-comparison

Pairwise Genome Alignment: local alignment algorithm Lamarck (gap opening penalty, gap extension penalty); statistics with Monte Carlo simulations

Table of Hits

Template-Anchored Genome Alignment
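The slides name the Lamarck program but not its scoring details. As a stand-in, the sketch below runs a generic Smith-Waterman local alignment with affine gap penalties (Gotoh's formulation) over two gene orders, and estimates significance by Monte Carlo shuffling of one genome's gene order. This is not Lamarck; the scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`), the use of shared family labels as a proxy for the table of hits, and all names are assumptions.

```python
import random

def local_gene_alignment(a, b, match=2.0, mismatch=-1.0, gap_open=-2.0, gap_extend=-0.5):
    """Best local alignment score of two gene orders with affine gaps (Gotoh).

    a, b: lists of gene-family labels in chromosomal order; equal labels
    stand in for entries in the BLAST table of hits. A gap of length k
    scores gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best local score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # same, ending with a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # same, ending with a gap in b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

def monte_carlo_pvalue(a, b, trials=200, seed=0, **scoring):
    """Empirical significance of the observed score against shuffled gene orders.

    Gene order in b is shuffled `trials` times; the p-value is the fraction of
    shuffles scoring at least as high as the real alignment (with +1 correction).
    """
    rng = random.Random(seed)
    observed = local_gene_alignment(a, b, **scoring)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_gene_alignment(a, shuffled, **scoring) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)

# Hypothetical gene orders: six families (A-F) keep their order in both genomes.
genome_a = list("ABCDEFGHIJ")
genome_b = ["X", "Y"] + list("ABCDEF") + ["Z"]
score, p = monte_carlo_pvalue(genome_a, genome_b)
print(score, p)
```

Shuffling gene order keeps each genome's gene content fixed, so the null model asks only whether the observed collinearity could arise by chance.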


Genome Alignments - Statistics

[Figure: Distribution of conserved gene string lengths; x-axis: string length (2 to 20 genes, and >20); y-axis from 0.0 to 0.5; pairwise alignments shown: cpneu-ctra, mjan-mthe, bsub-ecoli, drad-aero]
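A length distribution of this kind can be computed once ortholog pairs are mapped to gene positions. The sketch assumes a definition of a conserved gene string (a run of at least two adjacent genes whose orthologs are also adjacent, in a consistent orientation) and pools lengths above 20, as on the plot's x-axis; the definition and all names are our assumptions.

```python
from collections import Counter

def gene_strings(pairs):
    """Split ortholog pairs into conserved gene strings.

    pairs: (position in genome A, position in genome B) for each ortholog pair.
    A string is taken here to be a maximal run of consecutive genes in A whose
    partners are consecutive in B, all in the same direction (+1 or -1).
    Runs of a single gene are dropped.
    """
    if not pairs:
        return []
    pairs = sorted(pairs)
    strings, current, step = [], [pairs[0]], None
    for prev, cur in zip(pairs, pairs[1:]):
        d = cur[1] - prev[1]
        if cur[0] == prev[0] + 1 and abs(d) == 1 and step in (None, d):
            current.append(cur)
            step = d
        else:
            strings.append(current)
            current, step = [cur], None
    strings.append(current)
    return [s for s in strings if len(s) >= 2]

def length_distribution(strings, cap=20):
    """Fraction of strings of each length; lengths above `cap` are pooled."""
    counts = Counter(len(s) if len(s) <= cap else f">{cap}" for s in strings)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

pairs = [(1, 10), (2, 11), (3, 12), (5, 40), (6, 39), (8, 70)]
strings = gene_strings(pairs)
print([len(s) for s in strings])     # [3, 2]
print(length_distribution(strings))  # {3: 0.5, 2: 0.5}
```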


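Genome Alignments - Statistics

| Homolog set | Pairwise alignment | No. strings | No. genes | % in Gen1 | % in Gen2 |
|---|---|---|---|---|---|
| All homologs | ecoli-hinf | 138 | 566 | 13% | 33% |
| All homologs | ecoli-bsub | 89 | 322 | 8% | 8% |
| All homologs | ecoli-mjan | 10 | 30 | 1% | 2% |
| Probable orthologs | ecoli-hinf | 105 | 482 | 11% | 28% |
| Probable orthologs | ecoli-bsub | 34 | 168 | 4% | 4% |
| Probable orthologs | ecoli-mjan | 12 | 33 | 1% | 2% |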


Genome Alignments - Statistics

[Figure: Breakdown of genes in the genome; one stacked bar per genome, labeled with abbreviated genome names; y-axis: number of genes, 0 to 5000; categories: in conserved gene strings, in non-conserved gene strings (directons), not in gene strings]


Genome Alignments - Statistics

Fraction of the genome in conserved gene strings (from template-anchored alignments):

Minimum: Synechocystis sp. 5%

Aquifex aeolicus 10%; Archaeoglobus fulgidus 13%; Escherichia coli 14%; Treponema pallidum 17%

Maximum: Thermotoga maritima 23%; Mycoplasma genitalium 24%


Context-Based Prediction of Protein Functions

A Novel Translation Factor (COG0536)

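[Figure: gene neighborhood of COG0536: ribosomal protein genes L21 and L27 next to an uncharacterized GTPase (?), predicted to be a GTP-binding translation factor]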


Context-Based Prediction of Protein Functions

A Novel Translation Factor (COG0012)

[Figure: gene neighborhood of COG0012: a TGS domain-containing GTPase (?) next to peptidyl-tRNA hydrolase, predicted to be a GTP-binding translation factor]
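Both predictions rest on guilt by association: a gene that repeatedly sits next to translation-related genes is likely to work in translation. A minimal sketch of the counting step, assuming genomes are given as ordered lists of gene-family labels; the window size, function name, the query label "QF" and the toy gene orders are hypothetical.

```python
from collections import Counter

def neighborhood_votes(genomes, query, window=2):
    """Count, over all occurrences of `query`, the labels within `window` genes.

    genomes: dict genome name -> list of labels in chromosomal order.
    Each neighboring label is counted at most once per occurrence of the query.
    """
    votes = Counter()
    for order in genomes.values():
        for i, label in enumerate(order):
            if label == query:
                lo, hi = max(0, i - window), i + window + 1
                votes.update(set(order[lo:i] + order[i + 1:hi]))
    return votes

genomes = {  # hypothetical gene orders around a query family "QF"
    "g1": ["L21", "L27", "QF", "geneX"],
    "g2": ["geneY", "L21", "L27", "QF"],
    "g3": ["QF", "L27", "L21", "geneZ"],
}
votes = neighborhood_votes(genomes, "QF")
print(votes["L21"], votes["L27"], votes["geneX"])  # 3 3 1
```

Neighbors seen in every genome (here L21 and L27) outweigh one-off neighbors, which is the signal a context-based prediction would build on.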
