Turning genomics data into Biology

Page 1: Turning genomics data into Biology

Turning genomics data into Biology

Martijn Huynen

Nijmegen Center for Molecular Life Sciences, Centre for Molecular and Biomolecular Informatics

Page 2: Turning genomics data into Biology

Comparative genomics

Prediction of protein function, pathways

The (somewhat) intelligent comparative genomics meat grinder

Evolution of biosystems

Method development

Page 3: Turning genomics data into Biology

A phosphomannomutase (pmm) is predicted to have acquired a phosphoribomutase (deoB) function

[Figure: gene-order diagram with deoD, deoC, deoA, cdd and pmm for M. genitalium and M. tuberculosis, shown next to the deoxyribonucleoside degradation pathway. Metabolite labels: deoxycytidine; deoxyuridine, deoxythymidine; purine deoxyribonucleosides; deoxyribose-1-P; deoxyribose-5-P; glyceraldehyde-3-P, acetaldehyde. Enzyme labels: Cdd, DeoA, DeoD, DeoC, DeoB (marked "deoB ?").]

Page 4: Turning genomics data into Biology

Predicting functional relations between genes using (conserved) genomic context

Genomic context types:

  • Conserved neighborhood (Dandekar et al., 1998; Overbeek et al., 1999)

  • Gene fusion (Marcotte et al., 1999; Enright et al., 1999)

  • Co-occurrence (Huynen and Bork, 1998; Pellegrini et al., 1999)

http://string.embl.de

Snel et al., NAR 1999; von Mering et al., NAR 2002; von Mering et al., NAR 2005

Page 5: Turning genomics data into Biology

Gene fission in the evolution of carbamoyl phosphate synthase B (carB)

[Figure: phylogenetic tree of CarB / PyrAB sequences with support values between 83 and 100 on the branches. Leaf labels: MJ1378 & MJ1381, MTH997 & MTH996, EC0033, HP0919, AQ2101 & AQ1172, AF1274, sll0370, Rv1384, YJL130C, D2085.1, YJR109C.]

Page 6: Turning genomics data into Biology

Predicting functional interactions between proteins by the co-occurrence of their genes in genomes

Distribution of four M. genitalium genes among 25 genomes

Gene           Presence (1) / absence (0) in each of the 25 genomes
MG299 (pta)    0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG357 (ackA)   0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG019 (dnaJ)   0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
MG305 (dnaK)   0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1

Using the mutual information between genes as a scoring heuristic for their co-occurrence.

M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19
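The three scores can be reproduced from the presence/absence profiles on this slide. The following is a minimal sketch, assuming the plain plug-in estimate of mutual information with natural logarithms; the slide does not state the estimator or the logarithm base, and the function name `mutual_information` is only illustrative.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two presence/absence profiles."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed (a, b) combinations of p(a,b) * ln(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )

# Profiles copied from the slide: one entry per genome, 25 genomes in total.
profiles = {
    "pta":  "0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1".split(),
    "ackA": "0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1".split(),
    "dnaJ": "0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1".split(),
    "dnaK": "0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1".split(),
}

for a, b in [("pta", "ackA"), ("dnaJ", "dnaK"), ("dnaJ", "ackA")]:
    print(f"M({a}, {b}) = {mutual_information(profiles[a], profiles[b]):.2f}")
```

Under these assumptions the sketch gives 0.69, 0.55 and 0.19, matching the slide. Note that two identical profiles score their own entropy, so pta/ackA (present in 12 of 25 genomes) score higher than dnaJ/dnaK (present in 19 of 25), even though both pairs co-occur perfectly.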

Page 7: Turning genomics data into Biology

Evolutionary conservation of genomic context increases the likelihood of functional interaction

[Figure: fraction of gene pairs in the same pathway (KEGG), y-axis from 0 to 1, against evolutionary conservation score, x-axis from 0 to 1. Series: fusion, gene order, co-occurrence.]

Page 8: Turning genomics data into Biology

Correlation between the strength of the genomic and functional associations

[Figure: number of COGs (logarithmic axis, 1 to 10000) and average metabolic distance (axis 0 to 6) against the number of co-occurrences in operons (x-axis, 0 to 30).]

Page 9: Turning genomics data into Biology

Genomic associations correlate with a wide array of functional interactions

[Figure: three pie charts (gene order conservation, gene fusion, co-occurrence in genomes) dividing the genomic associations over the interaction types physical interaction, complex, metabolic pathway, non-metabolic pathway, process, hypothetical and unknown interaction. Percentage labels in extraction order, without their category assignment: 30, 33, 7, 6, 10, 10, 4 (before the gene order label); 7, 15, 22, 56 (after the gene fusion label); 23, 4, 25, 14, 23, 11 (with the co-occurrence label).]

Huynen et al., Genome Research 2000

Page 10: Turning genomics data into Biology

Combining homology information with genomic association for function prediction

Repeated occurrence of MG009, a phosphohydrolase, with thymidylate kinase (tmk) suggests a role of MG009 in pyrimidine metabolism.

Page 11: Turning genomics data into Biology

Conservation of gene order of the hypothetical gene MG134 with dnaX and recR suggests physical interaction between their gene products

Page 12: Turning genomics data into Biology

Phylogenomics for protein function prediction

An ancient paralog of N7BM has been lost in the same lineages as N7BM itself, implicating a possible role in Complex I

Gabaldon et al. (2005) J. Mol. Biol.

Page 13: Turning genomics data into Biology

Experimental confirmation of a role of the N7BM paralog in Complex I

J. Clin. Invest. (2005)

Page 14: Turning genomics data into Biology

Verified function predictions: making predictions is easy, testing them is another matter.

Protein   Context              Type of interaction   Function                                    Ref
Mt-Ku     gene order           physical interaction  double-stranded DNA repair                  [56]
GlnK      gene order           physical interaction  signal transduction for ammonium transport  [57,58]
PH0272    gene order           metabolic pathway     methylmalonyl-CoA racemase                  [59]
PrpD      gene order           metabolic pathway     2-methylcitrate dehydratase                 [22,60]
arok      gene order           metabolic pathway     shikimate kinase                            [61]
ComB      gene order           metabolic pathway     2-phosphosulfolactate phosphatase           [62]
KynB      gene order           metabolic pathway     kynurenine formamidase                      [63]
PvlArgDC  gene order           metabolic pathway     arginine decarboxylase                      [64]
FabK      gene order           metabolic pathway     enoyl-ACP reductase                         [65]
FabM      gene order           metabolic pathway     trans-2-decenoyl-ACP isomerase              [66]
COG0042   gene order           tRNA modification     tRNA-dihydrouridine synthase                [67]
Yfh1      co-occurrence        process               iron-sulfur cluster assembly                [68,69]
YchB      co-occurrence        metabolic pathway     terpenoid synthesis                         [70]
SmpB      co-occurrence        process               trans-translation                           [5,71]
ThyX      complementary        enzymatic activity    thymidylate synthase                        [14,72]
ThiN      complementary        enzymatic activity    thiamine phosphate synthase                 [73,74]
ThiE      complementary        enzymatic activity    thiamine phosphate synthase                 [74]
Prx       fusion               pathway               peroxiredoxin                               [75]
YgbB      fusion/gene order    metabolic pathway     terpenoid synthesis                         [76]
SelR      fusion/order/co-o.   enzymatic activity    methionine sulfoxide reductase              [14,22,77]
FadE      reg. sequence        metabolic pathway     acyl CoA dehydrogenase                      [78,79]
TogMNAB   reg. sequence        metabolic pathway     oligogalacturonide transport                [80,81]
MetD      reg. sequence        metabolic pathway     methionine transport                        [82]

Huynen et al., Curr Op. Cell Biol. 2003

Page 15: Turning genomics data into Biology

Experimentally confirmed protein functions, predicted with various types of context

[Figure: pie chart of confirmed predictions per context type: gene order 13, co-occurrence 4, complementary distribution 4, gene fusion 3, regulatory element 3.]

Page 16: Turning genomics data into Biology

Predicting gene function by conservation of co-expression

Page 17: Turning genomics data into Biology

Evolutionary conservation of co-expression increases the likelihood of functional interaction

Page 18: Turning genomics data into Biology

Low but significant levels of conservation of co-expression (see Teichmann et al., TIBS 2002; Stuart et al., Science 2003)

         Total # of pairs   # of pairs > 0.6   Observed fraction > 0.6   Expected fraction > 0.6   Observed/Expected

Gene pairs with an orthologous gene pair > 0.6
Worm     18161              803                0.0442*                   0.00379                   12
Yeast    36548              1215               0.0332*                   0.00216                   15

Gene pairs with a paralogous gene pair > 0.6
Worm     207214             29031              0.1401*                   0.00379                   37
Yeast    38253              2167               0.0566*                   0.00216                   26

van Noort et al, TIG, 2003
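The observed fractions and the observed/expected ratios in the table follow directly from the pair counts. A short sketch of that arithmetic, using only the numbers on the slide; the function name `enrichment` and the row layout are illustrative choices, and the expected fractions are taken as given because the slide does not show how they were derived.

```python
# (species, relation, total pairs, pairs with r > 0.6, expected fraction), from the slide
rows = [
    ("Worm", "orthologous", 18161, 803, 0.00379),
    ("Yeast", "orthologous", 36548, 1215, 0.00216),
    ("Worm", "paralogous", 207214, 29031, 0.00379),
    ("Yeast", "paralogous", 38253, 2167, 0.00216),
]

def enrichment(total, above_threshold, expected_fraction):
    """Return the observed fraction and its ratio to the expected fraction."""
    observed = above_threshold / total
    return observed, observed / expected_fraction

for species, relation, total, above, expected in rows:
    observed, ratio = enrichment(total, above, expected)
    print(f"{species:5s} {relation:11s} observed={observed:.4f} ratio={ratio:.0f}")
```

Rounded as in the table, this gives observed fractions of 0.0442, 0.0332, 0.1401 and 0.0566, and ratios of 12, 15, 37 and 26.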

Page 19: Turning genomics data into Biology

Conservation of protein-protein interaction measured by yeast-2-hybrid increases the likelihood of interaction

Comparison of Giot (Fly) and Ito (Yeast), Uetz (Yeast) y-2-h interactions

Page 20: Turning genomics data into Biology

A “new”, conserved interaction:

GTPase XAB1/CG3704 hypothetical, GTPase YOR262/CG10222

XAB1 interacts with the DNA repair protein XPA1 and is inferred to be required for XPA1’s import into the nucleus.

The fraction of hypothetical proteins in conserved Y2H interactions is relatively low

Hypotheticals:
In conserved interactions   13      5%
In complete genome          ~1600   27%

Page 21: Turning genomics data into Biology

 

Conservation of protein-protein interaction between species

Dataset comparison            Protein interactions, both      Conserved      Fraction conserved   Average fraction
                              proteins in the other dataset   interactions   interactions         conserved interactions
Ito / Uetz (Yeast vs. Yeast)  858 / 697                       201            23.4% / 28.8%        26.1%
Ito / Giot (Yeast vs. Fly)    229 / 394                       45             19.6% / 11.4%        15.5%
Uetz / Giot (Yeast vs. Fly)   120 / 168                       33             27.5% / 19.6%        23.5%

Physical interaction is reasonably well conserved between species (compared to the “conservation” within a species)

Huynen et al., TIG, 2004
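The fractions in the table are the number of conserved interactions divided by the number of comparable interactions in each dataset, averaged over the two datasets. A minimal sketch of that calculation with the counts from the slide; the function name `conserved_fractions` is illustrative, and the recomputed values agree with the slide only to within about 0.1 percentage point, apparently because the slide truncates rather than rounds.

```python
def conserved_fractions(n_first, n_second, n_conserved):
    """Fraction of conserved interactions in each dataset, and their average."""
    frac_first = n_conserved / n_first
    frac_second = n_conserved / n_second
    return frac_first, frac_second, (frac_first + frac_second) / 2

# (comparison, comparable interactions in first / second dataset, conserved), from the slide
comparisons = [
    ("Ito / Uetz (Yeast vs. Yeast)", 858, 697, 201),
    ("Ito / Giot (Yeast vs. Fly)", 229, 394, 45),
    ("Uetz / Giot (Yeast vs. Fly)", 120, 168, 33),
]

for name, n_first, n_second, n_conserved in comparisons:
    f1, f2, avg = conserved_fractions(n_first, n_second, n_conserved)
    print(f"{name}: {100 * f1:.1f}% / {100 * f2:.1f}%, average {100 * avg:.1f}%")
```

The yeast versus yeast comparison is the useful baseline here: two screens of the same species recover only about a quarter of each other's interactions, so the 15 to 24% seen between yeast and fly is close to the ceiling set by the data quality.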

Page 22: Turning genomics data into Biology

Is the low level of conservation of co-expression between S. cerevisiae and C. elegans (< 5%) “real”, reflecting evolution and species-specific interactions, or are we just comparing noisy datasets?

Species-specific (idiosyncratic) coregulation:

“Efficient expression of the Saccharomyces cerevisiae glycolytic gene ADH1 is dependent upon a cis-acting regulatory element UASRPG found initially in genes encoding ribosomal proteins.” Tornow and Santangelo, Gene, 1990

Page 23: Turning genomics data into Biology

Noisy genomics data

Low (but significant) correlation between ChIP-on-chip data (sharing transcription factor binding sites) and expression data in S. cerevisiae

Page 24: Turning genomics data into Biology

Filtering out the noise by combining ChIP-on-chip and co-expression in yeast

Correlation of co-regulation with functional interactions

Data set of gene pairs         Percent same pathway   Number of gene pairs
r > 0.5                        43                     169,768
r > 0.6                        52                     65,430
r > 0.7                        51                     22,459
Sharing 1 TFBS                 50                     356,947
Sharing 2 TFBS                 77                     39,818
Sharing 1 TFBS and r > 0.3     86                     19,386
Sharing 1 TFBS and r > 0.4     88                     11,434
Sharing 1 TFBS and r > 0.5     90                     6,687
Sharing 1 TFBS and r > 0.6     90                     3,382
Sharing 1 TFBS and r > 0.7     86                     1,156
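Each row of this table is a filter on gene pairs followed by a count of how many surviving pairs share a pathway. The sketch below shows that filtering step on a handful of invented records; the `GenePair` fields, the function name, the reading of "sharing 1 TFBS" as "at least one", and all toy values are assumptions for illustration and are not the data behind the table.

```python
from dataclasses import dataclass

@dataclass
class GenePair:
    r: float            # co-expression correlation of the two genes
    shared_tfbs: int    # number of shared transcription factor binding sites
    same_pathway: bool  # both genes annotated to the same pathway

def percent_same_pathway(pairs, min_r=None, min_tfbs=0):
    """Return (percent same pathway, number of pairs) for pairs passing both filters."""
    kept = [
        p for p in pairs
        if (min_r is None or p.r > min_r) and p.shared_tfbs >= min_tfbs
    ]
    if not kept:
        return None, 0
    return 100.0 * sum(p.same_pathway for p in kept) / len(kept), len(kept)

# Invented toy records, for illustration only.
toy_pairs = [
    GenePair(0.72, 1, True),
    GenePair(0.65, 0, False),
    GenePair(0.55, 1, True),
    GenePair(0.58, 0, True),
    GenePair(0.35, 1, False),
    GenePair(0.20, 2, True),
    GenePair(0.61, 0, False),
    GenePair(0.10, 0, False),
]

print(percent_same_pathway(toy_pairs, min_r=0.5))              # co-expression only
print(percent_same_pathway(toy_pairs, min_tfbs=1))             # shared TFBS only
print(percent_same_pathway(toy_pairs, min_r=0.5, min_tfbs=1))  # both filters combined
```

As in the table, requiring both kinds of evidence trades coverage for precision: fewer pairs pass, but a larger share of them fall in the same pathway.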

Page 25: Turning genomics data into Biology

High level of conservation of co-regulation after speciation

[Figure: frequency distribution (y-axis, 0 to 0.45) of the co-expression correlation r (x-axis, -0.5 to 0.9) for two sets: worm orthologous gene pairs of yeast gene pairs with r > 0.6 and sharing TFBS, and all worm gene pairs. Annotation: 76 %.]

Page 26: Turning genomics data into Biology

Comparing co-regulation in Bacteria indicates a level of conservation of 80% (operons in B. subtilis versus regulons in E. coli)

NB:
1) Based on operon conservation alone, the level is only 50%
2) Cases of gene loss are disregarded

Page 27: Turning genomics data into Biology

Noisy genomics data lead to drastic underestimations of conservation of interactions

Page 28: Turning genomics data into Biology

Conclusions co-regulation conservation

• Gene co-regulation tends to be conserved in eukaryotes (76%) and in prokaryotes (80%)

• In the case of gene duplication, one gene tends to maintain the co-regulatory link: there appears to be one functionally equivalent ortholog

Snel et al, Nucleic Acids Res 2004

Page 29: Turning genomics data into Biology

Exploiting genomics data to predict the function for a hypothetical protein: BolA

Page 30: Turning genomics data into Biology

An interaction of BolA with a mono-thiol glutaredoxin? (STRING)

[Figure: STRING view centered on BolA.]

Page 31: Turning genomics data into Biology

BolA and Grx occur as neighbors in a number of genomes

[Figure: gene neighborhoods with BolA and Grx highlighted.]

Page 32: Turning genomics data into Biology

BolA and Grx have an (almost) identical phylogenetic distribution

Page 33: Turning genomics data into Biology

BolA and Grx have been shown to interact in Y2H in S. cerevisiae and D. melanogaster, and in Flag tag in S. cerevisiae

BolA phylogeny

Page 34: Turning genomics data into Biology

BolA does have (predicted) interactions with cell-division / cell-wall proteins. Those appear secondary to the link with GrX

Genomic context analyses have obtained a higher resolution in function prediction than phenotypic analyses

[Figure labels: cell division / cell wall; (oxidative) stress]

Page 35: Turning genomics data into Biology

BolA is homologous to the peroxide reductase OsmC, suggesting a similar function

Page 36: Turning genomics data into Biology

 

BolA is, relative to other class II KH folds and sequences, most similar to OsmC

Protein family (PDB entry)             3D similarity to BolA     Sequence profile similarity to BolA
                                       (DALI, Z-score)           (COMPASS, SW-score (E-value))
OsmC (1ml8A/1lqlA), Ohr (1n2fA)        5.8 / 5.5, 5.2            73 (2.4 E-5)
KH 1 (1hnxC)                           5.3                       46 (9.4 E-3)
DUF150 (1ib8A)                         3.7                       44 (4.2 E-2)
GMP synthase C (1gpmA)                 2.9                       57 (7.0 E-4)
KH 2 (1egaB)                           3.8                       35 (2.7 E-1)
RBFA (1kkgA)                           4.2                       40 (9.6 E-2)

Page 37: Turning genomics data into Biology

OsmC uses thiol groups of two evolutionarily conserved cysteines to reduce substrates

Problem: The BolA family does not have conserved cysteines.

…It would have to obtain its reducing equivalents from elsewhere…

[Figure: BolA family alignment]

Page 38: Turning genomics data into Biology

BolA is (homologous to) a reductase

BolA interacts with GrX?

GrX provides BolA with reducing equivalents!?

Prediction of interaction partner and molecular function complement each other

Page 39: Turning genomics data into Biology

There is a wealth of functional and structural genomics data that can be related to the function of individual proteins. Exploiting that data is becoming a trade in itself (biochemistry by other means)