Towards the virtual organism PART I: Databases and tools for biochemical pathways PART II: Relating expression data and pathways PART III: Guided Tour: elucidate organelle- related pathways


Towards the virtual organism

• PART I: Databases and tools for biochemical pathways

• PART II: Relating expression data and pathways

• PART III: Guided Tour: elucidate organelle-related pathways

Pathway diagram: WIT database


Major contributions of Pathways databases

• Information Resource - Literature compilation

• Gene Ontology

• Sequence and Genome Annotation

• Relationship between pathways (function) and chromosomal position

• Analysis of Gene Expression Arrays

• Understanding Cellular Dynamics

• Disease Process Modeling

Without context and purpose, information is mere data. - Clement Mok

As when a highly connected node in the internet breaks down, the disruption of p53 has severe consequences.

Jeong et al. 2001 Nature


Towards the virtual organism

Introduce biochemical pathways resources

• What Is There (WIT/PUMA/EMP/ERGO)

• Kyoto Encyclopedia of Genes and Genomes (KEGG)

• Signalling Databases

• Pathways Database (PathDB)

Focus on

• Accessibility

• Database contents and models

• Query features

• Gene/Protein/Pathway analysis

• Visualization

Why do all these projects seem to do the same thing?

• Data model is a view of the world

– Different database management systems

– Tools particular to data model and database management systems

– Different content

• Analogous to model system approach to biology

– E. coli, yeast, C. elegans, Drosophila, mouse, etc. are all used to provide understanding of human biology

• No one system does everything, but concepts and data can often be shared

He may have stole that song from me, but I steal from everybody. - Woody Guthrie


WIT/PUMA/EMP System

• Argonne National Lab and Integrated Genomics Inc, USA

• http://wit.mcs.anl.gov/WIT2/

• Ross Overbeek, Evgeni Selkov, Natalia Maltsev

• Team: 7

• WIT is freely downloadable (ftp://ftp.mcs.anl.gov/pub/Genomics/WIT2/)

WIT/PUMA/EMP System

• Annotation/Literature database

• Blast, PSI-Blast

• ClustalW

• COG

• ProtScale

• Transmembrane helices/topology

• Prodom

• ProSite

• Operons (Pairs of close bidirectional best hits)

Focus on: sequence analysis, annotation of genomes with respect to metabolism
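The operon heuristic above rests on bidirectional best hits. The following is a minimal sketch of that step only, assuming similarity scores are already available as a dictionary; the gene names, scores, and function names are hypothetical, and the chromosomal proximity check ("close") is not covered.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict {(query, subject): score}. All names are hypothetical.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for two genomes A and B (made up for illustration).
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 198.0, ("b2", "a2"): 175.0}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here a2 also hits b2, but b2 prefers a3, so only (a1, b1) and (a3, b2) qualify.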

Ways to go: from genes to pathways

Starting from

• Gene/protein sequence

• Gene/protein name

• Organism/Genome (‘Metabolic reconstruction’)

To Pathways of

– Metabolism

– DNA

– Regulation of metabolism

From Blast results to genes

From genes to pathways

WIT Pathway diagrams: Tabular format

WIT Pathway Diagrams: Picture

Links to further information

WIT Detail pages: Enzyme

• Name, Reaction, EC, Description

• Specific Activity

• Preparative Protocol

• Substrates, Coenzymes, Inhibitors, Modification, Kinetics, Genomes ...

Numbers shown on the slide (labels not recoverable): 4788, 3304, 6502, 6306, 9500, 6914, 39

Kyoto Encyclopedia of Genes and Genomes (KEGG)

• Institute for Chemical Research, Kyoto University

• http://www.genome.ad.jp/kegg/

• Minoru Kanehisa

• System development: 9

• Data entry and curation: 18

• Academic users may freely download the package

• ftp://kegg.genome.ad.jp/mirror/


KEGG: Data content and statistics

• 3705 EC numbers

• 11132 Enzyme names

• 3794 Substrates

• 5284 Metabolic reactions

• 113 Pathways

– mostly metabolic

• 36 Organisms

KEGG: Query capabilities

• Reconstruct pathway maps using blast

• Search and color genes, enzymes and compounds in pathway diagrams and ortholog tables

• Sequence: blast and fasta

• Genome Maps

• Generate reaction paths between compounds

Focus on: display gene-centric data in the context of predefined pathways
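Generating a reaction path between two compounds can be pictured as a graph search. The sketch below is not KEGG's implementation; it assumes a hypothetical dictionary of directed reactions and uses a plain breadth-first search to return one shortest chain of reaction identifiers.

```python
from collections import deque


def reaction_path(reactions, start, goal):
    """Breadth-first search for a shortest chain of reactions from start to goal.

    reactions: dict {reaction_id: (set_of_substrates, set_of_products)}.
    Returns a list of reaction ids, or None if goal is unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == goal:
            return path
        for reaction_id, (substrates, products) in reactions.items():
            if compound in substrates:
                for product in products:
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, path + [reaction_id]))
    return None


# Toy reaction set; identifiers R1..R4 are hypothetical.
toy = {
    "R1": ({"glucose"}, {"glucose-6-phosphate"}),
    "R2": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "R3": ({"fructose-6-phosphate"}, {"fructose-1,6-bisphosphate"}),
    "R4": ({"glucose-6-phosphate"}, {"6-phosphogluconolactone"}),
}

path = reaction_path(toy, "glucose", "fructose-1,6-bisphosphate")
print(path)
```

Reactions are treated as one-directional here; a reversible reaction would need a second entry with substrates and products swapped.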

'State of the Art'

KEGG picture of the glycolysis

genes present in E. coli

static Network

manually compiled

manually drawn

textbook knowledge

Representation of Networks

Static network

• manually compiled

• manually drawn

• textbook knowledge

versus

Dynamic network

• features complete knowledge

• restriction of content is up to the user

• experimental data can be reflected in net structure

• include user-owned data

Pathway related projects

Metabolic Pathways

• KEGG Metabolic Pathways

• EMP - Enzymes and Metabolic Pathways

• WIT - Metabolic Reconstruction

• UM-BBD - Microbial Biocatalysis/Biodegradation

• EcoCyc - E. coli Genes and Metabolism

• SoyBase - Soybean Metabolism

• Metalgen - Genes and Metabolism

• Boehringer Mannheim - Biochemical Pathways

• IUBMB-Nicholson Minimaps

• PathDB - Plant Metabolic Pathways

Protein-Protein Interactions

• BRITE Database for Biomolecular Relations

• DIP - Database of Interacting Proteins

• BIND - Biomolecular Interaction Network Database

Regulatory Pathways

• KEGG Regulatory Pathways

• SPAD - Signal Transduction

• CSNDB - Cell Signaling Networks

• Yeast Pathways in MIPS

• Interactive Fly - Drosophila Genes

• GIF_DB - Drosophila Gene Interactions

• FlyNets - Drosophila Molecular Interactions

• GeNet - Gene Networks Database

• HOX-Pro - Homeobox Genes Database

• Wnt Signaling Pathway

• TRANSPATH - Gene Regulatory Pathways

• GenMapp - Mostly mouse pathways

Enzymes, Compounds

• LIGAND - Chemical Database for Enzyme Reactions

• ENZYME - Enzymes

• BRENDA - Comprehensive Enzyme Information System

• Worthington Enzyme Manual

• Klotho - Biochemical Compounds

• ChemFinder - Searching Chemicals

• ChemIDplus at NLM

• PROMISE - Prosthetic Groups and Metal Ions

• GlycoSuiteDB - Glycan Structure Database

• CarbBank - Complex Carbohydrate Structure Database

• WebElements - Periodic Table

Transcription Factors

• TRANSFAC - Transcription Factor Database

• RegulonDB - E. coli Transcriptional Regulation

• DBTBS - B. subtilis Transcription Factors

• DPInteract - DNA binding proteins

Nomenclature - General

• IUBMB - Nomenclature

• IUPAC - Nomenclature

• SWISS-PROT - Documents

• GO - Gene Ontology (FlyBase/SGD/MGD/TAIR/WormBase)

Simulation of biochemical reactions and cellular processes

• BioKin - Enzyme kinetic software

• BioQuest - Metabolic Simulation

• BioSpice - still in progress

• Bioxml.org - a site collecting together a number of biologically-oriented open-source projects

• DBsolve - Software for metabolic, enzymatic and receptor-ligand binding simulation

• DMSS - Scalable, Discrete Event Metabolic Simulation System

• E-Cell - A simulation platform for the modelling of cells at a molecular level

• Electronic Arc - experimental visual simulator

• Elementary Modes - has a Java simulation

• Gepasi - A software package for modelling systems of biochemical reactions

• Jarnac - A language for describing and manipulating cellular system models

• StochSim - A general-purpose stochastic simulator of biological reaction networks.

• Systems Biology Workbench - An XML based integration system

• Virtual Cell - A general computational framework for modeling cell biological processes


Signal transduction browser (Transpath)


PathDB

• National Center for Genome Resources

• http://www.ncgr.org/software/pathdb/

• Jeff Blanchard

• Software Development: 5

• Literature Curation: 4

• The software is freely available (Client)

• The database server can be installed at the site of cooperation partners

PathDB data model

• Compounds

• Macromolecules: lipids, polysaccharides

• Information molecules: DNA, RNA

• States: development, disease, genotype, phenotype, environment

• metabolic reactions

• protein modifications and interactions

• Regulation: transcriptional, translational, posttranslational

• Transport

• biological hierarchies, ontologies

• incomplete and conflicting knowledge

PathDB data model

Diagram labels: Mediator, Substrate, Product, BiochemicalEntity, Step, Transition of Entities, Construction of Entities, Protein, Subunit, Compound, DNA, BuildingBlocks, RNA

Attributes: Location, BiolProcess, Genotype, Phenotype, Environment


Platform for Network Analysis

Focus on: building custom networks, compare to large scale experiments

Relational database for metabolic reactions, regulation and states

(disease, genotype, phenotype)

QueryTool

Query the database, e.g. to collect a set of reactions

transform between types: proteins, compounds, steps

restrict to attributes: organism, location, states

PathwayViewer

Visualize the results of the search

Query window showing “Proteins involved in Biological process DNA repair”

• Transform to ‘Phenotype’

• Select ‘Caffeine Sensitivity’ and get all Proteins

• Do Intersection and get all Steps
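The query sequence above amounts to two attribute lookups followed by a set intersection and a transform to steps. A minimal sketch with Python sets follows; the tables, protein names, and step names are hypothetical placeholders and do not reflect PathDB's actual schema or contents.

```python
# Hypothetical annotation tables: protein -> set of annotation values.
process_of = {"protA": {"DNA repair"}, "protB": {"DNA repair"}, "protC": {"glycolysis"}}
phenotype_of = {"protB": {"caffeine sensitivity"}, "protD": {"caffeine sensitivity"}}
steps_of = {"protA": {"step1"}, "protB": {"step2"}, "protD": {"step3"}}


def proteins_with(table, value):
    """Return all proteins annotated with the given value."""
    return {protein for protein, values in table.items() if value in values}


# 1. Proteins involved in the biological process "DNA repair".
repair = proteins_with(process_of, "DNA repair")
# 2. Proteins with the phenotype "caffeine sensitivity".
sensitive = proteins_with(phenotype_of, "caffeine sensitivity")
# 3. Intersection, then transform the proteins to the steps they take part in.
both = repair & sensitive
steps = set().union(*(steps_of.get(protein, set()) for protein in both))
print(both, steps)
```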

PathwayViewer

• Inspect and manipulate pathways or routes between metabolites.

• Alternate topological representations of a pathway: primary and secondary metabolites

• Manipulate layout on screen

• Control how much data is displayed

• Automatically lays out pathways

– hierarchical or circular algorithm

• Visualization of gene expression and metabolic profiling data

Visualize Steps involved in DNA synthesis and Caffeine sensitivity

Exploring the network neighborhood - build pathways on the fly

What data sources are out there?

• Sequences / Annotation: SW, GenBank, EMBL, MIPS

• Large-Scale Experiments: Gene expression, Protein-Protein, Metabolic profiling, Protein-SmallMol, Protein expression

• Ontologies: GO, UMLS/MESH, MBO

• Regulation / Metabolism: EcoCyc, KEGG, WIT, BRENDA, PathDB, CSNdb, BIND, aMAZE, BRITE, DIP

• Knowledge: Medline

Ontology: Bind genes to hierarchies

GO Gene Ontology, 2000

Translation/Mapping between:

• Cellular Location

• Anatomy

• Biological Process

• Molecular Function

Browsing the ontology

Hierarchy of Complexity

Entities or States

• molecular: protein, RNA, DNA, compounds

• micro: organelles, cell types, tissues

• macro: disease states, development states, phenotype

Processes

• molecular: metabolic reactions, protein-protein interactions, conformation change

• micro: mitosis, apoptosis, transcription

• macro: disease, development, environment

Processes/Entities and experimental support

• Annotation / Sequences

• Regulation / Metabolism

• Large-Scale Experiments: Gene expression, Protein-Protein, Metabolic profiling, Protein-SmallMol, Protein expression

• Knowledge

• Ontologies

PathDB: Complete Wiring Diagram

Reference experimental support

Questions

How well does my set of gene expression arrays support my model of cellular processes?

What is the difference between a normal and a cancer cell?

What is the effect of a knockout mutation on the cellular network?

What “classical” pathways are up or down regulated in my gene expression data?

How does a drug perturb a cellular network as judged through gene expression data?

What experiment promises to distinguish between contradictory hypotheses?


PART II

Relating gene expression and pathways

Analysis of Expression Data

Clustering of time courses: Iyer et al., Science, 1999

“Scatter plot” comparing two experiments: Roberts et al., Cell, 2000


Using pathways to contextualize gene expression arrays

Miki et al. PNAS, 2001


Expression Pattern Clustering

J-Express B. Dysvik / I. Jonassen, U.Bergen, Norway

Mapping of J-Express Cluster onto Pathways

• sce00051 Fructose and mannose metabolism

– EC 3.1.3.46 Fructose-2,6-bisphosphate 2-phosphatase; Fructose-2,6-bisphosphatase

• sce00190 Oxidative phosphorylation

– EC 1.9.3.1 Cytochrome-c oxidase; Cytochrome oxidase; Cytochrome a3; Cytochrome aa3

– EC 3.6.1.34 H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase

– EC 3.6.1.38 Ca2+-transporting ATPase; Calcium pump

• sce00251 Glutamate metabolism

– EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase

• sce00252 Alanine and aspartate metabolism

– EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase

• sce00410 beta-Alanine metabolism

– EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase

• sce00640 Propanoate metabolism

– EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase

• sce00650 Butanoate metabolism

– EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase

• sce03110 ATP Synthase

– EC 3.6.1.34 H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase

Cluster represents genes of different contexts
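The mapping above can be sketched as a lookup of the cluster's EC numbers in a pathway table. The table below contains only the pathway/EC pairs listed on this slide, not the full KEGG pathway contents, and the function name is a hypothetical choice.

```python
from collections import defaultdict

# Pathway id -> EC numbers, restricted to the pairs listed on the slide.
pathway_ecs = {
    "sce00051": {"3.1.3.46"},
    "sce00190": {"1.9.3.1", "3.6.1.34", "3.6.1.38"},
    "sce00251": {"2.6.1.19"},
    "sce00252": {"2.6.1.19"},
    "sce00410": {"2.6.1.19"},
    "sce00640": {"2.6.1.19"},
    "sce00650": {"2.6.1.19"},
    "sce03110": {"3.6.1.34"},
}

# EC numbers annotated to the genes of one expression cluster.
cluster_ecs = {"3.1.3.46", "1.9.3.1", "3.6.1.34", "3.6.1.38", "2.6.1.19"}


def map_cluster(cluster, pathways):
    """Return {pathway_id: sorted EC numbers shared with the cluster}."""
    hits = {}
    for pathway_id, ecs in pathways.items():
        shared = cluster & ecs
        if shared:
            hits[pathway_id] = sorted(shared)
    return hits


hits = map_cluster(cluster_ecs, pathway_ecs)

# Invert the result to see how widely each EC number is spread.
ec_spread = defaultdict(list)
for pathway_id, ecs in sorted(hits.items()):
    for ec in ecs:
        ec_spread[ec].append(pathway_id)

print(len(hits), dict(ec_spread))
```

A single cluster touching eight pathways, with one EC number alone accounting for five of them, is what the slide means by genes of different contexts.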

Clustering and Incremental Pathway Construction

• Genes mapped to reactions

• dynamically build networks from reaction DB and clustered genes

A pathway (10 genes) from five clusters with 57 EC-annotated genes

24 (out of 54) gene clusters (6153 ORFs, 694 EC-annotated)

Fellenberg & Mewes, 99

Pathway represents 10 genes out of 500

Principal Component Analysis (PCA)

• Eigen Analysis

• solve for eigenvalues and eigenvectors of a square symmetric matrix

– pure sums of squares and cross products (SSCP)

– scaled sums of squares and cross products (Covariance)

– sums of squares and cross products (Correlation)

$$w_1 = \arg\max_{\|w\| = 1} E\left\{ (w^T x)^2 \right\}$$

$$w_k = \arg\max_{\|w\| = 1} E\left\{ \left[ w^T \left( x - \sum_{i=1}^{k-1} w_i w_i^T x \right) \right]^2 \right\}$$
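The two maximization problems above can be approximated without a linear algebra library. The sketch below assumes centered data and uses power iteration on the sums of squares and cross products, deflating the data after each component; this is one possible numerical route, not necessarily what J-Express does, and the function names and toy data are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def leading_direction(rows, iterations=200, seed=0):
    """Power iteration: approximates the unit vector w maximizing sum((w.x)^2)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
    for _ in range(iterations):
        # v = sum over x of (w.x) x, i.e. the SSCP matrix applied to w
        v = [0.0] * len(w)
        for x in rows:
            proj = dot(w, x)
            for j, xj in enumerate(x):
                v[j] += proj * xj
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:
            break  # no variance left in the data
        w = [vj / norm for vj in v]
    return w


def principal_components(rows, k):
    """Return k unit vectors w_1..w_k, deflating the data after each one."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    residual = [[r[j] - means[j] for j in range(dim)] for r in rows]
    components = []
    for _ in range(k):
        w = leading_direction(residual)
        components.append(w)
        # x <- x - w (w^T x): remove what the components found so far explain
        residual = [[xj - dot(w, x) * wj for xj, wj in zip(x, w)] for x in residual]
    return components


# Toy data: large spread along (2, 1), small spread along (-1, 2).
data = [(2, 1), (-2, -1), (4, 2), (-4, -2), (-0.1, 0.2), (0.1, -0.2)]
w1, w2 = principal_components(data, 2)
print(w1, w2)
```

The sign of each component is arbitrary, so comparisons should use the absolute value of the dot product with a reference direction.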

Principal components and visualization

J-Express B. Dysvik / I. Jonassen, U.Bergen, Norway

Expression Data and Pathways

Data driven vs hypothesis driven approach

The data driven approach to Genome and Expression Analysis

• Erroneous and noisy expression data

• Many genes, measurements

• Many spurious hits/clusters of expression patterns

• Incomplete data (measurements, kinetic parameters)

• Cost of regulation: partially regulated pathways

Basic Assumptions (Pathways, Cluster)

• Expression time courses for pathways do not necessarily cluster together

• Clustered genes do not necessarily form pathways


Biological Knowledge

Outline of a Hypothesis Driven Approach

GPE-Score(Pathway)

Different Questions - different Scoring Functions

• correlated

• conspicuous

• combined: correlated + conspicuous

Diauxic shift data, DeRisi et al, Science, 1997


Distribution of Relative Expression Levels: Error Model

Distribution of Relative Expression Levels: Null Model

Measurement error: Null model

$P_0(g_t) := \ldots$ (only partly recoverable: defined in terms of the measured level $m_{g,t}$, a mean of $0$, the error standard deviation $sd_{err}$ and a factor $2$)

Conspicuousness Score: Gene and Pathway Score

Gene score

$$score_t(g) := -\log P_0(g_t)$$

$$score(g) := \frac{1}{|T| - 1} \sum_{t \in T,\ t \neq 0} score_t(g)$$

Pathway score

$$score(P) := \frac{1}{|P|} \sum_{g \in P} score(g)$$

(Figure: axes g and t.)

Diauxic shift data, DeRisi et al, Science, 1997
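A conspicuousness score of this kind can be sketched as follows: a gene scores high when its measured levels are improbable under a null model of pure measurement error, and a pathway score averages its gene scores. The exact null model is not recoverable from the slide, so the code assumes a Gaussian with mean 0 and standard deviation `sd_err` and a two-sided tail probability; that choice, the negative logarithm, the function names, and the toy values are all assumptions.

```python
import math


def p_null(m, sd_err):
    """Two-sided tail probability of level m under N(0, sd_err^2) (assumed null model)."""
    p = math.erfc(abs(m) / (sd_err * math.sqrt(2.0)))
    return max(p, 1e-300)  # guard against log(0) for extreme values


def gene_score(levels, sd_err):
    """Average of -log P0 over all time points except the reference at index 0."""
    later = levels[1:]
    return sum(-math.log(p_null(m, sd_err)) for m in later) / len(later)


def pathway_score(pathway, expression, sd_err):
    """Average gene score over the genes of a pathway."""
    return sum(gene_score(expression[g], sd_err) for g in pathway) / len(pathway)


# Toy relative expression levels (for example log ratios); values are made up.
expression = {
    "flat_gene": [0.0, 0.0, 0.0],
    "moving_gene": [0.0, 2.0, -2.0],
}

flat = gene_score(expression["flat_gene"], sd_err=1.0)
moving = gene_score(expression["moving_gene"], sd_err=1.0)
both = pathway_score(["flat_gene", "moving_gene"], expression, sd_err=1.0)
print(flat, moving, both)
```

A gene that never leaves the reference level scores zero, while strong up or down regulation raises the score regardless of direction.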

Statistical Evaluation of Expression Data: Correlation / Combined Score

Pathway model (symbols: G, P, Pathways)

Pathway score

$$score(P) := \frac{1}{|P|} \sum_{g \in P} score_P(g)$$

Gene score

$$score_P(g) := \frac{1}{|P_g|} \sum_{h \in P_g} cc(g, h)$$

$$P_g := \begin{cases} P \setminus \{g\} & \text{if } g \in P \\ P & \text{otherwise} \end{cases}$$

Covariance/Synchrony, Normalization/Gene Variability

$$cc(g, h) := \frac{cov(g, h)}{sd_g \, sd_h}$$

$$m_g := (m_{g,t} \mid t \in T,\ t \neq 0), \qquad sd_g := sd(m_g)$$

Combined Score: Normalization/Conspicuousness

$$cc^*(g, h) := \frac{cov(g, h)}{sd_{err} \, sd_{err}}$$
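The correlation-based scores can be sketched in a few lines. The code assumes each gene's time course is a list of levels with the reference time point already removed, uses the population covariance, and takes the combined variant to be the covariance divided by the squared measurement error; these choices, the function names, and the toy data are assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long time courses."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def cc(a, b):
    """Correlation coefficient: covariance normalized by gene variability."""
    return cov(a, b) / (math.sqrt(cov(a, a)) * math.sqrt(cov(b, b)))


def cc_star(a, b, sd_err):
    """Combined variant: covariance normalized by the measurement error."""
    return cov(a, b) / (sd_err * sd_err)


def gene_score(g, pathway, expr, corr=cc):
    """Average corr(g, h) over all pathway genes h other than g."""
    others = [h for h in pathway if h != g]
    return mean([corr(expr[g], expr[h]) for h in others])


def pathway_score(pathway, expr, corr=cc):
    """Average gene score over the genes of the pathway."""
    return mean([gene_score(g, pathway, expr, corr) for g in pathway])


# Toy time courses; gene names and values are made up.
expr = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [3.0, 2.0, 1.0],
}

in_sync = pathway_score(["geneA", "geneB"], expr)
outsider = gene_score("geneC", ["geneA", "geneB"], expr)
combined = pathway_score(["geneA", "geneB"], expr,
                         corr=lambda a, b: cc_star(a, b, 0.5))
print(in_sync, outsider, combined)
```

Scoring a gene outside the pathway, as done for `geneC`, corresponds to the case where the gene is compared against the whole pathway. A constant time course has zero variance and would need special handling in `cc`.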

Combinatorial Mapping of Proteins to Genes

Each row is one possible pathway in gene space

Each column specifies a gene for one node in pathway

Candidate genes per node: 3, 2, 3, 2

Total: 3 × 2 × 3 × 2 = 36
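The count of 36 is the size of a Cartesian product over the candidate genes of each node. A minimal sketch follows; the gene labels are hypothetical placeholders.

```python
from itertools import product

# Candidate genes for each of the four pathway nodes (placeholder labels).
node_genes = [
    ["n1_a", "n1_b", "n1_c"],
    ["n2_a", "n2_b"],
    ["n3_a", "n3_b", "n3_c"],
    ["n4_a", "n4_b"],
]

# Each tuple is one row of the table: one gene chosen per node.
gene_pathways = list(product(*node_genes))
print(len(gene_pathways))
```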

Pathways, Functions, EC-numbers, Proteins, Genes

Pathway = (Nodes, Edges)

Extract from metabolic DB + Systematic Generation of Pathways

Nodes are labeled with reactions

Nodes are labeled with (sets of) EC-numbers

Nodes = { 2.7.1.2, 2.7.1.4, 2.7.1.59, 2.7.1.61, 2.7.1.63, 2.7.1.7, 2.7.1.1, 3.1.3.1, 3.1.3.58, 3.1.3.8, 3.1.3.9 }

Nodes are labeled with (sets of) proteins

Nodes = { <2.7.1.2: YCL040W, YDR516c>, <2.7.1.1: YFR053C, YGL253W>, <3.1.3.1: YMR105C, YHR215W, YDL024C, ...>, <3.1.3.58: YCL040W, ...>, <3.1.3.8: YNL141W, ...>, <3.1.3.9: YBR011C, YJL130C> }

900 different ORF pathways

Glycolysis pathways: Combined Score

Legend: 10000 random “pathways”; 900 putative glycolysis pathways; 36 valid glycolysis pathways

Biologically Meaningful
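Comparing candidate pathways against random “pathways” can be sketched as an empirical p-value: the fraction of random gene sets of the same size that score at least as high. The scoring function, gene names, and sample size below are placeholders, and the slide does not state how its comparison was evaluated.

```python
import random


def empirical_p_value(observed, score_fn, genes, size, n_random=1000, seed=0):
    """Fraction of random gene sets of the given size scoring >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(genes, size)
        if score_fn(sample) >= observed:
            hits += 1
    return hits / n_random


# Toy setup: every gene has a made-up weight, a set scores its mean weight.
weights = {"g%d" % i: float(i) for i in range(100)}
gene_names = sorted(weights)


def toy_score(gene_set):
    return sum(weights[g] for g in gene_set) / len(gene_set)


# A score no random set can reach, and a score every random set reaches.
p_unreachable = empirical_p_value(1000.0, toy_score, gene_names, size=3)
p_trivial = empirical_p_value(0.0, toy_score, gene_names, size=3)
print(p_unreachable, p_trivial)
```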


Identifying Genes via Pathway Scores

ORFs fitting into a given pathway according to specific scoring function

ORFs related to a given pathway according to specific scoring function

Example 1: Oxidative Phosphorylation

High scoring genes (correlated with TCA cycle genes)

Legend: score correlated w.r.t. TCA cycle; score not correlated; score negatively correlated

Example 2: Biosynthesis of Amino acids

Low scoring genes (anti-correlated with TCA cycle genes)

Legend: score correlated w.r.t. TCA cycle; score not correlated; score negatively correlated

Example 3: Urea cycle

Average scoring genes (not correlated with TCA cycle genes)

Legend: score correlated w.r.t. TCA cycle; score not correlated; score negatively correlated


Pathway Scores ...

… are suitable for ...

– interpreting time series

– coping with erroneous data

– ranking pathways with respect to plausibility

– interpreting how well pathway genes fit the pathway

– fishing for further genes correlated to the pathway (with great care)

– posing different questions by defining new scoring functions

Genetic Network Reconstruction

• Boolean Models

– logical relationships between variables

• Differential equations

– continuous dynamics of biological reactions

• Bayesian Networks

– statistical testing of hypotheses

– use gene/protein annotations as priors to represent biological knowledge

$$BayesianScore(S) = \log p(S \mid D) = \log p(S) + \log p(D \mid S) + c$$
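The Bayesian score can be illustrated for binary variables. The slide does not say which marginal likelihood was used; the sketch below assumes the K2 metric (uniform Dirichlet priors) for log p(D|S), drops the constant c, and uses hypothetical variable names and toy data.

```python
import math
from collections import Counter


def log_marginal_likelihood(data, parents):
    """K2 metric for binary variables (assumed choice of log p(D|S)).

    data: list of dicts {variable: 0 or 1}
    parents: dict {variable: tuple of parent variables}, i.e. the structure S
    """
    total = 0.0
    for var, pa in parents.items():
        counts = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        configs = {config for config, _ in counts}
        for config in configs:
            n0 = counts[(config, 0)]
            n1 = counts[(config, 1)]
            # log[(r-1)! / (N + r - 1)! * N0! * N1!] with r = 2 states
            total += (math.lgamma(2) - math.lgamma(n0 + n1 + 2)
                      + math.lgamma(n0 + 1) + math.lgamma(n1 + 1))
    return total


def bayesian_score(data, parents, log_prior=0.0):
    """log p(S) + log p(D|S); the constant c is omitted."""
    return log_prior + log_marginal_likelihood(data, parents)


# Toy data: X and Y always agree (values are made up).
data = [{"X": 0, "Y": 0}] * 4 + [{"X": 1, "Y": 1}] * 4

independent = bayesian_score(data, {"X": (), "Y": ()})
dependent = bayesian_score(data, {"X": (), "Y": ("X",)})
print(independent, dependent)
```

With equal structure priors the model in which Y depends on X scores higher on this data; annotation-based priors would enter through `log_prior`.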

Bayesian Networks

Comparing two models for the control of the galactose metabolism in yeast

• Variables: Gal80p, Gal4m, Gal4p genes

• Binary quantization using maximum likelihood

• Compare all models possible in the system

• Experiment: reproduce currently accepted model of galactose regulation

Gal80p represses Gal4m

Gal80p inhibits Gal4m posttranslationally

Gal2 independent of Gal80m

Gal4m independent of Gal80m

Edge annotation as Bayesian priors

• No annotation between X and Y

• Positive stimulation: X increases activity of Y

• Negative stimulation

• Undefined

Constraints on the dependence between the genes

Permits scoring of annotated models as unannotated models

Bayesian Networks: Evaluate and extend networks

• Retrieve network from database

• Curate part of the network

• Automatically generate hypotheses on the rest

• Quantify hypotheses using Bayesian metric

• Present high-scoring hypotheses to the user

• Present scores for single genes/edges in the network

• Manual investigation of high scoring hypotheses: New facts

• Generate another iteration of hypotheses

PART III

Guided Tour:

Elucidate organelle-related pathways


Compartments in the eukaryotic cell

Voet & Voet, Biochemistry

Construction of models in yeast

The Network

• Yeast2Hybrid interactions: 5774 (6121)

• Other protein-protein interactions: 2347 (4384)

• Other gene product interactions (MIPS): 6654 (15245)

• Protein complexes: 934 (1020)

• Metabolism: 2135 (4258)

• Total: 17529 (31028)

Subcellular Localization

• Subcellular localization catalogue (MIPS): 2800 (2300)

• Prediction: TargetP: 720

• Custom motif search

Generate Networks from Experiments: Yeast2Hybrid


‘The protein-protein interaction network of yeast’, Uetz et al., 2000


Map of protein-protein interactions

red, lethal; green, non-lethal; orange, slow growth; yellow, unknown

Jeong et al. 2001
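Highly connected nodes in such a map can be found by counting interaction partners. The sketch below assumes the interactions are given as an undirected edge list; the protein names and the threshold of three partners are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count interaction partners per protein; self-interactions are ignored."""
    degree = Counter()
    for a, b in edges:
        if a != b:
            degree[a] += 1
            degree[b] += 1
    return degree


# Toy interaction list (made-up protein names).
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
         ("p1", "p5"), ("p2", "p3"), ("p6", "p5")]

degree = degrees(edges)
hubs = [protein for protein, d in degree.most_common() if d >= 3]
print(degree, hubs)
```

Joining the degree table with a phenotype annotation (lethal, non-lethal, slow growth, unknown) would give the data behind the coloring described above.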


Gray et al. 1999 Science

Origin of mitochondria

Subcellular location in yeast

• MIPS

– localization for 2300 gene products

– wide range of subcellular compartments

• Gene products per compartment

– ER: 159

– Peroxisome: 39

– Transport vesicles: 48

– Vacuole: 54

– Nucleus: 820

– Cytoskeleton: 113

– Cell wall: 37

– Golgi: 81

– Endosome: 11

– Cytoplasm: 583

– Plasma membrane: 153

– Mitochondria: 376

– Chromosome struct.: 23


Subcellular localization of gene products

• Based on N-Terminal sequence
• Except Peroxisome: C-Terminal sequence

• TargetP
– neural network based
– distinguishes between mitochondrion, chloroplast, secretory pathways
– estimated accuracy: 85%
– plants: 10% mitochondrion, 15% chloroplast

• Results for yeast:
– 383 mitochondrion
– 294 secretory pathways


Model of yeast mitochondrion

[Figure: model of the yeast mitochondrion. Labeled components: mitochondrial ribosome, OM import, IM transloc, F1F0 ATP synthase, exonuclease, cytochrome red, cytochrome ox, succinate DH, isocitrate DH, citrate cycle, amino acids, lipoamide, glycolysis]


Evolution of peroxisomal import

Olsen et al. 2001


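Subcellular localization: Peroxisome

[Figure: peroxisomal targeting signals PTS 1 (C-term) and PTS 2 (N-term). PTS 1 residue rows as shown: S K L / C R M / A H / P N I / G Q V / T S F / K S Y / N A, annotated "most common" and "acceptable". Position classes: small, uncharged; hydrophobic, nonpolar; hydrophobic; basic, pos. charged. PTS 2 residue rows as shown: R L X5 H L / K I Q A / Q V, with the range 10 .. 40. Numbers shown: 46126, 657]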
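A custom motif search for the two targeting signals can be sketched with regular expressions. The patterns below are assumptions: they use the commonly cited consensus forms, (S/A/C)(K/R/H)(L/M) at the C-terminus for PTS 1 and (R/K)(L/V/I)X5(H/Q)(L/A) near the N-terminus for PTS 2, which is narrower than the full residue table on the slide. The `pts2_window` parameter and the test sequences are hypothetical.

```python
import re

# Assumed consensus patterns; not the exact residue table from the slide.
PTS1 = re.compile(r"[SAC][KRH][LM]$")        # C-terminal tripeptide
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")  # N-terminal nonapeptide


def find_pts(sequence, pts2_window=40):
    """Report PTS 1 / PTS 2 matches for one protein sequence.

    pts2_window: number of N-terminal residues searched for PTS 2
    (a hypothetical default; the whole motif must lie inside it).
    Returns a dict with a PTS 1 flag and the 1-based start of the first
    PTS 2 match, or None if there is no match.
    """
    seq = sequence.upper()
    match = PTS2.search(seq[:pts2_window])
    return {
        "PTS1": bool(PTS1.search(seq)),
        "PTS2_start": match.start() + 1 if match else None,
    }


# Hypothetical toy sequences, not real yeast proteins.
print(find_pts("MAAAAAAASKL"))
print(find_pts("MSQRLQVVLGHLAAAA"))
```

Widening the character classes to cover the "acceptable" residues would increase sensitivity at the cost of more false positives, so any hits are best treated as candidates to be checked against other localization evidence.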


Model of yeast peroxisome


Communication between peroxisomes and the cell


Peroxisomes and phenotype


ISYS - a platform for the integration of software tools and databases

• "plug and play" tools of interest.

• separately developed and independently evolving

• DynamicDiscovery - an exploratory environment to pass objects among components

• Supports visual synchronization among components.

• Integrates web-based resources with desktop applications


Double-edged integration problem - technology and IP/licensing

(at least for non-profits)

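[Figure: ISYS as the hub connecting components from different providers.
Providers: NCGR, Stanford, Berkeley, Wash. U, Manchester, Web, other third-party software, your organization's tools.
Components: PathDB, CMD Tool, Table Viewer, Sequence Viewer, Similarity Search Viewer, X-Cluster, GO Browser, ATV, MaxD.
Web resources: Entrez - NCBI, BLAST - NCBI, GeneScan - MIT, Google, TAIR - NCGR, GeneX - NCGR]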


Compare regulated genes with Gene Ontology and MaxD

MaxD: David Hancock, University of Manchester

GO: Michael Ashburner, The Gene Ontology Consortium


Perform statistical analysis: MaxD and Pathway Scoring

MaxD: David Hancock, University of Manchester


Rosetta: Compendium of expression arrays

• 300 Yeast expression arrays: Hughes et al., 2000, Cell
– 280 gene knockout mutants
– 20 titration experiments
  • Nutrients
  • Antibiotics

• Choose 40 experiments
– Pex12
– 5 genes: human expert knowledge
  • involved in gluconeogenesis, ER, vacuolar transport
– 36 more: contained in peroxisomal network
  • Cellular organization
  • Transcriptional control
  • Metabolism, focus energy
  • Protein destination, cellular transport


Regulation of metabolism
Interface peroxisome and cytoplasm


Regulation of metabolism
Interface peroxisome and cytoplasm


Pathway scores: Comparing network and expression

YOR184W 39.68
YER090W 37.98
YPR145W 36.61
YGL062W 36.18
YKL211C 35.90
YLR027C 35.09

Pathway 36.90

YIR034C 86.46
YIL116W 76.63
YOL058W 75.38
YMR062C 72.87
YDL182W 58.25
YDL066W 52.03
YER052C 49.59
YOL140W 47.78
YDL131W 43.24
YGL202W 39.86
YHR208W 38.35
YJR148W 36.32
YNL220W 36.06
YCR005C 35.38
YHR137W 35.21
YNL037C 35.14
YNR050C 34.85
YMR300C 32.54
YLL018C 29.77
YAR015W 28.77

Iteration 1
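The slide does not state how the pathway score is derived from the per-gene scores. One observation, offered as an assumption rather than the authors' method: the arithmetic mean of the six gene scores listed with the pathway is about 36.91, which is close to the 36.90 shown for the pathway. A minimal sketch of that calculation:

```python
from statistics import mean

# Per-gene scores copied from the six-gene list on the slide.
gene_scores = {
    "YOR184W": 39.68,
    "YER090W": 37.98,
    "YPR145W": 36.61,
    "YGL062W": 36.18,
    "YKL211C": 35.90,
    "YLR027C": 35.09,
}


def pathway_score(scores):
    """Arithmetic mean of per-gene scores (an assumed definition)."""
    return mean(scores.values())


score = pathway_score(gene_scores)
print(f"pathway score (mean of {len(gene_scores)} genes): {score:.2f}")
```

If the mean is indeed the intended definition, the small difference from the slide's value could come from truncation or from rounding of the per-gene scores before display.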


Pathway scores: Distribution of correlated genes


Pathway scores: Distribution of correlated genes


Pathway scores: Regulatory proteins with correlated expression


Oleate response element (ORE)


Transcription and threonine pool


Non-classical protein export


Conclusion

• Organism-wide virtual experiments can be performed
• Comprehensive models can be constructed and evaluated
– complete sequence
– abundance of edges between genes/gene products or genes and phenotype
– abundance of information from annotations, large scale expression experiments and Yeast2Hybrid

• What we do not yet understand (well enough)
– relationship between proximity in the network and protein sequence
– network properties: high degree of interconnectivity yet limited effects from gene disruption
– translational and posttranslational regulation
– how to apply large scale experiments to regulation

In the case of simple eukaryotes the ‘virtual organism’ is within reach