
SymBioSys

K.U.Leuven Center for Systems Biology

Topics to be addressed

• International trend
• Project concept
• Project structure
• 3 problems and 3 cases
• Computational methodology leads to user-friendly tools and real biological impact
• Strategic importance internationally
• Strategic importance K.U.Leuven
• Coherence of the consortium

Systems biology
Biostatistics
Genetics
Sequence analysis
Expression analysis
Personalized medicine
Nutraceuticals
Post-genomic drug development (new targets, toxicogenomics)
GMOs

Systems biology
Biological question & model
High-throughput technology
Computers & databases
Mathematical models

The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science. [...] Systems biology will play a central role in the 21st century; there is a need for global (high throughput) tools of genomics, proteomics, and cell biology to decipher biological information; and computer science and applied math will play a commanding role in converting biological information into knowledge.

Leroy Hood, Institute for Systems Biology, Seattle, WA, 2002

Center of Excellence

Become a world-leading bioinformatics center for systems biology

Bioinformatics & microarrays

Three topics of excellence
• Gene prioritization by integrative genomics
• Graphical models of regulatory motifs and modules
• Inference of regulatory networks

We will achieve this goal through
• Further build-up of existing expertise
• Symbiosis between computational and biological partners
• Concrete cases for real biological relevance
• Diverse cases for generic applicability in biology

[Figure: Systems biology (genes, modules, networks); probabilistic models; integrative genomics, regulatory modules, cellular networks; cases]

Project concept

[Figure: probabilistic models (integrative genomics, regulatory modules, cellular networks) and the cases: genetical genomics, endocrinology, Salmonella genomics]

Research concept & consortium

Research cycle
• Biological problem
• Experiment design
• Biological data
• Data analysis
• Biological validation
• Improved method
• New biology

Probabilistic models
• Integrative genomics
• Regulatory modules
• Cellular networks

Cases
• Genetical genomics
• Endocrinology
• Salmonella genomics

DME-VIB
Prometa
KUL & DME-VIB
World
Probabilistic models
Peripheral groups & visibility
Yeast (CMPG & Bio)

Project structure

WP1. Candidate genes

WP2. Regulatory modules

WP3. Cellular networks

Human genetics
Glucose regulation
VitD modes of action
Salmonella systems biology

Candidate genes
Regulatory modules
Cellular networks

Data analysis: Network inference, Motif analysis, Primary analysis, Gene prioritization
Data generation: CGH, ChIP-chip, Proteomics, Metabolomics, cDNA/Affy

Project structure (SysBio -> 3 partners)

Genetical genomics
Endocrinology
Salmonella genomics

WP1. Candidate gene prioritization

High-throughput genomics
Statistics & data mining
Candidate genes?

Human genetics identifies key genes in monogenic and multifactorial diseases

Algorithms: Module analysis, Statistical analysis, Gene prioritization
Technologies: CGH, cDNA/Affy

WP2. Module discovery

[Figure: regulatory module example with ACTC, MYLA, MYL1, MYOG, MYF6, CHRM2, MEF2, MYOD, SRF]

Algorithms: Bayesian networks, Motif analysis, Statistical analysis, Gene prioritization
Technologies: CGH, ChIP, Proteomics, Metabolomics, cDNA/Affy

[Figure: chemical structure]

Cells/tissues treated with 1,25-(OH)2D3
Identification of signalling cascades and transcription factors important for the effects of 1,25-(OH)2D3
TF
Validation of transcription factor binding to detected motifs

VitD affects bone and calcium homeostasis and has potent anti-proliferative effects

mRNA expression analysis in pancreatic beta cells: finding mechanisms of diabetes

Algorithms: Motif analysis, Statistical analysis, Gene prioritization
Technologies: Generation of antibodies, Functional analysis of beta cells, Affymetrix Gene System

Discovery of new modules for post-transcriptional gene regulation

[Figure: Signal Log Ratio of mRNA in beta cells versus other tissues (non-beta cells, brain, pituitary, lung, kidney, fat, liver, muscle); scale from <-2.5 to >2.5]

mRNA expression profiles of normal & diabetic beta cells

Mouse models for a common human disease
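The Signal Log Ratio scale on the beta-cell figure (<-2.5 to >2.5) can be illustrated with a minimal sketch. It assumes the ratio is a base-2 logarithm of beta-cell signal over signal in another tissue, which the slides do not state; the function names, labels, and the use of 2.5 as a cut-off are hypothetical choices for illustration only.

```python
import math

def signal_log_ratio(beta_signal, other_signal):
    """Assumed definition: log2 of beta-cell signal over signal in another tissue."""
    return math.log2(beta_signal / other_signal)

def classify(slr, threshold=2.5):
    """Label a gene using the +/-2.5 limits shown on the figure scale (assumed cut-off)."""
    if slr > threshold:
        return "higher in beta cells"
    if slr < -threshold:
        return "lower in beta cells"
    return "no strong difference"
```

With these assumptions, a gene measured at 800 in beta cells and 100 in another tissue has a ratio of 3.0 and falls above the upper limit.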

Microarray data
ChIP-chip data
• Library of strains, each with a tagged regulator
• Chromatin IP to enrich promoters bound by regulator in vivo
• Microarray to identify promoters bound by regulator in vivo
Sequence data

Network inference: REMODISCOVERY

Functional Class: p-value, by seed profile
• 10 CELL CYCLE AND DNA PROCESSING: 0; 10.03 cell cycle: 2.7e-5; 10.01 DNA processing: 1.3e-4; 42.04 cytoskeleton: 4.2e-3
• 40 CELL FATE: 5.2e-4; 40.01 cell growth / morphogenesis: 2.6e-3; 43 CELL TYPE DIFFERENTIATION: 5.2e-3; 43.01 fungal/microorganismic cell type differentiation: 5.2e-3; 34.11 cellular sensing and response: 5.3e-3; 01.05.01 C-compound and carbohydrate utilization: 6.8e-3; 10.03.04.03 chromosome condensation: 9.4e-3
• 43 CELL TYPE DIFFERENTIATION: 3.6e-3; 43.01 fungal/microorganismic cell type differentiation: 3.6e-3; 10.03.03 cytokinesis (cell division) / septum formation: 4.8e-3
• 32.01 stress response: 3.2e-3; 10.03 cell cycle: 8.7e-3

Combinatorial algorithm

WP3. Network inference

Salmonella is a powerful model for systems biology (illustration size)

Algorithms: Network inference, Module analysis, Statistical analysis, Gene prioritization
Technologies: CGH, ChIP, Proteomics, Metabolomics, cDNA/Affy

• Library of strains, each with a tagged regulator
• Chromatin IP to enrich promoters bound by regulator in vivo
• Microarray to identify promoters bound by regulator in vivo

[Figure: three gene-indexed data matrices: a binary genes x regulators matrix (Gene 1 ... Gene n by TF1 ... TFm), a binary genes x motifs matrix (Gene 1 ... Gene n by M1 ... Mp), and a genes x experiments matrix (Gene 1 ... Gene n by E1 ... Ex)]

Preprocessing

Heterogeneous data

Motif compendium

Inferred network
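How the gene-by-regulator, gene-by-motif, and gene-by-experiment matrices might be combined can be sketched as follows. This is not the algorithm behind the slides, which is not described here; it is a simplified stand-in that grows a module from a seed gene by requiring correlated expression plus at least one shared regulator and one shared motif. All names and the 0.8 correlation threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long profiles; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def find_module(seed, regulators, motifs, expression, min_corr=0.8):
    """Collect genes that resemble the seed gene in all three data sources.

    regulators: gene -> set of bound regulators (rows of the binary TF matrix)
    motifs:     gene -> set of motifs found in its promoter (binary motif matrix)
    expression: gene -> list of expression values across experiments
    """
    members = [
        gene for gene in expression
        if pearson(expression[gene], expression[seed]) >= min_corr
        and regulators[gene] & regulators[seed]
        and motifs[gene] & motifs[seed]
    ]
    # Regulators and motifs common to every member describe the module.
    shared_regulators = set.intersection(*(regulators[g] for g in members))
    shared_motifs = set.intersection(*(motifs[g] for g in members))
    return members, shared_regulators, shared_motifs
```

The seed gene is always a member of its own module, so the intersections are never taken over an empty list as long as the seed has a non-constant profile and at least one regulator and motif.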

Toucan 2

CGHGate

Endeavour

Real biological impact

Screenshots of titles of papers demonstrating a real biological impact of bioinformatics methods?

Bioi@SCD growth

Turnover since 1998

[Figure: turnover by funding channel, 1998-2009 (IWT, FWO, EU, DWTC, BOF); vertical axis from 0 to 1,400,000]

CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue

Dept. of Mol. Microbiology
• J. Thevelein

CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven

Lab of Clin. & Evolut. Virology
• A. Vandamme

Dept. of Transgene Tech. & Gene Therapy
• P. Carmeliet

CME-UZ
• JJ. Cassiman (CME-KUL)
• J. Vermeesch

Intensive Care
• G. Van Den Berghe

Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman

Lab of Functional Biology
• J. Winderickx

LEGENDO
• C. Mathieu

[Figure labels: Paper (x8)]

CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue

Lab of Clin. & Evolut. Virology
• A. Vandamme

QuantPsy
• I. Van Mechelen

Lab of Functional Biology
• J. Winderickx

LEGENDO
• C. Mathieu

Mol. Cell Biology, Biochemistry
• F. Schuit

BioStat
• G. Verbeke

Dept. of Mol. Microbiology
• J. Thevelein

Dept. of Transgene Tech. & Gene Therapy
• P. Carmeliet

CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven

CME-UZ
• JJ. Cassiman
• J. Vermeesch

Intensive Care
• G. Van Den Berghe

Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman

[Figure labels: CoE (x7)]

European bioinformatics landscape

Integration bioinformatics & stats
Algorithmic methodologies

Three topics of excellence

Bioinformatics & microarrays
1. Gene prioritization by integrative genomics
2. Graphical models of regulatory motifs and modules
3. Bayesian networks for prokaryotic systems biology

(1) Genomic data fusion

After an experiment, many sources of information are available to select the best candidates for modeling and validation
Probabilistic methods can optimize the prioritization
Known genes related to a disease or pathway
Candidate genes: Locus, Screening
Multiple data sources: Sequence, Expression, Function
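The idea of fusing several data sources into one candidate ranking can be sketched as below. The deck does not describe the probabilistic method it has in mind, so this uses a deliberately simple stand-in: each source ranks the candidates by a similarity score to the known genes, and the per-source rank ratios are averaged. The function names and the scores are hypothetical.

```python
def rank_ratios(scores):
    """Turn one source's scores (higher = more similar to the known genes)
    into rank ratios: 1/n for the best candidate, 1.0 for the worst."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: (i + 1) / len(ordered) for i, gene in enumerate(ordered)}

def fuse_rankings(scores_by_source):
    """Average the rank ratios over all sources and return the candidate
    genes ordered from most to least promising."""
    collected = {}
    for scores in scores_by_source.values():
        for gene, ratio in rank_ratios(scores).items():
            collected.setdefault(gene, []).append(ratio)
    mean_ratio = {g: sum(r) / len(r) for g, r in collected.items()}
    return sorted(mean_ratio, key=mean_ratio.get)
```

A candidate that ranks near the top in sequence, expression, and function data ends up first, while a candidate that is strong in only one source is pulled down by the others.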

Endeavour [Methodological impact]

http://www.esat.kuleuven.ac.be/endeavour

(2) Regulatory modules [what is a module? What is transcript. regulation?]

© Davidson EH et al. Science. 2002 Mar 1;295(5560):1669-78.

Gibbs motif finding

Initialization
• Sequences
• Random motif matrix

Iteration
• Sequence scoring
• Alignment update
• Motif instances
• Motif matrix

Termination
• Convergence of the alignment and of the motif matrix
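The initialization, iteration, and termination steps of Gibbs motif finding can be sketched in a few lines. This is a textbook-style sampler written for illustration, not the MotifSampler implementation: it assumes one motif instance per sequence, a uniform background, and a fixed number of sweeps in place of a convergence check. All names and defaults are hypothetical.

```python
import random

ALPHABET = "ACGT"

def build_motif_matrix(sequences, positions, width, skip=None, pseudocount=1.0):
    """Motif matrix: per-column nucleotide frequencies of the current
    motif instances, optionally leaving one sequence out."""
    counts = [{base: pseudocount for base in ALPHABET} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(sequences, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    matrix = []
    for column in counts:
        total = sum(column.values())
        matrix.append({base: c / total for base, c in column.items()})
    return matrix

def gibbs_motif_search(sequences, width, iterations=200, seed=0):
    rng = random.Random(seed)
    # Initialization: a random motif start in every sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            # Motif matrix from the instances in all other sequences.
            matrix = build_motif_matrix(sequences, positions, width, skip=i)
            # Sequence scoring: likelihood ratio against a uniform background.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= matrix[j][seq[start + j]] / 0.25
                weights.append(w)
            # Alignment update: sample a new motif instance for this sequence.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    # Termination here is a fixed sweep count, not a convergence test.
    return positions, build_motif_matrix(sequences, positions, width)
```

Calling `gibbs_motif_search(sequences, width=6)` on a list of DNA strings returns one start position per sequence together with the final motif matrix. Sequences must contain only A, C, G, and T and be at least `width` long.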

MotifSampler & TOUCAN

(3) Network inference

Reconstruction of the regulatory network underlying the phenotypic behavior

High throughput data

Benchmarking network inference methods

Realistic network structures

Realistic network dynamics

Simulated networks

Inferred networks

Graphical models

System identification

[Figure: network simulation (rate equations in v, v_max, A, K, s) and network inference]
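Benchmarking amounts to comparing an inferred network against the simulated network whose structure is known. A minimal sketch of that comparison, assuming both networks are given as sets of directed (regulator, target) edges, is shown below; the choice of precision, recall, and F1 as scores is an assumption, since the slides do not name a metric.

```python
def benchmark_edges(true_edges, inferred_edges):
    """Score an inferred network against the known (simulated) network.

    Both arguments are iterables of (regulator, target) pairs.
    """
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Running the same scoring over several simulated networks and several inference methods gives a table in which the methods can be compared on equal terms.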

Workpackages

WP1: Candidate genes
• Preliminary data analysis: Microarrays (xM1.1), generic; CGH microarrays (gWP1), genetical genomics
• Dealing with noise (xM2.1)
• Knowledge mining (gWP2) & combined modeling of different data sets (xM2.3): genetical genomics; generic -> WP3: Salmonella
• Software & databases (xM1.4)

WP2: Regulatory modules
• Motif and module discovery (xM1.2)
• Expression profiling in vitD and analogs pathways (xM3.1, xM3.2)
• Beta cell regulation: transcriptional regulation, post-transcriptional regulation
• Genetic modules: multiple genome scans and gene modifiers?
• Software & databases (xM1.4)

WP3: Cellular networks
• Network inference (xM1.3)
• Salmonella high-throughput technologies (xM4.1)
• Salmonella high-throughput data and analysis (xM4.2)
• VitD pathway modeling? Glucose sensing?
• Detection of dependence relations (xM2.2)
• Software & databases (xM1.4)

Bioi@SCD growth

Personnel since 1998

[Figure: personnel evolution 1998-2005 by category (PhD, Postdoc, ZAP), quarterly from Jul-98 to Jan-05; vertical axis from 0 to 25]

Bioi@SCD growth

Publications since 1998

[Figure: number of publications per year, 1999-2005, by type (Books, Conference, Journal); vertical axis from 0 to 20]

Bioi@SCD growth

5 successful PhDs
• Gert Thijs (June 2003): Probabilistic methods to search for regulatory elements in sets of coregulated genes
• Frank De Smet (May 2004): Microarrays: algorithms for knowledge discovery in oncology and molecular biology
• Stein Aerts (May 2004): Computational discovery of cis-regulatory modules in animal genomes
• Geert Fannes (June 2004): Bayesian learning with expert knowledge: Transforming informative priors between Bayesian networks and multilayer perceptrons
• Patrick Glenisson (June 2004): Integrating scientific literature with large scale gene expression analysis

Bioi@SCD growth

Software portal http://www.esat.kuleuven.ac.be/~dna/Bioi/

[Figure: number of users per month, Nov-00 to Feb-04; vertical axis from 0 to 1400]

Toucan 2

Endeavour

CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue

Dept. of Mol. Microbiology
• J. Thevelein

CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven

Intensive Care
• G. Van Den Berghe

Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman

IDO, BOF PostDoc
GBOU, PhD
Project, PhD, PostDoc
CAGE

Flemish biotech companies (2005)

[Map: Bruges, Kortrijk, Ghent, Antwerp, Brussels, Leuven, Turnhout, Geel, Hasselt, Mechelen]

Bruges
• Genencor International

Ghent
• Ablynx, AlgoNomics, Applied Maths, Bayer BioScience, Bioin4matrix, BioMARIC, CropDesign, deVGen, Innogenetics, Maize Technologies Int’l, Methexis Genomics, Xcellentis, Yakult, Peakadilly

Antwerp
• DCI-labs, Flen Pharma, Histogenex, Memo Bead Technologies

Turnhout
• DiaMed EuroGen, Janssen Pharmaceutica

Geel
• Barrier Therapeutics, Genzyme Flanders, Maia Scientific

Mechelen
• Bio-Art, CryoSave, Galapagos Genomics, Tibotec, Virco

Brussels
• Beta-cell, Dentech, EggCentris, R.E.D. Laboratories

Leuven
• 4AZA Bioscience, Diatos, Neurogenetics, PharmaDM, reMynd, RNA-TEC, Thromb-X, Tigenix, Vivactis

Project structure – budget (750 KEuro?)

Algorithmic research: Bayesian networks, Motif analysis, Statistical analysis, Gene prioritization
Data generation: CGH, ChIP-chip, Proteomics, Metabolomics, cDNA/Affy

Candidate genes
PI:
Regulatory modules
PI:
Cellular networks
PI:

Genetical genomics
Endocrinology
Salmonella genomics

Postdoc 1
Postdoc 2
Postdoc 3
PhD 1
PhD 2
PhD 3
PhD 4
Techn 1
Techn 2
Techn 3

Miscellaneous

First citations with "bioinformatics"

Trends Biotechnol 1993
Ann N Y Acad Sci 1993

Network reconstruction based on heterogeneous data

Microarray data
ChIP-chip data
• Library of strains, each with a tagged regulator
• Chromatin IP to enrich promoters bound by regulator in vivo
• Microarray to identify promoters bound by regulator in vivo
Sequence data

Preprocessing
Network inference

[Figure: rate equations in v, v_max, A, K, s]

Benchmarking network inference methodologies
• Network structures based on real biological networks
• Realistic network dynamics
• Simulated networks

Columns: R (regulators), M (motifs), Functional Class: p-value, Seed Profile

Module 1
R: Mbp1, Swi6, Swi4, Stb1
M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_67 (Swi4)
Functional Class: p-value
• 10 CELL CYCLE AND DNA PROCESSING: 0
• 10.03 cell cycle: 2.7e-5
• 10.01 DNA processing: 1.3e-4
• 42.04 cytoskeleton: 4.2e-3

Module 2
R: Swi4, Mbp1, Swi6, FKH2
M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_8 (Mcm)
Functional Class: p-value
• 40 CELL FATE: 5.2e-4
• 40.01 cell growth / morphogenesis: 2.6e-3
• 43 CELL TYPE DIFFERENTIATION: 5.2e-3
• 43.01 fungal/microorganismic cell type differentiation: 5.2e-3
• 34.11 cellular sensing and response: 5.3e-3
• 01.05.01 C-compound and carbohydrate utilization: 6.8e-3
• 10.03.04.03 chromosome condensation: 9.4e-3

Module 3
R: NDD1, FKH2, Mcm1
M: M_8 (Mcm), M_30 (Mcm)
Functional Class: p-value
• 43 CELL TYPE DIFFERENTIATION: 3.6e-3
• 43.01 fungal/microorganismic cell type differentiation: 3.6e-3
• 10.03.03 cytokinesis (cell division) / septum formation: 4.8e-3

Module 4
R: Swi5 (Ace2)
M: M_8 (Mcm)
Functional Class: p-value
• 32.01 stress response: 3.2e-3
• 10.03 cell cycle: 8.7e-3
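The slides do not say which test produced the functional-class p-values. A common choice for this kind of enrichment is the hypergeometric tail probability, sketched here under that assumption; the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(genome_size, class_size, module_size, overlap):
    """Probability of seeing at least `overlap` genes of a functional class
    in a module of `module_size` genes drawn without replacement from a
    genome of `genome_size` genes, `class_size` of which are in the class."""
    upper = min(class_size, module_size)
    total = comb(genome_size, module_size)
    tail = sum(
        comb(class_size, k) * comb(genome_size - class_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the module contains more genes of that class than a random gene set of the same size would be expected to contain.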


Now: the molecular pipeline

Powerful high-throughput technologies enable genome-wide screening
Sequencing, microarrays, etc.
Some genes selected (arbitrarily) for validation
After a long validation, the best-known genes are integrated into a biological model (building predictive models on a limited set of genes is not the subject of the project)

Pipeline stages: Screen, Validate, Model

Future: the systems genomics pipeline

By integrating computation tightly with biological experiments, promising genes are selected and integrated into computational models to retain only the best candidates for validation
There is a continuous interchange between the different levels of analysis

Pipeline stages: Screen, Model, Select, Validate
