Reverse engineering gene regulatory networks

Dirk Husmeier, Adriano Werhli, Marco Grzegorczyk

Systems biology

Learning signalling pathways and regulatory networks from postgenomic data

[Figure: an unknown true network gives rise to postgenomic data through high-throughput experiments; machine learning and statistical methods turn the data into an extracted network.]

Does the extracted network provide a good prediction of the true interactions?

Reverse Engineering of Regulatory Networks

• Can we learn the network structure from postgenomic data themselves?

• Statistical methods to distinguish between
  – Direct interactions
  – Indirect interactions

• Challenge: Distinguish between
  – Correlations
  – Causal interactions

• Breaking symmetries with active interventions:
  – Gene knockouts (VIGS, RNAi)

[Figure: co-regulation of two genes can arise from a direct interaction, a common regulator or an indirect interaction.]

• Relevance networks

• Graphical Gaussian models

• Bayesian networks

Relevance networks (Butte and Kohane, 2000)

1. Choose a measure of association A(.,.)

2. Define a threshold value t_A

3. For all pairs of domain variables (X,Y) compute their association A(X,Y)

4. Connect by an undirected edge those pairs of variables (X,Y) whose association A(X,Y) exceeds the predefined threshold value t_A
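The four steps above can be sketched in a few lines. This is a minimal illustration, assuming the absolute Pearson correlation as the association measure A(.,.); the function name `relevance_network`, the toy data and the threshold 0.5 are hypothetical choices, not taken from Butte and Kohane.

```python
import numpy as np

def relevance_network(data, threshold):
    """Return the undirected edges (i, j), i < j, whose association exceeds threshold.

    data: array of shape (n_observations, n_variables).
    Association measure (an assumption): absolute Pearson correlation.
    """
    assoc = np.abs(np.corrcoef(data, rowvar=False))   # step 3: all pairwise associations
    n_vars = assoc.shape[0]
    return [(i, j)
            for i in range(n_vars)
            for j in range(i + 1, n_vars)
            if assoc[i, j] > threshold]               # step 4: keep pairs above t_A

# Toy data: X0 drives X1, X2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 0.1 * rng.normal(size=200)
x2 = rng.normal(size=200)
edges = relevance_network(np.column_stack([x0, x1, x2]), threshold=0.5)
```

Other association measures, such as mutual information, could be substituted for the correlation without changing the rest of the procedure.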

Association scores

[Figure: ‘direct interaction’, ‘common regulator’ and ‘indirect interaction’ configurations; strong correlation σ12.]

Pairwise associations without taking the context of the system into consideration

Graphical Gaussian Models

Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn)

[Figure: a direct interaction shows up as a strong partial correlation π12; co-regulation can arise from a direct interaction, a common regulator or an indirect interaction.]

Distinguish between direct and indirect interactions

A and B have a low partial correlation
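As a rough illustration of this point, the sketch below simulates a chain A → C → B and compares the plain correlation of A and B with their partial correlation given C. It relies on the standard identity that partial correlations are the negated, rescaled off-diagonal entries of the inverse covariance matrix; the model coefficients and sample size are arbitrary assumptions.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation of every pair given all other variables.

    Uses pi_ij = -omega_ij / sqrt(omega_ii * omega_jj),
    where omega is the inverse of the empirical covariance matrix.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical indirect interaction: A -> C -> B.
rng = np.random.default_rng(1)
n = 2000
a = rng.normal(size=n)
c = a + 0.5 * rng.normal(size=n)
b = c + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, c])

corr_ab = np.corrcoef(data, rowvar=False)[0, 1]   # strong: A and B are correlated
pcor_ab = partial_correlations(data)[0, 1]        # close to zero once C is accounted for
```

A relevance network would connect A and B here, whereas thresholding the partial correlation would not.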

Problem: #observations < #variables, so the empirical covariance matrix is singular and cannot be inverted to obtain the partial correlations Corr(X1,X2|X3,…,Xn).

Shrinkage estimation and the lemma of Ledoit-Wolf
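The idea of shrinkage can be sketched as follows: the empirical covariance matrix is pulled towards a well-conditioned target so that it becomes invertible. In this sketch the target (the diagonal of the empirical covariance) and the shrinkage intensity 0.2 are hand-picked assumptions; the Ledoit-Wolf lemma provides an analytic choice of the intensity, which is not implemented here.

```python
import numpy as np

def shrinkage_covariance(data, intensity):
    """Convex combination (1 - intensity) * S + intensity * T of the empirical
    covariance S and a diagonal target T = diag(S), with 0 < intensity <= 1."""
    s = np.cov(data, rowvar=False)
    target = np.diag(np.diag(s))
    return (1.0 - intensity) * s + intensity * target

rng = np.random.default_rng(2)
data = rng.normal(size=(10, 50))            # 10 observations, 50 variables

s = np.cov(data, rowvar=False)
rank_s = np.linalg.matrix_rank(s)           # rank deficient: S cannot be inverted

s_shrunk = shrinkage_covariance(data, intensity=0.2)   # hand-picked intensity (assumption)
min_eig = np.linalg.eigvalsh(s_shrunk).min()           # strictly positive after shrinkage
precision = np.linalg.inv(s_shrunk)                    # now usable for partial correlations
```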

P(A,B)=P(A)·P(B)

But: P(A,B|C)≠P(A|C)·P(B|C)

Undirected versus directed edges

• Relevance networks and Graphical Gaussian models can only extract undirected edges.

• Bayesian networks can extract directed edges.

• But can we trust these edge directions? It may be better to learn undirected edges than directed edges with false orientations.

Bayesian networks

• Marriage between graph theory and probability theory.

• Directed acyclic graph (DAG) representing conditional independence relations.

• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.

• We can infer how well a particular network explains the observed data.

P(A,B,C,D,E,F) = P(A)·P(B|A)·P(C|A)·P(D|B,C)·P(E|D)·P(F|C,D)

Bayesian networks versus causal networks

Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

True causal graph

Node A unknown

• Equivalence classes: networks with the same scores: P(D|M).

• Equivalent networks cannot be distinguished in light of the data.

Equivalence classes of BNs

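A ← C ← B:
P(A,B,C) = P(A|C)·P(C|B)·P(B)
         = P(A|C)·P(C|B)·P(B)·P(C)⁻¹·P(C)
         = P(A|C)·P(B|C)·P(C)

A → C → B:
P(A,B,C) = P(A)·P(C|A)·P(B|C)
         = P(A)·P(C,A)·P(A)⁻¹·P(B,C)·P(C)⁻¹
         = P(A|C)·P(B,C)
         = P(A|C)·P(B|C)·P(C)

A ← C → B:
P(A,B,C) = P(A|C)·P(B|C)·P(C)

A → C ← B:
P(A,B,C) = P(A)·P(B)·P(C|A,B)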

Completed partially directed acyclic graphs (CPDAGs)

v-structure (A → C ← B):

P(A,B)=P(A)·P(B)

P(A,B|C)≠P(A|C)·P(B|C)

Other three-node structures (A → C → B, A ← C ← B, A ← C → B):

P(A,B)≠P(A)·P(B)

P(A,B|C)=P(A|C)·P(B|C)
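The independence relations of the v-structure can be checked numerically. The sketch below builds the joint distribution of a binary v-structure A → C ← B from its factorisation P(A)·P(B)·P(C|A,B) and tests both relations; the probability tables are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical binary v-structure A -> C <- B.
p_a = np.array([0.5, 0.5])                     # P(A)
p_b = np.array([0.5, 0.5])                     # P(B)
p_c1 = np.array([[0.1, 0.8],
                 [0.8, 0.9]])                  # P(C=1 | A=a, B=b), a noisy OR
p_c_given_ab = np.stack([1 - p_c1, p_c1], axis=-1)   # indexed [a, b, c]

# Joint from the factorisation P(A,B,C) = P(A) P(B) P(C|A,B).
joint = p_a[:, None, None] * p_b[None, :, None] * p_c_given_ab

# Marginally: P(A,B) = P(A) P(B).
p_ab = joint.sum(axis=2)
marginally_independent = np.allclose(p_ab, np.outer(p_a, p_b))

# Conditional on C = 1: compare P(A,B|C=1) with P(A|C=1) P(B|C=1).
p_ab_c1 = joint[:, :, 1] / joint[:, :, 1].sum()
p_a_c1 = p_ab_c1.sum(axis=1)
p_b_c1 = p_ab_c1.sum(axis=0)
conditionally_independent = np.allclose(p_ab_c1, np.outer(p_a_c1, p_b_c1))
```

With these tables `marginally_independent` is true and `conditionally_independent` is false, which is the signature that makes the v-structure identifiable from observational data.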

Symmetry breaking

•Interventions

•Prior knowledge

Interventional data

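[Figure: A and B are correlated. If A → B, inhibition of A leads to down-regulation of B; if B → A, inhibition of A has no effect on B.]

Observational data:
P(D|M) = ∏_{i=1..n} P(X_i[D] | pa[X_i][D])

Interventional data:
P(D|M) = ∏_{i=1..n} P(X_i[D_{−i}] | pa[X_i][D_{−i}])

where D_{−i} denotes the data in which X_i was not the target of an intervention.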
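How an intervention breaks the symmetry can be illustrated with a small simulation. The sketch assumes a hypothetical linear Gaussian model A → B and an ideal intervention that clamps the targeted node to a fixed value; the coefficients, the clamp value and the function name `simulate` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

def simulate(clamp_a=None, clamp_b=None):
    """Sample from the hypothetical model A -> B.

    An ideal intervention clamps a node to a fixed value,
    which cuts the node off from its parents.
    """
    a = rng.normal(size=n) if clamp_a is None else np.full(n, clamp_a)
    b = (a + 0.3 * rng.normal(size=n)) if clamp_b is None else np.full(n, clamp_b)
    return a, b

a_obs, b_obs = simulate()                         # observational data
_, b_when_a_inhibited = simulate(clamp_a=-2.0)    # inhibition of A
a_when_b_inhibited, _ = simulate(clamp_b=-2.0)    # inhibition of B

corr_obs = np.corrcoef(a_obs, b_obs)[0, 1]                  # A and B are correlated
shift_b = b_when_a_inhibited.mean() - b_obs.mean()          # B is down-regulated
shift_a = a_when_b_inhibited.mean() - a_obs.mean()          # A is essentially unchanged
```

The observational correlation alone is symmetric in A and B; only the asymmetric response to the two interventions reveals the direction of the edge.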

Learning Bayesian networks from data

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data
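For a handful of candidate structures the normalisation by Z can be carried out directly. The sketch below does this in log space; the three log-likelihood values and the uniform prior are invented for illustration, and for realistic numbers of nodes the set of possible structures is far too large to enumerate in this way.

```python
import numpy as np

def structure_posterior(log_marginal_likelihoods, log_priors):
    """P(M|D) = P(D|M) P(M) / Z, computed in log space for numerical stability."""
    log_unnorm = np.asarray(log_marginal_likelihoods) + np.asarray(log_priors)
    log_unnorm = log_unnorm - log_unnorm.max()    # log-sum-exp trick
    unnorm = np.exp(log_unnorm)
    return unnorm / unnorm.sum()                  # division by Z

# Hypothetical log P(D|M) for three candidate structures, uniform prior P(M).
log_lik = [-1050.0, -1048.0, -1060.0]
log_prior = np.log(np.full(3, 1.0 / 3.0))
posterior = structure_posterior(log_lik, log_prior)
```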

Evaluation

• On real experimental data, using the gold standard network from the literature

• On synthetic data simulated from the gold-standard network

Evaluation

From Sachs et al., Science 2005

Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Deregulation → carcinogenesis

• Extensively studied in the literature → gold-standard network

Raf regulatory network

From Sachs et al., Science 2005

Flow cytometry data

• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins

• 5400 cells have been measured under 9 different cellular conditions (cues)

• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments

Two types of experiments

Evaluation

Comparison with simulated data 1

Raf pathway

Comparison with simulated data 2

Steady-state approximation

Real versus simulated data

• Real biological data: full complexity of biological systems.

• The “gold-standard” only represents our current state of knowledge; it is not guaranteed to represent the true network.

• Simulated data: Simplifications that might be biologically unrealistic.

• We know the true network.

How can we evaluate the reconstruction accuracy?

[Figure: the extracted network is compared with the true network, represented by biological knowledge (gold standard network), to evaluate the learning performance.]

Performance evaluation: ROC curves

• We use the Area Under the Receiver Operating Characteristic Curve (AUC).

[Figure: ROC curves with AUC=0.5, 0.5<AUC<1 and AUC=1.]
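One way to compute the AUC from a list of edge scores is the rank-sum identity: the AUC equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. The sketch below uses this identity; the function name and the six example scores are hypothetical.

```python
import numpy as np

def auc_from_scores(scores, is_true_edge):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Counts the fraction of (true edge, non-edge) pairs in which the true edge
    has the higher score; ties count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_true_edge, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical scores for six candidate edges; 1 marks a gold-standard edge.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]
auc = auc_from_scores(scores, truth)              # 8 of 9 pairs ranked correctly
auc_perfect = auc_from_scores([0.9, 0.8, 0.1], [1, 1, 0])
```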

Alternative performance evaluation: True positive (TP) scores

We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).
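A minimal sketch of this count, assuming that edges are ranked by score and that ties do not occur; the function name `tp_at_fp` and the example ranking are hypothetical.

```python
import numpy as np

def tp_at_fp(scores, is_true_edge, n_fp=5):
    """Number of true edges recovered at the lowest threshold that admits
    at most n_fp spurious edges (ties between scores are not handled)."""
    order = np.argsort(scores)[::-1]                  # highest score first
    labels = np.asarray(is_true_edge, dtype=bool)[order]
    fp_cum = np.cumsum(~labels)                       # spurious edges admitted so far
    tp_cum = np.cumsum(labels)                        # true edges admitted so far
    within_budget = fp_cum <= n_fp
    return int(tp_cum[within_budget][-1]) if within_budget.any() else 0

# Hypothetical ranking of 11 candidate edges; 1 marks a gold-standard edge.
scores = np.linspace(1.0, 0.0, 11)
truth = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
tp_count = tp_at_fp(scores, truth, n_fp=5)
```

In this example four true edges are ranked above the sixth spurious edge, so `tp_count` is 4.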

Directed graph evaluation - DGE

[Figure: directed edge scores are thresholded to give concrete network predictions, which are compared with the true regulatory network. High threshold: TP 1/2, FP 0/4. Low threshold: TP 2/2, FP 1/4.]

Undirected graph evaluation - UGE

[Figure: undirected edge scores are thresholded to give concrete network (skeleton) predictions, which are compared with the skeleton of the true regulatory network. High threshold: TP 1/2, FP 0/1. Low threshold: TP 2/2, FP 1/1.]

Synthetic data, observations

Synthetic data, interventions

Cytometry data, interventions

How can we explain the difference between synthetic and real data?

Simulated data are “simpler”.

No mismatch between models used for data generation and inference.

Complications with real data

Can we trust our gold-standard network?

Raf regulatory network

From Sachs et al., Science 2005

Dougherty et al., Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005

Disputed structure of the gold-standard network

Stabilisation through negative feedback loops (inhibition)

Complications with real data

Interventions might not be “ideal” owing to negative feedback loops.

Conclusions 1

• BNs and GGMs outperform RNs, most notably on Gaussian data.

• No significant difference between BNs and GGMs on observational data.

• For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

Conclusions 2

Performance on synthetic data better than on real data.

• Real data: more complex

• Real interventions are not ideal

• Errors in the gold-standard network

How do we model feedback loops?

Unfolding in time
