BIGDATA Workshop


Slide 1/94

    Big Challenges with Big Data

    in Life Sciences

    Shankar Subramaniam

    UC San Diego

Slide 2/94

    The Digital Human

Slide 3/94

A Super-Moore's Law

    Adapted from Lincoln Stein 2012

Source: http://ivory.idyll.org/blog/cloud-not-the-solution.html
Slide 4/94

    The Phenotypic Readout

Slide 5/94

    Data to Networks to Biology

Slide 6/94

NETWORK RECONSTRUCTION

Data-driven network reconstruction of biological systems:

- Derive relationships between input/output data
- Represent the relationships as a network

Inverse Problem: Data-driven Network Reconstruction

Experiments/Measurements

Slide 7/94

Network Reconstructions: Reverse Engineering of Biological Networks

Reverse engineering of biological networks:

- Structural identification: to ascertain network structure or topology.
- Identification of dynamics: to determine interaction details.

Main approaches:

- Statistical methods
- Simulation methods
- Optimization methods
- Regression techniques
- Clustering

Slide 8/94

Network Reconstruction of Dynamic Biological Systems: Doubly Penalized LASSO

Behrang Asadi*, Mano R. Maurya*, Daniel Tartakovsky, Shankar Subramaniam

Department of Bioengineering, University of California, San Diego

NSF grants (STC-0939370, DBI-0641037 and DBI-0835541); NIH grant 5 R33 HL087375-02

* Equal effort

Slide 9/94

APPLICATION: Phosphoprotein signaling and cytokine measurements in RAW 264.7 macrophage cells.

Slide 10/94

MOTIVATION FOR THE NOVEL METHOD

Various methods:

- Regression-based approaches (least-squares) with statistical significance testing of coefficients
- Dimensionality reduction to handle correlation: PCR and PLS
- Optimization/shrinkage (penalty)-based approach: LASSO
- Partial-correlation and probabilistic model/Bayesian-based approaches

Different methods have distinct advantages/disadvantages. Can we benefit by combining the methods and compensate for the disadvantages?

A novel method: Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO), which incorporates both statistical significance testing and shrinkage.

Slide 11/94

    LINEAR REGRESSION

    Goal: Building a linear-relationship based model

    X: input data (m samples by n inputs), zero mean, unit standard deviation

    y: output data (m samples by 1 output column), zero-mean

    b: model coefficients: translates into the edges in the network

    e: normal random noise with zero mean

Ordinary Least Squares solution:

$\hat{b} = \arg\min_b \{(y - Xb)^T (y - Xb)\} = (X^T X)^{-1} X^T y$

where $y = Xb + e$, $e \sim N(0, \sigma^2)$.

Formulation for dynamic systems:

$\frac{dy(t)}{dt} = Xb + e(t), \quad e(t) \sim N(0, \sigma^2)$
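As a quick illustration of the formulas above, here is a minimal numpy sketch (made-up dimensions and coefficients, not data from the talk) that generates y = Xb + e and recovers b by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.standard_normal((m, n))                 # input data: m samples by n inputs
b_true = np.array([1.5, 0.0, -2.0, 0.0, 0.7])   # illustrative coefficients
y = X @ b_true + 0.1 * rng.standard_normal(m)   # y = Xb + e, e ~ N(0, sigma^2)

# b_hat = (X^T X)^{-1} X^T y, computed via a least-squares solver for stability
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)
```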

Slide 12/94

STATISTICAL SIGNIFICANCE TESTING

Most coefficients come out non-zero, a mathematical artifact. Perform statistical significance testing: compute the standard deviation of the coefficients.

Ratio: $r_{ij,k} = b_{ij,k} / \sigma_{b_{ij,k}}$

A coefficient is significant (different from zero) if: $|r_{ij}| > \mathrm{tinv}(1 - \alpha/2, \nu)$, where $\nu$ = degrees of freedom and $1 - \alpha$ = confidence level.

For least squares: $\sigma_{b,\mathrm{LS}} = \mathrm{diag}\{(X^T X)^{-1}\}^{1/2}\,\mathrm{RMSE}_{\mathrm{LS}}$ with $\nu = m - n - 1$, and $\mathrm{RMSE}_{\mathrm{LS}} = \left(\sum_{i=1}^{m} (y_i - y_{p,i})^2 / (m - p - 1)\right)^{1/2}$.

Edges in the network graph represent the coefficients.

* Krämer, Nicole, and Masashi Sugiyama. "The degrees of freedom of partial least squares regression." Journal of the American Statistical Association 106.494 (2011): 697-705.
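A hedged sketch of this t-test, assuming the OLS setup from the previous snippet; `significant_coefficients` is an illustrative helper, not code from the talk:

```python
import numpy as np
from scipy import stats

def significant_coefficients(X, y, b_hat, alpha=0.05):
    m, n = X.shape
    resid = y - X @ b_hat
    dof = m - n - 1                                   # nu = m - n - 1
    rmse = np.sqrt(resid @ resid / dof)
    sigma_b = np.sqrt(np.diag(np.linalg.inv(X.T @ X))) * rmse
    r = b_hat / sigma_b                               # ratio r = b / sigma_b
    t_crit = stats.t.ppf(1 - alpha / 2, dof)          # Matlab's tinv(1 - alpha/2, nu)
    return np.abs(r) > t_crit                         # True where a coefficient is significant
```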

Slide 13/94

CORRELATED INPUTS: PLS

Partial least squares finds the direction in the X space that explains the maximum variance direction in the Y space.

PLS regression is used when the number of observations per variable is low and/or collinearity exists among X values.

Requires an iterative algorithm: NIPALS, SIMPLS, etc. Statistical significance testing is iterative.

$X = TP^T + E$
$Y = UQ^T + F$
$Y = XB + B_0$

* H. Wold (1975), Soft modelling by latent variables; the non-linear iterative partial least squares approach, in Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett, J. Gani, ed., Academic Press, London.
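For readers who want to try this, scikit-learn's PLSRegression (a NIPALS-style iteration) can serve as a stand-in; the data below are synthetic and only meant to show the collinearity use case:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(50)   # nearly collinear columns
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(50)

pls = PLSRegression(n_components=3)   # number of latent components to extract
pls.fit(X, y)
print(pls.coef_.ravel())              # regression coefficients B in Y = XB + B0
```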

Slide 14/94

LASSO

Shrinkage version of the Ordinary Least Squares, subject to an L1 penalty constraint (the sum of the absolute values of the coefficients should be less than a threshold).

The LASSO estimator is then defined as:

$(\hat{b}_0, \hat{b}) = \arg\min \sum_{i=1}^{N} \Big( y_i - b_0 - \sum_j b_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_j |b_j| \le t \sum_j |\hat{b}_j^0|$

where $\hat{b}^0$ represents the full least squares estimates and $0 < t < 1$ causes the shrinkage.

* Tibshirani, R.: Regression shrinkage and selection via the Lasso, J. Roy. Stat. Soc. B Met., 1996, 58, (1), pp. 267-288.
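A small sketch of LASSO in practice. Note that scikit-learn's Lasso solves the equivalent penalized (Lagrangian) form rather than the constrained form shown above; the mapping between its alpha and the threshold t is data-dependent:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 8))
b_true = np.array([3.0, 0.0, 0.0, -1.5, 0.0, 0.0, 2.0, 0.0])
y = X @ b_true + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)   # alpha controls the L1 penalty strength
print(lasso.coef_)                   # several entries are shrunk exactly to zero
```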

Slide 15/94

    Noise and Missing Data

A more systematic comparison is needed with respect to:

1. Noise: level, type
2. Size (dimension)
3. Level of missing data
4. Collinearity or dependency among input channels
5. Missing data
6. Nonlinearity between inputs/outputs and nonlinear dependency
7. Time-series inputs(/outputs) and dynamic structure

Slide 16/94

    METHODS

Linear Matrix Inequalities (LMI)*

Converts a nonlinear optimization problem into a linear optimization problem:

$\min_B \varepsilon \quad \text{s.t.} \quad (Y - XB)^T (Y - XB) \preceq \varepsilon I$

Congruence transformation (Schur-complement form):

$\begin{bmatrix} \varepsilon I_m & Y - XB \\ (Y - XB)^T & I_p \end{bmatrix} \succeq 0$

Pre-existing knowledge of the system (e.g. sign constraints such as $a_{13} \ge 0$, $a_{21} \le 0$) can be added in the form of LMI constraints $v_i^T (B - B_0) u_j \ge 0$, where the indicator vectors $v_r$ and $u_r$ (entries 0 or 1) select the constrained coefficients.

Threshold the coefficients: $\bar{b}_{ij} = b_{ij} / \sqrt{\overline{b_i^2}\,\overline{b_j^2}}$

* [Cosentino, C., et al., IET Systems Biology, 2007. 1(3): p. 164-173]

Slide 17/94

METRICS

Metrics for comparing the methods:

- Reconstruction from 80% of the datasets, with 20% held out for validation.
- RMSE on the test set, and the number and identity of the significant predictors, as the basic metrics to evaluate the performance of each method.

1. Fractional error in estimating the parameters:

$b_{\mathrm{frac},j} = \mathrm{mean}\left( \left| 1 - \frac{b_{\mathrm{method},j}}{b_{\mathrm{true},j}} \right| \right)$

(Parameters smaller than 10% of the standard deviation of all parameter values were set to 0 when generating the synthetic data.)

2. Sensitivity, specificity, G, accuracy:

$\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}$

TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative.
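These counts translate directly into code; a minimal sketch, assuming true and recovered edge sets are given as boolean arrays (the function name is illustrative):

```python
import numpy as np

def edge_metrics(true_edges, found_edges):
    tp = np.sum(true_edges & found_edges)     # true positives
    tn = np.sum(~true_edges & ~found_edges)   # true negatives
    fp = np.sum(~true_edges & found_edges)    # false positives
    fn = np.sum(true_edges & ~found_edges)    # false negatives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    g = np.sqrt(sensitivity * specificity)    # geometric mean of sensitivity and specificity
    return sensitivity, specificity, g, accuracy
```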

Slide 18/94

    RESULTS: DATA SETS

Data sets for benchmarking: two data sets.

1. First set: experimental data measured on macrophage cells (Phosphoprotein (PP) vs Cytokine)*.

2. Second set: synthetic data generated in Matlab. We build the model using 80% of the data set (the training set) and use the rest of the data set to validate the model (the test set).

* [Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].

Slide 19/94

    RESULTS: PP-Cytokine Data Set

Schematic representation of Phosphoprotein (PP) vs Cytokine.

- Signals were transmitted through 22 recorded signaling proteins and other (unmeasured) pathways.
- Only measured pathways contributed to the analysis.

Schematic graphs from: [Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].

Slide 20/94

    PP-CYTOKINE DATASET

    Measurements of phosphoproteins in response to LPS

    Courtesy: AfCS

Slide 21/94

Measurements of cytokines in response to LPS

    ~ 250 such datasets

Slide 22/94

    RESULTS: COMPARISON

Comparison on synthetic noisy data: the methods are applied to synthetic data with 22 inputs and 1 output.

About one third of the true input coefficients are made zero to test whether the methods identify them as insignificant.

Effect of noise level: four outputs with 5, 10, 20 and 40% noise levels, respectively, are generated from the noise-free (true) output.

Effect of noise type: three outputs with white, t-distributed, and uniform noise types, respectively, are generated from the noise-free (true) output.

Slide 23/94

RESULTS: COMPARISON

Variability between realizations of data with white noise: PCR, LASSO, and LMI are used to identify significant predictors for 1000 input-output pairs.

Histograms of the coefficients in the three significant predictors common to the three methods:

Predictor #                     1       10      11
True value                   -3.40    5.82   -6.95
PCR     Mean                 -3.81    4.73   -6.06
        Std.                  0.33    0.32    0.32
        Frac. err. in mean    0.12    0.19    0.13
LASSO   Mean                 -2.82    4.48   -5.62
        Std.                  0.34    0.32    0.33
        Frac. err. in mean    0.17    0.23    0.19
LMI     Mean                 -3.70    4.74   -6.34
        Std.                  0.34    0.32    0.34
        Frac. err. in mean    0.09    0.18    0.09

Mean and standard deviation in the histograms of the coefficients computed with PCR, LASSO, and LMI.

Slide 24/94

RESULTS: COMPARISON

Comparison of the outcomes of the different methods on the real data: different methods identified unique sets of common and distinct predictors for each output.

Graphical illustration of the PCR, LASSO, and LMI methods in detection of significant predictors for output IL-6 in the PP/cytokine experimental dataset:

- Only the PCR method detects the true input cAMP.
- Zone I provides validation; it highlights the common output of all the methods.

Slide 25/94

    RESULTS: SUMMARY

Comparison with respect to different noise types: LASSO is the most robust method across noise types.

Missing data RMSE: LASSO shows less deviation and is more robust.

Collinearity: PCR shows less deviation against noise level, and better accuracy and G with increasing noise level.

Slide 26/94

A COMPARISON (Asadi, et al., 2012)

Criterion (score definition)                                              PCR    LASSO   LMI

Increasing noise: RMSE
  Score = (average RMSE across noise levels for LS) / (average RMSE
  across noise levels for the chosen method)                              0.68*   0.56   0.94

Standard deviation and error in mean of coefficients
  Score = 1 - average(fractional error in mean(10,12,20) +
  std(10,12,20) / |true associated coefficients|)                         0.53    0.47   0.55

Acc./G
  Score = average accuracy across noise levels for the chosen method
  (white noise)                                                           0.70    0.87   0.91**

Fractional error in estimating the parameters
  Score = 1 - average fractional error in estimating the coefficients
  across noise levels (white noise)                                       0.81    0.55   0.78

Types of noise: fractional error in estimating the parameters
  Score = 1 - average fractional error in estimating the coefficients
  across noise levels and noise types (20% noise level)                   0.80    0.56   0.79

Types of noise: accuracy and G
  Score = average accuracy across noise levels and noise types            0.71    0.87   0.91

Dimension ratio / size: fractional error in estimating the parameters
  Score = 1 - average fractional error in estimating the coefficients
  across noise levels and ratios (m/n = 100/25, 100/50, 400/100)          0.77    0.53   0.75

Dimension ratio / size: accuracy and G
  Score = average accuracy across white noise levels and ratios
  (m/n = 100/25, 100/50, 400/100)                                         0.66    0.83   0.90

* PCR degrades gradually with the level of noise.
** At high noise all methods are similar.

Slide 27/94

    DPLASSO

Doubly Penalized Least Absolute Shrinkage and Selection Operator

Slide 28/94

OUR APPROACH: DPLASSO

Model: $y = Xb + e$

PLS with statistical significance testing produces a weight vector $W = \{0, 1, 0, 1, 0, 1, 0, 1, ...\}$ over the coefficients $B = \{b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8, ...\}$, marking which coefficients are PLS-significant.

LASSO is then run with these weights, and the reconstructed network retains the surviving coefficients, e.g. $B = \{b_1, b_3, b_5, b_6, b_7, ...\}$.

Slide 29/94

DPLASSO WORKFLOW

Our approach: DPLASSO includes two parameter-selection layers.

Layer 1 (supervisory layer):
- Partial Least Squares (PLS)
- Statistical significance testing

Layer 2 (lower layer):
- LASSO with extra weights on the less informative model parameters derived in layer 1
- Retain significant predictors and set the remaining small coefficients to zero

$\hat{b} = \arg\min \{(y - Xb)^T (y - Xb)\} \quad \text{s.t.} \quad \sum_{i=1,...,p} w_{ij} |b_{ij}| \le t \sum_{i=1,...,p} w_{ij} |b_{ij}^{LS}|$

$w_{ij} = \begin{cases} 0 & \text{if } b_{ij} \text{ is PLS-significant} \\ 1 & \text{otherwise} \end{cases}$
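A minimal sketch of this two-layer idea (my reading of the scheme above, not the authors' implementation): the PLS-derived weights are folded into a standard LASSO by rescaling columns, which turns the weighted L1 penalty into a plain one:

```python
import numpy as np
from sklearn.linear_model import Lasso

def dplasso_sketch(X, y, weights, alpha=0.1, eps=1e-6):
    """weights[j] ~ 0 for PLS-significant coefficients, ~ 1 otherwise."""
    w = np.maximum(weights, eps)   # floor so unpenalized columns stay finite
    Xs = X / w                     # column rescaling: sum_j w_j|b_j| becomes a plain L1 penalty
    fit = Lasso(alpha=alpha).fit(Xs, y)
    return fit.coef_ / w           # map the solution back to the original coefficients
```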

Slide 30/94

DPLASSO: EXTENDED VERSION

Smooth weights:

Layer 1: continuous significance score (versus binary):

$\lambda_i = |r_i^{\mathrm{PLS}}| - \mathrm{tinv}(1 - \alpha/2, \nu)$, with $\nu$ = DOF and $1 - \alpha$ = confidence level.

Mapping function (logistic significance score):

$s(\lambda_i) = \frac{1}{1 + e^{-\gamma \lambda_i}}$, where $\gamma$ is a tuning parameter.

Layer 2: continuous weight vector (versus fuzzy weight vector):

$w_i(\lambda_i) = 1 - s(\lambda_i)$, so significant coefficients ($0.5 < s(\lambda_i) \le 1$) receive weights $0 \le w_i < 0.5$, and insignificant coefficients ($0 \le s(\lambda_i) \le 0.5$) receive weights $0.5 \le w_i \le 1$.

$\hat{b} = \arg\min \{(y - Xb)^T (y - Xb)\} \quad \text{s.t.} \quad \sum_{i=1,...,p} w_i |b_{ij}| \le t \sum_{i=1,...,p} w_i |b_{ij}^{LS}|$

[Figure: logistic significance score s(λ) and weight function w(λ) plotted against the significance score λ.]

Slide 31/94

    APPLICATIONS

1. Synthetic (random) networks: datasets generated in Matlab

2. Biological dataset: Saccharomyces cerevisiae cell-cycle model

Slide 32/94

    SYNTHETIC (RANDOM) NETWORKS

Datasets generated in Matlab using:

- A linear dynamic system
- Dominant poles/eigenvalues (λ) in the range [-2, 0]
- Lyapunov stable. Informal definition from Wikipedia: if all solutions of the dynamical system that start out near an equilibrium point x_e stay near x_e forever, then the system is Lyapunov stable.
- Zero-input/excited-state release condition
- 5% measurement (white) noise

$\frac{dy(t)}{dt} = Xb + e(t), \quad e(t) \sim N(0, \sigma^2)$

Slide 33/94

METRICS

Two metrics to evaluate the performance of DPLASSO:

1. Sensitivity, specificity, G (geometric mean of sensitivity and specificity), accuracy:

$\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}$

TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative.

2. The root-mean-squared error (RMSE) of prediction:

$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_{i,p})^2}$

Slide 34/94

TUNING

Tuning the shrinkage parameter for DPLASSO: the shrinkage parameter at the LASSO level (threshold t) is selected via k-fold cross-validation (k = 10) on the associated dataset.

[Figure: validation error versus selection threshold t for DPLASSO on the synthetic data set.]

Rule of thumb after cross-validation. Example: the optimal value of the tuning parameter for a network with 65% connectivity is roughly equal to 0.65.

Slide 35/94

PERFORMANCE COMPARISON: ACCURACY

[Figure: accuracy surfaces for LASSO, DPLASSO, and PLS at network densities 5%, 10%, 20%, and 50%. Network size 20, MC 10, noise 5%.]

PLS shows better performance here. DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network densities.

Slide 36/94

PERFORMANCE COMPARISON: SENSITIVITY

[Figure: sensitivity surfaces for LASSO, DPLASSO, and PLS at network densities 5%, 10%, 20%, and 50%. Network size 20, MC 10, noise 5%.]

LASSO has better performance here. DPLASSO provides a good compromise between LASSO and PLS in terms of sensitivity for different network densities.

Slide 37/94

PERFORMANCE COMPARISON: SPECIFICITY

[Figure: specificity surfaces for LASSO, DPLASSO, and PLS at network densities 5%, 10%, 20%, and 50%. Network size 20, MC 10, noise 5%.]

DPLASSO provides a good compromise between LASSO and PLS in terms of specificity for different network densities.

Slide 38/94

PERFORMANCE COMPARISON: NETWORK SIZE

[Figure: accuracy surfaces for LASSO, DPLASSO, and PLS at network sizes 10 (100 potential connections), 20 (400 potential connections), and 50 (2500 potential connections).]

DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network sizes, and likewise in terms of sensitivity (not shown).

Slide 39/94

ROC CURVE vs. DYNAMICS AND WEIGHTINGS

[Figure: ROC curves (sensitivity versus specificity) for LASSO, DPLASSO, and PLS as the tuning parameter varies. Density 20%, MC 10, size 50.]

DPLASSO exhibits better performance for networks with slow dynamics. The tuning parameter in DPLASSO can be adjusted to improve performance for fast dynamic networks.

Slide 40/94

    YEAST CELL DIVISION

Experimental dataset generated via a well-known nonlinear model of the cell division cycle of fission yeast. The model is dynamic, with 9 state variables.

* Novak, Bela, et al. "Mathematical model of the cell division cycle of fission yeast." Chaos: An Interdisciplinary Journal of Nonlinear Science 11.1 (2001): 277-286.

Slide 41/94

    CELL DIVISION CYCLE

[Figure: the true network (cell division cycle) alongside the networks reconstructed by PLS, DPLASSO, and LASSO; one true edge is missing in DPLASSO.]

Slide 42/94

    RECONSTRUCTION PERFORMANCE

Case Study II: Cell Division Cycle (average over tuning-parameter values)

Method    Accuracy  Sensitivity  Specificity  SD RMSE/Mean
LASSO       0.31       0.92         0.16          0.14
DPLASSO     0.56       0.73         0.52          0.08
PLS         0.60       0.67         0.63          0.09

Case Study I: 10 Monte Carlo simulations, size 20 (average over different tuning parameters, network density, and Monte Carlo sample datasets)

Method    Accuracy  Sensitivity  Specificity  SD RMSE/Mean
LASSO       0.39       0.90         0.05          0.06
DPLASSO     0.52       0.90         0.34          0.07
PLS         0.59       0.80         0.20          0.07

Slide 43/94

    CONCLUSION

A novel method, Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO), to reconstruct dynamic biological networks:

- Based on integration of significance testing of coefficients and optimization
- Smoothing function to trade off between PLS and LASSO

Simulation results on synthetic datasets: DPLASSO provides a good compromise between PLS and LASSO in terms of accuracy and sensitivity for

- Different network densities
- Different network sizes

For the biological dataset:

- DPLASSO is best in terms of sensitivity
- DPLASSO is a good compromise between LASSO and PLS in terms of accuracy, specificity and lift

Slide 44/94

Information Theory Methods

Farzaneh Farangmehr

Slide 45/94

Mutual Information

Mutual information gives us a metric indicative of how much information about one variable can be obtained to predict the behavior of the other variable. The higher the mutual information, the more similar the two profiles.

For two discrete random variables X = {x1,...,xn} and Y = {y1,...,ym}:

$I(X;Y) = \sum_{j=1}^{m} \sum_{i=1}^{n} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}$

where $p(x_i, y_j)$ is the joint probability of $x_i$ and $y_j$, and $p(x_i)$ and $p(y_j)$ are the marginal probabilities of $x_i$ and $y_j$.
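A direct transcription of this double sum, assuming the joint distribution is given as a probability table (the function name is illustrative):

```python
import numpy as np

def mutual_information(p_xy):
    """p_xy[i, j] = p(x_i, y_j): a joint probability table that sums to 1."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginals p(x_i)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginals p(y_j)
    mask = p_xy > 0                         # convention: 0 * log 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))
```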

Slide 46/94

Information-theoretical approach: Shannon theory

Hartley's conceptual framework of information relates the information of a random variable to its probability.

Shannon defined the entropy H of a random variable X, given a random sample {x1,...,xn}, in terms of its probability distribution:

$H(X) = \sum_{i=1}^{n} P(x_i) I(x_i) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$

Entropy is a good measure of randomness or uncertainty.

Shannon defines mutual information as the amount of information about a random variable X that can be obtained by observing another random variable Y:

$I(X,Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = I(Y,X)$

Slide 47/94

Mutual information networks

X = {x1,...,xi}, Y = {y1,...,yj}

The ultimate goal is to find the best model that maps X → Y. The general definition is Y = f(X) + U; in linear cases, Y = [A]X + U, where [A] is a matrix that defines the linear dependency of inputs and outputs.

Information theory maps inputs to outputs (both linear and non-linear models) by using the mutual information:

$I(X;Y) = \sum_{j=1}^{m} \sum_{i=1}^{n} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}$

Slide 48/94

Mutual information networks

The entire framework of network reconstruction using information theory has two stages:

1. Mutual information measurements
2. The selection of a proper threshold

Mutual information networks rely on the measurement of the mutual information matrix (MIM). The MIM is a square matrix whose elements (MIM_ij = I(X_i; Y_j)) are the mutual information between X_i and Y_j.

Choosing a proper threshold is a non-trivial problem. The usual way is to permute the expression measurements many times and recalculate a distribution of the mutual information for each permutation. The distributions are then averaged, and a good choice for the threshold is the largest mutual information value in the averaged permuted distribution.

Slide 49/94

Mutual information networks: Data Processing Inequality (DPI)

The DPI for biological networks states that if genes g1 and g3 interact only through a third gene, g2, then:

$I(g_1, g_3) \le \min[I(g_1, g_2), I(g_2, g_3)]$

Checking against the DPI may identify those gene pairs which are not directly dependent, even if $p(g_i, g_j) \ne p(g_i)\,p(g_j)$.

Slide 50/94

    ARACNe algorithm

ARACNE flowchart [Califano and coworkers]

ARACNE stands for Algorithm for the Reconstruction of Accurate Cellular NEtworks [25].

ARACNE identifies candidate interactions by estimating pairwise gene expression profile mutual information, I(gi, gj), and then filters the MIs using an appropriate threshold, I0, computed for a specific p-value, p0. In the second step, ARACNE removes the vast majority of indirect connections using the Data Processing Inequality (DPI).
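A compact sketch of these two steps (thresholding, then DPI pruning) on a symmetric mutual-information matrix; parameter names are illustrative and this is not the reference implementation:

```python
import numpy as np
from itertools import combinations

def aracne_sketch(mim, i0, tol=0.0):
    """mim: symmetric MI matrix; i0: MI threshold; tol: DPI tolerance."""
    adj = np.where(mim > i0, mim, 0.0)             # step 1: hard-threshold the MI matrix
    n = adj.shape[0]
    for i, j, k in combinations(range(n), 3):      # step 2: Data Processing Inequality
        for a, b, via in ((i, j, k), (i, k, j), (j, k, i)):
            if adj[a, b] and adj[a, b] < min(adj[a, via], adj[via, b]) - tol:
                adj[a, b] = adj[b, a] = 0.0        # weakest edge of the triangle is indirect
    return adj
```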

Slide 51/94

Protein-Cytokine Network in Macrophage Activation

Slide 52/94

Application to Protein-Cytokine Network Reconstruction

Release of immune-regulatory cytokines during the inflammatory response is mediated by a complex signaling network [45]. Current knowledge does not provide a complete picture of these signaling components.

22 signaling proteins responsible for cytokine release: cAMP, AKT, ERK1, ERK2, Ezr/Rdx, GSK3A, GSK3B, JNK lg, JNK sh, MSN, p38, p40Phox, NFkB p65, PKCd, PKCmu2, RSK, Rps6, SMAD2, STAT1a, STAT1b, STAT3, STAT5

7 released cytokines (as signal receivers): G-CSF, IL-1a, IL-6, IL-10, MIP-1a, RANTES, TNFa

We developed an information-theoretic model that derives the responses of seven cytokines from the activation of twenty-two signaling phosphoproteins in RAW 264.7 macrophages. This model captured most known signaling components involved in cytokine release and was able to reasonably predict potentially important novel signaling components.

Slide 53/94

Protein-Cytokine Network Reconstruction: MI Estimation using KDE

Given a random sample $\{x_1, ..., x_n\}$ for a univariate random variable X with an unknown density f, a kernel density estimator (KDE) estimates the shape of this function as:

$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} k_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right)$

Assuming Gaussian kernels:

$\hat{f}_h(x) = \frac{1}{\sqrt{2\pi}\,nh} \sum_{i=1}^{n} \exp\left(-\frac{(x - x_i)^2}{2h^2}\right)$

The bivariate kernel density function of two random variables X and Y, given two random samples $\{x_1, ..., x_n\}$ and $\{y_1, ..., y_n\}$:

$\hat{f}_h(x, y) = \frac{1}{2\pi n h^2} \sum_{i=1}^{n} \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{2h^2}\right)$

Mutual information of X and Y using kernel density estimation:

$\hat{I}(X, Y) = \frac{1}{n} \sum_{j=1}^{n} \ln \frac{\hat{f}(x_j, y_j)}{\hat{f}(x_j)\,\hat{f}(y_j)}$

n = sample size; h = kernel width.
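A sketch of this estimator using scipy's gaussian_kde; note that scipy applies its own bandwidth rule, so this approximates rather than reproduces the slide's estimator with a shared width h:

```python
import numpy as np
from scipy.stats import gaussian_kde

def mi_kde(x, y):
    """x, y: 1-d sample arrays of equal length."""
    f_xy = gaussian_kde(np.vstack([x, y]))        # bivariate density estimate f(x, y)
    f_x, f_y = gaussian_kde(x), gaussian_kde(y)   # univariate marginal estimates
    pts = np.vstack([x, y])
    return np.mean(np.log(f_xy(pts) / (f_x(x) * f_y(y))))
```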

Slide 54/94

Protein-Cytokine Network Reconstruction: Kernel bandwidth selection

There is no universal way of choosing h; however, the ranking of the MIs depends only weakly on it.

The most common criterion used to select the optimal kernel width is to minimize the expected risk function, also known as the mean integrated squared error (MISE):

$\mathrm{MISE}(h) = E\left[ \int (\hat{f}_h(x) - f(x))^2 \, dx \right]$

Loss function (integrated squared error):

$L(h) = \int (\hat{f}_h(x) - f(x))^2 dx = \int \hat{f}_h^2(x)\,dx - 2\int \hat{f}_h(x) f(x)\,dx + \int f^2(x)\,dx, \quad \text{where } \int f^2(x)\,dx = \text{const.}$

The unbiased cross-validation approach selects the kernel width that minimizes the loss function by minimizing:

$\mathrm{UCV}(h) = \int \hat{f}_h^2(x)\,dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{(-i),h}(x_i)$

where $\hat{f}_{(-i),h}(x_i)$ is the kernel density estimate with bandwidth h at $x_i$, obtained after removing the i-th observation.

Slide 55/94

Protein-Cytokine Network Reconstruction: Threshold Selection

Based on large deviation theory (extended to biological networks by ARACNE), the probability that an empirical value of the mutual information I is greater than $I_0$, provided that its true value $\bar{I} = 0$, is:

$P(I > I_0 \mid \bar{I} = 0) \sim e^{-cNI_0}$

where the bar denotes the true MI, N is the sample size, and c is a constant. After taking the logarithm of both sides:

$\ln P = a - b I_0$

Therefore, $\ln P$ can be fitted as a linear function of $I_0$ with slope $-b$, where b is proportional to the sample size N. Using these results, for any given dataset with sample size N and a desired p-value, the corresponding threshold can be obtained.
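Once a and b have been fitted, inverting the linear relation gives the threshold; a one-line sketch (the fitted intercept a and slope b are assumed inputs):

```python
import numpy as np

def mi_threshold(a, b, p_value):
    return (a - np.log(p_value)) / b   # solve ln p = a - b * I0 for I0
```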

Slide 56/94

Kernel density estimation of cytokines

Figure 3: The probability distributions of the seven released cytokines in RAW 264.7 macrophages, obtained using kernel density estimation (KDE).

Mutual information for all 22×7 phosphoprotein-cytokine pairs from Toll data (the upper bar) and non-Toll data (the lower bar).

Slide 57/94

Protein-Cytokine Network Reconstruction: Protein-cytokine signaling networks

[Figure: the topology of signaling protein-released cytokine networks obtained from the non-Toll (A) and Toll (B) data.]

Slide 58/94

Protein-Cytokine Network Reconstruction: Summary

This model successfully captures all known signaling components involved in cytokine release.

It predicts two potentially new signaling components involved in cytokine release: Ribosomal S6 kinase on Tumor Necrosis Factor, and Ribosomal Protein S6 on Interleukin-10.

For MIP-1 and IL-10, which have low coefficients of determination (data that lead to less precise linear models), the information-theoretical model shows an advantage over linear methods such as the PCR minimal model [Pradervand et al.] in capturing all known regulatory components involved in cytokine release.

Slide 59/94

Network reconstruction from time-course data: Background: time-delayed gene networks

This comes from the consideration that the expression of a gene at a certain time could depend on the expression level of another gene at the previous time point, or at very few time points before.

The time-delayed gene regulation pattern in organisms is a common phenomenon, since:

- If the effect of gene g1 on gene g2 depends on an inducer, g3, that has to be bound first in order to be able to bind to the inhibition site on g2, there can be a significant delay between the expression of gene g1 and its observed effect, i.e., the inhibition of gene g2.

- Not all the genes that influence the expression level of a gene are necessarily observable in one microarray experiment. It is quite possible that they are not among the genes being monitored in the experiment, or that their function is currently unknown.

Slide 60/94

    The Algorithm

$\mathrm{ICNA} = \arg\min \left\{ \left( e_{s_i t} / e_{s_i t_0} \right)_{\mathrm{up}} \ \text{or} \ \left( e_{s_i t} / e_{s_i t_0} \right)_{\mathrm{down}} \right\}$

Slide 61/94

Network reconstruction from time-course data: Algorithm

Slide 62/94

Network reconstruction from time-course data: The flow diagram

Gene lists → cluster into n sub-networks → measure sub-network activities → flag potentially dependent sub-networks by measuring ICNA → measure the influence between flagged sub-networks → build the influence matrix → find the threshold → remove connections below the threshold → apply DPI for connections above the threshold → build the network based on non-zero elements of the mutual information matrix.

The flow diagram of the information-theoretic approach for biological network reconstruction from time-course microarray data by identifying the topology of functional sub-networks.

Slide 63/94

Network reconstruction from time-course data: Case study: the yeast cell-cycle

The cell cycle consists of four distinct phases:

G0 (Gap 0): A resting phase where the cell has left the cycle and has stopped dividing.

G1 (Gap 1): Cells increase in size in Gap 1. The G1 checkpoint control mechanism ensures that everything is ready for DNA synthesis.

S1 (Synthesis): DNA replication occurs during this phase.

G2 (Gap 2): During the gap between DNA synthesis and mitosis, the cell will continue to grow. The G2 checkpoint control mechanism ensures that everything is ready to enter the M (mitosis) phase and divide.

M (Mitosis): Cell growth stops at this stage and cellular energy is focused on the orderly division into two daughter cells. A checkpoint in the middle of mitosis (the Metaphase Checkpoint) ensures that the cell is ready to complete cell division.
Slide 64/94

Network reconstruction from time-course data: Case study: the yeast cell-cycle

Data from Gene Expression Omnibus (GEO).

Culture synchronized by alpha-factor arrest; samples taken every 7 minutes as cells went through the cell cycle.

Value type: log ratio.

5,981 genes, 7,728 probes and 14 time points.

94 pathways from KEGG Pathways.

Slide 65/94

Network reconstruction from time-course data: Case study: the yeast cell-cycle

[Figure: the reconstructed functional network of the yeast cell cycle obtained from time-course microarray data.]

Slide 66/94

Mutual information networks: Advantages and Limits

A major advantage of information theory is its nonparametric nature. Entropy does not require any assumptions about the distribution of variables [43].

It does not make any assumption about the linearity of the model for the ease of computation.

It is applicable to time series data.

A high mutual information does not tell us anything about the direction of the relationship.

Slide 67/94

Time-Varying Networks and Causality

Maryam Masnardi-Shirazi

Slide 68/94

Causal Inference of Time-Varying Biological Networks

Slide 69/94

    Definition of Causality

    Beyond Correlation: Causation

Slide 70/94

Idea: map a set of K time series to a directed graph with K nodes, where an edge is placed from a to b if the past of a has an impact on the future of b.

How do we quantitatively do this in a general-purpose manner?

Slide 71/94

Granger's Notion of Causality

A process X is said to Granger-cause a process Y if future values of Y can be better predicted using the past values of X and Y than using only the past values of Y.

Slide 72/94

Granger Causality Formulation

There are many ways to formulate the notion of Granger causality, some of which are:

- Information theory and the concept of directed information
- Learning theory
- Dynamic Bayesian networks
- Vector autoregressive models (VAR), as in the sketch below
- Hypothesis tests, e.g. t-tests and F-tests
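A hedged sketch of the VAR route, assuming a lag-1 model and a crude magnitude cutoff standing in for the confidence-interval test used in the talk:

```python
import numpy as np

def var1_granger(Y, cutoff=0.1):
    """Y: (T, K) array holding K time series sampled at T time points.

    Fits y_t = A y_{t-1} + u_t by least squares and keeps an edge a -> b
    when |A[b, a]| exceeds the cutoff (illustrative, not a proper test).
    """
    X_past, X_next = Y[:-1], Y[1:]
    # lstsq solves X_past @ B = X_next; transpose so A[b, a] is the
    # coefficient of series a in the prediction of series b.
    B, *_ = np.linalg.lstsq(X_past, X_next, rcond=None)
    A = B.T
    return np.abs(A) > cutoff   # boolean adjacency: past of a affects future of b
```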

Slide 73/94

    Vector Autoregressive Model (VAR)

Slide 74/94

    Least Squares Estimation

Slide 75/94

    Least Squares Estimation (Cont.)

Slide 76/94

Processing the data

Phosphoprotein two-ligand screen assay: RAW 264.7.

There are 327 experiments from western blots processed with mixtures of phosphospecific antibodies. In all experiments, the effects of single-ligand and simultaneous ligand addition are measured.

Each experiment includes the fold change of phosphoprotein at time points t = 0, 1, 3, 10, 30 minutes.

Data at t = 30 minutes is omitted, and data from t = 0 to 10 is interpolated in steps of 1 minute.

Slide 77/94

Least Squares Estimation and Rank Deficiency of the Transformation Matrix

[Diagram: the Y data and X data from all 327 experiments (Exp. 1, Exp. 2, ..., Exp. 327) are stacked into block matrices.]

Slide 78/94

    Normalizing the data

Slide 79/94

    Statistical Significance Test (Confidence Interval)

Slide 80/94

The Reconstructed Phosphoproteins Signaling Network

- The network is reconstructed by estimating causal relationships between all nodes.
- All 21 phosphoproteins are present and interacting with one another.
- There are 122 edges in this network.

Slide 81/94

Correlation and Causation

The conventional dictum that "correlation does not imply causation" means that correlation cannot be used to infer a causal relationship between the variables.

This does not mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown.

Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).

Slide 82/94

Correlation and Causality comparison

[Figure: heat-map of the correlation matrix between the input (X) and output (Y); the reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.5.]

Slide 83/94

Correlation and Causality comparison (cont.)

[Figure: heat-map of the correlation matrix between the input (X) and output (Y); the reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.4.]

Slide 84/94

Validating our network

Identification of Crosstalk between Phosphoprotein Signaling Pathways in RAW 264.7 Macrophage Cells (Gupta et al., 2010)

Slide 85/94

The Reconstructed Phosphoproteins Signaling Network for t=0 to t=4 minutes

[Figure: heat-map of the correlation matrix between the input (X) and output (Y) for t=0 to t=4 minutes; intersection of causal coefficients with connections with correlations higher than 0.4 for t=0 to t=4 minutes.]

9 nodes, 15 edges.

Slide 86/94

The Reconstructed Phosphoproteins Signaling Network for t=3 to t=7 minutes

[Figure: heat-map of the correlation matrix between the input (X) and output (Y) for t=3 to t=7 minutes; intersection of causal coefficients with connections with correlations higher than 0.4 for t=3 to t=7 minutes.]

19 nodes, 51 edges.

Slide 87/94

The Reconstructed Phosphoproteins Signaling Network for t=6 to t=10 minutes

[Figure: heat-map of the correlation matrix between the input (X) and output (Y) for t=6 to t=10 minutes; intersection of causal coefficients with connections with correlations higher than 0.4 for t=6 to t=10 minutes.]

19 nodes, 56 edges.

Slide 88/94

Time-Varying Reconstructed Network

t=0 to 4 min, t=3 to 7 min, t=6 to 10 min

Slide 89/94

The Reconstructed Network for t=0 to t=4 minutes without the presence of LPS as a Ligand

With LPS: 15 edges. Without LPS: 16 edges.

Slide 90/94

The Reconstructed Network for t=3 to t=7 minutes without the presence of LPS as a Ligand vs. the presence of all ligands

With all ligands including LPS: 51 edges. Without LPS: 55 edges.

Slide 91/94

The Reconstructed Network for t=6 to t=10 minutes without the presence of LPS as a Ligand vs. the presence of all ligands

With all ligands including LPS: 56 edges. Without LPS: 66 edges.

Slide 92/94

Time-Varying Network with LPS not present as a ligand

t=0 to 4 min, t=3 to 7 min, t=6 to 10 min

Slide 93/94

    Summary

Information theory methods can help in determining causal and time-dependent networks from time-series data.

The granularity of the time course will be a factor in determining the causal connections.

Such dynamical networks can be used to construct both linear and nonlinear models from data.

Slide 94/94