A Sparsification Approach for Temporal Graphical Model Decomposition
Ning Ruan, Kent State University
Joint work with Ruoming Jin (KSU), Victor Lee (KSU) and Kun Huang (OSU)
Motivation: Financial Markets
Motivation: Biological Systems
[Figure panels: microarray time series profile; protein-protein interaction; fluorescence counts]
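Vector Autoregression
• Univariate autoregression regresses a time series on its own past values:
$$X(t) = \sum_{u=1}^{T} \phi(u)\, X(t-u) + \epsilon(t)$$
• VAR is the multivariate extension of autoregression:
$$\mathbf{X}(t) = \sum_{u=1}^{T} \Phi(u)\, \mathbf{X}(t-u) + \boldsymbol{\epsilon}(t)$$
where T is the number of lags (the model order).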
[Figure: m time series x_1(t), ..., x_m(t), each observed at time points t = 0, 1, 2, 3, 4, ...]
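The following is a minimal sketch of the one-step VAR prediction, written in plain Python lists. It assumes that `phis[u-1]` holds Φ(u) as an N×N nested list and that `history[-u]` holds X(t−u). The function name `var_predict` is hypothetical. With N = 1, the sketch reduces to univariate autoregression.

```python
def var_predict(history, phis):
    """Predict X(t) = sum_{u=1}^{T} Phi(u) X(t-u), dropping the noise term.

    history: past observations, oldest first, so history[-u] is X(t-u).
    phis:    list of T coefficient matrices; phis[u - 1] is Phi(u) (N x N).
    """
    T = len(phis)
    if len(history) < T:
        raise ValueError("need at least T past observations")
    n = len(history[-1])
    return [
        sum(phis[u - 1][i][j] * history[-u][j]
            for u in range(1, T + 1) for j in range(n))
        for i in range(n)
    ]


# Lag-2 univariate example: X(t) = 0.5 X(t-1) + 0.25 X(t-2), with X(t-2) = 4, X(t-1) = 2
print(var_predict([[4.0], [2.0]], [[[0.5]], [[0.25]]]))  # [2.0]
```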
Granger Causality
• Goal: reveal the causal relationship between two univariate time series.
– Y is Granger causal for X if the past values of X and Y together are a better predictor of X(t) than the past values of X alone.
– i.e., compare the magnitude of the errors ε(t) and ε′(t):
$$X(t) = \sum_{u=1}^{T} \big[\phi_{XX}(u)\, X(t-u) + \phi_{XY}(u)\, Y(t-u)\big] + \epsilon(t)$$
vs.
$$X(t) = \sum_{u=1}^{T} \phi'_{XX}(u)\, X(t-u) + \epsilon'(t)$$
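The sketch below makes this comparison for one lag (T = 1) and no intercept, in pure Python. The restricted model is nested in the full one, so the full model's in-sample error is never larger. In practice, therefore, the two errors are usually compared with an F-test. That test, the function names, and the synthetic data are assumptions of this illustration, not part of the slides.

```python
import random


def fit_ols(rows, targets):
    """Least squares without intercept for 1 or 2 regressors; returns the residual sum of squares."""
    p = len(rows[0])
    # Normal equations: (R^T R) c = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    if p == 1:
        coef = [b[0] / A[0][0]]
    elif p == 2:  # solve the 2x2 system with Cramer's rule
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        coef = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
                (A[0][0] * b[1] - A[1][0] * b[0]) / det]
    else:
        raise ValueError("this sketch handles only 1 or 2 regressors")
    return sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f_stat(x, y):
    """F-statistic for 'Y Granger-causes X' with a single lag (T = 1)."""
    targets = x[1:]
    restricted = [[x[t - 1]] for t in range(1, len(x))]      # X(t-1) only -> eps'(t)
    full = [[x[t - 1], y[t - 1]] for t in range(1, len(x))]  # X(t-1), Y(t-1) -> eps(t)
    rss_r = fit_ols(restricted, targets)
    rss_u = fit_ols(full, targets)
    n, q, p = len(targets), 1, 2  # observations, restrictions, parameters in full model
    return ((rss_r - rss_u) / q) / (rss_u / (n - p))


# Synthetic data in which Y drives X with a one-step delay
rng = random.Random(0)
y = [rng.uniform(-1, 1) for _ in range(200)]
x = [0.0]
for t in range(1, 200):
    x.append(0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.uniform(-1, 1))

print(granger_f_stat(x, y) > granger_f_stat(y, x))  # True: Y -> X is far stronger than X -> Y
```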
Temporal Graphical Modeling
• Recover the causal structure among a group of relevant time series
[Figure: eight time series X1, ..., X8 and the temporal graphical model recovered from them; each directed edge is weighted by a regression coefficient, e.g., Φ12]
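The edges can be read off the coefficients as sketched below. The sketch assumes, as in the VAR equation, that row i of Φ(u) predicts X_i, so a nonzero Φ(u)_ij is drawn as an edge from X_j to X_i. The tolerance and the decision to omit self-loops are illustrative choices, and the function name is hypothetical.

```python
def temporal_graph_edges(phis, tol=1e-12):
    """Directed edges (j, i), 1-based, meaning X_j helps predict X_i at some lag u."""
    n = len(phis[0])
    edges = set()
    for phi in phis:                  # phi is Phi(u), an N x N nested list
        for i in range(n):
            for j in range(n):
                # skip self-regression terms; keep only cross-series influence
                if i != j and abs(phi[i][j]) > tol:
                    edges.add((j + 1, i + 1))
    return sorted(edges)


phi1 = [[0.5, 0.3, 0.0],
        [0.0, 0.4, 0.0],
        [0.0, 0.2, 0.1]]
print(temporal_graph_edges([phi1]))  # [(2, 1), (2, 3)]
```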
The Problem
• Given a temporal graphical model, can we decompose it to get a simpler global view of the interactions among relevant time series?
• How can we interpret these causal relationships?
Extra Benefit
[Figure: time series X1, ..., X8, clustered based on similarity and grouped according to their temporal graphical model]
• Consider time series clustering from a new perspective!
Clustered Regression Coefficient Matrix
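• Vector Autoregression Model
$$\mathbf{X}(t) = \sum_{u=1}^{T} \Phi(u)\, \mathbf{X}(t-u) + \boldsymbol{\epsilon}(t)$$
– Φ(u) is an N×N coefficient matrix
• Clustered Regression Coefficient Matrix: with the time series ordered by cluster, each Φ(u) is block diagonal, with one submatrix per cluster:
$$\Phi(u) = \begin{pmatrix} \Phi_1(u) & 0 & \cdots & 0 \\ 0 & \Phi_2(u) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_K(u) \end{pmatrix}$$
1) If Φ(u)_ij ≠ 0, then time series i and j are in the same cluster.
2) Equivalently, if time series i and j are not in the same cluster, then Φ(u)_ij = 0.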
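Both conditions can be checked, or enforced, mechanically. In this sketch, `labels[i]` is a hypothetical integer cluster id for time series i. `project_to_clusters` zeroes every cross-cluster coefficient, which is one simple way to obtain a clustered matrix from an unconstrained one.

```python
def respects_clustering(phi, labels, tol=1e-12):
    """True if Phi(u)_ij == 0 (within tol) whenever series i and j are in different clusters."""
    n = len(labels)
    return all(abs(phi[i][j]) <= tol
               for i in range(n) for j in range(n)
               if labels[i] != labels[j])


def project_to_clusters(phi, labels):
    """Zero out all cross-cluster entries, keeping the within-cluster submatrices."""
    n = len(labels)
    return [[phi[i][j] if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


labels = [0, 0, 1]                  # series 1 and 2 in one cluster, series 3 alone
phi = [[0.5, 0.2, 0.3],
       [0.1, 0.4, 0.0],
       [0.0, 0.0, 0.6]]
print(respects_clustering(phi, labels))                               # False (phi[0][2] = 0.3)
print(respects_clustering(project_to_clusters(phi, labels), labels))  # True
```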
Temporal Graphical Model Decomposition Cost
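• Goal: preserve prediction accuracy while reducing representation cost
• Given a temporal graphical model, the cost for model decomposition is the prediction error plus an L2 penalty:
$$\mathrm{cost} = \sum_{t=1}^{L} \Big\| \mathbf{X}(t) - \sum_{u=1}^{T} \Phi(u)\, \mathbf{X}(t-u) \Big\|^2 + \lambda \sum_{u=1}^{T} \|\Phi(u)\|^2$$
where each Φ(u) is constrained to be block diagonal with respect to the clustering.
• Problem
– Minimizing this cost alone tends to group all time series into one cluster: a single cluster forces no coefficient to zero, so it always fits at least as well as any finer clustering.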
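The sketch below evaluates this cost directly. It assumes that `X` is a list of observation vectors and that the sum runs over every t with T past observations available. It also takes the penalty on Φ(u) to be the squared Frobenius norm. The name `decomposition_cost` is hypothetical, and `lam` is the weight of the L2 penalty.

```python
def decomposition_cost(X, phis, lam):
    """Prediction error plus lam * sum_u ||Phi(u)||_F^2 for a VAR(T) model.

    X:    list of observation vectors; X[t] is the N-vector at time t.
    phis: list of T coefficient matrices; phis[u - 1] is Phi(u).
    """
    T, n = len(phis), len(X[0])
    error = 0.0
    for t in range(T, len(X)):        # only times with T past observations
        for i in range(n):
            pred = sum(phis[u - 1][i][j] * X[t - u][j]
                       for u in range(1, T + 1) for j in range(n))
            error += (X[t][i] - pred) ** 2
    penalty = sum(v * v for phi in phis for row in phi for v in row)
    return error + lam * penalty


X = [[1.0, 0.0], [0.5, 1.0], [0.25, 0.5]]
phis = [[[0.5, 0.0], [0.0, 0.5]]]
# about 1.05: the second series jumps from 0 to 1 at t = 1 (error 1.0), plus 0.1 * 0.5
print(decomposition_cost(X, phis, lam=0.1))
```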
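Refined Cost for Decomposition
• Balance the size of clusters
– C is an N×K membership matrix
• Overall cost is the sum of three parts: prediction error, L2 penalty, and size constraint
$$\mathrm{cost}(C, \Phi) = \sum_{t=1}^{L} \Big\| \mathbf{X}(t) - \sum_{u=1}^{T} \Phi(u)\, \mathbf{X}(t-u) \Big\|^2 + \lambda \sum_{u=1}^{T} \|\Phi(u)\|^2 + \gamma \sum_{k=1}^{K} \Big(\sum_{i=1}^{N} C_{ik}\Big)^2$$
– For a binary C, Σ_i C_ik is the size n_k of cluster k, so the size term Σ_k n_k² is smallest when the clusters are as equal in size as possible.
• Optimal Decomposition Problem
– Find a cluster membership matrix C and its regression coefficient matrix Φ such that the cost for decomposition is minimal
[Example: a membership matrix C with rows (1 0 0), (1 0 0), (0 1 0), (0 0 1); C_ik = 1 means that time series i belongs to cluster k, e.g., X2 belongs to cluster C1]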
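The sketch below shows why a size constraint discourages one giant cluster. The size term is assumed here to be Σ_k(Σ_i C_ik)². It is evaluated on the example membership matrix on this slide, with rows (1 0 0), (1 0 0), (0 1 0), (0 0 1), and on a single-cluster assignment. The function name is hypothetical. The same formula also accepts a probabilistic membership matrix.

```python
def size_penalty(C):
    """Sum over clusters k of (sum_i C[i][k])^2; works for binary or probabilistic C."""
    K = len(C[0])
    return sum(sum(row[k] for row in C) ** 2 for k in range(K))


C_example = [[1, 0, 0],    # example membership matrix: cluster sizes 2, 1, 1
             [1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
C_single = [[1, 0, 0]] * 4  # all four series in one cluster
print(size_penalty(C_example), size_penalty(C_single))  # 6 16
```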
Hardness of Decomposition Problem
• Combined integer (membership matrix) and numerical (regression coefficient matrix) optimization problem
• Large number of unknown variables
– N×K variables in the membership matrix
– N×N variables in each regression coefficient matrix Φ(u)
Basic Idea for Iterative Optimization Algorithm
• Relax binary membership matrix C to probabilistic membership matrix P
• Optimize membership matrix while fixing regression coefficient matrix
• Optimize regression coefficient matrix while fixing membership matrix
• Alternate the two optimization steps iteratively to obtain a locally optimal solution
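The alternation can be illustrated on a toy two-variable cost in which each step has a closed-form minimizer. Here `p` merely stands in for the membership variables and `w` for the regression coefficients. The cost function is invented for illustration and is not the decomposition cost. Because each step exactly minimizes the cost in one variable, the cost never increases, so the loop settles at a locally optimal point rather than a guaranteed global one.

```python
def toy_cost(p, w):
    """Illustrative cost: a fit term, a ridge-like penalty on w, and a prior on p."""
    return (w * p - 3) ** 2 + 0.1 * w ** 2 + (p - 1) ** 2


def alternate(p, w, iters=50, tol=1e-12):
    """Alternately minimize toy_cost over w (p fixed) and over p (w fixed)."""
    history = [toy_cost(p, w)]
    for _ in range(iters):
        w = 3 * p / (p * p + 0.1)      # exact argmin over w for fixed p
        p = (3 * w + 1) / (w * w + 1)  # exact argmin over p for fixed w
        history.append(toy_cost(p, w))
        if history[-2] - history[-1] < tol:
            break
    return p, w, history


p, w, history = alternate(p=1.0, w=0.0)
print(round(p, 4), round(w, 4), round(history[-1], 4))
```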
Overview of Iterative Optimization Algorithm
Input: time series data and their temporal graphical model; then repeat:
• Step 1: Optimize the cluster membership matrix (Quasi-Newton method)
• Step 2: Optimize the regression coefficient matrix (generalized ridge regression)
Step 1: Optimize Membership Matrix
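• Apply the Lagrange multiplier method so that each time series' membership probabilities sum to 1:
$$F(P) = \mathrm{cost}(P) + \sum_{i} \mu_i \Big(\sum_{k} p(k \mid i) - 1\Big)$$
• Quasi-Newton method
– Approximate the Hessian matrix by iterative updating:
$$P^{(n+1)} = P^{(n)} - \alpha^{(n)} H^{(n)} \nabla F\big(P^{(n)}\big)$$
– α^(n) is a step size, and H^(n) is the current (inverse) Hessian approximation. H^(n+1) is computed from H^(n) and the most recent changes in P and ∇F, so second derivatives are never computed.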
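The slides do not say which quasi-Newton update is used. As an assumption, this sketch shows the standard BFGS update of an inverse-Hessian approximation, combined with a backtracking line search. It is applied to a small unconstrained quadratic rather than to the Lagrangian objective, and the function names are hypothetical.

```python
def bfgs(f, grad, x0, iters=100, tol=1e-10):
    """Minimize f with BFGS; H approximates the inverse Hessian and starts as I."""
    n = len(x0)
    x = [float(v) for v in x0]
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iters):
        if sum(v * v for v in g) ** 0.5 < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]  # quasi-Newton direction
        slope = sum(gi * di for gi, di in zip(g, d))
        fx, alpha = f(x), 1.0
        # Backtrack until the Armijo sufficient-decrease condition holds
        while alpha > 1e-12 and f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition s^T y > 0 holds
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            # H <- H - rho (H y s^T + s y^T H) + rho (1 + rho y^T H y) s s^T
            H = [[H[i][j] - rho * (Hy[i] * s[j] + s[i] * Hy[j])
                  + rho * (1 + rho * yHy) * s[i] * s[j]
                  for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x


def quad(v):
    return (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2


def quad_grad(v):
    return [2 * (v[0] - 1), 20 * (v[1] + 2)]


print(bfgs(quad, quad_grad, [0.0, 0.0]))  # close to [1.0, -2.0]
```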
Step 2: Optimize Regression Coefficient Matrix
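• Decompose the cost function into N subfunctions, one per time series i:
$$\mathrm{cost} = \sum_{i=1}^{N} F_i + \text{constant}$$
• Generalized Ridge Regression, where φ_i denotes the regression coefficients of time series i:
$$F_i = \sum_{k} (\mathbf{y}_k - X_k \boldsymbol{\phi}_i)^{\top} (\mathbf{y}_k - X_k \boldsymbol{\phi}_i) + \boldsymbol{\phi}_i^{\top} M_i \boldsymbol{\phi}_i$$
– y_k is a vector that depends on P and X (length L)
– X_k is a matrix that depends on P and X (size L×N)
– With a single term (k = 1), this is traditional ridge regression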
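Setting the gradient of F_i to zero gives a closed form, assuming M_i is symmetric: (Σ_k X_kᵀX_k + M_i) φ_i = Σ_k X_kᵀ y_k. The pure-Python sketch below builds this linear system and solves it with Gaussian elimination. The names `generalized_ridge` and `solve` are hypothetical, and φ_i stands for the coefficient vector of time series i.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def generalized_ridge(blocks, M_i):
    """Minimize sum_k ||y_k - X_k phi||^2 + phi^T M_i phi (M_i symmetric).

    blocks: list of (X_k, y_k) pairs; X_k is an L x N nested list, y_k has length L.
    """
    n = len(M_i)
    A = [[float(M_i[a][b]) for b in range(n)] for a in range(n)]
    rhs = [0.0] * n
    for X_k, y_k in blocks:
        for row, target in zip(X_k, y_k):
            for a in range(n):
                rhs[a] += row[a] * target            # accumulate X_k^T y_k
                for b in range(n):
                    A[a][b] += row[a] * row[b]       # accumulate X_k^T X_k
    return solve(A, rhs)


X1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y1 = [1.0, 2.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(generalized_ridge([(X1, y1)], identity))  # approximately [0.875, 1.375]
```

With one block and M_i = λI, this is ordinary ridge regression. Splitting the rows across several (X_k, y_k) blocks leaves the solution unchanged, because only the sums Σ_k X_kᵀX_k and Σ_k X_kᵀy_k enter the system.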
Complexity Analysis
• Step 1 is the computational bottleneck of the entire algorithm: it optimizes N×K membership probabilities plus N Lagrange multipliers, so the Hessian approximation is an (NK+N)×(NK+N) matrix
– Updating the Hessian matrix takes O(k(NK+N)²)
• Step 2: computing the coefficient matrix takes O(R_i N³)
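As a hypothetical illustration of how quickly Step 1 grows, take N = 192 time series (the number of countries in the real data set) and K = 7 clusters:

$$NK + N = 192 \cdot 7 + 192 = 1536, \qquad (NK+N)^2 = 1536^2 = 2{,}359{,}296$$

Each dense Hessian update then touches about 2.4 million entries.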
Basic Idea for Scalable Approach
• Utilize variable dependence relationships to optimize each variable (or a small number of variables) independently, assuming the other variables are fixed
• Convert the problem to a Maximal Weight Independent Set (MWIS) problem
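The slides do not spell out how the MWIS instance is built or solved. In one plausible reading, which is an assumption, vertices are variables, edges are dependences between them, and weights are expected cost reductions. Purely as an illustration of the MWIS subproblem, the sketch below uses a common greedy heuristic: it repeatedly takes the vertex with the largest weight/(degree+1) ratio and discards that vertex's neighbors. The result is always an independent set but is not guaranteed to have maximum weight. This is not claimed to be the authors' procedure, and the function name is hypothetical.

```python
def greedy_mwis(weights, edges):
    """Greedy heuristic for maximum weight independent set.

    weights: dict vertex -> nonnegative weight; edges: iterable of (u, v) pairs.
    """
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(weights)
    chosen = []
    while remaining:
        # Best weight per (degree + 1) among remaining vertices; ties broken by name
        v = max(remaining,
                key=lambda x: (weights[x] / (len(adj[x] & remaining) + 1), str(x)))
        chosen.append(v)
        remaining -= adj[v] | {v}   # drop v and its neighbors to keep the set independent
    return chosen


weights = {"a": 2, "b": 1, "c": 2}
edges = [("a", "b"), ("b", "c")]
print(sorted(greedy_mwis(weights, edges)))  # ['a', 'c']
```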
Experiments: Synthetic Data
• Synthetic data generator
– Generate a community-based graph as the underlying temporal graphical model [Girvan and Newman 05]
– Assign random weights to the graphical model and generate time series data by recursive matrix multiplication [Arnold et al. 07]
• Decomposition Accuracy
– Find a matching between the clustering results and the ground-truth clusters such that the number of intersected variables is maximal
– Decomposition accuracy is the number of intersected variables divided by the total number of variables
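The following is a brute-force sketch of this accuracy measure. It assumes that clusterings are given as per-variable labels and that the number of clusters is small enough to enumerate every one-to-one matching; for larger K, an assignment solver would be used instead. The function name is hypothetical.

```python
from itertools import permutations


def decomposition_accuracy(pred, truth):
    """Fraction of variables covered by the best one-to-one cluster matching."""
    P, G = sorted(set(pred)), sorted(set(truth))
    # overlap[(p, g)] = number of variables in predicted cluster p and true cluster g
    overlap = {}
    for a, b in zip(pred, truth):
        overlap[(a, b)] = overlap.get((a, b), 0) + 1
    k = max(len(P), len(G))
    P = P + [None] * (k - len(P))   # None = "unmatched" placeholder
    G = G + [None] * (k - len(G))
    best = max(sum(overlap.get((p, g), 0) for p, g in zip(P, perm))
               for perm in permutations(G))
    return best / len(pred)


truth = [0, 0, 0, 1, 1, 1]
pred = [5, 5, 7, 7, 7, 7]          # label names need not match
print(decomposition_accuracy(pred, truth))  # about 0.833 (5 of 6 variables matched)
```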
Experiments: Synthetic Data (cont.)
• Applied algorithms
– Iterative optimization algorithm based on the Quasi-Newton method (newton)
– Iterative optimization algorithm based on the MWIS method (mwis)
– Benchmark 1: Pearson correlation test to generate the temporal graphical model, and Ncut [Shi00] for clustering (Cor_Ncut)
– Benchmark 2: directed spectral clustering [Zhou05] on the ground-truth temporal graphical model (Dcut)
Experimental Results: Synthetic
• On average, newton is better than Cor_Ncut and Dcut by 27% and 32%, respectively
• On average, mwis is better than Cor_Ncut and Dcut by 24% and 29%, respectively
Experimental Results: Synthetic (cont.)
mwis is better than Cor_Ncut by an average of 30%
mwis is better than Dcut by an average of 52%
Experiment: Real Data
• Data
– Annual GDP growth rate (downloaded from http://www.ers.usda.gov/Data/Macroeconomics)
– 192 countries
• 4 time periods
– 1969-1979
– 1980-1989
– 1990-1999
– 1998-2007
• Hierarchical bipartitioning into 6 or 7 clusters
Experimental Results: Real Data
Summary
• We formulate a novel objective function for the decomposition problem in temporal graphical modeling.
• We introduce an iterative optimization approach utilizing the Quasi-Newton method and generalized ridge regression.
• We employ a maximum weight independent set based approach to speed up the Quasi-Newton method.
• The experimental results demonstrate the effectiveness and efficiency of our approaches.
Thank you