Estimating Time-Varying Networks
Mladen Kolar
The University of Chicago, Booth School of Business
November 20, 2013
Acknowledgments
E. Xing, L. Song, A. Ahmed, A. Parikh
Networks are mathematical abstractions of complex systems

Networks are useful for
- visualization
- discovery of regularity patterns
- exploratory analysis
- . . .
of complex systems.
Interactions between variables are not always observable
Data collected over a period of time is easily accessible
Estimating time-varying networks
Talk Objective
How to recover changing interactions between objects from data collected over time?
Challenges:
- Small number of samples
- Large number of objects
- Noisy data
- Data may contain missing values
- . . .
Outline
1 Estimating Conditional Independence Relationships
Representation – Markov Networks
Estimating Graph Structure
2 Time-Varying Networks
Smoothly Varying Networks
Networks With Jumps
3 An Application
4 Some theoretical results
Markov Networks
Random vector $X = (X_1, \ldots, X_p)'$

Graph $G = (V, E)$ with $p$ nodes
- represents conditional independence relationships between nodes
Useful for exploring associations between measured variables
$$(a, b) \notin E \iff X_a \perp X_b \mid X_{\setminus ab} \qquad (\setminus ab := V \setminus \{a, b\})$$
$$\mathbb{P}[X_a \mid X_b, X_{\setminus ab}] = \mathbb{P}[X_a \mid X_{\setminus ab}]$$
(Koller and Friedman, 2009)
Two Common Markov Networks
Gaussian Markov Network: $X \sim \mathcal{N}(\mu, \Sigma)$
$$p(x) \propto \exp\left(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
The precision matrix $\Omega = \Sigma^{-1}$ encodes both the parameters and the graph structure
$$\Omega = \begin{pmatrix} \ast & \ast & \ast & \ast & \ast & 0 \\ \ast & \ast & \ast & \ast & \ast & 0 \\ \ast & \ast & \ast & 0 & 0 & 0 \\ \ast & \ast & 0 & \ast & 0 & 0 \\ \ast & \ast & 0 & 0 & \ast & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
[Figure: graph on nodes 1–6 corresponding to this sparsity pattern; node 6 is isolated]
Discrete Markov Network: $X \in \{-1, 1\}^p$ (Ising model)
$$p(x; \Theta) \propto \exp\left(\sum_{a \in V} \theta_{aa} x_a + \sum_{(a, b) \in V \times V} \theta_{ab} x_a x_b\right)$$
$\Theta = (\theta_{ab})_{ab}$ encodes the conditional independence relationships
(Koller and Friedman, 2009; Lauritzen, 1996)
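To make the correspondence between the zero pattern of $\Omega$ and the graph concrete, here is a minimal Python sketch (illustrative only; the numerical values below are hypothetical, only the sparsity pattern from the example above matters):

```python
import numpy as np

# Precision matrix with the sparsity pattern from the 6-node example above;
# the off-diagonal zeros/nonzeros are what define the Markov network.
omega = np.array([
    [2.0, 0.4, 0.3, 0.3, 0.3, 0.0],
    [0.4, 2.0, 0.3, 0.3, 0.3, 0.0],
    [0.3, 0.3, 2.0, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.0, 2.0, 0.0, 0.0],
    [0.3, 0.3, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
])

def edges_from_precision(omega, tol=1e-10):
    """Edge set {(a, b) : Omega[a, b] != 0, a < b}, with 1-indexed nodes."""
    p = omega.shape[0]
    return [(a + 1, b + 1) for a in range(p) for b in range(a + 1, p)
            if abs(omega[a, b]) > tol]

print(edges_from_precision(omega))
# [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]  -- node 6 is isolated
```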
Structure Learning Problem
Given an i.i.d. sample $\mathcal{D}_n = \{x_i\}_{i=1}^n$ from a distribution $\mathbb{P} \in \mathcal{P}$

Learn the set of conditional independence relationships
$$\hat{G} = \hat{G}(\mathcal{D}_n)$$

Gaussian Markov Networks (Drton and Perlman, 2007)
- Form the maximum likelihood estimator of the covariance matrix
- Test for zeros in the precision matrix

Discrete Markov Networks (Chickering, 1996)
- Hard to learn the structure, since the log partition function cannot be evaluated efficiently
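As a rough illustration of the Gaussian recipe above, the sketch below forms the MLE of the covariance, inverts it, and declares edges where the estimated partial correlations are clearly nonzero (a plain cutoff stands in for the formal testing procedure of Drton and Perlman, and it assumes $n \gg p$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n i.i.d. samples from N(0, Sigma) with a known sparse precision matrix.
p, n = 6, 2000
omega_true = np.eye(p)
omega_true[0, 1] = omega_true[1, 0] = 0.4   # edge (1, 2)
omega_true[2, 3] = omega_true[3, 2] = 0.4   # edge (3, 4)
x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega_true), size=n)

# MLE of the covariance matrix and the implied precision matrix.
sigma_hat = np.cov(x, rowvar=False, bias=True)
omega_hat = np.linalg.inv(sigma_hat)

# Partial correlations; keep pairs whose estimate clearly exceeds the noise level.
d = np.sqrt(np.diag(omega_hat))
partial_corr = -omega_hat / np.outer(d, d)
edges = [(a + 1, b + 1) for a in range(p) for b in range(a + 1, p)
         if abs(partial_corr[a, b]) > 0.1]
print(edges)   # expected: [(1, 2), (3, 4)]
```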
Structure Learning in High-Dimensions
Penalized Pseudo-Likelihood Estimation
- Neighborhood Selection
- Useful for learning the structure of Gaussian and discrete Markov Networks
$$\hat{\theta}_a = \arg\max_{\theta_a \in \mathbb{R}^p} \sum_{i \in [n]} \gamma(\theta_a; x_i) - \lambda \|\theta_a\|_1$$

Conditional likelihood: $\gamma(\theta_a; x_i) = \log \mathbb{P}[x_{i,a} \mid x_{i,\setminus a}; \theta_a]$
(Meinshausen and Buhlmann, 2006; Ravikumar, Wainwright, and Lafferty, 2009)
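In the Gaussian case the penalized pseudo-likelihood above reduces to an $\ell_1$-penalized regression of each node on all the others. A minimal sketch (scikit-learn's Lasso as the solver; the penalty level `lam` is an illustrative choice rather than a tuned value):

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(x, lam):
    """Lasso regression of each node on the remaining nodes.

    x   : (n, p) data matrix, one sample per row
    lam : l1 penalty level (lambda)
    Returns {a: estimated neighborhood of node a}, nodes 0-indexed.
    """
    n, p = x.shape
    neighborhoods = {}
    for a in range(p):
        others = [b for b in range(p) if b != a]
        fit = Lasso(alpha=lam).fit(x[:, others], x[:, a])
        neighborhoods[a] = {others[j] for j, w in enumerate(fit.coef_) if w != 0.0}
    return neighborhoods

def combine_and(neighborhoods):
    """AND rule: keep edge (a, b) only if a selects b and b selects a."""
    return {(a, b) for a in neighborhoods for b in neighborhoods[a]
            if a < b and a in neighborhoods[b]}

# Usage: edges = combine_and(neighborhood_selection(x, lam=0.1))
```

The AND/OR symmetrization step is one common way to turn the per-node neighborhoods into a single undirected graph.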
Neighborhood Selection
Local structure estimation
$$\hat{\theta}_a = \arg\max_{\theta_a \in \mathbb{R}^p} \ell(\theta_a; \mathcal{D}_n) - \lambda \|\theta_a\|_1$$
$$\hat{\theta}_1 = (\ast\ \ast\ 0\ \ast\ 0\ 0\ 0)$$

Estimated neighborhood
$$\hat{N}_a = \{b \in V \mid \hat{\theta}_{ab} \neq 0\}, \qquad \hat{N}_1 = \{2, 3, 5\}$$

[Figure: node 1 with candidate coefficients $\theta_{12}, \ldots, \theta_{18}$ linking it to nodes 2–8; the nonzero estimates give the neighborhood $\{2, 3, 5\}$]
Properties of Neighborhood Selection
Graph structure can be recovered consistently
- provable guarantees in a high-dimensional setting
- Meinshausen and Buhlmann (2006); Ravikumar, Wainwright, and Lafferty (2009); Peng, Wang, Zhou, and Zhu (2009)
Fast estimation procedures
- efficient solvers for $\ell_1$-penalized problems
- Beck and Teboulle (2009); Friedman, Hastie, and Tibshirani (2008)
Outline
1 Estimating Conditional Independence Relationships
Representation – Markov Networks
Estimating Graph Structure
2 Time-Varying Networks
Smoothly Varying Networks
Networks With Jumps
3 An Application
4 Some theoretical results
Estimating Time-Varying Networks
$$E^t = \{(a, b) \in V \times V \mid \theta^t_{ab} \neq 0\}$$
General Estimation Framework
Data: $\mathcal{D}_n = \{x^t \mid x^t \sim \mathbb{P}(\theta^t; G^t)\}_{t \in \mathcal{T}_n}$, where $\mathcal{T}_n = \{1/n, 2/n, \ldots, 1\}$

$$\arg\max_{\theta^t} \; \ell(\mathcal{D}_n, \theta^t) - \mathrm{pen}(\theta^t)$$

Loss $\ell(\mathcal{D}_n, \theta^t)$
- measures the fit of the model to the data

Penalty $\mathrm{pen}(\theta^t)$
- balances model complexity against the fit to the data
- encodes structural assumptions about the model class
Two scenarios
1 Smooth Networks
(Song et al., 2009; Kolar et al., 2010; Kolar and Xing, 2011; Kolar and Xing, 2012c)

2 Networks With Jumps
(Kolar et al., 2010; Kolar, Song, and Xing, 2009; Kolar and Xing, 2012a)
Smoothly Evolving Networks
$$\gamma(\theta; x^t) = \log \mathbb{P}[x_{t,a} \mid x_{t,\setminus a}; \theta]$$
$$w_\tau(t) = \frac{K_h(t - \tau)}{\sum_{t' \in \mathcal{T}_n} K_h(t' - \tau)}$$
Kolar, Song, Ahmed, and Xing (2010)
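A minimal sketch of the smoothly varying case, assuming (this objective is not spelled out in the extracted slides) that the estimate at a time point $\tau$ maximizes a kernel-weighted, $\ell_1$-penalized pseudo-likelihood; for Gaussian data this is a weighted lasso regression of node $a$ on the remaining nodes:

```python
import numpy as np
from sklearn.linear_model import Lasso

def kernel_weights(times, tau, h):
    """Gaussian kernel weights w_tau(t) = K_h(t - tau) / sum_t' K_h(t' - tau)."""
    k = np.exp(-0.5 * ((times - tau) / h) ** 2)
    return k / k.sum()

def neighborhood_at(x, times, a, tau, h, lam):
    """Kernel-weighted lasso estimate of node a's neighborhood at time tau.

    x     : (n, p) observations, one row per time point in `times`
    times : (n,) observation times in [0, 1]
    """
    w = kernel_weights(times, tau, h)
    others = [b for b in range(x.shape[1]) if b != a]
    fit = Lasso(alpha=lam).fit(x[:, others], x[:, a], sample_weight=w)
    return {others[j] for j, c in enumerate(fit.coef_) if c != 0.0}

# Usage: track how node 0's neighborhood evolves along a grid of time points.
# for tau in np.linspace(0.1, 0.9, 9):
#     print(tau, neighborhood_at(x, times, a=0, tau=tau, h=0.1, lam=0.05))
```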
Networks With Jumps
$$\max_{\{\theta^t\}_{t \in \mathcal{T}_n}} \; \sum_t \gamma(\theta^t; x^t) - \lambda_1 \sum_t \|\theta^t\|_1 - \lambda_2 \sum_t \|\theta^t - \theta^{t-1}\|_2$$
- $\lambda_1 \sum_t \|\theta^t\|_1$ encourages sparsity
- $\lambda_2 \sum_t \|\theta^t - \theta^{t-1}\|_2$ is the fused penalty (Tibshirani et al., 2005); it captures structural changes

[Figure: coefficient paths $\theta^t_{12}, \theta^t_{13}, \ldots, \theta^t_{1p}$ over time, piecewise constant with occasional jumps]

Kolar, Song, Ahmed, and Xing (2010)
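To make the roles of the two penalties explicit, the sketch below simply evaluates this objective for a candidate sequence of coefficient vectors for one node, with a Gaussian pseudo-likelihood standing in for the generic loss $\gamma$ (the slides leave it abstract); maximizing it would still require a dedicated solver:

```python
import numpy as np

def fused_objective(theta_seq, x_seq, a, lam1, lam2):
    """Evaluate the jump-penalized objective for node `a` (to be maximized).

    theta_seq : (T, p-1) array, one coefficient vector per time point
    x_seq     : list of T arrays; x_seq[t] has shape (n_t, p)
    """
    fit = 0.0
    for t, xt in enumerate(x_seq):
        others = np.delete(xt, a, axis=1)
        resid = xt[:, a] - others @ theta_seq[t]
        fit -= 0.5 * np.sum(resid ** 2)           # Gaussian pseudo-likelihood, up to constants
    sparsity = lam1 * np.sum(np.abs(theta_seq))   # lambda_1 * sum_t ||theta^t||_1
    jumps = lam2 * sum(np.linalg.norm(theta_seq[t] - theta_seq[t - 1])
                       for t in range(1, len(theta_seq)))  # lambda_2 * sum_t ||theta^t - theta^{t-1}||_2
    return fit - sparsity - jumps
```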
Outline
1 Estimating Conditional Independence Relationships
Representation – Markov Networks
Estimating Graph Structure
2 Time-Varying Networks
Smoothly Varying Networks
Networks With Jumps
3 An Application
4 Some theoretical results
Drosophila Life Cycle
Data from Arbeitman et al. (2002)
66 microarray measurements across the full life cycle
Four stages in the life cycle
- embryo
- larva
- pupal
- adult
Analyze a subset of 588 genes related to development
Kolar, Song, Ahmed, and Xing (2010)
Estimated Dynamic Network
[Figure: estimated dynamic network; genes annotated by Gene Ontology category: molecular function, biological process, cellular component]
Transient Group Interactions
Known Gene Interactions
Outline
1 Estimating Conditional Independence Relationships
Representation – Markov Networks
Estimating Graph Structure
2 Time-Varying Networks
Smoothly Varying Networks
Networks With Jumps
3 An Application
4 Some theoretical results
Theoretical Properties
Theorem (Kolar and Xing (2012c))
Under suitable technical assumptions, the graph $G^\tau$ is recovered with exponentially high probability for any fixed point $\tau \in [0, 1]$.

Fisher information matrix:
$$Q^\tau_a := \mathbb{E}\left[\nabla^2 \log \mathbb{P}_{\theta^\tau_a}[X_a \mid X_{\setminus a}]\right], \quad a \in V, \; \tau \in [0, 1]$$
- bounded eigenvalues
- incoherence condition

Smoothness: the entries of $\Sigma^t = (\sigma^t_{ab})$ are smooth functions of time

The kernel satisfies regularity conditions
Parameters: $\lambda \asymp \frac{\sqrt{\log p}}{n^{1/3}}$, $h \asymp n^{-1/3}$

Sparsity: $\frac{s^3 \log p}{n^{2/3}} = o(1)$ ($s$ – maximal node degree)

Signal strength: $\theta_{\min} = \min_{e \in E^\tau} |\theta^\tau_e| = \Omega\left(\frac{\sqrt{s \log p}}{n^{1/3}}\right)$

$$\mathbb{P}\left[\text{graph not recovered}\right] = O\left(\exp\left(-C s^{-3} n h + C' \log p\right)\right) \xrightarrow{n, p \to \infty} 0$$
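A quick check of why these conditions make the failure probability vanish (my reading of the displayed rates; this step is not spelled out on the slides): with $h \asymp n^{-1/3}$ the effective sample size is $nh \asymp n^{2/3}$, so the exponent in the bound satisfies
$$-C s^{-3} nh + C' \log p \;\asymp\; -C s^{-3} n^{2/3} + C' \log p \;=\; -\log p \left(C \, \frac{s^{-3} n^{2/3}}{\log p} - C'\right),$$
and the sparsity condition $\frac{s^3 \log p}{n^{2/3}} = o(1)$ says precisely that $\frac{s^{-3} n^{2/3}}{\log p} \to \infty$, so the exponent tends to $-\infty$ and the probability of missing the graph goes to $0$ as $n, p \to \infty$.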
Simulation Results
[Figure: chain graph. Hamming distance versus scaled sample size $n/(s^{4.5} \log^{1.5}(p))$ for $p = 60, 100, 140$]
Thank you!
References
Arbeitman, M., E. Furlong, F. Imam, E. Johnson, B. Null, B. Baker, M. Krasnow, M. Scott, R. Davis, and K. White (2002). “Gene Expression During the Life Cycle of Drosophila melanogaster”. In: Science 297, pp. 2270–2275.
Balakrishnan, S., M. Kolar, A. Rinaldo, and A. Singh (2012). “Recovering Block-structured Activations Using Compressive Measurements”. In: arXiv preprint arXiv:1209.3431.
Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont (2008). “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data”. In: J. Mach. Learn. Res. 9, pp. 485–516.
Beck, A. and M. Teboulle (2009). “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”. In: SIAM Journal on Imaging Sciences 2.1, pp. 183–202.
Chickering, David Maxwell (1996). “Learning Bayesian Networks is NP-Complete”. In: Learning from Data: Artificial Intelligence and Statistics V. Springer-Verlag, pp. 121–130.
Drton, Mathias and Michael D. Perlman (2007). “Multiple Testing and Error Control in Gaussian Graphical Model Selection”. In: Statistical Science 22.3, pp. 430–449. doi: 10.1214/088342307000000113.
Friedman, J., T. Hastie, and R. Tibshirani (2008). “Regularization paths for generalized linear models via coordinate descent”. In: Department of Statistics, Stanford University, Tech. Rep.
Kolar, M., H. Liu, and E. P. Xing (2012). “Graph Estimation From Multi-attribute Data”. In: ArXiv e-prints. arXiv:1210.7665 [stat.ML].
Kolar, M. and J. Sharpnack (2012). “Variance function estimation in high-dimensions”. In: ArXiv e-prints. arXiv:1205.4770 [stat.ML].
Kolar, M. and E. P. Xing (2011). “On Time Varying Undirected Graphs”. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS).
— (2012a). “Estimating Networks With Jumps”. In: Electronic Journal of Statistics.
— (2012b). “Estimating Sparse Precision Matrices from Data with Missing Values”. In:
Kolar, M., S. Balakrishnan, A. Rinaldo, and A. Singh (2011). “Minimax localization of structural information in large noisy matrices”. In: Advances in Neural Information Processing Systems.
Kolar, Mladen, John Lafferty, and Larry Wasserman (2011). “Union Support Recovery in Multi-task Learning”. In: J. Mach. Learn. Res. 12, pp. 2415–2435.
Kolar, Mladen and Hai Liu (2012a). “Optimal ROAD For Feature Selection in High-Dimensional Classification”. In: Submitted.
— (2012b). “Union Support Recovery in Multi-task Learning”. In: J. Mach. Learn. Res. (AISTATS track) WCP Vol 22, pp. 647–655.
Kolar, Mladen, Ankur P. Parikh, and Eric P. Xing (2010). “On Sparse Nonparametric Conditional Covariance Selection”. In: ICML ’10: Proceedings of the 27th Annual International Conference on Machine Learning. Haifa, Israel.
Kolar, Mladen, Le Song, and Eric Xing (2009). “Sparsistent Learning of Varying-coefficient Models with Structural Changes”. In: Advances in Neural Information Processing Systems 22. Ed. by Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, pp. 1006–1014.
Kolar, Mladen and Eric P. Xing (2010). “Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach”. In: AISTATS 2010: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 413–420.
Kolar, Mladen and Eric P. Xing (2012c). “Sparsistent Estimation of Time-Varying Discrete Markov Random Fields”. In: arXiv:0907.2337.
Kolar, Mladen, Le Song, Amr Ahmed, and Eric P. Xing (2010). “Estimating Time-Varying Networks”. In: Annals of Applied Statistics 4.1, pp. 94–123.
Koller, D. and N. Friedman (2009). Probabilistic graphical models: principles and techniques. MIT Press.
Lauritzen, S. L. (1996). Graphical Models (Oxford Statistical Science Series). Oxford University Press, USA.
Meinshausen, Nicolai and Peter Buhlmann (2006). “High-dimensional graphs and variable selection with the Lasso”. In: Annals of Statistics 34.3, pp. 1436–1462.
Nesterov, Yu. (2005). “Smooth minimization of non-smooth functions”. In: Mathematical Programming 103.1, pp. 127–152. doi: 10.1007/s10107-004-0552-5.
Peng, Jie, Pei Wang, Nengfeng Zhou, and Ji Zhu (2009). “Partial Correlation Estimation by Joint Sparse Regression Models”. In: Journal of the American Statistical Association 104.486, pp. 735–746. doi: 10.1198/jasa.2009.0126.
Ravikumar, P., M. J. Wainwright, and J. D. Lafferty (2009). “High-Dimensional Ising Model Selection Using $\ell_1$-Regularized Logistic Regression”. In: Annals of Statistics, to appear.
Song, L., M. Kolar, and E. P. Xing (2009). “Time-varying dynamic Bayesian networks”. In: Advances in Neural Information Processing Systems 22, pp. 1732–1740.
Song, Le, Mladen Kolar, and Eric P. Xing (2009). “KELLER: Estimating Time-Evolving Interactions Between Genes”. In: Proceedings of the 16th International Conference on Intelligent Systems for Molecular Biology.
Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight (2005). “Sparsity and smoothness via the fused lasso”. In: Journal of the Royal Statistical Society, Series B 67.1, pp. 91–108.