Kayvan Sadeghi Department of Statistics ... - mypages.iit.edu

Hierarchical Models for Independence Structures of Networks

Kayvan Sadeghi

Department of Statistics

Carnegie Mellon University

May 2014

Joint work with Alessandro Rinaldo

1

Objective

• The independence structure of networks has not been

well-studied.

• Graphical Markov models have been used to model conditional

independence structures among random variables.

• The goal is to combine network and graphical models to deal with

different possible independence structures.

2

Network models

• A network is a complete graph where

– nodes correspond to labeled individuals;

– edges correspond to binary random variables (also called

dyads).

• In a realization of a network, i ∼ j if the random variable corresponding to the edge ij takes the value 1, and i ≁ j otherwise.

3

Graphical models

• In graphical models different types of dependence graphs have

been used.

• A dependence graph is a graph where

– nodes correspond to random variables $X = (X_1, \dots, X_{|N|})$;

– edges indicate some specific types of conditional

dependencies among the random variables corresponding to

the nodes.

• Here we focus on undirected graphs (conditional independencies), but a similar theory has been developed for bidirected graphs (marginal independencies).

4

Probabilistic independence

• $X \perp\!\!\!\perp Y \mid Z \iff f_{XYZ}(x, y, z) = \dfrac{f_{XZ}(x, z)\, f_{YZ}(y, z)}{f_Z(z)}$

• $(X_\alpha)_{\alpha \in V}$: a collection of random variables with a probability distribution $P$

• Short notation: $A \perp\!\!\!\perp B \mid C$ for $X_A \perp\!\!\!\perp X_B \mid X_C$, where $A$, $B$, $C$ are subsets of $V$
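The factorization criterion can be checked numerically on a small discrete table. The following is a minimal sketch, assuming the joint probability function is stored as a dictionary keyed by (x, y, z); the function name is_cond_indep and the toy tables are hypothetical, not part of the talk.

```python
from collections import defaultdict
from itertools import product

def is_cond_indep(f, tol=1e-12):
    """Check X _||_ Y | Z for a joint pmf given as {(x, y, z): probability}."""
    f_xz, f_yz, f_z = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in f.items():
        f_xz[x, z] += p
        f_yz[y, z] += p
        f_z[z] += p
    xs = {x for x, _, _ in f}
    ys = {y for _, y, _ in f}
    zs = {z for _, _, z in f}
    # Cross-multiplied form of the criterion; avoids dividing by f_Z(z) = 0.
    return all(
        abs(f.get((x, y, z), 0.0) * f_z[z] - f_xz[x, z] * f_yz[y, z]) <= tol
        for x, y, z in product(xs, ys, zs)
    )

# Toy table built as f_Z(z) f(x | z) f(y | z), so the criterion holds.
pz = {0: 0.4, 1: 0.6}
px_z = {0: [0.2, 0.8], 1: [0.7, 0.3]}   # P(X = x | Z = z)
py_z = {0: [0.5, 0.5], 1: [0.1, 0.9]}   # P(Y = y | Z = z)
indep = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x, y, z in product((0, 1), repeat=3)}

# Toy table in which X = Y with probability one, whatever Z is.
dep = {(0, 0, 0): 0.25, (1, 1, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.25}
```

The cross-multiplied comparison is a design choice of this sketch: it is equivalent to the displayed identity wherever f_Z(z) > 0 and stays well defined elsewhere.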

5

The relationship between networks and dependence graphs

• We are interested in the network $G_n$ with $n = |V|$ nodes and $|E| = \binom{n}{2} =: m$ dyads.

• Consider a dependence graph D with m nodes that are binary

random variables.

• The basic assumption is that Gn entails the same conditional

independence structure among its edges as that of D among its

nodes.

• We use a corresponding dependence graph D to model $G_n$.

6

Modeling dependence graphs for modeling networks

• Is it enough to model a dependence graph with m binary random variables? No.

• Nodes of the network induce a specific labelling of the edges, which is lost by choosing an arbitrarily labelled dependence graph.

• Nodes of D should be labelled by pairs ij such that 1 ≤ i < j ≤ n.

[Figure: dependence graph D for n = 3, with nodes labelled 12, 13, 23.]
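The labelling of dyads by pairs ij with 1 ≤ i < j ≤ n can be sketched as follows. This is an illustrative sketch only; the helper names dyad_labels and realization_to_dyads are hypothetical.

```python
from itertools import combinations
from math import comb

def dyad_labels(n):
    """Labels ij (as tuples) with 1 <= i < j <= n, in lexicographic order."""
    return list(combinations(range(1, n + 1), 2))

def realization_to_dyads(n, edges):
    """Map a realized network (iterable of edges) to the binary dyad vector x_ij."""
    present = {tuple(sorted(e)) for e in edges}
    return {ij: int(ij in present) for ij in dyad_labels(n)}

labels = dyad_labels(3)                         # nodes of D for n = 3
x = realization_to_dyads(3, [(2, 1), (2, 3)])   # edges given in any orientation
```

For n = 3 the labels are (1, 2), (1, 3), (2, 3), matching the node labels 12, 13, 23 used in the slides.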

7

Structural and non-structural approaches to select dependence

graph

• Structural approach: A given dependence graph is used to model

the independence structure for the network.

• Non-structural approach: Model selection is performed to obtain a

dependence graph.

• Here we deal with the structural approach.

8

Discrete graphical models

• Suppose that we have a dataset with N observations of d

discrete random variables.

• We write the discrete variables as $X = (X_v)_{v \in V}$, and we call the possible values a discrete variable may take its levels.

• We can write a generic observation (or cell) as $i = (i_1, \dots, i_d)$, and the set of possible cells as $\mathcal{I}$.

• We are interested in modelling the probabilities $p(i) = \Pr(X = i)$ for $i \in \mathcal{I}$.
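As an illustration of the notation, the cell set and the raw relative frequencies of the cells can be computed as below. This is a sketch under assumed toy data; the level sets, the observations, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def cell_space(levels):
    """All cells i = (i_1, ..., i_d), given one tuple of levels per variable."""
    return list(product(*levels))

def empirical_probabilities(observations, levels):
    """Relative frequency of every cell, including cells never observed."""
    counts = Counter(map(tuple, observations))
    n_obs = len(observations)
    return {cell: counts[cell] / n_obs for cell in cell_space(levels)}

levels = [(0, 1), (0, 1), ("a", "b", "c")]                    # d = 3 variables
data = [(0, 1, "a"), (0, 1, "a"), (1, 0, "c"), (1, 1, "b")]   # N = 4 observations
p_hat = empirical_probabilities(data, levels)
```

These relative frequencies are only the unrestricted estimate of p(i); the log-linear models that follow place structure on p(i).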

9

Log-linear models

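• For a three-dimensional table, we write the variables as (A, B, C), a generic cell as i = (j, k, l), and the saturated model as

$\log p(i) = u + u^{A}_{j} + u^{B}_{k} + u^{C}_{l} + u^{AB}_{jk} + u^{AC}_{jl} + u^{BC}_{kl} + u^{ABC}_{jkl}.$

• Writing $\tilde{u} = \exp u$ for each term,

$p(i) = \tilde{u} \cdot \tilde{u}^{A}_{j} \cdot \tilde{u}^{B}_{k} \cdot \tilde{u}^{C}_{l} \cdot \tilde{u}^{AB}_{jk} \cdot \tilde{u}^{AC}_{jl} \cdot \tilde{u}^{BC}_{kl} \cdot \tilde{u}^{ABC}_{jkl}.$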

10

Hierarchical log-linear models

• The term hierarchical means that if a term is set to zero, all its

higher-order relatives are also set to zero.

• Hierarchical models are specified using the maximal interaction

terms not set to zero: these are called the generators of the

model.

• For example

$\log p(i) = u + u^{a}_{j} + u^{b}_{k} + u^{c}_{l} + u^{ab}_{jk} + u^{ac}_{jl}$

has generators {ab, ac}.
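The relation between generators and the terms of a hierarchical model can be sketched in a few lines: the model contains every subset of every generator. The function names model_terms and is_hierarchical are hypothetical, chosen for this illustration.

```python
from itertools import combinations

def model_terms(generators):
    """All interaction terms of the hierarchical model: every subset of a generator."""
    terms = set()
    for g in generators:
        for size in range(len(g) + 1):
            terms.update(frozenset(s) for s in combinations(sorted(g), size))
    return terms

def is_hierarchical(terms):
    """True if every proper subset of an included term is also included."""
    return all(
        frozenset(s) in terms
        for t in terms
        for size in range(len(t))
        for s in combinations(sorted(t), size)
    )

terms = model_terms([{"a", "b"}, {"a", "c"}])
readable = sorted("".join(sorted(t)) for t in terms)   # "" stands for the constant u
```

For the generators {ab, ac} this recovers the constant, the three main effects, and the two interactions ab and ac, which are exactly the terms of the example model.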

11

Hierarchical log-linear models for undirected graphs

• The generators of a hierarchical log-linear model correspond to

the maximal cliques in the undirected graph.

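• For example, for the generator set {ab, ac}, the multiplicative form (each factor being the exponential of the corresponding u-term) is

$p(i) = (\tilde{u} \cdot \tilde{u}^{a}_{j} \cdot \tilde{u}^{b}_{k} \cdot \tilde{u}^{ab}_{jk})(\tilde{u}^{c}_{l} \cdot \tilde{u}^{ac}_{jl})$,

which implies $b \perp\!\!\!\perp c \mid a$, as needed.

[Figure: undirected graph with edges b-a and a-c.]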
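The implication from the factorization to b ⊥⊥ c | a can be verified numerically. The sketch below assumes binary variables and arbitrary positive factors drawn at random; the factor names phi_ab and phi_ac are hypothetical stand-ins for the two bracketed products.

```python
import random
from itertools import product

random.seed(0)
levels = (0, 1)

# Hypothetical positive factors: one for generator ab, one for generator ac.
phi_ab = {(a, b): random.uniform(0.5, 2.0) for a, b in product(levels, repeat=2)}
phi_ac = {(a, c): random.uniform(0.5, 2.0) for a, c in product(levels, repeat=2)}

# p(a, b, c) proportional to phi_ab(a, b) * phi_ac(a, c).
raw = {(a, b, c): phi_ab[a, b] * phi_ac[a, c]
       for a, b, c in product(levels, repeat=3)}
total = sum(raw.values())
p = {cell: v / total for cell, v in raw.items()}

def marginal(p, keep):
    """Sum p over the coordinates not listed in keep (0 = a, 1 = b, 2 = c)."""
    out = {}
    for cell, v in p.items():
        key = tuple(cell[k] for k in keep)
        out[key] = out.get(key, 0.0) + v
    return out

p_a, p_ab, p_ac = marginal(p, [0]), marginal(p, [0, 1]), marginal(p, [0, 2])

# b _||_ c | a  <=>  p(a, b, c) p(a) = p(a, b) p(a, c) in every cell.
max_gap = max(abs(p[a, b, c] * p_a[(a,)] - p_ab[a, b] * p_ac[a, c])
              for a, b, c in product(levels, repeat=3))
```

The gap is zero up to rounding for any choice of positive factors, since the identity follows from the factorization alone.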

12

Markov dependence property for undirected graphs

• We say that a dependence graph D satisfies the Markov dependence property (Frank and Strauss 1986) if ij ≁ kl in G implies ij ≁ kl in D; that is, dyads that share no node in G are not adjacent in D.

• Notice that the implication holds in one direction only; therefore, such dependence graphs are subgraphs of the line graph of $G_n$.

• This implies that cliques in D correspond to stars and triangles in G.

• In most practical cases, this is a plausible assumption.
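The claim that cliques of D correspond to stars and triangles can be checked by brute force for small n. The sketch below takes D to be the full line graph of the complete graph on n nodes (the largest graph allowed by the Markov dependence property) and enumerates every set of pairwise-incident dyads; the function names are hypothetical.

```python
from itertools import combinations

def incident(e, f):
    """Two distinct dyads are adjacent in the line graph iff they share a node."""
    return e != f and len(set(e) & set(f)) == 1

def classify(clique):
    """Label a set of pairwise-incident dyads as a star or a triangle."""
    common = set.intersection(*(set(e) for e in clique))
    if common:
        return "star"        # all dyads share a hub node (a single dyad is a 1-star)
    nodes = set().union(*(set(e) for e in clique))
    if len(clique) == 3 and len(nodes) == 3:
        return "triangle"
    return "other"

def cliques_of_line_graph(n):
    """Brute force: every nonempty set of pairwise-incident dyads of K_n."""
    dyads = list(combinations(range(1, n + 1), 2))
    found = []
    for size in range(1, len(dyads) + 1):
        for subset in combinations(dyads, size):
            if all(incident(e, f) for e, f in combinations(subset, 2)):
                found.append(subset)
    return found

cliques = cliques_of_line_graph(4)
kinds = {classify(c) for c in cliques}
```

The enumeration is exponential in the number of dyads, so it is only meant as a check for very small n.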

13

Frank and Strauss: Markov Graphs

Excerpt from Frank and Strauss (1986):

... of the probability function. We will say that a graph G is a Markov graph, or has Markov dependence, if D contains no edge between disjoint sets {s, t} and {u, v} in M. This means that nonincident edges in G are conditionally independent. This definition of Markov dependence is consistent with the nearest neighbor definition (Besag 1974) if the nearest neighbors of {u, v} ∈ M are {u, w} and {v, w} for all w ∈ N different from u and v. Figure 3 illustrates the dependence graph D for a Markov graph of order n = 4.

For general Markov graphs the cliques of D correspond to sets of edges such that any pair of edges within the set must be incident. Such sets are just the triangles and stars; that is, triangles $T_{uvw}$ and k-stars

$S_{v_0 \dots v_k} = \{\{v_0, v_i\} : i = 1, \dots, k\}, \quad (v_0, \dots, v_k) \in N^{(k+1)}$  (3.5)

for k = 1, ..., n - 1. These sufficient subgraphs are displayed in Figure 4 for n = 4. We note that $T_{uvw}$ is invariant under permutations of (u, v, w). Furthermore, $S_{uv} = S_{vu}$, and for k > 1, $S_{v_0 \dots v_k}$ is invariant under permutations of $(v_1, \dots, v_k)$. Thus there are $\binom{n}{3}$ distinct triangles, $\binom{n}{2}$ distinct 1-stars, and $n\binom{n-1}{k}$ distinct k-stars for k = 2, ..., n - 1. In total there are $n 2^{n-1} + \binom{n}{3} - \binom{n+1}{2}$ sufficient subgraphs. Writing $\alpha_A = \alpha(A)$, we now introduce the notation

$\tau_{uvw} = \alpha(T_{uvw}), \quad \sigma_{v_0 \dots v_k} = \alpha(S_{v_0 \dots v_k})$  (3.7)

and obtain the following result from Theorem 1.

Theorem 2. Any undirected Markov graph has probability

$\Pr(G) = c^{-1} \exp\left( \sum \tau_{uvw} + \sum_{k=1}^{n-1} \sum \sigma_{v_0 \dots v_k} / k! \right)$,  (3.8)

where the first sum is over $\{u, v, w\} \in \binom{N}{3}$ with $T_{uvw} \subseteq G$ and the last sum is over $(v_0, \dots, v_k) \in N^{(k+1)}$ with $S_{v_0 \dots v_k} \subseteq G$. This log-linear model has $n 2^{n-1} + \binom{n}{3} - \binom{n+1}{2}$ parameters and sufficient statistics corresponding to triangles and stars in G. It is interesting to note that no tetrads or other induced ...

Figure 2. Dependence Graph D of G in Example 2.

Figure 3. A Markov Graph G of Order n = 4 and Its Dependence Graph D (two illustrations of the correspondence between cliques of D and sufficient subgraphs of G). Panel labels: Star of G and Clique of D; Triangle of G and Clique of D.

14

Network models

• For an observed network x, exponential random graph models are of the form

$P(x) = \exp\{\sum_i s_i(x)\,\theta_i - \psi(\theta)\}$,

where the $s_i(x)$ are sufficient statistics, the $\theta_i$ natural parameters, and $\psi(\theta)$ the log-normalizing constant.

• Models in the literature that assume dyadic independence

(Erdos-Renyi, Beta, P1, etc.) can be generalized to capture the

independence structure of D.

• We show the method for Erdos-Renyi and Beta.

• For Erdos-Renyi the sufficient statistic is the number of edges, and for Beta it is the degree sequence.
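The two sufficient statistics named above are simple functions of the dyad vector. A minimal sketch, assuming the network is stored as a dictionary keyed by dyads (i, j) with i < j; the function names and the toy network are hypothetical.

```python
def edge_count(x):
    """Sufficient statistic of the Erdos-Renyi model: number of edges."""
    return sum(x.values())

def degree_sequence(x, n):
    """Sufficient statistic of the Beta model: degree of each node 1..n."""
    deg = [0] * n
    for (i, j), x_ij in x.items():
        deg[i - 1] += x_ij
        deg[j - 1] += x_ij
    return deg

# Toy realization on n = 4 nodes: a triangle on nodes 1, 2, 3 and an isolated node 4.
x = {(1, 2): 1, (1, 3): 1, (1, 4): 0, (2, 3): 1, (2, 4): 0, (3, 4): 0}
```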

15

Hierarchical Erdos-Renyi models

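• The baseline for the empty independence graph:

$p(x) = u \prod_{i<j} \exp\{x_{ij}\, q\}$, where $q = \log\left(\frac{p}{1-p}\right)$.

• Hierarchical Erdos-Renyi models:

$p_X(x) = \prod_{C \in \mathcal{C}} u^{C}_{i_C} = u \exp\{\sum_{C \in \mathcal{C}} \gamma_C \prod_{c \in C} x_c\}$,

where c is of the form ij, and

• $\gamma_C = \begin{cases} q^{(r)}, & \text{if } C = \{ii_1, ii_2, \dots, ii_r\}; \\ t, & \text{if } C = \{ij, ik, jk\}. \end{cases}$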

16

Example

[Figure: dependence graph D with nodes 12, 13, 23.]

• $P(x) = u \exp\{q^{(1)} x_{12} + q^{(1)} x_{13} + q^{(1)} x_{23} + q^{(2)} x_{12} x_{13} + q^{(2)} x_{13} x_{23}\}$.
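For n = 3 the example distribution can be evaluated by enumerating all 2^3 networks. The sketch below is an illustration only: it normalizes by brute force rather than with a closed-form constant, and the parameter values are hypothetical.

```python
from itertools import product
from math import exp

DYADS = [(1, 2), (1, 3), (2, 3)]
# Cliques of the example dependence graph that contain two dyads.
PAIR_CLIQUES = [((1, 2), (1, 3)), ((1, 3), (2, 3))]

def exponent(x, q1, q2):
    """q1 * (number of edges) + q2 * (sum of products over the two-dyad cliques)."""
    main = q1 * sum(x[d] for d in DYADS)
    inter = q2 * sum(x[a] * x[b] for a, b in PAIR_CLIQUES)
    return main + inter

def distribution(q1, q2):
    """Normalize over the 8 possible networks; returns {(x12, x13, x23): prob}."""
    configs = [dict(zip(DYADS, bits)) for bits in product((0, 1), repeat=3)]
    weights = [exp(exponent(x, q1, q2)) for x in configs]
    z = sum(weights)   # the constant u in the formula equals 1 / z
    return {tuple(x[d] for d in DYADS): w / z for x, w in zip(configs, weights)}

p = distribution(q1=-0.5, q2=0.8)     # hypothetical parameter values
p_er = distribution(q1=-0.5, q2=0.0)  # q2 = 0 gives back the Erdos-Renyi baseline
```

Setting q2 = 0 removes the interaction terms, so each dyad is then present independently with probability e^{q1} / (1 + e^{q1}).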

17

Hierarchical Erdos-Renyi models

• Number of parameters is r: the size of the largest clique in D.

• Model hierarchical: $q^{(r)} = 0 \Rightarrow q^{(r+1)} = 0$.

• Model in exponential family form:

$p_X(x) = \exp\{\sum_{r=1}^{n} q^{(r)} s^{(r)}_{C^{(r)}}(x) + t \cdot s^{(t)}_{C^{(t)}}(x) - \psi(q, t)\}$,

where $C^{(r)}$ is the set of all cliques with r nodes in D, and $s^{(r)}_{C^{(r)}}(x)$ is the number of r-stars in G with edges that are members of $C^{(r)}$;

• The saturated model (where D is the line graph of a complete graph):

The k-stars and triangle model.

• Normalizing constant in closed form, but depending on the cliques of the

dependence graph.
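For the saturated model, where every star and triangle of G counts, the sufficient statistics can be computed from the degrees and the dyad vector. This is a sketch under that assumption (D equal to the line graph of the complete graph); for a sparser D only the stars supported by cliques of D would be counted. The function names and the toy network are hypothetical.

```python
from itertools import combinations
from math import comb

def star_count(x, n, r):
    """Number of r-stars in G: choose r of the edges at each hub.
    For r = 1 each edge is seen from both endpoints, so the total is halved."""
    deg = [0] * (n + 1)
    for (i, j), x_ij in x.items():
        deg[i] += x_ij
        deg[j] += x_ij
    total = sum(comb(d, r) for d in deg[1:])
    return total // 2 if r == 1 else total

def triangle_count(x, n):
    """Number of triangles in G."""
    return sum(
        x[i, j] * x[i, k] * x[j, k]
        for i, j, k in combinations(range(1, n + 1), 3)
    )

# Toy network on n = 4 nodes with edges 12, 13, 14, 23.
x = {(1, 2): 1, (1, 3): 1, (1, 4): 1, (2, 3): 1, (2, 4): 0, (3, 4): 0}
```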

18

Hierarchical Beta models

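• The baseline for the empty independence graph:

$p_\beta(x) = \prod_{i<j} p_{ij}^{x_{ij}} (1 - p_{ij})^{1 - x_{ij}} = u \prod_{i<j} e^{(\beta_i + \beta_j) x_{ij}}$.

• Hierarchical Beta models:

$p_X(x) = \prod_{C \in \mathcal{C}} u^{C}_{i_C} = u \exp\{\sum_{C \in \mathcal{C}} \gamma_C \prod_{c \in C} x_c\}$,

where c is of the form ij, and

• $\gamma_C = \begin{cases} \beta^{(r)}_i, & \text{if } C = \{ii_1, ii_2, \dots, ii_r\},\ r \geq 1; \\ \tau_i + \tau_j + \tau_k, & \text{if } C = \{ij, ik, jk\}. \end{cases}$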
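The two forms of the baseline Beta model can be compared numerically. The sketch assumes the usual logistic link p_ij = e^{β_i + β_j} / (1 + e^{β_i + β_j}), which the slide does not state explicitly, and takes u to be the product of the (1 - p_ij); the parameter values are hypothetical.

```python
from math import exp, isclose, prod

def edge_prob(beta_i, beta_j):
    """Assumed link: p_ij = exp(b_i + b_j) / (1 + exp(b_i + b_j))."""
    s = exp(beta_i + beta_j)
    return s / (1.0 + s)

def pmf_product_form(x, beta):
    """prod over dyads of p_ij^{x_ij} (1 - p_ij)^{1 - x_ij}."""
    return prod(
        edge_prob(beta[i], beta[j]) if x[i, j] else 1.0 - edge_prob(beta[i], beta[j])
        for i, j in x
    )

def pmf_exponential_form(x, beta):
    """u * prod over dyads of exp((b_i + b_j) x_ij), with u = prod (1 - p_ij)."""
    u = prod(1.0 - edge_prob(beta[i], beta[j]) for i, j in x)
    return u * prod(exp((beta[i] + beta[j]) * x[i, j]) for i, j in x)

beta = {1: 0.3, 2: -1.0, 3: 0.5}            # hypothetical node parameters
x = {(1, 2): 1, (1, 3): 0, (2, 3): 1}       # toy network on n = 3 nodes
```

The agreement follows from p_ij / (1 - p_ij) = e^{β_i + β_j}, so each present edge contributes exactly one exponential factor.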

19

Example

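[Figure: dependence graph D with nodes 12, 13, 23.]

• $P(x) = u \exp\{(\beta^{(1)}_1 + \beta^{(1)}_2) x_{12} + (\beta^{(1)}_1 + \beta^{(1)}_3) x_{13} + (\beta^{(1)}_2 + \beta^{(1)}_3) x_{23} + \beta^{(2)}_1 x_{12} x_{13} + \beta^{(2)}_3 x_{13} x_{23}\}$.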

20

Hierarchical Beta models

• Number of parameters is nr.

• Model hierarchical: $\beta^{(r)}_i = 0 \Rightarrow \beta^{(r+1)}_i = 0$.

• Model in exponential family form:

$p_X(x) = \exp\{\sum_{i=1}^{n} \left[ \sum_{r=1}^{n} \beta^{(r)}_i d^{(r)}_{i, C^{(r)}_i}(x) + \beta^{(t)}_i d^{(t)}_{i, C^{(t)}_i}(x) \right] - \psi(\beta)\}$,

where $C^{(r)}_i$ is the set of all cliques with r nodes in D such that all their nodes share i, and $d^{(r)}_{i, C^{(r)}_i}(x)$ is the number of r-stars in G with hub i whose endpoints are the nodes paired with i in members of $C^{(r)}_i$.

• The saturated model for line graph of a complete graph: sufficient statistics are

the number of r-stars with i as the hub; and the number of triangles that

contain i.

21

Parameter estimation

• The normalizing constant is in closed form ⇒ there is an explicit formula for the likelihood function and its gradient; e.g., for the hierarchical Erdos-Renyi (HER) model:

$\dfrac{\partial l(q)}{\partial q^{(i)}} = s^{(i)}(x) - \dfrac{\sum_{r=1}^{m} \sum_{H \subseteq D^{(r)}} c^{(i)}(H) \exp\{\sum_{r'=1}^{\min(d,r)} c^{(r')}(H)\, q^{(r')}\}}{1 + \sum_{r=1}^{m} \sum_{H \subseteq D^{(r)}} \exp\{\sum_{r'=1}^{\min(d,r)} c^{(r')}(H)\, q^{(r')}\}}$.

• We apply the gradient descent method to obtain the MLE.

• The method is computationally demanding due to the many terms in the sum and the large values inside the exponentials.

• For sparse dependence graphs, i.e., with a number of edges of constant order, the method works reasonably well.
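The gradient has the usual exponential-family shape: observed statistics minus their expectation under the current parameters. The sketch below illustrates this on the n = 3 example with dependence graph 12, 13, 23; it computes the expectation by brute-force enumeration rather than with the closed-form sum above, and runs gradient ascent on the log-likelihood (equivalently, descent on its negative). The step size, iteration count, and parameter values are hypothetical.

```python
from itertools import product
from math import exp

DYADS = [(1, 2), (1, 3), (2, 3)]
PAIR_CLIQUES = [((1, 2), (1, 3)), ((1, 3), (2, 3))]
CONFIGS = [dict(zip(DYADS, bits)) for bits in product((0, 1), repeat=3)]

def stats(x):
    """(s1, s2): number of edges, number of 2-stars supported by cliques of D."""
    s1 = sum(x[d] for d in DYADS)
    s2 = sum(x[a] * x[b] for a, b in PAIR_CLIQUES)
    return (s1, s2)

def expected_stats(q):
    """E_q[s(X)], computed by enumerating all 2^3 networks."""
    weights = [exp(sum(qk * sk for qk, sk in zip(q, stats(x)))) for x in CONFIGS]
    z = sum(weights)
    return tuple(
        sum(w * stats(x)[k] for w, x in zip(weights, CONFIGS)) / z
        for k in range(2)
    )

def fit(mean_stats, step=0.5, iters=5000):
    """Gradient ascent on the average log-likelihood; gradient = observed - expected."""
    q = [0.0, 0.0]
    for _ in range(iters):
        e = expected_stats(q)
        q = [qk + step * (mk - ek) for qk, mk, ek in zip(q, mean_stats, e)]
    return q

# Use the exact mean statistics under known parameters, so the maximizer is known.
q_true = (-0.4, 0.7)
q_hat = fit(expected_stats(q_true))
```

Enumeration is feasible only for a handful of dyads, which is one way to see why the number of terms becomes the bottleneck for larger dependence graphs.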

22

Erdos-Renyi vs. hierarchical Erdos-Renyi: Simulation results

• We simulated binary data with conditional dependencies of a

dependence graph using Gibbs sampling.

• We applied the likelihood ratio test to compare Erdos-Renyi vs.

hierarchical Erdos-Renyi.

$D = 2\, l_{HER}(q^{(1)}, \dots, q^{(r)}) - 2\, l_{ER}(q)$.

• Under the null hypothesis that these two models are not different, $D \sim \chi^2_{r-1}$.
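Given the two maximized log-likelihoods, the p-value of the test can be computed with the standard library alone. The sketch below implements the chi-squared survival function for integer degrees of freedom through the recurrence of the regularized upper incomplete gamma function; the function names chi2_sf and lrt_p_value are hypothetical, and any input log-likelihood values are placeholders rather than results from the talk.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """P(chi^2_df > x) for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed-form base cases: df = 1 (odd df) or df = 2 (even df).
    k = 1 if df % 2 else 2
    sf = erfc(sqrt(x / 2.0)) if k == 1 else exp(-x / 2.0)
    # Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1), with a = k / 2 and y = x / 2.
    while k < df:
        a = k / 2.0
        sf += (x / 2.0) ** a * exp(-x / 2.0) / gamma(a + 1.0)
        k += 2
    return sf

def lrt_p_value(loglik_her, loglik_er, r):
    """p-value of D = 2 (l_HER - l_ER) against chi-squared with r - 1 df."""
    d = 2.0 * (loglik_her - loglik_er)
    return chi2_sf(d, r - 1)
```

A small p-value speaks against the plain Erdos-Renyi model in favour of the hierarchical one; the chi-squared reference distribution is an asymptotic approximation.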

23

Example for n = 12:

24

Related algebraic and geometric questions

• Find the model polytopes based on a given dependence graph in order to study the existence of the MLE.

• The identifiability problem for these models.

• Markov moves: How to move on the space of networks so that we

keep sufficient statistics, which are based on the dependence

graph, unchanged.

• Model selection for the non-structural approach: Model selection for binary variables with certain restrictions.

25

Summary

• Modeling the independence structure of networks is an essential but neglected

task.

• In order to model the independence structure of networks, we use binary graphical models with nodes labelled by pairs ij.

• We generalized network models by using the hierarchical models in the graphical models literature to obtain models that capture certain independence structures.

• The selection of dependence graphs from data is an important task for further

work.

• There is a lot of interesting geometry in these models to discover.

• A similar method has been developed for bidirected graphs, which are those

that capture marginal independence.

26
