Hierarchical Models for Independence Structures of Networks
Kayvan Sadeghi
Department of Statistics
Carnegie Mellon University
May 2014
Joint work with Alessandro Rinaldo
1
Objective
• The independence structure of networks has not been
well studied.
• Graphical Markov models have been used to model conditional
independence structures among random variables.
• The goal is to combine network and graphical models to deal with
different possible independence structures.
2
Network models
• A network is a complete graph where
– nodes correspond to labeled individuals;
– edges correspond to binary random variables (also called
dyads).
• In a realization of a network, i ∼ j if the random variable
corresponding to the edge ij takes the value 1, and i ≁ j otherwise.
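A minimal sketch of this representation, assuming a realization is stored as a mapping from dyads to 0 or 1 (the function names `dyads` and `realization_edges` are illustrative and not from the talk):

```python
from itertools import combinations

def dyads(n):
    """All pairs (i, j) with 1 <= i < j <= n: one binary variable per pair."""
    return list(combinations(range(1, n + 1), 2))

def realization_edges(x):
    """Dyads that are present (value 1) in a realization x, given as {dyad: 0 or 1}."""
    return [d for d, value in x.items() if value == 1]

# Hypothetical realization on n = 3 nodes: 1 ~ 2 and 2 ~ 3, while 1 and 3 are not adjacent.
x = {(1, 2): 1, (1, 3): 0, (2, 3): 1}
present = realization_edges(x)
```

The complete graph fixes the set of dyads; only the binary values change between realizations.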
3
Graphical models
• In graphical models different types of dependence graphs have
been used.
• A dependence graph is a graph where
– nodes correspond to random variables $X = (X_1, \ldots, X_{|N|})$;
– edges indicate some specific types of conditional
dependencies among the random variables corresponding to
the nodes.
• Here we focus on undirected graphs (conditional
independencies), but a similar theory has been developed for
bidirected graphs (marginal independencies).
4
Probabilistic independence
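• $X \perp\!\!\!\perp Y \mid Z \iff f_{XYZ}(x, y, z) = \dfrac{f_{XZ}(x, z)\, f_{YZ}(y, z)}{f_Z(z)}$
• $(X_\alpha)_{\alpha \in V}$: a collection of random variables with a probability
distribution P
• Short notation: $A \perp\!\!\!\perp B \mid C$ stands for $X_A \perp\!\!\!\perp X_B \mid X_C$, for A, B, C
subsets of V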
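The factorization criterion can be checked numerically on a small discrete table. The sketch below is only an illustration under stated assumptions: the joint distribution is a dictionary listing every cell (x, y, z), and the function name and the two example tables are hypothetical.

```python
from itertools import product

def is_cond_independent(joint, tol=1e-12):
    """Check X ⊥⊥ Y | Z for a joint pmf given as {(x, y, z): probability}.

    Uses the product form f_XYZ * f_Z == f_XZ * f_YZ, which is equivalent to
    the ratio form but avoids dividing by f_Z(z) when it is zero.
    The dictionary must list every cell, including cells with probability 0.
    """
    f_xz, f_yz, f_z = {}, {}, {}
    for (x, y, z), p in joint.items():
        f_xz[(x, z)] = f_xz.get((x, z), 0.0) + p
        f_yz[(y, z)] = f_yz.get((y, z), 0.0) + p
        f_z[z] = f_z.get(z, 0.0) + p
    return all(
        abs(p * f_z[z] - f_xz[(x, z)] * f_yz[(y, z)]) <= tol
        for (x, y, z), p in joint.items()
    )

def bern(p_one, value):
    """Probability of `value` for a binary variable with Pr(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Hypothetical table built as p(z) p(x | z) p(y | z), so X ⊥⊥ Y | Z holds by construction.
P_Z = {0: 0.4, 1: 0.6}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.7}
P_Y1_GIVEN_Z = {0: 0.5, 1: 0.1}
ci_joint = {
    (x, y, z): P_Z[z] * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_Z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

# Hypothetical table with X = Y always, so X and Y are dependent given Z.
dep_joint = {cell: 0.0 for cell in product((0, 1), repeat=3)}
for z in (0, 1):
    dep_joint[(0, 0, z)] = 0.25
    dep_joint[(1, 1, z)] = 0.25
```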
5
The relationship between networks and dependence graphs
• We are interested in the network Gn with n = |V| nodes and
$|E| = \binom{n}{2} =: m$ dyads.
• Consider a dependence graph D with m nodes that are binary
random variables.
• The basic assumption is that Gn entails the same conditional
independence structure among its edges as that of D among its
nodes.
• We use a corresponding dependence graph D to model Gn.
6
Modeling dependence graphs for modeling networks
• Is it enough to model a dependence graph with m binary random
variables? No.
• Nodes of the network induce a specific labeling of the edges, which is
lost by choosing an arbitrarily labelled dependence graph.
• Nodes of D should be labelled by pairs ij such that 1 ≤ i < j ≤ n.
[Figure: dependence graph with nodes labelled 12, 13, 23]
7
Structural and non-structural approaches to selecting a dependence
graph
• Structural approach: A given dependence graph is used to model
the independence structure for the network.
• Non-structural approach: Model selection is performed to obtain a
dependence graph.
• Here we deal with the structural approach.
8
Discrete graphical models
• Suppose that we have a dataset with N observations of d
discrete random variables.
• We write the discrete variables as $X = (X_v)_{v \in V}$, and we call
the possible values a discrete variable may take its levels.
• We can write a generic observation (or cell) as $i = (i_1, \ldots, i_d)$,
and the set of possible cells as $\mathcal{I}$.
• We are interested in modelling the probabilities
$p(i) = \Pr(X = i)$ for $i \in \mathcal{I}$.
9
Log-linear models
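• For a three-dimensional table:
We write the variables as (A, B, C) and a generic cell as
i = (j, k, l);
the saturated model is
\[ \log p(i) = u + u^{A}_{j} + u^{B}_{k} + u^{C}_{l} + u^{AB}_{jk} + u^{AC}_{jl} + u^{BC}_{kl} + u^{ABC}_{jkl}. \]
• Letting $\tilde{u} = \exp u$ for each term,
\[ p(i) = \tilde{u}\, \tilde{u}^{A}_{j}\, \tilde{u}^{B}_{k}\, \tilde{u}^{C}_{l}\, \tilde{u}^{AB}_{jk}\, \tilde{u}^{AC}_{jl}\, \tilde{u}^{BC}_{kl}\, \tilde{u}^{ABC}_{jkl}. \]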
10
Hierarchical log-linear models
• The term hierarchical means that if a term is set to zero, all its
higher-order relatives are also set to zero.
• Hierarchical models are specified using the maximal interaction
terms not set to zero: these are called the generators of the
model.
• For example,
\[ \log p(i) = u + u^{a}_{j} + u^{b}_{k} + u^{c}_{l} + u^{ab}_{jk} + u^{ac}_{jl} \]
has generators {ab, ac}.
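Because the model is hierarchical, the generators determine every term it contains: each subset of a generator is included. A small sketch of that expansion, assuming single-letter variable names (the function `model_terms` is illustrative, not from the talk):

```python
from itertools import combinations

def model_terms(generators):
    """All interaction terms of the hierarchical model with the given generators.

    Each generator is a string of single-letter variable names, e.g. "ab".
    Every subset of every generator is a term of the model; the empty string
    stands for the constant term u.
    """
    terms = set()
    for g in generators:
        for size in range(len(g) + 1):
            for subset in combinations(sorted(g), size):
                terms.add("".join(subset))
    # Order by interaction order first, then alphabetically.
    return sorted(terms, key=lambda t: (len(t), t))

terms = model_terms(["ab", "ac"])
```

For the generators {ab, ac} this recovers the constant, the three main effects, and the two interactions ab and ac; the term bc is absent.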
11
Hierarchical log-linear models for undirected graphs
• The generators of a hierarchical log-linear model correspond to
the maximal cliques in the undirected graph.
• For example, for the generator set {ab, ac}:
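• With $\tilde{u} = \exp u$ for each term,
\[ p(i) = (\tilde{u}\, \tilde{u}^{a}_{j}\, \tilde{u}^{b}_{k}\, \tilde{u}^{ab}_{jk})(\tilde{u}^{c}_{l}\, \tilde{u}^{ac}_{jl}), \]
which implies $b \perp\!\!\!\perp c \mid a$, as needed.
[Figure: undirected graph b - a - c]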
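The conditional independence can be read off from the two bracketed factors. As a short worked step (the names g and h are introduced here only for illustration), write the first bracket as g(j, k) and the second as h(j, l), so that p(j, k, l) = g(j, k) h(j, l). Conditioning on the level j of a then gives:

```latex
\begin{align*}
p(k, l \mid j)
  &= \frac{g(j,k)\, h(j,l)}{\sum_{k'} \sum_{l'} g(j,k')\, h(j,l')}
   = \frac{g(j,k)}{\sum_{k'} g(j,k')} \cdot \frac{h(j,l)}{\sum_{l'} h(j,l')} \\
  &= p(k \mid j)\, p(l \mid j).
\end{align*}
```

The last equality holds because summing p(j, k, l) over l leaves g(j, k) times a factor that depends only on j, and similarly for the sum over k.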
12
Markov dependence property for undirected graphs
• We say that a dependence graph D satisfies the Markov
dependence property (Frank and Strauss 1986) if ij ≁ kl in D
whenever the edges ij and kl are not incident in G (they share no node).
• Notice that the statement holds only in one direction; therefore,
such dependence graphs are subgraphs of the line graph of Gn.
• This implies that cliques in D correspond to stars and triangles in
G.
• In most practical cases, this is a plausible assumption.
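The correspondence between cliques and stars or triangles can be checked on a small case. The sketch below assumes the saturated situation in which D is the whole line graph of the complete graph on n = 4 nodes; all function names are illustrative.

```python
from itertools import combinations

def line_graph_edges(n):
    """Edges of the line graph of the complete graph on nodes 1..n.

    Two dyads are adjacent in the line graph exactly when they share a node.
    """
    dyads = list(combinations(range(1, n + 1), 2))
    return {frozenset((d, e)) for d, e in combinations(dyads, 2) if set(d) & set(e)}

def is_clique(dyad_set, edges):
    """True if every pair of dyads in dyad_set is adjacent under `edges`."""
    return all(frozenset(pair) in edges for pair in combinations(dyad_set, 2))

def star(hub, n):
    """The (n-1)-star of the complete graph centred at `hub`, as a list of dyads."""
    return [tuple(sorted((hub, v))) for v in range(1, n + 1) if v != hub]

def triangle(i, j, k):
    """The triangle on nodes i < j < k, as a list of dyads."""
    return [(i, j), (i, k), (j, k)]

n = 4
edges = line_graph_edges(n)
stars_ok = all(is_clique(star(h, n), edges) for h in range(1, n + 1))
triangles_ok = all(
    is_clique(triangle(*t), edges) for t in combinations(range(1, n + 1), 3)
)
# Two non-incident dyads are never adjacent, so they cannot lie in a common clique.
disjoint_ok = not is_clique([(1, 2), (3, 4)], edges)
```

Any dependence graph with the Markov dependence property is a subgraph of this line graph, so its cliques are subsets of these stars and triangles.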
13
Frank and Strauss: Markov Graphs
[Excerpt from Frank and Strauss (1986), shown as an image on the slide. Recoverable content:]
[Figure 2: Dependence graph D of G in Example 2.]
• A graph G is a Markov graph, or has Markov dependence, if D contains no edge between disjoint sets {s, t} and {u, v}; that is, non-incident edges in G are conditionally independent. This definition is consistent with the nearest neighbor definition (Besag 1974).
• For general Markov graphs, the cliques of D correspond to sets of edges such that any pair of edges within the set must be incident. Such sets are just the triangles and the k-stars of G, for k = 1, ..., n − 1.
• Theorem 2 gives the probability of any undirected Markov graph as a log-linear model with sufficient statistics corresponding to triangles and stars in G.
[Figure 3: A Markov graph G of order n = 4 and its dependence graph D (two illustrations of the correspondence between cliques of D and sufficient subgraphs of G).]
14
Network models
• For x, an observed network, exponential random graph models
are of the form
\[ P(x) = \exp\Big\{ \sum_i s_i(x)\, \theta_i - \psi(\theta) \Big\}, \]
where the $s_i(x)$ are sufficient statistics, the $\theta_i$ are natural parameters, and $\psi(\theta)$ is
the normalizing constant.
• Models in the literature that assume dyadic independence
(Erdos-Renyi, Beta, P1, etc.) can be generalized to capture the
independence structure of D.
• We show the method for Erdos-Renyi and Beta.
• For Erdos-Renyi the sufficient statistic is the number of edges,
and for Beta it is the degree sequence.
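Both sufficient statistics are simple functions of the dyad values. A minimal sketch, assuming a realization stored as {(i, j): 0 or 1} with nodes numbered from 1; the function names and the parameter values are hypothetical:

```python
def edge_count(x):
    """Number of edges: the sufficient statistic of the Erdos-Renyi model."""
    return sum(x.values())

def degree_sequence(x, n):
    """Degrees of nodes 1..n: the sufficient statistic of the Beta model."""
    degrees = [0] * n
    for (i, j), value in x.items():
        if value == 1:
            degrees[i - 1] += 1
            degrees[j - 1] += 1
    return degrees

# Hypothetical realization on n = 3 nodes and hypothetical node parameters.
x = {(1, 2): 1, (1, 3): 0, (2, 3): 1}
beta = [0.5, -1.0, 2.0]

degrees = degree_sequence(x, 3)
# The Beta model exponent sum_{i<j} (beta_i + beta_j) x_ij depends on x only
# through the degrees, since it equals sum_i beta_i * d_i.
dyad_form = sum((beta[i - 1] + beta[j - 1]) * v for (i, j), v in x.items())
degree_form = sum(b * d for b, d in zip(beta, degrees))
```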
15
Hierarchical Erdos-Renyi models
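• The baseline for the empty independence graph:
\[ p(x) = u \prod_{i<j} \exp\{x_{ij}\, q\}, \quad \text{where } q = \log\Big(\frac{p}{1-p}\Big). \]
• Hierarchical Erdos-Renyi models:
\[ p_X(x) = \prod_{C \in \mathcal{C}} u_{i_C} = u \exp\Big\{ \sum_{C \in \mathcal{C}} \gamma_C \prod_{c \in C} x_c \Big\}, \]
where c is of the form ij, and
• \[ \gamma_C = \begin{cases} q^{(r)}, & \text{if } C = \{ii_1, ii_2, \ldots, ii_r\}; \\ t, & \text{if } C = \{ij, ik, jk\}. \end{cases} \]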
16
Example
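[Figure: dependence graph D with nodes 12, 13, 23]
• \[ P(x) = u \exp\{ q^{(1)} x_{12} + q^{(1)} x_{13} + q^{(1)} x_{23} + q^{(2)} x_{12} x_{13} + q^{(2)} x_{13} x_{23} \}. \]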
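With only three dyads the example can be evaluated by brute force over the 8 possible networks. The sketch below does this under stated assumptions: the parameter values are hypothetical, and the normalization is done by enumeration rather than by any closed form.

```python
import math
from itertools import product

def log_weight(x, q1, q2):
    """Exponent of the example model for x = (x12, x13, x23)."""
    x12, x13, x23 = x
    return q1 * (x12 + x13 + x23) + q2 * (x12 * x13 + x13 * x23)

def her_pmf(q1, q2):
    """Probability of each of the 8 networks, normalized by enumeration."""
    states = list(product((0, 1), repeat=3))
    weights = [math.exp(log_weight(x, q1, q2)) for x in states]
    total = sum(weights)  # equals 1 / u, because the probabilities sum to 1
    return {x: w / total for x, w in zip(states, weights)}

pmf = her_pmf(0.3, 0.5)       # hypothetical values of q^(1) and q^(2)
er_pmf = her_pmf(0.3, 0.0)    # q^(2) = 0 gives back the Erdos-Renyi baseline
p_edge = 1.0 / (1.0 + math.exp(-0.3))
```

Setting q^(2) = 0 reduces the model to independent dyads with edge probability p, where q^(1) = log(p / (1 − p)). For any parameter values, x12 and x23 are conditionally independent given x13, since no term of the exponent contains both.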
17
Hierarchical Erdos-Renyi models
• Number of parameters is r: the size of the largest clique in D.
• Model is hierarchical: $q^{(r)} = 0 \Rightarrow q^{(r+1)} = 0$.
• Model in exponential family form:
\[ p_X(x) = \exp\Big\{ \sum_{r=1}^{n} q^{(r)} s^{(r)}_{C^{(r)}}(x) + t\, s^{(t)}_{C^{(t)}}(x) - \psi(q, t) \Big\}, \]
$C^{(r)}$: the set of all cliques with r nodes in D,
$s^{(r)}_{C^{(r)}}(x)$: the number of r-stars in G with edges that are members of $C^{(r)}$;
• The saturated model (where D is the line graph of a complete graph):
The k-stars and triangle model.
• Normalizing constant in closed form, but depending on the cliques of the
dependence graph.
18
Hierarchical Beta models
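• The baseline for the empty independence graph:
\[ p_\beta(x) = \prod_{i<j} p_{ij}^{x_{ij}} (1 - p_{ij})^{1 - x_{ij}} = u \prod_{i<j} e^{(\beta_i + \beta_j) x_{ij}}. \]
• Hierarchical Beta models:
\[ p_X(x) = \prod_{C \in \mathcal{C}} u_{i_C} = u \exp\Big\{ \sum_{C \in \mathcal{C}} \gamma_C \prod_{c \in C} x_c \Big\}, \]
where c is of the form ij, and
• \[ \gamma_C = \begin{cases} \beta^{(r)}_i, & \text{if } C = \{ii_1, ii_2, \ldots, ii_r\},\ r \geq 1; \\ \tau_i + \tau_j + \tau_k, & \text{if } C = \{ij, ik, jk\}. \end{cases} \]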
19
Example
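[Figure: dependence graph D with nodes 12, 13, 23]
• \[ P(x) = u \exp\{ (\beta^{(1)}_1 + \beta^{(1)}_2) x_{12} + (\beta^{(1)}_1 + \beta^{(1)}_3) x_{13} + (\beta^{(1)}_2 + \beta^{(1)}_3) x_{23} + \beta^{(2)}_1 x_{12} x_{13} + \beta^{(2)}_3 x_{13} x_{23} \}. \]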
20
Hierarchical Beta models
• Number of parameters is nr.
• Model is hierarchical: $\beta^{(r)}_i = 0 \Rightarrow \beta^{(r+1)}_i = 0$.
• Model in exponential family form:
\[ p_X(x) = \exp\Big\{ \sum_{i=1}^{n} \Big( \sum_{r=1}^{n} \beta^{(r)}_i\, d^{(r)}_{i, C^{(r)}_i}(x) + \beta^{(t)}_i\, d^{(t)}_{i, C^{(t)}_i}(x) \Big) - \psi(\beta) \Big\}, \]
$C^{(r)}_i$: the set of all cliques with r nodes in D such that all their nodes share i,
$d^{(r)}_{i, C^{(r)}_i}(x)$: the number of r-stars in G with hub i and endpoints that are pairs
of i in members of $C^{(r)}_i$.
• The saturated model for line graph of a complete graph: sufficient statistics are
the number of r-stars with i as the hub; and the number of triangles that
contain i.
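The hub-specific star counts can be computed directly from a realization and the edges of D. The sketch below is one reading of the definition above, not the authors' implementation: it counts, for hub i, the r-cliques of D whose dyads all contain i and are all present in x. The function name and the example D (edges 12 - 13 and 13 - 23, as implied by the interaction terms of the three-dyad example) are assumptions.

```python
from itertools import combinations

def hub_star_statistic(x, d_edges, hub, r):
    """Number of r-stars with the given hub that are present in x and form a clique of D.

    x       : realization as {(i, j): 0 or 1}
    d_edges : set of frozensets {dyad, dyad}, the edges of the dependence graph D
    """
    incident = [d for d in x if hub in d]
    count = 0
    for clique in combinations(incident, r):
        in_d = all(frozenset(pair) in d_edges for pair in combinations(clique, 2))
        present = all(x[d] == 1 for d in clique)
        if in_d and present:
            count += 1
    return count

# Dependence graph assumed for the three-dyad example: 12 - 13 and 13 - 23.
D_EDGES = {frozenset({(1, 2), (1, 3)}), frozenset({(1, 3), (2, 3)})}
x_full = {(1, 2): 1, (1, 3): 1, (2, 3): 1}

degree_of_1 = hub_star_statistic(x_full, D_EDGES, hub=1, r=1)
two_stars = [hub_star_statistic(x_full, D_EDGES, hub=h, r=2) for h in (1, 2, 3)]
```

For r = 1 the statistic is the degree of the hub. In this example the 2-star with hub 2 is not counted, because 12 and 23 are not adjacent in D, which matches the absence of a β^(2) term for node 2 in the example formula.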
21
Parameter estimation
• The normalizing constant is in closed form ⇒ there is an explicit formula for the
likelihood function and its gradient; e.g., for HER:
\[ \frac{\partial l(q)}{\partial q^{(i)}} = s^{(i)}(x) - \frac{\sum_{r=1}^{m} \sum_{H \subseteq D(r)} c^{(i)}(H) \exp\big\{ \sum_{r'=1}^{\min(d,r)} c^{(r')}(H)\, q^{(r')} \big\}}{1 + \sum_{r=1}^{m} \sum_{H \subseteq D(r)} \exp\big\{ \sum_{r'=1}^{\min(d,r)} c^{(r')}(H)\, q^{(r')} \big\}}. \]
• We apply the gradient descent method to obtain the MLE.
• The method is computationally demanding due to the many terms in the sum and
the large values in exp.
• For sparse dependence graphs, i.e., with the number of edges of the order of a constant,
the method works reasonably well.
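The gradient has the usual exponential family shape: observed sufficient statistics minus their expectation under the current parameters. The sketch below illustrates a gradient iteration on the three-dyad example. It is an illustration under stated assumptions and not the method of the talk: expectations are computed by enumerating all 8 networks instead of using the closed form, the sample is hypothetical, and the fixed step size is chosen by hand.

```python
import math
from itertools import product

STATES = list(product((0, 1), repeat=3))  # all realizations (x12, x13, x23)

def stats(x):
    """Sufficient statistics of the example: (number of edges, number of 2-stars in D)."""
    x12, x13, x23 = x
    return (x12 + x13 + x23, x12 * x13 + x13 * x23)

def expected_stats(q):
    """Expected sufficient statistics under parameters q = [q1, q2], by enumeration."""
    weights = [math.exp(q[0] * stats(x)[0] + q[1] * stats(x)[1]) for x in STATES]
    total = sum(weights)
    return tuple(
        sum(w * stats(x)[k] for w, x in zip(weights, STATES)) / total
        for k in range(2)
    )

def fit_her(sample, step=0.2, iters=2000):
    """Maximize the average log-likelihood (gradient descent on its negative)."""
    observed = tuple(sum(stats(x)[k] for x in sample) / len(sample) for k in range(2))
    q = [0.0, 0.0]
    for _ in range(iters):
        expected = expected_stats(q)
        # Gradient of the average log-likelihood: observed minus expected statistics.
        q = [q[k] + step * (observed[k] - expected[k]) for k in range(2)]
    return q, observed

# Hypothetical sample of six observed networks on three nodes.
SAMPLE = [(1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1), (0, 0, 1)]
q_hat, observed = fit_her(SAMPLE)
```

At the MLE the expected statistics match the observed ones, which gives a simple convergence check. Enumeration is only feasible for very small n, which is the computational burden the slide refers to.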
22
Erdos-Renyi vs. hierarchical Erdos-Renyi: Simulation results
• We simulated binary data with conditional dependencies of a
dependence graph using Gibbs sampling.
• We applied the likelihood ratio test to compare Erdos-Renyi vs.
hierarchical Erdos-Renyi.
\[ D = 2\, l_{HER}(q^{(1)}, \ldots, q^{(r)}) - 2\, l_{ER}(q). \]
• Under the null hypothesis that these two models are not different,
$D \sim \chi^2_{r-1}$.
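The simulation step can be sketched with a Gibbs sampler for the three-dyad hierarchical Erdos-Renyi example. This is a minimal illustration under stated assumptions: the parameter values, burn-in length, and function names are hypothetical, and the talk does not specify its sampler beyond naming Gibbs sampling.

```python
import math
import random

def cond_prob_one(x, idx, q1, q2):
    """Pr(x[idx] = 1 | other dyads) for x = [x12, x13, x23] in the example model."""
    def log_weight(v):
        x12, x13, x23 = v
        return q1 * (x12 + x13 + x23) + q2 * (x12 * x13 + x13 * x23)

    on = list(x)
    on[idx] = 1
    off = list(x)
    off[idx] = 0
    delta = log_weight(on) - log_weight(off)
    return 1.0 / (1.0 + math.exp(-delta))

def gibbs_sample(q1, q2, n_samples, burn_in=500, seed=0):
    """Draw networks by updating one dyad at a time from its full conditional."""
    rng = random.Random(seed)
    x = [0, 0, 0]
    samples = []
    for sweep in range(burn_in + n_samples):
        for idx in range(3):
            x[idx] = 1 if rng.random() < cond_prob_one(x, idx, q1, q2) else 0
        if sweep >= burn_in:
            samples.append(tuple(x))
    return samples

samples = gibbs_sample(q1=0.3, q2=0.5, n_samples=1000)
```

The full conditional of x12 depends on x13 but not on x23, reflecting that 12 and 23 are not adjacent in the dependence graph. Successive sweeps are correlated, so in practice one would thin the chain or use well-separated draws.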
23
Example for n = 12:
24
Related algebraic and geometric questions
• Find the model polytopes based on a given dependence graph in
order to study the existence of the MLE.
• The identifiability problem for these models.
• Markov moves: How to move on the space of networks so that we
keep sufficient statistics, which are based on the dependence
graph, unchanged.
• Model selection for the non-structural approach: model selection
for binary variables with certain restrictions.
25
Summary
• Modeling the independence structure of networks is an essential but neglected
task.
• In order to model independence structure for networks, we use binary
graphical models with nodes labelled in the form ij.
• We generalized the models for networks by using the hierarchical models in the
graphical models literature to come up with models that capture certain
independence structures.
• The selection of dependence graphs from data is an important task for further
work.
• There is a lot of interesting geometry in these models to discover.
• A similar method has been developed for bidirected graphs, which are those
that capture marginal independence.
26