
Copula Gaussian Graphical Models∗

Technical Report no. 555, Department of Statistics, University of Washington

Adrian Dobra, Department of Statistics,

University of Washington, Seattle, WA 98195

[email protected]

and

Alex Lenkoski, Department of Statistics,

University of Washington, Seattle, WA 98195

[email protected]

March 10, 2009

Abstract

We propose a comprehensive Bayesian approach for graphical model determination in observational studies that can accommodate binary, ordinal or continuous variables simultaneously. Our new models are called copula Gaussian graphical models and embed graphical model selection inside a semiparametric Gaussian copula. The domain of applicability of our methods is very broad and encompasses many studies from social science and economics. We illustrate the use of copula Gaussian graphical models on three representative datasets.

Keywords: Bayesian inference, Gaussian graphical models, latent variable model, Markov chain Monte Carlo.

∗Adrian Dobra is Assistant Professor, Department of Statistics, University of Washington, Seattle, WA 98195 (email: [email protected]). Alex Lenkoski is Graduate Student, Department of Statistics, University of Washington, Seattle, WA 98195 (email: [email protected]).


1 Introduction

The determination of conditional independence relationships through graphical models is a key component of the statistical analysis of observational studies. A pertinent example we focus on in this paper is a functional disability dataset extracted from the "analytic" data file for the National Long Term Care Survey (NLTCS) created by the Center of Demographic Studies at Duke University. Each observed variable is binary and corresponds to a measure of disability defined by an activity of daily living. This contingency table cross-classifies information on elderly people aged 65 and above pooled across four survey waves, 1982, 1984, 1989 and 1994; see Manton, Corder and Stallard (15) for more details. The 16 dimensions of this table correspond to six activities of daily living (ADLs) and ten instrumental activities of daily living (IADLs). Specifically, the ADLs relate to hygiene and personal care: eating (ADL1), getting in/out of bed (ADL2), getting around inside (ADL3), dressing (ADL4), bathing (ADL5) and getting to the bathroom or using a toilet (ADL6). The IADLs relate to activities needed to live without dedicated professional care: doing heavy house work (IADL1), doing light house work (IADL2), doing laundry (IADL3), cooking (IADL4), grocery shopping (IADL5), getting about outside (IADL6), travelling (IADL7), managing money (IADL8), taking medicine (IADL9) and telephoning (IADL10). For each ADL/IADL measure, subjects were classified as being either healthy (level 1) or disabled (level 2) on that measure. The methodology we develop in this paper allows us to determine the complex pattern of conditional associations that exist among the 16 daily living activities. This is a critical issue that was left unexplored in previous analyses of this dataset (7; 8).

In fact, the domain of applicability of our methods is not restricted to contingency tables. Since multivariate datasets arising from social science or economics typically contain variables of many types, our goal was to develop an approach to graphical model determination that is broad enough to be applicable to any study that involves a mixture of binary, ordinal and continuous variables.

Most of the research efforts in the graphical models literature have focused on multivariate normal models or on log-linear models; see, for example, the excellent monographs of Lauritzen (11) and Whittaker (22). These models apply to datasets that contain exclusively continuous or exclusively categorical variables. CG distributions (11) constitute the basis of a class of graphical models for mixed variables, but they impose an overly restrictive assumption: the conditional distribution of the continuous variables given the discrete variables must be multivariate normal. As such, these three main classes of graphical models are too restrictive to be widely applicable to social science or economics studies.

Copulas (16) provide the theoretical framework in which multivariate associations can be modeled separately from the univariate distributions of the observed variables. In this paper we employ the Gaussian copula and further require conditional independence constraints on the inverse of its correlation matrix. The resulting models are called copula Gaussian graphical models because they only impose a multivariate normal assumption on a set of latent variables which are in one-to-one correspondence with the set of observed variables. A related approach for inference in Gaussian copulas has been developed by Pitt et al. (18). Their framework involves parametric models for Gaussian copulas and for the univariate marginal distributions of the observed variables. We make use of the extended rank likelihood proposed by Hoff (9) and treat these marginal distributions as nuisance parameters. This makes our methodology more focused on the determination of graphical models and avoids the difficult problem of modeling marginal distributions.

The structure of the paper is as follows. In Section 2 we formally introduce Gaussian graphical models (GGMs) and describe a Bayesian framework for inference in this class of models. In Section 3 we show how to extend GGMs to represent conditional independence associations in a latent variable space. We also present a comprehensive Bayesian approach for learning and estimation in the resulting copula GGMs. In Section 4 we analyze the NLTCS functional disability data together with two other representative datasets from social science using copula GGMs. We discuss our proposed methodology in Section 5.

2 Gaussian graphical models

We let $X = X_V$, $V = \{1, 2, \ldots, p\}$, be a random vector with joint distribution $p(X_V)$. The conditional independence relationships among $\{X_v : v \in V\}$ under $p(X_V)$ can be summarized in a graph $G = (V, E)$, where each vertex $v \in V$ corresponds to a random variable $X_v$ and $E \subset V \times V$ is a set of undirected edges (22). Here "undirected" means that $(v_1, v_2) \in E$ is equivalent to $(v_2, v_1) \in E$.

The absence of an edge between $X_{v_1}$ and $X_{v_2}$ corresponds to the conditional independence of these two random variables given the remaining variables under $p(X_V)$ and is denoted by

$$X_{v_1} \perp\!\!\!\perp X_{v_2} \mid X_{V\setminus\{v_1,v_2\}}. \qquad (2.1)$$

This is called the pairwise Markov property relative to $G$, which in turn implies the local as well as the global Markov properties relative to $G$ (11).

We denote by $\mathcal{G}_V$ the set of all $2^{p(p-1)/2}$ undirected graphs with vertices $V$. Since $\mathcal{G}_V$ contains many graphs even for relatively small values of $p$, it cannot be enumerated and has to be visited using stochastic search methods (14; 10). Such algorithms move through $\mathcal{G}_V$ using neighborhood sets $\mathrm{nbd}(G) \subset \mathcal{G}_V$ for $G \in \mathcal{G}_V$. The neighborhood of a graph $G \in \mathcal{G}_V$ comprises all the graphs obtained from $G$ by adding or deleting one edge. These neighborhood sets are symmetric and link any two graphs through a path of graphs such that two consecutive graphs on this path are neighbors of each other. We remark that every neighborhood set contains the same number of graphs, $p(p-1)/2$.
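The neighborhood structure can be made concrete with a short Python sketch. It assumes a graph is stored as a frozenset of edges $(v_1, v_2)$ with $v_1 < v_2$ over vertices $1, \ldots, p$. This representation and the function names are our own illustration, not part of the method's description. Each neighbor toggles exactly one of the $p(p-1)/2$ possible edges, which is why all neighborhoods have the same size.

```python
from itertools import combinations


def all_edges(p):
    """All p(p-1)/2 possible undirected edges (v1, v2), v1 < v2, on vertices 1..p."""
    return list(combinations(range(1, p + 1), 2))


def neighborhood(graph, p):
    """nbd(G): graphs obtained from `graph` by adding or deleting exactly one edge."""
    nbd = []
    for edge in all_edges(p):
        if edge in graph:
            nbd.append(graph - {edge})  # delete an existing edge
        else:
            nbd.append(graph | {edge})  # add a missing edge
    return nbd


p = 4
G = frozenset({(1, 2), (2, 3)})
nbrs = neighborhood(G, p)
print(len(nbrs))                # 6 = p(p-1)/2 neighbours
print(2 ** (p * (p - 1) // 2))  # 64 = size of the whole graph space G_V
```

For $p = 16$, as in the NLTCS data, the graph space already contains $2^{120}$ graphs, while each neighborhood has only 120 members. Moves therefore have to be local.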

Furthermore, we assume that $X = X_V$ follows a $p$-dimensional multivariate normal distribution $N_p(0, K^{-1})$ with precision matrix $K = (K_{v_1,v_2})_{1 \le v_1, v_2 \le p}$. We let $x^{(1:n)} = (x^{(1)}, \ldots, x^{(n)})^T$ be the observed data of $n$ independent samples of $X$. The likelihood function is proportional to

$$p(x^{(1:n)} \mid K) \propto (\det K)^{n/2} \exp\left\{-\frac{1}{2}\langle K, U\rangle\right\}, \qquad (2.2)$$

where $U = \sum_{i=1}^{n} x^{(i)} (x^{(i)})^T$ and $\langle A, B\rangle = \operatorname{tr}(A^T B)$ denotes the trace inner product. We assume that the data have been centered and scaled, so that the sample mean of each $X_v$ is zero and its sample variance is one.
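As a minimal numerical illustration of (2.2), the sketch below computes $U$, the trace inner product and the log-likelihood up to its additive constant. It uses only the Python standard library, with matrices as nested lists; the helper names are hypothetical. By linearity of the trace, $\langle K, U\rangle = \sum_i (x^{(i)})^T K x^{(i)}$, so the quadratic term is a sum of per-sample quadratic forms. The log-determinant comes from a Cholesky factorization.

```python
import math


def scatter_matrix(samples):
    """U = sum_i x^(i) (x^(i))^T for a list of length-p samples."""
    p = len(samples[0])
    U = [[0.0] * p for _ in range(p)]
    for x in samples:
        for a in range(p):
            for b in range(p):
                U[a][b] += x[a] * x[b]
    return U


def trace_inner(A, B):
    """<A, B> = tr(A^T B), i.e. the sum of elementwise products."""
    return sum(A[a][b] * B[a][b] for a in range(len(A)) for b in range(len(A)))


def log_det_spd(K):
    """log det K for a symmetric positive definite K via its Cholesky factor."""
    p = len(K)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(p))


def log_likelihood(K, samples):
    """Log of (2.2) up to an additive constant: (n/2) log det K - <K, U>/2."""
    n = len(samples)
    return 0.5 * n * log_det_spd(K) - 0.5 * trace_inner(K, scatter_matrix(samples))


K = [[2.0, -1.0], [-1.0, 2.0]]
samples = [[1.0, -1.0], [-1.0, 1.0]]
print(trace_inner(K, scatter_matrix(samples)))  # 12.0: each sample has x^T K x = 6
print(log_likelihood(K, samples))               # equals log(3) - 6
```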

A graphical model $G = (V, E)$ for $N_p(0, K^{-1})$ is called a Gaussian graphical model (GGM) and is constructed by constraining some of the off-diagonal elements of $K$ to zero. For example, the pairwise Markov property (2.1) holds if and only if $K_{v_1,v_2} = 0$. This implies that the edges of $G$ correspond to the off-diagonal non-zero elements of $K$, i.e. $E = \{(v_1, v_2) : K_{v_1,v_2} \neq 0,\ v_1 < v_2\}$. Given $G$, the precision matrix $K$ is constrained to the cone $P_G$ of symmetric positive definite matrices with entries $K_{v_1,v_2}$ equal to zero for all $(v_1, v_2) \notin E$, $v_1 < v_2$.
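Reading the graph off a precision matrix takes a single pass over its upper triangle. The sketch below uses a hypothetical helper with vertices labelled $1, \ldots, p$, and a small tolerance stands in for exact zeros. It returns $E = \{(v_1, v_2) : K_{v_1,v_2} \neq 0,\ v_1 < v_2\}$.

```python
def edges_from_precision(K, tol=1e-12):
    """E = {(v1, v2) : K[v1][v2] != 0, v1 < v2}, vertices labelled 1..p.

    Entries with absolute value <= tol are treated as structural zeros.
    """
    p = len(K)
    return {(a + 1, b + 1) for a in range(p) for b in range(a + 1, p) if abs(K[a][b]) > tol}


K = [[2.0, -0.5, 0.0],
     [-0.5, 2.0, 0.7],
     [0.0, 0.7, 2.0]]
print(sorted(edges_from_precision(K)))  # [(1, 2), (2, 3)]
```

Here $K_{1,3} = 0$, so $X_1 \perp\!\!\!\perp X_3 \mid X_2$ by (2.1), even though $(K^{-1})_{1,3} \neq 0$, i.e. $X_1$ and $X_3$ are marginally correlated.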

We take a G-Wishart prior $W_G(\delta, D)$ for $K$ with density

$$p(K) = \frac{1}{I_G(\delta, D)} (\det K)^{(\delta-2)/2} \exp\left\{-\frac{1}{2}\langle K, D\rangle\right\}, \qquad (2.3)$$

with respect to the Lebesgue measure on $P_G$ (19; 2; 13). The normalizing constant $I_G(\delta, D)$ is finite if and only if $\delta > 2$ and $D^{-1} \in P_G$ (5). Throughout this paper we set the prior parameters for $K$ to $\delta = 3$ and $D = I_p$, the $p$-dimensional identity matrix. The interpretation of this prior is that the components of $X$ are independent a priori and that the "weight" of the prior is equivalent to one observed sample. Other choices of prior parameters are discussed in Carvalho and Scott (3).

Sampling from a G-Wishart distribution is performed using the block Gibbs sampler algorithm of Asci and Piccioni (1). For completeness, we describe this method in the Appendix.

The G-Wishart prior is conjugate to the likelihood (2.2); thus the posterior distribution of $K$ given $G$ is $W_G(\delta + n, D + U)$. In practice the matrix $D + U$ needs to be adjusted with respect to $G$ using the iterative proportional scaling algorithm (21) so that it actually belongs to $P_G$.

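Up to the factor $(2\pi)^{-np/2}$, which does not depend on $G$, the marginal likelihood of $G$ is the ratio of the normalizing constants of the G-Wishart posterior and prior:

$$p(x^{(1:n)} \mid G) = \int_{K \in P_G} p(x^{(1:n)} \mid K)\, p(K \mid G)\, dK = (2\pi)^{-np/2}\, \frac{I_G(\delta + n, D + U)}{I_G(\delta, D)}.$$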

If $G$ is decomposable, the marginal likelihood can be calculated explicitly (19). Otherwise, numerical approximation methods for $p(x^{(1:n)} \mid G)$ have to be employed (19; 4; 2). We use the Laplace approximation developed in Lenkoski and Dobra (12) for $I_G(\delta + n, D + U)$ because it is fast and accurate, and the Monte Carlo method of Atay-Kayis and Massam (2) for $I_G(\delta, D)$.

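Given $K \in P_G$, the regression of $X_v$ on the remaining elements of $X$ depends only on the neighbors of $v$ in $G$:

$$p(X_v \mid X_{V\setminus\{v\}} = x_{V\setminus\{v\}}, K) = N\left(-\sum_{v' \in \mathrm{bd}_G(v)} \frac{K_{v,v'}}{K_{v,v}}\, x_{v'},\ \frac{1}{K_{v,v}}\right). \qquad (2.4)$$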
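Equation (2.4) keeps single-coordinate updates cheap, because only row $v$ of $K$ is needed. The minimal sketch below uses 0-based indices and a hypothetical function name. Its comment checks the result against the usual conditional-normal formula based on $\Sigma = K^{-1}$.

```python
def full_conditional(K, x, v):
    """Mean and variance of X_v given the remaining coordinates, as in (2.4).

    K is a precision matrix (nested lists), x the current vector; indices are 0-based.
    Entries K[v][w] == 0 (non-neighbours of v in G) contribute nothing to the mean,
    so x[w] is never read for them.
    """
    kvv = K[v][v]
    mean = -sum(K[v][w] / kvv * x[w] for w in range(len(K)) if w != v and K[v][w] != 0)
    return mean, 1.0 / kvv


# Sigma = K^{-1} = [[2/3, 1/3], [1/3, 2/3]], so the textbook conditional mean is
# Sigma_12 / Sigma_22 * x_2 = 0.5 * x_2 and the conditional variance is 2/3 - 1/6 = 0.5.
K = [[2.0, -1.0], [-1.0, 2.0]]
print(full_conditional(K, [None, 1.0], 0))  # (0.5, 0.5)
```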

3 Copula Gaussian graphical models

In this section we relax the multivariate normal assumption for the joint distribution of $X = X_V$ as follows:

• we assume that there exists an ordering for the possible values of each observed variable $X_v$, $v \in V$. This assumption is satisfied if $X_v$ is binary, categorical with ordered categories, count or continuous.


• we assume that the multivariate dependence patterns among the observed variables $X = X_V$ are given by the Gaussian copula with $p \times p$ correlation matrix $\Upsilon$ (16):

$$C(u_1, \ldots, u_p \mid \Upsilon) = \Phi_p\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_p) \mid \Upsilon\right) : [0,1]^p \to [0,1]. \qquad (3.1)$$

Here $\Phi(\cdot)$ represents the CDF of the standard normal distribution and $\Phi_p(\cdot \mid \Upsilon)$ is the CDF of $N_p(0, \Upsilon)$. Song (20) shows that the density of the Gaussian copula is

$$c(u_1, \ldots, u_p \mid \Upsilon) \propto (\det \Upsilon)^{-1/2} \exp\left\{-\frac{1}{2}\langle \Upsilon^{-1} - I_p,\ q q^T\rangle\right\}, \qquad (3.2)$$

where $q = (q_1, \ldots, q_p)^T$ with $q_v = \Phi^{-1}(u_v)$ for $v \in V$.
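Since $\langle \Upsilon^{-1} - I_p, q q^T\rangle = q^T(\Upsilon^{-1} - I_p)q$, the density (3.2) is easy to evaluate once $\Upsilon^{-1}$ and $\det\Upsilon$ are available. In fact, the right-hand side of (3.2) is the exact copula density: it is the $N_p(0, \Upsilon)$ density at $q$ divided by the product of the standard normal densities at $q_1, \ldots, q_p$. The sketch below assumes only the Python standard library, with `statistics.NormalDist` for $\Phi^{-1}$ and a small Gauss-Jordan routine for the inverse and determinant. The function names are our own.

```python
import math
from statistics import NormalDist


def inverse_and_det(A):
    """Gauss-Jordan elimination with partial pivoting; returns (A^{-1}, det A)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det  # a row swap flips the sign of the determinant
        det *= M[c][c]
        pivot = M[c][c]
        M[c] = [val / pivot for val in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M], det


def gaussian_copula_log_density(u, Upsilon):
    """log c(u | Upsilon) = -log(det Upsilon)/2 - q^T (Upsilon^{-1} - I) q / 2."""
    q = [NormalDist().inv_cdf(uv) for uv in u]  # q_v = Phi^{-1}(u_v)
    inv, det = inverse_and_det(Upsilon)
    p = len(q)
    quad = sum(q[a] * (inv[a][b] - (1.0 if a == b else 0.0)) * q[b]
               for a in range(p) for b in range(p))
    return -0.5 * math.log(det) - 0.5 * quad


print(gaussian_copula_log_density([0.3, 0.8], [[1.0, 0.5], [0.5, 1.0]]))
```

With $\Upsilon = I_p$ the log-density is zero for every $u$, which recovers the independence copula.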

The joint distribution $F$ of $X = X_V$ is subsequently a function of $\Upsilon$ and the univariate distributions $F_v$ of $X_v$, $v \in V$:

$$P(X_1 \le x_1, \ldots, X_p \le x_p) = F(x_1, \ldots, x_p \mid \Upsilon, F_1, \ldots, F_p) = C\left(F_1(x_1), \ldots, F_p(x_p) \mid \Upsilon\right).$$

The Gaussian copula model can also be constructed by introducing a vector of latent variables $Z = Z_V \sim N_p(0, \Upsilon)$ that are related to the observed variables $X = X_V$ as

$$X_v = F_v^{-1}(\Phi(Z_v)), \quad v \in V,$$

where $F_v^{-1}$ is the pseudo-inverse of $F_v$. Hoff (9) argues that inference for the correlation matrix $\Upsilon$ can be performed independently of the marginal distributions $\{F_v : v \in V\}$, which are treated as nuisance parameters. His argument is based on the remark that, given the observed data $x^{(1:n)}$, the latent samples $z^{(1:n)} = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ are constrained to belong to the set

$$A(x^{(1:n)}) = \left\{(z_{i,v})_{1 \le i \le n,\ v \in V} \in \mathbb{R}^{n \times p} : l_v^i < z_{i,v} < u_v^i,\ \text{for } 1 \le i \le n,\ v \in V\right\},$$

where

$$l_v^i = \max\{z_{k,v} : x_v^{(k)} < x_v^{(i)}\}, \qquad u_v^i = \min\{z_{k,v} : x_v^{(i)} < x_v^{(k)}\}. \qquad (3.3)$$

If the value $x_v^{(i)}$ is missing from the observed data, we define $l_v^i = -\infty$ and $u_v^i = \infty$.
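The set $A(x^{(1:n)})$ depends on the observed data only through their ranks, which is why the marginals $F_v$ never enter the likelihood. The sketch below computes the bounds (3.3) for one column $v$, given the current latent values. It assumes that missing observations are coded as `None` and that, since their ordering is unknown, they do not constrain the other latent values. Tied observations impose no constraint on each other. The function name is hypothetical.

```python
import math


def truncation_bounds(x_col, z_col):
    """Interval (l_v^i, u_v^i) for every latent z_{i,v} in one column v, as in (3.3).

    x_col: observed values of X_v (None marks a missing value);
    z_col: current latent values z_{1,v}, ..., z_{n,v}.
    """
    bounds = []
    for xi in x_col:
        if xi is None:  # missing value: the latent draw is unconstrained
            bounds.append((-math.inf, math.inf))
            continue
        below = [zk for xk, zk in zip(x_col, z_col) if xk is not None and xk < xi]
        above = [zk for xk, zk in zip(x_col, z_col) if xk is not None and xi < xk]
        bounds.append((max(below, default=-math.inf), min(above, default=math.inf)))
    return bounds


x = [1, 2, 2, 3, None]
z = [-1.2, 0.1, 0.4, 1.5, 0.0]
for b in truncation_bounds(x, z):
    print(b)
```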

Hoff (9) decomposes the probability of $x^{(1:n)}$ as

$$p(x^{(1:n)} \mid \Upsilon, F_1, \ldots, F_p) = p(D \mid \Upsilon)\, p(x^{(1:n)} \mid D, \Upsilon, F_1, \ldots, F_p),$$

where $D$ is the event $\{z^{(1:n)} \in A(x^{(1:n)})\}$. In the latent variable space, the observed data are represented by the event $D$; thus any subsequent inference on $\Upsilon$ is performed conditional on the occurrence of $D$. The corresponding likelihood function $p(D \mid \Upsilon)$ does not depend on the univariate distributions $\{F_v : v \in V\}$.

We perform inference on the parameter $\Upsilon$ of the Gaussian copula (3.1) by modeling the conditional independence relationships of the latent variables $Z$ using Gaussian graphical models. We call these models copula Gaussian graphical models. The Markov properties associated with a graphical model for the latent variables are guaranteed to translate into Markov properties for the observed variables if all the marginals $\{F_v : v \in V\}$ are continuous. The presence of some discrete marginals might induce additional dependencies among the $X$'s that are not modeled in a graphical model for the latent variables. However, these additional dependencies can be regarded as having secondary relevance since they emerge from the marginal distributions $\{F_v : v \in V\}$.
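The latent construction $X_v = F_v^{-1}(\Phi(Z_v))$ can be illustrated by forward simulation. The sketch below assumes two hypothetical discrete marginals, a binary $X_1$ with $P(X_1 = 0) = 0.7$ and an ordinal $X_2$ on $\{0, 1, 2\}$, together with a latent correlation of 0.6. All numbers are illustrative and are not taken from the paper. With discrete marginals each observed value only pins $Z_v$ to an interval, which is the information that $A(x^{(1:n)})$ retains.

```python
import math
import random
from statistics import NormalDist


def pseudo_inverse(cdf_values):
    """F^{-1}(u) = min{x : F(x) >= u} for a discrete CDF on 0, 1, 2, ... given as a list."""
    def inv(u):
        for x, Fx in enumerate(cdf_values):
            if Fx >= u:
                return x
        return len(cdf_values) - 1  # guard against rounding when u is close to 1
    return inv


def simulate_pair(n, rho, cdf1, cdf2, seed=0):
    """Draw (X1, X2) with X_v = F_v^{-1}(Phi(Z_v)), (Z1, Z2) standard bivariate normal, corr rho."""
    rng = random.Random(seed)
    Phi = NormalDist().cdf
    inv1, inv2 = pseudo_inverse(cdf1), pseudo_inverse(cdf2)
    out = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = e1
        z2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        out.append((inv1(Phi(z1)), inv2(Phi(z2))))
    return out


# Hypothetical marginals: F_1 = (0.7, 1.0) on {0, 1}; F_2 = (0.2, 0.5, 1.0) on {0, 1, 2}.
data = simulate_pair(2000, 0.6, [0.7, 1.0], [0.2, 0.5, 1.0])
print(data[:5])
```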


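3.1 Bayesian inference in copula Gaussian graphical models

We consider the equivalent hierarchical model for $X$:

$$Z_V \sim N_p(0, K^{-1}),$$

$$\tilde Z_v = Z_v \big/ \sqrt{(K^{-1})_{v,v}}, \quad v \in V,$$

$$X_v = F_v^{-1}(\Phi(\tilde Z_v)), \quad v \in V.$$

We see that $\tilde Z = \tilde Z_V \sim N_p(0, \Upsilon)$, where $\Upsilon$ is a correlation matrix with entries

$$\Upsilon_{v_1,v_2} = \frac{(K^{-1})_{v_1,v_2}}{\sqrt{(K^{-1})_{v_1,v_1}\,(K^{-1})_{v_2,v_2}}}. \qquad (3.4)$$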
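Equation (3.4) rescales the covariance matrix $K^{-1}$ into a correlation matrix. The sketch below uses the standard library only, and its helper names are hypothetical. It also shows that a zero in $K$ (a missing edge) generally does not produce a zero in $\Upsilon$: conditional independence is encoded in $K$, not in $\Upsilon$.

```python
import math


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed non-singular)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [val / pivot for val in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def correlation_from_precision(K):
    """Upsilon_{v1,v2} = S_{v1,v2} / sqrt(S_{v1,v1} S_{v2,v2}) with S = K^{-1}, as in (3.4)."""
    S = invert(K)
    p = len(K)
    return [[S[a][b] / math.sqrt(S[a][a] * S[b][b]) for b in range(p)] for a in range(p)]


Upsilon = correlation_from_precision([[2.0, -1.0], [-1.0, 2.0]])
print(round(Upsilon[0][1], 6))  # 0.5
path = correlation_from_precision([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(round(path[0][2], 6))     # 0.333333, although K[0][2] == 0
```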

Let $G \in \mathcal{G}_V$ be a graph defining a Gaussian graphical model for the latent variables $Z_V$. The joint posterior distribution of $K \in P_G$, the latent data $z^{(1:n)} \in A(x^{(1:n)})$ and the graphs $G \in \mathcal{G}_V$ is given by

$$p(K, z^{(1:n)}, G \mid D) \propto p(z^{(1:n)} \mid K, D) \times p(K \mid G) \times p(G). \qquad (3.5)$$

The prior distribution of $K$ conditional on $G$ is G-Wishart $W_G(\delta, D)$ and the prior distribution over $\mathcal{G}_V$ is uniform, i.e. $p(G) \propto 1$. Since the joint distribution (3.5) is not defined if $K \notin P_G$, we construct a Gibbs sampling algorithm for the marginal

$$p(z^{(1:n)}, G \mid D) = \int_{K \in P_G} p(K, z^{(1:n)}, G \mid D)\, dK. \qquad (3.6)$$

Our sampling scheme proceeds by repeating the following two steps. To improve mixing, each step is repeated several times before proceeding to the other step.

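Step 1: Resample the graph. We sample from the conditional

$$p(G \mid z^{(1:n)}, D) = p(G \mid z^{(1:n)}) \propto p(z^{(1:n)} \mid G)\, p(G),$$

using a Metropolis-Hastings step. We draw a graph $G_{\text{new}} \in \mathrm{nbd}(G)$ from the proposal density $p(G_{\text{new}} \mid G)$, which is uniform on $\mathrm{nbd}(G)$ under the uniform prior $p(G) \propto 1$. We move to $G_{\text{new}}$ with probability

$$\min\left\{1,\ \frac{p(G_{\text{new}} \mid z^{(1:n)})\, p(G \mid G_{\text{new}})}{p(G \mid z^{(1:n)})\, p(G_{\text{new}} \mid G)}\right\};$$

otherwise the chain stays in $G$. Since all neighborhoods have the same size $p(p-1)/2$, the proposal terms cancel and the acceptance ratio reduces to $p(G_{\text{new}} \mid z^{(1:n)}) / p(G \mid z^{(1:n)})$.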
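A minimal sketch of Step 1 is given below. It assumes that graphs are frozensets of edges and that a user-supplied function `log_score(G)` returns $\log p(z^{(1:n)} \mid G) + \log p(G)$ up to a constant. In practice `log_score` would call the marginal likelihood approximations of Section 2, which are not reproduced here, so it is a hypothetical placeholder. The acceptance test is done on the log scale to avoid overflow.

```python
import math
import random
from itertools import combinations


def mh_graph_step(G, p, log_score, rng):
    """One Metropolis-Hastings update of the graph (Step 1).

    G is a frozenset of edges (v1, v2), v1 < v2, on vertices 1..p; log_score is a
    hypothetical callable returning log p(z | G) + log p(G) up to a constant.
    """
    # Propose uniformly from nbd(G): toggle one edge chosen uniformly at random.
    edge = rng.choice(list(combinations(range(1, p + 1), 2)))
    G_new = G - {edge} if edge in G else G | {edge}
    # |nbd(G)| = |nbd(G_new)|, so the proposal ratio equals 1 and cancels.
    log_ratio = log_score(G_new) - log_score(G)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return G_new
    return G


# Toy score for illustration only: every edge costs 10 units of log posterior.
rng = random.Random(42)
G = frozenset(combinations(range(1, 5), 2))  # start from the complete graph on 4 vertices
for _ in range(500):
    G = mh_graph_step(G, 4, lambda H: -10.0 * len(H), rng)
print(sorted(G))
```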

Step 2: Resample the latent data. We sample from $p(z^{(1:n)} \mid G, D)$ by sampling from the joint distribution of the latent data and $K \in P_G$:

$$p(z^{(1:n)}, K \mid G, D) \propto p(z^{(1:n)} \mid K, D) \times p(K \mid G).$$


To this end, we employ a Gibbs sampler as follows.

Step 2A: Sample the precision matrix. The conditional

$$p(K \mid G, z^{(1:n)}, D) = p(K \mid G, z^{(1:n)}) \qquad (3.7)$$

is the G-Wishart distribution $W_G\left(\delta + n,\ D + \sum_{i=1}^{n} z^{(i)} (z^{(i)})^T\right)$. Here we assume that the latent data have been centered to have zero sample mean. We sample from (3.7) with the block Gibbs sampler algorithm.

Step 2B: Sample the latent data. We sample from

$$p(z^{(1:n)} \mid K, G, D) = p(z^{(1:n)} \mid K, D) = \prod_{i=1}^{n} p(z^{(i)} \mid K, D).$$

For each $i = 1, \ldots, n$, $Z = z^{(i)}$ is distributed $N_p(0, K^{-1})$. For each $v \in V$, $Z_v = z_v^{(i)}$ conditional on $Z_{V\setminus\{v\}} = z_{V\setminus\{v\}}^{(i)}$ follows a normal distribution with mean $\mu_v = -\sum_{v' \in \mathrm{bd}_G(v)} \frac{K_{v,v'}}{K_{v,v}}\, z_{v'}^{(i)}$ and variance $\sigma_v^2 = 1/K_{v,v}$; see (2.4). Moreover, $Z_v = z_v^{(i)}$ must belong to the interval $[l_v^i, u_v^i]$; see (3.3). We update $z_v^{(i)}$ by sampling from this truncated normal distribution.

Determining the cliques of the graph $G$ is one of the most computationally expensive parts of the block Gibbs sampler algorithm. As such, the cliques of $G$ are identified once at the beginning of Step 2 and then reused every time Steps 2A and 2B are executed.
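Step 2B can be sketched as follows, assuming the bounds $(l_v^i, u_v^i)$ have already been computed from (3.3). The truncated normal draw uses the inverse-CDF method through the standard library's `statistics.NormalDist`. This is one common choice, not necessarily the sampler used by the authors, and it loses accuracy when the interval lies far in a tail, where dedicated rejection samplers are preferable. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_truncated_normal(mean, var, lower, upper, rng):
    """Draw from N(mean, var) restricted to (lower, upper) by inverting the CDF.

    Infinite bounds are allowed: NormalDist.cdf gives 0.0 at -inf and 1.0 at +inf.
    """
    dist = NormalDist(mean, math.sqrt(var))
    a, b = dist.cdf(lower), dist.cdf(upper)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-300), 1.0 - 1e-16)  # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)


def update_latent_entry(K, z_i, v, lower, upper, rng):
    """Gibbs update of z_v^(i): N(mu_v, 1/K_vv) from (2.4), truncated to (lower, upper).

    Zero entries K[v][w] (non-neighbours of v) contribute nothing to mu_v.
    """
    kvv = K[v][v]
    mu = -sum(K[v][w] / kvv * z_i[w] for w in range(len(K)) if w != v)
    return sample_truncated_normal(mu, 1.0 / kvv, lower, upper, rng)


rng = random.Random(7)
K = [[2.0, -1.0], [-1.0, 2.0]]
z_i = [0.0, 1.0]
z_i[0] = update_latent_entry(K, z_i, 0, -0.2, 0.3, rng)
print(z_i)
```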

3.2 Estimation in copula Gaussian graphical models

Now assume we run the Gibbs sampler for $S$ iterations. For each iteration $s$, let $G^{(s)}$ be the graph obtained from Step 1 before proceeding to Step 2, and let $K^{(s)} \in P_{G^{(s)}}$ be the last matrix sampled at Step 2 before proceeding to Step 1. We denote by $\Upsilon^{(s)}$ the correlation matrix obtained by rescaling the elements of $K^{(s)}$ as in (3.4).

The samples $\{(G^{(s)}, K^{(s)}, \Upsilon^{(s)}) : s = 1, 2, \ldots, S\}$ are used to produce Monte Carlo estimates of functions involving the latent variables $Z$ or the observed variables $X$. The posterior probability that two latent variables $Z_{v_1}$ and $Z_{v_2}$ are not conditionally independent given $Z_{V\setminus\{v_1,v_2\}}$ is the posterior inclusion probability of the edge $(v_1, v_2)$. It is estimated as the number of graphs $G^{(s)}$ that contain the edge $(v_1, v_2)$ divided by $S$. The posterior expectation of $\Upsilon$ is estimated by the average $\tilde\Upsilon = \frac{1}{S}\sum_{s=1}^{S} \Upsilon^{(s)}$. The CDF of $X = X_V$ is estimated as

$$C\left(\hat F_1(x_1), \ldots, \hat F_p(x_p) \mid \tilde\Upsilon\right),$$

where $\hat F_v$ is the empirical univariate distribution of $X_v$. If each observed variable is discrete and takes values in $\{0, 1, 2, \ldots\}$, we obtain (20):

$$P(X_V = x_V) = \sum_{j_1=0}^{1} \cdots \sum_{j_p=0}^{1} (-1)^{j_1 + \cdots + j_p}\, C\left(u_1^{j_1}(x_1), \ldots, u_p^{j_p}(x_p) \mid \tilde\Upsilon\right), \qquad (3.8)$$


where $u_v^0(x_v) = \hat F_v(x_v)$ and $u_v^1(x_v) = \hat F_v(x_v - 1)$. We define $u_v^1(0) = 0$. For example, if $X_v \in \{0, 1\}$ is a binary random variable, we have $u_v^0(1) = 1$ and $u_v^0(0) = u_v^1(1) = \frac{1}{n}\sum_{i=1}^{n} \delta_{\{x_v^{(i)} = 0\}}$. Here $\delta_A$ is 1 if $A$ is true and 0 otherwise.
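The inclusion-exclusion sum (3.8) visits all $2^p$ corners of a hyperrectangle, so it is practical only for moderate $p$ or for a few selected cells. The sketch below implements the sum for an arbitrary copula function supplied by the caller. Evaluating the Gaussian copula with $\tilde\Upsilon \neq I_p$ requires a multivariate normal CDF, which the Python standard library does not provide. The usage example therefore plugs in the independence copula $C(u) = \prod_v u_v$, which is the Gaussian copula with $\Upsilon = I_p$. Under this copula, (3.8) must reduce to a product of empirical marginal probabilities. The data and function names are hypothetical.

```python
import math
from itertools import product


def empirical_cdf(values):
    """F_hat(x) = (1/n) #{i : x^(i) <= x} for observed values in {0, 1, 2, ...}."""
    n = len(values)
    return lambda x: sum(1 for val in values if val <= x) / n


def copula_pmf(x, cdfs, copula):
    """P(X_V = x_V) from the inclusion-exclusion sum (3.8).

    cdfs[v] is F_hat_v and copula(u) evaluates C(u_1, ..., u_p | Upsilon).
    """
    total = 0.0
    for js in product((0, 1), repeat=len(x)):
        # u^0_v(x_v) = F_hat_v(x_v); u^1_v(x_v) = F_hat_v(x_v - 1), with u^1_v(0) = 0
        u = [cdfs[v](x[v] - j) if x[v] - j >= 0 else 0.0 for v, j in enumerate(js)]
        total += (-1) ** sum(js) * copula(u)
    return total


# Hypothetical data: a binary X_1 and a three-level X_2.
x1 = [0, 0, 1, 1, 1]
x2 = [0, 1, 2, 2, 0]
cdfs = [empirical_cdf(x1), empirical_cdf(x2)]
independence = math.prod  # C(u) = u_1 * ... * u_p
print(copula_pmf([1, 2], cdfs, independence))  # P(X_1 = 1) * P(X_2 = 2) = 0.6 * 0.4
```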

4 Examples

In this section we apply copula GGMs to analyze three multivariate datasets with high relevance in the social science literature.

4.1 Labor force survey data

Hoff (9) studied the dependencies among income levels, educational attainment, fertility and family background using a dataset that is publicly available from http://webapp.icpsr.umich.edu/GSSS/ and concerns 1002 males in the U.S. labor force. The seven observed variables have been measured on various scales as follows: INC (an ordinal variable with 21 categories representing the income of the respondent in thousands of dollars), DEG (an ordinal variable with five categories representing the highest degree obtained by the respondent), CHILD (a count variable representing the number of children of the respondent), PINC (an ordinal variable with five categories representing the financial status of the parents of the respondent), PDEG (an ordinal variable with five categories representing the highest degree obtained by the respondent's parents), PCHILD (a count variable representing the number of children of the respondent's parents) and AGE (a count variable representing the respondent's age in years). There is a certain amount of missing data, with higher rates for INC and PINC.

The heterogeneity in the marginal distributions of the seven observed variables makes the study of their joint distribution very difficult. However, we show that our inference framework can be successfully applied here. We ran the Gibbs sampler described in Section 3 for 7500 iterations from 10 random starting graphs. The initial burn-in time was 500 iterations. We improved the mixing of the Markov chain by repeating Step 1 one hundred times and Step 2 fifty times within each iteration. Convergence to stationarity is illustrated in Figure 4.1: the posterior inclusion probabilities of the edges (CHILD, PINC) and (DEG, PCHILD) in the latent variable space stabilize after the first 2000 iterations.

Table 4.1 shows the pairs of latent variables whose edge inclusion probability was above 0.5, together with the estimates of the corresponding entries in the correlation matrix $\Upsilon$. The standard errors of these estimates, calculated across the ten instances of the Gibbs sampler, are smaller than 0.001, which suggests that the chains have reached convergence. Table 4.1 suggests that the determinants of income (INC) are educational attainment (DEG) and fertility (CHILD). Both determinants have a positive effect on income. Income does not seem to be related to the parents' financial status, since the posterior probability that INC and PINC are conditionally independent is 0.948. Since the posterior inclusion probability of edge (DEG, PDEG) is equal to one, the respondent's education is almost certainly related to his parents' education. We also remark that the relationships among income, degree and fertility seem to hold across generations.
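Edge inclusion probabilities and their between-chain standard errors can be computed from the stored post-burn-in graphs of each chain. The sketch below is a minimal version under that assumption, with graphs stored as sets of edge tuples and hypothetical function names. The same mean and standard-error summary applies unchanged to an entry of $\tilde\Upsilon$ computed separately in each chain.

```python
import math


def inclusion_probability(graph_samples, edge):
    """Fraction of sampled graphs G^(s) (sets of edges) that contain `edge`."""
    return sum(1 for G in graph_samples if edge in G) / len(graph_samples)


def across_chains(chains, edge):
    """Mean and standard error of the inclusion probability over m >= 2 independent chains."""
    probs = [inclusion_probability(chain, edge) for chain in chains]
    m = len(probs)
    mean = sum(probs) / m
    sd = math.sqrt(sum((q - mean) ** 2 for q in probs) / (m - 1))
    return mean, sd / math.sqrt(m)


# Two toy chains of four post-burn-in graphs each.
chain_a = [{(1, 2)}, {(1, 2), (2, 3)}, {(1, 2)}, {(1, 2)}]
chain_b = [{(1, 2)}, set(), {(2, 3)}, {(1, 2)}]
print(across_chains([chain_a, chain_b], (1, 2)))  # mean 0.75, standard error 0.25 (up to rounding)
```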


[Figure 4.1 panels: CHILD − PINC and DEG − PCHILD; x-axis Iteration, y-axis Edge Probability.]

Figure 4.1: Estimates of the posterior inclusion probability of edges (CHILD, PINC) and (DEG, PCHILD) across iterations.

Hoff (9) assessed links between variables in this dataset by inspecting whether the 95% credible intervals for the regression coefficients spanned zero. The main conclusions resulting from our copula Gaussian graphical models approach are shared by Hoff (9), though we differ in two instances. First, our method shows a high probability (essentially one) of an edge between CHILD and PCHILD, while this link was absent in Hoff (9). Such an inclusion seems sensible, as individual fertility levels are likely to be related to historical fertility in a given family. Second, we place only a 20% inclusion probability on an edge between PINC and PCHILD, though this connection was displayed in Hoff (9).

Table 4.1: Posterior estimates of the off-diagonal elements of $\Upsilon$ and posterior inclusion probabilities of edges for the labor force data.

Variable 1   Variable 2   Entry in Υ   Edge probability
CHILD        INC             0.292        0.997
CHILD        PCHILD          0.22         0.999
CHILD        PDEG           -0.262        0.953
CHILD        AGE             0.599        1
INC          DEG             0.489        1
INC          AGE             0.34         1
DEG          PCHILD         -0.187        0.668
DEG          PDEG            0.473        1
PCHILD       PDEG           -0.303        0.991
PINC         PDEG            0.453        1
PDEG         AGE            -0.232        0.988


[Figure 4.2 panels: c − e and f − h; x-axis Iteration, y-axis Edge Probability.]

Figure 4.2: Estimates of the posterior inclusion probability of edges (c, e) and (f, h) across iterations.

4.2 The Rochdale data

We consider a social survey dataset previously analyzed in Whittaker (22). This observational study was conducted in Rochdale and attempted to assess the relationships among factors affecting women's economic activity. The variables are as follows: a, wife economically active (no, yes); b, age of wife > 38 (no, yes); c, husband unemployed (no, yes); d, child ≤ 4 (no, yes); e, wife's education, high-school+ (no, yes); f, husband's education, high-school+ (no, yes); g, Asian origin (no, yes); h, other household member working (no, yes). The resulting $2^8$ cross-classification has 165 counts of zero, while 217 cells contain counts smaller than 3. There are also a few counts larger than 30 or even 50.

Since the sample size is only 665, the table is sparse and seems to be imbalanced. For these reasons, Whittaker (22) argues that higher-order interactions involving more than two variables should not be included in any log-linear model that is fit to this dataset. He subsequently studies two log-linear models: the all two-way interactions model, whose minimal sufficient statistics are the 28 two-way marginals, and the model whose minimal sufficient statistics are

$$fg|ef|dh|dg|cg|cf|ce|bh|be|bd|ag|ae|ad|ac. \qquad (4.1)$$

We learned copula GGMs by running the Gibbs sampler algorithm from Section 3 for 20000 iterations from 10 random starting graphs. The burn-in time was 5000 iterations. Within each iteration, we repeated Step 1 one hundred times and Step 2 fifty times. Convergence to the stationary distribution is illustrated in Figure 4.2, which gives the posterior inclusion probabilities of the edges (c, e) and (f, h) across iterations for each of the ten chains. As shown, there is a general level of agreement between the chains within 10000 iterations and essentially full convergence by 20000 iterations.

Figure 4.3 shows the edges having posterior inclusion probabilities greater than 0.5, together with the signs of the corresponding elements of the posterior estimate of the correlation matrix $\Upsilon$.

[Figure 4.3: graph over the latent variables a to h, with +/− signs on the edges.]

Figure 4.3: Edges with posterior inclusion probability greater than 0.5 in the Rochdale data. The +/− sign above each edge indicates whether the estimated correlation between the pair of latent variables was positive or negative.

Of particular interest is the determination of the factors that influence variable a, the wife's economic activity. The posterior inclusion probabilities of the edges that link the latent variable associated with a to the other latent variables are: 0.634 for (a, b), 0.394 for (a, c), 0.664 for (a, d), 0.288 for (a, e), 0.223 for (a, f), 1.00 for (a, g) and 0.079 for (a, h). This suggests that the variable with the strongest association with a is g, while the variable with the weakest association with a is h. Whittaker (22) considers the log-linear model ac|ad|ae|ag induced by the generators of (4.1) that involve a. Using maximum likelihood estimation of the log-linear parameters in this model, he obtains the following estimates of the logistic regression of a on c, d, e and g:

$$\log \frac{p(a = 1 \mid c, d, e, g)}{p(a = 0 \mid c, d, e, g)} = \text{const.} - 1.33c - 1.32d + 0.69e - 2.17g. \qquad (4.2)$$

Equation (4.2) suggests that g is the variable with the strongest association with a, and that these two variables are negatively associated. Moreover, variable h does not appear in (4.2), which suggests that it has a weak influence on a. Therefore Whittaker's findings based on log-linear models are consistent with our conclusions based on copula GGMs.
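On the odds scale, the coefficients in (4.2) act multiplicatively. Switching one binary predictor from no to yes, with the others held fixed, multiplies the odds of a = 1 by exp(coefficient). The short computation below uses only the coefficients reported in (4.2). Each predictor's two levels differ by one unit under either a 0/1 or a 1/2 coding, so the odds ratios do not depend on that choice.

```python
import math

# Coefficients of the logistic regression (4.2) of a on c, d, e and g.
coef = {"c": -1.33, "d": -1.32, "e": 0.69, "g": -2.17}

# Odds ratio for a = 1 when one predictor moves from "no" to "yes", others fixed.
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.3f}")
```

According to (4.2), Asian origin (g = yes) multiplies the odds that the wife is economically active by about 0.11, the largest change among the four predictors.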

The expected cell counts corresponding to the copula GGMs are determined according to equation (3.8). In Table 4.2, the expected cell counts for the cells containing the 20 largest observed counts are compared with the expected cell counts associated with the all two-way interactions log-linear model and with the log-linear model (4.1). We see that the all two-way interactions model seems to have the closest fit, followed by the copula Gaussian graphical model and the log-linear model (4.1). Notably, the copula model performs as well as the all two-way interactions model for the largest cell count, 57. The sums of squared errors between the observed counts and the expected cell counts over all 256 cells in the table are 284.79 for the all two-way interactions model, 443.14 for the copula model and 905.78 for the model (4.1). Since the all two-way interactions model has essentially no value for the social scientist who wants to learn about the relationships among the eight observed variables, the copula Gaussian graphical model offers a good fit while revealing the most significant patterns of interaction.


Table 4.2: Expected cell counts for the cells with the 20 largest observed counts under the all two-way interactions log-linear model, Whittaker's log-linear model (4.1) and the copula GGMs in the Rochdale data. Here 1 stands for no and 2 stands for yes.

Cell               Observed   All two-way   Whittaker   Copula GGMs
2 1 1 1 2 2 1 1    57         56.78         52.08       56.51
2 2 1 1 2 2 1 1    43         44.61         40.97       47.09
2 2 1 1 1 1 1 1    41         36.40         36.32       37.25
2 2 1 1 1 2 1 1    37         38.77         36.92       36.00
2 1 1 2 2 2 1 1    29         33.29         39.06       32.6
1 1 1 2 2 2 1 1    26         20.36          9.63       17.32
2 2 1 1 1 2 1 2    26         23.68         22.89       24.61
2 2 1 1 1 1 1 2    25         28.12         22.52       27.32
2 1 1 1 1 2 1 1    23         22.73         20.06       22.02
2 1 1 1 2 1 1 1    22         19.22         16.54       16.38
2 2 1 1 2 2 1 2    22         22.85         25.41       25.36
2 1 1 1 1 1 1 1    18         21.54         19.74       20.42
1 2 1 1 1 1 1 1    17         15.06         16.02       15.82
1 2 1 1 1 2 1 1    16         14.65         16.28       11.70
2 2 1 1 2 1 1 1    15         14.96         13.01       15.24
1 1 1 2 1 2 1 1    13         12.06          6.63       10.07
2 1 1 2 2 1 1 1    11          7.70         12.40        9.01
2 1 1 2 1 2 1 1    11         10.50         15.05       10.18
1 1 1 2 1 1 2 1    11          8.08          6.72        6.11

4.3 The NLTCS functional disability data

Our last example involves the $2^{16}$ contingency table introduced in Section 1. Dobra et al. (6) analyze these data from a disclosure limitation perspective, while Fienberg et al. (8) develop latent class (LC) models that are very similar to the Grade of Membership (GoM) models of Erosheva et al. (7). The need to consider alternatives to log-linear models for the NLTCS data comes from the severe imbalance that exists among the cell counts in this table. The largest cell count is 3853, but most of the cells (62384, or 95.19%) contain counts of zero, while 1729 (2.64%) contain counts of 1 and 499 (0.76%) contain counts of 2. There are 24 cells with counts larger than 100, which together account for 42% of the observed sample size of 21574. This gives a very small mean number of observations per cell, 0.33, which is indicative of the extremely high degree of sparsity that is characteristic of high-dimensional categorical data.
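The sparsity figures quoted above follow from simple arithmetic on the $2^{16} = 65536$ cells of the table. The short check below recomputes the mean count per cell and the shares of cells with a count of zero and of one, using only numbers stated in the text.

```python
n_cells = 2 ** 16      # 16 binary ADL/IADL measures
sample_size = 21574    # observed sample size
zero_cells = 62384     # cells with count 0
one_cells = 1729       # cells with count 1

mean_per_cell = sample_size / n_cells
pct_zero = 100 * zero_cells / n_cells
pct_one = 100 * one_cells / n_cells
print(n_cells, round(mean_per_cell, 2), round(pct_zero, 2), round(pct_one, 2))  # 65536 0.33 95.19 2.64
```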

We ran the Gibbs sampler algorithm from Section 3 for 1250 iterations from 10 random starting graphs, with a burn-in of 100 iterations. Within each iteration, we repeated Step 1 two hundred times and Step 2 twenty times. This example was run for fewer iterations because the large sample size produced much less disagreement among the chains in the estimated edge probabilities. Indeed, edges with high posterior probability were typically included in the graph at an early stage by all chains and never excluded afterwards, while determining the exact value for low-probability edges took somewhat longer. Figure 4.4 shows the posterior inclusion probabilities of edges (ADL5, IADL5) and (ADL2, IADL1) across iterations. For (ADL5, IADL5), each chain eventually gives the edge a probability close to 0.05, with general agreement among the chains after iteration 500. For (ADL2, IADL1), by contrast, once the burn-in period had ended the edge was included in all ten chains and was rarely excluded afterwards.

[Two panels, titled ADL5 − IADL5 and ADL2 − IADL1, each plotting Edge Probability (0 to 1) against Iteration.]
Figure 4.4: Estimates of the posterior inclusion probability of edges (ADL5, IADL5) and (ADL2, IADL1) across iterations.
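The per-chain curves in Figure 4.4 are presumably the usual running estimate: the fraction of post-burn-in iterations whose sampled graph contains the edge. The Python sketch below computes that fraction for each chain and pooled over chains. It assumes each chain records a 0/1 edge indicator per iteration; the function names and the toy traces are hypothetical, not taken from the authors' code.

```python
import numpy as np


def running_inclusion_probability(indicators, burn_in):
    """Running estimate of one edge's posterior inclusion probability.

    indicators: array of shape (n_chains, n_iterations); entry [c, t] is 1
    if chain c's sampled graph contained the edge at iteration t, else 0.
    Iterations before burn_in are discarded. Entry [c, t] of the result is
    the fraction of kept iterations 0..t of chain c that contained the edge.
    """
    kept = np.asarray(indicators, dtype=float)[:, burn_in:]
    n_seen = np.arange(1, kept.shape[1] + 1)
    return np.cumsum(kept, axis=1) / n_seen


def pooled_inclusion_probability(indicators, burn_in):
    """Fraction of all post-burn-in draws, over all chains, containing the edge."""
    kept = np.asarray(indicators, dtype=float)[:, burn_in:]
    return float(kept.mean())


# Toy indicator traces for two chains (made-up values, not NLTCS output).
trace = [[0, 0, 1, 1, 1],
         [0, 1, 1, 0, 1]]
running = running_inclusion_probability(trace, burn_in=1)
pooled = pooled_inclusion_probability(trace, burn_in=1)
print(running)
print(pooled)
```

Agreement between chains, as described for (ADL5, IADL5), shows up as the rows of `running` converging to a common value.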

The conditional independence relationships among the 16 latent variables are represented in Figure 4.5, which shows the edges whose posterior inclusion probabilities were greater than 0.5. The posterior estimates of the off-diagonal elements of Υ corresponding to these edges are strictly positive, which is intuitively correct: the ability to perform one activity of daily living is positively associated with the ability to perform another.
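The selection rule described above, keeping the edges whose estimated inclusion probability exceeds 0.5 and then checking the sign of the corresponding off-diagonal entries of the estimate of Υ, can be sketched in Python as follows. The helper names and the 3-variable toy matrices are hypothetical; `estimate` simply stands in for a posterior estimate of Υ.

```python
import numpy as np


def edges_above(prob, labels, threshold=0.5):
    """Edges (i, j), i < j, whose estimated inclusion probability exceeds threshold."""
    prob = np.asarray(prob, dtype=float)
    p = prob.shape[0]
    return [(labels[i], labels[j])
            for i in range(p) for j in range(i + 1, p)
            if prob[i, j] > threshold]


def all_positive(estimate, labels, edges):
    """True if the off-diagonal entries of `estimate` for the given edges are > 0."""
    index = {name: k for k, name in enumerate(labels)}
    estimate = np.asarray(estimate, dtype=float)
    return all(estimate[index[a], index[b]] > 0 for a, b in edges)


# Toy inputs (made-up numbers, not the NLTCS estimates).
labels = ["A", "B", "C"]
prob = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.6],
        [0.2, 0.6, 1.0]]
estimate = [[1.0, 0.3, 0.0],
            [0.3, 1.0, 0.1],
            [0.0, 0.1, 1.0]]
edges = edges_above(prob, labels)
print(edges, all_positive(estimate, labels, edges))
```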

To show how the posterior estimate of Υ found by the copula GGMs translates into estimates of cell probabilities, we take the six cells with the largest counts and report their fitted values under the copula GGMs, under the most suitable Grade of Membership (GoM) model of Erosheva et al. (7), and under the latent class (LC) model of Fienberg et al. (8) – see Table 4.3. The copula GGMs seem to perform comparably to the other two methods in capturing the dependency patterns that lead to the largest counts in this table.
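One way to see how a latent correlation structure produces expected cell counts is sketched below in Python, under assumptions that are ours rather than the paper's: binary variables coded 1/2 as in Table 4.3, cut-points chosen so that each latent margin reproduces a given P(Y_j = 1), and a correlation matrix `corr` standing in for a posterior estimate derived from Υ. The cell probability is estimated by Monte Carlo, and the function name is hypothetical; this is not necessarily how the values in Table 4.3 were computed.

```python
import numpy as np
from scipy.stats import norm


def expected_cell_count(cell, p_first, corr, n_obs, n_draws=200_000, seed=0):
    """Monte Carlo estimate of n_obs * P(Y = cell) under a Gaussian copula
    for binary variables coded 1/2.

    Latent Z ~ N(0, corr); Y_j = 1 if Z_j <= t_j and Y_j = 2 otherwise,
    where the cut-point t_j = Phi^{-1}(p_first[j]) reproduces the marginal
    probability P(Y_j = 1) = p_first[j].
    """
    rng = np.random.default_rng(seed)
    cuts = norm.ppf(np.asarray(p_first, dtype=float))
    z = rng.multivariate_normal(np.zeros(len(cuts)),
                                np.asarray(corr, dtype=float), size=n_draws)
    in_first = z <= cuts                    # True where Y_j would be 1
    wants_first = np.asarray(cell) == 1     # category pattern of the cell
    hits = np.all(in_first == wants_first, axis=1)
    return n_obs * hits.mean()


corr = [[1.0, 0.5], [0.5, 1.0]]
print(expected_cell_count([1, 1], [0.5, 0.5], corr, n_obs=300))  # close to 100
```

For two variables with P(Y_j = 1) = 0.5 and latent correlation 0.5, the exact probability of cell (1, 1) is 1/4 + arcsin(0.5)/(2π) = 1/3, which the estimate reproduces up to Monte Carlo error.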

[Graph on the 16 latent variables ADL1–ADL6 and IADL1–IADL10.]
Figure 4.5: Edges with posterior inclusion probability greater than 0.5 in the NLTCS functional disability data.

Table 4.3: Expected cell counts for the six cells with the largest observed counts in the NLTCS data. We report the results obtained from the GoM model (7), the LC model (8) and the copula GGMs. Here 1 stands for healthy and 2 stands for disabled.

Cell                               Observed   GoM     LC        Copula GGMs
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1      3853      3269    3836.01    3766.13
1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1      1107      1010    1111.51    1145.07
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2       660       612     646.39     576.12
1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1       351       331     360.52     449.61
1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1       303       273     285.27     352.08
1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1       216       202     220.47     201.74

5 Discussion

The inference approach we presented in this paper extends Gaussian graphical models to datasets in which the multivariate normal assumption for the observed variables is unlikely to hold. The copula GGMs capture conditional independence relationships among a set of latent variables that are in a one-to-one relationship with the set of observed variables. The fact that the number of latent variables coincides with the number of observed variables avoids the difficult statistical issue of having to select the number of latent classes – see, for example, the excellent discussions in Erosheva et al. (7) and Fienberg et al. (8). This one-to-one correspondence also facilitates the interpretation of the resulting independence graphs.

Our goal was to model dependencies separately from the univariate marginal distribution of each variable. As such, we did not include a parametric representation of the marginal distributions in our framework. In the two contingency table examples we show the fitted cell probabilities mainly as a means of relating our work to existing approaches; they should not be seen as an intrinsic component of our methodology. Pitt et al. (18) give a Bayesian approach to modeling conditional independence relationships in Gaussian copulas in which the univariate marginal distributions are allowed to depend on a set of parameters and on certain sets of explanatory variables. Our prior specification for the precision matrix of the latent variables could well be combined with the methods of Pitt et al. (18) into a procedure that takes into account the uncertainty in the specification of the univariate distributions. Unfortunately, such an inference method cannot be employed off the shelf, since it would require specifying appropriate models for each variable. Again, this extension is not necessary if the goal is to model multivariate dependencies alone.
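To illustrate why no marginal model is required, here is a minimal Python sketch of the kind of latent-variable update used with the extended rank likelihood of Hoff (9): the observed values of one variable enter only through their ordering, which confines each latent value to the interval between the latent values of lower-ranked and higher-ranked observations. The function names are hypothetical, and `cond_mean` and `cond_sd` are assumed to be the Gaussian conditional mean and standard deviation of that latent value given the other latent variables of the same observation, computed from the current precision matrix.

```python
import numpy as np
from scipy.stats import truncnorm


def rank_interval(y, z, i):
    """Interval for the latent value z[i] that keeps the latent values of
    this variable in the same order as the observed values y."""
    y = np.asarray(y)
    z = np.asarray(z, dtype=float)
    below = z[y < y[i]]   # latent values of observations ranked lower
    above = z[y > y[i]]   # latent values of observations ranked higher
    lo = below.max() if below.size else -np.inf
    hi = above.min() if above.size else np.inf
    return lo, hi


def redraw_latent(y, z, i, cond_mean, cond_sd, rng):
    """Draw z[i] from N(cond_mean, cond_sd^2) truncated to rank_interval(y, z, i)."""
    lo, hi = rank_interval(y, z, i)
    a, b = (lo - cond_mean) / cond_sd, (hi - cond_mean) / cond_sd
    return truncnorm.rvs(a, b, loc=cond_mean, scale=cond_sd, random_state=rng)


# One ordinal variable observed on four units, with current latent values.
y = [1, 2, 2, 3]
z = [-1.0, 0.2, 0.5, 1.4]
rng = np.random.default_rng(0)
print(rank_interval(y, z, 1), redraw_latent(y, z, 1, 0.0, 1.0, rng))
```

Only comparisons between observed values appear, so any monotone transformation of `y` leaves the update unchanged.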

The foundation of our copula GGMs is the extended rank likelihood proposed by Hoff (9). Our contribution becomes apparent when contrasting the way we identify conditional independence relationships, by imposing constraints on the elements of the precision matrix, with the way Hoff (9) defines a dependence graph. In his framework, two variables are considered conditionally independent given the rest based on their associated regression parameter. While this approach is sensible, it does not translate into actual constraints on the precision matrix, which is always assumed to be full.
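The contrast drawn above rests on a standard Gaussian identity: if X ~ N(0, K^{-1}), the coefficient of X_j in the regression of X_i on all other variables is -K_ij/K_ii, and the partial correlation of X_i and X_j is -K_ij/(K_ii K_jj)^{1/2}. A zero regression coefficient and a zero precision entry therefore encode the same conditional independence, but only a model that constrains K itself forces the zero exactly. The Python sketch below, with hypothetical helper names, checks the identity on a small chain graph.

```python
import numpy as np


def regression_coefficients(K):
    """B[i, j]: coefficient of X_j in E[X_i | all other X] for X ~ N(0, K^{-1}).

    For a Gaussian vector B[i, j] = -K[i, j] / K[i, i], so B[i, j] = 0
    exactly when K[i, j] = 0.
    """
    K = np.asarray(K, dtype=float)
    B = -K / np.diag(K)[:, None]
    np.fill_diagonal(B, 0.0)
    return B


def partial_correlations(K):
    """Partial correlation of X_i and X_j given the rest: -K[i, j] / sqrt(K[i, i] K[j, j])."""
    K = np.asarray(K, dtype=float)
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


# Precision matrix of the chain X0 - X1 - X2, so K[0, 2] = 0.
K = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.0, -0.8],
              [0.0, -0.8, 2.0]])
Sigma = np.linalg.inv(K)
# Regression of X0 on (X1, X2) computed directly from the covariance matrix.
beta_direct = np.linalg.solve(Sigma[1:, 1:], Sigma[1:, 0])
print(beta_direct, regression_coefficients(K)[0, 1:])
```

Both routes give coefficients (0.4, 0) for X1 and X2, so X0 and X2 are conditionally independent given X1.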

Copula GGMs are applicable to any observational study whose purpose is to identify conditional independence relationships. Although the interactions among the latent variables do not go beyond second-order moments, copula GGMs give sensible results in the analysis of sparse contingency tables. Since it is generally believed that sparse tables cannot support log-linear models with higher-order interaction terms (22), the use of copula GGMs in this setting seems reasonable. Our analysis of the NLTCS functional disability data is the first analysis of these data that involves graphical models. We hope that copula GGMs will play a significant role in many quantitative fields of research.

Acknowledgments

The authors thank Peter Hoff for useful discussions. The authors are also grateful to Elena Erosheva, who provided the NLTCS data.

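Appendix: the block Gibbs sampler

Let G be an undirected graph and let $C_1, \ldots, C_k$ be its cliques. For any clique C of G and any $|C| \times |C|$ positive definite matrix A, we introduce the following operator from $P_G$ into $P_G$:

$$
M_{C,A}K = \begin{pmatrix}
A^{-1} + K_{C,V\setminus C}\,(K_{V\setminus C})^{-1}\,K_{V\setminus C,C} & K_{C,V\setminus C} \\
K_{V\setminus C,C} & K_{V\setminus C}
\end{pmatrix} \tag{5.1}
$$

which is such that $[(M_{C,A}K)^{-1}]_C = A$ (11). Since $K_{V\setminus C}$, $K_{V\setminus C,C}$ and $K_{C,V\setminus C}$ remain unchanged under this transformation, it follows that $M_{C,A}$ maps $P_G$ into $P_G$: the only block that changes is indexed by the clique C, so no entry constrained to zero by G is affected, and the result stays positive definite because its Schur complement with respect to $K_{V\setminus C}$ is $A^{-1}$.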
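The operator in (5.1) is straightforward to implement. The Python sketch below, with the hypothetical helper name `apply_M`, takes $P_G$ to be the set of positive definite matrices whose entries are zero for the missing edges of G, and checks on a 4-cycle that $[(M_{C,A}K)^{-1}]_C = A$ while the zero entries outside the clique block are preserved.

```python
import numpy as np


def apply_M(K, C, A):
    """Operator M_{C,A} of (5.1), with R = V minus C.

    Returns a copy of K whose C-by-C block is A^{-1} + K_{C,R} K_R^{-1} K_{R,C};
    all other blocks are unchanged.
    """
    K = np.asarray(K, dtype=float)
    C = list(C)
    R = [v for v in range(K.shape[0]) if v not in C]   # R = V \ C
    block = np.linalg.inv(np.asarray(A, dtype=float))
    if R:  # if C = V the operator simply sets K = A^{-1}
        K_CR = K[np.ix_(C, R)]
        block = block + K_CR @ np.linalg.solve(K[np.ix_(R, R)], K_CR.T)
    out = K.copy()
    out[np.ix_(C, C)] = (block + block.T) / 2          # symmetrize round-off
    return out


# 4-cycle 0-1-2-3-0: K[0, 2] = K[1, 3] = 0; {0, 1} is one of its cliques.
K = np.array([[3.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 1.0],
              [1.0, 0.0, 1.0, 3.0]])
A = np.array([[2.0, 0.3], [0.3, 1.0]])
K_new = apply_M(K, [0, 1], A)
print(np.linalg.inv(K_new)[:2, :2])   # approximately A
```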

We describe a sampling method for the G-Wishart distribution $W_G(\delta, D)$ that was originally discussed in Piccioni (17) and further developed in Asci and Piccioni (1). It is based on the theory of exponential families with cuts; here a cut is represented by a clique of G.

The algorithm proceeds as follows:

Step 1. Start with $K^{0} = I_p$, the p-dimensional identity matrix.


Step 2. At each iteration $r = 0, 1, \ldots$, do:

Step 2A. Set $K^{r+(0/k)} = K^{r}$.

Step 2B. For each $j = 1, \ldots, k$, simulate A from $W_{|C_j|}(\delta, D_{C_j})$ and set $K^{r+(j/k)} = M_{C_j, A^{-1}} K^{r+((j-1)/k)}$.

Step 2C. Set $K^{r+1} = K^{r+(k/k)}$.

After a suitable burn-in time $r_0$, the matrices $(K^{r})_{r \ge r_0}$ generated by the block Gibbs sampler can be regarded as (dependent) draws from $W_G(\delta, D)$.
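A compact Python sketch of Steps 1 to 2C follows. It repeats `apply_M` so that it runs on its own, and it assumes that $W_m(\delta, S)$ has density proportional to $|A|^{(\delta-2)/2}\exp\{-\mathrm{tr}(AS)/2\}$, which is the standard Wishart with $\delta + m - 1$ degrees of freedom and scale matrix $S^{-1}$, the form accepted by `scipy.stats.wishart`. If a different parameterization is intended, the `df` and `scale` arguments must change accordingly. The function names and the 4-cycle example are illustrative.

```python
import numpy as np
from scipy.stats import wishart


def apply_M(K, C, A):
    """Operator M_{C,A} of (5.1): C-block becomes A^{-1} + K_{C,R} K_R^{-1} K_{R,C}, R = V minus C."""
    C = list(C)
    R = [v for v in range(K.shape[0]) if v not in C]
    block = np.linalg.inv(A)
    if R:
        K_CR = K[np.ix_(C, R)]
        block = block + K_CR @ np.linalg.solve(K[np.ix_(R, R)], K_CR.T)
    out = K.copy()
    out[np.ix_(C, C)] = (block + block.T) / 2  # symmetrize round-off
    return out


def sample_g_wishart(delta, D, cliques, n_iter, burn_in, seed=None):
    """Block Gibbs sampler for W_G(delta, D); returns K^r for r > burn_in.

    Assumed convention: W_m(delta, S) has density proportional to
    |A|^{(delta-2)/2} exp(-tr(A S)/2), i.e. the standard Wishart with
    delta + m - 1 degrees of freedom and scale matrix S^{-1}.
    """
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    K = np.eye(D.shape[0])                       # Step 1: K^0 = I_p
    draws = []
    for r in range(n_iter):                      # Step 2: iteration r
        for C in cliques:                        # Step 2B: cliques C_1, ..., C_k in turn
            D_C = D[np.ix_(C, C)]
            A = wishart(df=delta + len(C) - 1,
                        scale=np.linalg.inv(D_C)).rvs(random_state=rng)
            A = np.atleast_2d(A)                 # 1 x 1 cliques are returned as scalars
            K = apply_M(K, C, np.linalg.inv(A))  # K <- M_{C, A^{-1}} K
        if r >= burn_in:                         # Step 2C: K now holds K^{r+1}
            draws.append(K.copy())
    return draws


cycle = [[0, 1], [1, 2], [2, 3], [0, 3]]         # cliques of the 4-cycle
draws = sample_g_wishart(3, np.eye(4), cycle, n_iter=200, burn_in=50, seed=1)
print(len(draws), draws[-1][0, 2], draws[-1][1, 3])   # 150 0.0 0.0
```

For a complete graph with a single clique, each sweep reduces to one Wishart draw, so the draws should average to about $(\delta + p - 1)D^{-1}$ under this parameterization, which gives a quick check of the convention.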

References

1. Asci, C. and Piccioni, M. (2007). Functionally compatible local characteristics for the local specification of priors in graphical models. Scand. J. Statist. 34, 829–40.

2. Atay-Kayis, A. and Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika 92, 317–35.

3. Carvalho, C. M. and Scott, J. G. (2009). Objective Bayesian model selection in Gaussian graphical models. Biometrika. To appear.

4. Dellaportas, P., Giudici, P. and Roberts, G. (2003). Bayesian inference for nondecomposable graphical Gaussian models. Sankhya 65, 43–55.

5. Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. Ann. Statist. 7, 269–81.

6. Dobra, A., Erosheva, E. A. and Fienberg, S. E. (2003). Disclosure limitation methods based on bounds for large contingency tables with application to disability data. In Proceedings of Conference on the New Frontiers of Statistical Data Mining, H. Bozdogan, Ed. CRC Press, 93–116.

7. Erosheva, E. A., Fienberg, S. E. and Joutard, C. (2008). Describing disability through individual-level mixture models for multivariate binary data. Ann. Appl. Stat. 1, 502–537.

8. Fienberg, S. E., Hersh, P., Rinaldo, A. and Zhou, Y. (2008). Maximum likelihood estimation in latent class models for contingency table data. In Algebraic and Geometric Methods in Statistics, P. Gibilisco, E. Riccomagno, M. P. Rogantin and H. P. Wynn, Eds. Cambridge University Press. Forthcoming.

9. Hoff, P. (2007). Extending the rank likelihood for semiparametric copula estimation. Annals of Applied Statistics 1, 265–283.

10. Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C. and West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statist. Sci. 20, 388–400.


11. Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.

12. Lenkoski, A. and Dobra, A. (2008). Bayesian structural learning and estimation in Gaussian graphical models. Technical Report no. 545, Department of Statistics, University of Washington.

13. Letac, G. and Massam, H. (2007). Wishart distributions for decomposable graphs. Ann. Statist. 35, 1278–323.

14. Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review 63, 215–232.

15. Manton, K. G., Corder, L. and Stallard, E. (1993). Estimates of change in chronic disability and institutional incidence and prevalence rate in the U.S. elderly populations from 1982 to 1989. J. Gerontology: Social Sciences 48, S153–S166.

16. Nelsen, R. B. (1999). An Introduction to Copulas. New York: Springer-Verlag.

17. Piccioni, M. (2000). Independence structure of natural conjugate densities to exponential families and the Gibbs sampler. Scand. J. Statist. 27, 111–27.

18. Pitt, M., Chan, D. and Kohn, R. (2006). Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93, 537–554.

19. Roverato, A. (2002). Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Statist. 29, 391–411.

20. Song, P. X.-K. (2000). Multivariate dispersion models generated from Gaussian copula. Scandinavian Journal of Statistics 27, 305–320.

21. Speed, T. P. and Kiiveri, H. T. (1986). Gaussian Markov distributions over finite graphs. Ann. Statist. 14, 138–150.

22. Whittaker, J. (2009). Graphical Models in Applied Multivariate Statistics. Wiley.
