Multimodel Identification of Group Structure in Network Data∗

Christopher Wheat

MIT Sloan School of Management

50 Memorial Drive

Cambridge, MA 02142-1347

[email protected]

November 3, 2008

Abstract

This article proposes a method of identifying the number of groups implied by the pattern of ties in a network based on BICcat, an extension of the Bayesian Information Criterion (BIC). The proposed extension is based on a set of assumptions that derive from specific characteristics of the statistical evaluation of group structures in networks that diverge from the set of assumptions that underpin most large-scale regression-style empirical social science research. I use a simulation of randomly generated networks from the pair-dependent stochastic blockmodel (Anderson et al. 1992) and the p1 stochastic blockmodel distribution (Wang and Wong 1987), along with a multimodel inference technique (Burnham and Anderson 2004), to demonstrate that BICcat produces less biased estimates of the number of groups implied by the pattern of ties in a network than does BIC.

1 Introduction

A considerable amount of attention in recent scholarship has been dedicated to the development of appropriate

methods for model selection (Akaike 1974; Schwarz 1978; Rissanen 1983, 1989) and in particular to the

assessment of the applicability of these methods in a social science context (Raftery 1995; Weakliem 1999;

Burnham and Anderson 2004). While a wide range of model selection criteria have been proposed, much of

the debate in the social science literature has centered around the relative merits of the Akaike Information

Criterion (AIC), the Bayesian Information Criterion (BIC), and, to a lesser extent, criteria based on the

Minimum Description Length (Burnham and Anderson 2004; Kuha 2004; Stine 2004). Differences between

each of these model selection criteria can be attributed to differences in their theoretical and philosophical

underpinnings, which may at least in part account for the current lack of consensus about which of these

approaches is best suited to the general task of model selection for social science applications (Weakliem

1999; Raftery 1999; Yang 2005).

∗I would like to thank Peter Marsden, Tiziana Casciaro, David Gibson, Joel Podolny, Nitin Norhia, Kate Kellogg, and an anonymous reviewer for their invaluable feedback on this and earlier versions of this work. I would particularly like to thank David Hunter for generous assistance with statnet. Some of the analyses were performed with statnet 1.0 developed with support from NIH grants R01DA012831 and R01HD041877.

While these differences may make it difficult to reach a conclusion about a method best suited for model

selection problems in general, there are classes of problems within sociology in particular for which some

model selection approaches are likely to be more helpful than others. One such class of problems is the use

of blockmodels in identifying group structure within patterns of social relations (White et al. 1976; Laumann

et al. 1978). Model selection criteria are particularly important in this context, as even the most basic

blockmodel analysis presents a researcher with a set of models that must be evaluated. A critical step in

these analyses is the clustering of actors into groups on the basis of a quantitative measure of similarity—

typically structural equivalence (Lorrain and White 1971) or regular equivalence (White and Reitz 1983).

While some general-purpose clustering approaches explicitly embed a model selection criterion (Fraley and

Raftery 1998, 2002), the assumptions underlying the derivation of these criteria may not always be valid in

the context of modeling group structure in networks.

In this paper, I argue that BICcat, an estimate of the Bayes Factor based on a different set of

assumptions about the relationship of observations to model parameters than those which underlie the

determination of BIC, is a less biased estimator of the number of groups in a network. I moreover attempt

to illustrate the advantage of applying this criterion in a multimodel parameter estimation framework. In

the following section I derive BICcat and show how, in the context of certain classes of categorical problems,

it should provide a more precise estimate of Bayes Factors. Section 3 describes a particular class of network

models that can be used to compare the model selection criterion proposed here to other potential model

selection criteria, and compares the performance of BIC and BICcat by applying them to the analysis of

simulated network data. I conclude with a discussion of the limitations of the approach presented here and

suggestions about how the criterion might be applied to model identification problems relevant to a broader

set of social science contexts.

2 Model Selection and Group Identification

Analyses of social network data that involve the identification of groups can roughly be divided into two

types. In the first of these, partitions of actors into groups heuristically summarize characteristics of the

actors analyzed in order to simplify central theoretical arguments (e.g. White et al. 1976; Gerlach 1992; Grbic

2007). This approach to modeling the structure of networks has been used to identify general types of actors

and their social behavior in cases where the empirical claims arising out of the analysis are not sensitive to

the specific partitioning of actors into groups. For example, Gerlach (1992) uses a blockmodel analysis of

ties between Japanese organizations to identify a rough distinction between financial and industrial firms,

which is then decomposed into three types of financial firms and five types of industrial firms. While Gerlach

subsequently describes the relationships between firms in the financial blocks and firms in the industrial

blocks, the analysis is not dependent in a particularly substantive way on whether the selected blockmodel

partitioned the industrial and financial groups into any specific number of subgroups.

The significance of specific group boundaries in this style of analysis can be contrasted to other studies

in which the number of groups itself is the central object of empirical investigation. One particularly

illustrative set of examples of this type of research is a series of studies that use data on interstate relations

to identify categorical structural positions in global political and economic systems (Snyder and Kick 1979;

Van Rossem 1996; Kick and Davis 2001; Alderson and Beckfield 2004; Flandreau and Jobst 2005). For

instance, Van Rossem (1996: 516) uses the results of a blockmodel analysis to implicitly draw substantive

conclusions about the difference between a three-position and a four-position structure of global relations.

Similarly, Flandreau and Jobst (2005: 998-999) specifically argue that it is difficult to understand the

international monetary system without adding an “intermediary” classification of countries to the distinction

between “core” and “peripheral” countries proposed in earlier work. Arguments rooted in specific claims

about the number of groups implied by a given pattern of network ties have been made in other domains, such

as the structure of academic disciplines (Han 2003) and the identification of classroom roles (Van Rossem

and Vermande 2004).

In analysis of this latter type, bias in the estimation of the number of groups implied by a particular

pattern of ties in a network can lead to empirically unsupported substantive conclusions. In this section I

outline how existing model selection approaches might be applied in contexts such as these where the substantive

issue at hand rests critically on identifying the number of groups implied by an observed pattern of

exchanges. I then argue that while, in general, approaches based on Bayes factors have attractive properties

with respect to model selection, in certain cases specific assumptions made in the BIC approximation

of Bayes factors may lead to biased estimates of the number of groups implied by a pattern of network ties.

2.1 Blockmodel Selection Criteria

Given the task of selecting one of a large set of candidate models of group structure, empirical researchers

studying networks have employed a variety of selection methods. Kick and Davis (2001: 1566), for instance,

note a “need to be adequately specific while avoiding unwieldy information”, but they do not report a formal

model selection criterion to justify their selection of an eleven-group structure. Some attempts to formalize

this logic have included a proposal to use a G2 likelihood-ratio test (Anderson et al. 1992), or using the

extent to which observed relationships correlate with the relationships implied by the group structure to

evaluate candidate groupings (Van Rossem and Vermande 2004: 400)1.

Flandreau and Jobst (2005) base their analysis on a stochastic blockmodel (Nowicki and Snijders 2001)

that explicitly formalizes the ideas of specificity and unwieldiness suggested by Kick and Davis (2001).

In this framework, a parameter Iy is proportional to the average observed log-likelihood of observed ties

conditioned on a model, and a parameter Hx measures the extent to which actors are clearly assigned to

distinct groups by the model. In some sense, Iy is similar to the likelihood function L2(x|θ) incorporated by

general purpose model selection criteria such as the AIC, BIC and MDL, and Hx is similar to the penalty

terms associated with each of these criteria for over-parameterization. However, Nowicki and Snijders do

not propose a method by which these parameters can be combined to formally perform the model selection

task.

Handcock et al. (2007) present the most recent work in this area by proposing a method for identifying

group structure that combines a latent space network model (Hoff et al. 2002) with the BIC and Bayes

factors as general-purpose model selection criteria. In this work, Handcock et al. empirically estimate

the number of groups in a set of networks. The present article builds on this work by asking whether

the assumptions underlying the applicability of the BIC approximation of Bayes factors (Raftery 1995) are

justified in the specific context of the analysis of group structure within networks. I propose BICcat as a

modified Bayes factor approximation, and then evaluate the comparative performance of these criteria by

using a multimodel selection approach (Burnham and Anderson 2004) to identify the number of groups in

populations of simulated networks.

2.2 Approximation of Bayes Factors in Categorical Research

A key difference between model selection in the context of identifying group structures in network data and

model selection in the context of much of social science research concerns a difference in assumptions about

the relationship between observations x and model parameters θ = {θ1 . . . θK}. In a wide range of regression-

style social science research, every model parameter θk is assumed to have an effect on all observations x.

1 Analyses where the similarity of actors is based on the idea of role equivalence (White and Reitz 1983) are faced with a particular challenge, as it is not clear how to statistically model the likelihood of observing a particular set of network ties x based on a given assignment of actors to roles θ. One approach has been to only consider models in which sets of actors assigned to the same group are exactly regularly equivalent (Han 2003: 259), though this approach might be less useful in empirical settings where ties are observed with any significant degree of error. Alternatively, clusters can be defined by assigning actors that are approximately regularly equivalent to the same group (Alderson and Beckfield 2004: 835, fn 23), though such a procedure typically involves an arbitrary choice about a cutoff level of equivalence.

This assumption is not generally true of models that seek to identify categories, groups, and boundaries. In

many of these models, there are parameters θk that are theoretically identified as only relevant to a subset

of observations xk ⊆ x. This is perhaps most clearly evident in models of network structure such as the

pair-dependent blockmodel proposed by Holland et al. (1983), where the probability of a tie being sent from

an actor i to an actor j is fully determined by the group r of the sender and the group s of the receiver, such

that

p(xij = 1) = λrs, (1)

where the between group tie densities λrs are i.i.d. In this model, if hr is the number of actors in a group r,

then estimates of λrs are only dependent on the hrhs observations of ties from actors in group r to group s.
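As a minimal sketch of Equation 1, the following Python code draws a directed network from a pair-dependent blockmodel and then recovers each λrs using only the dyads in the corresponding block. The function names, the example densities (0.9 within groups, 0.1 between groups), and the exclusion of self-ties are assumptions made for illustration and are not taken from the analysis reported here.

```python
import random


def simulate_blockmodel(groups, density, seed=0):
    """Draw a directed 0/1 network under Equation 1.

    groups[i] is the group label of actor i and density[r][s] is the tie
    probability lambda_rs. Self-ties are excluded (an assumption).
    """
    rng = random.Random(seed)
    n = len(groups)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < density[groups[i]][groups[j]]:
                x[i][j] = 1
    return x


def estimate_densities(x, groups, n_groups):
    """Return (estimates, counts) for every r x s block.

    counts[r][s] is the number of dyads n_k that the estimate of
    lambda_rs is based on; no other observations enter that estimate.
    """
    ties = [[0] * n_groups for _ in range(n_groups)]
    counts = [[0] * n_groups for _ in range(n_groups)]
    n = len(groups)
    for i in range(n):
        for j in range(n):
            if i != j:
                r, s = groups[i], groups[j]
                ties[r][s] += x[i][j]
                counts[r][s] += 1
    estimates = [
        [ties[r][s] / counts[r][s] if counts[r][s] else float("nan")
         for s in range(n_groups)]
        for r in range(n_groups)
    ]
    return estimates, counts


# Hypothetical example: 24 actors in two equally sized groups.
groups = [0] * 12 + [1] * 12
density = [[0.9, 0.1], [0.1, 0.9]]
x = simulate_blockmodel(groups, density)
est, counts = estimate_densities(x, groups, 2)
```

Because this sketch excludes self-ties, a within-group block contributes hr(hr − 1) dyads rather than hrhs; in either case each block estimate rests on far fewer observations than the network as a whole.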

Other network models that incorporate group structure such as the p1 stochastic blockmodel (Wang and

Wong 1987) and the p∗ and ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al. 2006)

can incorporate a wide range of other features that may influence the identification of group boundaries, but

in each case these models can incorporate a set of parameters equivalent to λrs. To the extent that these

parallel measures are conceptualized as measuring the independent effect of within- and between-group tie

density on network structure, they should also be thought of as being dependent principally on observations

of ties from actors in the group r to actors in the group s, and independent of all other observed tie values.

There are, of course, models of group and cluster structures in networks, such as the latent space approach

(Hoff et al. 2002; Handcock et al. 2007) in which the estimation of parameters conceptually associated with

group structure is based on all observations of tie data. The intent of this article is not to evaluate the

relative theoretical merit of models such as these and network models based on a more explicitly categorical

view of group structure. The question addressed here is how a model selection criterion might best be applied

in cases where a researcher believes that a model based on explicit categories is theoretically closest to the

empirical phenomenon of interest.

The relationship between estimated parameters θk and observations xk plays a central role in the derivation

of the Bayesian Information Criterion estimate. Raftery (1995: 130) begins this estimation process by

noting that by integrating Bayes’ theorem

\[ p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta. \tag{2} \]

He then defines a function g(θ) = log[p(x|θ)p(θ)] to represent the logarithm of the integrand, and shows that an approximation

of the Taylor expansion of g(θ) around the maximum likelihood estimate θ̂ is

\[ g(\theta) \approx g(\hat\theta) + \tfrac{1}{2} (\theta - \hat\theta)^{T} g''(\hat\theta) (\theta - \hat\theta). \tag{3} \]

For large numbers of observations n and cases where the maximum likelihood estimate θ̂ is close to the

“true” value θ, only values of θ that are close to θ̂ will contribute significantly to Equation 2.

In these cases, a reasonable approximation is

\[ p(x) = \int \exp[g(\theta)]\, d\theta \approx \exp[g(\hat\theta)] \int \exp\left[ \tfrac{1}{2} (\theta - \hat\theta)^{T} g''(\hat\theta) (\theta - \hat\theta) \right] d\theta, \tag{4} \]

where d is the number of parameters in the model θ. In as much as the integrand in this expression is

proportional to the multivariate normal density this expression can be rewritten as

\[ p(x) = \exp[g(\hat\theta)]\, (2\pi)^{d/2} \left| -g''(\hat\theta) \right|^{-1/2}. \tag{5} \]

Following this, the logarithm of p(x) can be approximated as

\[ \log p(x) = \log p(x \mid \hat\theta) + \log p(\hat\theta) + (d/2) \log(2\pi) - \tfrac{1}{2} \log \left| -g''(\hat\theta) \right| + O(n^{-1}). \tag{6} \]
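The approximation in Equation 6 can be checked numerically in a case where the exact marginal likelihood is available. The sketch below is an illustration only: it assumes a single tie density θ with a uniform prior p(θ) = 1, observed through n independent Bernoulli dyads of which s are ties, so that the exact value of p(x) is a Beta function. The function names are hypothetical.

```python
import math


def exact_log_marginal(s, n):
    """Exact log p(x) for one ordered sequence of n Bernoulli dyads with
    s ties under a uniform prior: log B(s + 1, n - s + 1)."""
    return math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)


def laplace_log_marginal(s, n):
    """Equation 6 without the O(1/n) remainder; requires 0 < s < n."""
    theta_hat = s / n  # maximum likelihood estimate
    # g(theta_hat) = log p(x | theta_hat) + log p(theta_hat), and the
    # uniform prior contributes log p(theta_hat) = 0.
    g_hat = s * math.log(theta_hat) + (n - s) * math.log(1.0 - theta_hat)
    # For this model, -g''(theta_hat) = n / (theta_hat * (1 - theta_hat)).
    neg_hessian = n / (theta_hat * (1.0 - theta_hat))
    d = 1  # one parameter
    return g_hat + (d / 2) * math.log(2 * math.pi) - 0.5 * math.log(neg_hessian)


gap_small = abs(exact_log_marginal(30, 100) - laplace_log_marginal(30, 100))
gap_large = abs(exact_log_marginal(300, 1000) - laplace_log_marginal(300, 1000))
```

In this example the discrepancy shrinks as n grows, in line with the O(n^{-1}) remainder in Equation 6.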

Here, Raftery (1995: 131) makes a key assumption, namely, that −g′′(θ̂) can be approximated as ni,

where i is the expected Fisher information matrix for one representative observation. The importance of this

assumption has been noted by Raftery as well as others (Kass and Raftery 1995; Raftery 1999; Weakliem 1999;

Volinsky and Raftery 2000; Handcock et al. 2007), and it has special significance to the problem presented

here. The nature of this assumption is made explicit by Kass and Vaidyanathan (1992:132, Equation 2.6)

who note that one matrix i that justifies this approximation for large n by satisfying

\[ \frac{-g''(\hat\theta)}{n} - i(\hat\theta) = O(n^{-1/2}) \tag{7} \]

would be the Fisher information matrix, but only under the assumption that the observations x are independent

and identically distributed.

While this is a valid assumption for the regression-style analyses that the BIC is typically applied to, it

is not valid for observations of tie values in networks with group structures based on a distribution such as

Equation 1. In such a network, tie values are drawn from distributions that are explicitly dependent on the

group of the sender r and the recipient s—a pair of tie values xij and xi′j′ are only identically distributed

when the sender and recipient groups are the same across the pair, such that r = r′ and s = s′. Intuitively,

in such a model, there is no such thing as a “representative individual observation” across all observations,

but only a representative individual observation for a given sending group r and recipient group s.

If observations of ties from each pair of sending and receiving groups were considered as independent

experiments, the additivity of information suggests that for this model

\[ I_X = \sum_{r,s} I_{rs} \approx \sum_{r,s} h_r h_s\, i_{rs}. \tag{8} \]

This logic can be extended to the more general case where the parameters θk are null-orthogonal (Kass and

Vaidyanathan 1992; Kass and Wasserman 1995), but are each based on only nk = |xk| observations. In

this more general case, this result can be used to define dk = log nk/2 such that

\[ I_X \approx \sum_{k} d_k\, i_k. \tag{9} \]

Substituting this result into Equation 6, while acknowledging the introduction of an O(n^{-1/2}) error, yields

\[ \log p(x) = \log p(x \mid \hat\theta) + \log p(\hat\theta) + (d/2) \log(2\pi) - \tfrac{1}{2} \sum_{k} \log n_k - \tfrac{1}{2} \sum_{k} i_k + O(n^{-1/2}). \tag{10} \]

Constant terms in this expression can be dropped such that

\[ \log p(x) = \log p(x \mid \hat\theta) + \log p(\hat\theta) - \tfrac{1}{2} \sum_{k} \log n_k + O(1). \tag{11} \]

This in turn leads to an approximation BICcat that should be less biased when applied in the context

of empirical studies, such as the estimation of group structures in social networks, where the number of

observations nk used to estimate many model parameters is significantly less than n. This approximation2

can be expressed as

\[ \mathrm{BIC}_{cat}(x, \theta) = L2(x \mid \theta) - \tfrac{1}{2} \sum_{k} \log n_k. \tag{12} \]
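To make the difference between the two penalties concrete, the sketch below evaluates both criteria for a pair-dependent blockmodel. It rests on several assumptions that are not part of the derivation above: the likelihood term is taken to be the maximized natural-log Bernoulli likelihood, BIC is written on the same scale as Equation 12 (log-likelihood minus (d/2) log n), and the tie counts in the example are invented.

```python
import math


def block_log_likelihood(ties, counts):
    """Maximized Bernoulli log-likelihood summed over block parameters.

    ties[k] is the number of observed ties and counts[k] the number of
    dyads n_k informing parameter k.
    """
    total = 0.0
    for t, n_k in zip(ties, counts):
        p = t / n_k
        if 0.0 < p < 1.0:
            total += t * math.log(p) + (n_k - t) * math.log(1.0 - p)
        # A block that is entirely full or entirely empty contributes 0.
    return total


def bic(loglik, counts):
    """BIC-style score: every parameter is penalized by log n."""
    n = sum(counts)
    d = len(counts)
    return loglik - 0.5 * d * math.log(n)


def bic_cat(loglik, counts):
    """Equation 12: each parameter is penalized by its own log n_k."""
    return loglik - 0.5 * sum(math.log(n_k) for n_k in counts)


# Hypothetical two-group network of 24 actors (four block parameters).
counts = [132, 144, 144, 132]
ties = [120, 14, 15, 118]
loglik = block_log_likelihood(ties, counts)
score_bic = bic(loglik, counts)
score_cat = bic_cat(loglik, counts)
```

Since every nk is at most n, the BICcat penalty can never exceed the BIC penalty in this formulation, and the two coincide for a one-block model in which a single parameter is informed by all n observations.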

2 It is perhaps worth noting that this result, in particular the second term, is consistent with the selection criterion that would follow from an MDL or algorithmic complexity-based approach (Shannon 1948; Chaitin 1966; Kolmogorov 1965; Wallace and Boulton 1968; Rissanen 1983, 1989; Wallace and Dowe 1999; Stine 2004). While approaches to model selection based on the BIC and MDL or algorithmic complexity have different theoretical underpinnings, the basis of both of these approaches in information-theoretic concepts reassuringly leads to selection criteria that are close approximations of one another.

If the logic presented here is correct, the fact that nk ≤ n suggests that estimates of the prior p(θ)

based on BIC should be downwardly biased, particularly for those models in which nk is significantly smaller

than n. In the context of selecting blockmodels, where the representative nk is approximately inversely

proportional to b2, where b is the number of blocks in the model, this means that techniques based on BIC

should be biased towards selecting models with fewer blocks. This possibility is explored in the following

section.
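The scaling argument can be illustrated with a short calculation. The following sketch assumes a network of 24 actors split into b equally sized groups, one parameter per ordered pair of blocks, and no self-ties; the function name and these simplifications are illustrative assumptions only.

```python
import math


def penalties(n_actors, b):
    """Return (BIC penalty, BICcat penalty) for b equal-sized groups.

    Assumes b divides n_actors and that each of the b * b block
    parameters is informed only by the dyads inside its own block.
    """
    h = n_actors // b                      # actors per group
    n = n_actors * (n_actors - 1)          # all ordered dyads
    counts = [h * (h - 1) if r == s else h * h
              for r in range(b) for s in range(b)]
    bic_penalty = 0.5 * len(counts) * math.log(n)
    cat_penalty = 0.5 * sum(math.log(n_k) for n_k in counts)
    return bic_penalty, cat_penalty


for b in (2, 3, 4, 6, 8, 12):
    bic_penalty, cat_penalty = penalties(24, b)
    print(b, round(bic_penalty, 2), round(cat_penalty, 2))
```

Under these assumptions the gap between the two penalties widens as b increases, which is the mechanism by which a BIC-based procedure would be expected to favor models with fewer blocks.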

3 Simulation

It is difficult to compare the relative performance of different selection criteria using empirical data from real-

world examples, as it is rarely the case that the process by which network ties are generated is known a priori.

In this section, I present comparative structural analyses of networks simulated from known distributions.

The principal objective of this analysis is to compare the bias of BICcat with that of BIC in estimating the

number of groups implied by a given pattern of network ties.

While a generic ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al. 2006) would

allow a wide range of structural network features to be modeled and evaluated, the computational cost of

the Monte Carlo method needed to generate a single network of the size typically studied by social scientists

makes this approach unfeasible for the large number of sample networks needed for a simulation analysis. As

the intention of this analysis is to clarify the distinction between the BICcat and BIC approaches rather than

to illustrate a wide range of structural network features, in the following analysis I generate sample networks

from the pair-dependent stochastic blockmodel (Anderson et al. 1992) characterized by Equation 1, and

from the p1 stochastic blockmodel distribution (Wang and Wong 1987)—a specific extension of this model

described below.

3.1 Estimated Models

The pair-dependent stochastic blockmodel is essentially a collection of Bernoulli distributions between positions

r and s in a given network. Accordingly, simulated networks from this distribution follow straightforwardly

from Equation 1. Here I identify features of the p1 stochastic blockmodel most relevant to its simulation

in this context, and illustrate the relationship between the p1 and pair-dependent stochastic blockmodels.

The p1 stochastic blockmodel defines a distribution over observed dyads Dij = (xij , xji) where {Dij}

are presumed to be independent of one another conditional on the model. The model allows for variation in

the tendency of an actor to send ties αi and receive ties βj , as well as variation in the overall level of dyadic

reciprocity ρ. Like the pair-dependent stochastic blockmodel (Anderson et al. 1992), the λrs parameter of

the p1 stochastic blockmodel allows for variation in the extent to which actors from a group r will send a tie

to actors in group s.

For a given dyad Dij , the model specifies a multinomial distribution over (mij , aij , aji, nij), where

mij = p(Dij = (1, 1)), (13a)

aij = p(Dij = (0, 1)), (13b)

aji = p(Dij = (1, 0)), (13c)

and

nij = p(Dij = (0, 0)), (13d)

such that

mij + aij + aji + nij = 1. (14)

The model specifies the likelihood of observing a specific set of ties x as

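\[ p(x \mid \theta_{P1B}) = \frac{\exp\left\{ \rho m + \sum_{r,s} \lambda_{rs}\, x_{++}(rs) + \sum_{i} \alpha_i x_{i+} + \sum_{j} \beta_j x_{+j} \right\}}{\prod_{i<j} k_{ij}}, \tag{15} \]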

where m is the observed number of mutual ties in the network, x++ is the total number of observed ties in

the network, xi+ is the observed number of ties sent by an actor i, x+j is the number of ties received by an

actor j, and x++(rs) represents the total number of ties in the r × s block. In this equation

λij = log(aij/nij) (16a)

and

\[ k_{ij} = 1 + e^{\lambda_{ij}} + e^{\lambda_{ji}} + e^{\rho_{ij} + \lambda_{ij} + \lambda_{ji}}. \tag{16b} \]

The parameter λij is decomposed as

λij = αi + βj + λrs for all i ≠ j, (17)

such that

α+ = β+ = 0. (18)

From this parameterization, it follows that the pair-dependent stochastic blockmodel (Anderson et al.

1992) is a submodel of the p1 stochastic blockmodel where ρ = αi = βj = 0.
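The dyad probabilities implied by Equations 14 and 16 can be written out directly. The sketch below assumes that kij acts as the normalizing constant of each dyad, so that nij = 1/kij and mij = exp(ρ + λij + λji)/kij; this is the usual p1 form and is stated here as an assumption rather than as part of the text above. The function name is hypothetical.

```python
import math


def dyad_probabilities(lam_ij, lam_ji, rho):
    """Return (m, a_ij, a_ji, n) for one dyad.

    a_ij is the asymmetric outcome governed by lam_ij, so that
    log(a_ij / n) = lam_ij as in Equation 16a.
    """
    k = (1.0 + math.exp(lam_ij) + math.exp(lam_ji)
         + math.exp(rho + lam_ij + lam_ji))
    m = math.exp(rho + lam_ij + lam_ji) / k   # mutual dyad
    a_ij = math.exp(lam_ij) / k               # asymmetric, via lam_ij
    a_ji = math.exp(lam_ji) / k               # asymmetric, via lam_ji
    n = 1.0 / k                               # null dyad
    return m, a_ij, a_ji, n


# With rho = 0 the two ties in a dyad are independent, and the marginal
# probability of the tie governed by lam_ij is logistic(lam_ij).
m0, a0, b0, n0 = dyad_probabilities(2.0, -1.0, 0.0)
marginal = m0 + a0
```

With ρ = αi = βj = 0, λij reduces to λrs and each tie is an independent Bernoulli draw whose probability depends only on the sending and receiving groups, which is the sense in which the pair-dependent model is a submodel. Note that in this parameterization λrs is on the log-odds scale, whereas in Equation 1 it denotes the tie probability itself.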

3.2 Simulation Design

In order to evaluate the relative bias of the BIC and BICcat measures in the context of blockmodel selection,

I perform two sets of simulations. The first set of simulations is designed to illustrate how the level of bias

in these measures is affected by features of the network being analyzed and the statistical model used in

its evaluation. The second set of simulations is designed to illustrate how the application of these model

selection criteria to networks comparable to those previously analyzed in empirical research might lead an

analyst to substantially different conclusions about the structure of these networks.

To evaluate the basic relationship between network characteristics and the performance of BIC and BICcat

in identifying the number of groups in a network, for each type of network model I generate and evaluate 100

random networks of 24 actors each, based on known partitions of actors into 2, 3, 4, 6, 8 or 12 equally-sized

groups. In each of these networks, λrs = 10 if r = s and λrs = −10 if r ≠ s, such that in the basic form of

each model, ties are extremely likely within group, and extremely unlikely between groups, corresponding

to a basic model of within-group clustering.

As noted by Nowicki and Snijders (2001), it is more difficult to recover group structures from networks to

the extent that there is not clear separation between the characteristics of the groups. In as much as empirically

observed network data often contain unmodeled sources of error that complicate model identification

in this way, networks for this simulation were generated according to a mixed model to illustrate the effect

of noise on the estimation procedures. For instance, the tie probability pij needed to define a pair-dependent

stochastic blockmodel with noise can be defined in terms of the mixed distribution

pij = (1− pn)(mrs + ars) + 0.5pnεij , (19)

where 0 ≤ pn ≤ 1 defines the amount of random noise in the model and 0 ≤ εij ≤ 1 is a uniformly distributed

disturbance term. Similarly, the dyadic probabilities mij and aij needed to define a p1 stochastic blockmodel

with noise can be defined in terms of the mixed distribution

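\[ m_{ij} = (1 - p_n) \frac{e^{\lambda_{ij} + \lambda_{ji}}}{k_{ij}} + 0.25\, p_n \varepsilon_{ij}, \tag{20a} \]

\[ a_{ij} = (1 - p_n) \frac{e^{\lambda_{ij}}}{k_{ij}} + 0.25\, p_n \varepsilon_{ij}, \tag{20b} \]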

where λij and kij are defined by Equations 16a and 17. When pn = 0, these models correspond to

standard stochastic blockmodels of their respective types, and when pn = 1, these models correspond to

random Bernoulli graphs with density λ = 0.5. Intermediate values of pn correspond to models that have

group structure of the proportion (1− pn) and random noise of the proportion pn.
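A minimal sketch of the noisy pair-dependent generator in Equation 19 follows. It assumes that the block-level tie probability (mrs + ars) equals the logistic transform of λrs, which holds in the p1 submodel with ρ = αi = βj = 0, and that self-ties are excluded; the function names are hypothetical.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def noisy_tie_probability(p_rs, p_n, eps):
    """Equation 19: mix the block tie probability with uniform noise."""
    return (1.0 - p_n) * p_rs + 0.5 * p_n * eps


def simulate_noisy_blockmodel(groups, lam_within, lam_between, p_n, seed=0):
    """Draw a directed 0/1 network with noise proportion p_n."""
    rng = random.Random(seed)
    n = len(groups)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-ties (an assumption of this sketch)
            lam = lam_within if groups[i] == groups[j] else lam_between
            eps = rng.random()  # uniform disturbance on [0, 1)
            p = noisy_tie_probability(logistic(lam), p_n, eps)
            x[i][j] = 1 if rng.random() < p else 0
    return x


# Hypothetical example: 24 actors, 4 equal groups, 20 percent noise.
groups = [g for g in range(4) for _ in range(6)]
x = simulate_noisy_blockmodel(groups, 10.0, -10.0, 0.2)
```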

Given an individual network xbi from the population of networks generated by this process, a multimodel

approach (Burnham and Anderson 2004) can be used to estimate the number of distinct groups b that was

used to generate it. Either model selection criterion can be used to determine at least the relative probability

that a given model θb′ with b′ groups generated a network as 2^{BIC(xbi|θb′)} or 2^{BICcat(xbi|θb′)}, respectively. In

principle, a researcher who does not know the actual number of groups b used in generating a network xbi

should assign some prior probability to all possible models of network structure. In this simulation, I consider

a nested set of blockmodels based on 1 to n groups3, and evaluate the expected value of b with respect to

these models as

\[ E(b) = \frac{\sum_{1 \le b' \le n} p(x_{bi} \mid \theta_{b'})\, b'}{\sum_{1 \le b' \le n} p(x_{bi} \mid \theta_{b'})}. \tag{21} \]

If a model selection criterion strongly favors a particular model θb∗ , then the prior probability assigned to

that model should be much higher than that for other models, causing E(b) to have a value very close to

b∗. I evaluate the bias of the estimates of b produced by each of the model selection criteria by assessing the

mean of E(b) across the population of randomly generated networks.
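The multimodel estimate in Equation 21 can be sketched as a weighted average over candidate models. The code below assumes that each model's relative probability is 2 raised to its criterion value, that all candidate models receive equal prior weight, and that the scores shown are invented for illustration; the function name and the choice to subtract the maximum score for numerical stability are likewise assumptions of this sketch.

```python
def expected_groups(scores, base=2.0):
    """Return E(b) given scores[b] = criterion value for the b-group model.

    Weights are base ** score. Subtracting the largest score first leaves
    the ratio in Equation 21 unchanged and avoids floating-point underflow.
    """
    top = max(scores.values())
    weights = {b: base ** (s - top) for b, s in scores.items()}
    total = sum(weights.values())
    return sum(b * w for b, w in weights.items()) / total


# Hypothetical criterion values for models with 1, 2, and 3 groups.
scores = {1: -400.0, 2: -300.0, 3: -310.0}
estimate = expected_groups(scores)
```

When one model dominates, as the two-group model does in this invented example, the estimate sits very close to that model's number of groups; when the scores are equal, it falls back to the simple average of the candidate values.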

3 The partition of actors into groups used to generate the sample network is included in this set, and partitions in the set are hierarchically arranged such that a partition with n groups differs from a partition with n + 1 groups only in that the largest group in the former is split into two groups in the latter that differ in size by at most one actor. In other words, if within a set of partitions being evaluated, the partition with seven groups has four groups of size 3 and three groups of size 4, then the partition with eight groups has four groups of size 3, two groups of size 4, and two groups of size 2 formed by splitting one of the four-member groups in half. This method of generating partitions ensures that the distribution of group sizes within a partition is minimally skewed, and is consistent with the hierarchical clustering approaches frequently used in empirical analyses of group structure in networks.
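The splitting rule described in this note can be sketched in terms of group sizes alone. The function names below are hypothetical, and the sketch tracks only sizes rather than actor memberships; it also builds the sequence from a single group and so does not by itself guarantee that a particular generating partition appears in the set.

```python
def split_largest(sizes):
    """Split the largest group into two parts differing by at most one."""
    sizes = list(sizes)
    largest = max(sizes)
    if largest < 2:
        raise ValueError("no group can be split further")
    sizes.remove(largest)  # drops one occurrence of the largest size
    return sizes + [(largest + 1) // 2, largest // 2]


def nested_size_sequence(n_actors, max_groups):
    """Return size lists for 1, 2, ..., max_groups groups."""
    sequence = [[n_actors]]
    while len(sequence[-1]) < max_groups:
        sequence.append(split_largest(sequence[-1]))
    return sequence


# The example from the note: seven groups become eight.
eight = sorted(split_largest([3, 3, 3, 3, 4, 4, 4]))
```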

[Figure 1 (six panels: 2, 3, 4, 6, 8, and 12 Groups). Horizontal axis: pn, from 0.0 to 1.0. Vertical axis: E(b)/b, from 0.0 to 2.0.]

Figure 1: BIC and BICcat Performance, Pair Dependent Stochastic Blockmodels

3.3 Simulation Results

Figure 1 shows the results of analyses of a set of networks generated from the pair-dependent stochastic

blockmodel. In these graphs, the solid line indicates the extent to which the BICcat measure either under-predicts

(E(b)/b < 1) or over-predicts (E(b)/b > 1) the number of groups in the network, and the dotted

line shows the same for the BIC measure.

The graphs illustrate several features of the model selection task as it is affected by the underlying

process being modeled. First, the figures show that in this sample of networks

it was easier to estimate the number of groups in the generating process when there were fewer,

larger groups; that the estimation task was impaired by high rates of unexplained variance

in the data; and that these two features appear to interact with one another. The results suggest that it

was difficult for either method to accurately predict the number of groups in models where the number of

groups was significantly greater than three or four, or when pn was greater than about 20%. These results

are not particularly surprising—as the level of noise grows, the data are more and more consistent with a

model based on a single position with a constant random probability of within group ties. That said, the

BICcat-based measure was substantially less biased than the BIC-based measure across a wide range of noise

rates for models based on an intermediate number of groups.

Figure 2 shows a similar set of analyses from a set of networks generated from the p1 stochastic blockmodel.

The results are largely the same as those depicted in Figure 1, in that they suggest that it is quite

difficult for either the BICcat or the BIC measure to accurately estimate the number of groups in the generating

stochastic blockmodel for large numbers of groups or high noise rates. The effect of both of these

factors on the ability of these methods to estimate structural features is somewhat more severe for networks

generated from the p1 stochastic blockmodel than it is for the pair-dependent stochastic blockmodel, perhaps

due to the strong simplifying assumptions introduced under p1 in order to ease estimation (Wang and

Wong 1987: 11). These differences notwithstanding, the BICcat-based measure still exhibits a smaller bias

in predicting the number of groups than does the BIC-based measure across a wide range of noise rates.

While these results suggest that the BIC may be more biased in estimating the number of groups in

a network than is the BICcat, the 24-actor simulated networks that these results are based on may not

correspond meaningfully to the kinds of networks that these methods are typically applied to in empirical

social science research. To this end, I here attempt to illustrate the effect of model selection criterion choice

by presenting a set of simulations that should in some ways more closely resemble prior work seeking to

use network analysis to identify group structure. Due to the computational intensity of estimating these

models for large networks, in these analyses I generate 10 networks for each set of parameter values out of

the pair-dependent stochastic blockmodel.

[Figure 2 (four panels: 2, 3, 4, and 6 Groups). Horizontal axis: pn, from 0.0 to 1.0. Vertical axis: E(b)/b, from 0.0 to 2.0.]

Figure 2: BIC and BICcat Performance, p1 Stochastic Blockmodels

[Figure 3 (four panels: 5, 10, 15, and 20 Groups). Horizontal axis: pn, from 0.0 to 1.0. Vertical axis: E(b), scaled from 0 to the number of groups in each panel.]

Figure 3: BIC and BICcat Performance, 130 Actor Networks

As a first illustration, consider the analysis by Kick and Davis (2001) of networks of interstate relations.

In this analysis, CONCOR is used to divide 130 states into 11 distinct positions. Inasmuch as the CONCOR

procedure is understood to identify actors in structurally equivalent positions with no actor-level variation,

this analysis is most consistent with a pair-dependent stochastic blockmodel. To this end, I analyze

populations of 130-actor networks generated out of this distribution, based on even partitions of actors into 5,

10, 15, and 20 groups. The authors do not report within- and between-position tie densities, so I generate

networks across a range of noise rates pn.
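As a rough illustration of this simulation design, the sketch below generates one 130-actor network from an even partition of actors into groups. It is a simplified stand-in rather than the pair-dependent stochastic blockmodel itself: it assumes that ties are present within groups and absent between them, and that each tie is then replaced by a fair coin flip with probability p_noise. The function names and this particular reading of the noise rate pn are assumptions made for illustration.

```python
import numpy as np


def even_partition(n_actors, n_groups):
    """Assign actors to groups of (nearly) equal size, labeled 0..n_groups-1."""
    return np.arange(n_actors) * n_groups // n_actors


def simulate_network(n_actors, n_groups, p_noise, rng):
    """Directed network from an idealized block image with tie noise.

    Assumption (not taken from the paper): ties exist within groups and
    not between them; each tie is then replaced by a fair coin flip
    with probability p_noise.
    """
    groups = even_partition(n_actors, n_groups)
    ideal = (groups[:, None] == groups[None, :]).astype(int)
    noisy = rng.random((n_actors, n_actors)) < p_noise
    coin = rng.integers(0, 2, size=(n_actors, n_actors))
    adj = np.where(noisy, coin, ideal)
    np.fill_diagonal(adj, 0)  # no self-ties
    return groups, adj


rng = np.random.default_rng(0)
groups, adj = simulate_network(130, 10, 0.2, rng)
```

Repeating the call over a grid of noise rates and over 5, 10, 15, and 20 groups would yield populations of networks comparable in shape to those analyzed here.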

Figure 3 presents the results of these analyses. The results for networks generated based on 10 and

15 groups suggest that attempting to use either the BICcat or the unadjusted BIC would likely produce

substantially biased estimates b for networks of this size where the actual number of groups g is roughly in

this range, unless within-group ties are extremely likely, and between-group ties are correspondingly unlikely.

That said, estimates of the number of groups deriving from the BICcat criterion are substantially less biased

and much closer to the actual number of groups for significant ranges of noise rates pn.

A second example is based on a similar analysis of how the pattern of relations between 163 states can be

used to identify their respective positions in a system of global exchange (Van Rossem 1996). In the original

analysis, the author uses role-equivalence and structural equivalence metrics to partition these states into

four distinct positions. While the selection of a four-position solution seems to be principally theoretically

motivated in this research, it is instructive to consider the conclusions they might have reached had they

attempted to use a model selection criterion to empirically identify the number of groups implicated by the

observed pattern of network ties.

To this end, Figure 4 presents estimates of the implied number of groups E(b) for networks of 163 actors

simulated out of the pair-dependent stochastic blockmodel based on 3, 4, 6, 8, and 10 groups. Given the

results from simulations of 24-actor networks, it is perhaps not surprising that both model selection criteria

provide less biased estimates of b in these networks with larger numbers of actors and smaller numbers of

candidate groups. With respect to this specific example, these results suggest that if the pattern of network

ties in the analyzed data really were generated from a four-group structure, then both the BIC and BICcat

criteria would likely estimate b with more than sufficient accuracy over a wide range of noise rates pn.

On the other hand, this example illustrates a potential concern about the substantive conclusions that

might be drawn due to the downward bias of the BIC criterion in contexts where the number of groups

generating patterns of ties in a network is higher than that postulated by a researcher, particularly when

the proposed group-based model does not predict ties with high accuracy. For instance, considering the

simulations where b = 10 and pn = 0.6, the estimate of b based on BIC is 4.05, while the corresponding

estimate based on BICcat is 9.09. A researcher employing the BIC criterion would reasonably interpret this

result as evidence in support of a four-group hypothesis, while a researcher using the BICcat criterion would

likely not. In contexts with relatively high noise rates pn, this downward bias in general can lead researchers

to find support for models of exchange that may be somewhat more parsimonious than the model that

actually produced the data.
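Non-integer estimates such as those above are consistent with the number of groups being averaged across candidate models. The sketch below assumes that E(b) is a weighted average in the sense of Burnham and Anderson (2004), with each model's weight proportional to exp(-delta/2), where delta is that model's criterion value minus the smallest value among the candidates. The criterion values shown are hypothetical and chosen only for illustration, and the function names are not from the original analysis.

```python
import numpy as np


def model_weights(criterion_values):
    """Convert information-criterion values (lower is better) into weights."""
    values = np.asarray(criterion_values, dtype=float)
    delta = values - values.min()  # difference from the best-scoring model
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()


def expected_groups(candidate_b, criterion_values):
    """Weight-averaged number of groups across the candidate models."""
    weights = model_weights(criterion_values)
    return float(np.dot(weights, candidate_b))


# Hypothetical criterion values for candidate models with 2 to 6 groups.
candidate_b = [2, 3, 4, 5, 6]
bic_values = [1520.0, 1498.0, 1490.0, 1491.0, 1503.0]
print(expected_groups(candidate_b, bic_values))
```

Because the weights depend only on differences between criterion values, a criterion that penalizes additional groups more heavily shifts weight toward models with fewer groups and pulls the averaged estimate downward.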

More generally speaking, this example reinforces the idea that, in a multimodel selection context, it is

not sufficient that a selection criterion have minimally-biased performance in the neighborhood of a target

location in the solution space. When applied to this type of estimation task, the bias associated with a

model selection criterion may need to be minimized over the entire solution space. To the extent that the

BICcat does this, as these illustrations suggest that it does, it should outperform the BIC in identifying the

number of groups in a network.

[Figure 4: BIC and BICcat Performance, 163 Actor Networks. Panels: 3, 4, 6, 8, and 10 Groups; horizontal axis: pn (0.0 to 1.0); vertical axis: E(b).]

4 Discussion

Model selection criteria like the BIC, AIC and MDL are critically important in empirical research in which

the answer to the central theoretical question depends on the model used in analyzing empirical data. The

use of network data to identify the number of groups in a population is certainly an example of this type of

problem, and a domain in which general-purpose model selection criteria have been particularly helpful. In

general, the calculation of exact Bayes factors can be intractable for a wide range of problems. The BICcat

criterion proposed in this paper extends existing work that seeks to identify approximate Bayes factors in

the specific context of these substantive problems.

While the BIC may be an effective approximation for a wide range of regression-style research in which

all model parameters are estimated on the basis of all observations, it appears that for some range of network

problems, it can produce downwardly-biased estimates of model parameters that are only based on a subset of

the observed data. The BICcat approximation proposed here is based on a set of assumptions that should be

more appropriate for problems like estimating the number of groups b implied by a given pattern of network

ties. The multimodel simulation results presented here bear this out, showing that the bias associated with

estimating the underlying number of groups E(b) is consistently lower when estimates are based on BICcat

rather than the unadjusted BIC.
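For reference, the unadjusted BIC charges every parameter a penalty equal to the log of the full number of observations. The sketch below computes it for a deliberately simple case: a Bernoulli blockmodel with one tie density per ordered pair of groups, which is an assumption made for illustration and not the pair-dependent or p1 model used in the simulations. It does not implement BICcat, and the function name is hypothetical.

```python
import numpy as np


def blockmodel_bic(adj, groups):
    """Unadjusted BIC for a Bernoulli blockmodel (one density per block pair)."""
    adj = np.asarray(adj)
    groups = np.asarray(groups)
    n = adj.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # self-ties are ignored
    labels = np.unique(groups)
    log_lik = 0.0
    for r in labels:
        for s in labels:
            cell = off_diag & (groups[:, None] == r) & (groups[None, :] == s)
            m = cell.sum()  # ordered pairs falling in this block
            if m == 0:
                continue
            ones = adj[cell].sum()
            p = ones / m  # maximum-likelihood tie density for the block
            if 0 < p < 1:  # blocks with p of 0 or 1 contribute zero
                log_lik += ones * np.log(p) + (m - ones) * np.log(1 - p)
    k = len(labels) ** 2  # one density parameter per ordered block pair
    n_obs = n * (n - 1)   # every ordered pair counts toward every penalty
    return -2.0 * log_lik + k * np.log(n_obs)


# Four actors in two perfectly separated groups.
groups = np.array([0, 0, 1, 1])
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])
two_group = blockmodel_bic(adj, groups)
one_group = blockmodel_bic(adj, np.zeros(4, dtype=int))
```

In this sketch each block density is estimated only from the ordered pairs inside its block, yet the penalty term uses all n(n-1) pairs. That mismatch is the kind of feature the discussion above identifies as a potential source of downward bias.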

Moreover, the results presented here suggest that applying the BICcat in a multimodel selection framework

may be an effective approach for other problems where empirical observations are theorized as instances of

distinct categories. Much of contemporary research in sociology has drawn attention to the importance of

boundaries and social categories (Lamont and Molnar 2002), ranging from arguments based on ideal-typical

career paths (Abbott and Hrycak 1990; Stovel et al. 1996; Han and Moen 1999), to arguments about the

benefits and costs of navigating specific social boundaries (Phillips and Zuckerman 2001; Hsu 2006). This

research places special significance on behavior that takes place near or across social boundaries—as such, the

empirical identification of the location of these boundaries may be of key substantive importance. Variations

of the approach outlined here may be useful in such efforts.

While the BICcat may be a less biased estimator of the number of groups implied by a network than is

the BIC, preliminary results in these contexts suggest that it is not necessarily a more efficient estimator.

Estimates E(b) based on BICcat for a given number of actors n, generating blocks b, and noise level pn

show higher variance than corresponding estimates based on BIC. That said, some amount of the variance

reduction for the BIC-based estimates may be due to its downward bias—differences in variance between

the criteria are not pronounced in populations where estimates E(b) are similar. Further research may more

precisely address this issue.
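The trade-off between bias and efficiency described here can be summarized per simulation cell by comparing the mean and the spread of the estimates across replicate networks. The sketch below is a minimal illustration with made-up estimates for ten replicates. The function name and the numbers are hypothetical and are not results from the simulations.

```python
import numpy as np


def bias_and_variance(estimates, true_b):
    """Bias and sample variance of group-count estimates across replicates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_b
    variance = estimates.var(ddof=1)  # unbiased sample variance
    return bias, variance


# Made-up estimates from ten replicate networks generated with 10 groups.
true_b = 10
criterion_a = [4.1, 3.9, 4.0, 4.2, 4.0, 3.8, 4.1, 4.0, 3.9, 4.0]
criterion_b = [9.5, 8.2, 10.4, 9.0, 7.9, 10.1, 9.6, 8.8, 9.3, 8.1]

for name, est in [("criterion A", criterion_a), ("criterion B", criterion_b)]:
    bias, var = bias_and_variance(est, true_b)
    print(f"{name}: bias={bias:.2f}, variance={var:.3f}")
```

In this made-up case the first set of estimates is tightly clustered but far from the true value, while the second is closer on average but more dispersed, which mirrors the pattern described above.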

The approach outlined here is also limited in that it can only be applied to identifying groups and

positions that conform to the structure of specific stochastic network models. This is a limitation principally

because there are sociologically interesting forms of group structure such as those based on the idea of regular

equivalence (White and Reitz 1983) that are yet to be represented as such. If group structure in a particular

social system is in fact driven by regular equivalence rather than structural equivalence, a straightforward

application of the methods presented here could lead to misleading results. Nevertheless, the BICcat can be

applied to the wide range of features of social structure that can be modeled with general purpose stochastic

network models such as the ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al.

2006).

Limitations of the ability to model particular features notwithstanding, the approach presented here

represents a helpful step forward in the statistical analysis of structure in social networks. The results

presented here illustrate how a careful consideration of the relationship between parameter estimates and

the observations upon which they are based can lead to less biased estimations of the parameters that

underpin structural characterizations of these data.

References

Abbott, Andrew and Alexandra Hrycak, 1990. “Measuring Resemblance in Sequence Data: An Optimal

Matching Analysis of Musicians’ Careers.” American Journal of Sociology 96:144–185. ISSN 0002-9602.

Akaike, Hirotugu, 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on

Automatic Control 19:716–723. ISSN 0018-9286.

Alderson, Arthur S. and Jason Beckfield, 2004. “Power and Position in the World City System.” American

Journal of Sociology 109:811–851.

Anderson, Carolyn J., Stanley Wasserman, and Bradley Crouch, 1999. “A p∗ Primer: Logit Models for

Social Networks.” Social Networks 21:37–66.

Anderson, Carolyn J., Stanley Wasserman, and Katherine Faust, 1992. “Building Stochastic Blockmodels.”

Social Networks 14:137–161.

Burnham, Kenneth P. and David R. Anderson, 2004. “Multimodel Inference: Understanding AIC and BIC

in Model Selection.” Sociological Methods and Research 33:261–304. doi:10.1177/0049124104268644.

Chaitin, Gregory J., 1966. “On the Length of Programs for Computing Finite Binary Sequences.”

Journal of the Association for Computing Machinery 13:547–569. ISSN 0004-5411.

doi:10.1145/321356.321363.

Flandreau, Marc and Clemens Jobst, 2005. “The Ties that Divide: A Network Analysis of the In-

ternational Monetary System, 1890–1910.” The Journal of Economic History 65:977–1007. doi:

10.1017/S0022050705000379.

Fraley, Chris and Adrian E. Raftery, 1998. “How Many Clusters? Which Clustering Method? Answers Via

Model-Based Cluster Analysis.” The Computer Journal 41:578–588. doi:10.1093/comjnl/41.8.578.

———, 2002. “Model-Based Clustering, Discriminant Analysis, and Density Estimation.” Journal of the

American Statistical Association 97:611–631. ISSN 01621459.

Gerlach, Michael L., 1992. “The Japanese Corporate Network: A Blockmodel Analysis.” Administrative

Science Quarterly 37:105–139.

Grbic, Douglas, 2007. “The source, structure, and stability of control over Japan’s financial sector.” Social

Science Research 36:469–490.

Han, Shin-Kap, 2003. “Tribal regimes in academia: a comparative analysis of market structure across

disciplines.” Social Networks 25:251–280.

Han, Shin-Kap and Phyllis Moen, 1999. “Clocking Out: Temporal Patterning of Retirement.” American

Journal of Sociology 105:191–236. ISSN 0002-9602.

Handcock, Mark S., Adrian E. Raftery, and Jeremy M. Tantrum, 2007. “Model-based clustering for social

networks.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 170:301–354.

doi:10.1111/j.1467-985X.2007.00471.x.

Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock, 2002. “Latent Space Approaches to

Social Network Analysis.” Journal of the American Statistical Association 97:1090–1098.

doi:10.1198/016214502388618906.

Holland, Paul W., Kathryn B. Laskey, and Samuel Leinhardt, 1983. “Stochastic Blockmodels: Some First

Steps.” Social Networks 5:109–137.

Hsu, Greta, 2006. “Jacks of All Trades and Masters of None: Audiences’ Reactions to Spanning Genres in

Feature Film Production.” Administrative Science Quarterly 51:420–450. ISSN 0001-8392.

Kass, Robert E. and Adrian E. Raftery, 1995. “Bayes Factors.” Journal of the American Statistical Associ-

ation 90:773–795. ISSN 01621459.

Kass, Robert E. and Suresh K. Vaidyanathan, 1992. “Approximate Bayes Factors and Orthogonal Parame-

ters, with Application to Testing Equality of Two Binomial Proportions.” Journal of the Royal Statistical

Society. Series B (Methodological) 54:129–144. ISSN 00359246.

Kass, Robert E. and Larry Wasserman, 1995. “A Reference Bayesian Test for Nested Hypotheses and its

Relationship to the Schwarz Criterion.” Journal of the American Statistical Association 90:928–934. ISSN

01621459.

Kick, Edward L. and Byron L. Davis, 2001. “World-System Structure and Change: An Analysis of Global

Networks and Economic Growth across Two Time Periods.” American Behavioral Scientist 44:1561–1578.

doi:10.1177/00027640121958050.

Kolmogorov, Andrey N., 1965. “Three approaches to the quantitative definition of complexity.” Problems

in Information Transmission 1:4–7.

Kuha, Jouni, 2004. “AIC and BIC: Comparisons of Assumptions and Performance.” Sociological Methods

and Research 33:188–229. doi:10.1177/0049124103262065.

Lamont, Michele and Virag Molnar, 2002. “The Study of Boundaries in the Social Sciences.” Annual Review

of Sociology 28:167–195.

Laumann, Edward O., Joseph Galaskiewicz, and Peter V. Marsden, 1978. “Community Structure as Interor-

ganizational Linkages.” Annual Review of Sociology 4:455–484. doi:10.1146/annurev.so.04.080178.002323.

Lorrain, Francois P. and Harrison C. White, 1971. “Structural Equivalence of Individuals in Social Networks.”

Journal of Mathematical Sociology 1:48–80.

Nowicki, Krzysztof and Tom A. B. Snijders, 2001. “Estimation and Prediction for Stochastic Blockstruc-

tures.” Journal of the American Statistical Association 96:1077–1087.

Phillips, Damon J. and Ezra W. Zuckerman, 2001. “Middle-Status Conformity: Theoretical Restatement

and Empirical Demonstration in Two Markets.” American Journal of Sociology 107:379–429.

Raftery, Adrian E., 1995. “Bayesian Model Selection in Social Research.” In Sociological Methodology

1995, edited by Peter V. Marsden, pp. 111–196. San Francisco: Jossey-Bass.

———, 1999. “Bayes Factors and BIC: Comment on “A Critique of the Bayesian Information Criterion for

Model Selection”.” Sociological Methods and Research 27:411–427. doi:10.1177/0049124199027003005.

Rissanen, Jorma, 1983. “A Universal Prior for Integers and Estimation by Minimum Description Length.”

The Annals of Statistics 11:416–431. ISSN 0090-5364.

———, 1989. Stochastic Complexity in Statistical Inquiry. Teaneck, N.J.: World Scientific.

Schwarz, Gideon, 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6:461–464. ISSN

00905364.

Shannon, Claude E., 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal

27:379–423.

Snijders, Tom A. B., Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock, 2006. “New

Specifications for Exponential Random Graph Models.” Sociological Methodology 36:forthcoming. doi:

10.1111/j.1467-9531.2006.00171.x.

Snyder, David and Edward L. Kick, 1979. “Structural Position in the World System and Economic Growth,

1955-1970: A Multiple-Network Analysis of Transnational Interactions.” American Journal of Sociology

84:1096–1126. ISSN 0002-9602.

Stine, Robert A., 2004. “Model Selection Using Information Theory and the MDL Principle.” Sociological

Methods and Research 33:230–260. doi:10.1177/0049124103262064.

Stovel, Katherine, Michael Savage, and Peter Bearman, 1996. “Ascription into Achievement: Models of

Career Systems at Lloyds Bank, 1890-1970.” American Journal of Sociology 102:358–399. ISSN 0002-

9602.

Van Rossem, Ronan, 1996. “The World System Paradigm as General Theory of Development: A Cross-

National Test.” American Sociological Review 61:508–527. ISSN 00031224.

Van Rossem, Ronan and Marjolijn M. Vermande, 2004. “Classroom Roles and School Adjustment.” Social

Psychology Quarterly 67:396–411. ISSN 01902725.

Volinsky, Chris T. and Adrian E. Raftery, 2000. “Bayesian Information Criterion for Censored Survival

Models.” Biometrics 56:256–262. ISSN 0006341X.

Wallace, Christopher S. and David M. Boulton, 1968. “An Information Measure for Classification.” The

Computer Journal 11:185–194.

Wallace, Christopher S. and David L. Dowe, 1999. “Minimum Message Length and Kolmogorov Complexity.”

The Computer Journal 42:270–283.

Wang, Yuchung J. and George Y. Wong, 1987. “Stochastic Blockmodels for Directed Graphs.” Journal of

the American Statistical Association 82:8–19.

Wasserman, Stanley and Philippa Pattison, 1996. “Logit Models and Logistic Regression for Social Networks:

I. An Introduction to Markov Graphs and p∗.” Psychometrika 61:401–425.

Weakliem, David L., 1999. “A Critique of the Bayesian Information Criterion for Model Selection.” Socio-

logical Methods and Research 27:359–397. doi:10.1177/0049124199027003002.

White, Douglas R. and Karl P. Reitz, 1983. “Graph and Semi-group Homomorphisms on Networks of

Relations.” Social Networks 6:193–235.

White, Harrison C., Scott C. Boorman, and Ronald L. Breiger, 1976. “Social Structure from Multiple

Networks. I. Blockmodels of Roles and Positions.” American Journal of Sociology 81:730–779.

Yang, Yuhong, 2005. “Can the strengths of AIC and BIC be shared? A conflict between model indentification

and regression estimation.” Biometrika 92:937–950. doi:10.1093/biomet/92.4.937.
