Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Multimodel Identification of Group Structure in Network Data∗
Christopher WheatMIT Sloan School of Management
50 Memorial DriveCambridge, MA 02142-1347
November 3, 2008
Abstract
This article proposes a method of identifying the number of groups implied by the pattern of ties ina network based on BICcat—an extension of the Bayesian Information Criterion (BIC). The proposedextension is based on a set of assumptions that derive from specific characteristics of the statisticalevaluation of group structures in networks that diverge from the set of assumptions that underpin mostlarge-scale regression-style empirical social science research. I use a simulation of randomly generatednetworks from the pair-dependent stochastic blockmodel (Anderson et al. 1992) and the p1 stochasticblockmodel distribution (Wang and Wong 1987), along with a multimodel inference technique (Burnhamand Anderson 2004) to demonstrate that BICcat produces less biased estimates of the number of groupsimplied by the pattern of ties in a network than does BIC.
1 Introduction
A considerable amount of attention in recent scholarship has been dedicated to the development of appropri-
ate methods for model selection (Akaike 1974; Schwarz 1978; Rissanen 1983, 1989) and in particular to the
assessment of the applicability of these methods in a social science context (Raftery 1995; Weakliem 1999;
Burnham and Anderson 2004). While a wide range of model selection criteria have been proposed, much of
the debate in the social science literature has centered around the relative merits of the Akaike Information
Criterion (AIC), the Bayesian Information Criterion (BIC), and, to a lesser extent, criteria based on the
Minimum Description Length (Burnham and Anderson 2004; Kuha 2004; Stine 2004). Differences between
each of these model selection criteria can be attributed to differences in their theoretical and philosophical
underpinnings, which may at least in part account for the current lack of consensus about which of these
approaches is best suited to the general task of model selection for social science applications (Weakliem
1999; Raftery 1999; Yang 2005).∗I would like to thank Peter Marsden, Tiziana Casciaro, David Gibson, Joel Podolny, Nitin Norhia, Kate Kellogg, and an
anonymous reviewer for their invaluable feedback on this and earlier versions of this work. I would particularly like to thankDavid Hunter for generous assistance with statnet. Some of the analyses were performed with statnet 1.0 developed withsupport from NIH grants R01DA012831 and R01HD041877.
1
While these differences may make it difficult to reach a conclusion about a method best suited for model
selection problems in general, there are classes of problems within sociology in particular for which some
model selection approaches are likely to be more helpful than others. One such class of problems is the use
of blockmodels in identifying group structure within patterns of social relations (White et al. 1976; Laumann
et al. 1978). Model selection criteria are particularly important in this context, as even the most basic
blockmodel analysis presents a researcher with a set of models that must be evaluated. A critical step in
these analyses is the clustering of actors into groups on the basis of a quantitative measure of similarity—
typically structural equivalence (Lorrain and White 1971) or regular equivalence (White and Reitz 1983).
While some general-purpose clustering approaches explicitly embed a model selection criteria (Fraley and
Raftery 1998, 2002), the assumptions underlying the derivation of these criteria may not always be valid in
the context of modeling group structure in networks.
In this paper, I argue that that BICcat—an estimate of the Bayes Factor based on a different set of
assumptions about the relationship of observations to model parameters than those which underlie the
determination of BIC—is a less biased estimator of the number of groups in a network. I moreover attempt
to illustrate the advantage of applying this criterion in a multimodel parameter estimation framework. In
the following section I derive BICcat and show how, in the context of certain classes of categorical problems,
it shoul provide a more precise estimate of Bayes Factors. Section 3 describes a particular class of network
models that can be used to compare the model selection criteria proposed here to other potential model
selection criterion, and compares the performance of BIC and BICcat by applying them to the analysis of
simulated network data. I conclude with a discussion of the limitations of the approach presented here and
suggestions about how the criterion might be applied to model identification problems relevant to a broader
set of social science contexts.
2 Model Selection and Group Identification
Analyses of social network data that involve the identification of groups can roughly be divided into two
types. In the first of these, partitions of actors into groups heuristically summarize characteristics of the
actors analyzed in order to simplify central theoretical arguments (e.g. White et al. 1976; Gerlach 1992; Grbic
2007). This approach to modeling the structure of networks has been used to identify general types of actors
and their social behavior in cases where the empirical claims arising out of the analysis are not sensitive to
the specific partitioning of actors into groups. For example, Gerlach (1992) uses a blockmodel analysis of
2
ties between Japanese organizations to identify a rough distinction between financial and industrial firms,
which is then decomposed into three types of financial firms and five types of industrial firms. While Gerlach
subsequently describes the relationships between firms in the financial blocks and firms in the industrial
blocks, the analysis is not dependent in a particularly substantive way on whether the selected blockmodel
partitioned the industrial and financial groups into any specific number of subgroups.
The significance of specific group boundaries in this style of analysis can be contrasted to other studies
in which the number of groups itself is the central object of empirical investigation. One particularly
illustrative set of examples of this type of research is a series of studies that use data on interstate relations
to identify categorical structural positions in global political and economic systems (Snyder and Kick 1979;
Van Rossem 1996; Kick and Davis 2001; Alderson and Beckfield 2004; Flandreau and Jobst 2005). For
instance, Van Rossem (1996: 516) uses the results of a blockmodel analysis to implicitly draw substantive
conclusions about the difference between a three-position and a four-position structure of global relations.
Similarly, Flandreau and Jobst (2005: 998-999) specifically argue that it is difficult to understand the
international monetary system without adding an “intermediary” classification of countries to the distinction
between “core” and “peripheral” countries proposed in earlier work. Arguments rooted in specific claims
about the number of groups implied by a given pattern of network ties have been made in other domains, such
as the structure of academic disciplines (Han 2003) and the identification of classroom roles (Van Rossem
and Vermande 2004).
In analysis of this latter type, bias in the estimation of the number of groups implied by a particular
pattern of ties in a network can lead to empirically unsupported substantive conclusions. In this section I
outline how existing model selection approaches might be applied in contexts such as these where the sub-
stantive issue at hand rests critically on identifying the number of groups implied by an observed pattern of
exchanges. I then argue that while that in general, approaches based on Bayes factors have attractive prop-
erties with respect to model selection, in certain cases specific assumptions made in the BIC approximation
of Bayes factors may lead to biased estimates of the number of groups implied by a pattern of network ties.
2.1 Blockmodel Selection Criteria
Given the task of selecting one of a large set of candidate models group structure, empirical researchers
studying networks have employed a variety of selection methods. Kick and Davis (2001: 1566), for instance,
note a “need to be adequately specific while avoiding unwieldy information”, but they do not report a formal
model selection criterion to justify their selection of an eleven-group structure. Some attempts to formalize
3
this logic have included a proposal to use a G2 likelihood-ratio test (Anderson et al. 1992), or using the
extent to which observed relationships correlate with the relationships implied by the group structure to
evaluate candidate groupings (Van Rossem and Vermande 2004: 400)1.
Flandreau and Jobst (2005) base their analysis on a stochastic blockmodel (Nowicki and Snijders 2001)
that explicitly formalizes the ideas of specificity and unwieldiness suggested by Kick and Davis (2001).
In this framework, a parameter Iy is proportional to the average observed log-likelihood of observed ties
conditioned on a model, and a parameter Hx measures the extent to which actors are clearly assigned to
distinct groups by the model. In some sense, Iy is similar to the likelihood function L2(x|θ) incorporated by
general purpose model selection criteria such as the AIC, BIC and MDL, and Hx is similar to the penalty
terms associated with each of these criteria for over-parameterization. However, Nowicki and Snijders do
not propose a method by which these parameters can be combined to formally perform the model selection
task.
Handcock et al. (2007) present the most recent work in this area by proposing a method for identifying
group structure that combines a latent space network model (Hoff et al. 2002) with the BIC and Bayes
factors as general-purpose model selection criteria. In this work, Handcock et al. empirically estimate the
the number of groups in a set of networks. The present article builds on this work by asking whether
the assumptions underlying the applicability of the BIC approximation of Bayes factors (Raftery 1995) are
justified in the specific context of the analysis of group structure within networks. I propose BICcat as a
modified Bayes factor approximation, and then evaluate the comparitive performance of these criteria by
using a multimodel selection approach (Burnham and Anderson 2004) to identify the number of groups in
populations of simulated networks.
2.2 Approximation of Bayes Factors in Categorical Research
A key difference between model selection in the context of identifying group structures in network data and
model selection in the context of much of social science research concerns a difference in assumptions about
the relationship between observations x and model parameters θ = {θ1 . . . θK}. In a wide range of regression-
style social science research, every model parameter θk is assumed to have an effect on all observations x.1Analyses where the similarity of actors is based on the idea of role equivalence (White and Reitz 1983) are faced with a
particular challenge, as it is not clear how to statistically model the likelihood of observed a particular set of network ties xbased on a given assignment of actors to roles θ. One approach has been to only consider models in which sets actors assigned tothe same group are exactly regularly equivalent (Han 2003: 259), though this approach might be less useful in empirical settingswhere ties are observed with any significant degree of error. Alternatively, clusters can be defined by assigning actors that areapproximately regularly equivalent to the same group (Alderson and Beckfield 2004: 835, fn 23), though such a proceduretypically involves an arbitrary choice about a cutoff level of equivalence.
4
This assumption is not generally true of models that seek to identify categories, groups, and boundaries. In
many of these models, there are parameters θk that are theoretically identified as only relevant to a subset
of observations xk ⊆ x. This is perhaps most clearly evident in models of network structure such as the
pair-dependent blockmodel proposed by Holland et al. (1983), where the probability of a tie being sent from
an actor i to an actor j is fully determined by the group r of the sender and the group s of the receiver, such
that
p(xij = 1) = λrs, (1)
where the between group tie densities λrs are i.i.d. In this model, if hr is the number of actors in a group r,
then estimates of λrs are only dependent on the hrhs observations of ties from actors in group r to group s.
Other network models that incorporate group structure such as the p1 stochastic blockmodel (Wang and
Wong 1987) and the p∗ and ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al. 2006)
can incorporate a wide range of other features that may influence the identification of group boundaries, but
in each case these models can incorporate a set of parameters equivalent to λrs. To the extent that these
parallel measures are conceptualized as measuring the independent effect of within- and between-group tie
density on network structure, they should also be thought of as being dependent principally on observations
of ties from actors in the group r to actors in the group s, and independent of all other observed tie values.
There are, of course, models of group and cluster structures in networks, such as the latent space approach
(Hoff et al. 2002; Handcock et al. 2007) in which the estimation of parameters conceptually associated with
group structure are based on all observations of tie data. The intent of this article is not to evaluate the
relative theoretical merit of models such as these and network models based on a more explicitly categorical
view of group structure. The question addressed here is how a model selection criteria might best be applied
in cases where a researcher believes that a model based on explicit categories is theoretically closest to the
empirical phenomenon of interest.
The relationship between estimated parameters θk and observations xk plays a central role in the deriva-
tion of the Bayesian Information Criterion estimate. Raftery (1995: 130) begins this estimation process by
noting that by integrating Bayes’ theorem
p(x) =∫
p(x|θ)p(θ)dθ. (2)
He then defines a function g(θ) = log p(x|θ)p(θ) to represent the integrand, and shows that an approximation
5
of the Taylor expansion of g(θ) around the maximum likelihood estimate θ is
g(θ) ' g(θ) +12(θ − θ)T g′′(θ)(θ − θ). (3)
For large numbers of observations n and cases where the maximum likelihood estimate θ is close to the
“true” value θ, only values of θ that are close to the value of θ will contribute significantly to Equation 2.
In these cases, a reasonable approximation is
p(x) =∫
exp[g(θ)]dθ
' exp[g(θ)]∫
exp[g(θ) +12(θ − θ)T g′′(θ)(θ − θ)]dθ,
(4)
where d is the number of parameters in the model θ. In as much as the integrand in this expression is
proportional to the multivariate normal density this expression can be rewritten as
p(x) = exp[g(θ)](2π)d/2∣∣∣−g′′(θ)
∣∣∣1/2
. (5)
Following this, the logarithm of p(x) can be approximated as
log p(x) = log p(x|θ) + log p(θ) + (d/2) log(2π)− 12
log∣∣∣−g′′(θ)
∣∣∣ + O(n−1). (6)
Here, Raftery (1995: 131) makes a key assumption, namely, that −g′′(θ) can be approximated as ni,
where i is the expected Fisher information matrix for one representative observation. The importance of this
assumption has been noted by Raftery as well as others (Kass and Raftery 1995; Raftery 1999; Weakliem 1999;
Volinsky and Raftery 2000; Handcock et al. 2007), and it has special significance to the problem presented
here. The nature of this assumption is made explicit by Kass and Vaidyanathan (1992:132, Equation 2.6)
who note that one matrix i that justifies this approximation for large n by satisfying
−g′′(θ)n
− i(θ) = O(n−1/2) (7)
would be the Fisher information matrix, but only under the assumption that the observations x are inde-
pendent and identically distributed.
While this is a valid assumption for the regression-style analyses that the BIC is typically applied to, it
is not valid for observations of tie values in networks with group structures based on a distribution such as
6
Equation 1. In such a network, tie values are drawn from distributions that are explicitly dependent on the
group of the sender r and the recipient s—a pair of tie values xij and xi′j′ are only identically distributed
when the sender and recipient groups are the same across the pair, such that r = r′ and s = s′. Intuitively,
in such a model, there is no such thing as a “representative individual observation” across all observations,
but only a representative individual observation for a given sending group r and recipient group s.
If observations of ties from each pair of sending and receiving groups were considered as independent
experiments, the additivity of information suggests that for this model
IX =∑r,s
Irs '∑r,s
hrhsirs. (8)
This logic can be extended to the more general case where the parameters θk are null-orthogonal (Kass and
Vaidyanathan 1992; Kass and Wasserman 1995), but are each on based only on nk = |xk| observations. In
this more general case, this result can be used to define dk = log nk/2 such that
IX '∑
k
dkik. (9)
Substituting this result into Equation 6, while acknowledging the introduction of an O(n1/2) error yields
log p(x) = log p(x|θ) + log p(θ) + (d/2) log(2π)− 12
∑k
log nk −12
∑k
ik + O(n−1/2). (10)
Constant terms in this expression can be dropped such that
log p(x) = log p(x|θ) + log p(θ)− 12
∑k
log nk + O(1). (11)
This in turn leads to an approximation BICcat that should be less biased when applied in the context
of empirical studies, such as the estimation of group structures in social networks, where the number of
observations nk used to estimate many model parameters is significantly less than n. This approximation2
can be expressed as
BICcat(x, θ) = L2(x|θ)− 12
∑k
log nk. (12)
2It is perhaps worth noting that this result, in particular the second term, is consistent with the selection criterion thatwould follow from an MDL or algorithmic complexity-based approach (Shannon 1948; Chaitin 1966; Kolmogorov 1965; Wallaceand Boulton 1968; Rissanen 1983, 1989; Wallace and Dowe 1999; Stine 2004). While approaches to model selection based upthe BIC and MDL or algorithmic complexity have different theoretical underpinnings, the basis of both of these approaches ininformation-theoretic concepts reassuringly leads to selection criteria that are close approximations of one another.
7
If the logic presented here is correct, the fact that nk ≤ n suggests that estimates of the prior p(θ)
based on BIC should be downwardly biased, particularly for those models in which nk is significantly smaller
than n. In the context of selecting blockmodels, where the representative nk is approximately inversely
proportional to b2, where b is the number of blocks in the model, this means that techniques based on BIC
should be biased towards selecting models with fewer blocks. This possibility is explored in the following
section.
3 Simulation
It is difficult to compare the relative performance of different selection criteria using empirical data from real-
world examples, as it is rarely the case that the process by which network ties are generated is known a priori.
In this section, I present comparative structural analyses of networks simulated from known distributions.
The principal objective of this analysis is to compare the bias of BICcat with that of BIC in estimating the
number of groups implied by a given pattern of network ties.
While a generic ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al. 2006), would
allow a wide range of structural network features to be modeled and evaluated, the computational cost of
the Monte Carlo method needed to generate a single network of the size typically studied by social scientists
makes this approach unfeasible for the large number of sample networks needed for a simulation analysis. As
the intention of this analysis is to clarify the distinction between the BICcat and BIC approaches rather than
to illustrate a wide range of structural network features, in the following analysis I generate sample networks
from the pair-dependent stochastic blockmodel (Anderson et al. 1992) characterized by Equation 1, and
from the p1 stochastic blockmodel distribution (Wang and Wong 1987)—a specific extension of this model
described below.
3.1 Estimated Models
The pair-dependent stochastic blockmodel is essentially a collection of Bernoulli distributions between posi-
tions r and s in a given network. Accordingly, simulated networks from this distribution follow straightfor-
wardly from Equation 1. Here I identify features of the p1 stochastic blockmodel most relevant its simulation
in this context, and illustrate the relationship between the p1 and pair-dependent stochastic blockmodels.
The p1 stochastic blockmodel defines a distribution over observed dyads Dij = (xij , xji) where {Dij}
are presumed to be independent of one another conditional on the model. The model allows for variation in
8
the tendency of an actor to send ties αi and receive ties βj , as well as variation in the overall level of dyadic
reciprocity ρ. Like the pair-dependent stochastic blockmodel (Anderson et al. 1992), the λrs parameter of
the p1 stochastic blockmodel allows for variation in the extent to which actors from a group r will send a tie
to actors in group s.
For a given dyad Dij , the model specifies a multinomial distribution over (mij , aij , aji, nij), where
mij = p(Dij = (1, 1)), (13a)
aij = p(Dij = (0, 1)), (13b)
aji = p(Dij = (1, 0)), (13c)
and
nij = p(Dij = (0, 0)), (13d)
such that
mij + aij + aji + nij = 1. (14)
The model specifies the likelihood of observing a specific set of ties x as
p(x|θP1B) =exp{ρm +
∑r,s λrsx++(rs) +
∑i αixi+ +
∑j βjx+j}∏
i<j kij, (15)
where m is the observed number of mutual ties in the network, x++ is the total number of observed ties in
the network, xi+ is the observed number of ties sent by an actor i, x+j is the number of ties received by an
actor j, and x++(rs) represents the total number of ties in the r × s block. In this equation
λij = log(aij/nij) (16a)
and
kij = 1 + eλij + eλji + eρij+λij+λji . (16b)
9
The parameter λij is decomposed as
λij = αi + βj + λrs for all i 6= j, (17)
such that
α+ = β+ = 0. (18)
From this parameterization, it follows that the pair-dependent stochastic blockmodel (Anderson et al.
1992) is a submodel of the p1 stochastic blockmodel where ρ = αi = βj = 0.
3.2 Simulation Design
In order to evaluate the relative bias of the BIC and BICcat measures in the context of blockmodel selection,
I perform two sets of simulations. The first set of simulations is designed to illustrate how the level of bias
in these measures is affected by features of the network being analyzed and the statistical model used in
its evaluation. The second set of simulations is designed to illustrate how the application of these model
selection criteria to networks comparable to those previously analyzed in empirical research might lead an
analyst to substantially different conclusions about the structure of these networks.
To evaluate the basic relationship between network characteristics and the performance of BIC and BICcat
in identifying the number of groups in a newtork, for each type of network model I generate and evaluate 100
random networks of 24 actors each, based on known partitions of actors into 2, 3, 4, 6, 8 or 12 equally-sized
groups. In each of these networks, λrs = 10 if r = s and λrs = −10 if r 6= s, such that in the basic form of
each model, ties are extremely likely within group, and extremely unlikely between groups, corresponding
to a basic model of within-group clustering.
As noted by Nowicki and Snijders (2001), it is more difficult to recover group structures from networks to
the extent that there is not clear separation between the characteristics of the groups. In as much as empir-
ically observed network data often contain unmodeled sources of error that complicate model identification
in this way, networks for this simulation were generated according to a mixed model to illustrate the effect
of noise on the estimation procedures. For instance, the tie probability pij needed to define a pair-dependent
stochastic blockmodel with noise can be defined in terms of the mixed distribution
pij = (1− pn)(mrs + ars) + 0.5pnεij , (19)
10
where 0 ≤ pn ≤ 1 defines the amount of random noise in the model and 0 ≤ εij ≤ 1 is a uniformly distributed
disturbance term. Similarly, the dyadic probabilities mij and aij needed to define a p1 stochastic blockmodel
with noise can be defined in terms of the mixed distribution
mij = (1− pn)eλij+λji
kij+ 0.25pnεij , (20a)
aij = (1− pn)eλij
kij+ 0.25pnεij , (20b)
where λij and kij are defined by Equations 16a and 17. When pn = 0, these models correspond to a
standard stochastic blockmodels of their respective types, and when pn = 1, these model correspond to
random Bernoulli graphs with density λ = 0.5. Intermediate values of pn correspond to models that have
group structure of the proportion (1− pn) and random noise of the proportion pn.
Given an individual network xbi from the population of networks generated by this process, a multimodel
approach (Burnham and Anderson 2004) can be used to estimate the number of distinct groups b that was
used to generate it. Either model selection criterion can be used to determine at least the relative probability
that a given model θb′ with b′ groups generated a network as 2BIC(xbi|θb′ ) or 2BICcat(xbi|θb′ ), respectively. In
principle, a researcher who does not know the actual number of groups b used in generating a network xbi
should assign some prior probability to all possible models of network structure. In this simulation, I consider
a nested set of blockmodels based on 1 to n groups3, and evaluate the expected value of b with respect to
these models as
E(b) =
∑1≤b′≤n p(xbi|θb′)b′∑1≤b′≤n p(xbi|θb′)
. (21)
If a model criterion selection strongly favors a particular model θb∗ , then the prior probability assigned to
that model should be much higher than that for other models, causing E(b) to have a value very close to
b∗. I evaluate the bias of the estimates of b produced by each of the model selection criteria by assessing the
mean of E(b) across the population of radomly generated networks.
3The partition of actors into groups used to generate the sample network is included in this set, and partitions in the set arehierarchically arranged such that a partition with n groups differs from a partition with n + 1 groups only in that the largestgroup in the former is split into two groups in the latter that differ in size by at most one actor. In other words, if within aset of partitions being evaluated, the partition with seven groups has four groups of size 3 and three groups of size 4, then thepartition with eight groups has four groups of size 3, two groups of size 4, and two groups of size 2 formed by splitting one of thefour-member groups in half. This method of generating partitions ensures that the distribution of group sizes within a partitionis minimally skewed, and is consistent with the hierarchical clustering approaches frequently used in empirical analyses of groupstructure in networks.
11
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
2 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
3 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
4 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
6 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
8 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
12 Groups
pn
E((b
))b
Figure 1: BIC and BICcat Performance, Pair Dependent Stochastic Blockmodels
12
3.3 Simulation Results
Figure 1 shows the results of analyses of a set of networks generated from the pair-dependent stochastic
blockmodel. In these graphs, the solid line indicates the extent to which the BICcat measure either under-
predicts (E(b)/b < 1) or over-predicts (E(b)/b > 1) the number of groups in the network, and the dotted
line shows the same for the BIC measure.
The graphs illustrate several features of the model selection task as it is affected by the underlying
process being modeled. First, it is clear from the figures that in some sense, in this sample of networks
it was easier to estimate the number of groups in the generating process in cases where there were fewer,
larger groups, and the estimation task was impaired by high rates of unexplained variance from the model
in the data, and that these two features appear to interact with one another. The results suggest that it
was difficult for either method to accurately predict the number of groups in models where the number of
groups was significantly greater than three or four, or when pn was greater than about 20%. These results
are not particularly surprising—as the level of noise grows, the data are more and more consistent with a
model based on a single position with a constant random probability of within group ties. That said, the
BICcat-based measure was substantially less biased than the BIC-based measure across a wide range of noise
rates for models based on an intermediate number of groups.
Figure 2 shows a similar set of analyses from a set of networks generated from the p1 stochastic block-
model. The results are largely the same as those depicted in Figure 1, in that they suggest that it is quite
difficult for either the BICcat or the BIC measure to accurately estimate the number of groups in the gen-
erating stochastic blockmodel for large numbers of groups or high noise rates. The effect of both of these
factors on the ability of these methods to estimate structural features is somewhat more severe for networks
generated from the p1 stochastic blockmodel than it is for the pair-dependent stochastic blockmodel, per-
haps due to the strong simplifying assumptions introduced under p1 in order to ease estimations (Wang and
Wong 1987: 11). These differences notwithstanding, the BICcat-based measure still exhibits a smaller bias
in predicting the number of groups than does the BIC-based measure across a wide range of noise rates.
While these results suggest that the BIC may be more biased in estimating the number of groups in
a network than is the BICcat, the 24-actor simulated networks that these results are based on may not
correspond meaningfully to the kinds of networks that these methods are typically applied to in empirical
social science research. To this end, I here attempt to illustrate the effect of model selection criterion choice
by presenting a set of simulations that should in some ways more closely resemble prior work seeking to
use network analysis to identify group structure. Due to the computational intensity of estimating these
13
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
2 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
3 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
4 Groups
pn
E((b
))b
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
6 Groups
pn
E((b
))b
Figure 2: BIC and BICcat Performance, p1 Stochastic Blockmodels
14
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
5 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
02
46
810
10 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
05
1015
15 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
05
1015
20
20 Groups
pn
E((b
))
Figure 3: BIC and BICcat Performance—130 Actor Networks
models for large networks, in these analyses I generate 10 networks for each set of parameter values out of
the pair-dependent stochastic blockmodel.
As a first illustration, consider the analysis by Kick and Davis (2001) of networks of interstate relations.
In this analysis, CONCOR is used to divide 130 states into 11 distinct positions. In as much as the CONCOR
procedure is understood to identify actors in structurally equivalent positions with no actor-level variation,
this anlaysis is most consistent with a pair-dependent stochastic blockmodel. To this end, I analyze popu-
lations of 130-actor networks generated out of this distribution, based on even partitions of actors into 5,
10, 15, and 20 groups. The authors do not report within- and between-position tie densities, so I generate
networks across a range of noise rates pn.
Figure 3 presents the results of these analyses. The results for networks generated based on 10 and
15 groups suggest that attempting to use either the BICcat or the unadjusted BIC would likely produce
substantially biased estimates b for networks of this size where the actual number of groups g is roughly in
this range, unless within-group ties are extremely likely, and between-group ties are correspondingly unlikely.
That said, estimates of the number of groups deriving from the BICcat criterion are substantially less biased
15
and much closer to the actual number of groups for significant ranges of noise rates pn.
A second example is based on a similar analysis of how the pattern relations between 163 states can be
used to identify their respective positions in a system of global exchange (Van Rossem 1996). In the original
analysis, the authors use role-equivalence and structural equivalence metrics to partition these states into
four distinct positions. While the selection of a four-position solution seems to be principally theoretically
motivated in this research, it is instructive to consider the conclusions they might have reached had they
attempted to use a model selection criterion to empirically identify the number of groups implicated by the
observed pattern of network ties.
To this end, Figure 4 presents estimations of implied number of groups E(b) for networks of 163 actors
simulated out of the pair-dependent stochastic blockmodel based on 3, 4, 6, 8, and 10 groups. Given the
results from simulations of 24-actor networks, it is perhaps not surprising that both model selection criteria
provide less biased estimates of b in these networks with larger numbers of actors and smaller numbers of
candidate groups. With respect to this specific example, these results suggest that if the pattern of network
ties in the analyzed data really were generated from a four-group structure, then both the BIC and BICcat
criteria would likely esitmate b with more than sufficient accuracy over a wide range of noise rates pn.
On the other hand, this example illustrates a potential concern about the substantive conclusions that
might be drawn due to the downward bias of the BIC criterion in contexts where the number of groups
generating patterns of ties in a network is higher than that postulated by a researcher, particularly when
the proposed group-based model does not predict ties with high accuracy. For instance, considering the
simulations where b = 10 and pn = 0.6, the estimate of b based on BIC is 4.05, while the corresponding
estimate based on BICcat is 9.09. A researcher employing the BIC criterion would reasonably interpret this
result as evidence in support a four-group hypothesis, while a researcher using the BICcat criterion would
likely not. In context with relatively high noise rates pn, this downward bias in general can lead researchers
to find support for models of exchange that may be somewhat more parsimonious than the model that
actually produced the data.
More generally speaking, this example reinforces the idea that, in a multimodel selection context, it is
not sufficient that a selection criteria have minimally-biased performance in the neighborhood of a target
location in the solution space. When applied to this type of estimation task, the bias associated with a
model selection criteria may need to be minimized over the entire solution space. To the extent that the
BICcat does this, as these illustrations suggest that it does, it should outperform the BIC in identifying the
number of groups in a network.
16
0.0 0.2 0.4 0.6 0.8 1.0
0.0
1.0
2.0
3.0
3 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4
4 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
6
6 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
02
46
8
8 Groups
pn
E((b
))
0.0 0.2 0.4 0.6 0.8 1.0
02
46
810
10 Groups
pn
E((b
))
Figure 4: BIC and BICcat Performance—163 Actor Networks
17
4 Discussion
Model selection criteria like the BIC, AIC and MDL are critically important in empirical research in which
the answer to the central theoretical question depends on the model used in analyzing empirical data. The
use of network data to identify the number of groups in a population is certainly an example of this type of
problem, and a domain in which general-purpose model selection criteria have been particularly helpful. In
general, the calculation of exact Bayes factors can be intractible for a wide range of problems—the BICcat
criterion proposed in this paper extends existing work that seeks to identify approximate Bayes Factors in
the specific context of these substantive problems.
While the BIC may be an effective approximation for a wide range of regression-style research in which
all model parameters are estimated on the basis of all observations, it appears that for some range of network
problems, it can produce downwardly-biased estimates of model parameters that are only based on a subset of
the observed data. The BICcat approximation proposed here is based on a set of assumptions that should be
more appropriate for problems like estimating the number of groups b implied by a given pattern of network
ties. The multimodel simulation results presented here bear this out, showing that the bias associated with
estimating the underlying number of groups E(b) is consistently lower when estimates are based on BICcat
rather than the unadjusted BIC.
Moreover, the results presented here suggest that applying the BICcat in a multimodel selection framework
may be an effective approach for other problems where empicial observations are theorized as instances of
distinct categories. Much of contemporary research in sociology has drawn attention to the importance of
boundaries and social categories (Lamont and Molnar 2002), ranging from arguments based on ideal-typical
career paths (Abbott and Hrycak 1990; Stovel et al. 1996; Han and Moen 1999), to arguments about the
benefits and costs of navigating specific social boundaries (Phillips and Zuckerman 2001; Hsu 2006). This
research places special significance on behavior that takes place near or across social boundaries—as such, the
empirical identification of the location of these boundaries may be of key substantive importance. Variations
of the approach outlined here may be useful in such efforts.
While the BICcat may be a less biased estimator of the number of groups implied by a network than does
the BIC, preliminary results in these contexts suggest that it is not necessarily a more efficient estimator.
Estimates E(b) based on BICcat based on a given number of actors n, generating blocks b, and noise level pn
show higher variance than corresponding estimates based on BIC. That said, some amount of the variance
reduction for the BIC-based estimates may be due to its downward bias—differences in variance between
the criteria are not pronounced in populations where estimates E(b) are similar. Further research may more
18
precisely address this issue.
The approach outlined here is also limited in that in can only be applied to identifying groups and
positions that conform to the stucture of specific stochastic network models. This is a limitation principally
because there are sociologically interesting forms of group structure such as those based on the idea of regular
equivalence (White and Reitz 1983) that are yet to be represented as such. If group structure in a particular
social system is in fact driven by regular equivalence rather than structural equivalence, a straightforward
application of the methods presented here could lead to misleading results. Nevertheless, the BICcat can be
applied to the wide range of features of social structure that can be modeled with general purpose stochastic
network models such as the ERGM (Wasserman and Pattison 1996; Anderson et al. 1999; Snijders et al.
2006).
Limitations of the ability to model particular features notwithstanding, the approach presented here
represents a helpful step forward in the statistical analysis of structure in social networks. The results
presented here illustrate how a careful consideration of the relationship between parameter estimates and
the observations upon which they are based can lead to less biased estimations of the parameters that
underpin structural characterizations of these data.
References
Abbott, Andrew and Alexandra Hrycak, 1990. “Measuring Resemblance in Sequence Data: An Optimal
Matching Analysis of Musicians’ Careers.” American Journal of Sociology 96:144–185. ISSN 0002-9602.
Akaike, Hirotugu, 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on
Automatic Control 19:716–623. ISSN 0018-9286.
Alderson, Arthur S. and Jason Beckfield, 2004. “Power and Position in the World City System.” American
Journal of Sociology 109:811–851.
Anderson, Carolyn J., Stanley Wasserman, and Bradley Crouch, 1999. “A p∗ Primer: Logit Models for
Social Networks.” Social Networks 21:37–66.
Anderson, Carolyn J., Stanley Wasserman, and Katherine Faust, 1992. “Building Stochastic Blockmodels.”
Social Networks 14:137–161.
Burnham, Kenneth P. and David R. Anderson, 2004. “Multimodel Inference: Understanding AIC and BIC
in Model Selection.” Sociological Methods and Research 33:261–304. doi:10.1177/0049124104268644.
19
Chaitin, Gregory J., 1966. “On the Length of Programs for Computing Finite Binary Se-
quences.” Journal of the Association for Computing Machinery 13:547–569. ISSN 0004-5411. doi:
http://doi.acm.org/10.1145/321356.321363.
Flandreau, Marc and Clemens Jobst, 2005. “The Ties that Divide: A Network Analysis of the In-
ternational Monetary System, 1890–1910.” The Journal of Economic History 65:977–1007. doi:
10.1017/S0022050705000379.
Fraley, Chris and Adrian E. Raftery, 1998. “How Many Clusters? Which Clustering Method? Answers Via
Model-Based Cluster Analysis.” The Computer Journal 41:578–588. doi:10.1093/comjnl/41.8.578.
———, 2002. “Model-Based Clustering, Discriminant Analysis, and Density Estimation.” Journal of the
American Statistical Association 97:611–631. ISSN 01621459.
Gerlach, Michael L., 1992. “The Japanese Corporate Network: A Blockmodel Analysis.” Administrative
Science Quarterly 37:105–139.
Grbic, Douglas, 2007. “The source, structure, and stability of control over Japan’s financial sector.” Social
Science Research 36:469–490.
Han, Shin-Kap, 2003. “Tribal regimes in academia: a comparative analysis of market structure across
disciplines.” Social Networks 25:251–280.
Han, Shin-Kap and Phyllis Moen, 1999. “Clocking Out: Temporal Patterning of Retirement.” American
Journal of Sociology 105:191–236. ISSN 0002-9602.
Handcock, Mark S., Adrian E. Raftery, and Jeremy M. Tantrum, 2007. “Model-based clustering for social
networks.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 170:301–354. doi:
doi:10.1111/j.1467-985X.2007.00471.x.
Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock, 2002. “Latent Space Approaches to
Social Network Analysis.” Journal of the American Statistical Association 97:1090–1098. doi:
doi:10.1198/016214502388618906.
Holland, Paul W., Kathryn B. Laskey, and Samuel Leinhardt, 1983. “Stochastic Blockmodels: Some First
Steps.” Social Networks 5:109–137.
20
Hsu, Greta, 2006. “Jacks of All Trades and Masters of None: Audiences’ Reactions to Spanning Genres in
Feature Film Production.” Administrative Science Quarterly 51:420–450. ISSN 0001-8392.
Kass, Robert E. and Adrian E. Raftery, 1995. “Bayes Factors.” Journal of the American Statistical Associ-
ation 90:773–795. ISSN 01621459.
Kass, Robert E. and Suresh K. Vaidyanathan, 1992. “Approximate Bayes Factors and Orthogonal Parame-
ters, with Application to Testing Equality of Two Binomial Proportions.” Journal of the Royal Statistical
Society. Series B (Methodological) 54:129–144. ISSN 00359246.
Kass, Robert E. and Larry Wasserman, 1995. “A Reference Bayesian Test for Nested Hypotheses and its
Relationship to the Schwarz Criterion.” Journal of the American Statistical Association 90:928–934. ISSN
01621459.
Kick, Edward L. and Byron L. Davis, 2001. “World-System Structure and Change: An Analysis of Global
Networks and Economic Growth across Two Time Periods.” American Behavioral Scientist 44:1561–1578.
doi:10.1177/00027640121958050.
Kolmogorov, Andrey N., 1965. “Three approaches to the quantitative definition of complexity.” Problems
in Information Transmission 1:4–7.
Kuha, Jouni, 2004. “AIC and BIC: Comparisons of Assumptions and Performance.” Sociological Methods
and Research 33:188–229. doi:10.1177/0049124103262065.
Lamont, Michele and Virag Molnar, 2002. “The Study of Boundaries in the Social Sciences.” Annual Review
of Sociology 28:167–195.
Laumann, Edward O., Joseph Galaskiewicz, and Peter V. Marsden, 1978. “Community Structure as Interor-
ganizational Linkages.” Annual Review of Sociology 4:455–484. doi:10.1146/annurev.so.04.080178.002323.
Lorrain, Francois P. and Harrison C. White, 1971. “Structural Equivalence of Individuals in Social Networks.”
Journal of Mathematical Sociology 1:48–80.
Nowicki, Krzysztof and Tom A. B. Snijders, 2001. “Estimation and Prediction for Stochastic Blockstruc-
tures.” Journal of the American Statistical Association 96:1077–1087.
Phillips, Damon J. and Ezra W. Zuckerman, 2001. “Middle-Status Conformity: Theoretical Restatement
and Empirical Demonstration in Two Markets.” American Journal of Sociology 107:379–429.
21
Raftery, Adrian E., 1995. “Bayesian Model Selection in Social Research.” In “Sociological Methodology
1995,” , edited by Peter V. Marsden, pp. 111–196. San Francisco: Jossey-Bass.
———, 1999. “Bayes Factors and BIC: Comment on “A Critique of the Bayesian Information Criterion for
Model Selection”.” Sociological Methods and Research 27:411–427. doi:10.1177/0049124199027003005.
Rissanen, Jorma, 1983. “A Universal Prior for Integers and Estimation by Minimum Description Length.”
The Annals of Statistics 11:416–431. ISSN 0090-5364.
———, 1989. Stochastic Complexity in Statistical Inquiry. Teaneck, N.J.: World Scientific.
Schwarz, Gideon, 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6:461–464. ISSN
00905364.
Shannon, Claude E., 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal
27:379–423.
Snijders, Tom A. B., Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock, 2006. “New
Specifications for Exponential Random Graph Models.” Sociological Methodology 36:forthcoming. doi:
10.1111/j.1467-9531.2006.00171.x.
Snyder, David and Edward L. Kick, 1979. “Structural Position in the World System and Economic Growth,
1955-1970: A Multiple-Network Analysis of Transnational Interactions.” American Journal of Sociology
84:1096–1126. ISSN 0002-9602.
Stine, Robert A., 2004. “Model Selection Using Information Theory and the MDL Principle.” Sociological
Methods and Research 33:230–260. doi:10.1177/0049124103262064.
Stovel, Katherine, Michael Savage, and Peter Bearman, 1996. “Ascription into Achievement: Models of
Career Systems at Lloyds Bank, 1890-1970.” American Journal of Sociology 102:358–399. ISSN 0002-
9602.
Van Rossem, Ronan, 1996. “The World System Paradigm as General Theory of Development: A Cross-
National Test.” American Sociological Review 61:508–527. ISSN 00031224.
Van Rossem, Ronan and Marjolijn M. Vermande, 2004. “Classroom Roles and School Adjustment.” Social
Psychology Quarterly 67:396–411. ISSN 01902725.
22
Volinsky, Chris T. and Adrian E. Raftery, 2000. “Bayesian Information Criterion for Censored Survival
Models.” Biometrics 56:256–262. ISSN 0006341X.
Wallace, Christopher S. and David M. Boulton, 1968. “An Information Measure for Classification.” The
Computer Journal 11:185–194.
Wallace, Christopher S. and David L. Dowe, 1999. “Minimum Message Length and Kolmogorov Complexity.”
The Computer Journal 42:270–283.
Wang, Yuchung J. and George Y. Wong, 1987. “Stochastic Blockmodels for Directed Graphs.” Journal of
the American Statistical Association 82:8–19.
Wasserman, Stanley and Phillipa Pattison, 1996. “Logit Models and Logistic Regression for Social Networks:
I. An Introduction to Markov Graphs and p∗.” Psychometrika 61:401–425.
Weakliem, David L., 1999. “A Critique of the Bayesian Information Criterion for Model Selection.” Socio-
logical Methods and Research 27:359–397. doi:10.1177/0049124199027003002.
White, Douglas R. and Karl P. Reitz, 1983. “Graph and Semi-group Homomorphisms on Networks of
Relations.” Social Networks 6:193–235.
White, Harrison C., Scott C. Boorman, and Ronald L. Breiger, 1976. “Social Structure from Multiple
Networks. I. Blockmodels of Roles and Positions.” American Journal of Sociology 81:730–779.
Yang, Yuhong, 2005. “Can the strengths of AIC and BIC be shared? A conflict between model indentification
and regression estimation.” Biometrika 92:937–950. doi:10.1093/biomet/92.4.937.
23