
Estimating Psychological Networks and their Stability: a Tutorial Paper

Authors: Sacha Epskamp*1, Denny Borsboom1 & Eiko I. Fried2

Affiliations: 1Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands.

2Faculty of Psychology and Educational Sciences, University of Leuven, Leuven,

Belgium.

*Corresponding author: Sacha Epskamp, Department of Psychological Methods, University of Amsterdam. Nieuwe Achtergracht 129, 1018 WT Amsterdam, The Netherlands; tel: +31 6 25690541, email: [email protected]


Abstract

Over the last few years, network research has gained increasing popularity in the psychological sciences. Especially in clinical psychology and personality research, such psychological networks

have been used to complement more traditional latent variable models. While prior publications have

tackled the two topics of network psychometrics (how to estimate networks) and network inference

(how to interpret networks), the topic of network stability still provides a major challenge: it is unclear

how susceptible network structures and graph theoretical measures derived from network models are

to sampling error and choices made in the estimation method, greatly limiting the certainty of

substantive interpretation. In this tutorial paper, we aim to address these challenges. First, we

introduce the current state-of-the-art of network psychometrics and network inference in psychology

to allow readers to use this tutorial as a stand-alone guide to estimating and interpreting psychopathological

networks in R. Second, we describe how bootstrap routines can be used to assess the stability of

network parameters. To facilitate this process, we developed the freely available R-package bootnet.

We apply bootnet, accompanied by R syntax, to a publicly available dataset of 359 women with posttraumatic stress disorder. We show how to estimate confidence intervals around edge weights, and

how to examine the stability of centrality indices by subsampling nodes and persons. This tutorial is aimed at researchers of all levels of statistical experience, including beginners.

Keywords: Gaussian Graphical Model; Ising Model; LASSO; network models; bootstrap; tutorial


Introduction

In the last five years, network research has gained substantial attention in the psychological sciences (Borsboom & Cramer, 2013). Novel statistical models have been developed that allow for

estimating psychological network structures (Bringmann et al., 2013; van Borkulo et al., 2014).

Psychological networks consist of nodes representing observed variables, connected by edges

representing statistical relationships. Such networks are strikingly different from, e.g., power grids

(Watts & Strogatz, 1998), social networks (Wasserman & Faust, 1994) or ecological networks (Barzel

& Biham, 2009), in which nodes represent entities (e.g., airports, people, organisms) and connections

are generally observed and known (e.g., electricity lines, friendships, mutualistic relationships). In

psychological networks, the goal is to find pairwise interactions, predictive effects or even potentially

causal relationships between observed psychological (and possibly biological) nodes. These

connections are not known, and thus need to be estimated. Recent statistical advances allow for the

estimation of such unknown network structures on datasets typically obtained in psychological

research (e.g., Barber & Drton, 2015; Foygel & Drton, 2010; Friedman, Hastie, & Tibshirani, 2008;

Haslbeck & Waldorp, 2015; van Borkulo et al., 2014). These methods have been implemented in

statistical software specifically aimed at psychologists without advanced statistical knowledge

(Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012; van Borkulo & Epskamp, 2014), and

tutorials have emerged that detail important steps in estimating and analyzing psychological networks

(Costantini et al., 2015). Adopting this new methodology has allowed researchers to explore

important substantive questions, for instance in personality research (e.g., Costantini et al., 2015) and

clinical psychology (e.g., Cramer, Waldorp, van der Maas, & Borsboom, 2010; Fried et al., 2015;

Goekoop & Goekoop, 2014; McNally et al., 2014).

While network psychometrics (the estimation of psychological networks) and network

inference (the interpretation of networks) have been addressed in prior work, an aspect that has

received little attention so far is the parameter stability of networks, which we will simply call

stability. By stability, we mean the degree to which estimated network structures and graph theoretical measures derived from network models are susceptible to sampling error and to choices made in the estimation method; at present, this is largely unclear. This problem of stability is omnipresent in statistics. Imagine


researchers employ a regression analysis to examine three predictors of depression severity, and

identify one strong, one weak, and one unrelated regressor. If removing one of these three regressors,

or adding a fourth one, substantially changes the regression coefficients of the other regressors, results

are unstable and depend on specific decisions the researchers make, implying a problem of stability.

The same holds for psychological networks. Due to the current uncertainty, there is a danger of obtaining network structures that are sensitive to the specific variables included or to the specific estimation method used. This poses a major challenge, especially when substantive interpretations such as treatment

recommendations in the psychopathological literature, or the generalizability of the findings, are

important. The current replication crisis in psychology (Open Science Collaboration, 2015) stresses

the crucial importance of obtaining robust results, and we want the emerging field of

psychopathological networks to start off on the right foot.

This article has two main goals. First, we describe the current state-of-the-art for

psychological network analysis: how are such network structures best estimated, and how can we use

these structures for statistical inference? We include this part to make this tutorial stand-alone

readable for psychological researchers. Second, we describe three different bootstrapping techniques

that cover the most important network stability tests. We do so using the novel package bootnet

(Epskamp, 2016a) developed for the free statistical programming environment R (R Development

Core Team, 2008), and demonstrate the package's functionality in a dataset of 359 women with posttraumatic stress disorder (PTSD; Hien et al., 2009) that can be downloaded from the Data Share

Website of the National Institute on Drug Abuse. It is important to note that the focus of our tutorial is

on cross-sectional network models that can readily be applied to many current psychological datasets.

We hope that this tutorial will enable researchers to gauge the stability and certainty of the results

obtained from network models, and to provide editors, reviewers, and readers of psychological

network papers the possibility to better judge whether substantive conclusions drawn from such

analyses are defensible.


Network psychometrics and network inference

A psychological network is a model in which nodes represent observed psychological variables—

usually psychometric test items such as responses to questions about whether a person suffered from

insomnia or fatigue in the past weeks—which are connected by edges that indicate some statistical

relationship between them. These models are conceptually different from commonly used reflective

latent variable models that explain the co-occurrence among symptoms—the fact that individuals

often suffer from sadness, insomnia, fatigue and concentration problems at the same time—by a

causal effect of an underlying unobserved latent trait—e.g., depression. Psychological networks offer a

different conceptual interpretation of the data and explain such co-occurrences via direct relationships

between symptoms; someone who sleeps poorly will be tired, and someone who is tired will not

concentrate well (Fried, 2015; Schmittmann et al., 2013). Such relationships can then be more easily

interpreted when drawn as a network structure, in which edges indicate pathways on which nodes can

affect each other. The edges can differ in strength of connection, also termed the edge weight

(Epskamp et al., 2012), indicating whether a relationship is strong (commonly visualized with thick edges) or

weak (thin, less saturated edges), and positive (green edges) or negative (red edges). After a network

structure is estimated, the visualization of the graph itself tells the researcher a detailed story of the

multivariate dependencies in the data. Additionally, many inference methods from graph theory can

be used to assess which nodes are the most important nodes in the network, termed the most central

nodes.

Directed and undirected networks

In general, there are two types of edges that can be present in a network; an edge can be

directed, in which case one head of the edge has an arrowhead indicating a one-way effect, or an edge

can be undirected, indicating some mutual relationship. A network that contains only directed edges is

termed a directed network, whereas a network that contains only undirected edges is termed an

undirected network (Newman, 2010). Directed networks have been of great interest in many fields as

they can be used to encode causal structures (Pearl, 2000). For example, the edge insomnia à fatigue

can be taken to indicate that insomnia causes fatigue. The work of Pearl describes that such causal

structures can be tested using only observational cross-sectional data, and even estimated to a certain


extent (Kalisch et al., 2012; Scutari, 2010). However, when temporal information is lacking, there is

only limited information present in cross-sectional observational data. Such estimation methods

typically only work under the very strict assumptions that all entities that play a causal role have been

measured and the causal chain of cause and effect is not cyclic (a variable cannot cause itself via any

path). Both assumptions are not very plausible in psychological systems. Furthermore, such directed

networks suffer from the problem that many equivalent models can exist that feature the same

relationships found in the data (MacCallum & Browne, 1993), which makes the interpretation of

structures difficult. For example, the structure insomnia → fatigue → concentration is statistically equivalent to the structure insomnia ← fatigue → concentration and the structure insomnia ← fatigue ← concentration: all three only indicate that insomnia and concentration problems are conditionally

independent after controlling for fatigue.

For the reasons outlined above, psychological networks estimated on cross-sectional data are

typically undirected networks. The current state-of-the-art of estimating undirected psychological

network structures involves the estimation of Pairwise Markov Random Fields (PMRF; Costantini,

Epskamp, et al., 2015; van Borkulo et al., 2014). A PMRF is a network model in which edges indicate

the full conditional association between two nodes after conditioning on all other nodes in the

network. This means that, when two nodes are connected, there is a relationship between these two

nodes that cannot be explained by any other node in the network. Simplified, it can be understood as a

partial correlation controlling for all other connections. In addition, the absence of an edge between

two nodes indicates that these nodes are conditionally independent of each other given the other nodes

in the network. Thus, a completely equivalent undirected structure to the structures described above

would be insomnia — fatigue — concentration, indicating that insomnia and concentration problems

are conditionally independent after controlling for fatigue.


Figure 1. Example network

Figure 1 shows a PMRF similar to the example described above. In this network, there is a

positive relationship between insomnia and fatigue, and a negative relationship between nodes fatigue

and concentration. The positive edge is thicker and more saturated than the negative edge, indicating

that this interaction effect is stronger than that of the negative edge. This network shows that insomnia

and concentration do not directly interact with each other in any way, only through their common

connection with fatigue. Therefore, fatigue is the most important node in this network – a concept we

will later quantify as centrality. These edges can be interpreted in several different ways. First, as

shown above, the model is in line with causal interpretations of associations among the symptoms.

Second, this model implies that insomnia and fatigue predict each other after controlling for

concentration; even when we know someone is concentrating poorly, that person is more likely to

suffer from insomnia when we observe that person to suffer from fatigue. Similarly, fatigue and

concentration predict each other after controlling for insomnia. Furthermore, after controlling for

fatigue there is no longer any predictive quality between insomnia and concentration, even though

these variables are correlated; fatigue now mediates the prediction between these two symptoms



(Epskamp, Maris, Waldorp, & Borsboom, 2016). Finally, these edges can represent genuine

symmetric causal interactions between symptoms. For example, in statistical physics a PMRF, the Ising model, is used to model how particles cause neighboring particles to become aligned.

Before networks can be interpreted as such, however, they must first be constructed. While

this can be rather straightforward in, e.g., social networks (social networking sites now readily supply

a list of connections), train networks (just ask the railway managers), or image processing (pixels next

to each other are connected), it is a challenging task in psychology: what is the structure of a

psychological system such as PTSD? We will refer to the field dedicated to estimating psychological

network structures as network psychometrics (Epskamp et al., 2016), and provide a brief overview in the

following section.

Network Psychometrics

Ising model and Gaussian graphical model

When the data are binary, the appropriate PMRF model is the Ising model (van Borkulo et al., 2014). When the data are multivariate normally distributed, the appropriate PMRF model is the Gaussian graphical model (GGM; Costantini,

Epskamp, et al., 2015; Lauritzen, 1996), in which edges can directly be interpreted as partial

correlation coefficients. A GGM in which all possible edges are included in the network is also

termed a partial correlation network, as edge weights then directly equal partial correlation

coefficients. The Ising model can be estimated by using the EstimateIsing function from the R-

package IsingSampler (Epskamp, 2016b) with the option method = "ll". The GGM, on the other hand, can be estimated by using the qgraph function from the R-package qgraph (Epskamp et al., 2012) with the option graph = "pcor".
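To make these two commands concrete, a minimal sketch is shown below; binaryData and contData are hypothetical placeholders for a binary and a continuous dataset, and only the options named above are used.

library("IsingSampler")
library("qgraph")

# Non-regularized Ising model on binary (0/1) data:
isingResult <- EstimateIsing(binaryData, method = "ll")

# Non-regularized GGM (partial correlation network) on continuous data:
pcorNetwork <- qgraph(cor(contData), graph = "pcor", sampleSize = nrow(contData))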

When data is neither binary nor multivariate normal, the Ising model and GGM can still be

used. For the Ising model, data need to be binarized (e.g., by median split or based on some

theoretical reasoning). For the GGM, data either need to be transformed to approximate multivariate

normality (e.g., by using the nonparanormal transformation; Liu, Lafferty, & Wasserman, 2009), or,

in case some or all variables are ordinal, an estimate of an underlying normally distributed correlation


matrix has to be obtained (e.g., by using polychoric correlations; Olsson, 1979). Alternatively, novel

methodology has been developed to estimate a PMRF directly containing both continuous and

categorical variables (Haslbeck & Waldorp, 2015). This model can be estimated using the mgmfit

function from the mgm package in R.
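A minimal sketch of these data-handling options is shown below; ordData and skewData are hypothetical ordinal and skewed continuous datasets, the binarize function is provided by bootnet (mentioned again in the flexible specification section below), and the nonparanormal transformation is assumed to come from the huge package.

library("qgraph")
library("bootnet")
library("huge")

binData <- binarize(ordData)    # dichotomize the data, as input for the Ising model
npnData <- huge.npn(skewData)   # nonparanormal transformation toward multivariate normality
polyCor <- cor_auto(ordData)    # polychoric correlation matrix, as input for the GGM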

Dealing with the problem of small N in psychological data

Both the Ising model and the GGM feature a severe limitation: the number of parameters to

estimate grows quickly with the size of the network. In a 10-node network, 55 parameters (10

threshold parameters and 10 × 9/2 = 45 pairwise association parameters) already need to be estimated. This

number grows to 210 in a network with 20 nodes, and to 1275 in a 50-node network. To reliably

estimate that many parameters, the number of observations needed typically exceeds the number

available in typical psychological data. While exactly how many observations are needed is not well defined, and future research is needed to establish strict guidelines, a researcher will, at the very minimum, want to measure

at least as many people as there are parameters in the model. With too few observations, the standard

errors of the estimated parameters can become intolerably large and inference on the estimated

network structure becomes pointless.
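The growth of the parameter count can be verified with a one-line calculation: a network with p nodes has p threshold parameters plus p(p − 1)/2 pairwise parameters.

p <- c(10, 20, 50)
p + p * (p - 1) / 2  # 55, 210, and 1275 parameters, respectively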

Recently, network psychometricians have applied the "least absolute shrinkage and selection

operator" (LASSO; Friedman, Hastie, & Tibshirani, 2008) to remediate the large standard errors in

the Ising model and GGM when estimated with relatively little data. This uncertainty of parameters

also implies that some estimated edges in the network may be spurious and only due to chance. A way

to remedy this problem is to use a penalty function, also termed regularization. The LASSO employs

such a regularizing penalty by limiting the total absolute sum of parameter values, leading many edge

estimates to shrink to exactly zero and dropping out of the model. As such, the LASSO returns a

sparse (or, in substantive terms, conservative) network model: only a relatively small number of edges

are used to explain the covariation structure in the data. Because of this sparsity, the estimated models

become more interpretable. The LASSO utilizes a tuning parameter to control the degree to which

regularization is applied. This tuning parameter can be selected by minimizing the Extended Bayesian

Information Criterion (EBIC; Foygel & Drton, 2010).


Estimating regularized networks in R is straightforward. For the Ising model, LASSO

estimation using EBIC has been implemented in the IsingFit package (van Borkulo & Epskamp,

2014). For GGM networks, a well-established and fast algorithm for estimating LASSO regularization

is the graphical LASSO (glasso; Friedman et al., 2008), which is implemented in the package glasso

(Friedman, Hastie, & Tibshirani, 2014). The qgraph package—using the qgraph function with the

option graph = "glasso"—runs glasso with tuning parameter selection using the EBIC.
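As a minimal sketch, both regularized estimators can be called as follows; binData and ordData are hypothetical binary and ordinal datasets, and default settings are used throughout.

library("IsingFit")
library("qgraph")

# Regularized Ising model with EBIC model selection:
isingFitResult <- IsingFit(binData)

# Regularized GGM (glasso network) with EBIC tuning parameter selection:
glassoNetwork <- qgraph(cor_auto(ordData), graph = "glasso", sampleSize = nrow(ordData))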

Network inference

In the first step of network analysis, the obtained network is typically visualized to show the

structure of the data (Epskamp et al., 2012). Afterwards, inference methods derived from graph theory

can be applied to the network structure. The estimated PMRF is always a weighted network, which

means that we not only look at the structure of the network (are two nodes connected or not), but also

at the strength of connection between pairs of nodes. Because of this, many typical inference methods

that concern the global structure of the network, such as small-worldness, density, and global clustering

(Kolaczyk & Csárdi, 2014; Newman, 2010; Watts & Strogatz, 1998), are less useful in the context of

psychological networks, because they only take into account whether nodes are connected or not, and

not the strength of association among nodes. Because the global inference methods for weighted

networks and PMRFs are still in development and no consensus has been reached yet, the network

inference section focuses on local network properties: how are two nodes related, and what is the

influence of a single node?

Relation between two nodes

The relationship between two nodes can be assessed in two ways. First, one can directly

assess the edge weight. This is always a number that is nonzero (as an edge weight of zero would

indicate there is no edge). The sign of the edge weight (positive or negative) indicates the type of

interaction, and the absolute value of the edge weight indicates the strength of the effect; a positive

edge weight of, for example, 0.5 is equal in strength to a negative edge weight of -0.5, and both are

stronger than an edge weight of 0.2. Two strongly connected nodes influence each other more easily


than two weakly connected nodes. Similar to how two persons standing closer to each other can

communicate more easily (talking) than two people standing far away from each other (shouting), two

strongly connected nodes are closer to each other. As such, the length of an edge is defined as the

inverse of the edge strength. Finally, the distance between two nodes is equal to the sum of lengths of

all edges on the shortest path between the two nodes (Newman, 2010).
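As a small worked example with hypothetical edge weights: if the edge between insomnia and fatigue has weight 0.5 and the edge between fatigue and concentration has weight 0.25, the edge lengths are 1/0.5 = 2 and 1/0.25 = 4, so the distance between insomnia and concentration (whose shortest path runs through fatigue) is 2 + 4 = 6. In R:

weights <- c(insomnia_fatigue = 0.5, fatigue_concentration = 0.25)
lengths <- 1 / abs(weights)  # edge length = inverse of edge strength
sum(lengths)                 # distance between insomnia and concentration via fatigue: 6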

Node centrality

The importance of individual nodes in the network can be assessed by investigating the node

centrality. A visualization of a network—such as the ones shown later in this manuscript in Figure

2—is an abstract rendition of a high-dimensional space in two dimensions. While visualizations of

network models often aim to place highly connected nodes into the center of the graph, for instance

using the Fruchterman-Reingold algorithm (Fruchterman & Reingold, 1991), the two-dimensional

visualization cannot properly reflect the true space of the model. Thus, the metric distance between

the placement of nodes in the two-dimensional space has no direct interpretation as it has in, e.g.,

multidimensional scaling. Therefore, graph theory has developed several methods to more objectively

quantify which nodes in a network are more central than others. Three such centrality measures have

appropriate weighted generalizations that can be used with psychological networks (Opsahl,

Agneessens, & Skvoretz, 2010).

First, node strength, also called degree in unweighted networks (Newman, 2010), simply sums the strengths of all edges connected to a node; if the network is made up of partial correlation

coefficients the node strength equals the sum of absolute partial correlation coefficients between a

node and all other nodes. Second, closeness takes the inverse of the sum of all shortest paths between

a node and all other nodes in the network. Thus, where node strength investigates how strongly a node

is directly connected to other nodes in the network, closeness investigates how strongly a node is

indirectly connected to other nodes in the network. Finally, betweenness looks at how many shortest

paths between two nodes go through the node in question; the higher the betweenness, the more

important a node is in connecting other nodes.
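For numeric values of these three indices, the qgraph package also provides the centrality_auto and centralityTable functions, in addition to the centralityPlot function used later in this tutorial. A minimal sketch, assuming both functions accept a raw weights matrix; the matrix below is hypothetical:

library("qgraph")

# Hypothetical 3-node weights matrix: insomnia-fatigue (0.3), fatigue-concentration (-0.2)
W <- matrix(c(0,    0.3,  0,
              0.3,  0,   -0.2,
              0,   -0.2,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ins", "fat", "con"), c("ins", "fat", "con")))

centrality_auto(W)$node.centrality  # betweenness, closeness, and strength per node
centralityTable(W)                  # the same indices in long (plottable) format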


Network stability

The above description is an overview of the current state of network analysis in psychology.

While network inference is typically performed by assessing edge strengths and node centrality, little

work has been done in investigating how stable these inferences are. Imagine we find that symptom A

has a much higher node strength than symptom B in a psychopathological network, leading to the

clinical interpretation that A may be a more relevant target for treatment than the peripheral symptom

B (Fried, Epskamp, Nesse, Tuerlinckx, & Borsboom, 2016). Clearly, this interpretation relies on the

assumption that the centrality estimates are indeed different from each other. Similarly, while two

edges in a network may be differentially strong, it remains unclear how strong the evidence for that

difference is if we do not take into account the variability of the edge weight estimation. We have

developed the free to use R extension bootnet that implements various methods for assessing the

stability of network models. Bootnet comes with convenient default arguments to estimate the Ising

model and the GGM, and outputs easily indexable data frames as well as interpretable plots.

Edge weight bootstrap

To assess the variability of edge weights, we can estimate confidence intervals (CI); in 95%

of the cases such a CI will contain the true value of the parameter. To construct a CI, we need to know

the sampling distribution of the statistic of interest. While such sampling distributions can be difficult

to obtain for complicated statistics such as centrality measures, there is an alternative and simple way

of constructing CIs: bootstrapping (Efron, 1979). Bootstrapping involves repeatedly taking random

samples with replacement of the original data, followed by estimating the statistic of interest.

Following the bootstrap, a 1 − α CI can be constructed as the interval between the α/2 and 1 − α/2 quantiles of the bootstrapped distribution.
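The quantile logic can be illustrated with a generic base-R sketch for a simple statistic, here the correlation between two variables x and y in a hypothetical data frame myData, before turning to the network-specific implementation in bootnet:

set.seed(123)
bootStats <- replicate(1000, {
  idx <- sample(nrow(myData), replace = TRUE)  # resample rows with replacement
  cor(myData$x[idx], myData$y[idx])            # recompute the statistic of interest
})
quantile(bootStats, c(0.025, 0.975))           # bootstrapped 95% CI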

Bootstrapping edge weights can be done in two ways: using non-parametric bootstrap and

parametric bootstrap (Bollen & Stine, 1993). Both methods involve sampling a series of new plausible

datasets with the same number of observations and then recomputing the statistic of interest over

these new datasets. In non-parametric bootstrapping, observations in the data are resampled with


replacement to create new plausible datasets, whereas parametric bootstrapping samples new

observations from the parametric model that has been estimated from the original data; this creates a

series of values that can be used to estimate the sampling distribution.

Non-parametric bootstrapping can always be applied, whereas parametric bootstrapping

requires a parametric model of the data. This is no problem in network analysis, as we already have a

parametric model for the data; the estimated network structure fully encodes the joint probability

distribution for the data. When we estimate a GGM, data can be sampled by sampling from the

multivariate normal distribution through the use of the R package mvtnorm (Genz et al., 2015); to

sample from the Ising model, we have developed the R package IsingSampler (Epskamp, 2015b).

When using the GGM, the parametric bootstrap samples continuous multivariate normal data – an

important distinction from ordinal data if the GGM was estimated using polychoric correlations.

Therefore, we advise the researcher to use the non-parametric bootstrap when handling ordinal data.

Furthermore, the non-parametric bootstrap is fully data-driven and requires no theory, whereas the

parametric bootstrap is more theory driven. As such, it is advisable to first investigate the non-

parametric bootstrap results, and examine the parametric bootstrap results if the non-parametric

results prove unstable or to check for correspondence of the CIs between both methods.
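In bootnet, which is introduced in detail below, this choice amounts to setting the type argument; a minimal sketch with a hypothetical dataset myData:

library("bootnet")

bootNonParametric <- bootnet(myData, default = "EBICglasso", type = "nonparametric")
bootParametric    <- bootnet(myData, default = "EBICglasso", type = "parametric")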

Subset bootstrap methods

While the CIs of edge weights can be constructed using the bootstrap, we discovered in the

process of this research that constructing CIs for centrality indices is far from trivial. As discussed in

more detail in the supplementary materials, bootstrapping centrality indices results in highly biased

sampling distributions, and thus the bootstrap cannot readily be used to construct true 95% CIs. While

intervals can be constructed that give a hint of the sampling variation around centrality indices, they

should thus not be interpreted as CIs. Our suggestion is rather to focus on the edge weight CIs to

obtain insight into the stability of the network estimation. For centrality indices, we suggest to instead

look at the stability of network measures based on subsets of the data. There are two ways to do so.

Taking subsets of people in the dataset employs the so-called m out of n bootstrap, which is

commonly used to remediate problems with the regular bootstrap (Chernick, 2011). Taking subsets of


variables is also a viable option in assessing the stability of centrality measures (Costenbader &

Valente, 2003). While neither subsetting method leads to valid CIs, they can be used to assess the

correlation between the original centrality indices and those obtained from subsets, which can guide

the interpretation of stability. If the order of centrality indices of a number of symptoms in the

network completely changes after dropping 10% of the people or nodes, then the order cannot

meaningfully be interpreted.

In network analysis, researchers usually are less interested in discovering the true centrality of

a node than in the order of centrality found in the network: which nodes are the most important? If

that order is not stable, then no meaningful interpretation can be made on the order of centrality in the

network analysis. To assess this stability, we propose to investigate the correlation between the

centrality based on the entire sample and the centrality based on a subset of the sample: the higher this

correlation is, the more stable the results. This can be done in two ways, by dropping nodes from the

data (node-dropping), or by dropping persons from the data (person-dropping). Costenbader and

Valente (2003) describe node-dropping to investigate centrality stability,

while person-dropping is comparable to the m out of n bootstrap. We term this general framework,

covering both node-dropping and person-dropping, subset bootstrap.

The bootnet package includes subset bootstrap and produces plots of the correlation of

centrality indices under different levels of subsetting, as described by Costenbader and Valente. The

bootnet package also visualizes the variability around this correlation, showing researchers if

centrality indices are stable in the dataset; if the correlation drastically changes after only dropping, say,

10% of the nodes or persons, it is very clear that no meaningful interpretation can be made of the

centrality order. In addition to the correlation plot, bootnet can be used to plot the average estimated

centrality index for each node under different sampling levels, giving more detail on the order of

centrality under different subsetting levels.
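The underlying idea can also be sketched by hand for person-dropping: estimate node strength on the full sample and on a random 90% subset, and correlate the two vectors. Below is a conceptual sketch with a hypothetical dataset myData; bootnet automates and repeats this procedure over many subsets and subsetting levels.

library("qgraph")

fullNet <- qgraph(cor_auto(myData), graph = "glasso", sampleSize = nrow(myData), DoNotPlot = TRUE)
subData <- myData[sample(nrow(myData), round(0.9 * nrow(myData))), ]
subNet  <- qgraph(cor_auto(subData), graph = "glasso", sampleSize = nrow(subData), DoNotPlot = TRUE)

# Correlation between node strength in the full sample and in the 90% subset:
cor(centrality_auto(fullNet)$node.centrality$Strength,
    centrality_auto(subNet)$node.centrality$Strength)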


Tutorial

We have implemented the non-parametric, parametric, and subset-dropping bootstrap in a

general framework for network analysis in the freely available R package bootnet. Specifically, this

can be used to bootstrap the edge weights, and also obtain information regarding the stability of

various centrality estimates. With bootnet, users can perform non-parametric and node-dropping

bootstrap for any R routine that estimates a PMRF, and parametric bootstrap can be performed for the

Ising model and the GGM. To exemplify the usage of bootnet in both estimating and investigating

network structures, we use a dataset of 359 women enrolled in community-based substance abuse

treatment programs across the United States (study title: Women's Treatment for Trauma and

Substance Use Disorders; study number: NIDA-CTN-0015; URL:

https://datashare.nida.nih.gov/protocol/nida-ctn-0015). All participants met the criteria for either

posttraumatic stress disorder (PTSD) or sub-threshold PTSD, according to the DSM-IV-TR (APA,

2000). Details of the sample, such as inclusion and exclusion criteria as well as demographic

variables, can be found elsewhere (Hien et al., 2009). We estimate the networks using the 17 PTSD

symptoms from the PTSD Symptom Scale-Self Report (PSS-SR; Foa, Riggs, Dancu, & Rothbaum, 1993). Participants rated the frequency of

endorsing these symptoms on a scale ranging from 0 (not at all) to 3 (at least 4 or 5 times a week).

In a first step, we will show how to estimate the PTSD network. We then continue to

introduce the estimation of centrality indices. Finally, we provide a step-by-step tutorial to perform

the stability analyses introduced above; this will provide users with the possibility to examine the

stability of the edge weights and the stability of the centrality estimates.

Estimating the networks

Following the steps in Appendix A, the data can be loaded into R in a data frame called

FreqBaseline, which contains the frequency ratings at the baseline measurement point. Since the

PTSD symptoms are ordinal, we need to compute a polychoric correlation matrix to base our

networks on. We do so using the cor_auto function from the qgraph package, which automatically


detects ordinal variables and utilizes the R-package lavaan (Rosseel, 2012) to compute polychoric

(or, if needed, polyserial and Pearson) correlations:

library("qgraph")

corFreq <- cor_auto(FreqBaseline)

Next, we compute a GGM based on the polychoric correlation matrix among symptoms obtained by

cor_auto. As described above, we can do so either by using the LASSO regularization that sets

small edges to exactly zero, or without this regularization technique. To estimate the model without

regularization, we can use the qgraph function together with the option graph = "pcor". Since

edge weights in this network are equal to partial correlation coefficients, we will refer to this network

as the partial correlation network.

graph_pcor <- qgraph(corFreq, graph = "pcor")

If we decide to compute a LASSO-regularized GGM instead, we can use the following code;

we will term this network the glasso network.

graph_glasso <- qgraph(corFreq, graph = "glasso", sampleSize =

nrow(FreqBaseline))

We can plot both networks using the same layout as follows:

Layout <- averageLayout(graph_pcor, graph_glasso)

layout(t(1:2))

qgraph(graph_pcor, layout = Layout, labels = TRUE, title = "Partial correlation network")

qgraph(graph_glasso, layout = Layout, labels = TRUE, title = "glasso network")


The averageLayout function averages the node placement (Fruchterman & Reingold, 1991)

obtained from the different estimation methods and places the nodes in the same way in both graphs, enabling us to better compare the networks with each other. Figure 2 shows these two estimated

networks; the legend of the corresponding nodes can be found in Table 1. When we compare the

networks, we can see that the main connections in both networks are the same, that the partial

correlation network contains negative edges, and that the glasso network is overall much sparser.

Whereas the partial correlation network features all 136 edges, the glasso network only has 78 edges

due to the regularization; other edges are set to exactly 0.

In both networks, especially strong connections emerge among symptoms 3 (being jumpy)

and 4 (being alert), 5 (cut off from people) and 11 (interest loss), and 16 (upset when reminded of the

trauma) and 17 (upsetting thoughts/images). Other connections are very small or absent, for instance

between 7 (irritability) and 15 (reliving the trauma); this implies that these symptoms are statistically

independent when conditioning on all other symptoms.

Figure 2. The unregularized partial correlation network (left) and the regularized glasso network (right) of 17 PTSD symptoms (n = 359).


Estimating node centrality

Table 1. Node IDs and corresponding symptom names of the 17 PTSD symptoms.

ID  Variable
1   Avoid reminders of the trauma
2   Bad dreams about the trauma
3   Being jumpy or easily startled
4   Being over alert
5   Distant or cut off from people
6   Feeling emotionally numb
7   Feeling irritable
8   Feeling plans won’t come true
9   Having trouble concentrating
10  Having trouble sleeping
11  Less interest in activities
12  Not able to remember
13  Not thinking about trauma
14  Physical reactions
15  Reliving the trauma
16  Upset when reminded of trauma
17  Upsetting thoughts or images

To investigate centrality indices in both graphs, we can use the centralityPlot function

from the qgraph package:

centralityPlot(list(pcor = graph_pcor, glasso = graph_glasso))

As we can see in Figure 3, nodes differ quite substantially in their centrality estimates, and centrality

indices are rather similar across the two estimated networks. The most central node in the partial

correlation network is node 3 (“Being jumpy or easily startled”) in all three measures, whereas it is

only the node with the highest closeness in the glasso network; node 17 (“Upsetting thoughts or


images”) has the highest strength and betweenness in the glasso network. However, without knowing

the stability of the centrality estimates, we cannot conclude whether the differences of centrality

estimates within a network––or the differences across networks––are meaningful or not.

Figure 3: Centrality indices of partial correlation and glasso networks of 17 PTSD symptoms.

Stability analysis

To investigate the stability of edge weights and centrality estimates, we have to load the

bootnet package:

library("bootnet")

The main function of bootnet is the function bootnet, which performs the bootstrapping of

networks using various estimation methods. Bootnet comes with convenient default settings for the

most commonly used network psychometric methods. The default argument of bootnet can be set to

"pcor" or "EBICglasso" to estimate a GGM using cor_auto followed by nonregularized

(partial correlation network) or regularized (glasso network) estimation of the GGM exactly as shown



above. Next, the type argument can be used to specify the type of bootstrap: "nonparametric"

for the non-parametric bootstrap, "parametric" for the parametric bootstrap, "node" for the

node-dropping bootstrap, and "person" for the person dropping bootstrap.

Edge weight bootstrap

In our case, the PTSD symptoms are ordinal, and as explained above we can only use the non-

parametric bootstrap. The following code bootstraps the partial correlation network:

nonParametricBoot_pcor <- bootnet(FreqBaseline, default = "pcor",

type = "nonparametric")

Similarly, the glasso network can be bootstrapped as follows:

nonParametricBoot_glasso <- bootnet(FreqBaseline, default =

"EBICglasso", type = "nonparametric")

The print method of these objects gives an overview of characteristics of the sample network (e.g.,

the number of estimated edges) and tips for further investigation, such as how to plot the estimated

sample network or any of the bootstrapped networks. The summary method can be used to create a

summary table of certain statistics containing quantiles of the bootstraps. In this paper, we will only

demonstrate the plot method, which can be used to create a number of plots showing the confidence

intervals of network statistics. To do so, we can obtain the estimated 95% CIs for estimated edge

parameters for both the partial correlation network and the glasso network as follows:

plot(nonParametricBoot_pcor, labels = FALSE, order = "sample")

plot(nonParametricBoot_glasso, labels = FALSE, order = "sample")

Figure 4 shows the resulting plots and reveals sizable CIs around the estimated edge parameters. Of

note, the edges 16 (“Upset when reminded of trauma”) — 17 (”Upsetting thoughts or images”), 3

(“Being jumpy or easily startled”) — 4 (“Being over alert”) and 5 (“Distant or cut off from people”)

— 11 (“Less interest in activities”) are reliably the three strongest edges in both networks. However,


the large confidence intervals imply that interpreting the order of most edges in the network should be

done with care. Interestingly, the partial correlation network shows confidence intervals of equal size

for most edges, whereas confidence intervals in the glasso network shrink to be much smaller for the

edges that are set to zero; this means that the glasso network more reliably estimates the edge parameters that are at or near zero.

Figure 4: Bootstrapped confidence intervals of estimated edge weights for the partial correlation network (left) and the glasso network (right) of 17 PTSD symptoms. The red line indicates the sample values and the gray area the 95% CIs.

Subset bootstrap

In the next step, we want to drop nodes from the network to examine whether this impacts on

the centrality estimates. We can perform the node-dropping bootstrap on the partial correlation and

glasso networks as follows:

nodeDropBoot_pcor <- bootnet(FreqBaseline, default = "pcor", type =

"node")

nodeDropBoot_glasso <- bootnet(FreqBaseline, default = "EBICglasso",

type = "node")


The plot method can again be used to investigate the results; by default, the average correlation

between centrality indices from the different sampling levels and the original dataset is plotted:

plot(nodeDropBoot_pcor)

plot(nodeDropBoot_glasso)

Figure 5 shows that the correlation drops steadily in the partial correlation network, whereas in the glasso network the correlation for the node strength statistic, in particular, remains stable down to the 50% sampling level. This means that even when half of the nodes are dropped, the order of the node

strength centrality remains fairly stable, implying a robust estimation of node strength.

Figure 5: Average correlations between centrality indices of networks sampled with nodes dropped

and the original sample using partial correlation networks (left) and glasso networks (right).

Next, we can investigate the centrality of individual nodes when dropping nodes. We only examine

node strength centrality in the following plots, but betweenness and closeness can also be

investigated:

plot(nodeDropBoot_pcor, "strength", perNode = TRUE, legend = FALSE)

plot(nodeDropBoot_glasso, "strength", perNode = TRUE, legend =

FALSE)


Figure 6 shows that the order of node strength estimates is much harder to interpret in the partial

correlation network than in the glasso network (the scale on the y-axis differs because glasso estimates edge weights that are closer to, or exactly equal to, zero compared to the partial correlation network).

Figure 6: Node strength estimates of individual nodes (colored lines) of the partial correlation

network (left) and the glasso network (right) after node-dropping.

We can repeat the same steps for person-dropping by using type = "person" in bootnet:

personDropBoot_pcor <- bootnet(FreqBaseline, default = "pcor", type

= "person")

personDropBoot_glasso <- bootnet(FreqBaseline, default =

"EBICglasso", type = "person")

The same code as above creates the plots for person-dropping, which are shown in Figures 7 and 8.

These plots, again, show that only strength in the glasso network can meaningfully be interpreted.

A comparison of Figures 6 and 8 reveals that centrality goes down when subsampling nodes,

but up when subsampling people. Since centrality quantifies how much influence a node has, it is

natural to see that centrality decreases if a node has fewer other nodes it can potentially influence. In contrast,



centrality increases when dropping participants because if the true network is sparse, taking smaller

samples leads to larger sampling variation around zero, leading to on average higher absolute edge

weights (and thus higher centrality measures).
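This mechanism can be illustrated with a short simulation: for two variables that are unrelated in the population (a true edge weight of zero), the average absolute sample correlation is larger in smaller samples.

set.seed(1)
mean(replicate(1000, abs(cor(rnorm(60), rnorm(60)))))    # smaller n: larger average absolute correlation
mean(replicate(1000, abs(cor(rnorm(600), rnorm(600)))))  # larger n: average absolute correlation closer to zero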

Figure 7: Average correlations between centrality indices of networks sampled with persons dropped

and the original sample using partial correlation networks (left) and glasso networks (right).

Figure 8: Node strength estimates of individual nodes (colored lines) of the partial correlation

network (left) and the glasso network (right) after person-dropping.



Flexible specification of bootnet

In addition to the default sets for glasso and partial correlation networks, bootnet offers two more default sets to similarly estimate the Ising model. The default sets "IsingLL" and "IsingFit" can be used for, respectively, nonregularized estimation (via IsingSampler by using a formulation of the Ising model as a loglinear model; Agresti, 1990; Epskamp et al., 2016) and

regularized estimation (van Borkulo et al., 2014) of the Ising model. Furthermore, the user can

manually set the estimation method of the PMRF in bootnet using a set of arguments to the bootnet

function. First, the method of preprocessing the data must be defined via the prepFun argument,

which must be assigned a function that takes a dataset as input and returns the viable input for the

network estimator. The argument prepArgs can be assigned a list of arguments passed to the prepFun

function. Data preprocessing typically means correlating the data for the GGM or binarizing it for the

Ising model (to this end bootnet provides a binarize function). Next, we estimate the network. To

do so, we assign the estFun argument, a function that takes whatever the output of prepFun was

and estimates a network. The estArgs argument can be used to assign a list of additional arguments

to the function used in estFun. Finally, we need to extract the weights matrix and intercepts.

This is done by assigning functions to the graphFun and intFun arguments, respectively. An example

of how these commands work together to run the same bootstrap of the glasso network as is done

above is shown below:

bootnet(FreqBaseline,
        prepFun = cor_auto,
        prepArgs = list(verbose = FALSE),
        estFun = qgraph::EBICglasso,
        estArgs = list(n = nrow(FreqBaseline)),
        graphFun = identity,
        intFun = function(x) NULL)  # a GGM has no intercepts to extract


Conclusion

In this paper, we have summarized the state-of-the-art in psychometric network modeling, and

have shown how network models can be estimated, and discussed inference methods that

can be used after these models are obtained. Furthermore, we have described, for the first time, how

the stability of these inference methods can be assessed using non-parametric, parametric and subset

bootstrap methods. To aid the researcher in this endeavor, we have developed the freely available R

package bootnet. It is of note that, while we demonstrate the functionality of bootnet in this tutorial

using a GGM network, the package can be used for any estimation technique in R that estimates a

PMRF (such as the Ising model).

The stability analysis of a 17-node symptom network of 359 women with PTSD or sub-

threshold PTSD indicated that the glasso network more reliably estimated the edge weights than the

partial correlation network. Centrality stability analysis, on the other hand, revealed that only node

strength based on LASSO regularized GGM networks could reliably be interpreted. Furthermore,

subset bootstrapping showed that the centrality indices of the glasso network were much more

consistent after dropping nodes or persons than the centrality indices of the partial correlation

network. These results are in line with expectations, as the LASSO is designed to yield more stable

inferences in complex models with limited data.

The results highlight the importance of stability analysis when estimating psychological

networks. Figure 3 seemed to indicate that both the non-regularized and the regularized networks are

similar in terms of centrality indices. However, had we stopped analyzing the data at this point, then

we would have erroneously concluded that results of both network estimation methods can be used

for network inference, and that the centrality indices are reliably estimated. Further analyses, however,

revealed that, in this dataset, only the ordering of the node strength statistic in the glasso network can

reliably be interpreted, and that interpretations of the order of symptoms based on closeness or

betweenness centrality are heavily susceptible to sampling error.


Future directions

We only focused on stability analysis of cross-sectional network models. Assessing variability

in longitudinal and multi-level models is much more complicated and beyond the scope of the current

paper; it is also not implemented in bootnet as of yet. We refer the reader to Bringmann et al. (2015)

for a demonstration on how confidence intervals can be obtained in a longitudinal multi-level setting.

We also want to point out that the results obtained here, for instance that the glasso method provides a

more robust estimation of the network structure and the centrality estimates than the partial

correlation procedure, may be idiosyncratic to the particular data used. Simulation studies will be

required to investigate how these and similar procedures perform in terms of stability when

parameters such as sample size and the number of nodes vary. Future development of bootnet will be aimed at implementing functionality for a broader range of network models, and we encourage readers to

submit any such ideas or feedback to the Github Repository

(www.github.com/sachaepskamp/bootnet).

While working on this project, two new research questions emerged: is it possible to form an

unbiased estimator for centrality indices in partial correlation networks, and consequently, how should

true 95% confidence intervals around centrality indices be constructed? As our example highlighted,

centrality indices can be highly unstable due to sampling variability, and the estimated sampling

distribution of centrality indices can be severely biased. At present, we have no definite answer to

these pressing questions. Given the current emergence of network modeling in psychology,

development of adequate CIs for centrality indices should thus have high priority.

Network stability has been a blind spot in psychological network analysis, and the authors are

aware of only one prior paper that has examined network stability (Fried et al., 2016). Remediating

this blind spot is of utmost importance if network analysis is to be added as a full-fledged

methodology to the toolbox of the psychological researcher.


References

Agresti, A. (1990). Categorical data analysis. New York, NY: John Wiley & Sons.

APA. (2000). Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, text revision. Washington, DC: American Psychiatric Association.

Barzel, B., & Biham, O. (2009). Quantifying the connectivity of a network: The network correlation function method. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 80(4). http://doi.org/10.1103/PhysRevE.80.046104

Bollen, K., & Stine, R. (1993). Bootstrapping goodness-of-fit measures in structural equation models. In K. Bollen & J. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.

Borsboom, D., & Cramer, A. O. J. (2013). Network Analysis: An Integrative Approach to the Structure of Psychopathology. Annual Review of Clinical Psychology, 9(1), 91–121. http://doi.org/10.1146/annurev-clinpsy-050212-185608

Bringmann, L. F., Lemmens, L. H. J. M., Huibers, M. J. H., Borsboom, D., & Tuerlinckx, F. (2015). Revealing the dynamic network structure of the Beck Depression Inventory-II. Psychological Medicine, 45(4), 747–57. http://doi.org/10.1017/S0033291714001809

Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P., Peeters, F., … Tuerlinckx, F. (2013). A network approach to psychopathology: new insights into clinical longitudinal data. PloS One, 8(4), e60188.

Chernick, M. R. (2011). Bootstrap methods: A guide for practitioners and researchers (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., & Cramer, A. O. J. (2015). State of the aRt personality research: A tutorial on network analysis of personality data in R. Journal of Research in Personality, 54, 13–29.

Costantini, G., Richetin, J., Borsboom, D., Fried, E. I., Rhemtulla, M., & Perugini, M. (2015). Development of indirect measures of conscientiousness: Combining a facets approach and network analysis. European Journal of Personality, 29, 548–567. http://doi.org/10.1002/per.2014

Costenbader, E., & Valente, T. W. (2003). The stability of centrality measures when networks are sampled. Social Networks, 25, 283–307. http://doi.org/10.1016/S0378-8733(03)00012-1

Cramer, A. O. J., Waldorp, L. J., van der Maas, H. L. J., & Borsboom, D. (2010). Comorbidity: a network perspective. Behavioral and Brain Sciences, 33(2-3), 137–150.

Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.

Epskamp, S. (2015a). bootnet: Bootstrap methods for various network estimation routines.

Epskamp, S. (2015b). IsingSampler: Sampling methods and distribution functions for the Ising model.

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1–18. Retrieved from http://www.jstatsoft.org/v48/i04

Epskamp, S., Maris, G., Waldorp, L. J., & Borsboom, D. (2016). Network Psychometrics. In P. Irwing, D. Hughes, & T. Booth (Eds.), Handbook of Psychometrics. New York: Wiley.

Foa, E. B., Riggs, D. S., Dancu, C. V, & Rothbaum, B. O. (1993). Reliability and validity of a brief instrument for assessing post-traumatic stress disorder. Journal of Traumatic Stress, 6(4), 459–473.


Foygel Barber, R., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567–607.

Foygel, R., & Drton, M. (2010). Extended Bayesian Information Criteria for Gaussian Graphical Models. Arxiv Preprint, 1–14.

Fried, E. I. (2015). Problematic assumptions have slowed down depression research: why symptoms, not syndromes are the way forward. Frontiers in Psychology, 6(306), 1–11. http://doi.org/10.3389/fpsyg.2015.00309

Fried, E. I., Bockting, C., Arjadi, R., Borsboom, D., Amshoff, M., Cramer, A. O. J., … Stroebe, M. (2015). From loss to loneliness: The relationship between bereavement and depressive symptoms. Journal of Abnormal Psychology, 124(2), 256–265.

Fried, E. I., Epskamp, S., Nesse, R. M., Tuerlinckx, F., & Borsboom, D. (2016). What are “good” depression symptoms? Comparing the centrality of DSM and non-DSM symptoms of depression in a network analysis. Journal of Affective Disorders, 189, 314–320.

Friedman, J. H., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.

Friedman, J., Hastie, T., & Tibshirani, R. (2014). glasso: Graphical lasso-estimation of Gaussian graphical models.

Fruchterman, T., & Reingold, E. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), 1129–64.

Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T. (2015). mvtnorm: Multivariate Normal and t Distributions.

Goekoop, R., & Goekoop, J. G. (2014). A Network View on Psychiatric Disorders: Network Clusters of Symptoms as Elementary Syndromes of Psychopathology. PloS One, 9(11), e112734. http://doi.org/10.1371/journal.pone.0112734

Haslbeck, J. M. B., & Waldorp, L. J. (2015). Structure estimation for mixed graphical models in high-dimensional data.

Hien, D. A., Wells, E. A., Jiang, H., Suarez-Morales, L., Campbell, A. N. C., Cohen, L. R., … Nunes, E. V. (2009). Multi-site randomized trial of behavioral interventions for women with co-occurring PTSD and substance use disorders. Journal of Consulting and Clinical Psychology, 77(4), 607–619. http://doi.org/10.1037/a0016227

Kalisch, M., Martin, M., Colombo, D., Hauser, A., Maathuis, M. H., & Bühlmann, P. (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software, 47(11), 1–26.

Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). Springer.

Lauritzen, S. L. (1996). Graphical models. Oxford, UK: Clarendon Press.

Liu, H., Lafferty, J. D., & Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10, 2295–2328.

MacCallum, R. C., & Browne, M. W. (1993). The use of causal indicators in covariance structure models: some practical issues. Psychological Bulletin, 114(3), 533–41.

McNally, R. J., Robinaugh, D. J., Wu, G. W. Y., Wang, L., Deserno, M. K., & Borsboom, D. (2014). Mental disorders as causal systems: A network approach to posttraumatic stress disorder. Clinical Psychological Science, 1–14. http://doi.org/10.1177/2167702614553230

Newman, M. (2010). Networks: an introduction. Oxford University Press.

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443–460. http://doi.org/10.1007/BF02296207

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. http://doi.org/10.1126/science.aac4716

Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251.

Pearl, J. (2000). Causality: models, reasoning and inference (Vol. 29). Cambridge, UK: Cambridge Univ Press.

R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36.

Schmittmann, V. D., Cramer, A. O. J., Waldorp, L. J., Epskamp, S., Kievit, R. A., & Borsboom, D. (2013). Deconstructing the construct: A network perspective on psychological phenomena. New Ideas in Psychology, 31(1), 43–53.

Scutari, M. (2010). Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software, 35(3), 1–22.

van Borkulo, C. D., Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., & Waldorp, L. J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4, 5918.

van Borkulo, C. D., & Epskamp, S. (2014). IsingFit.

Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK: Cambridge University Press.

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684), 440–2. http://doi.org/10.1038/30918


Appendix A: Loading the PTSD dataset

To download the dataset, go to https://datashare.nida.nih.gov/protocol/nida-ctn-0015 and click on

"CTN-0015 Data Files". The relevant data file is called “qs.csv”, which can be loaded into R by using

the default read.csv function:

Data <- read.csv("qs.csv", stringsAsFactors = FALSE)

This loads the data in long format, which contains a column with subject IDs, a column with the names of the administered items, and a third column containing the item responses. For network analysis, we need the data to be in wide format. Furthermore, we need to specify that the response "NOT ANSWERED" indicates a missing value and that the other responses are ordinal. Finally, we need to extract the relevant subset: the baseline measurements of the PTSD symptom frequency scores. To do this, we can use the dplyr and tidyr R packages as follows:

# Load packages:
library("dplyr")
library("tidyr")

# Frequency at baseline:
FreqBaseline <- Data %>%
  # Keep only the baseline assessments of the PSS-SR frequency items:
  filter(EPOCH == "BASELINE", grepl("^PSSR\\d+A$", QSTESTCD)) %>%
  # Retain the subject ID, item name, and item response columns:
  select(USUBJID, QSTEST, QSORRES) %>%
  # Reshape from long to wide format (one column per item):
  spread(QSTEST, QSORRES) %>%
  # Drop the subject ID column:
  select(-USUBJID) %>%
  # Recode "NOT ANSWERED" responses as missing:
  mutate_each(funs(replace(., . == "NOT ANSWERED", NA))) %>%
  # Encode the remaining responses as ordered categories:
  mutate_each(funs(ordered(., c("NOT AT ALL", "ONCE A WEEK",
    "2-4 TIMES PER WEEK/HALF THE TIME",
    "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")))) %>%
  # Use numeric column names:
  'names<-'(seq_len(ncol(.)))
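As a quick sanity check (this step is not part of the original appendix; the expected layout is simply one row per participant and one ordered-factor column per PSS-SR item), the resulting data frame can be inspected before estimating any networks:

# Optional check of the reshaped data:
dim(FreqBaseline)          # number of participants by number of items
str(FreqBaseline[, 1:3])   # first few columns should be ordered factors
sum(is.na(FreqBaseline))   # count of "NOT ANSWERED" responses recoded as NA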


Supplementary materials: The bootstrap and centrality indices

The bootstrap is not a foolproof methodology; there are, in fact, many cases in which it fails (Chernick, 2011). One such case is when the parameter of interest

has an expected value at the boundary of the parameter space. In bootstrapping edge weights (as

described in the main manuscript above), such as partial correlation coefficients, this is not a problem

because edge weights at the boundary of 1 or -1 rarely occur. However, when continuing network

inference, we do not use the edge weights directly but instead the edge strength: the absolute value of

the edge weight. Here, a boundary problem is present, as we expect many edges to have an expected

value of exactly zero (implying the absence of an edge). To exemplify, suppose we find an edge

weight of 0.01 in our original sample, with a standard error of 0.1. When bootstrapping, roughly 95% of the bootstrapped values will fall in the interval from -0.19 to 0.21, and the absolute values of these bootstrapped edge weights will almost always exceed the originally sampled edge strength of 0.01.
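A minimal numeric sketch of this boundary problem is given below, assuming for illustration that the bootstrap distribution of the edge weight is approximately normal with mean 0.01 and standard deviation 0.1 (the hypothetical values used above, not estimates from the data):

# Approximate the bootstrap distribution of the edge weight by a normal distribution:
set.seed(1)
boot_edge <- rnorm(1000, mean = 0.01, sd = 0.1)
quantile(boot_edge, c(0.025, 0.975))  # roughly -0.19 to 0.21
mean(abs(boot_edge) > 0.01)           # the large majority of absolute values exceed 0.01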

We discovered a two-sided bias when many edge weights are expected to equal zero (sparse

network structure). First, absolute edge weights, and as a result centrality indices, are highly biased

estimators of the true parameter. This makes sense: if the true edge weight is zero, any estimated

absolute edge weight based on a sample will be higher than zero. Second, bootstrapped absolute edge

weights are biased as well when the expected value is zero, because the bootstrap sampling

distribution is likely to be centered near zero leading to many higher absolute edge weights. To

exemplify, we simulated 10,000 pairs of independent variables (true correlation of zero) and

investigated the sample absolute correlation as well as the absolute correlation of a single

nonparametric and parametric bootstrap sample. The absolute sample correlation was on average 0.081 higher than the true correlation of 0; the nonparametric bootstrap was on average 0.032 higher, and the parametric bootstrap on average 0.033 higher, than the sampled absolute correlation.
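The following is a reduced sketch of this simulation (fewer replications, only a nonparametric bootstrap, and an assumed sample size of 100 per replication, as the sample size is not stated here), so the resulting numbers will only roughly match those reported above:

set.seed(1)
n <- 100       # assumed sample size per replication
reps <- 1000   # fewer replications than the 10,000 used above
res <- replicate(reps, {
  x <- rnorm(n)
  y <- rnorm(n)                       # true correlation between x and y is zero
  sample_abs_cor <- abs(cor(x, y))    # absolute sample correlation
  idx <- sample(n, replace = TRUE)    # one nonparametric bootstrap sample
  boot_abs_cor <- abs(cor(x[idx], y[idx]))
  c(sample = sample_abs_cor, boot = boot_abs_cor)
})
mean(res["sample", ])                  # upward bias of the absolute sample correlation
mean(res["boot", ] - res["sample", ])  # additional upward bias of the bootstrapped value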

Node strength is a sum of these biased absolute edge weights, and closeness is the inverse of a more complicated sum based on them; accumulating these biases leads to highly inaccurate estimates of

the true parameter. Indeed, we found that it is not uncommon for quantiles of the bootstrapped

strength and closeness to not contain the true parameter at all, nor even the parameter value based on


the original sample. This bias appeared mostly when using non-regularized estimation methods and low sample sizes. Thus, constructing true CIs for centrality indices, intervals that contain the true parameter 95% of the time, is a challenging research question: it requires correcting both for the bias in the sample estimate (which depends on the true network structure and the sample size) and for the bias in the bootstrap samples. The current paper does not aim to solve this highly technical question, as we are not so much interested in estimating a region around the true parameter value as in showing the sampling variability of these centrality indices, even if they are biased.

As explained above, the bootstrap cannot be used to form 95% CIs on centrality indices.

However, we can use the bootstrap to show the reader the importance of taking centrality instability

into account. If we assume that the estimated partial correlation matrix is the true network structure,

then the parametric bootstrap can be used to show the sampling distribution of such a network model:

ParametricBoot_pcor <- bootnet(FreqBaseline, 100, default = "pcor",
                               type = "parametric")

The 95% quantile regions can be plotted using the following code:

plot(ParametricBoot_pcor, statistics = c("strength", "closeness",
     "betweenness"), CIstyle = "quantiles")

which will return a warning that the intervals cannot be interpreted as 95% CIs. The argument CIstyle = "quantiles" makes sure quantiles are plotted. By default, bootnet plots intervals based on the sample centrality plus and minus two times the standard deviation of the bootstraps; these default intervals are somewhat more interpretable, even though they are still not true 95% CIs.
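For comparison, the default intervals can be obtained simply by omitting the CIstyle argument (this call is a straightforward variation of the one above, not code from the original supplement):

# Default plot: sample centrality plus and minus two bootstrap standard deviations.
plot(ParametricBoot_pcor, statistics = c("strength", "closeness",
     "betweenness"))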

Figure S1 shows the resulting plot. If the partial correlation network used is the true model,

then the red line shows the true centrality indices and the gray area the 95% quantile region of the

sampling distribution. The placement of the red line indicates the aforementioned bias in both

bootstrapping and sampling of centrality indices. The gray area is large for all indices, indicating that the order of centrality indices obtained in any single sample would be very hard to interpret. In the sample on which these plots are based, none of the 1,000 bootstrap samples captured the true order of any of the centrality indices, and the probability that the node with the truly strongest centrality was correctly detected was only 43%, 36%, and 33% for strength, closeness, and betweenness, respectively.


These plots only show the sampling distribution under the assumption that the network used is the true model, which it is not. Therefore, these plots should be interpreted with care. However, they do highlight the importance of stability analysis of centrality indices, as the sampling distribution under the true model is likely not much narrower than the ones shown in these plots.

Figure S1: Parametric bootstrap on centrality indices based on the partial correlation network.

Supplementary references:

Chernick, M. R. (2011). Bootstrap methods: A guide for practitioners and researchers (2nd ed.). Hoboken, NJ: John Wiley & Sons.