
AURIN BPNet web interface user guide

June 2013

1 Introduction

BPNet is a program for the simulation and estimation of exponential random graph (or p∗) models on bipartite networks. This document is the user guide for the web interface developed for BPNet usage within the AURIN portal; documentation for the BPNet program itself can be found along with the original BPNet Windows application at http://sna.unimelb.edu.au/PNet. A description of the concepts and algorithms implemented by BPNet is given in Wang et al. (2009b) and Wang (2013); a more general introduction to exponential random graph models for social networks can be found in Robins et al. (2007) and Lusher et al. (2013). Other recommended reading includes Wang et al. (2009a), Harrigan (2007) and Pattison et al. (2011); much of the content of this guide is derived from these sources.

A bipartite network is one that has two types of nodes, with links only between nodes of different types. Such a network can be used to represent the association between two sets of nodes, where each set is a different entity, for example between directors and companies. In this case a link between two nodes represents that a director (one node) is director of a company (the other node). More generally, such an “affiliation network” represents relationships between people and groups. We might also use a bipartite network to represent other kinds of two-level relationships, such as between people and places.
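As a minimal sketch of this structure (not the format used by BPNet or AURIN, which this guide does not specify), a bipartite network can be held as a biadjacency matrix with one row per node of the first type and one column per node of the second type. The director and company names below are invented for illustration.

```python
# Hypothetical toy data: 3 directors (one node type) and 2 companies (the other).
directors = ["d1", "d2", "d3"]
companies = ["c1", "c2"]

# biadjacency[i][j] == 1 means director i is a director of company j.
# There are no director-director or company-company entries at all,
# which is exactly what makes the network bipartite.
biadjacency = [
    [1, 0],
    [1, 1],
    [0, 1],
]


def ties(matrix, row_names, col_names):
    """List the (row node, column node) pairs joined by a link."""
    return [
        (row_names[i], col_names[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


edge_list = ties(biadjacency, directors, companies)
print(edge_list)
```

The same matrix layout is assumed in the other sketches in this guide: rows are “A” nodes and columns are “P” nodes.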

By convention in social network analysis we refer to the two types of nodes as “A” and “P”, and in drawing networks “A” nodes are represented as blue squares and “P” nodes as red circles. As a running example in this guide we will use a collaboration network with 51 researchers and 43 projects; a visualization is shown in Figure 1. A link in this network represents that a researcher is a member of a project.

Given such a social network, why might we be interested in building a statistical model (and specifically an exponential random graph model [ERGM]) of it? More detailed descriptions and arguments are given in Robins et al. (2007) and Lusher et al. (2013), but we can summarize by saying that an ERGM helps us to understand how and why social network ties (links) arise, respecting the assumption that social network ties are likely to be interdependent. Potentially competing processes such as homophily, reciprocity and transitivity can be included in a single principled statistical model, which then allows us to determine whether there are significantly more or fewer of these types of substructures in the network than would be expected purely by chance. When we include attributes of actors (nodes) in the network we can also use the model to determine whether an observed property of the network is due to endogenous (self-organizing) structural effects or to node-level effects (such as homophily).

The three main functions of BPNet are:

Simulation: simulating network distributions with specified model parameter values.

Estimation: estimating specified exponential random graph model (ERGM) parameters for a given network.


Figure 1: Collaboration bipartite network with 51 researchers (blue squares) and 43 projects (red circles). Figure generated by the BPNet web interface, which uses the visualization features of Data-Driven Documents (Bostock et al., 2011).

Goodness-of-fit: testing the goodness-of-fit (GoF) of a specified model to a given network with a given set of parameters.

These three functions are represented by three different tabs on the BPNet panel, as shown in Figure 2. A typical workflow in using BPNet is to load some network data, perform an estimation procedure to estimate some parameters, then perform goodness-of-fit and perhaps simulation with those parameters. To facilitate this process, the “Structural Parameters” panel is the same in all three tabs, and the values (shown in the “Estimate” column) of any parameters that are estimated remain valid. Hence an estimation can be performed in the Estimate tab, then after switching to the Goodness-of-fit tab, those parameters remain in the “Structural Parameters” panel and will be used in the goodness-of-fit procedure when the “Start” button is pressed on the Goodness-of-fit tab. Switching back to the Estimate tab and pressing “Start” again will start the estimation procedure at the current values of the parameters (as shown in the “Structural Parameters” panel). (For those familiar with the original BPNet Windows GUI, this is similar to using the “Update” button to start an estimation with previously estimated parameters. In the AURIN BPNet web interface, pressing “Start” will always use the current parameter values, so there is no need for an “Update” button. To start from a clean slate, the “Reset” button is provided to clear and deselect all parameters, or “Reset Estimates” to clear just the current estimate values.)

Note that the value in the “Estimate” column is editable; values for parameters can be manually entered here to start the estimation (or other process) at the specified value.

Descriptions of the parameters in the “Structural Parameters” panel are given in Appendix A.

We will use a small collaboration network as an example in this manual to demonstrate how the BPNet interface is used. An example of an actual application of an ERGM fitted to a bipartite network can be found in the examination of corporate interlocks (affiliations of directors to corporate boards of directors) in Harrigan and Bond (2013).


Figure 2: Initial appearance of the BPNet panel with no network selected, showing the three tabs for Estimate, Goodness-of-Fit, and Simulate. Initially the Estimate tab is at the front, no parameters are selected, and the options have their default values.


Figure 3: Visualization of the example collaboration network in the BPNet panel.

2 Selecting and visualizing network data

BPNet requires network data, which consists of at least the network specification (which nodes have links to which other nodes), and optionally attributes for the nodes. Network data from AURIN is connected to BPNet by “shopping” for it in the AURIN portal. Once the network data you want is selected, it can be visualized by pressing the “Visualize” button on the “Network data” panel, as shown in Figure 3, where our example collaboration network has been selected. This visualization is created with the D3 software (Bostock et al., 2011), and it is interactive: nodes can be dragged to rearrange the network layout, and the window re-sized to give the network diagram more (or less) space to fill. The “Export” button can be used to export the diagram as SVG, which can be edited with an SVG editor such as Inkscape¹ and converted to other graphics formats for inclusion in documents. This is how Figure 1 was created.

3 Estimation

Fitting an ERGM to network data, that is, estimation of the parameters of a model, is performed on the “Estimate” tab, as shown in Figure 2. We will use the “Exercise 14” network from Pattison et al. (2011) as an example to demonstrate the estimation procedure. It is often difficult to get the model to converge, and there are some “rules of thumb” on how best to achieve convergence. Some of these are described in Harrigan (2007) in the context of the original PNet, but with suitable modification they are often also applicable to BPNet. An ERGM can easily be fitted to the “Exercise 14” collaboration network using just the Markov parameters L (Edge / density) and Sp2 (two-star for class P); however, it is more difficult to get convergence with L and Sa2 (two-star for class A), or with L, Sa2 and Sp2.

¹ http://inkscape.org

Why are we selecting these particular parameters? First, as described in Lusher and Robins (2013), configurations are often nested inside each other. For instance, the L (edge or arc) configuration is the simplest, consisting simply of a single edge, and hence all other configurations contain this within them. Hence you would always either select the L parameter for estimation, or select “Fix graph density” in the “Estimation Options” panel to fix the value of the L parameter to the observed value in the network. The “Fix graph density” option will fix the number of ties in the network, so the L parameter for the density of the network is then not relevant and should not be selected.

On the assumption that we want to estimate the value for the two-star for class A parameter, this implies we must also estimate the edge (L) parameter (or fix its value with the “Fix graph density” option).

But what does the Sa2 parameter mean? In this context it is researcher two-stars (since class “A” nodes are researchers), meaning the propensity of a researcher linked to one project to then obtain a second tie to another project. In general, the number of two-stars is directly related to degree variance and centralization in the network (Lusher and Robins, 2013).
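To make the link between two-stars and degrees concrete, the following sketch counts edges and two-stars directly from node degrees: a node with degree d is the centre of d(d-1)/2 two-stars. It assumes a biadjacency matrix with “A” nodes as rows and “P” nodes as columns; the toy matrix and function names are invented, and the counts are only intended to correspond to what the L, Sa2 and Sp2 rows report, not to reproduce BPNet's own code.

```python
from math import comb


def degrees(matrix):
    """Return (degrees of A nodes, degrees of P nodes) for a biadjacency matrix."""
    a_deg = [sum(row) for row in matrix]
    p_deg = [sum(col) for col in zip(*matrix)]
    return a_deg, p_deg


def edge_count(matrix):
    """Number of ties in the network (the L configuration)."""
    return sum(sum(row) for row in matrix)


def two_stars(degs):
    """A node of degree d is the centre of C(d, 2) two-stars."""
    return sum(comb(d, 2) for d in degs)


# Hypothetical toy network: 3 researchers (rows) by 3 projects (columns).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]

a_deg, p_deg = degrees(toy)
edges = edge_count(toy)
sa2 = two_stars(a_deg)  # two-stars centred on researchers
sp2 = two_stars(p_deg)  # two-stars centred on projects
print(edges, sa2, sp2)
```

Because the count grows with the square of the degree, a few high-degree nodes contribute most of the two-stars, which is why the statistic tracks degree variance.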

First, select “Exercise 14” as the network data for BPNet, as shown in Figure 3. It is not necessary for the estimation procedure to view the graph with the “Visualize” button, although it is always useful to gain some intuition about the sort of structures that might be prevalent in the graph.

Select the parameters to be estimated by selecting the appropriate checkboxes in the “Structural Parameters” panel: L and Sp2 to start off with for this example. Initial estimates for the parameters can also be entered by clicking in the “Estimate” column for the parameter. By default they will initially be zero, which we will use for this example.

Some estimation configurations can be modified in the “Estimation Options” panel.

Fix graph density: If this is selected, then the L (Edge) parameter is fixed to the actual observed value in the network. Hence this option is mutually exclusive with estimating the L parameter: if the L parameter is selected in the “Structural Parameters” panel and the “Fix graph density” option is checked, then the L parameter will be automatically de-selected.

Number of Subphases: Increasing this from the default value of 5 may help convergence; however, the estimation process will then take longer.

Gaining Factor (a-value): A smaller value than the default of 0.01 can be used if the starting parameter values are close to convergence.

Multiplication Factor: Values between 2 and 100 are reasonable depending on the network size and the number of parameters in the model being estimated. Large networks with higher order models tend to need large multiplication factors; however, increasing this value means the estimation process will take longer. The default value is 10.

Number of Iterations in Phase 3: This is the number of samples used in a simulation phase to check convergence of the model by comparing graph statistics of simulated samples against the observed network. The default value is 500.

Max. Number of Estimation Runs: The estimation process will be repeated up to this many times, using the estimated values at the end of each run as the starting values for the next run. This is an automated equivalent of pressing the “Start” button multiple times: each estimation will start with the parameter values estimated from the previous estimation run. It will stop when the model has converged or the maximum number of estimation runs has been reached. The default value is 1, meaning only 1 estimation run is performed.

Figure 4: Selected parameters ready to estimate a model for the “Exercise 14” collaboration network.
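The behaviour of the “Max. Number of Estimation Runs” option can be pictured as a simple restart loop. The sketch below is only an illustration of that logic: `run_once` is a hypothetical stand-in for a single estimation run (BPNet's real estimator is not exposed this way), and `toy_run` is an invented function that merely moves the parameters towards fixed target values so that the loop has something to iterate on.

```python
def estimate_with_restarts(run_once, start, max_runs=1, threshold=0.1):
    """Repeat an estimation run, feeding each result in as the next start.

    run_once(params) must return (new_params, t_ratios); it is a hypothetical
    placeholder for one estimation run. Stops as soon as every t-ratio is
    below the threshold in absolute value, or after max_runs runs.
    Returns (params, t_ratios, runs_used, converged).
    """
    params = dict(start)
    t_ratios = {}
    for run in range(1, max_runs + 1):
        params, t_ratios = run_once(params)
        if all(abs(t) < threshold for t in t_ratios.values()):
            return params, t_ratios, run, True
    return params, t_ratios, max_runs, False


def toy_run(params):
    """Invented stand-in: moves each parameter 90% of the way to a target."""
    target = {"L": -2.0, "Sp2": 0.1}
    new = {k: v + 0.9 * (target[k] - v) for k, v in params.items()}
    t = {k: target[k] - new[k] for k in new}
    return new, t


start = {"L": 0.0, "Sp2": 0.0}
single = estimate_with_restarts(toy_run, start, max_runs=1)
repeated = estimate_with_restarts(toy_run, start, max_runs=5)
print(single[3], repeated[2], repeated[3])
```

With the default of one run the toy model is not yet converged, whereas allowing up to five runs stops early at the second run, mirroring the effect of pressing “Start” a second time by hand.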

For this example, leave all the options at their default values to start with. The “Reset” button under “Estimation Options” will reset the options to their default values. Note that it does not reset the structural parameter estimates; the “Reset” and “Reset Estimates” buttons under the “Structural Parameters” panel will do this (“Reset” will reset all the estimates and deselect all parameters; “Reset Estimates” will just reset the estimates).

Once the parameters to be estimated (L and Sp2) have been selected, the screen should look like Figure 4. Press the “Start” button to run the estimation. After a few seconds, the results should appear, looking something like Figure 5.

The parameters we selected now have estimated values, as well as counts (the number of times that structure appears in the actual network) and error statistics. An asterisk in the “Significant” column indicates that the absolute value of the estimate is greater than twice the value of the standard error. We are also interested in the convergence statistic (t-ratio), which we would like to have an absolute value of less than 0.1. Values larger than this mean that the model is not properly converged and the estimates are not suitable for interpreting the network structure. However, they may serve as a better starting point for the next estimation run.
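The two rules of thumb just described can be written down as a small check. This is a sketch of the rules as stated in this guide, not of BPNet's internal code; the function name and the example numbers are invented and are not taken from Figure 5.

```python
def assess(estimate, std_error, t_ratio):
    """Apply the guide's two rules to one row of the results table."""
    return {
        # Asterisk rule: |estimate| greater than twice the standard error.
        "significant": abs(estimate) > 2 * std_error,
        # Convergence rule: |t-ratio| less than 0.1.
        "converged": abs(t_ratio) < 0.1,
    }


# Hypothetical rows (estimate, standard error, t-ratio).
edge_row = assess(-2.31, 0.45, 0.04)
star_row = assess(0.05, 0.08, -0.12)
print(edge_row, star_row)
```

A row should only be interpreted when “converged” is true; significance of an unconverged estimate is not meaningful.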

In Figure 5 we can see that only the L parameter is marked as significant, and both t-ratios are greater than 0.1 in absolute value. Therefore we press “Start” again to continue the estimation procedure (using the current estimates as starting values). Now (as shown in Figure 6), both t-ratios are acceptable but still only one parameter is significant. This tells us that this network has a significant negative Edge effect, but the 2-P-Star effect is not significant.

Figure 5: Results of estimation of the example network.

Figure 6: Results of continuing estimation of the example network.

ADS: Maybe add code to the GUI to make the t-ratio value or the entire row green, or something similar, to indicate a converged parameter.

4 Goodness-of-fit

Switch to the “Goodness-of-Fit” tab. As shown in Figure 7, the parameters just estimated remain in the “Structural Parameters” table.

Some goodness-of-fit options can be modified in the “Goodness-of-Fit Options” panel:

Fix graph density: If this is selected, then the L (Edge) parameter is ignored and the density of the network is fixed to the actual observed value in the network.

Burn-in: This is the number of iterations to burn in the goodness-of-fit process, that is, the number of iterations of the Markov chain Monte Carlo (MCMC) process to run before any actual samples are taken. Provided the model is well converged, the default value of 100000 does not usually need to be adjusted, unless the network is quite large.


Figure 7: The Goodness-of-Fit tab, after switching from Estimate.

Number of Iterations: This is the number of iterations of the MCMC process to run to obtain graphs simulated from the parameters, which are then compared to the observed network data to compute goodness-of-fit statistics. The default value of 1000000 is sufficient for networks with fewer than around 100 nodes but may need to be increased to up to ten times that value for networks with hundreds of nodes.

When you press the “Start” button to start the GoF process, the structural parameters visible in the “Structural Parameters” panel will be used as the values for the parameter estimates to test the goodness-of-fit, leading to results similar to those shown in Figure 8.

The most important statistic for the goodness-of-fit is the t-statistic for each configuration (the value in the t-ratio column). If the parameter was selected in the model, then the t-ratio should be small in absolute value, preferably below 0.1, as it was in the estimation. This is definitely the case for the results shown in Figure 8.

However, that only shows the goodness-of-fit for the parameters we estimated. Normally, you would check the goodness-of-fit for all statistics, not just those in the model we chose. It is easy to select all parameters by selecting the checkbox to the left of the “Name” heading in the heading row of the “Structural Parameters” panel. After pressing “Start”, the results should look something like those shown in Figure 9.

ADS: Maybe should make the GoF tab select all parameters automatically by default?

Figure 8: Results of running the goodness-of-fit procedure with only the estimated parameters selected.

Figure 9: Results of running the goodness-of-fit procedure on all parameters.

Figure 10: The Simulation tab, after switching from Goodness-of-Fit.

For a model to be a good fit, we would typically want all the t-ratios to be less than 2 in absolute value (and, as mentioned previously, those that were estimated desirably less than 0.1). This is definitely not the case for this model, where some parameters (such as Sa2 and Sa3) have very high t-ratios and clearly do not fit the data well. If you wish to fit these statistics well, you need to consider further parameters in the model.
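The thresholds above can be illustrated with a short sketch. It assumes the usual form of the statistic, namely the observed count minus the mean of the simulated counts, divided by their standard deviation; this guide does not spell out BPNet's exact formula (for example, whether the sample or population standard deviation is used), so treat the definition as an assumption. All numbers are invented.

```python
from statistics import mean, stdev


def gof_t_ratio(observed, simulated):
    """(observed - mean of simulated) / sd of simulated; assumed definition."""
    return (observed - mean(simulated)) / stdev(simulated)


def acceptable(t_ratio, in_model):
    """Guide's thresholds: 0.1 for estimated parameters, 2 for all others."""
    limit = 0.1 if in_model else 2.0
    return abs(t_ratio) < limit


# Hypothetical counts of one configuration in three simulated graphs.
simulated_counts = [8, 10, 12]

t_close = gof_t_ratio(10.1, simulated_counts)  # observed near the mean
t_far = gof_t_ratio(15, simulated_counts)      # observed far from the mean
print(t_close, acceptable(t_close, in_model=True))
print(t_far, acceptable(t_far, in_model=False))
```

A configuration that was not estimated only has to fall within about two standard deviations of the simulated mean, whereas an estimated one should sit almost exactly on it.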

5 Simulation

The simulation process generates networks based on estimated (or supplied) parameters, to allow properties of the simulated graphs to be examined in order to make decisions about the properties of the model specification. In other words, it shows how different parameter values (or combinations of them) may produce graph distributions with different characteristics.

Switch to the “Simulate” tab. As shown in Figure 10, estimated parameters remain in the “Structural Parameters” table, and any open windows, such as the goodness-of-fit results, also remain visible. Note that the “Visualize” button is disabled, since no simulated graphs have been generated yet.

Usually, simulation is used with estimated parameters. However, note that (as in all functions), the parameter values in the “Estimate” column are editable, so simulation can be performed with any parameter values at all.

Some simulation options can be modified in the “Simulation Options” panel:


Fix graph density: If this is selected, then the L (Edge) parameter is ignored and the density is fixed to the actual observed value in the network.

Starting graph density: The initial density (between 0 and 1) in the simulation process.

Burn-in: This is the number of iterations to burn in the simulation process, that is, the number of iterations of the Markov chain Monte Carlo (MCMC) process to run before any actual samples are taken. The default value is 100000. Larger networks and networks with higher densities may require longer burn-in.

Number of Iterations: This is the number of iterations of the MCMC process to run to obtain graphs simulated from the parameters. The default value of 1000000 is sufficient for networks with fewer than around 100 nodes but may need to be increased to up to ten times that value for networks with hundreds of nodes.

Number of Samples to Return: This is the number of samples to take from the total number generated by the MCMC process. The default value of 1000 means that one in every 1000 of the generated graphs is used (since 1/1000 × 1000000 = 1000).

Get sample graphs: If checked, this will get the actual sampled graphs so that they can be visualized in the browser by pressing the “Visualize” button.

Display histograms: If checked, histograms of the selected parameters will be displayed, showing the frequency of counts of each parameter in the sampled graphs generated by the simulation process.

Display all statistics: If checked, a table of the counts of all parameters in all sampled graphs will be displayed.
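The arithmetic behind “Number of Iterations” and “Number of Samples to Return” can be checked with a few lines. The interval calculation follows directly from the option descriptions above; the assumption that samples are taken at evenly spaced iterations after burn-in is ours, and the function names are invented.

```python
def sampling_interval(iterations, samples):
    """Number of MCMC iterations between consecutive returned samples."""
    if samples <= 0 or iterations < samples:
        raise ValueError("need 0 < samples <= iterations")
    return iterations // samples


def sample_points(burn_in, iterations, samples):
    """Iteration indices at which samples would be taken (assumed even spacing)."""
    step = sampling_interval(iterations, samples)
    return [burn_in + step * k for k in range(1, samples + 1)]


# Default option values from this guide.
interval = sampling_interval(1000000, 1000)
points = sample_points(100000, 1000000, 1000)
print(interval, points[0], points[-1], len(points))
```

Increasing the number of iterations while keeping the number of samples fixed widens the interval, so consecutive samples are less correlated.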

Select the “Get sample graphs” checkbox and then press “Start” to run the simulation process. After a minute or so, the results should look something like Figure 11. (Note that if “Get sample graphs” is not checked then the “Visualize” button will remain disabled, as no sample graphs will be obtained.)

Histograms for each of the selected parameters will be displayed, showing the distribution of each structure in the graphs sampled from the simulation process. These can be used to help decide whether or not the process is generating graphs from the same family as the observed network: the observed count of a structure should be centered in the distribution generated by the simulation process. Histograms are simply a useful visual aid for this process: quantitative measures are provided by the goodness-of-fit function. The “Save histogram” button on each histogram window allows the histogram to be saved to your computer as a PNG file.

Another visual aid to determining how well the model parameters fit the observed network is to visualize graphs sampled from the simulation process. Each time the “Visualize” button is pressed, a visualization of a randomly chosen graph from the sampled graphs will be displayed. Figure 12 shows two such graphs. Note that they do not closely resemble the actual network (Figure 1). This is not surprising, as we have already determined that this model is not a particularly good fit to the network (Section 4).

Note that the goodness-of-fit function is essentially the same as the simulation function, but then followed by statistical comparisons between simulated graph samples and the actual network. The “Simulate” tab allows you to perform the same simulations as used in goodness-of-fit, but then examine properties of the simulated graphs in detail.

Figure 11: The Simulation tab, showing histograms after the simulation process has been run.

Figure 12: The Simulation tab, showing visualizations of some graphs sampled from those generated by the simulation process.

6 Fitting a social circuit model

In the previous sections we fitted a model with only Markov parameters (L and Sp2) and found that the model was not a good fit to the network. Now, using the same data, we will fit a model with social circuit parameters (see Appendix A) and see if the goodness-of-fit is improved.

Using the same network data as in Section 3, on the Estimate tab select the parameter L, and the three parameters K-Sa (alternating k-star for class A), K-Sp (alternating k-star for class P) and K-Ca (alternating A cycles). Keep all the starting values at the default value of 0 and the options at their default values. Then press “Start” to run the estimation. On the first run, the t-ratios might not be below 0.1. If they are not, keep pressing “Start” until they are (it should only take a few extra runs). The results should come back in a few seconds looking like Figure 13.

What does the K-Sa parameter mean? In the case of this example, it is the researcher k-star parameter (since class A nodes are researchers). This can be interpreted as the “researcher popularity” effect, whereby researchers with ties have an increased probability of receiving further ties. Similarly, K-Sp is the “project popularity” effect. In general, when this parameter is positive, it describes a situation akin to the “Matthew effect” or “preferential attachment”, whereby popular nodes become more popular. More technically, a positive parameter can be interpreted as centralization in the degree distribution.

The K-Ca (alternating A cycles) parameter refers in this example to alternating researcher cycles. These are sets of 4-cycles with more researchers than projects (see Appendix A). This parameter is a high-order generalization of the C4 (four-cycle) parameter. The latter parameter can be interpreted in this context as saying (if it has a positive value) that the probability of a researcher p joining a project with some other researcher q on it is higher if researcher p already shares another project with researcher q. In other words, researchers tend to share multiple projects. The more general K-Ca parameter extends this to more than one other researcher. Further details of these parameters can be found in Wang et al. (2009b) and Wang (2013).
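The four-cycle configuration that K-Ca generalizes is easy to count directly, which may help in reading the C4 row of the results. In a bipartite network a four-cycle is a pair of researchers together with a pair of projects they both belong to, so each pair of researchers sharing s projects contributes s(s-1)/2 four-cycles. The sketch below counts plain four-cycles only (not the alternating statistic, whose weighting is described in Appendix A and the cited papers); the toy matrix is invented.

```python
from itertools import combinations
from math import comb


def four_cycles(matrix):
    """Count four-cycles in a biadjacency matrix (rows: A nodes, columns: P nodes)."""
    total = 0
    for row_p, row_q in combinations(matrix, 2):
        # Number of P nodes (projects) that both A nodes (researchers) share.
        shared = sum(a * b for a, b in zip(row_p, row_q))
        # Every pair of shared projects closes one four-cycle.
        total += comb(shared, 2)
    return total


# Hypothetical toy network: 3 researchers by 3 projects.
toy = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

c4 = four_cycles(toy)
print(c4)
```

Here researchers 1 and 2 share two projects, as do researchers 2 and 3, giving two four-cycles; researchers 1 and 3 share only one project and so contribute none.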

Now switch to the Goodness-of-Fit tab, select all the parameters (by selecting the checkbox at the left of the heading row in the “Structural Parameters” panel) and press “Start” to run the goodness-of-fit procedure. After a few seconds, the results should appear looking something like Figure 14.

Note that we now have acceptable t-ratios for all parameters, indicating that this model is a good fit to the data. Also, notice that the Mahalanobis distance for this model is much smaller than for the first model (see Figure 9), indicating that it is a better fit to the network.
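For readers unfamiliar with the Mahalanobis distance, the sketch below shows one way such a summary can be computed: the vector of observed statistics is compared with the mean of the simulated statistics, scaled by their covariance. This is a generic illustration with invented data; the guide does not state which statistics BPNet includes or whether it reports the distance or its square, so the details here are assumptions.

```python
from statistics import mean


def covariance(samples):
    """Mean vector and sample covariance matrix of equal-length vectors."""
    n, k = len(samples), len(samples[0])
    mu = [mean(col) for col in zip(*samples)]
    cov = [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis_squared(observed, samples):
    """Squared Mahalanobis distance of observed from the simulated samples."""
    mu, cov = covariance(samples)
    diff = [o - m for o, m in zip(observed, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


# Hypothetical statistics (two configurations) from four simulated graphs.
simulated = [[0, 0], [2, 0], [0, 2], [2, 2]]
d2 = mahalanobis_squared([3, 1], simulated)
print(d2)
```

A smaller value means the observed network lies closer to the centre of the simulated distribution across all the statistics at once, which is why it serves as a single-number comparison between models.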

The only significant parameter is a positive K-Ca (alternating A cycles). This indicates that a researcher is more likely to join a project if researchers with whom he or she is also collaborating on other projects are working on that project. In short, researchers tend to cluster around shared projects.

A qualitative assessment of the fit of the model can be obtained by visualizing some samples of graphs simulated from the estimated parameters. Switch to the “Simulate” tab, select “Get sample graphs” and press “Start” to run the simulation. When it has finished, press “Visualize” a couple of times to get a couple of sample graph visualizations. The results should look something like Figure 15. Note the resemblance of the sample simulated graphs to the actual network (Figure 1), in contrast to the sample graphs of a model that did not fit well (Figure 12).

7 Fitting models with actor attributes

So far we have only considered structural parameters, that is, those related purely to structural features of the network. However, we would usually also want to consider parameters related to attributes of the actors (nodes), such as (for example) age, gender, and so on. Three kinds of attributes can be used: binary, continuous, and categorical. They each have their own sets of parameters, detailed in Table 2 in Appendix A.

Figure 13: Results of running the estimation procedure with social circuit parameters.

Figure 14: Results of running the goodness-of-fit procedure with social circuit parameters.

Figure 15: Results of visualizing some graphs sampled from the simulation procedure with social circuit parameters.

Figure 16: Actor attribute parameters window.

We will continue the collaboration network example by including two actor attributes (on researchers only): gender, and whether the researcher is a core member of the group. Both are treated as binary attributes (with male coded as true). We can then ask some additional questions:

1. Once we take into account the propensity to collaborate within members from a given university, are there any structural effects remaining? That is, is the Core variable sufficient to explain the network structure?

2. Is there any association between gender and network structure?

Actor attribute data is obtained with the network data from the AURIN portal. If the network selected contains attribute data, then the “Show attribute parameters” button, which is normally disabled, will be enabled. For example, select the “Exercise 14 with Attributes” network data. The “Show attribute parameters” button will then be enabled as this data contains actor attributes. Pressing this button will display a window allowing actor attribute parameters to be selected, as shown in Figure 16. (Note that when the window is visible, the button is again disabled, since only one instance of the window can exist at a time.)

Figure 17: Estimated parameters for a dyadic independent model.

As can be seen in Figure 16, the “Gender” and “Core” actor attribute parameters are now available for selection. We will use the RA attribute for Gender and Core, meaning respectively male researcher activity and core member researcher activity. The meanings of the actor attribute parameters can be hard to convey verbally, hence the descriptions in Table 2 in Appendix A are shown graphically. The RA parameter (Gender-RA and Core-RA in our example) is shown with a shaded square and unshaded (dotted outline) circle linked to it. This conveys that the meaning of the RA parameter is the tendency of an actor with the attribute to form a link (regardless of the attributes of the other node).
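One way to picture the RA (activity) configuration is as a count of ties attached to “A” nodes that have the attribute. The sketch below computes that count for a binary attribute. This is our reading of the graphical description, not a formula taken from BPNet, and the toy matrix and attribute vectors are invented.

```python
def activity_count(matrix, attribute):
    """Count ties whose A node (row) has the binary attribute set.

    matrix: biadjacency matrix (rows: A nodes, columns: P nodes).
    attribute: one 0/1 value per A node. Assumed form of the RA statistic.
    """
    return sum(sum(row) for row, has in zip(matrix, attribute) if has)


# Hypothetical toy network: 3 researchers by 3 projects.
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
core = [1, 0, 1]    # researchers 1 and 3 are core members
gender = [0, 1, 0]  # researcher 2 is coded as male (true)

core_ra = activity_count(toy, core)
gender_ra = activity_count(toy, gender)
print(core_ra, gender_ra)
```

A positive RA estimate then says that nodes with the attribute hold more ties than would be expected given the other effects in the model.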

To examine the questions we asked of this data above, we will start with a dyadic independent model. That is, in addition to the actor attribute parameters Core-RA and Gender-RA, we will select only the structural parameter L (edge). Selecting these parameters and pressing “Start” to run the estimation procedure should result in something like Figure 17. (You may have to press “Start” two or three times to get the t-ratios below 0.1.)

Evaluating the goodness-of-fit of the model is done by switching to the “Goodness-of-Fit” tab, selecting all attributes (using the checkbox at the left of the title row of both the “Structural Parameters” and “Actor Attribute Parameters” panels as a shortcut) and pressing “Start”. The results should look like Figure 18.

Several parameters (such as Sa3 and Gender-TscA) have very large t-ratios, indicating that the model does not fit very well. We can try to do better by additionally selecting the social circuit structural parameters K-Sa, K-Sp and K-Ca, as we did in the model without attributes. Note that, as we are changing the model, it is advisable to use the “Reset Estimates” button (or “Reset”, to clear all selections as well) on the “Estimate” panel to start again from 0 estimates of the parameters. Once these parameters are reset then selected, and the “Start” button pressed to run the estimation (possibly two or three times to get the t-ratios below 0.1), the results should look like Figure 19.

Figure 18: Goodness-of-fit results for the dyadic independent model.

Figure 19: Estimated parameters for a model with social circuit parameters as well as attributes.

Evaluating the goodness-of-fit for this model, by selecting all parameters on the “Goodness-of-Fit” tab as before, should result in something like Figure 20.

The model with social circuit structural parameters as well as attributes is a much better fit than the dyadic independence model: all the t-ratios are acceptable, and the Mahalanobis distance is much lower.

We can use this model to answer the questions we posed earlier. First, the Core-RA attribute is positive and significant, so there is a tendency for collaboration within members from a given university. However, there are still significant structural effects remaining, specifically a negative K-Sa parameter and a positive K-Ca parameter. So there is a negative researcher popularity effect evident (when Gender and Core member attributes are considered), as well as the positive alternating A cycles effect that we saw before. This indicates that a researcher is more likely to join a project if researchers with whom he or she is also collaborating on other projects are working on that project.

Second, the Gender-RA effect is not significant, indicating that gender is not a significant variable in determining the presence of a network tie.

8 Dyadic covariates

So far we have considered both structural properties of the network, and properties of the nodes themselves (actor attributes). Another type of variable that might be considered is properties defined between pairs of nodes, rather than of nodes themselves. Such variables are called dyadic covariates.

Figure 20: Goodness-of-fit results for the model with social circuit parameters as well as attributes.

Figure 21: Geographical location of some hypothetical households (blue squares) and social clubs (red circles).

An example of a dyadic covariate with particular relevance to nodes with geospatial data is the geographical distance between nodes. For instance, consider the (invented) households and social club premises located on a map in Figure 21, with the network of membership ties between people and social clubs shown in Figure 22. Given this data, we might hypothesize that people are more likely to belong to social clubs that are close to their home address than those that are further away. Using the distance between household addresses and social club addresses as a dyadic covariate will allow us to affirm or deny this hypothesis, while controlling for purely endogenous structural effects of the network.
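A distance covariate of this kind is simply a table with one value per (household, club) pair. The sketch below builds such a table from planar coordinates and sums the distances over existing ties. The coordinates are invented, straight-line distance is used for simplicity, and the summed-over-ties form of the statistic is an assumption about how a dyadic covariate typically enters a model rather than a description of BPNet's implementation.

```python
from math import hypot


def distance_covariate(a_coords, p_coords):
    """Matrix of straight-line distances: rows are A nodes, columns are P nodes."""
    return [
        [hypot(ax - px, ay - py) for (px, py) in p_coords]
        for (ax, ay) in a_coords
    ]


def covariate_statistic(matrix, covariate):
    """Sum of the covariate over existing ties (assumed form of the statistic)."""
    return sum(
        tie * value
        for tie_row, cov_row in zip(matrix, covariate)
        for tie, value in zip(tie_row, cov_row)
    )


# Hypothetical planar coordinates (for example, kilometres on a local grid).
households = [(0.0, 0.0), (3.0, 4.0)]
clubs = [(0.0, 0.0), (3.0, 0.0)]

distances = distance_covariate(households, clubs)
membership = [
    [1, 0],  # household 1 belongs to club 1 only
    [0, 1],  # household 2 belongs to club 2 only
]
total_distance = covariate_statistic(membership, distances)
print(distances, total_distance)
```

If people do favour nearby clubs, the observed total should be smaller than in graphs simulated without the covariate, which is what a negative estimate for the distance parameter would express.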

We will create two models, one with only structural parameters, and one including the dyadic co-variate of distances between households and social clubs.

Figure 23 shows the results of running the estimation procedure with only structural parameters (that is, not using the dyadic covariate at all). Note that the visualization of the network in this figure shows the same network as Figure 22; however, it is laid out purely by the D3 force-directed layout algorithm, without using any geospatial information. In the final AURIN integration we would want nodes with geospatial information to be drawn on a map in the appropriate locations, as in Figure 21, which was created with Quantum GIS (the tool used to construct this example manually). Figure 22 was created from the distance matrix exported from Quantum GIS, using an R script that applies multidimensional scaling (MDS) to find coordinates of the nodes given the distance matrix.

Figure 24 shows the results of running the goodness-of-fit procedure with this model. We can see that the model fits quite well, with the exception of the dyadic covariate of distance between home address and social club address. The only significant parameter is a positive value of Sa2 (actor two-star), meaning people have a propensity to belong to a second social club once they already belong to one.
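As an illustration of what the Sa2 statistic measures, the sketch below counts actor two-stars in a small invented membership network: each person who belongs to k clubs contributes k(k-1)/2 two-stars. The data and the function name `actor_two_stars` are hypothetical, and BPNet's own counting or scaling conventions may differ.

```python
from math import comb

# Hypothetical membership ties: person (A node) -> set of clubs (P nodes).
membership = {
    "ann": {"chess", "tennis"},
    "bob": {"chess"},
    "cat": {"chess", "tennis", "choir"},
    "dan": set(),
}


def actor_two_stars(ties):
    """Count two-stars centred on A nodes: sum over A nodes of C(degree, 2)."""
    return sum(comb(len(clubs), 2) for clubs in ties.values())


sa2 = actor_two_stars(membership)
print(sa2)  # ann contributes 1, cat contributes 3, bob and dan contribute 0
```

A positive Sa2 parameter means that networks with more such configurations, that is, more people holding several memberships, are more probable under the model.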

Figure 22: Bipartite social network showing membership ties between people (blue squares) and social clubs (red circles), with the location of the squares and circles reflecting the geographical location of home addresses and social club premises, respectively.

Figure 23: Results of estimation of the example social club network, using only structural parameters.

Figure 24: Results of running goodness-of-fit on the structural model of the example social club network.

Figure 25: Results of estimation of the example social club network, using the dyadic covariate parameter.

Figure 26: Results of running goodness-of-fit on the dyadic covariate model of the example social club network.

Figure 25 shows the results of running the estimation procedure with the same structural parameters selected as in the previous model, with the dyadic covariate of distance between home addresses and social clubs added. Figure 26 shows the results of the goodness-of-fit procedure for this model. We can see that, compared to the results of the model with only structural parameters, the Mahalanobis distance is lower, and also the t-ratio of the dyadic covariate is very low (as it should be, since it was estimated in the model). Hence, this model fits the data better than the previous model which did not include the dyadic covariate. The dyadic covariate parameter is significant and negative, meaning that people are less likely to belong to a social club if the distance between their house and the club is large, confirming our hypothesis suggested by the visualization of the network in Figure 22.
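The following sketch shows one way the quantities discussed above could be computed by hand. It assumes that the dyadic covariate statistic is the sum of the covariate over the ties present in the network, and that a goodness-of-fit t-ratio is the observed statistic minus the mean of the simulated statistics, divided by their sample standard deviation. Both definitions are assumptions made for illustration, as are all names and values; BPNet's exact computation may differ.

```python
from statistics import mean, stdev


def covariate_statistic(ties, dist):
    """Sum of the dyadic covariate over the ties present in the network."""
    return sum(dist[tie] for tie in ties)


def t_ratio(observed, simulated):
    """(observed - mean of simulated) / sample std dev of simulated."""
    return (observed - mean(simulated)) / stdev(simulated)


# Invented distances between two households and two clubs.
dist = {
    ("h1", "c1"): 1.0, ("h1", "c2"): 4.0,
    ("h2", "c1"): 3.0, ("h2", "c2"): 2.0,
}
observed_ties = [("h1", "c1"), ("h2", "c2")]
observed = covariate_statistic(observed_ties, dist)

# Invented values of the same statistic in networks simulated from a model.
simulated = [2.0, 3.0, 4.0, 3.0, 3.0]
t = t_ratio(observed, simulated)
print(observed, t)
```

A t-ratio close to zero means that the simulated networks reproduce the observed value of the statistic well, which is what we expect for a statistic whose parameter was estimated in the model.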


A BPNet graph statistics

Table 1: Structural parameter statistics

Name    Description

Markov parameters
L       Edge (density)
Sa2     Two-star for class A (2-A-star)
Sp2     Two-star for class P (2-P-star)
Sa3     Three-star for class A (3-A-star)
Sp3     Three-star for class P (3-P-star)
L3      Three-path
C4      Four-cycle

High-order parameters
KSa     alternating k-star for class A (k-A-star)
KSp     alternating k-star for class P (k-P-star)
KCa     alternating A cycles
KCp     alternating P cycles
KCa2    same as KCa but allows different λ
KCp2    same as KCp but allows different λ
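As a rough illustration of three of the Markov statistics in Table 1, the sketch below counts edges (L), three-paths (L3) and four-cycles (C4) in a tiny invented bipartite network, using the usual graph-theoretic definitions. The data and function names are hypothetical, and BPNet may count or scale these statistics differently.

```python
from itertools import combinations
from math import comb

# Hypothetical ties: A node -> set of P nodes it is linked to.
ties = {
    "a1": {"p1", "p2"},
    "a2": {"p1", "p2"},
    "a3": {"p2"},
}


def edge_count(ties):
    """L: total number of ties."""
    return sum(len(ps) for ps in ties.values())


def p_degrees(ties):
    """Degree of each P node."""
    deg = {}
    for ps in ties.values():
        for p in ps:
            deg[p] = deg.get(p, 0) + 1
    return deg


def three_paths(ties):
    """L3: treat each tie (a, p) as the middle edge of a path and choose
    one further neighbour of a and one further neighbour of p."""
    deg_p = p_degrees(ties)
    return sum(
        (len(ps) - 1) * (deg_p[p] - 1)
        for ps in ties.values()
        for p in ps
    )


def four_cycles(ties):
    """C4: every pair of A nodes sharing two P nodes closes a four-cycle."""
    return sum(
        comb(len(ties[a] & ties[b]), 2)
        for a, b in combinations(ties, 2)
    )


print(edge_count(ties), three_paths(ties), four_cycles(ties))  # 5 6 1
```

Because a bipartite network contains no triangles, the product of the two remaining degrees gives an exact three-path count for each middle edge.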

Table 2: Actor attribute parameter statistics. Shaded nodes have the attribute and unshaded (dotted outline) nodes may or may not have the attribute, the name of which is indicated with attr.

Name    Description
[The Description column contains configuration diagrams, which are not reproduced here.]

Actor attribute parameters (A)
  binary
    attrRA
    attrTscA
    attrTsoA1
    attrTsoA2
    attrC4A1
    attrC4A2
  continuous
    attrRAc
    attrTscAc
    attrTsoAcs
    attrTsoAcd
    attrC4Acs
    attrC4Acd
  categorical
    attrMatching 2PathA
    attrMismatching 2PathA
    attrMatching 4CycleA
    attrMismatching 4CycleA

Actor attribute parameters (P)
  binary
    attrRP
    attrTscP
    attrTsoP1
    attrTsoP2
    attrC4P1
    attrC4P2
  continuous
    attrRPc
    attrTscPc
    attrTsoPcs
    attrTsoPcd
    attrC4Pcs
    attrC4Pcd
  categorical
    attrMatching 2PathP
    attrMismatching 2PathP
    attrMatching 4CycleP
    attrMismatching 4CycleP

Actor attribute parameters (A&P)
  binary
    attrRAP
  continuous
    attrRAPc

References

M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.

N. Harrigan. PNet for Dummies: An introduction to estimating exponential random graph (p∗) models with PNet, 2007.

N. Harrigan and M. Bond. Differential impact of directors’ social and financial capital on corporate interlock formation. In Lusher et al. (2013), pages 260–271.

D. Lusher and G. Robins. Formation of social network structure. In Lusher et al. (2013), chapter 3, pages 16–28.

D. Lusher, J. Koskinen, and G. Robins, editors. Exponential Random Graph Models for Social Networks. Number 35 in Structural Analysis in the Social Sciences. Cambridge University Press, New York, 2013.

P. Pattison, G. Robins, P. Wang, D. Lusher, and D. Rolls. Computer Exercises and Notes for the Workshop on Theories and Methods for Understanding Human Social Networks. MelNet Social Network Group, Department of Psychology, The University of Melbourne, July 2011.

G. Robins, P. Pattison, Y. Kalish, and D. Lusher. An introduction to exponential random graph (p∗) models for social networks. Social Networks, 29:173–191, 2007.

P. Wang. Exponential random graph model extensions: Models for multiple networks and bipartite networks. In Lusher et al. (2013), chapter 10, pages 115–129.

P. Wang, G. Robins, and P. Pattison. PNet Program for the Simulation and Estimation of Exponential Random Graph (p∗) Models: User Manual. Department of Psychology, The University of Melbourne, 2009a.

P. Wang, K. Sharpe, G. L. Robins, and P. E. Pattison. Exponential random graph (p∗) models for affiliation networks. Social Networks, 31:12–25, 2009b.