Pip Pattison, University of Melbourne. UKSNA, University of Greenwich, June 2013. A hierarchy of exponential random graph models for the analysis of social networks


  • Slide 1
  • Pip Pattison University of Melbourne UKSNA, University of Greenwich, June 2013 A hierarchy of exponential random graph models for the analysis of social networks
  • Slide 2
  • Acknowledgments. Joint work with Garry Robins, Peng Wang and Tom Snijders.
      University of Melbourne: Garry Robins, Peng Wang, Galina Daraganova, David Rolls
      University of Oxford: Tom Snijders
      University of Manchester: Johan Koskinen
      Swinburne University: Dean Lusher
  • Slide 3
  • Outline
      1. Structure in networks
      2. The ERGM framework for network modelling
      3. Hierarchy of dependence structures for ERGMs
      4. Five networks
      5. Applications
  • Slide 4
  • 1. Structure in networks
  • Slide 5
  • Cartwright and Harary: Psychological Review, 1956. We expect: negative ties to be bipartite in form (or k-partite in generalisations); positive ties to be potentially clustered
  • Slide 6
  • Granovetter: American Journal of Sociology, 1973. We expect: closed triangles in strong ties; local bridges to be weak
  • Slide 7
  • Jackson & Wolinsky: Journal of Economic Theory, 1996. We expect: disconnected cliques; stars
  • Slide 8
  • Watts & Strogatz: Nature, 1998. We expect: high concentration of triangles; short paths; low density; absence of hubs
  • Slide 9
  • Degree effects: degree assortativity and disassortativity (e.g. Newman, 2003). We expect: relatively high (or low) rates of connection among high-degree nodes
  • Slide 10
  • Burt: American Journal of Sociology, 2004; Robins (2009). We expect to see brokers who are: embedded in groups; bridging to other groups
  • Slide 11
  • Bearman, Moody & Stovel: American Journal of Sociology, 2004. We expect: an absence of 4-cycles (and 3-cycles)
  • Slide 12
  • Jackson, Rodriguez-Barraquer & Tan: American Economic Review, 2012. We expect: m-cliques but not (m+1)-cycles
  • Slide 13
  • An aside: citations (WoS, June 26, 2013)
      Paper                                        Citations
      Cartwright & Harary (1956)                   534
      Granovetter (1973)                           5833
      Jackson & Wolinsky (1996)                    416
      Watts & Strogatz (1998)                      7572
      Newman (2003)                                507
      Burt (2004)                                  491
      Bearman, Moody & Stovel (2004)               133
      Jackson, Rodriguez-Barraquer & Tan (2012)    0
      Our fascination with network structure runs deep!
  • Slide 14
  • Other regularities in network structure. Other hypothesised sources of regularity in network structure include:
      • Homophily and heterophily effects (e.g. McPherson, Smith-Lovin & Cook, 2001)
      • Consequences of social foci and other settings (Feld, 1985; Pattison & Robins, 2002)
      • Embedding in geographical, organisational and sociocultural contexts (e.g. Daraganova et al, 2012; Lomi et al, in press; White, 1992)
      • Interdependence or embeddedness with other networks (e.g. Granovetter, 1985; Padgett & McLean, 2006)
  • Slide 15
  • Harrison White on network ties. Notably, almost all of these hypotheses about structural regularity are based on arguments about local interaction in networks: "A social tie exists in, and only in, a relation between actors which catenates, that is entails (some) compound relation through other such ties of those actors. Thus it is subject to, and known to be subject to, the hegemonic pressures of others engaged in the social construction of that network" (White, 1998)
  • Slide 16
  • 2. General modelling framework
  • Slide 17
  • Network models. Network models should:
      • reflect known and hypothesised processes for network tie formation (such as those just mentioned)
      • be dynamic, where possible, and consistent with known or hypothesised dynamics
      • allow us to test propositions about network structure and process
      • allow us to understand the consequences of network structure and process
      For cross-sectional data, the exponential random graph modelling (ERGM) framework is convenient.
  • Slide 18
  • Exponential random graph models (ERGMs). We regard the nodes of a network as fixed, and treat potential ties among nodes as variables that may depend on exogenous attributes of the nodes and potential ties and, potentially, on one another. The form of assumed dependence among tie variables, together with additional simplifying assumptions, leads to a general form of probability model (an exponential random graph model) for the ensemble of tie variables. The model can be estimated using MCMCMLE (Markov chain Monte Carlo maximum likelihood estimation) from an observation on the network (and relevant node- or dyad-level covariates); see Snijders (2002).
  • Slide 19
  • Exponential random graph model (ERGM). Y(i,j) is a tie variable: Y(i,j) = 1 if node i is tied to node j, 0 otherwise. Ensemble of tie variables: Y = [Y(i,j)], with realisations y = [y(i,j)]. The model is $P(Y = y) = \frac{1}{\kappa(\theta)} \exp\{\sum_p \theta_p z_p(y)\}$ (Frank & Strauss, 1986), where the $z_p(y)$ are network statistics (network effects), the $\theta_p$ are the corresponding parameters, and $\kappa(\theta)$ is a normalising quantity.
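To make the model form concrete, here is a minimal Python sketch that evaluates P(Y = y) exactly on a very small node set by enumerating every graph. The choice of statistics (edges, 2-stars, triangles), the parameter values and the function names are illustrative assumptions, not taken from the slides; brute-force enumeration is only feasible for a handful of nodes, which is why MCMC-based estimation is used in practice.

```python
from itertools import combinations, product
from math import exp


def statistics(n, edges):
    """Return (edges, 2-stars, triangles) for an undirected graph on n nodes."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    n_edges = len(edges)
    # a node of degree d is the centre of d*(d-1)/2 two-stars
    two_stars = sum(len(adj[i]) * (len(adj[i]) - 1) // 2 for i in range(n))
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if j in adj[i] and k in adj[i] and k in adj[j]
    )
    return (n_edges, two_stars, triangles)


def ergm_distribution(n, theta):
    """Exact P(Y = y) for every graph y on n nodes (brute-force enumeration)."""
    dyads = list(combinations(range(n), 2))
    weights = {}
    for pattern in product([0, 1], repeat=len(dyads)):
        edges = tuple(d for d, present in zip(dyads, pattern) if present)
        z = statistics(n, edges)
        # unnormalised weight exp{sum_p theta_p * z_p(y)}
        weights[edges] = exp(sum(t * s for t, s in zip(theta, z)))
    kappa = sum(weights.values())  # the normalising quantity
    return {y: w / kappa for y, w in weights.items()}


# Hypothetical parameter values for (edge, 2-star, triangle), for illustration only.
dist = ergm_distribution(4, theta=(-1.0, 0.1, 0.5))
print(len(dist), sum(dist.values()))
```

With all parameters set to zero every graph is equally likely, which is a quick sanity check on the normalisation.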
  • Slide 20
  • 3. Dependence structures
  • Slide 21
  • Characterising the proximity of potential network ties. Under what circumstances is the tie linking node a and node b conditionally dependent on the tie linking node c and node d? Strict inclusion: when each of actors a and b is already linked to both actors c and d, and conversely?
  • Slide 22
  • Characterising the proximity of potential network ties. Under what circumstances is the tie linking node a and node b conditionally dependent on the tie linking node c and node d? Inclusion: when each of actors a and b is already linked to at least one of actors c and d, and conversely?
  • Slide 23
  • Characterising the proximity of potential network ties. Under what circumstances is the tie linking node a and node b conditionally dependent on the tie linking node c and node d? Partial inclusion: when at least one of actors a and b is already linked to both actors c and d?
  • Slide 24
  • Characterising the proximity of potential network ties. Under what circumstances is the tie linking node a and node b conditionally dependent on the tie linking node c and node d? Distance criterion: when at least one of actors a and b is already linked to at least one of actors c and d, and conversely?
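The four questions on slides 21 to 24 can be sketched as a small Python check for the case where "linked" means an existing tie (path length 1). This is a literal reading of the slide wording for two disjoint pairs {a, b} and {c, d}; the function names and the adjacency representation are assumptions made for illustration, and the formal definitions in Pattison & Snijders may differ in detail.

```python
def make_adj(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def dependence_criteria(adj, a, b, c, d):
    """Check the four proximity criteria for tie variables Y_ab and Y_cd.

    adj holds the existing ties; {a, b} and {c, d} are assumed disjoint.
    """
    def linked(u, v):
        return v in adj.get(u, set())

    # number of nodes in the other pair that each node is already linked to
    from_ab = [sum(linked(u, v) for v in (c, d)) for u in (a, b)]
    from_cd = [sum(linked(v, u) for u in (a, b)) for v in (c, d)]

    return {
        # each of a, b linked to both c and d
        "strict_inclusion": all(k == 2 for k in from_ab),
        # each of a, b linked to at least one of c, d, and conversely
        "inclusion": all(k >= 1 for k in from_ab) and all(k >= 1 for k in from_cd),
        # at least one of a, b linked to both c and d (as worded on the slide)
        "partial_inclusion": any(k == 2 for k in from_ab),
        # at least one of a, b linked to at least one of c, d
        "distance": any(k >= 1 for k in from_ab),
    }


# Node a is tied to both c and d; node b is tied to neither.
result = dependence_criteria(make_adj([("a", "c"), ("a", "d")]), "a", "b", "c", "d")
print(result)
```

In this example only the partial inclusion and distance criteria are met, so a model built on strict inclusion or inclusion would treat Y_ab and Y_cd as conditionally independent here.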
  • Slide 25
  • A second dimension: varying path length
      a. Strict p-inclusion, SI_p (p > 0)
      b. p-inclusion, I_p
      c. Partial p-inclusion, PI_p
      d. p-distance criterion, D_p
      [Figure: four panels, each showing nodes a, b, c, d.] Key: red lines indicate existing paths of length p or less (p ≥ 0); blue dashed lines indicate potential ties, Y_ab and Y_cd
  • Slide 26
  • The dependence hierarchy (Pattison & Snijders, 2013). [Figure: hierarchy diagram relating I_0 = PI_0 and D_0; SI_1, I_1, PI_1 and D_1; SI_2, I_2, PI_2 and D_2]
  • Slide 27
  • Associated model configurations. SI_p (strict p-inclusion): each configuration is a subgraph of diameter p (p-club; Mokken, 1979). For p = 1: cohesive subsets
  • Slide 28
  • Associated model configurations. I_p (p-inclusion): each configuration has the property that every pair of edges lies on a cyclic walk of length (2p+2). For p = 1: closure
  • Slide 29
  • Associated model configurations. PI_p (partial p-inclusion): each configuration has the property that every pair of edges lies on a cyclic walk of length (2p+2), or on a cyclic walk of length (2p+1) with an edge incident to a node on the cycle. For p = 1: brokerage
  • Slide 30
  • Associated model configurations. D_p (p-distance): each configuration has the property that every pair of edges lies on a path of length p+2. For p = 1: connectivity
  • Slide 31
  • Model configurations for the case of p = 0
      SI_0: not defined
      I_0: each configuration is an edge
      PI_0: each configuration is an edge
        (I_0 and PI_0 give the Bernoulli or Erdős-Rényi model: edges are independent)
      D_0: each configuration is such that every pair of edges lies on a path of length 2
        (D_0 gives the Markov model; Frank & Strauss, 1986)
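For reference, the two p = 0 cases can be written out explicitly. The block below is a sketch in commonly used notation rather than the slides' own: L(y) is the number of edges, S_k(y) the number of k-stars, T(y) the number of triangles, κ the normalising quantity, and θ, σ_k and τ are parameter names assumed here for illustration. The Markov form assumes homogeneity (isomorphic configurations share a parameter).

```latex
% Bernoulli model (I_0 = PI_0): the only configuration is a single edge,
% so tie variables are independent with a common tie probability
P(Y = y) = \frac{1}{\kappa} \exp\{\theta L(y)\},
\qquad L(y) = \sum_{i<j} y_{ij},
\qquad P(Y_{ij} = 1) = \frac{e^{\theta}}{1 + e^{\theta}}

% Homogeneous Markov model (D_0): configurations in which every pair of
% edges shares a node, that is, edges, k-stars and triangles
P(Y = y) = \frac{1}{\kappa}
  \exp\Big\{\theta L(y) + \sum_{k=2}^{n-1} \sigma_k S_k(y) + \tau T(y)\Big\}
```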
  • Slide 32
  • The dependence hierarchy (Pattison & Snijders, 2013)
      Cohesion:      SI_1 (clique); SI_p (p-club)
      Closure:       I_0 = PI_0 (Bernoulli); I_1 (social circuit); I_p (cyclic walk of length 2p+2)
      Brokerage:     PI_1 (edge-triangle); PI_p ((r+1)-path-(2(p-r)+1)-cyclic walk, 0 ≤ r ≤ p-1)
      Connectivity:  D_0 (Markov); D_1 (3-path); D_p (path of length p+2)
  • Slide 33
  • Other assumptions
      1. Homogeneity: isomorphic configurations have equal parameters (Frank & Strauss, 1986)
      2. Related effects: a single statistic for a family of related configurations, such as m-stars, m-triangles, m-2-paths and m-edge-triangles (Snijders et al, 2006; Hunter & Handcock, 2006)
  • Slide 34
  • Resulting model effects often include:
      • Edge: propensity for an edge to occur
      • Alternating star: (endogenous) propensity for edges to attach to nodes with edges (progressively discounted for additional edges), hence the level of dispersion of the degree distribution
      • Alternating 2-path: propensity for the presence of shared partners (progressively discounted for additional shared partners)
      • Alternating triangle: propensity for an association between an edge linking nodes and their propensity for shared partners (progressively discounted for additional shared partners) (closure)
      • Alternating edge-triangle: propensity for an association between degree and closure (progressively discounted for higher degrees)
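The "progressive discounting" in these effects can be illustrated with the alternating star statistic. The sketch below assumes the alternating k-star form usually attributed to Snijders et al (2006), in which k-star counts enter with alternating signs and weights 1/λ^(k-2); the value λ = 2 is assumed to correspond to the "(2.00)" shown next to effects such as AS(2.00) in the applications. Function names are illustrative.

```python
from math import comb


def alternating_star(degrees, lam=2.0):
    """Alternating k-star statistic: sum over k >= 2 of (-1)^k * S_k / lam^(k-2).

    S_k is the number of k-stars, computed from the degree sequence as
    the sum over nodes of C(degree, k).
    """
    n = len(degrees)
    total = 0.0
    for k in range(2, n):
        s_k = sum(comb(d, k) for d in degrees)
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total


def alternating_star_closed_form(degrees, lam=2.0):
    """Equivalent closed form obtained from the binomial theorem."""
    return lam ** 2 * sum((1 - 1 / lam) ** d + d / lam - 1 for d in degrees)


# A star with one centre and three leaves: three 2-stars and one 3-star.
star_degrees = [3, 1, 1, 1]
print(alternating_star(star_degrees))              # 3 - 1/2 = 2.5
print(alternating_star_closed_form(star_degrees))  # same value
```

Each additional edge at a node adds less to the statistic than the previous one, which is the sense in which higher-order stars are discounted.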
  • Slide 35
  • 4. Five networks
  • Slide 36
  • Gift-giving (taro exchange) among households in a Papuan village* (n = 22). *Hage P. and Harary F. (1983). Structural models in anthropology. Cambridge: Cambridge University Press. Schwimmer E. (1973). Exchange in the social structure of the Orokaiva. New York: St Martin's.
  • Slide 37
  • Interaction network in a university karate club (n = 34) Zachary W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452-473.
  • Slide 38
  • Kapferer's tailor shop in Zambia, sociational (friendship and socioemotional) ties, time 2* (n = 39). *Kapferer B. (1972). Strategy and transaction in an African factory. Manchester: Manchester University Press.
  • Slide 39
  • An Australian government organisation (n=60): important ties
  • Slide 40
  • A dolphin community near Doubtful Sound, NZ* (n = 62) *D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behavioral Ecology and Sociobiology 54, 396-405 (2003).
  • Slide 41
  • 5. Applications
  • Slide 42
  • Gift-giving (taro exchange) among households in a Papuan village* (n = 22). *Hage P. and Harary F. (1983). Structural models in anthropology. Cambridge: Cambridge University Press. Schwimmer E. (1973). Exchange in the social structure of the Orokaiva. New York: St Martin's.
  • Slide 43
  • Heuristic goodness of fit: degree statistics. The t statistic locates the observed value of each statistic in the distribution of statistics associated with the ERGM simulated using the model parameters: if |t| ≤ 2, the observed statistic is within the envelope expected by the model. For example, for the Bernoulli model: edge effect = -1.59 (est. se = 0.17)
      Statistic    Observed   Simulated mean (sd)   t
      triangles    10         7.481 (4.151)         0.607
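The t statistic described on this slide can be sketched in a few lines of Python. The function names are illustrative, and the use of the sample standard deviation for a raw list of simulated values is an assumption, since the slides do not say which form the software uses.

```python
from statistics import mean, stdev


def t_ratio(observed, simulated):
    """(observed - mean of simulated values) / standard deviation of simulated values."""
    return (observed - mean(simulated)) / stdev(simulated)


def t_ratio_from_summary(observed, sim_mean, sim_sd):
    """Same ratio when only the simulated mean and standard deviation are reported."""
    return (observed - sim_mean) / sim_sd


# Triangle count for the taro network under the Bernoulli model, from the slide.
t = t_ratio_from_summary(10, 7.481, 4.151)
print(round(t, 3))  # 0.607
```

The result reproduces the value on the slide and lies well inside the heuristic envelope.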
  • Slide 44
  • Taro exchange: Bernoulli model
      Effect   Estimate   Std err
      Edge     -1.590     0.174

      Effect         Observed   Mean    Std dev   t-ratio
      2-star         109        132.5   39.0      -0.604
      3-star         80         141.6   67.4      -0.913
      triangles      10         7.481   4.151      0.607
      SD degrees     0.963      1.658   0.261     -2.663
      Skew degrees   1.254      0.236   0.405      2.515
      GCC*           0.275      0.160   0.057      2.017
      Mean LCC*      0.339      0.151   0.066      2.851
      Var LCC*       0.045      0.044   0.028      0.044
      *GCC is the global clustering coefficient; LCC is the local clustering coefficient.
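As a rough cross-check of how such a table is produced, the sketch below simulates Bernoulli graphs on 22 nodes using the reported edge estimate of -1.590 and compares the observed triangle count (10) with the simulated distribution. The simulation size, the seed and the function name are arbitrary choices made here, so the output will not match the slide exactly. The tie probability implied by the estimate is exp(-1.59) / (1 + exp(-1.59)), about 0.169, and the expected triangle count C(22, 3) p^3 is about 7.5, close to the reported simulated mean of 7.481.

```python
import random
from itertools import combinations
from math import exp


def simulate_bernoulli_triangles(n, theta, n_sims, seed=0):
    """Triangle counts in n_sims Bernoulli graphs with edge parameter theta."""
    rng = random.Random(seed)
    p = exp(theta) / (1 + exp(theta))  # tie probability implied by theta
    dyads = list(combinations(range(n), 2))
    counts = []
    for _ in range(n_sims):
        adj = [set() for _ in range(n)]
        for i, j in dyads:
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
        # each triangle i < j < k is counted once, from its edge (i, j)
        counts.append(sum(
            1
            for i, j in dyads if j in adj[i]
            for k in adj[i] & adj[j] if k > j
        ))
    return p, counts


p, counts = simulate_bernoulli_triangles(n=22, theta=-1.590, n_sims=500)
sim_mean = sum(counts) / len(counts)
sim_sd = (sum((c - sim_mean) ** 2 for c in counts) / (len(counts) - 1)) ** 0.5
t = (10 - sim_mean) / sim_sd  # 10 is the observed triangle count
print(round(p, 3), round(sim_mean, 2), round(sim_sd, 2), round(t, 2))
```

The same loop could report any other statistic in the table (stars, degree dispersion, clustering coefficients) by swapping the counting step.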
  • Slide 45
  • Taro exchange: edge-triangle models
      Model 2
      Effect          Estimate   Std err
      edge            -1.180     0.524*
      AT(2.00)         2.296     0.602*
      AET(2.00)       -1.147     0.385*

      Model 3
      Effect          Estimate   Std err
      edge             1.472     2.169
      2-star          -0.369     0.401
      triangle         4.618     1.511*
      edge-triangle   -0.588     0.283*

      Both models suggest: triadic closure; a negative association between participation in closed triads and degree
  • Slide 46
  • Comparison of Models 2 and 3
                          Model 2                    Model 3
      Effect     Obs     Mean    SD     t-ratio     Mean    SD      t-ratio
      2-star     109     125.0   27.4   -0.6        108.3   15.7     0.1
      3-star     80      127.5   48.1   -1.0        75.3    19.9     0.2
      Triangles  10      9.9     1.9     0.1        10.0    2.8     -0.0
      SD_deg     0.96    1.5     0.2    -2.4        0.9     0.154    0.2
      Skew_deg   1.25    0.59    0.5     1.3        0.042   0.428    2.8
      GCC        0.27    0.24    0.1     0.6        0.286   0.098   -0.1
      Mean LCC   0.34    0.39    0.10   -0.6        0.346   0.115   -0.1
      Var LCC    0.04    0.11    0.03   -2.1        0.077   0.033   -1.0
      Model 3 appears to be more closely centred on the data
  • Slide 47
  • Taro exchange simulated from Model 3
  • Slide 48
  • The edge-triangle model for Taro exchange
      Effect     Estimate   Std err
      edge        1.472     2.169
      2-star     -0.369     0.401
      triangle    4.618     1.511*
      ET         -0.588     0.283*
      A triadic closure effect, accompanied by a negative association between triadic closure and tie formation
  • Slide 49
  • Interaction network in a university karate club (n = 34) Zachary W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452-473.
  • Slide 50
  • Zachary's karate club
      Effect      Estimate   Std err
      edge        -1.930     1.553
      AS(2.00)    -0.523     0.459
      AT(2.00)     0.624     0.191*
      A2P(2.00)    0.130     0.022*

      Goodness of fit is good except for:
      Effect     Observed   Mean    Std dev   t-ratio
      5-clique   2          0.080   0.325     5.905

      Positive tendencies for closure in both 3- and 4-cycles
  • Slide 51
  • Kapferer's tailor shop in Zambia, sociational (friendship and socioemotional) ties, time 2* (n = 39). *Kapferer B. (1972). Strategy and transaction in an African factory. Manchester: Manchester University Press.
  • Slide 52
  • Model 1
      Effect      Estimate   Std err
      edge        -5.010     1.567
      AS (2.00)    0.182     0.478
      AT (2.00)    1.395     0.286
  • Slide 53
  • Model 1: heuristic goodness of fit
      Effect            Observed   Mean        Std dev     t-ratio
      2-star            2904       2680.054    326.507      0.686
      3-star            13752      10781.107   1786.582     1.663
      Triangle          451        337.120     45.839       2.484
      4-clique          448        139.802     35.018       8.801
      5-clique          234        15.237      9.227       23.709
      2-triangle        4617       2164.541    461.498      5.314
      4-cycle           3880       2574.071    534.325      2.444
      1-edge-triangle   18463      11816.924   2197.726     3.024
      2-edge-triangle   133495     69665.573   16563.097    3.854
      SD degrees        5.510      3.992       0.449        3.382
      Skew degrees      0.380      -0.088      0.418        1.118
      Global CC         0.466      0.377       0.013        6.674
      Mean Local CC     0.498      0.409       0.021        4.188
      Var Local CC      0.031      0.014       0.008        2.061
  • Slide 54
  • Model 2
      Effect       Parameter   Std err
      edge          0.2238     2.07641
      AS (2.00)    -0.8892     0.55314
      AT (2.00)     1.2592     0.26601
      A2P (2.00)   -0.1545     0.02705
  • Slide 55
  • Model 2: heuristic goodness of fit
      Network statistic     Observed   Mean        Std dev     t-ratio
      2-star                2904       2330.316    975.688      0.588
      3-star                13752      9240.431    5024.283     0.898
      4-star                51734      27751.457   17889.300    1.341
      5-star                161037     66417.664   48948.210    1.933
      triangle              451        304.266     133.412      1.100
      4-clique              448        139.815     86.093       3.580
      5-clique              234        19.122      17.806      12.068
      6-clique              56         0.699       1.475       37.484
      7-clique              4          0.004       0.063       63.277
      2-triangle            4617       2056.647    1241.844     2.062
      3-path                37493      27640.853   15235.981    0.647
      4-cycle               3880       2369.184    1469.860     1.028
      1-edge-triangle       18463      10648.487   6104.692     1.280
      2-edge-triangle       133495     63121.032   42879.638    1.641
      Std Dev degree dist   5.510      3.833       0.609        2.753
      Skew degree dist      0.380      -0.183      0.427        1.317
      Global CC             0.466      0.389       0.021        3.620
      Mean Local CC         0.498      0.424       0.031        2.362
      Var Local CC          0.031      0.024       0.017        0.385
  • Slide 56
  • Model 3
      Effect       Parameter   Std err
      edge         -1.442      2.658
      AS (2.00)    -0.516      0.707
      AT (2.00)     0.997      0.294*
      A2P (2.00)   -0.066      0.049
      AET (2.00)    0.022      0.007*
  • Slide 57
  • Model 3: goodness of fit
      Effect              Observed   Mean         Std dev     t-ratio
      2-star              2904       3652.456     674.099     -1.110
      3-star              13752      17389.031    4533.265    -0.802
      4-star              51734      62849.416    21196.612   -0.524
      5-star              161037     182779.586   76695.796   -0.283
      3-clique            451        519.753      121.758     -0.565
      4-clique            448        366.373      162.594      0.502
      5-clique            234        102.199      86.652       1.521
      6-clique            56         12.452       22.889       1.903
      7-clique            4          0.749        3.534        0.920
      2-triangle          4617       4756.735     1806.390    -0.077
      3-path              37493      51052.206    13613.872   -0.996
      4-cycle             3880       4963.787     1679.661    -0.645
      1-edge-triangle     18463      21944.593    6957.045    -0.500
      2-edge-triangle     133495     156834.447   62461.843   -0.374
      SD degree dist      5.510      4.416        0.445        2.458
      Skew degree dist    0.380      -0.023       0.340        1.183
      Global CC           0.466      0.423        0.025        1.691
      Mean Local CC       0.498      0.448        0.026        1.910
      Variance Local CC   0.031      0.010        0.007        3.096
  • Slide 58
  • An Australian government organisation (n=60): important ties
  • Slide 59
  • Model for Australian government organisation
      Effect      Estimate   Std err
      edge        -4.414     0.414*
      AS(2.00)     0.164     0.228
      AT(2.00)     0.588     0.165*
      A2P(2.00)    0.071     0.055
      The model appears to fit well. A modest and non-significant tendency towards dispersed degrees, and a moderate closure effect
  • Slide 60
  • A dolphin community near Doubtful Sound, NZ* (n = 62) *D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behavioral Ecology and Sociobiology 54, 396-405 (2003).
  • Slide 61
  • Model 1
      Effect                Parameter   Std err
      edge                  -1.9446     0.90718
      alt-star(2.00)        -0.5545     0.26904
      alt-triangle (2.00)    0.9906     0.11496
  • Slide 62
  • Model 1: goodness of fit
      Effect                       Observed   Mean     Std dev   t-ratio
      # 2-stars                    923        901.4    284.6      0.08
      # 3-stars                    1861       2092.5   1059.1    -0.22
      # 1-triangles                95         77.0     26.2       0.69
      # 2-triangles                300        166.7    100.5      1.33
      # 3-paths                    9325       9537.5   2662.3    -0.08
      # 4-cycles                   278        196.4    121.1      0.67
      # (1,1)-coathangers          1644       1443.3   767.6      0.26
      # cliques of size 4          27         8.6      6.0        3.03
      # alt-k-indpt.2-path(2.00)   705.4      737.3    195.8     -0.16
      Std Dev degree dist          2.9        3.0      0.5       -0.13
      Skew degree dist             0.29       0.86     0.37      -1.52
      Global Clustering            0.31       0.26     0.02       2.69
      Mean Local Clustering        0.26       0.30     0.04      -0.91
      Variance Local Clustering    0.04       0.07     0.02      -1.52
  • Slide 63
  • Model 2
      Effect            Parameter   Std err
      edge              -2.8230     0.92935
      1-triangle         2.2123     0.26306
      2-triangle        -0.0242     0.03311
      1-edge-triangle   -0.0402     0.01129
      alt-star(2.00)    -0.1337     0.28843
  • Slide 64
  • Model 2: goodness of fit
      Effect                      Observed   Mean     Std dev   t-ratio
      # 2-stars                   923        925.7    159.7     -0.02
      # 3-stars                   1861       1847.2   469.9      0.03
      # 3-paths                   9325       9515.5   1379.7    -0.14
      # 4-cycles                  278        268.7    100.3      0.09
      # cliques of size 4         27         31.6     20.5      -0.23
      # alt-triangle(2.00)        177.5      181.0    31.1      -0.11
      # alt-indpt-2-path(2.00)    705.4      719.7    113.7     -0.13
      Std Dev degree dist         2.93       2.82     0.25       0.44
      Skew degree dist            0.29       0.26     0.23       0.13
      Global Clustering           0.31       0.32     0.05      -0.18
      Mean Local Clustering       0.26       0.29     0.04      -0.70
      Variance Local Clustering   0.04       0.06     0.016     -1.29
  • Slide 65
  • In conclusion
      • The dependence hierarchy systematically articulates possible proximity-based logics for conditional dependencies between network ties, and yields a versatile modelling framework to reflect a variety of hypothesised tie formation processes
      • The illustrative applications demonstrate the potential value of this flexible framework, and suggest evidence for various hypothesised processes
      • There is, of course, much more to be done, e.g. evaluating model adequacy, comparing models, ensuring robust model specifications...
  • Slide 66