Statistical Latent Variable and Event Models for Network Data

Padhraic Smyth, Department of Computer Science, University of California, Irvine

January 7th 2016, Workshop on Big Graphs: Theory and Practice, UCSD


Padhraic Smyth, January 2016: 2

Acknowledgements

Students and Colleagues

Chris Dubois, Jimmy Foulds, Arthur Asuncion, Carter Butts, Zach Butler

Funding


References

Multiplicative latent factor models for description and prediction of social networks

P. D. Hoff, Computational and Mathematical Organization Theory, 15(4), 2009.

Dyadic data analysis with amen

P. D. Hoff, available online, June 2015

A relational event model for social action

C. E. Butts, Sociological Methodology, 2008

A survey of statistical network models

A. Goldenberg, A. Zheng, S. Fienberg, E. Airoldi, Foundations and Trends in Machine Learning, 2009


Email Contact Network

Data from HP Labs


Goals

• Learn a predictive distribution over future events in the network

– Incorporate node and edge attributes

• Be able to answer queries such as

– What will the network look like at time t + k?

– How likely is it that node i will communicate with node j?

– How much influence does node i have on node j?

• Understand the dynamics of the network process


C. Butts, Science, 2009


Descriptive/Exploratory Analysis of Networks

Long history in social network analysis, complex systems, etc.

– Degree distributions, power laws, scale-free networks

– Clustering effects

– Betweenness and centrality

Often focused on broad network properties

Very useful, but does not support inferential or predictive statements about specific nodes or edges


Statistical Network Modeling

Basic idea: hypothesize a (simple) generative model for the data given parameters, and then infer the parameters from the observed data

• Learning

– Systematic methods for estimating network parameters

• Prediction/Querying

– Reduces to computation of conditional probabilities and expectations

• Noise/Missing Data

– Systematic way to handle real-world noise

• Covariates

– Relatively straightforward to integrate “non-network” information


Modeling Approaches

• Static model

– Aggregate event data into a single network

– e.g., static model for binary edges

• Discrete time models

– Aggregate event data into temporal windows, e.g., per week

• Continuous-time models

– Model event rates directly

– e.g., stationary Poisson (simple)

– e.g., non-stationary Poisson (more complex)

• Sequences of dependent events

– Cascade models


Static Network Models


Network Notation

N actors (node set)

• Generally assume that set of actors is known and fixed

Edges between actors: adjacency matrix Y

y_ij = 1: an edge between actors i and j

y_ij real-valued or counts: indicates the strength of the relationship

Covariates/Attributes X

• e.g., attributes of each actor, or of each edge


Example of a Y matrix: counts of 200,000 email messages between 3000 individuals over 3 months


Sidenote: Graphical Models

It is tempting to think of our N x N network as being related to a graphical model on N variables

However, in network modeling, the edges are viewed as the random variables, not the nodes

This hints at the complexity of the problem: O(N^2) random variables, and 2^(N(N-1)/2) possible binary undirected graphs, i.e., exponential in N^2


Network Models via Regression


Terms: mean effect, row effect, column effect


Binary Undirected Edges


Likelihood

Note that edges are conditionally independent given parameters
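Since the edges are conditionally independent given the parameters, the log-likelihood is a sum of Bernoulli terms over node pairs. A minimal Python sketch, assuming binary undirected edges, a logistic link, and a hypothetical parameterization with a mean effect `mu` and per-node effects `a`, so that P(y_ij = 1) = sigmoid(mu + a_i + a_j); the function names are illustrative only:

```python
import math

def log_sigmoid(x):
    # numerically stable log(1 / (1 + exp(-x)))
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def edge_loglik(Y, mu, a):
    """Log-likelihood of a binary undirected adjacency matrix Y (list of lists)
    under P(y_ij = 1) = sigmoid(mu + a[i] + a[j]), edges independent given params."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):              # each unordered pair once
            theta = mu + a[i] + a[j]           # log-odds of edge (i, j)
            if Y[i][j] == 1:
                total += log_sigmoid(theta)    # log P(y_ij = 1)
            else:
                total += log_sigmoid(-theta)   # log P(y_ij = 0) = log(1 - sigmoid(theta))
    return total
```

The double loop visits all N(N-1)/2 pairs, which is the O(N^2) cost that becomes the bottleneck for large networks.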


Special Case: Erdos-Renyi Graph
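In the Erdős-Rényi special case every pair shares a single edge probability p (only the mean effect remains), so the maximum likelihood estimate of p is simply the observed density: the number of edges divided by N(N-1)/2. A small sketch for an undirected binary graph; the function name is hypothetical:

```python
import math

def erdos_renyi_mle(Y):
    """MLE of the common edge probability p for a binary undirected graph Y,
    together with the corresponding log-odds log(p / (1 - p))."""
    n = len(Y)
    pairs = n * (n - 1) // 2
    edges = sum(Y[i][j] for i in range(n) for j in range(i + 1, n))
    p_hat = edges / pairs
    log_odds = math.log(p_hat / (1 - p_hat))   # only defined for 0 < p_hat < 1
    return p_hat, log_odds
```

For a 4-node path graph (3 edges out of 6 pairs) this gives p = 0.5 and log-odds 0.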


Likelihood

We can learn the θ’s using maximum likelihood or Bayesian methods, with a variety of techniques such as gradient methods, MCMC, variational approximations, etc.
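As one illustration of the gradient route: for a logistic model the derivative of the log-likelihood with respect to each log-odds θ_ij is y_ij minus the fitted probability. A minimal full-batch gradient-ascent sketch for the same hypothetical parameterization θ_ij = mu + a_i + a_j. The step size and iteration count are arbitrary choices, and the MLE does not exist if some node has degree 0 or N-1:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_node_effects(Y, steps=2000, lr=0.05):
    """Maximum likelihood for P(y_ij = 1) = sigmoid(mu + a[i] + a[j]) on a binary
    undirected graph Y, by plain (full-batch) gradient ascent."""
    n = len(Y)
    mu, a = 0.0, [0.0] * n
    for _ in range(steps):
        g_mu, g_a = 0.0, [0.0] * n
        for i in range(n):
            for j in range(i + 1, n):
                # d loglik / d theta_ij = y_ij - P(y_ij = 1)
                r = Y[i][j] - sigmoid(mu + a[i] + a[j])
                g_mu += r
                g_a[i] += r
                g_a[j] += r
        mu += lr * g_mu
        a = [ak + lr * gk for ak, gk in zip(a, g_a)]
    return mu, a
```

Note that mu and a are only identified up to a shift (mu + c, a_i - c/2), but the fitted probabilities are unaffected. At the maximum, each node's expected degree matches its observed degree.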


Adding Node and Edge Covariates

Covariates and their weights

Example:


Adding Latent (Hidden) Variables

Hypothesize that the nodes are embedded in a latent (hidden) space

The probability of a link is higher if nodes are closer in this space

Given a set of observed links, can we infer a set of “good locations”?

Old idea in social science, e.g., McFarland and Brown, “Social distance as metric…”, 1973

See also more recent word embedding methods


Latent Space Model

K-dimensional real-valued latent space vector for each node

Intuition:

• Embed nodes in a K-dimensional latent space, K much smaller than N

• Probability (or log-odds) of edge (i,j) decreases as i and j move farther apart

Hoff, Raftery, Handcock, JASA, 2002
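A sketch of the distance idea, in the spirit of Hoff, Raftery, and Handcock: the log-odds of an edge decrease linearly with Euclidean distance in the latent space. This omits covariates, treats `alpha` as a hypothetical intercept, and takes the positions as given rather than inferring them:

```python
import math

def latent_distance_prob(z_i, z_j, alpha):
    """Edge probability under a latent distance model:
    log-odds(y_ij = 1) = alpha - ||z_i - z_j|| (Euclidean distance in K dims)."""
    dist = math.dist(z_i, z_j)   # math.dist requires Python 3.8+
    return 1.0 / (1.0 + math.exp(-(alpha - dist)))
```

With alpha = 1, two nodes at distance 1 get probability 0.5, and nodes farther apart get strictly smaller probabilities.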


Figure from Hoff, Raftery, Handcock, 2002


Additive Latent Interactions

This model implies transitivity:

e.g., if A and B are close and B and C are close, then by the triangle inequality A and C are also close, so edge (A,C) has high probability

…but some relations are not transitive, e.g., “conflict”


Multiplicative Latent Interactions (Hoff, 2009)

f(z_i, z_j) = z_i' W z_j, where W is a K x K real-valued matrix (learned from the data)

Hoff (NIPS 2008) showed that for a diagonal W matrix (the latent eigenmodel) this model is a strict generalization of the distance model

For directed networks or rectangular matrices we can replace z_j with a separate vector v_j, yielding links to matrix factorization
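To see why the multiplicative form can capture non-transitive relations such as “conflict”, here is a hypothetical one-dimensional example with a negative W: actors on opposite sides of zero get a high score, so A-B and B-C are likely links while A-C is not. This is a pattern that no distance model can produce. The values are illustrative and not fitted to data:

```python
def bilinear_score(z_i, W, z_j):
    """Multiplicative latent interaction f(z_i, z_j) = z_i' W z_j."""
    K = len(z_i)
    return sum(z_i[k] * W[k][l] * z_j[l] for k in range(K) for l in range(K))

# Hypothetical 1-D "conflict" example: a negative W rewards opposite signs.
W = [[-1.0]]
z = {"A": [1.0], "B": [-1.0], "C": [1.0]}
scores = {pair: bilinear_score(z[pair[0]], W, z[pair[1]])
          for pair in [("A", "B"), ("B", "C"), ("A", "C")]}
# A-B and B-C score +1 (likely links); A-C scores -1 (unlikely)
```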


Building Blocks for Network Modeling

See also P. Hoff, Dyadic data analysis with amen, ArXiv, 2015

• Network density, e.g., g = log(p/(1-p))

• Row and column effects

• Edge covariates and regression weights

• K-dimensional latent vector per node, and a similarity function on latent vectors


Stochastic Block Model

Each node assumed to belong to 1 of K “stochastically equivalent” blocks

z vectors are K-dimensional indicators, e.g., z = [0, 0, 1, 0]

Within-block and between-block edge probabilities at block level, K x K matrix W

Nowicki and Snijders, 2001

(Figure from Goldenberg et al, 2010)

Example:

Interaction:

Nowicki and Snijders, 2001
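With one-hot z vectors, z_i' W z_j simply selects W[b_i][b_j], the edge probability between the blocks of i and j. A sketch of the generative process for an undirected graph, assuming W holds probabilities directly (no link function) and is symmetric; the helper name is hypothetical:

```python
import random

def sbm_sample(blocks, W, seed=0):
    """Sample a binary undirected graph from a stochastic block model.
    blocks[i] is node i's block index; W[k][l] is the edge probability between
    blocks k and l. With one-hot z_i, z_i' W z_j = W[blocks[i]][blocks[j]]."""
    rng = random.Random(seed)
    n = len(blocks)
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W[blocks[i]][blocks[j]]:
                Y[i][j] = Y[j][i] = 1
    return Y
```

With W = [[1, 0], [0, 1]] and blocks [0, 0, 1, 1], every within-block pair is linked and no between-block pair is.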


Binary Relational Feature Model

Each node can “turn on” any subset of K binary features (latent)

z vectors are K-dimensional binary vectors, e.g., z = [0, 0, 1, 1]

K x K weight matrix W captures feature interactions

Miller, Griffiths, Jordan, NIPS 2009

[Figure: actors and their hidden features]

Presence of an edge between actor i and actor j is (e.g.) a logistic function of z_i' W z_j, a weighted sum over pairs of features held by i and j

Estimation: based on MCMC or variational EM


Example:

Interaction:

(Originally proposed as an infinite-dimensional nonparametric model)
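A sketch of the edge probability in the binary feature model, assuming a logistic link and no bias term (the fitted model may include one). Only entries W[k][l] where feature k is on for i and feature l is on for j contribute, so cross-feature interactions are allowed, not just shared features:

```python
import math

def feature_edge_prob(z_i, z_j, W):
    """P(y_ij = 1) = sigmoid(z_i' W z_j) for binary feature vectors z_i, z_j."""
    K = len(z_i)
    # only pairs (k, l) with feature k on for i and feature l on for j contribute
    s = sum(W[k][l] for k in range(K) for l in range(K) if z_i[k] and z_j[l])
    return 1.0 / (1.0 + math.exp(-s))
```

Two actors with no active features get probability 0.5 under this bias-free sketch.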


Predictions on NIPS Coauthorship Data

From Miller, Griffiths, Jordan, 2009


Other Models

Mixed membership stochastic blockmodel (MMSB), Airoldi et al, 2008

Each node: a probability vector z_i over K possible groups

W is a matrix of Bernoulli probabilities
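In an MMSB-style generative process, each pair draws a sender group g from z_i and a receiver group h from z_j, then an edge from Bernoulli(W[g][h]). Summing over the group draws gives the marginal edge probability z_i' W z_j. A minimal sketch of that marginal; the function name is hypothetical:

```python
def mmsb_edge_prob(z_i, z_j, W):
    """Marginal P(y_ij = 1): sender i draws group g ~ z_i, receiver j draws
    group h ~ z_j, then y_ij ~ Bernoulli(W[g][h]); summing out g and h."""
    K = len(z_i)
    return sum(z_i[g] * W[g][h] * z_j[h] for g in range(K) for h in range(K))
```

With one-hot z vectors this reduces to a single entry of W, which recovers the stochastic block model.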

Relational topic model, Chang and Blei 2009

For modeling linked documents, e.g., via citations

Each node = document = K-dimensional topic probability vector

Various possible combination functions to reflect topic similarity


General Formulation

• Network density, e.g., g = log(p/(1-p))

• Row and column effects

• Edge covariates and regression weights

• K-dimensional latent vector per node, and a similarity function on latent vectors
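The building blocks above can be combined into a single linear predictor. A sketch assuming a logit link, with hypothetical names: `density_logit` for the density term, `row`/`col` for row and column effects, `beta`/`x` for edge covariates and weights, and `sim` for the similarity function on latent vectors. With all other terms zero, density_logit = log(p/(1-p)) reproduces network density p:

```python
import math

def general_edge_prob(i, j, density_logit, row, col, beta, x, z, sim):
    """Hypothetical logit-link instance of the general formulation:
    logit P(y_ij = 1) = density_logit + row[i] + col[j]
                        + sum_k beta[k] * x[(i, j)][k] + sim(z[i], z[j])."""
    eta = density_logit + row[i] + col[j]                # density, row/column effects
    eta += sum(b * v for b, v in zip(beta, x[(i, j)]))    # edge covariates and weights
    eta += sim(z[i], z[j])                                # similarity of latent vectors
    return 1.0 / (1.0 + math.exp(-eta))

def dot(u, v):
    # one possible similarity function on latent vectors
    return sum(p * q for p, q in zip(u, v))
```

Swapping `dot` for a negative distance, or for z_i' W z_j, gives the latent-space and multiplicative variants.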


Scalability

• The O(N^2) term in the likelihood is problematic for scalability

• However, there is hope

– In most real-world social networks the number of edges scales roughly as O(N), not O(N^2)

…but the number of non-edges still scales as O(N^2)

• This suggests factoring the likelihood into 2 pieces

– A product over edges, with O(N) terms

– A product over non-edges, with O(N^2) terms that we approximate with O(N) terms

– This idea has been discovered (and rediscovered) several times


Approximating the Log-Likelihood

Can approximate the sum over non-edges with O(N) randomly sampled non-edges, reweighted to account for the sampling

See Raftery et al, 2012, J. Computational and Graphical Statistics

This idea can also be combined with stochastic gradient methods
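A simplified sketch of the sampled-non-edge idea, not the exact scheme of Raftery et al. The edge term is computed exactly. The non-edge term is estimated from m non-edges drawn uniformly (with replacement, by rejection) and scaled up by the total number of non-edges, which gives an unbiased estimate without ever enumerating all O(N^2) pairs. The callables `log_p` and `log_1mp` are hypothetical stand-ins for any model's log P(y_ij = 1) and log P(y_ij = 0):

```python
import math
import random

def approx_loglik(n, edges, log_p, log_1mp, m, seed=0):
    """Estimate an undirected network log-likelihood: exact sum over edges plus
    a scaled-up sample of m uniformly drawn non-edges (assumes non-edges exist)."""
    rng = random.Random(seed)
    edge_set = {(min(i, j), max(i, j)) for i, j in edges}
    edge_term = sum(log_p(i, j) for i, j in edge_set)        # exact, O(#edges)
    n_non_edges = n * (n - 1) // 2 - len(edge_set)
    total, drawn = 0.0, 0
    while drawn < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        pair = (min(i, j), max(i, j))
        if pair in edge_set:
            continue                                          # reject edges
        total += log_1mp(*pair)
        drawn += 1
    return edge_term + n_non_edges * total / m                # unbiased for the full sum
```

In a sparse graph almost every random pair is a non-edge, so rejection sampling costs about O(m) draws.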


Stochastic Variational Inference: a-MMSB model

From Gopalan et al, 2012

Red: stochastic gradient with mini-batches; blue: conventional batch gradient algorithm


Variations and Extensions

• Sender and receiver effects

– Latent vectors for sender and receiver roles can be different

• Rectangular matrices, bipartite graphs

– rows and columns each get their own latent vectors

• Multi-way arrays and tensors

• Bayesian estimation

– Fully Bayesian methods: infer posterior locations in latent space

– MAP and regularized variations: enforce sparsity in solutions

• Non-linear “deep” models

– Could incorporate non-linearities in various ways


Dynamic Networks: Adding Time


Networks over Time

• Many network problems are dynamic rather than static

– e.g., social relationships are changing over time

– instantaneous communication events (emails, phone calls)

• Edges, nodes, and covariates may all be evolving over time

– We will assume node set is fixed and edges and covariates may change

– Systematic temporal effects are often important (time of day, day of week, seasonality)

• Different ways to define networks over time

– Snapshots at time t

– Aggregation over time windows

– Continuous time models


Discrete-Time Models

Y_t represents the network at discrete time t

Data D = {Y_1, …, Y_t, …, Y_T}

Example

actors = students in a school

Y_t = friendships between students measured in month t, t = 1, …, 12

Interest is often in network dynamics and evolution

e.g., Markov models for P(Y_{t+1} | Y_t)

(See work of Tom Snijders, Eric Xing, and others)
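The simplest possible Markov model for P(Y_{t+1} | Y_t) treats each pair's edge as an independent two-state chain with a shared birth probability P(1 | 0) and death probability P(0 | 1). This is far cruder than the models cited above, but it makes the idea concrete. A sketch of the pooled maximum likelihood estimates; the function name is hypothetical:

```python
def edge_transition_rates(snapshots):
    """Pooled MLEs of birth P(y_{t+1} = 1 | y_t = 0) and death P(y_{t+1} = 0 | y_t = 1)
    for an edge-wise two-state Markov chain. snapshots: list of symmetric 0/1 matrices.
    Assumes at least one absent and one present edge occur across the transitions."""
    births = zeros = deaths = ones = 0
    for prev, nxt in zip(snapshots, snapshots[1:]):
        n = len(prev)
        for i in range(n):
            for j in range(i + 1, n):
                if prev[i][j] == 0:
                    zeros += 1
                    births += nxt[i][j]          # edge appeared
                else:
                    ones += 1
                    deaths += 1 - nxt[i][j]      # edge disappeared
    return births / zeros, deaths / ones
```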


Figure from Carter Butts


General Formulation

In principle we can add time-dependence to any or all terms

One approach is to make the z’s time-dependent

i.e., allow the latent features of each actor to change over time

Example: linear Gaussian dynamics in z-space

- Sarkar and Moore (2005) for actors’ latent-space positions

- Fu, Song, and Xing (2009) for actors’ mixed membership vectors
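The simplest linear Gaussian dynamics in z-space is a random walk, z_i(t+1) = z_i(t) + Gaussian noise. This is an identity transition matrix with isotropic noise. The cited models differ in their details, so treat this only as an illustration of the idea. A sketch that simulates latent positions over time; the function name and `sigma` are hypothetical:

```python
import random

def simulate_latent_walk(z0, T, sigma, seed=0):
    """Gaussian random walk for latent positions: z_i(t+1) = z_i(t) + N(0, sigma^2 I).
    z0: list of K-dim positions (one per actor). Returns a list of T snapshots."""
    rng = random.Random(seed)
    path = [[list(z) for z in z0]]
    for _ in range(T - 1):
        prev = path[-1]
        path.append([[c + rng.gauss(0.0, sigma) for c in z] for z in prev])
    return path
```

Feeding each snapshot into a latent-space edge probability gives a discrete-time dynamic network model.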


Dynamic Relational Binary Feature Model

Recall for the static version:

• z_i = K-dimensional binary vector, e.g., (1, 0, 1, 0, 1)

• f(z_i, z_j) = z_i' W z_j, where W is a K x K matrix

• Common set of K features across all actors

Foulds, Asuncion, DuBois, Butts, Smyth 2011

Dynamic version (Dynamic Relational Features)

• Assume discrete time

• The kth feature for actor i, z_ik(t), is a binary hidden Markov process

• Features can turn on, persist, or turn off at each time step

• For the infinite version, new features can be born over time

• Inference via MCMC – tricky, but works


[Figure: hidden features of each actor over time]

Presence of edge (i,j) at time t depends on the interaction of actor i’s and actor j’s feature vectors at time t
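A sketch of the per-feature binary Markov dynamics, assuming a fixed number of features and fixed, hypothetical probabilities shared by all features: an off feature turns on with probability `p_on`, and an on feature persists with probability `p_stay`. In the actual model these are learned and new features can be born; neither is done here:

```python
import random

def simulate_feature_chains(z0, T, p_on, p_stay, seed=0):
    """Simulate binary latent features z_ik(t) for t = 0..T-1. Each feature of each
    actor is an independent two-state Markov chain: an off feature turns on with
    probability p_on; an on feature persists with probability p_stay (else turns off).
    z0: list of per-actor binary feature lists at time 0."""
    rng = random.Random(seed)
    path = [[list(row) for row in z0]]
    for _ in range(T - 1):
        prev = path[-1]
        path.append([[1 if rng.random() < (p_stay if bit else p_on) else 0
                      for bit in row] for row in prev])
    return path
```

Each simulated snapshot can be plugged into sigmoid(z_i(t)' W z_j(t)) to get time-t edge probabilities.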


Example of DRIFT Predictions on Enron


Continuous-Time Data and Models

Relational events: ⟨i, j, t⟩

y_t is an edge between some pair i and j at time t

Birth-death edges: each y_t has start and end times

Instantaneous edges: each y_t is (effectively) instantaneous

• Data D = {y_1, …, y_t, …, y_T}

In a certain sense there is no graph!

Example

actors = students in a school

y_t = text message between 2 students at time t

Interest is often in rates and patterns of communication

e.g., Poisson rates for y_ij given network history up to time t


Multinomial Models for Relational Events

• Let λ_ij be the rates of Poisson processes for each pair of nodes (i, j) in a network

• Assume for simplicity that these processes are conditionally independent given model parameters

• We can decompose the network process into

– A global rate λ = Σ_ij λ_ij, which generates events globally

– A choice process: given an event, which pair generated it, i.e., P((i, j) | event) = λ_ij / λ
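The decomposition translates directly into a simulator for homogeneous rates. Inter-event times are exponential with the global rate, and each event's pair is chosen with probability proportional to its own rate. A sketch with a hypothetical dictionary of pair rates:

```python
import random

def simulate_events(rates, t_end, seed=0):
    """Simulate relational events (i, j, t) on [0, t_end] from independent homogeneous
    Poisson processes with rates[(i, j)] = lambda_ij, via global rate + choice process."""
    rng = random.Random(seed)
    pairs = list(rates)
    weights = [rates[p] for p in pairs]
    total = sum(weights)                                   # global rate lambda
    events, t = [], 0.0
    while True:
        t += rng.expovariate(total)                        # inter-event time ~ Exp(lambda)
        if t > t_end:
            return events
        i, j = rng.choices(pairs, weights=weights)[0]      # pair w.p. lambda_ij / lambda
        events.append((i, j, t))
```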


Marginal Product Mixture Model (DuBois and Smyth, 2010)

Multinomial over N^2 possible edges; mixture over K unobserved groups

• Distribution over senders for group k

• Distribution over receivers for group k

• Marginal probability of group k

Edge events (rather than nodes) belong to latent groups (unlike MMSB)

Straightforward to learn via EM or collapsed Gibbs sampling
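A minimal EM sketch for a mixture of this shape, assuming P(i → j) = Σ_k pi[k] · send[k][i] · recv[k][j]. The names `pi`, `send`, and `recv` are hypothetical labels for the group probabilities and the per-group sender and receiver distributions. The E-step splits each pair's count across groups; the M-step renormalizes the expected counts. Only pairs with non-zero counts are visited:

```python
import math
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def mixture_em(counts, n, K, iters=50, seed=0):
    """EM for P(i -> j) = sum_k pi[k] * send[k][i] * recv[k][j].
    counts: dict {(i, j): number of events}. Returns params and the per-iteration
    multinomial log-likelihood (up to a constant), which EM never decreases."""
    rng = random.Random(seed)
    pi = [1.0 / K] * K
    send = [normalize([rng.random() + 0.5 for _ in range(n)]) for _ in range(K)]
    recv = [normalize([rng.random() + 0.5 for _ in range(n)]) for _ in range(K)]
    history = []
    for _ in range(iters):
        pi_acc = [0.0] * K
        send_acc = [[0.0] * n for _ in range(K)]
        recv_acc = [[0.0] * n for _ in range(K)]
        loglik = 0.0
        for (i, j), c in counts.items():
            w = [pi[k] * send[k][i] * recv[k][j] for k in range(K)]
            p = sum(w)
            loglik += c * math.log(p)
            for k in range(K):
                r = c * w[k] / p            # E-step: expected count of (i, j) in group k
                pi_acc[k] += r
                send_acc[k][i] += r
                recv_acc[k][j] += r
        history.append(loglik)
        pi = normalize(pi_acc)              # M-step: renormalize expected counts
        send = [normalize(row) for row in send_acc]
        recv = [normalize(row) for row in recv_acc]
    return pi, send, recv, history
```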


Likelihood (DuBois and Smyth, 2010)

Written as a product over events, or equivalently as a product over pairs with non-zero counts

For large sparse networks, the number of pairs with non-zero counts is ≪ N^2

Similar to use of multinomial versus Bernoulli models for text
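The two forms of the likelihood agree because repeated events on the same pair can be grouped into a count exponent. Pairs with zero counts contribute nothing, so the cost scales with the number of distinct observed pairs. A sketch covering only the multinomial choice part, with a hypothetical dictionary `prob` of pair probabilities:

```python
import math
from collections import Counter

def loglik_events(events, prob):
    """Product over events: sum of log P(i, j) for each observed event."""
    return sum(math.log(prob[(i, j)]) for i, j in events)

def loglik_counts(events, prob):
    """Product over pairs with non-zero counts: sum of n_ij * log P(i, j)."""
    counts = Counter(events)
    return sum(n_ij * math.log(prob[pair]) for pair, n_ij in counts.items())
```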


Application to Email Data: 200,000 email messages among 3000 individuals (data from Eckmann, Moses, Sergi, 2004)

Most likely Edge Assignments by Group

Figures from DuBois and Smyth, 2010


International Relations Data: 40,000 events, 2700 actors, 171 action types

(King, 2003)


Prediction and Evaluation

• Use future data to evaluate predictive power and compare models

– e.g., predict network at time t+1 given network up to time t

• Metrics

– Log score = log probability of events that actually occurred

– Brier/MSE style scores

– Ranking/ROC scores
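The three families of metrics can be sketched for binary predictions (e.g., whether each pair communicates in the next window). In this sketch the log score is summed, the Brier score is averaged, and AUC is computed by direct pairwise comparison (ties count 1/2). The function names are hypothetical:

```python
import math

def log_score(probs, outcomes):
    """Sum of log predicted probabilities of what actually happened (0/1 outcomes)."""
    return sum(math.log(p if y else 1.0 - p) for p, y in zip(probs, outcomes))

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def roc_auc(probs, outcomes):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for p, y in zip(probs, outcomes) if y]
    neg = [p for p, y in zip(probs, outcomes) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is O(#positives × #negatives), which is fine for a sketch but slow for large evaluations.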


Simple Baseline for Comparison

• We could predict the probability of i and j communicating directly from i and j’s history

– Multinomial with O(N2) entries

– Can use smoothing to combat sparsity

• Problems

– Data can be extremely sparse for large N – smoothing is non-informative, and does not “borrow strength” from the graph

• Nonetheless this is a useful baseline when evaluating predictions

– Historically, few papers evaluate models predictively

– Even fewer compare their models to simple baselines
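A sketch of such a baseline, assuming a multinomial over the N(N-1) ordered pairs with additive smoothing. The pseudo-count `alpha` is a hypothetical choice. Every unseen pair receives the same probability, which is exactly the failure to “borrow strength” from the graph:

```python
def smoothed_pair_probs(counts, n, alpha=0.5):
    """Baseline multinomial over ordered pairs (i != j) with additive smoothing:
    P(i, j) = (n_ij + alpha) / (total + alpha * N(N-1)). Returns a function of (i, j)."""
    total = sum(counts.values())
    denom = total + alpha * n * (n - 1)
    def prob(i, j):
        return (counts.get((i, j), 0) + alpha) / denom
    return prob
```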


From DuBois and Smyth, 2010



Relational Event Model

• Time-varying Poisson rate for edge i,j, composed of:

– a base rate

– sender and receiver effects

– a p-dim vector of regression parameters and a p-dim vector of historical statistics on edge i,j

Butts, 2009

• Edge rates are time-varying functions of historical features

• Because the features change only when an event occurs, the result is a piecewise-constant (between events) Poisson process

• Features can include conversational effects, recency, persistence, etc.
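The rate equation on these slides did not survive extraction. A minimal sketch, assuming the common log-linear form rate_ij(t) = exp(base + sender_i + receiver_j + beta · x_ij(t)). The two statistics below (a persistence count and a "reply" indicator) are hypothetical examples of the historical features the slide mentions, not the features used in the cited work.

```python
import math
from typing import List, Sequence, Tuple

def history_features(history: List[Tuple[int, int]], i: int, j: int) -> List[float]:
    """Hypothetical p = 2 statistics for edge (i, j), computed from past events.

    x[0] (persistence): log(1 + number of past i -> j events)
    x[1] (conversation/reply): 1.0 if the most recent event was j -> i, else 0.0
    """
    n_ij = sum(1 for event in history if event == (i, j))
    reply = 1.0 if history and history[-1] == (j, i) else 0.0
    return [math.log1p(n_ij), reply]

def edge_rate(history: List[Tuple[int, int]], i: int, j: int, base: float,
              sender: Sequence[float], receiver: Sequence[float],
              beta: Sequence[float]) -> float:
    """Assumed log-linear rate: exp(base + sender[i] + receiver[j] + beta . x_ij)."""
    x = history_features(history, i, j)
    eta = base + sender[i] + receiver[j] + sum(b * xk for b, xk in zip(beta, x))
    return math.exp(eta)

# The rate only changes when `history` gains an event, so it is piecewise constant.
history = [(0, 1), (1, 0)]
rate_01 = edge_rate(history, 0, 1, base=-1.0, sender=[0.5, 0.0, 0.0],
                    receiver=[0.0, 0.2, 0.0], beta=[1.0, 2.0])
print(rate_01)
```

Here node 1 has just sent to node 0, so the reply feature for edge (0, 1) is switched on and its rate rises until the next event arrives.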


Parameter Estimation

• Likelihood includes terms for all events that occurred and all events that did not occur, for all inter-event times

– Computation of likelihood is O(T N^2), where T = number of events and N = number of nodes

– Some computational tricks possible to improve scalability

– See Vu et al (ICML 2011, NIPS 2011) for extensions to large social networks and citation networks

• Can use point estimates (optimization) or Bayesian inference (MCMC)
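A minimal sketch of the brute-force likelihood described above, assuming observation starts at time 0 and ends at the last event (no trailing survival term), and that `rate(history, i, j)` depends only on events before the current one. Each event contributes the log-rate of the dyad that fired, minus the total rate of all N(N-1) dyads times the inter-event gap; that double loop is the O(T N^2) cost the slide refers to. The names are hypothetical and none of the scalability tricks are applied.

```python
import math
from typing import Callable, List, Tuple

Event = Tuple[float, int, int]                      # (time, sender, receiver)
RateFn = Callable[[List[Tuple[int, int]], int, int], float]

def rem_log_likelihood(events: List[Event], n_nodes: int, rate: RateFn) -> float:
    """Naive O(T N^2) log-likelihood of a relational event sequence.

    Rates are held constant between events (piecewise-constant Poisson process).
    """
    history: List[Tuple[int, int]] = []
    t_prev = 0.0
    loglik = 0.0
    for t, s, r in events:
        dt = t - t_prev
        # Non-events: every dyad could have fired during (t_prev, t].
        total_rate = sum(rate(history, i, j)
                         for i in range(n_nodes) for j in range(n_nodes) if i != j)
        loglik -= total_rate * dt
        # Event: log-rate of the dyad that actually fired at time t.
        loglik += math.log(rate(history, s, r))
        history.append((s, r))
        t_prev = t
    return loglik

# Sanity check with a constant rate for every dyad.
const_rate: RateFn = lambda history, i, j: 0.5
events = [(1.0, 0, 1), (2.5, 1, 0), (4.0, 0, 2)]
print(rem_log_likelihood(events, 3, const_rate))
```

With a constant rate c the expression collapses to T log c - N(N-1) c t_T, which makes a convenient check; a point estimate can then be obtained by handing this function (with a parameterized rate) to any numerical optimizer.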


Applications?

• Modeling classroom interactions in education [DuBois, Butts, McFarland, Smyth, J Math Psych, 2013]

• Understanding and predicting citation patterns among documents [Vu et al, NIPS 2011, ICML 2011; Foulds and Smyth, EMNLP 2013]

• Modeling communication patterns among individuals [DuBois, Smyth, KDD 2010; Foulds et al, AI Stats 2011]

• Clustering individuals in email networks over time [Navaroli, DuBois, Smyth, MLJ, 2013]


Modeling Cascades

• Given a structural network with binary directed/undirected edges

• A cascade is a sequence of “node infections” (possibly with time-stamps)

– E.g., a post that spreads on a network such as Facebook or LinkedIn

• We observe a set of cascades, e.g.,

{A, B, E}, {B, A, D, F}, {A, B, C, E, F}, …

• Given the cascades, make inferences about the “infection process”

[Figure: example structural network on nodes A, B, C, D, E, F]
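To make the "infection process" concrete, here is a minimal generative sketch that produces cascades like the sets listed above. It assumes an independent-cascade-style process (each newly infected node gets one chance to infect each uninfected neighbor) with heterogeneous per-edge probabilities; the edges of the A to F network below are hypothetical, since the figure's edges are not recoverable.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple

def simulate_cascade(adj: Dict[str, List[str]], p_edge: Dict[Tuple[str, str], float],
                     seed_node: str, rng: Optional[random.Random] = None) -> List[str]:
    """Simulate one cascade; returns nodes in the order they became infected.

    Independent-cascade style (an assumption): each newly infected node u gets a
    single chance to infect each uninfected neighbor v, succeeding w.p. p_edge[(u, v)].
    """
    rng = rng if rng is not None else random.Random()
    order = [seed_node]
    infected = {seed_node}
    frontier = deque([seed_node])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in infected and rng.random() < p_edge.get((u, v), 0.0):
                infected.add(v)
                order.append(v)
                frontier.append(v)
    return order

# Hypothetical directed edges among nodes A..F (not the figure's actual edges).
adj = {"A": ["B", "C"], "B": ["A", "D", "E"], "C": ["F"],
       "D": [], "E": ["F"], "F": []}
p_half = {(u, v): 0.5 for u, nbrs in adj.items() for v in nbrs}
rng = random.Random(0)
cascades = [simulate_cascade(adj, p_half, rng.choice(sorted(adj)), rng) for _ in range(3)]
print(cascades)
```

Inference runs in the opposite direction: given many observed cascades, estimate the per-edge probabilities (or rates) that make them likely.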


Prior Work

• Ideas based on epidemics in networks

– Analyze how infection spreads as a function of network structure

• e.g., work by Kempe, Kleinberg, Newman, and many others

– Typically assume a single homogeneous infection rate β

– Typically do not address learning from data

• Statistical models (more recent)

– Define a generative model (i.e., likelihood) for cascades on a network

– Example

• Assume cascades are independent

• Assume heterogeneous infection rates for different edges

• Define a probabilistic model of how the infection spreads to the next node

– Learn parameters, e.g., a matrix of infection rates β

(see work by Manuel Gomez-Rodriguez and colleagues)
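A minimal sketch of a per-cascade likelihood under heterogeneous edge rates, assuming an exponential transmission model in the spirit of the continuous-time models from Gomez-Rodriguez and colleagues (the exact form here is our assumption, not taken from the slides). An infected node i contributes the survival of every earlier-infected node's attempt up to t_i plus the log of its total hazard at t_i; an uninfected node contributes survival up to the observation horizon. Because cascades are assumed independent, the total log-likelihood is the sum over cascades, and learning the rate matrix means maximizing that sum.

```python
import math
from typing import Dict, Iterable, Tuple

def cascade_log_likelihood(times: Dict[str, float], nodes: Iterable[str],
                           beta: Dict[Tuple[str, str], float], horizon: float) -> float:
    """Log-likelihood of one time-stamped cascade under an assumed exponential
    transmission model with edge rates beta[(j, i)] (missing pairs have rate 0).

    times: infection time of each infected node (all <= horizon);
    nodes not in `times` were still uninfected at `horizon`.
    """
    t0 = min(times.values())
    loglik = 0.0
    for i in nodes:
        if i in times:
            ti = times[i]
            if ti == t0:
                continue  # seed node(s): nothing earlier to explain them
            parents = [j for j, tj in times.items() if tj < ti]
            hazard = sum(beta.get((j, i), 0.0) for j in parents)
            if hazard == 0.0:
                return float("-inf")  # infection impossible under these rates
            # each earlier node's attempt survived until t_i ...
            loglik -= sum(beta.get((j, i), 0.0) * (ti - times[j]) for j in parents)
            # ... and one of them succeeded at t_i
            loglik += math.log(hazard)
        else:
            # i survived every infected node's attempts up to the horizon
            loglik -= sum(beta.get((j, i), 0.0) * (horizon - tj)
                          for j, tj in times.items())
    return loglik

# Hypothetical two-node example: A infects B at rate 2.0.
rates = {("A", "B"): 2.0}
print(cascade_log_likelihood({"A": 0.0, "B": 0.5}, ["A", "B"], rates, horizon=1.0))
print(cascade_log_likelihood({"A": 0.0}, ["A", "B"], rates, horizon=1.0))
```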


Summary

• Static networks

– Statistical models can be built up from basic building blocks

– Latent representations (“node embeddings”) can be broadly useful

• Dynamic networks

– Modeling networks over time can be more straightforward than the static case

– More natural representation of the underlying data

– Notion of prediction is clearer

– Can build these models using same building blocks as for static networks

• Scalability of the learning algorithms is a general issue, but promising approaches are emerging