A Statistical Framework for Streaming Graph Analysis
James Fairbanks, David Ediger, Rob McColl, David A. Bader, Eric Gilbert

stingergraph.com/data/uploads/asonam2013_slides.pdf

Problem

James Fairbanks, ASONAM 2013

In order to understand social media, we must understand the evolution of relationships in streaming data.

● How can we detect change?
● What is a significant change?
● Are two sets of vertices significantly different?
● How can we visualize 10,000+ vertices?
● Which vertices look anomalous?

We look to statistical analysis for guidance on these questions.

Challenge

● Graph data is big, sparse, irregular, messy, and high-dimensional

● Statistics works best on dense, regular, clean, low-dimensional data

Solution

● Use graph-theoretic computations to embed the graph in a low-dimensional space
● The embedding is not topology-preserving
● Do machine learning and statistics in Euclidean space

[Figure: Wikipedia articles, scatter plot of vertices in feature space]

Related Work

● Tracking Earthquakes [Sakaki, et al., 2010]

● Rumors about Earthquakes [Mendoza, et al., 2010]

● London Riots and Hashtags [Glasgow and Fink, 2013]

● Streaming Clustering Coefficient [Ediger, Riedy, et al., 2011]

● Atlanta Floods, H1N1 [Ediger, et al., 2010]

● Dynamic Visual Analysis [Federico, et al., 2012]

Definitions

Vertex [Features, Metrics, Statistics]

A vertex statistic associates a number with each vertex at each time step.

Graph Kernels

Computational subroutines that compute vertex features or maintain a data structure on top of the graph

Examples

Vertex Features
● Degree
● Size of connected component
● Geodesic distance
● Local clustering coefficient
● PageRank
● Betweenness centrality

Graph Kernels
● Counting neighbors
● Shiloach-Vishkin connected components
● Breadth-first search (BFS)
● Counting neighborhood intersection
● Power iteration
● Brandes 2001
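As a rough illustration of how a graph kernel yields a vertex feature, the sketch below computes two of the listed features: degree (by counting neighbors) and connected-component size. For brevity it uses a sequential union-find in place of the parallel Shiloach-Vishkin algorithm named on the slide. The function names and the edge-list input are hypothetical, not STINGER's API.

```python
from collections import defaultdict

def build_adj(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree(adj):
    """Kernel 'counting neighbors' -> feature 'degree'."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def component_sizes(adj):
    """Feature 'size of connected component' via union-find
    (a sequential stand-in for Shiloach-Vishkin)."""
    parent = {v: v for v in adj}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, nbrs in adj.items():
        for w in nbrs:
            ru, rw = find(u), find(w)
            if ru != rw:
                parent[ru] = rw  # merge the two components
    size = defaultdict(int)
    for v in adj:
        size[find(v)] += 1
    return {v: size[find(v)] for v in adj}
```

For example, the edges (1, 2), (2, 3), (4, 5) give vertex 2 degree 2 and place vertices 1 to 3 in a component of size 3.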

STINGER

A high-performance, dynamic graph data structure with semantic and temporal properties

● Supports concurrent streaming data sources and analysis
● Scalable on shared-memory Intel x86 platforms and Cray XMT
● Open source and free (BSD License)
● http://www.stingergraph.com

Data Set

● Hurricane Sandy public Tweets [28 Oct to 12 Nov 2012]

● 1,238,109 mentions

● 662,575 unique users

● Batches of 10,000 updates

● Update interval: 1 batch represents ~3 hours of Tweets

photo credit: NASA Earth Observatory
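The streaming setup (fixed-size batches of mention edges, with a vertex statistic recorded after each batch) can be sketched as below. This is a toy stand-in, not STINGER: it assumes integer vertex ids, treats each mention as an undirected edge, collapses repeated mentions, and uses degree as the statistic. The name `stream_degrees` is hypothetical.

```python
import numpy as np

def stream_degrees(edges, n_vertices, batch_size=10_000):
    """Insert (u, v) mention edges in batches; after each batch, record every
    vertex's degree. Returns an array of shape (n_batches, n_vertices)."""
    neighbors = [set() for _ in range(n_vertices)]  # toy dynamic adjacency
    snapshots = []
    for start in range(0, len(edges), batch_size):
        for u, v in edges[start:start + batch_size]:
            if u != v:               # skip self-mentions
                neighbors[u].add(v)  # sets collapse repeated mentions
                neighbors[v].add(u)
        snapshots.append([len(nbrs) for nbrs in neighbors])
    return np.array(snapshots)
```

With the deck's batch size of 10,000 updates, each row of the result would correspond to roughly 3 hours of Tweets.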

Clustering Coefficient

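$$ C(v) = \frac{2\,\mathrm{tri}(v)}{d(v)\,\bigl(d(v)-1\bigr)} $$

● where tri(v) is the number of 3-cycles containing v and d(v) is the degree of v
● Measures how tightly knit the graph is at the local level [Watts and Strogatz, 1998]
● Compute in time …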
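A minimal sketch of the local clustering coefficient, assuming the standard Watts-Strogatz form C(v) = 2 tri(v) / (d(v)(d(v) - 1)) on an undirected simple graph, and the assumed convention C(v) = 0 when d(v) < 2. It counts triangles by neighborhood intersection, the kernel listed for this feature, and recomputes from scratch rather than updating incrementally. `local_clustering` is a hypothetical name.

```python
def local_clustering(neighbors):
    """neighbors: list of sets (undirected, no self-loops), indexed by vertex.
    Returns C(v) = 2*tri(v) / (d(v)*(d(v)-1)), or 0.0 when d(v) < 2."""
    coeffs = []
    for v, nbrs in enumerate(neighbors):
        d = len(nbrs)
        if d < 2:
            coeffs.append(0.0)
            continue
        # |N(v) & N(u)| counts common neighbors w of v and u; each triangle
        # {v, u, w} is seen twice (from u and from w), so this sum is 2*tri(v).
        two_tri = sum(len(nbrs & neighbors[u]) for u in nbrs)
        coeffs.append(two_tri / (d * (d - 1)))
    return coeffs
```

For a triangle 0-1-2 with a pendant vertex 3 attached to 0, vertex 0 has one triangle among three possible neighbor pairs, so C(0) = 1/3, while vertices 1 and 2 have C = 1.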

Clustering Coefficient

[Figure annotation: NJ Landfall]

● Counting vertices that have increasing or decreasing clustering coefficient

● Model as stochastic process for forecasting
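Counting vertices whose clustering coefficient rose or fell between consecutive batches reduces to the signs of first differences over time. A minimal sketch, assuming the coefficients are stored as a (batches by vertices) array; the `tol` threshold for ignoring tiny changes is a hypothetical addition.

```python
import numpy as np

def count_changes(cc, tol=0.0):
    """cc: array of shape (n_batches, n_vertices).
    Returns (n_up, n_down), each of length n_batches - 1: the number of
    vertices whose coefficient rose / fell by more than `tol` between
    consecutive batches."""
    diff = np.diff(np.asarray(cc, dtype=float), axis=0)
    return (diff > tol).sum(axis=1), (diff < -tol).sum(axis=1)
```

The resulting count series are the kind of quantity that could then be modeled as a stochastic process.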

Temporal Correlation

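● Defined for any quantity measuring strength of association

● For Pearson's correlation:

$$ \rho_t = \frac{\operatorname{cov}(x_t,\, x_{t+1})}{\sigma_{x_t}\,\sigma_{x_{t+1}}} $$

where x_t is the vector of a vertex statistic over all vertices at time step t

● Quantifies the strength of association between successive measurements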
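Assuming the Pearson version is taken across vertices, so that rho_t compares the vector of a statistic at batch t with the same vector at batch t + 1, it can be computed as below. That interpretation and the function name are assumptions. Note that `np.corrcoef` returns nan (with a warning) when a snapshot is constant across all vertices.

```python
import numpy as np

def temporal_correlation(x):
    """x: array of shape (n_batches, n_vertices).
    Returns rho[t] = Pearson correlation of x[t] and x[t + 1] across vertices."""
    x = np.asarray(x, dtype=float)
    return np.array([np.corrcoef(x[t], x[t + 1])[0, 1]
                     for t in range(len(x) - 1)])
```

Swapping in a rank-based measure (Spearman or Kendall) would fit the "any quantity measuring strength of association" framing equally well.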

Correlation Decay

● New edges change vertex statistics

● Correlation measures the forgetfulness of a vertex statistic

● A bigger graph means each batch of new edges has less impact
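One way to look at correlation decay, assuming a cross-vertex Pearson correlation, is to fix a reference batch and correlate it with every later batch; a statistic that "forgets" quickly shows a fast drop. The reference-batch formulation and the name `correlation_decay` are assumptions, not necessarily how the slides' plot was produced.

```python
import numpy as np

def correlation_decay(x, ref=0):
    """x: array of shape (n_batches, n_vertices).
    Returns the Pearson correlation (across vertices) between batch `ref`
    and each later batch ref + 1, ref + 2, ..."""
    x = np.asarray(x, dtype=float)
    return np.array([np.corrcoef(x[ref], x[t])[0, 1]
                     for t in range(ref + 1, len(x))])
```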

Derivatives

The centered discrete derivative of a vertex feature f at vertex v and time step t:

$$ D f_v(t) = \frac{f_v(t+1) - f_v(t-1)}{2} $$
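Assuming unit spacing between batches, the centered discrete derivative is (f_v(t+1) - f_v(t-1)) / 2. A vectorized sketch over a (batches by vertices) array follows; it returns values only for interior batches. The function name is hypothetical. `np.gradient(x, axis=0)` gives the same interior values and fills the two endpoints with one-sided differences.

```python
import numpy as np

def centered_derivative(x):
    """x: array of shape (n_batches, n_vertices).
    Returns (x[t + 1] - x[t - 1]) / 2 for t = 1 .. n_batches - 2,
    an array of shape (n_batches - 2, n_vertices)."""
    x = np.asarray(x, dtype=float)
    return (x[2:] - x[:-2]) / 2.0
```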

Anomaly Detection

In graphs:
● What is an anomalous vertex in a graph?
● In social media, who uses the service in a novel way?
● A vertex with edges that look different.

From statistics:
● Outlier: a point in a region of space with very low probability density.
● Points close to outliers in space are rare.
● If we can estimate the true density from a finite sample, then we can find outliers.
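The density-estimation view of outliers can be sketched with a Gaussian kernel density estimate: score each point by its average kernel similarity to all points, then flag the lowest-scoring fraction. This illustrates the statistical idea only; it is not the one-class SVM the slides use. The bandwidth of 0.5 is an arbitrary assumption, `kde_outliers` is a hypothetical name, and the O(n^2) pairwise distance matrix limits it to small samples.

```python
import numpy as np

def kde_outliers(points, bandwidth=0.5, frac=0.05):
    """Boolean mask flagging the `frac` of points with the lowest
    (unnormalized) Gaussian kernel density estimate."""
    x = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    # Average Gaussian kernel value per point; dropping the normalizing
    # constant does not change the ranking.
    density = np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1)
    k = max(1, int(round(frac * len(x))))
    mask = np.zeros(len(x), dtype=bool)
    mask[np.argsort(density)[:k]] = True  # the k lowest-density points
    return mask
```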

Outlier Detection Features

● Mean(CC)
● Var(CC)
● Mean(Deriv(CC))
● Var(Deriv(CC))

● Gaussian radial basis function
● Radius 0.3
● By choice, 5% of the data is labeled outlier
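Assuming each feature is aggregated per vertex over all batches, the four-dimensional feature vectors can be built from a (batches by vertices) clustering-coefficient array as below. The aggregation over time, the use of population variance, the centered-difference derivative, and the function name are assumptions; at least three batches are needed.

```python
import numpy as np

def outlier_features(cc):
    """cc: clustering coefficients, shape (n_batches, n_vertices), n_batches >= 3.
    Returns shape (n_vertices, 4):
    Mean(CC), Var(CC), Mean(Deriv(CC)), Var(Deriv(CC))."""
    cc = np.asarray(cc, dtype=float)
    deriv = (cc[2:] - cc[:-2]) / 2.0  # centered discrete derivative over time
    return np.column_stack([cc.mean(axis=0), cc.var(axis=0),
                            deriv.mean(axis=0), deriv.var(axis=0)])
```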

Outlier Detection

Used a one-class SVM because the features are multimodal
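A one-class SVM with a Gaussian RBF kernel and a 5% outlier fraction can be sketched with scikit-learn's `OneClassSVM`. Setting `nu=0.05` matches "5% labeled outlier" only approximately, since `nu` is an upper bound on the fraction of training points treated as outliers. How "radius 0.3" maps onto the kernel is not stated; this sketch assumes gamma = 1 / (2 * 0.3^2). Any feature scaling the authors applied is unknown, and `detect_outliers` is a hypothetical name.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def detect_outliers(features, radius=0.3, nu=0.05):
    """Boolean mask of rows labeled outliers by a one-class SVM
    with a Gaussian RBF kernel."""
    # Assumed mapping from an RBF width ("radius") to sklearn's gamma.
    gamma = 1.0 / (2.0 * radius ** 2)
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu)
    labels = model.fit_predict(np.asarray(features, dtype=float))  # +1 inlier, -1 outlier
    return labels == -1
```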

Validation

● Inlier and Outlier distributions differ

● Outliers more uniformly distributed

● Mixing in each scatter plot means all dimensions are necessary

Conclusions

● Separating the computation into a graph-algorithms phase followed by a machine-learning and statistics phase lets us leverage the best techniques from both fields.

● Applying multivariate outlier detection methods to streaming graphs reveals two distinct distributions of vertices.

● These feature-based methods enable dynamic visualization of much larger graphs than traditional two-dimensional embeddings.

Acknowledgment of Support

Future Work

Explore predictive ability in feature space

Signal Processing

Estimate periodicity. Filter out small deviations and trends.
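A minimal sketch of periodicity estimation for a single statistic sampled once per batch, assuming a least-squares linear detrend followed by picking the strongest nonzero frequency of the real FFT. These choices and the function name are assumptions, not the authors' plan; smoothing out small deviations (for example with a moving average) would be a separate step.

```python
import numpy as np

def dominant_period(series):
    """Estimate the dominant period, in batches, of a 1-D series."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)   # least-squares linear trend
    resid = y - (slope * t + intercept)      # remove the trend
    power = np.abs(np.fft.rfft(resid)) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(len(y))          # cycles per batch
    k = 1 + np.argmax(power[1:])             # skip the zero-frequency term
    return 1.0 / freqs[k]
```

For a linear trend plus a sine with an 8-batch period over 64 batches, the estimate is 8 batches; with 3-hour batches, a daily cycle would show up as a period of about 8.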