32
Log Dimension Hypothesis 1 The Logarithmic Dimension Hypothesis Anthony Bonato Ryerson University MITACS International Problem Solving Workshop July 2012

The Logarithmic Dimension Hypothesis

  • Upload
    kamin

  • View
    22

  • Download
    0

Embed Size (px)

DESCRIPTION

MITACS International Problem Solving Workshop July 2012. The Logarithmic Dimension Hypothesis. Anthony Bonato Ryerson University. Workshop team. David Gleich (Purdue) Dieter Mitsche (Ryerson) Stephen Young (UCSD) Myunghwan Kim (Stanford) Amanda Tian (York). Friendship networks. - PowerPoint PPT Presentation

Citation preview

Page 1: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 1

The Logarithmic Dimension Hypothesis

Anthony BonatoRyerson University

MITACS International Problem Solving Workshop

July 2012

Page 2: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 2

Workshop team• David Gleich (Purdue)

• Dieter Mitsche (Ryerson)

• Stephen Young (UCSD)

• Myunghwan Kim (Stanford)

• Amanda Tian (York)

Page 3: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 3

Friendship networks• network of friends (some real, some virtual) form

a large web of interconnected links

Page 4: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 4

6 degrees of separation

• (Stanley Milgram, 67): famous chain letter experiment

Page 5: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 5

6 Degrees in Facebook?• 900 million users, > 70

billion friendship links• (Backstrom et al., 2012)

– 4 degrees of separation in Facebook

– when considering another person in the world, a friend of your friend knows a friend of their friend, on average

• similar results for Twitter and other OSNs

Page 6: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 6

Complex Networks• web graph, social networks, biological networks, internet

networks, …

Page 7: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 7

The web graph

• nodes: web pages

• edges: links

• over 1 trillion nodes, with billions of nodes added each day

Page 8: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 8

On-line Social Networks (OSNs)Facebook, Twitter, LinkedIn, Google+…

Page 9: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 9

Key parameters• degree distribution:

• average distance:

• clustering coefficient:

|})deg(:)({| , iuGVuN ni

)(,

1

2),()(

GVvu

nvudGL

)(

1-1

)()( ,2

)deg(|))((| )(

GVxxcnGC

xxNExc

Page 10: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 10

Properties of Complex Networks• power law degree distribution

(Broder et al, 01)

2 some ,, bniN bni

Page 11: The Logarithmic Dimension Hypothesis

Power laws in OSNs (Mislove et al,07):

Log Dimension Hypothesis 11

Page 12: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 12

Small World Property• small world networks

(Watts & Strogatz,98)– low distances

• diam(G) = O(log n)• L(G) = O(loglog n)

– higher clustering coefficient than random graph with same expected degree

Page 13: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 13

Sample data: Flickr, YouTube, LiveJournal, Orkut

• (Mislove et al,07): short average distances and high clustering coefficients

Page 14: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 14

• (Zachary, 72)

• (Mason et al, 09)

• (Fortunato, 10)

• (Li, Peng, 11): small community property

Community structure

Page 15: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 15

(Leskovec, Kleinberg, Faloutsos,05):• densification power law: average degree is

increasing with time• decreasing distances

• (Kumar et al, 06): observed in Flickr, Yahoo! 360

Page 16: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 16

Geometry of OSNs?

• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)

• IDEA: embed OSN in 2-, 3- or higher dimensional space

Page 17: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 17

Dimension of an OSN• dimension of OSN: minimum number of

attributes needed to classify nodes

• like game of “20 Questions”: each question narrows range of possibilities

• what is a credible mathematical formula for the dimension of an OSN?

Page 18: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 18

Geometric model for OSNs• we consider a geometric

model of OSNs, where– nodes are in m-

dimensional Euclidean space

– threshold value variable: a function of ranking of nodes

Page 19: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 19

Geometric Protean (GEO-P) Model(Bonato, Janssen, Prałat, 12)

• parameters: α, β in (0,1), α+β < 1; positive integer m• nodes live in m-dimensional hypercube (torus metric)• each node is ranked 1,2, …, n by some function r

– 1 is best, n is worst – we use random initial ranking

• at each time-step, one new node v is born, one randomly node chosen dies (and ranking is updated)

• each existing node u has a region of influence with volume

• add edge uv if v is in the region of influence of u nr

Page 20: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 20

Notes on GEO-P model

• models uses both geometry and ranking• number of nodes is static: fixed at n

– order of OSNs at most number of people (roughly…)

• top ranked nodes have larger regions of influence

Page 21: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 21

Simulation with 5000 nodes

Page 22: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 22

Simulation with 5000 nodes

random geometric GEO-P

Page 23: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 23

Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)

• a.a.s. the GEO-P model generates graphs with the following properties:– power law degree distribution with exponent

b = 1+1/α– average degree d = (1+o(1))n(1-α-β)/21-α

• densification– diameter D = O(nβ/(1-α)m log2α/(1-α)m n)

• small world: constant order if m = Clog n– clustering coefficient larger than in comparable

random graph

Page 24: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 24

Spectral properties• the spectral gap λ of G is defined by the

difference between the two largest eigenvalues of the adjacency matrix of G

• for G(n,p) random graphs, λ tends to 0 as order grows

• in the GEO-P model, λ is close to 1• (Estrada, 06): bad spectral expansion in real

OSN data

Page 25: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 25

Dimension of OSNs

• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m

• gives formula for dimension of OSN:

Dn

nd

bb

Dnm

loglog

loglog

211

loglog

Page 26: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 26

6 Dimensions of Separation

OSN DimensionFacebook 7YouTube 6Twitter 4Flickr 4

Cyworld 7

Page 27: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 27

Uncovering the hidden reality• reverse engineering approach

– given network data (n, b, d, D), dimension of an OSN gives smallest number of attributes needed to identify users

• that is, given the graph structure, we can (theoretically) recover the social space

Page 28: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 28

Logarithmic Dimension Hypothesis

• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users OSN– theoretical evidence GEO-P and MAG

(Leskovec, Kim,12) models–empirical evidence? – (Sweeney, 2001)

Page 29: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 29

Experimental design• supervised machine learning

– Alternating Decision Trees (ADT)– approach of (Janssen et al, 12+) based on earlier

work on PIN by (Middendorf et al, 05)• classify OSN data vs simulated graphs from GEO-P

model in various dimensions• develop a feature vector (graphlets, degree distribution

percentiles, average distance, etc) to classify the correct dimension

• ADT will classify which dimension best fits the data – cross-validation and robustness testing

Page 30: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 30

Example

Page 31: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 31

• preprints, reprints, contact:search: “Anthony Bonato”

Page 32: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 32