xiaohan-zeng
Social Network Analysis: What it is, what we can learn from it, and how we can do it
Backstrom & Kleinberg, 2014
Predict your partner
1.3m users in a relationship, 379m people, 8.6b connections
Accuracy: 60%
Andris et al., PLoS ONE, 2015
What is it?
Social networks connect people
Social network
A social structure that represents the relationships between people
Social network
A graph model that represents the dyadic relationships between people
Nodes and Edges
Network
A graph model that represents the dyadic relationships between entities
Ahn et al., Sci. Rep., 2011
Weighted network
Links can have a weight
Directed network
Links can be directed
Rosvall & Bergstrom, PNAS, 2009
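The definitions above (nodes, edges, weights, direction) can be sketched as a plain adjacency map. This is a minimal illustration using only the Python standard library; the function name `build_graph` and the toy names and weights are assumptions made up for this example, not part of the deck.

```python
def build_graph(edges, directed=False):
    """Return an adjacency map {node: {neighbor: weight}} from (u, v, weight) triples."""
    adjacency = {}
    for u, v, weight in edges:
        adjacency.setdefault(u, {})[v] = weight
        # Make sure v appears as a node even if it has no outgoing links.
        adjacency.setdefault(v, {})
        if not directed:
            # An undirected link can be followed in both directions.
            adjacency[v][u] = weight
    return adjacency

# Hypothetical toy data: names and weights are invented for illustration.
edges = [("Ann", "Bob", 3), ("Bob", "Cat", 1)]
undirected = build_graph(edges)
directed = build_graph(edges, directed=True)
print(undirected["Bob"])  # {'Ann': 3, 'Cat': 1}
print(directed["Bob"])    # {'Cat': 1}
```

In the directed version Bob links to Cat but not back to Ann, which is the only difference between the two networks.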
Network = Graph
Network analysis = Graph theory
Network analysis = Graph theory + statistics + physics
What can we learn from it?
Six degrees of separation
80-20 rule of social popularity
Grouping: Group similar people together
Ranking: Find the most influential people
Six degrees of separation
Any person in the world needs at most six steps to reach any other person
How to measure how separated we are?
Diameter
Longest shortest path among all pairs
How many steps a person has to take to reach anyone
Average path length
Shortest path lengths averaged among all possible pairs
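Both measures can be computed with a breadth-first search from every node. The sketch below assumes an undirected, connected, unweighted network stored as an adjacency map; the function names and the four-node chain are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of steps from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def diameter_and_average_path_length(adjacency):
    """Assumes the network is undirected, connected, and unweighted."""
    all_distances = {node: shortest_path_lengths(adjacency, node) for node in adjacency}
    pair_lengths = [all_distances[u][v] for u, v in combinations(adjacency, 2)]
    # Diameter: the longest shortest path; average: the mean over all pairs.
    return max(pair_lengths), sum(pair_lengths) / len(pair_lengths)

# Hypothetical toy network: a chain 1 - 2 - 3 - 4.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
diameter, average = diameter_and_average_path_length(chain)
print(diameter, round(average, 2))
```

For the chain, the longest shortest path runs from node 1 to node 4, so the diameter is 3, while the average over the six pairs is smaller.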
The small-world network
Milgram’s experiment
Omaha & Wichita -> Boston
64 out of 296 reached the destination
Average path length 5.5
Watts’s experiment
60,000 email users to reach 18 targets in 13 countries
Dodds et al., Science, 2003
Back-of-the-envelope calculation
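The numbers on this slide did not survive, so the following is only one common way to do such a calculation, not necessarily the one presented. It assumes every person knows the same number of people and that circles of acquaintances never overlap, which overstates how fast the reach grows. The population of 7 billion and the 100 acquaintances per person are assumptions chosen for illustration.

```python
def steps_to_reach_everyone(population, acquaintances):
    """Smallest number of steps d such that acquaintances ** d >= population."""
    steps = 0
    reached = 1
    while reached < population:
        # Idealized growth: each step multiplies the reach by the number of acquaintances.
        reached *= acquaintances
        steps += 1
    return steps

# Assumed figures, not taken from the slides.
world_population = 7_000_000_000
acquaintances_per_person = 100
print(steps_to_reach_everyone(world_population, acquaintances_per_person))
```

Under these assumptions a handful of steps already covers the whole population, which is the intuition behind the small-world effect.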
Erdős number
Mathematicians’ co-publishing network
Average path length – 4.65
Bacon number
Actors’ co-starring network
Average path length – 3.65
(some mathematicians and papers that nobody cares about)
Be nice to the old lady slowly crossing the street
80-20 rule of social popularity
How to measure the popularity of the nodes?
Degree distribution
The probability distribution of the number of links of a node
N=10000, p=0.02
Degree distribution
http://en.wikipedia.org/wiki/Erdos-Renyi_model
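A random network of this kind can be generated by linking every pair of nodes independently with probability p. The sketch below uses only the standard library and a smaller N than the slide so that the pure-Python loop finishes quickly; the function names and the seed are assumptions for illustration.

```python
import random
from collections import Counter
from itertools import combinations

def random_network_degrees(n, p, seed=None):
    """Degrees of a random network in which every pair is linked with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            degree[u] += 1
            degree[v] += 1
    return degree

def degree_distribution(degrees):
    """Map each degree k to the fraction of nodes that have exactly k links."""
    counts = Counter(degrees)
    return {k: counts[k] / len(degrees) for k in sorted(counts)}

# Smaller than the slide's N=10000; p is the same.
degrees = random_network_degrees(n=500, p=0.02, seed=1)
distribution = degree_distribution(degrees)
mean_degree = sum(degrees) / len(degrees)
print("mean degree:", mean_degree, "expected:", (500 - 1) * 0.02)
print("largest degree:", max(degrees))
```

In this model the degrees cluster around the mean (N - 1) * p, and nodes with many times the average number of links are practically absent.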
What does a real network look like?
Mahadevan et al., SIGCOMM, 2007
Newman, SIAM Review, 2003
Power-law distribution
Theory vs Reality
Jeong et al., Nature, 2000
Power-law distribution
Long-tail distribution: nodes with an extremely large number of links have a non-trivial chance of appearing
Power-law distribution
80-20 rule of social networks: a minority of nodes hold a majority of links
What’s wrong with the random network model?
Barabási-Albert model
When a new node joins the network, it connects to popular existing nodes
The probability of connecting to an existing node is proportional to that node’s number of links
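The growth rule above can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the fully connected starting core, the parameter name `m` for the number of links each newcomer creates, and the seed are all assumptions.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each newcomer links to m distinct existing nodes,
    chosen with probability proportional to their current number of links."""
    rng = random.Random(seed)
    # Assumption: start from a fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in this list once per link, so a uniform draw from it
    # picks a node with probability proportional to its degree.
    link_ends = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(link_ends))
        for target in targets:
            edges.append((new_node, target))
            link_ends.extend((new_node, target))
    return edges

edges = preferential_attachment(n=2000, m=2, seed=1)
degree = Counter(node for edge in edges for node in edge)
print("median degree:", sorted(degree.values())[len(degree) // 2])
print("largest degree:", max(degree.values()))
```

Most nodes stay close to the minimum of m links, while a few early nodes accumulate far more, which is the long tail that the random network model lacks.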
Scale-free network
Matthew effect: the rich get richer
Links are the “wealth” in a social network
Like wealth, social popularity follows the 80-20 rule
Rank the nodes according to their importance
PageRank
Relative importance of nodes
Probability of opening a page when surfing the Internet
Probability of arriving at this page from other pages that link to it, or randomly opening this page
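The two ways of arriving at a page can be combined in a simple iteration. The sketch below is a minimal version under stated assumptions: the probability of following a link (`damping=0.85`) is a conventional value not given in the slides, every linked page must also appear as a key of `links`, and the four-page web is invented.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # With probability 1 - damping the surfer opens a page at random.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Otherwise the surfer follows one of the page's links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links: jump to any page at random.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

Page C ranks highest because three pages link to it, and page A ranks second because it receives all of C's outgoing weight even though only one page links to it.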
Group similar nodes together
Communities
Friends from middle school, college, and work
How do we group them?
Guimera & Amaral, Nature, 2005
How to measure the tendency to form communities?
Modularity
How well the network can be separated into modules
For every module, calculate the number of links within the module, minus its expectation
Compare to its expectation
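One standard way to turn this description into a number is the Newman-Girvan modularity, assumed here to be the definition the slides refer to: for each module, the fraction of links inside it minus the fraction expected if links were placed at random with node degrees kept fixed. The function name and the toy network of two triangles joined by one link are hypothetical.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """edges: list of undirected (u, v) links; module_of: node -> module label."""
    m = len(edges)
    links_within = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        total_degree[module_of[u]] += 1
        total_degree[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            links_within[module_of[u]] += 1
    # Observed fraction of links inside each module, minus the fraction expected
    # if links were rewired at random while keeping every node's degree.
    return sum(links_within[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Hypothetical toy network: two triangles joined by a single link.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_module = {node: "a" for node in range(1, 7)}
print(round(modularity(edges, two_modules), 3))
print(round(modularity(edges, one_module), 3))
```

Splitting the network into its two triangles scores higher than lumping all nodes together, which scores zero.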
Maximize modularity
NP-hard
Simulated annealing: slow, accurate
Louvain method: greedy, fast, may stop at a local optimum
Guimera et al., PNAS, 2005
Dynamics on networks
SIR model in NetLogo
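For readers without NetLogo, the same kind of dynamics can be sketched in Python. This is a simplified discrete-time version, not the NetLogo model itself: the function name, the per-step infection and recovery probabilities, and the ring network are assumptions, and the recovery probability must be positive for the loop to end.

```python
import random

def simulate_sir(adjacency, initially_infected, infection_prob, recovery_prob, seed=None):
    """Each node is susceptible (S), infected (I), or recovered (R)."""
    rng = random.Random(seed)
    state = {node: "S" for node in adjacency}
    for node in initially_infected:
        state[node] = "I"
    infected_over_time = []
    while any(s == "I" for s in state.values()):
        new_state = dict(state)
        for node, s in state.items():
            if s != "I":
                continue
            # An infected node may infect each susceptible neighbor...
            for neighbor in adjacency[node]:
                if state[neighbor] == "S" and rng.random() < infection_prob:
                    new_state[neighbor] = "I"
            # ...and may recover, after which it can no longer be infected.
            if rng.random() < recovery_prob:
                new_state[node] = "R"
        state = new_state
        infected_over_time.append(sum(1 for s in state.values() if s == "I"))
    return state, infected_over_time

# Hypothetical network: a ring of 20 nodes, with node 0 infected at the start.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
final_state, infected_over_time = simulate_sir(ring, [0], 0.5, 0.5, seed=1)
print("ever infected:", sum(1 for s in final_state.values() if s == "R"))
```

Changing the network structure while keeping the two probabilities fixed shows how much the shape of the network alone affects the size of an outbreak.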
Multiplex network
Facebook, Twitter, LinkedIn
Radicchi & Arenas, Nature Physics, 2014
De Domenico et al., PNAS, 2014
How do we do it?
Python + Spark
networkx, matplotlib, pyspark
Co-purchase network
Who bought the same deals as you did?
Data
User 1, Deal A
User 1, Deal B
User 1, Deal C
User 2, Deal A
User 3, Deal B
…
[Figure: users 1-4 linked to deals A-D, then collapsed into a network that connects users 1-4 directly]
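Turning the user-deal records into a co-purchase network amounts to linking every pair of users who bought the same deal. The sketch below does this in plain Python on the five rows listed on the data slide; the deck itself uses pyspark, so this is an illustration of the idea rather than the author's code, and weighting each link by the number of shared deals is an assumption.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_network(purchases):
    """purchases: iterable of (user, deal) pairs.
    Returns {(user_a, user_b): number of deals both users bought}."""
    buyers = defaultdict(set)
    for user, deal in purchases:
        buyers[deal].add(user)
    weight = defaultdict(int)
    for users in buyers.values():
        # Every pair of users who bought the same deal gets a link.
        for a, b in combinations(sorted(users), 2):
            weight[(a, b)] += 1
    return dict(weight)

purchases = [
    ("User 1", "Deal A"), ("User 1", "Deal B"), ("User 1", "Deal C"),
    ("User 2", "Deal A"), ("User 3", "Deal B"),
]
network = co_purchase_network(purchases)
print(network)
# {('User 1', 'User 2'): 1, ('User 1', 'User 3'): 1}
```

User 1 ends up linked to both other users, while Users 2 and 3 share no deal and stay unconnected.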
IPython notebook
Diameter: 6
Average path length: 2.57
Largest groups
Group 1: 395 females, 201 males; average age 42.4
Group 2: 228 females, 83 males; average age 38.9
Group 3: 51 females, 27 males; average age 34.1
Group 1 vs 2: p-value = 0.001
Group 1 vs 3: p-value < 0.001
Group 2 vs 3: p-value < 0.001
IPython notebook
Examples of networks: https://github.com/zengxiaohanzxh/networks-ipython.git
Groupon example: [email protected]:zengxiaohanzxh/networks.git
Xiaohan Zeng <[email protected]>
Quantum Lead, 6th floor near Fishbowl