Networkx & Gephi Tutorial #pydata Gilad Lotan | @gilgul

Slide deck from my presentation at NYC's #Pydata 2012 conference: http://nyc2012.pydata.org/abstracts/#gephi

Talk abstract: Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show ways in which we use the Python networkx library along with the open source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running stats on them and then visualizing the output.

Page 1: Networkx & Gephi Tutorial #Pydata NYC

Networkx & Gephi Tutorial #pydata

Gilad Lotan | @gilgul

Page 6: Networkx & Gephi Tutorial #Pydata NYC

#gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy

#palestine, #OWS, #immigration, #abortion

#republican, #dems, #economics, #amnesty

Page 7: Networkx & Gephi Tutorial #Pydata NYC

#Debates / Ohio

Page 8: Networkx & Gephi Tutorial #Pydata NYC

#Debates / Ohio

Politicos

OSU Students

Ohio based Media

Page 9: Networkx & Gephi Tutorial #Pydata NYC

• Node network properties
  – from immediate connections
    • indegree: how many directed edges (arcs) are incident on a node
    • outdegree: how many directed edges (arcs) originate at a node
    • degree (in or out): number of edges incident on a node
  – from the entire graph
    • centrality (betweenness, closeness)

Example node from the figure: outdegree=2, indegree=3, degree=5

Source: Lada Adamic (SI508-F08)
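
The degree definitions above can be checked on a tiny directed edge list. This is a minimal sketch in plain Python; the edge list and node names are hypothetical, chosen so that node "x" reproduces the figure's numbers. In networkx the same values are available from `DiGraph.in_degree`, `DiGraph.out_degree` and `DiGraph.degree`.

```python
from collections import Counter

# Hypothetical directed edge list of (source, target) pairs. Node "x" has
# 3 incoming and 2 outgoing arcs, like the node in the slide's figure.
edges = [("a", "x"), ("b", "x"), ("c", "x"), ("x", "d"), ("x", "e")]

out_degree = Counter(src for src, _ in edges)  # arcs originating at each node
in_degree = Counter(dst for _, dst in edges)   # arcs arriving at each node


def degree(node):
    """Total degree: incoming plus outgoing arcs."""
    return in_degree[node] + out_degree[node]


print(in_degree["x"], out_degree["x"], degree("x"))  # 3 2 5
```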

Page 10: Networkx & Gephi Tutorial #Pydata NYC

Example Graph Types

• Complete Graph

• Bipartite Graph
  – Vertices can be divided into two disjoint sets so that every edge joins a vertex in one set to a vertex in the other
  – Ex: students & schools
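
Both graph types are easy to build and test by hand. The sketch below is plain Python with hypothetical names and enrolment data: it generates the edges of a complete graph and checks bipartiteness by trying to two-colour the graph with a breadth-first search.

```python
from collections import deque
from itertools import combinations


def complete_graph_edges(nodes):
    """Complete graph: every pair of distinct nodes is joined by an edge."""
    return list(combinations(nodes, 2))


def to_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def two_color(adjacency):
    """Return a node -> 0/1 colouring if the graph is bipartite, else None."""
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in color:
                    color[nbr] = 1 - color[node]
                    queue.append(nbr)
                elif color[nbr] == color[node]:
                    return None  # an edge joins two nodes of the same set
    return color


k4 = complete_graph_edges(["a", "b", "c", "d"])  # 4 * 3 / 2 = 6 edges

# Hypothetical enrolment data: students only ever link to schools.
enrolment = [("ana", "MIT"), ("ben", "MIT"), ("ben", "NYU"), ("cy", "NYU")]
coloring = two_color(to_adjacency(enrolment))
```

A complete graph on three or more nodes contains triangles, so `two_color` returns `None` for it.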

Page 12: Networkx & Gephi Tutorial #Pydata NYC

Social Network Attributes

• Scale Free
  – Degree distribution follows a power law
  – Barabasi et al (’99): mapped the topology of a portion of the web
• Small World
  – Most nodes are not neighbors, but can be reached by small number of hops
  – Watts & Strogatz (’98)
  – Properties: cliques, sub networks with high clustering coefficient, most pairs of nodes connected by at least one short path
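
The small-world effect can be seen by measuring how a few long-range links shrink the average number of hops. This is a rough sketch, not the Watts & Strogatz model itself: it starts from a ring and adds two hand-picked shortcut edges instead of rewiring at random. The ring size and the shortcuts are arbitrary illustrative choices.

```python
from collections import deque


def ring_lattice(n):
    """Ring of n nodes, each linked to its two immediate neighbours."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(20)
before = average_path_length(ring)

# Two hypothetical long-range shortcuts across the ring.
for u, v in [(0, 10), (5, 15)]:
    ring[u].add(v)
    ring[v].add(u)
after = average_path_length(ring)

print(before, after)  # the second value is smaller
```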

Page 13: Networkx & Gephi Tutorial #Pydata NYC

(Zachary) Karate club graph

Social network of friendships between 34 members of a karate club at a US university in the 1970s.

Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
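
networkx ships this dataset as `nx.karate_club_graph()`, with each member's faction stored in the `club` node attribute. A minimal sketch, assuming networkx is installed, that loads the graph and groups members by faction:

```python
import networkx as nx

G = nx.karate_club_graph()
num_members = G.number_of_nodes()      # 34 club members
num_friendships = G.number_of_edges()  # 78 friendship ties

# Group members by the faction they joined after the split.
factions = {}
for member, club in nx.get_node_attributes(G, "club").items():
    factions.setdefault(club, []).append(member)

for club, members in factions.items():
    print(club, len(members))
```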

Page 14: Networkx & Gephi Tutorial #Pydata NYC

Graph Measures

• Centrality
  – Betweenness
  – Closeness
  – Eigenvector
  – Degree
• Clustering Coefficient (clique)
• Modularity
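
Three of these measures are simple enough to compute by hand on a toy graph. The sketch below uses plain Python and a hypothetical four-node graph (a triangle with one pendant node), and it assumes the graph is connected. Betweenness, eigenvector centrality and modularity are left out. networkx provides `nx.degree_centrality`, `nx.closeness_centrality` and `nx.clustering` for the same quantities.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: triangle a-b-c plus a pendant node d on c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)
n = len(adjacency)


def degree_centrality(node):
    """Fraction of the other nodes that this node is connected to."""
    return len(adjacency[node]) / (n - 1)


def clustering(node):
    """Fraction of neighbour pairs that are themselves connected."""
    nbrs = adjacency[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


def closeness_centrality(node):
    """Inverse of the mean shortest-path distance to all other nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives hop counts
        current = queue.popleft()
        for nbr in adjacency[current]:
            if nbr not in dist:
                dist[nbr] = dist[current] + 1
                queue.append(nbr)
    return (len(dist) - 1) / sum(dist.values())


for node in sorted(adjacency):
    print(node, degree_centrality(node), clustering(node),
          closeness_centrality(node))
```

Node c touches every other node, so it scores highest on degree and closeness, while its clustering coefficient is low because d is not connected to a or b.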

Page 15: Networkx & Gephi Tutorial #Pydata NYC

Graph Layout

• Open Ord
  – Better distinguishes clusters
• Yifan Hu
• Force Atlas
• Fruchterman Reingold
  – Graph as a system of mass particles (nodes:particles, edges:springs)
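
The particles-and-springs idea can be sketched in a few dozen lines. This is a simplified force-directed layout in the spirit of Fruchterman Reingold, not Gephi's implementation: all nodes repel each other with force k²/d, edges pull their endpoints together with force d²/k, and a cooling "temperature" caps how far a node moves per step. The function name, parameter defaults and example graph are illustrative assumptions.

```python
import math
import random


def force_layout(nodes, edges, iterations=50, width=1.0, seed=42):
    """Return {node: [x, y]} after a simplified force-directed simulation."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = width / 10           # maximum move per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes (particles push apart).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force

        # Attraction along edges (springs pull endpoints together).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force

        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95  # cool down so the layout settles

    return pos


# Hypothetical graph: two triangles joined by a single bridge edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
positions = force_layout(nodes, edges)
```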

Page 16: Networkx & Gephi Tutorial #Pydata NYC

Networkx

Page 17: Networkx & Gephi Tutorial #Pydata NYC

Graph Generators
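
The code shown on this slide is not part of the extracted text. As a hedged illustration of what networkx generators look like, the sketch below builds three synthetic graphs that match the graph types discussed earlier. The sizes, parameters and seeds are arbitrary choices, and networkx is assumed to be installed.

```python
import networkx as nx

# Complete graph: 5 nodes, every pair connected (10 edges).
complete = nx.complete_graph(5)

# Scale-free style graph grown by preferential attachment:
# 100 nodes, each new node attaches to 2 existing nodes.
scale_free = nx.barabasi_albert_graph(100, 2, seed=1)

# Small-world graph: ring of 100 nodes, each joined to its 4 nearest
# neighbours, with each edge rewired with probability 0.1.
small_world = nx.watts_strogatz_graph(100, 4, 0.1, seed=1)

for name, graph in [("complete", complete),
                    ("scale_free", scale_free),
                    ("small_world", small_world)]:
    print(name, graph.number_of_nodes(), graph.number_of_edges())
```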

Page 18: Networkx & Gephi Tutorial #Pydata NYC

Generate Twitter Graph
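
The slide's own code is not in the extracted text, so the following is only one plausible way to turn tweets into a graph. It assumes an edge means "author mentioned user" and counts repeated mentions as the edge weight. The tweets, the regular expression and the variable names are all hypothetical.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

# Hypothetical tweets as (author, text) pairs; real data would come from
# the Twitter API or a stored archive.
tweets = [
    ("alice", "Great #pydata talk by @bob"),
    ("carol", "RT @bob: slides are up #pydata"),
    ("bob", "Thanks @alice and @carol!"),
    ("alice", "@bob see you next year"),
]

# Directed, weighted edges: (author, mentioned user) -> number of mentions.
edge_weights = Counter()
for author, text in tweets:
    for mentioned in MENTION.findall(text):
        if mentioned != author:  # skip self-mentions
            edge_weights[(author, mentioned)] += 1

for (source, target), weight in sorted(edge_weights.items()):
    print(source, "->", target, weight)
```

The resulting weighted edge list can be loaded into a networkx `DiGraph` or written out to a file for Gephi.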

Page 20: Networkx & Gephi Tutorial #Pydata NYC

graphml file

nodes

edges
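
A GraphML file is XML with one `node` element per node and one `edge` element per edge inside a `graph` element. The sketch below writes a minimal file with the standard library to show that structure; the helper name and the sample users are hypothetical. With networkx, `nx.write_graphml(G, path)` produces a file that Gephi can open directly.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialise a directed graph as a minimal GraphML string."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node in nodes:
        ET.SubElement(graph, "node", id=node)
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical users and one directed edge between them.
xml_text = to_graphml(["alice", "bob"], [("alice", "bob")])
print(xml_text)
```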

Page 21: Networkx & Gephi Tutorial #Pydata NYC

Twitter Users with Python in their Bios

• 2 days of Twitter data (Oct 24th and 25th)
• Total: 4246 users (62k tweets)
• @mikanyan1 tweeted 795 times

Page 22: Networkx & Gephi Tutorial #Pydata NYC

Pythonistas on Twitter

Page 23: Networkx & Gephi Tutorial #Pydata NYC

Pythonistas on Twitter

English / European

Japanese

Python (the snake)

Chinese

Spanish Speakers

Musicians, Artists

Page 25: Networkx & Gephi Tutorial #Pydata NYC

Twitter User Community: Data Science

• Grepped from Twitter bios over 1 week: "data science|data scientist|machine learning|data strateg"
• 1053 Users
• 14k Tweets
• Most tweeting users:
  – @data_nerd (659)
  – @Chantel_Esworth (562)
  – @Da5_12 (253)
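
The bio filter and the "most tweeting users" count can be sketched as follows. The pattern is the one quoted above; the sample records are hypothetical, and matching case-insensitively is an assumption, since the slide does not say how the grep was run.

```python
import re
from collections import Counter

# Pattern from the slide; IGNORECASE is an assumption.
BIO_PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg",
    re.IGNORECASE,
)

# Hypothetical tweet records, each carrying the author's bio.
tweets = [
    {"user": "u1", "bio": "Data Scientist at a startup"},
    {"user": "u1", "bio": "Data Scientist at a startup"},
    {"user": "u2", "bio": "I love machine learning and coffee"},
    {"user": "u3", "bio": "Python (the snake) breeder"},
    {"user": "u4", "bio": "Data strategy consultant"},
]

# Keep only tweets whose author bio matches, then count tweets per user.
matching = [t for t in tweets if BIO_PATTERN.search(t["bio"])]
tweets_per_user = Counter(t["user"] for t in matching)
top_users = tweets_per_user.most_common(3)

print(len(tweets_per_user), "users,", len(matching), "tweets")
print(top_users)
```

Because "data strateg" is a prefix, it matches bios containing "data strategy", "data strategist" and similar.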

Page 26: Networkx & Gephi Tutorial #Pydata NYC

Dataists on Twitter

Page 27: Networkx & Gephi Tutorial #Pydata NYC

Thank You

Gilad Lotan

Twitter: @gilgul

Github: giladlotan