MICHAEL J BOMMARITO II DANIEL MARTIN KATZ
Advanced Network Analysis Methods: Community Detection
Definition – Simple Version
• Broadly: “a group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network”
  ◦ Porter, Onnela, Mucha. Communities in Networks. Notices of the AMS, 2009.
• Examples:
  ◦ Cliques in a high school social network
  ◦ Voting coalitions in Congress
  ◦ Consumer types in a network of co-purchases
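To make “densely connected inside, sparsely connected outside” concrete, the sketch below compares, for a given partition, the fraction of possible within-group pairs that are actually linked with the fraction of possible between-group pairs that are linked. The function name `density_profile` and the toy graph (two triangles joined by one edge) are illustrative assumptions, not part of any package.

```python
from itertools import combinations


def density_profile(edges, communities):
    """Return (internal_density, external_density) for a partition.

    internal_density: fraction of within-community vertex pairs that are edges.
    external_density: fraction of between-community vertex pairs that are edges.
    """
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, group in enumerate(communities) for v in group}
    inside_pairs = inside_edges = outside_pairs = outside_edges = 0
    for u, v in combinations(list(label), 2):
        linked = frozenset((u, v)) in edge_set
        if label[u] == label[v]:
            inside_pairs += 1
            inside_edges += linked
        else:
            outside_pairs += 1
            outside_edges += linked
    return inside_edges / inside_pairs, outside_edges / outside_pairs


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(density_profile(edges, [{0, 1, 2}, {3, 4, 5}]))  # (1.0, 0.1111111111111111)
print(density_profile(edges, [{0, 1, 3}, {2, 4, 5}]))  # a poor partition for comparison
```

A partition that matches the definition scores high on the first number and low on the second; the mixed-up partition scores 1/3 and 5/9 instead.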
Michael J. Bommarito II, Daniel Martin Katz
Example – Social Networks
Imagine this graph…
Example – Social Networks
What factors might affect the formation of friendships in a high school social network? Ideas: Age, Gender, Class, Race, Interests
How might we assign communities to this network?
Vertices: People; Edges: Friendship
[Figure: the same network, with communities labeled “Girls” and “Boys”]
Example – Voting Coalitions
Vertices: People; Edges: Co-voted at least once
Now let’s look at the same network as if it represented co-voting in the Senate. Ideas: Issue position, geography, ethnicity, gender. How might we assign communities to this network?
[Figure: the same network, with communities labeled “Republicans,” “Democrats,” and “Independents”]
Context!
Note that we have assigned community membership differently despite observing the same graph! Community detection is not a concept that can be divorced from context.
Directedness
[Figure panels: Undirected, Directed]
Directedness
Many methods do not incorporate direction! Many methods that do incorporate direction do not allow for bidirected edges. Different software packages may implement the same “method” with or without support for directed edges.
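When a method ignores direction, the input is effectively collapsed as in the sketch below: each reciprocated pair of arcs (a bidirected edge) becomes a single undirected edge, indistinguishable from a one-way tie. `to_undirected` is a hypothetical helper written for illustration; check your package's documentation for what it actually does with directed input.

```python
from collections import Counter


def to_undirected(arcs, count_reciprocity=False):
    """Collapse directed arcs (u, v) into undirected pairs.

    Returns {frozenset({u, v}): weight}. By default every pair gets weight 1,
    so a reciprocated pair (u -> v and v -> u) looks like a one-way tie.
    With count_reciprocity=True the weight records how many directions
    (1 or 2) connect the pair instead.
    """
    counts = Counter(frozenset(arc) for arc in set(arcs))  # set() drops repeated arcs
    if count_reciprocity:
        return dict(counts)
    return {pair: 1 for pair in counts}


arcs = [("a", "b"), ("b", "a"), ("b", "c")]
print(to_undirected(arcs) == {frozenset("ab"): 1, frozenset("bc"): 1})  # True
print(to_undirected(arcs, count_reciprocity=True)[frozenset("ab")])     # 2
```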
Weights
Unweighted
• Binary relationships
• Data limitations
Weighted
• Relationship strength
• Frequency of relationship
• Flow
Note edge thickness.
Weights
Many methods do not incorporate edge weights! Methods that do incorporate edge weights may differ in acceptable values!
• Integer or real weights
• Strictly positive weights
Different software packages may implement the same “method” with or without support for weighted edges.
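Before handing weights to a method, it can help to check them against that method's stated assumptions. The hypothetical `check_weights` helper below flags weights that are non-numeric, non-integer (when integers are required), or not strictly positive; the two flags are stand-ins for whatever your chosen method and package actually document.

```python
import math


def check_weights(weighted_edges, require_integer=False, require_positive=True):
    """List edges whose weights violate a method's (assumed) requirements.

    weighted_edges: iterable of (u, v, w). The flags are placeholders for
    the requirements documented by the method you actually use.
    """
    problems = []
    for u, v, w in weighted_edges:
        if not isinstance(w, (int, float)) or not math.isfinite(w):
            problems.append((u, v, w, "not a finite number"))
        elif require_integer and w != int(w):
            problems.append((u, v, w, "not an integer"))
        elif require_positive and w <= 0:
            problems.append((u, v, w, "not strictly positive"))
    return problems


edges = [(0, 1, 2.5), (1, 2, 0), (2, 3, -1.0), (3, 4, "heavy")]
for problem in check_weights(edges):
    print(problem)
```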
Resolution
Resolution is a concept inherited from optics. According to Wikipedia, “optical resolution describes the ability of an imaging system to resolve detail in the object that is being imaged.”
High resolution
• Can make out many details! (15.1MP)
• But…
  ◦ Details may be noise
  ◦ Sometimes they don’t matter!
Low resolution
• Can’t read a word!
• But…
  ◦ Can focus on broad regions
  ◦ Noise is out of focus
Resolution
[Figure panels: High resolution (microscopic), Low resolution (macroscopic)]
Same graphs!
Resolution
Different hypotheses or questions correspond to different resolutions. Different methods are more or less effective at detecting community structure at different resolutions. Modularity-based methods cannot detect structure below a known resolution limit.
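The resolution limit can be seen on the ring-of-cliques construction often used to illustrate it: 30 five-vertex cliques arranged in a ring, each joined to the next by a single edge. Each clique is an obvious community, yet merging adjacent pairs of cliques gives a higher modularity. The sketch below computes both values with the standard modularity formula; the function names are illustrative.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q = sum over communities c of [ l_c / m - (d_c / 2m)^2 ] (undirected, unweighted)."""
    m = len(edges)
    label = {v: i for i, group in enumerate(communities) for v in group}
    internal = [0] * len(communities)  # l_c: edges with both ends in c
    degree = [0] * len(communities)    # d_c: total degree of the vertices in c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(inside / m - (d / (2 * m)) ** 2 for inside, d in zip(internal, degree))


def ring_of_cliques(n_cliques, size):
    """n_cliques copies of the complete graph K_size, each linked to the next by one edge."""
    edges, cliques = [], []
    for c in range(n_cliques):
        nodes = list(range(c * size, (c + 1) * size))
        cliques.append(set(nodes))
        edges.extend(combinations(nodes, 2))
        edges.append((nodes[-1], ((c + 1) % n_cliques) * size))  # ring link to next clique
    return edges, cliques


edges, cliques = ring_of_cliques(30, 5)
q_cliques = modularity(edges, cliques)
pairs = [cliques[i] | cliques[i + 1] for i in range(0, len(cliques), 2)]
q_pairs = modularity(edges, pairs)
print(round(q_cliques, 3), round(q_pairs, 3))  # 0.876 0.888
```

A modularity maximizer would therefore prefer the merged pairs, even though no single clique is more tightly tied to its neighbor than to the rest of the ring.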
Overlapping Communities
Palla, Derenyi, Farkas, Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 2005.
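Palla et al. detect overlapping communities by k-clique percolation: two k-cliques are adjacent if they share k - 1 vertices, and a community is a maximal union of k-cliques linked by chains of adjacent cliques. The brute-force sketch below (exponential in graph size, so for toy graphs only, and not the authors' optimized implementation) shows how one vertex can end up in two communities.

```python
from itertools import combinations


def clique_percolation(edges, k=3):
    """Brute-force k-clique percolation for small undirected graphs.

    Two k-cliques are adjacent if they share k - 1 vertices; each community is
    the union of a connected group of adjacent k-cliques. Vertices may appear
    in several communities.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate all k-cliques by checking every k-subset (exponential cost).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


# Two triangles sharing only vertex 2: vertex 2 belongs to both communities.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
print(sorted(sorted(c) for c in clique_percolation(edges)))  # [[0, 1, 2], [2, 3, 4]]
```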
Computational Complexity Refresher
Computational complexity is a serious issue!
Data is becoming more abundant and more detailed. Many quantitative research projects hinge on the feasibility of calculations. Understanding computational complexity can allow you to communicate with department IT personnel or computer scientists to solve your problem. Make sure your project is feasible before committing the time!
Computational Complexity Refresher
Computational complexity in the context of modern computing is primarily focused on two resources:
1. Time: How long does it take to perform a sequence of operations?
  • CPU/GPU
  • Exact vs. approximate solutions
2. Storage: How much space does it take to store our problem?
  • Memory and “persistent” storage (to a lesser degree)
  • Data representations
We tend to communicate time and storage complexity through “Big-O notation.”
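The choice of data representation alone can decide feasibility. The sketch below counts stored entries for an adjacency matrix, O(|V|^2), versus an adjacency list, O(|V| + |E|), for a hypothetical sparse graph with one million vertices and average degree 10. The counts ignore per-entry overhead, so treat them as orders of magnitude only.

```python
def storage_cells(num_vertices, num_edges, directed=False):
    """Rough count of stored entries for two graph representations.

    Adjacency matrix: one cell per ordered vertex pair -> O(|V|^2).
    Adjacency list: one slot per vertex plus one entry per listed edge
    endpoint -> O(|V| + |E|). Per-entry overhead is ignored.
    """
    matrix = num_vertices ** 2
    entries_per_edge = 1 if directed else 2  # undirected edges are listed at both ends
    adjacency_list = num_vertices + entries_per_edge * num_edges
    return matrix, adjacency_list


# Hypothetical sparse graph: 1,000,000 vertices, average degree 10 -> 5,000,000 edges.
print(storage_cells(1_000_000, 5_000_000))  # (1000000000000, 11000000)
```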
Computational Complexity Refresher
In computational complexity, “Big-O notation” conveys information about how time and storage costs scale with inputs.
• O(1): constant, independent of input
• O(n): scales linearly with the size of input
• O(n^2): scales quadratically with the size of input
• O(n^3): scales cubically with the size of input
These terms often occur multiplied by log n factors and are then given the prefix “quasi-” (e.g., O(n log n) is called quasi-linear).
For graph algorithms, the input n is typically
• |V|, the number of vertices
• |E|, the number of edges
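Because Big-O ignores constants, its most practical use is predicting how cost grows as the input grows. The sketch below compares idealized cost functions for several classes: a tenfold larger graph makes an O(|V|^3) method roughly a thousand times slower, which is why a method that runs in minutes on a toy network may be infeasible on a real one. Actual running times also depend on constants and hardware, which this deliberately ignores.

```python
import math

# Idealized cost models for common complexity classes (constants ignored, as in Big-O).
COST = {
    "O(1)": lambda n: 1,
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
    "O(n^3)": lambda n: n ** 3,
}


def growth_factor(cls, n_small, n_large):
    """How many times more work a class predicts when input grows from n_small to n_large."""
    return COST[cls](n_large) / COST[cls](n_small)


for cls in COST:
    print(cls, growth_factor(cls, 1_000, 10_000))  # O(n^3) -> 1000.0
```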
Taxonomy of Methods
This taxonomy of methods follows the history of their development.
• Divisive Methods
  ◦ Edge-betweenness (2002)
• Modularity Methods
  ◦ Fast-greedy (2004)
  ◦ Leading Eigenvector (2006)
• Dynamic Methods
  ◦ Clique percolation (2005)
  ◦ Walktrap (2005)
Edge Betweenness
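Publication(s): Girvan, Newman. Community structure in social and biological networks. PNAS, 2002.
Basic Idea: Divide the network into successively smaller pieces by repeatedly removing the edge with the highest betweenness (the edge lying on the most shortest paths), since such edges tend to “bridge” communities. Betweenness is recomputed after each removal.
Constraints:
• Can be adapted to directed networks (igraph).
• Can be adapted to weights (no public software).
Time Complexity: O(|E|^2 |V|) in general, i.e., O(|V|^3) on sparse graphs; O(|V|^2 log |V|) for special cases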
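The core loop can be sketched in pure Python: compute edge betweenness, remove the top edge, recompute, and stop when the graph splits. Betweenness here is computed with Brandes's accumulation algorithm (our choice for the sketch; the slide does not specify one), and ties are broken arbitrarily. The full Girvan-Newman method keeps removing edges to build a dendrogram; this sketch stops at the first split.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).

    adj maps each vertex to the set of its neighbours.
    Returns {frozenset({u, v}): betweenness}.
    """
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, splitting credit among shortest paths.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of endpoints was counted once from each side.
    return {e: x / 2 for e, x in scores.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for start in adj:
        if start not in seen:
            comp, stack = set(), [start]
            while stack:
                v = stack.pop()
                if v not in comp:
                    comp.add(v)
                    stack.extend(adj[v] - comp)
            seen |= comp
            comps.append(comp)
    return comps


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    initial = len(components(adj))
    while len(components(adj)) == initial and any(adj.values()):
        scores = edge_betweenness(adj)  # recomputed after every removal
        u, v = tuple(max(scores, key=scores.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(girvan_newman_split(edges))  # [{0, 1, 2}, {3, 4, 5}]
```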
Edge Betweenness
From the paper:
Quick Aside – Zach’s Karate Club
Zachary's Karate Club: Social network of friendships between 34 members of a karate club at a US university in the 1970s
Event: During the observation period, the club broke into 2 smaller clubs. This split occurred along a pre-existing social division between the two “communities” in the network.
Drawn from the Paper: Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 1977.
Download the Data: http://www-personal.umich.edu/~mejn/netdata/
Edge Betweenness
Only misclassification
Edge Betweenness
Betweenness tends to get the big picture right. However, resolution can be a problem! Do not draw conclusions about small communities from this algorithm alone.
Modularity
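$$Q = \sum_{i} \left[ \frac{e_i}{m} - \left( \frac{d_i}{2m} \right)^{2} \right]$$
• e_i is the number of edges within module i
• d_i is the total degree of the vertices in module i
• m is the total number of edges in the network
Q is the difference between the observed fraction of edges within modules and its expected value (EV) under the configuration model (degree distribution fixed).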
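The per-module formula is equivalent to the pairwise form Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where A is the adjacency matrix, k_i is the degree of vertex i, and k_i k_j / (2m) is the expected number of edges between i and j under the configuration model. The sketch below computes both on a toy graph (two triangles joined by one edge, assumed simple and undirected) to show that “observed minus expected” and the per-module formula agree.

```python
def modularity_by_module(edges, membership):
    """Q = sum_i [ e_i / m - (d_i / 2m)^2 ], summed over modules i."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def modularity_by_pairs(edges, membership):
    """Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(c_i, c_j): observed minus expected."""
    m = len(edges)
    nodes = list(membership)
    k = {v: 0 for v in nodes}
    A = {}
    for u, v in edges:
        k[u] += 1
        k[v] += 1
        A[u, v] = A.get((u, v), 0) + 1
        A[v, u] = A.get((v, u), 0) + 1
    total = 0.0
    for i in nodes:
        for j in nodes:
            if membership[i] == membership[j]:
                total += A.get((i, j), 0) - k[i] * k[j] / (2 * m)
    return total / (2 * m)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity_by_module(edges, membership), modularity_by_pairs(edges, membership))
```

Both functions return 5/14 (about 0.357) for the two-triangle split.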
Modularity
Remember our previous discussion on computational complexity?
Modularity maximization is an NP-hard problem.
This means that no polynomial-time algorithm for finding the exact maximum is known (and none exists unless P = NP)!
Practical methods for networks of any real size therefore look for approximate solutions.
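One way to see why exact search is hopeless: the number of ways to partition n vertices into communities is the Bell number B_n, which grows faster than exponentially. The sketch below computes Bell numbers with the Bell triangle. A large search space alone does not prove NP-hardness; it only shows why brute-force enumeration fails.

```python
def bell_numbers(n_max):
    """Bell numbers B_0 .. B_n_max via the Bell triangle.

    B_n counts the partitions of n labelled vertices into non-empty groups,
    i.e. the number of candidate community assignments an exhaustive
    modularity search would have to score.
    """
    bells, row = [1], [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


print(bell_numbers(10)[10])  # 115975 partitions of just 10 vertices
```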
Modularity
Benjamin H. Good, Yves-Alexandre de Montjoye & Aaron Clauset, The Performance of Modularity Maximization in Practical Contexts, Phys. Rev. E 81, 046106 (2010)
Fast Greedy
Publication(s):
• Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 2004.
• Clauset, Newman, Moore. Finding community structure in very large networks. Phys. Rev. E, 2004.
• Wakita, Tsurumi. Finding Community Structure in Mega-scale Social Networks. 2007.
Basic Idea: Greedily assemble larger and larger communities from the ground up. Start by placing each vertex in its own community, then repeatedly merge the pair of communities whose merger produces the best modularity at that step.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|E||V| log |V|) worst case
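The merge rule can be sketched directly. Merging communities i and j changes modularity by ΔQ = 2(e_ij - a_i a_j), where e_ij is the fraction of edge ends joining i and j and a_i is the fraction of all edge ends in i. The naive sketch below applies that rule until no connected pair remains and keeps the best partition seen (the dendrogram cut with maximal Q). It omits the heap-based bookkeeping that makes the Clauset-Newman-Moore version fast, so it is for small graphs only, and it is not igraph's implementation.

```python
def greedy_modularity(edges):
    """Naive greedy agglomeration on modularity (simple undirected graph).

    Start with every vertex in its own community; at each step merge the
    connected pair (i, j) with the largest change dQ = 2 * (e_ij - a_i * a_j).
    Returns the best partition seen and its modularity.
    """
    m = len(edges)
    comms, a, e = {}, {}, {}
    for u, v in edges:
        for x in (u, v):
            comms.setdefault(x, {x})
            a[x] = a.get(x, 0.0) + 1 / (2 * m)
            e.setdefault(x, {})
        e[u][v] = e[u].get(v, 0.0) + 1 / (2 * m)
        e[v][u] = e[v].get(u, 0.0) + 1 / (2 * m)
    q = -sum(x * x for x in a.values())  # Q of the all-singletons partition
    best, best_q = [set(c) for c in comms.values()], q
    while True:
        candidates = [(2 * (eij - a[i] * a[j]), i, j)
                      for i in e for j, eij in e[i].items() if i < j]
        if not candidates:
            break  # no connected pair left to merge
        dq, i, j = max(candidates)
        q += dq
        # Merge community j into community i and update e and a.
        comms[i] |= comms.pop(j)
        a[i] += a.pop(j)
        for k, ejk in e.pop(j).items():
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + ejk
                e[k][i] = e[i][k]
        if q > best_q:
            best, best_q = [set(c) for c in comms.values()], q
    return best, best_q


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition, q = greedy_modularity(edges)
print(partition, round(q, 3))  # [{0, 1, 2}, {3, 4, 5}] 0.357
```

Merges continue even when ΔQ is negative, so the full merge history is available; the function simply remembers the partition with the highest Q along the way.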
Fast Greedy
Fast-Greedy also tends to aggressively create larger communities to the detriment of smaller communities.
Why is this node red instead of blue?
Leading Eigenvector
Publication(s):
• Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 2006.
• Leicht, Newman. Community structure in directed networks. Phys. Rev. Lett., 2008.
Basic Idea: Use the signs of the components of the leading eigenvector of the modularity matrix to sequentially divide the network.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|V|^2)
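A single bisection step can be sketched with the modularity matrix B = A - k k^T / (2m): find its leading eigenvector and split vertices by the sign of their components. The sketch below uses plain power iteration on B + cI with c = 2 · max degree (a Gershgorin bound that makes every eigenvalue of the shifted matrix non-negative), so iteration converges to B's most positive eigenvalue. It performs only the first split; Newman's recursive subdivision uses a generalized modularity matrix, which is omitted. The start vector and iteration count are arbitrary assumptions.

```python
def leading_eigenvector_split(edges, iterations=500):
    """One bisection step of the leading-eigenvector method (undirected, unweighted).

    Returns (group_1, group_2), or None if the leading eigenvalue of the
    modularity matrix is not positive (no split is suggested).
    """
    nodes = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(nodes)}
    n, m = len(nodes), len(edges)
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] += 1
        A[index[v]][index[u]] += 1
    k = [sum(row) for row in A]
    # Modularity matrix B_ij = A_ij - k_i k_j / (2m).
    B = [[A[i][j] - k[i] * k[j] / (2 * m) for j in range(n)] for i in range(n)]
    # Eigenvalues of B are >= -2 * max(k) (Gershgorin), so B + cI has none
    # below zero and power iteration finds B's most positive eigenvalue.
    c = 2 * max(k)
    x = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + c * x[i] for i in range(n)]
        norm = sum(val * val for val in y) ** 0.5
        x = [val / norm for val in y]
    Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(xi * bxi for xi, bxi in zip(x, Bx))  # Rayleigh quotient; x has unit length
    if eigenvalue <= 1e-9:
        return None  # indivisible by this method
    positive = {v for v in nodes if x[index[v]] >= 0}
    return positive, set(nodes) - positive


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(sorted(sorted(g) for g in leading_eigenvector_split(edges)))  # [[0, 1, 2], [3, 4, 5]]
```

The start vector must not be the all-ones vector: that vector is an eigenvector of B with eigenvalue 0, so iteration from it would never move.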
Leading Eigenvector
Note that the leading eigenvector method’s results seem to split the difference between edge betweenness and fast-greedy in this case.
Why are these nodes not a part of the larger modules?
Walktrap
Publication(s): Pons, Latapy. Computing communities in large networks using random walks. JGAA, 2006.
Basic Idea: Simulate many short random walks on the network and compute pairwise similarity measures based on these walks. Use these similarity values to aggregate vertices into communities.
Constraints:
• Can be adapted to directed edges (igraph).
• Can be adapted to weights (igraph).
• Can alter resolution by walk length (igraph).
Time Complexity: depends on walk length, O(|V|^2 log |V|) typically
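Pons and Latapy define the distance between vertices i and j through P^t_i, the probability distribution of a t-step random walk started at i: r_ij = sqrt(Σ_k (P^t_ik - P^t_jk)^2 / d(k)), where d(k) is the degree of k. The sketch below computes these distributions exactly by repeated propagation rather than by sampling walks, and returns a distance function; the agglomerative merging that Walktrap performs on top of these distances is omitted. The function name and the default t = 3 are illustrative assumptions.

```python
def walk_distance(edges, t=3):
    """Random-walk distance r_ij between vertices of an undirected graph.

    P^t_i (the t-step walk distribution from i) is computed exactly, not
    sampled. Returns a function r(i, j).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}

    def distribution(start):
        p = {v: 0.0 for v in adj}
        p[start] = 1.0
        for _ in range(t):
            nxt = {v: 0.0 for v in adj}
            for v, mass in p.items():
                for w in adj[v]:
                    nxt[w] += mass / deg[v]  # walker moves to a uniformly random neighbour
            p = nxt
        return p

    dists = {v: distribution(v) for v in adj}

    def r(i, j):
        return sum((dists[i][k] - dists[j][k]) ** 2 / deg[k] for k in adj) ** 0.5

    return r


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
r = walk_distance(edges, t=3)
print(round(r(0, 1), 3), round(r(0, 5), 3))  # 0.125 0.338
```

Vertices in the same triangle end up close; vertices on opposite sides of the bridge end up far apart. Changing t changes which structures look “close,” which is the resolution knob mentioned above.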
Walktrap
Walktrap
Walktrap assigns vertices to different communities than the previous algorithms. Note that the simulated walk length can be changed to alter resolution. Furthermore, simulation is stochastic, and thus results may change even after fixing the walk length and input graph!
Method Comparison
[Figure panels: Edge-Betweenness, Fast-Greedy, Leading Eigenvector, Walktrap]
Recommended Software – igraph
• Core Library: C
• Interfaces: Python, R, Ruby
• Features: Graph operations & algorithms, random graph generation, graph statistics, community detection, visualization layout, plotting
• URL: http://igraph.sourceforge.net/
• Documentation: http://igraph.sourceforge.net/documentation.html
Example Python Source Code
Frontiers of Community Detection: Temporal Network Dynamics
Gergely Palla, Albert-Laszlo Barabasi & Tamas Vicsek, Quantifying Social Group Evolution, Nature 446:7136, 664-667 (2007)
Frontiers of Community Detection: Community Structure Over Scales, Time Periods, etc.
Science 14 May 2010, Vol. 328. no. 5980, pp. 876 - 878
Community Detection Review Articles
Some Useful Review Articles:
• Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009. “Communities in Networks.” Notices of the American Mathematical Society 56: 1082-1166.
• Santo Fortunato. 2010. “Community detection in graphs.” Physics Reports 486: 75-174.
A Transition to Our Sink Method Paper
• Provide a very brief introduction to the Exponential Random Graph Models (p*)
• Now we are going to transition to a specific project, where we apply some of the ideas contained herein
Our Sink Paper – Physica A
Dynamic Acyclic Digraphs
• We are interested in conducting community detection in the special case of dynamic acyclic digraphs …
• Before we transition to the full presentation, some background:
• Dynamic = Changing both locally and globally
• Digraph = Directed graph
• Acyclic = No cycles, because current documents generally cannot cite documents in the future
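Acyclicity is a checkable property of citation data: if every citation points strictly backward in time, no directed cycle can form. The sketch below verifies acyclicity directly with Kahn's topological-sort algorithm, which can serve as a sanity check on a data set before applying methods that assume a DAG. The case identifiers are hypothetical.

```python
from collections import defaultdict, deque


def is_acyclic(citations):
    """Return True if the citation digraph has no directed cycle (Kahn's algorithm).

    citations: iterable of (citing, cited) pairs.
    """
    out_edges, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for citing, cited in citations:
        out_edges[citing].append(cited)
        indegree[cited] += 1
        nodes.update((citing, cited))
    # Repeatedly remove documents that no remaining document cites.
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in out_edges[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Any vertex left over lies on, or is reachable from, a cycle.
    return removed == len(nodes)


citations = [("case_2010", "case_2001"), ("case_2010", "case_1995"), ("case_2001", "case_1995")]
print(is_acyclic(citations))                                   # True
print(is_acyclic(citations + [("case_1995", "case_2010")]))    # False
```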
Dynamic Acyclic Digraphs
Case-to-Case Judicial Citation Networks Are Dynamic Acyclic Digraphs
So are Academic Citation Networks, Patents, etc.