Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Mining Social and Information Networks
Fragkiskos D. Malliaros
Ecole Polytechnique, France→ UC San Diego
E-mail: [email protected]
UCSD Artificial Intelligence Seminar
October 3, 2016
1/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Networks
Networks allow to model relationshipsbetween entities
General-purpose languagefor describing real-world systems
2/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Online Social Networks Source: https://www.facebook.com/zuck
Collaboration networks (Co-authorship)
Weblogs network (Political blogs)
Term co-occurrence network (David Copperfield novel by
Charles Dickens)
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation: Analytics in the Network Science Era
Overview of my work
Models, tools and observations for problems in the area of mining real-world networks
We build upon computationally efficient graph mining methods to:
1. Design models for analyzing the structure and dynamics of real-worldnetworks
� Unravel properties that can further be used in practical applications
2. Develop algorithmic tools for large-scale analytics on data with:
� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)
7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation: Analytics in the Network Science Era
Overview of my work
Models, tools and observations for problems in the area of mining real-world networks
We build upon computationally efficient graph mining methods to:
1. Design models for analyzing the structure and dynamics of real-worldnetworks
� Unravel properties that can further be used in practical applications
2. Develop algorithmic tools for large-scale analytics on data with:
� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)
7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation: Analytics in the Network Science Era
Overview of my work
Models, tools and observations for problems in the area of mining real-world networks
We build upon computationally efficient graph mining methods to:
1. Design models for analyzing the structure and dynamics of real-worldnetworks
� Unravel properties that can further be used in practical applications
2. Develop algorithmic tools for large-scale analytics on data with:
� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)
7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation: Analytics in the Network Science Era
Overview of my work
Models, tools and observations for problems in the area of mining real-world networks
We build upon computationally efficient graph mining methods to:
1. Design models for analyzing the structure and dynamics of real-worldnetworks
� Unravel properties that can further be used in practical applications
2. Develop algorithmic tools for large-scale analytics on data with:
� Inherent graph structure (e.g., social networks)� Without inherent graph structure (e.g., text)
7/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Overview of my Work (1/2)
� Engagement dynamics and vulnerability assessment in socialnetworks
� [w/ Vazirgiannis, CIKM ’13; EPL ’15]
� Graph clustering and community detection� [w/ Vazirgiannis, Phys. Rep. ’13]� [w/ Giatsidis, Thilikos, Vazirgiannis, AAAI ’14]� [w/ Mitri, Vazirgiannis, Manuscript ’16]
� Anonymity models for real networks� [w/ Casas-Roma, Vazirgiannis, Manuscript ’16]
8/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Overview of my Work (2/2)
� Spectral robustness estimation in real networks [This talk]� [w/ Megalooikonomou, Faloutsos, SDM ’12; KAIS ’15]
� Identification of influential spreaders (nodes) in complexnetworks [This talk]
� [w/ Rossi, Vazirgiannis, WWW ’15; Scientific Reports ’16]
� Graph-based text analytics (text categorization and keywordextraction) [This talk]
� [w/ Skianis, Vazirgiannis, ASONAM ’15; Manuscript ’16]� [w/ Tixier, Vazirgiannis, EMNLP ’16]
Main tools: spectral graph theory, core decomposition
9/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Outline
1 Introduction and Research Overview
2 Robustness Estimation in Social Graphs
3 Locating Influential Spreaders in Social Networks
4 Graph-Based Text Analytics
5 Concluding Remarks
10/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Robustness Estimation in Social Graphs
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Graph Robustness Estimation
Main question:
What can we say about the robustness of large real-world social graphs?
Modular network Non modular network
12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Graph Robustness Estimation
Main question:
What can we say about the robustness of large real-world social graphs?
Modular network Non modular network
12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Graph Robustness Estimation
Main question:
What can we say about the robustness of large real-world social graphs?
Modular network Non modular network
12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Graph Robustness Estimation
Main question:
What can we say about the robustness of large real-world social graphs?
Modular network Non modular network
12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Graph Robustness Estimation
Main question:
What can we say about the robustness of large real-world social graphs?
Modular network Non modular network
12/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
S ⊂ V
13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
S ⊂ V
13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
S ⊂ V
13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
S ⊂ V
13/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Expansion Properties
� Quick estimation of the robustness of agraph by examining the expansionproperties
� Graph with good expansion properties:simultaneously sparse and highlyconnected
� Given a graph G = (V ,E), theexpansion of a subset of nodes S, with
|S| ≤ |V |2
is|N(S)||S|
Expansion factor
h(G) = min{S:|S|≤ |N|2
} |N(S)||S|
14/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Why the Expansion Properties are Important?
� They can act as a natural measure of the graph’s robustness� Bottlenecks edges inside the network
� Good expansion properties imply high robustness� Any subset S ⊂ V will have a relatively large neighborhood� Poor modular structure
� Poor expansibility reflects absence of robustness� Modular structure→ Better community structure
15/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Measuring Expansion Properties
� Computing the expansion factor is a computational difficultproblem
� Approximation techniques?
� Use the spectrum of the adjacency matrix A
� The expansion properties of a graph can be approximated by thespectral gap ∆λ = λ1 − λ2
Alon - Milman inequality
∆λ
2≤ h(G) ≤
√2λ1∆λ
16/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
First Approach in Estimating the Robustness
� Large spectral gap ∆λ implies good expansion properties→high robustness
� If λ2 is close enough to λ1, the spectral gap will be small→ lowrobustness
Spectral gap based approach
1. Compute the spectral gap2. If this is large, the graph will show high robustness properties
Problem
How large the spectral gap should be for a graph to have goodexpansibility?
17/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
First Approach in Estimating the Robustness
� Large spectral gap ∆λ implies good expansion properties→high robustness
� If λ2 is close enough to λ1, the spectral gap will be small→ lowrobustness
Spectral gap based approach
1. Compute the spectral gap2. If this is large, the graph will show high robustness properties
Problem
How large the spectral gap should be for a graph to have goodexpansibility?
17/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Spectral Gap + Subgraph Centrality
� Combine the spectral gap with the subgraph centrality
� Subgraph centrality:
� Node centrality measure
� It is based on the number of closed walks that a node participates:
SC(i) =∞∑`=0
A`ii`!, ∀i ∈ V
� Using techniques from spectral graph theory
SC(i) =
|V |∑j=1
u2ij sinh(λj), ∀i ∈ V
18/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Characterizing the Robustness : the Idea
SC(i) =∑|V |
j=1 u2ij sinh(λj), ∀i ∈ V
� Good expansion properties→ High robustness→ λ1 � λ2→u2
i1 sinh(λ1)�∑|V |
j=2 u2ij sinh(λj)
� For a graph with high robustness
SC(i) ≈ u2i1 sinh(λ1)
� This implies a power-law relationship
ui1 ∝ sinh−1/2(λ1) SC(i)1/2
� Deviations from this behavior imply existence (or lack thereof) ofhigh robustness
19/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Observations and Shortcoming
1. Scalability issues� It requires the computation of all the pairs (λi , ui ), ∀i ∈ V of the
adjacency matrix A
� Computational bottleneck for large graphs with millions of nodesand edges
2. It cannot be applied directly to bipartite graphs
20/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Proposed Metric: Generalized Robustness Index rkBasic Idea
Q: Can we efficiently approximate the SC(i), ∀i ∈ V?
1. The eigenvalues of A follow a power-lawdistribution [Faloutsos et al., ’99]
2. Almost symmetric eigenvalues around zero
3. sinh(·): odd function
� sinh(−x) = − sinh(x)0 1000 2000 3000 4000 5000 6000 7000
−100
−50
0
50
100
150
RankE
ige
nva
lue
Figure : Skewed spectrum (WIKI-VOTE)
21/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Normalized Subgraph Centrality NSCk
SC(i) =∑|V|
j=1 u2ij sinh(λj ), ∀i ∈ V
� Use only the first top k eigenvalues and their correspondingeigenvectors
� Due to the sinh(·) function, the contribution of each eigenvaluewill be roughly canceled out by its (almost) symmetric eigenvalue
� The contribution of most of the eigenvalues is negligiblecompared with that of the first few eigenvalues
Normalized subgraph centrality
NSCk (i) =∑k
j=1 u2ij sinh(λj), ∀i ∈ V
� Generally, k � |V | for real-world graphs
22/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Normalized Subgraph Centrality NSCk
SC(i) =∑|V|
j=1 u2ij sinh(λj ), ∀i ∈ V
� Use only the first top k eigenvalues and their correspondingeigenvectors
� Due to the sinh(·) function, the contribution of each eigenvaluewill be roughly canceled out by its (almost) symmetric eigenvalue
� The contribution of most of the eigenvalues is negligiblecompared with that of the first few eigenvalues
Normalized subgraph centrality
NSCk (i) =∑k
j=1 u2ij sinh(λj), ∀i ∈ V
� Generally, k � |V | for real-world graphs
22/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Generalized Robustness Index rk
ui1 ∝ sinh−1/2(λ1) SC(i)1/2
Generalized robustness index rk
rk =
(1|V |
∑|V |i=1
{log(ui1)−
(log(sinh−1/2(λ1)) +
12
log(NSCk (i)))}2
)1/2
� Small rk value→ High robustness� For a robust enough graph, mostly (λ1, u1) will contribute to NSCk
� Parameter k� Trade-off between accuracy and required time
� It can be computed very easily in any programming environment
23/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
An Illustrative Example: Random vs. Real Graphs
103
104
105
106
10−2
10−1
100
Normalized Subgraph Centrality
Prin
cip
al E
ige
nve
cto
r
r = 3.5027e−05
(a) ER random graph (b) Discrepancy plot
10−2
100
102
104
10−8
10−6
10−4
10−2
100
Normalized Subgraph Centrality
Princip
al E
igenvecto
r
r = 2.4921
A..−L.B.
G.G.
(c) Network science graph (d) A.-L. Barabasi’s egonet (e) Discrepancy plot
24/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
DatasetsBasic characteristics of real-world networks
Graph # Nodes # Edges
EPINIONS 75, 877 405, 739EMAIL-EUALL 224, 832 340, 795SLASHDOT 77, 360 546, 487WIKI-VOTE 7, 066 100, 736FACEBOOK 63, 392 816, 886YOUTUBE 1, 134, 890 2, 987, 624CA-ASTRO-PH 17, 903 197, 031CA-GR-QC 4, 158 13, 428CA-HEP-TH 8, 638 24, 827DBLP 404, 892 1, 422, 263CIT-HEP-TH 26, 084 334, 091
25/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Efficiency of rk Index
Scalability
0 5 10 15
x 105
0
10
20
30
40
50
60
Number of Edges
Tim
e (
se
co
nd
s)
Observed Time
Linear Fit
� Several snapshots of theDBLP graph
� Set k = 30 eigenpairs
� The rk index scales linearlywith respect to the number ofedges
26/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Robustness of Large Static Graphs (1/2)Discrepancy Plots (Principal eigenvector vs. NSC in log-log scales)
EPINIONS
EMAIL-EUALL
SLASHDOT
WIKI-VOTE
YOUTUBE
27/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Robustness of Large Static Graphs (2/2)Discrepancy Plots (Principal eigenvector vs. NSC in log-log scales)
CA-ASTRO-PH
DBLP-1980
CA-HEP-TH
DBLP-2006
CA-GR-QC
CIT-HEP-TH
28/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Observations: Robustness of Large Social Graphs
� Almost all graphs show good expansion properties→ Highrobustness
� rk values tend to zero
� Not a clear modular structure
� Lack of well-defined clusters that can be easily separated fromthe whole graph
Observation: High Robustness Pattern
Large real-world social graphs exhibit good expansion propertiesand thus high robustness
29/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Are the Observations Expected?
Community structure property
Social networks
� Social Networks ≡ Community Structure� Bad expansion properties→ Low robustness� Previously observed on small scale social graphs
30/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Possible Explanations
DBLP-1980: |V | = 5K , |E| = 9K DBLP-2006: |V | = 405K , |E| = 1, 5M
� Structural differences between different scale networks
31/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution (1/2)
� The graphs evolve over time
� Q: How the robustness of a graph changes over time?
� Study the fragility evolution of a graph
1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index
� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph
snapshots per month)
32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution (1/2)
� The graphs evolve over time
� Q: How the robustness of a graph changes over time?
� Study the fragility evolution of a graph
1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index
� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph
snapshots per month)
32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution (1/2)
� The graphs evolve over time
� Q: How the robustness of a graph changes over time?
� Study the fragility evolution of a graph
1. For every time point t in the dataset2. Form the graph up to the specific time point t3. For every time snapshot Gt we examine the rk index
� Datasets:� DBLP: 1960-2006 (cumulative graph snapshots per year)� CIT-HEP-TH: February 1993 - April 2003 (cumulative graph
snapshots per month)
32/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution (2/2)DBLP: rk over time
0 10 20 30 40 500
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Time [Years]
Ro
bu
stn
ess I
nd
ex r
t=16 (1975)
CIT-HEP-TH: rk over time
0 20 40 60 80 100 120 1400
0.5
1
1.5
2
2.5
3
Time [Months]
Ro
bu
stn
ess I
nd
ex r
t = 17 (June ’94)
33/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution (2/2)DBLP: rk over time
0 10 20 30 40 500
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Time [Years]
Ro
bu
stn
ess I
nd
ex r
t=16 (1975)
DBLP: Diameter over time
0 10 20 30 40 502
4
6
8
10
12
14
Time [Years]
Dia
me
ter
t=16 (1975)
CIT-HEP-TH: rk over time
0 20 40 60 80 100 120 1400
0.5
1
1.5
2
2.5
3
Time [Months]
Ro
bu
stn
ess I
nd
ex r
t = 17 (June ’94)
CIT-HEP-TH: Diameter over time
0 20 40 60 80 100 120 1400
2
4
6
8
10
12
14
Time [Months]
Dia
me
ter
t = 17 (June ’94)
34/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution Pattern
Observation: Fragility evolution pattern
The spike of the robustness is aligned with the gelling point
General Observation:� At the first time points rk ↗ gradually→ Low robustness→ Good
community structure
� After a specific time point, rk starts↘ gradually→ The graphs tend tobecome more robust
� The time point that rk changes, corresponds to the gelling point[McGlohon et al., KDD ’08]
Explanation for the structural differences (regarding robust-ness and community structure) between different scale graphs
35/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Time Evolving Graphs: Fragility Evolution Pattern
Observation: Fragility evolution pattern
The spike of the robustness is aligned with the gelling point
General Observation:� At the first time points rk ↗ gradually→ Low robustness→ Good
community structure
� After a specific time point, rk starts↘ gradually→ The graphs tend tobecome more robust
� The time point that rk changes, corresponds to the gelling point[McGlohon et al., KDD ’08]
Explanation for the structural differences (regarding robust-ness and community structure) between different scale graphs
35/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Anomaly Detection (1/2)
Question
Can we spot anomalies over time using the rk index along with thefragility evolution pattern?
1. Examine the rk index over time, trying to identify and track abruptchanges and deviations
2. Sudden deviations from the fragility evolution pattern can possiblycorrespond to anomalies
3. The specific time snapshots (graphs) can be tagged as outliers
36/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Anomaly Detection (2/2)Fragility Evolution of DBLP (lin-log)
0 10 20 30 40 5010
−15
10−10
10−5
100
105
Time [Years]
Robustn
ess Index r
2002 and 2003
� Strange behavior of the rk indexfor 2002 and 2003
Q: Are these two time graphs reallyoutliers?
A: Yes!
� Explanation1:� After 2001 a large number of new publications were introduced→
Robustness↗→ rk ↘� After 2002-2003 new research fields are covered from DBLP→
New fields formed new communities→ rk ↗
1Personal communication with Michael Ley and Florian Reitz of the DBLP server37/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Conclusions
� Fast robustness estimation for large scale social networks
� Efficient and effective robustness index� Take advantage of the special spectral properties
� Patterns of large scale static and time evolving social graphs
� Anomaly detection combining the rk index with the observed patterns
� Robustness properties of network generation models
� Ongoing work: Given a budget of k edges, how to improve therobustness of the graph?
38/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Locating Influential Spreaders
in Social Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Influential SpreadersMotivation
� Spreading processes in complex networks� Spread of news and ideas� Diffusion of influence� Disease propagation� Viral marketing (word-of-mouth effect)� ...
� Identification of influential spreaders (goal of this work)� Able to diffuse information to a large part of the network
� Understand and control spreading dynamics� E.g, vaccinate individuals with good spreading properties in epidemic
control
40/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Influential SpreadersMotivation
� Spreading processes in complex networks� Spread of news and ideas� Diffusion of influence� Disease propagation� Viral marketing (word-of-mouth effect)� ...
� Identification of influential spreaders (goal of this work)� Able to diffuse information to a large part of the network
� Understand and control spreading dynamics� E.g, vaccinate individuals with good spreading properties in epidemic
control
40/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Influential Spreaders: the Process
Typically, a two-step approach:
1. Consider a topological or centrality criterion of the nodes
� Rank the nodes accordingly
� The top-ranked nodes are candidates for the most influential ones
2. Simulate the spreading process over the network to examine theperformance of the chosen nodes
[Pei and Makse, ’13]
41/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Single Influential SpreadersRelated work
� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades
� However, degree is a local criterion� Bad case: star subgraph
� Core-periphery structure of real-world networks
Equal degree (8)
42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Single Influential SpreadersRelated work
� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades
� However, degree is a local criterion� Bad case: star subgraph
� Core-periphery structure of real-world networks
Equal degree (8)
42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Identification of Single Influential SpreadersRelated work
� Straightforward approach: consider degree centrality� High degree nodes are expected to be good spreaders� Hub nodes can trigger big cascades
� However, degree is a local criterion� Bad case: star subgraph
� Core-periphery structure of real-world networks
Equal degree (8)
42/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Degeneracy and k -core Decomposition
Example
3-core
2-core
1-core
Core number ci = 1
Core number ci = 2
Core number ci = 3
Graph Degeneracy δ∗(G) = 3
G0 = G
G1 = 1-core of G
G2 = 2-core of G
G3 = 3-core of G
G0 ⊇ G1 ⊇ G2 ⊇ G3
Detection of dense and cohesive subgraphs
43/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
The k -core Decomposition Finds Good Spreaders
Degree 96 Core number 63
Degree 96 Core number 26
Node A Node B
� The core number is better spreadingpredictor compared to the degree
� [Kitsak et al., Nature Phys. ’10]
44/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
The k -core Decomposition Finds Good Spreaders
Degree 96 Core number 63
Degree 96 Core number 26
Node A Node B
� The core number is better spreadingpredictor compared to the degree
� [Kitsak et al., Nature Phys. ’10]
44/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation and Contributions
� The k -core decomposition often returns a relatively large number ofcandidate influential spreaders
� Only a small fraction corresponds to highly influential nodes
� Can we further refine the set of the most influential spreaders?
Main contributions
� Propose the K -truss decomposition for locating influential nodes
� Triangle-based extension of the k -core decomposition
� Experimental evaluation
� Better spreading behavior� Faster and wider epidemic spreading
45/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Motivation and Contributions
� The k -core decomposition often returns a relatively large number ofcandidate influential spreaders
� Only a small fraction corresponds to highly influential nodes
� Can we further refine the set of the most influential spreaders?
Main contributions
� Propose the K -truss decomposition for locating influential nodes
� Triangle-based extension of the k -core decomposition
� Experimental evaluation
� Better spreading behavior� Faster and wider epidemic spreading
45/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
K -truss Decomposition (1/2)Definitions
K -truss subgraph TK [Cohen, ’08], [Wang and Cheng, ’12]
K -truss TK = (VTK , ETK ),K ≥ 2: the largest subgraph of G where every edge iscontained in at least K − 2 triangles within the subgraph
Maximal K -truss subgraph
� The K -truss subgraph defined for the maximum value Kmax of K
� The nodes of this subgraph define set T
46/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
K -truss Decomposition (2/2)Maximal k -core and K -truss subgraphs
Set
Set
6
CT
� Set C: nodes of maximal k -core
� 3-core subgraph
� Set T : nodes of maximal K -truss
� 4-truss subgraph
� The maximal k -core and K -truss subgraphs overlap� K -truss is a subgraph of k -core (core of the k -core)� Heuristic to improve execution time
� We argue that set T contains highly influential nodes
47/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
K -truss Decomposition (2/2)Maximal k -core and K -truss subgraphs
Set
Set
6
CT
� Set C: nodes of maximal k -core
� 3-core subgraph
� Set T : nodes of maximal K -truss
� 4-truss subgraph
� The maximal k -core and K -truss subgraphs overlap� K -truss is a subgraph of k -core (core of the k -core)� Heuristic to improve execution time
� We argue that set T contains highly influential nodes
47/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Experimental Set-up (1/2)Baseline methods
� truss method: nodes belonging to set T (Proposed method)
� core method: nodes belonging to set C − T
� top degree method: nodes with highest degree T� Choose |C| − |T | nodes for fair comparison
48/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Experimental Set-up (2/2)How to simulate the spreading process?
Susceptible-Infected-Recovered (SIR) model
1. Set candidate node as infected (I state)
2. An infected node can infect its susceptible neighbors with probability β
� Set β close to the epidemic threshold τ = 1λ1
[Chakrabarti et al., ’08]
3. An infected node can recover (stop being active) with probability γ
� Set γ = 0.8
4. Count the total number of infected individuals (avg. over multiple runs)
S I Rβ γ
1− β 1− γ
State diagram of the SIR model
[Barrat et al., ’08]
49/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Datasets and PropertiesCharacteristics of the K -truss subgraphs
Network Name Nodes Edges kmax Kmax |C| |T | τ
EMAIL-ENRON 33, 696 180, 811 43 22 275 45 0.00840EPINIONS 75, 877 405, 739 67 33 486 61 0.00540WIKI-VOTE 7, 066 100, 736 53 23 336 50 0.00720EMAIL-EUALL 224, 832 340, 795 37 20 292 62 0.00970SLASHDOT 82, 168 582, 533 55 36 134 96 0.00074WIKI-TALK 2, 388, 953 4, 656, 682 131 53 700 237 0.00870
� τ = 1/λ1: epidemic threshold of the graph (λ1: largest eigenvalue of A)
� Set T has significantly smaller size compared to set C
50/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Datasets and PropertiesCharacteristics of the K -truss subgraphs
Network Name Nodes Edges kmax Kmax |C| |T | τ
EMAIL-ENRON 33, 696 180, 811 43 22 275 45 0.00840EPINIONS 75, 877 405, 739 67 33 486 61 0.00540WIKI-VOTE 7, 066 100, 736 53 23 336 50 0.00720EMAIL-EUALL 224, 832 340, 795 37 20 292 62 0.00970SLASHDOT 82, 168 582, 533 55 36 134 96 0.00074WIKI-TALK 2, 388, 953 4, 656, 682 131 53 700 237 0.00870
� τ = 1/λ1: epidemic threshold of the graph (λ1: largest eigenvalue of A)
� Set T has significantly smaller size compared to set C
50/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Evaluation the Spreading Process
Apply SIR simulation starting from a single node v each time
� Number of infected nodes at each time step of the process
� Total number of infected nodes Mv
� The time step where the epidemic fades out
51/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Average Number of Infected Nodes
Time StepMethod 2 4 6 8 10 Final step σ Max step
EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37
top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36
EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43
top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47
WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57
top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62
EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38
top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39
� Performance of truss method
� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier
52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Average Number of Infected Nodes
Time StepMethod 2 4 6 8 10 Final step σ Max step
EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37
top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36
EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43
top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47
WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57
top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62
EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38
top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39
� Performance of truss method
� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier
52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Average Number of Infected Nodes
Time StepMethod 2 4 6 8 10 Final step σ Max step
EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37
top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36
EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43
top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47
WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57
top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62
EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38
top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39
� Performance of truss method
� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier
52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Average Number of Infected Nodes
Time StepMethod 2 4 6 8 10 Final step σ Max step
EMAIL- truss 8.44 46.66 204.08 418.77 355.84 2, 596.52 136.7 33ENRON core 4.78 31.97 152.55 367.28 364.13 2, 465.60 199.6 37
top degree 6.89 34.13 155.48 360.89 357.08 2, 471.67 354.8 36
EPINIONS truss 4.17 19.70 75.04 204.14 329.08 2, 567.69 227.8 37core 3.45 14.72 55.27 158.56 280.03 2, 325.37 327.2 43
top degree 4.22 16.03 58.84 166.23 289.49 2, 414.99 331.7 47
WIKI- truss 2.92 6.92 15.27 28.73 42.46 560.66 114.9 52VOTE core 1.92 4.78 10.65 20.66 32.40 466.01 104.5 57
top degree 2.43 5.46 12.05 23.05 35.55 502.88 104.5 62
EMAIL- truss 11.62 62.25 240.97 584.87 725.42 5, 018.52 487.94 36EUALL core 9.85 40.82 158.72 433.81 644.76 4, 579.84 498.71 38
top degree 17.96 39.93 144.69 503.18 548.25 4, 137.56 1, 174.84 39
� Performance of truss method
� Higher infection rate during the first steps of the process� The total number of infected nodes is larger� The epidemic dies out earlier
52/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Comparison to the Optimal Spreading (1/2)Methodology
OPT1 OPT2 OPT3 · · · OPT|V |
Window W
1. Rank nodes according to the total infection size Mv
� OPT1 ≥ OPT2 ≥ . . . ≥ OPT|V |, where OPT1 = arg maxv∈V Mv
2. Consider window W over the ranked nodes
PTW =|TW |/|T ||W |/|V |
� TW is the set of nodes v ∈ T located in W
53/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Comparison to the Optimal Spreading (1/2)Methodology
OPT1 OPT2 OPT3 · · · OPT|V |
Window W
1. Rank nodes according to the total infection size Mv
� OPT1 ≥ OPT2 ≥ . . . ≥ OPT|V |, where OPT1 = arg maxv∈V Mv
2. Consider window W over the ranked nodes
PTW =|TW |/|T ||W |/|V |
� TW is the set of nodes v ∈ T located in W
53/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Comparison to the Optimal Spreading (2/2)Results
EMAIL-ENRON
0 0.5 1 1.50
20
40
60
80
100
Window W (%)
PW
(%
)
C
T
WIKI-VOTE
0 5 100
20
40
60
80
100
Window W (%)
PW
(%
)
C
T
� PTW reaches the maximum value (i.e., 100%) relatively early and forsmall window sizes, compared to PCW
� The nodes detected by the K -truss decomposition are betterdistributed among the most efficient spreaders
54/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Comparison to the Optimal Spreading (2/2)Results
EMAIL-ENRON
0 0.5 1 1.50
20
40
60
80
100
Window W (%)
PW
(%
)
C
T
WIKI-VOTE
0 5 100
20
40
60
80
100
Window W (%)
PW
(%
)
C
T
� PTW reaches the maximum value (i.e., 100%) relatively early and forsmall window sizes, compared to PCW
� The nodes detected by the K -truss decomposition are betterdistributed among the most efficient spreaders
54/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Discussion
� Core decomposition methods can help towards identifying singleinfluential spreaders
� K -truss decomposition
� Faster and wider epidemic spreading
� Well distributed nodes among those that are achieving the optimalspreading
� Ongoing work: Identification of multiple influential spreaders
55/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
From Influential Spreaders to Influential TermsIn a network of words
Graph mining and core decomposition
Detection of influential users in social networks
↓
Influential (important) entities in an interconnected space
↓
Influential terms (i.e., words) for text analytics
Text as information network
with application to keyword selection and text categorization
56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
From Influential Spreaders to Influential TermsIn a network of words
Graph mining and core decomposition
Detection of influential users in social networks
↓
Influential (important) entities in an interconnected space
↓
Influential terms (i.e., words) for text analytics
Text as information network
with application to keyword selection and text categorization
56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
From Influential Spreaders to Influential TermsIn a network of words
Graph mining and core decomposition
Detection of influential users in social networks
↓
Influential (important) entities in an interconnected space
↓
Influential terms (i.e., words) for text analytics
Text as information network
with application to keyword selection and text categorization
56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
From Influential Spreaders to Influential TermsIn a network of words
Graph mining and core decomposition
Detection of influential users in social networks
↓
Influential (important) entities in an interconnected space
↓
Influential terms (i.e., words) for text analytics
Text as information network
with application to keyword selection and text categorization
56/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Graph-Based Text Analytics
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Text is Everywhere
� Rapid growth of social media and networking platforms� Vast amount of textual data� Analyze and extract useful information
� Text analytics (mining) tasks� Summarization (e.g., keyword extraction)� Categorization (e.g., sentiment analysis)� Information Retrieval� ...
58/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Text is Everywhere
� Rapid growth of social media and networking platforms� Vast amount of textual data� Analyze and extract useful information
� Text analytics (mining) tasks� Summarization (e.g., keyword extraction)� Categorization (e.g., sentiment analysis)� Information Retrieval� ...
58/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Term Weighting in the Bag-of-Words Model
� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}
� Feature extraction and term weighting in the Bag-of-words model
� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words
� Limitations� Term independence assumption of frequency-based weighting
schemes (e.g., TF, TF-IDF)� Order of terms is disregarded
59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Term Weighting in the Bag-of-Words Model
� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}
� Feature extraction and term weighting in the Bag-of-words model
� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words
� Limitations� Term independence assumption of frequency-based weighting
schemes (e.g., TF, TF-IDF)� Order of terms is disregarded
59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Term Weighting in the Bag-of-Words Model
� Vector Space Model� Collection of m documents: D = {d1, d2, . . . , dm}� Set of n distinct terms (dictionary): T = {t1, t2, . . . , tn}
� Feature extraction and term weighting in the Bag-of-words model
� Find: feature vector di = {wi,1,wi,2, . . . ,wi,n}, ∀di ∈ D� Each document is represented as a bag (multiset) of its words
� Limitations� Term independence assumption of frequency-based weighting
schemes (e.g., TF, TF-IDF)� Order of terms is disregarded
59/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
ExampleText to graph representation; The Graph-of-words model
Graph representation of a document (w = 3; undirected graph)
Data Science is the extraction of knowledge from large volumes of datathat are structured or unstructured which is a continuation of the field ofdata mining and predictive analytics, also known as knowledge discoveryand data mining.
data
scienc
extract
knowledg
larg
volum
structur
unstructur
continu
field
mine
predict
analyt
discoveri
known
60/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
ExampleText to graph representation; The Graph-of-words model
Graph representation of a document (w = 3; undirected graph)
Data Science is the extraction of knowledge from large volumes of datathat are structured or unstructured which is a continuation of the field ofdata mining and predictive analytics, also known as knowledge discoveryand data mining.
data
scienc
extract
knowledg
larg
volum
structur
unstructur
continu
field
mine
predict
analyt
discoveri
known
60/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Degeneracy-based Keyword Extraction
●
●
●
●
●
●
●
●
●
●
●
●
●
●
mathemat
aspect
computer−aid
share
trade
problem
statist
analysi
price
probabilist
characterist
seri
method
model
Edge weights
12345
Core numbers
891011
●
●
●
inf
shells
diffe
renc
e in
she
ll si
zes
−6
−
4
−2
0
2
(10 − 11) (9 − 10) (8 − 9)
●
●
●
●
0.38
0.42
0.46
0.50
dens
k−cores
dens
ity
11 10 9 8
●
● ● ●
● ●●
● ●●
●
●
● ●
2 4 6 8 10 12 14
7090
110
130TR
nodes
TR s
core
s
elbowtop 33%
Mathematical aspects of computer-aided share trading. We consider problems of statistical analysis of share prices and propose probabilistic characteristics to describe the price series. We discuss three methods of mathematical modelling of price series with given probabilistic characteristics.
Core numbers TR scoresP R F1 P R F1
MAIN 0.86 0.55 0.67 ELB 1 0.18 0.31INF 0.83 0.91 0.87
PER 1 0.45 0.63DENS 0.88 0.64 0.74
mathemat 11 price .1359
price 11 share .0948
probabilist 11 .0906
characterist 11 .0870
seri 11 .0860
method 11 mathemat .0812
model 11 analysi .0633
share 10 statist .0595
trade 9 method .0569
problem 9 problem .0560
statist 9 trade .0525
analysi 9 model .0493
aspect 8 computer-aid .0453
computer-aid 8 aspect .0417
MAIN
INF
DENS
ELB
probabilist
characterist
seri
CR scoresP R F1
ELB 0.90 0.82 0.86
PER 1 0.45 0.63
mathemat 128
price 120
analysi 119
share 118
probabilist 112
characterist 112
statist 108
trade 97
problem 97
seri 94
method 85
computer-aid 76
model 66
aspect 65
PER
ELB
PER
(a)
(c)
(b)
●
●●
● ●●
●●
● ●●
●●
●
2 4 6 8 10 12 14
0.04
0.08
0.12
CRelbowtop 33%
nodes
CR
sco
res
(a) Graph-of-words (W = 8) for document #1512 of Hulth2003 decomposed with k -core. (b) Keywords extracted by the different methods
(human assigned keywords in bold). (c) Selection criterion of each method.
61/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Concluding Remarks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Mining Large Scale Networks
Network analytics with
spectral techniques and core decomposition
1. Spectral robustness estimation of social graphs
2. Core decomposition based identification of influential spreaders
3. Degeneracy-based keyword extraction
63/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Ongoing and Future Work
� Community detection and dense subgraph detectionalgorithms
� Influence maximization in social networks
� Algorithms for rich network structures� Time, location, strength, trust, text� Probabilistic, signed and multilayer graphs
� Machine Learning tasks on graphs� Link prediction, node classification and graph classification� Node and graph embeddings based on deep learning
� Text analytics using graph-based techniques
64/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Publications I
F. D. Malliaros, V. Megalooikonomou, and C.Faloutsos
Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly DetectiionIn SIAM International Conference on Data Mining (SDM), 2012.
F. D. Malliaros and M. Vazirgiannis
Clustering and Community Detection in Directed Networks: A Survey.Physics Reports, 533(4), Elsevier, 2013.
F. D. Malliaros and M. Vazirgiannis
To Stay or Not to Stay: Modeling Engagement Dynamics in Social Graphs.In ACM International Conference on Information and Knowledge Management (CIKM), 2013.
C. Giatsidis, F. D. Malliaros, D. M. Thilikos, and M. Vazirgiannis
CoreCluster: A Degeneracy Based Graph Clustering Framework.In AAAI Conference on Artificial Intelligence, (AAAI), 2014.
M.-E. G. Rossi, F. D. Malliaros, and M. Vazirgiannis
Spread It Good, Spread It Fast: Identification of Influential Nodes in Social Networks.In International Conference on World Wide Web Companion (WWW), 2015.
F. D. Malliaros and M. Vazirgiannis
Vulnerability Assessment in Social Networks under Cascade-based Node Departures.EPL (Europhysics Letters), 11(6), 2015.
F. D. Malliaros and Michalis Vazirgiannis
Disengagement Social Contagion: Assessing Network Vulnerability under Node Departures.In International Conference on Computational Social Science (ICCSS), 2015.
65/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Publications II
F. D. Malliaros, V. Megalooikonomou, and C.Faloutsos
Estimating Robustness in Large Social Graphs.Knowledge and Information Systems (KAIS), Springer, 2015.
F. D. Malliaros and K. Skianis
Graph-Based Term Weighting for Text Categorization.In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Social Media and RiskWorkshop (SoMeRis), 2015.
F. D. Malliaros, M. Rossi, and M. Vazirgiannis
Locating Influential Nodes in Complex Networks.Scientific Reports, 6(19307), Nature Publishing Group, 2016.
A. J.-P. Tixier, F. D. Malliaros, and M. Vazirgiannis.
A Graph Degeneracy-based Approach to Keyword Extraction.In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
F. D. Malliaros, K. Skianis, and M. Vazirgiannis
A Graph-Based Framework for Text Categorization: Term Weighting and Selection as Ranking of Interconnected Features.Submitted Manuscript, 2016.
J. Casas-Roma, F. D. Malliaros, and M. Vazirgiannis
k -Degree Anonymity on Directed Networks.Manuscript, 2016.
M. Mitri, F. D. Malliaros, and M. Vazirgiannis
Sensitivity of Community Structure to Network Uncertainty.Manuscript, 2016.
66/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
References I
V. Batagelj and M. Zaversnik.
AnO(m) Algorithm for Cores Decomposition of Networks.CoRR, 2003.
A. Barrat, M. Barthlemy, and A. Vespignani.Dynamical Processes on Complex Networks.Cambridge University Press, 2008.
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and Christos Faloutsos.
Epidemic Thresholds in Real Networks.ACM Trans. Inf. Syst. Secur., 10(4), 2008.
J. Cohen
Trusses: Cohesive Subgraphs for Social Network Analysis.National Security Agency Technical Report, 2008.
D. Easley and J. KleinbergNetworks, Crowds, and Markets: Reasoning About a Highly Connected World.Cambridge University Press, 2010.
M. Faloutsos, P. Faloutsos, and C. Faloutsos
On Power-law Relationships of the Internet Topology.In SIGCOMM, 1999.
D. Kempe, J. Kleinberg, and E. Tardos
Maximizing the Spread of Influence Through a Social Network.In KDD, 2003.
M. Kitsak, L. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. Stanley, and H. Makse
Identification of Influential Spreaders in Complex Networks.Nature Physics, 6(11), 2010.
67/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
References II
S. Pei and H. A. Makse
Spreading Dynamics in Complex Networks.Journal of Statistical Mechanics: Theory and Experiment, 12, 2013.
S. B. Seidman
Network Structure and Minimum Degree.Social Networks 5, 1983.
J. Wang and J. Cheng
Truss Decomposition in Massive Networks.Proc. VLDB Endow., 5(9), 2012.
68/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks
Introduction and Research Overview Robustness Estimation Identification of Influential Spreaders Graph-Based Text Analytics Concluding Remarks
Thank You!
NETWORK AND DATA SCIENCE
Graph Mining
Social Networks Analysis
Spectral Graph Theory
Core Decomposition
Applied Machine Learning
Natural Language Processing
Engagementdynamics
Social vulnerabilityassessment
Graph-basedtext analytics
Privacy in realnetworks
Spectral robust-ness estimation
Communitydetection
Influentialspreaders
69/69 Fragkiskos D. Malliaros UCSD AI Seminar Mining Social and Information Networks