Summary of Ceren Budak et al.'s paper "Structural Trend Analysis for Online Social Networks"
Structural Trend Analysis for Online Social Networks
Paper authors: Ceren Budak, Divyakant Agrawal, Amr El Abbadi
Markus Fensterer 11/23/11
Outline
1. Introduction
2. Problem definition
3. Trend definitions
4. Validating significance
  4.1. Model-based validation
  4.2. Analysis-based validation
5. Detecting coordinated trends
6. Detecting uncoordinated trends
7. Sybil attacks
8. Summary
9. Critique
1. Introduction
• Trend detection is of significant interest
• Trends can be seen as
  o a reflection of societal concerns
  o collective decision making
• Existing research considers temporal and geographical dimensions
  o but ignores the structure behind the network
• Goals of the paper:
  o introduce network structure into trend analysis
  o overcome vulnerabilities
  o detect interesting activities in different communities
  o provide online algorithms
2. Problem Definition
• Graph G = (N, E)
• e_ij ∈ E: node n_j is a neighbor of node n_i in the network
• Topics T: can be shared between directly neighboring nodes
• Mention ⟨n_i, T_x⟩: node n_i mentions topic T_x
• Stream S: history of mentions
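The setup above fits into a few plain data structures. The sketch below is a hypothetical Python representation (the names `Graph`, `Mention` and `mention_counts` are illustrative, not from the paper): G as a map from each node to its neighbors, the stream S as a list of `(node, topic)` pairs, and the per-node mention counts C that the later slides refer to.

```python
from collections import defaultdict

# Hypothetical representation of the problem setup (not the paper's code):
# G = (N, E) as a map from each node to the set of its neighbors, and the
# stream S as a list of mentions <n_i, T_x> stored as (node, topic) tuples.
Graph = dict[str, set[str]]
Mention = tuple[str, str]


def mention_counts(stream: list[Mention]) -> dict[str, dict[str, int]]:
    """Return C[topic][node]: how often `node` mentioned `topic` in the stream."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for node, topic in stream:
        counts[topic][node] += 1
    return counts


graph: Graph = {"n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2"}}
stream: list[Mention] = [("n1", "#x"), ("n2", "#x"), ("n1", "#x"), ("n3", "#y")]
C = mention_counts(stream)
print(C["#x"]["n1"], C["#x"]["n2"], C["#y"]["n3"])  # 2 1 1
```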
3. Trend definitions
• Traditional trend
  o trendiness: number of mentions
• Structural trend
  o "popular topic within a structural subgroup of a network"
• Types of structural trends
  o coordinated trends
  o uncoordinated trends
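For comparison with the structural definitions, traditional trendiness is simply a mention count per topic. A minimal sketch, assuming the stream is a list of `(node, topic)` pairs; the helper names are hypothetical:

```python
from collections import Counter


def traditional_trendiness(stream):
    """Traditional trendiness: the number of mentions of each topic."""
    return Counter(topic for _node, topic in stream)


def top_k(scores, k):
    """Topics with the k highest scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _score in ranked[:k]]


stream = [("n1", "#x"), ("n2", "#x"), ("n1", "#x"), ("n3", "#y")]
scores = traditional_trendiness(stream)
print(scores["#x"], scores["#y"])  # 3 1
print(top_k(scores, 1))            # ['#x']
```

Note that this count ignores who mentioned the topic: three mentions by one user weigh the same as three mentions by three users, which is exactly what the structural definitions change.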
Coordinated trends
• Trendiness: number of connected users discussing the topic
• Favors topics discussed in clustered nodes
• Formal:
• Favors a uniform distribution of mentions per node
sample networks
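A reading of coordinated trendiness that is consistent with the triangle reduction in Section 5 (g(T_x) is the number of triangles incident to T_x in G′) is g(T_x) = Σ over edges {i, j} ∈ E of C_{i,x} · C_{j,x}. The sketch below computes that quantity under the assumption of an undirected graph with each edge counted once; it is an illustration, not the paper's code. It also shows why the measure favors a uniform distribution of mentions: three mentions spread over a connected triangle score higher than the same three mentions concentrated on fewer nodes.

```python
def coordinated_trendiness(graph, counts_x):
    """g(T_x): sum over undirected edges {i, j} of C_{i,x} * C_{j,x}.

    graph: dict node -> set of neighbors (assumed symmetric).
    counts_x: dict node -> number of times that node mentioned T_x.
    """
    total = 0
    for i, neighbors in graph.items():
        for j in neighbors:
            if i < j:  # visit each undirected edge once
                total += counts_x.get(i, 0) * counts_x.get(j, 0)
    return total


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(coordinated_trendiness(triangle, {"a": 1, "b": 1, "c": 1}))  # 3 (spread out)
print(coordinated_trendiness(triangle, {"a": 2, "b": 1}))          # 2
print(coordinated_trendiness(triangle, {"a": 3}))                  # 0 (one node only)
```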
Uncoordinated trends
• Trendiness: number of unrelated users mentioning the topic
• Favors topics with a large number of mentions
• No bias towards clustered nodes
• Adds a notion of trustworthiness to a trend
• Formal:
sample networks
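By analogy, and again as an assumption consistent with the reduction in Section 6 (which uses the non-edges of G), uncoordinated trendiness can be read as the sum of C_{i,x} · C_{j,x} over pairs of distinct nodes that are not neighbors. In the toy graph below, n1 and n2 are friends and n3 is unrelated to both, so mentions by the two friends alone score zero.

```python
from itertools import combinations


def uncoordinated_trendiness(graph, counts_x):
    """Sum of C_{i,x} * C_{j,x} over distinct, non-adjacent node pairs.

    graph: dict node -> set of neighbors (assumed symmetric).
    counts_x: dict node -> number of times that node mentioned T_x.
    """
    mentioners = sorted(n for n, c in counts_x.items() if c > 0)
    total = 0
    for i, j in combinations(mentioners, 2):
        if j not in graph.get(i, set()):  # only unrelated pairs contribute
            total += counts_x[i] * counts_x[j]
    return total


graph = {"n1": {"n2"}, "n2": {"n1"}, "n3": set()}
print(uncoordinated_trendiness(graph, {"n1": 1, "n2": 1, "n3": 1}))  # 2
print(uncoordinated_trendiness(graph, {"n1": 1, "n2": 1}))           # 0
print(uncoordinated_trendiness(graph, {"n1": 2, "n3": 3}))           # 6
```

The score grows with the product of counts, so topics with many mentions from unrelated users rise quickly, while clustering gives no bonus.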
4. Validating significance
• Difference from traditional trends
• Nature of detected topics
• Methods:
  o Model-based validation: using an information diffusion model
  o Analysis-based validation: analyzing a Twitter data set
4.1. Model-based validation
• Independent Trend Formation Model
  o based on the Independent Cascade Model
• Assumptions:
  o topics diffuse independently of each other
  o diffusion proceeds in discrete time steps
• External influence p_ix: probability that n_i mentions T_x independently of its neighbors
• Peer influence q_ijx: probability that n_i mentions T_x given that its neighbor n_j mentioned it
• Synthetic graph generated with the Nearest Neighbor Model
  o Facebook Monterey Bay network
  o u = 0.8, k = 1
  o 500 nodes, 50 topics
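The diffusion process can be illustrated with a small simulation. This is a simplified, independent-cascade-style sketch, not the paper's exact model. It assumes a single external probability p and a single peer probability q for every node and edge (the slides allow per-node p_ix and per-edge q_ijx), lets each node mention a topic at most once, and gives each newly active neighbor one chance to influence a node. It uses a tiny hand-made graph instead of the Nearest Neighbor Model; `simulate_topic` is a hypothetical name.

```python
import random


def simulate_topic(graph, p, q, steps, rng):
    """Nodes that mention one topic after `steps` discrete time steps.

    Simplified sketch (assumptions, not the paper's exact model): one
    external probability p and one peer probability q for all nodes and
    edges, and at most one mention per node.
    """
    mentioned = set()
    frontier = set()  # nodes that first mentioned the topic in the previous step
    for _ in range(steps):
        newly = set()
        for node in graph:
            if node in mentioned:
                continue
            if rng.random() < p:  # external influence
                newly.add(node)
                continue
            # Peer influence: one independent attempt per freshly active neighbor.
            for nb in sorted(graph[node] & frontier):
                if rng.random() < q:
                    newly.add(node)
                    break
        mentioned |= newly
        frontier = newly
    return mentioned


# Topics diffuse independently, so they can be simulated one at a time.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
rng = random.Random(7)
mentions = {f"T{x}": simulate_topic(g, p=0.1, q=0.5, steps=5, rng=rng) for x in range(3)}
print(mentions)
```

Sorting the neighbor set keeps the sequence of random draws reproducible for a fixed seed, since set iteration order of strings can vary between runs.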
Difference from traditional trends
• SRCC (Spearman rank correlation coefficient): ρ = 1 − 6 Σ_x d_x² / (n(n² − 1))
  o d_x: rank difference of topic T_x between the two rankings; n: number of topics
  o all topics are considered
• AP (Average Precision):
  o D: relevant documents; R: ranked documents
  o only the top-k topics are considered
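Both similarity measures are short to compute. The sketch below uses the standard SRCC formula for untied ranks, ρ = 1 − 6 Σ d_x² / (n(n² − 1)), and one common definition of Average Precision (sum of precision@i at each position i that holds a relevant item, divided by |D|); whether the paper normalizes AP exactly this way is an assumption. Following the later slide on the Twitter data, R would be the top-k traditional trends and D the top-k structural trends.

```python
def spearman(rank_a, rank_b):
    """SRCC of two rankings of the same n >= 2 topics, assuming no ties:
    rho = 1 - 6 * sum(d_x ** 2) / (n * (n ** 2 - 1)).

    rank_a, rank_b: dict topic -> rank position (1 = most trendy).
    """
    n = len(rank_a)
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def average_precision(ranked, relevant):
    """AP of a ranked list R against a set D of relevant items.

    Adds precision@i at every position i that holds a relevant item,
    then divides by |D| (one common definition).
    """
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0


traditional = {"#a": 1, "#b": 2, "#c": 3}
structural = {"#a": 3, "#b": 2, "#c": 1}
print(spearman(traditional, traditional))  # 1.0
print(spearman(traditional, structural))   # -1.0
print(average_precision(["#a", "#b", "#c"], {"#a", "#c"}))  # (1/1 + 2/3) / 2
```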
Difference from traditional trends
Similarity measures for varying q's
Similarity measures for varying p's
• Structural trends diverge from traditional trends
Nature of detected topics
• Divide T into two halves T′ and T″
• Experiments within T″:
  o an increase in p″ can be balanced by a decrease in q″
  o the average traditional ranking of T′ and T″ should be equal
  o experiment 1: p″ < 0.1 and q″ > 0.1
  o experiment 2: p″ > 0.1 and q″ < 0.1
Average ranking of topics
• Experiment 1: the top-25 coordinated trends come from T″
• Experiment 2: the top-25 coordinated trends come from T′
4.2. Analysis-based validation
• Twitter data set
• Hashtags are used as topics
• After filtering and categorization:
  o 2.7 million users
  o 230 million edges
  o 2.9 million topics
Difference from traditional trends
SRCC and AP of top-k traditional trend topics in predicting top-k structural trend topics
• Traditional trendiness is a bad predictor of coordinated trends
• Uncoordinated trends are more similar to traditional trends
Nature of detected topics
Ranking of three topics within the data set
• Coordinated trend #hhrs ("Hugh Hewitt Radio Show")
  o effect of homophily
• Uncoordinated trend #twitterafterdark
  o an idiom: its usage depends more on personal experience
• Not significant as a structural trend: #apple
Nature of detected topics
Visualization of #pawpawty and #mafiawars
• #pawpawty: high coordinated importance
  o suggests social motivation (homophily effect)
• #mafiawars: low coordinated importance
Effect of categorical characteristics on trendiness
• 500 hashtags categorized into 7 categories, plus "none"
  o politics, technology, celebrity, games, idioms, movies, music
CDFs for politics and idioms
• Political hashtags gain trendiness under the coordinated definition
• Idioms gain trendiness under the uncoordinated definition
5. Detecting coordinated trends
• Naive approach: recompute g for each topic
• Incremental counting algorithm:
  o on receiving ⟨n_l, T_x⟩, increment C_{l,x} by 1
  o update the score of T_x:
  o requires O(n) reads
  o two adjacency lists per node (incoming and outgoing edges)
  o one hashtable per topic that maps users to counts
  o sorted representation of the top-k topics
  o delivers exact values, but is computationally expensive
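A minimal sketch of the incremental counter, assuming an undirected graph and the edge-sum reading of g(T_x) (sum over edges {i, j} of C_{i,x} · C_{j,x}): raising C_{l,x} by one increases g(T_x) by the sum of C_{j,x} over the neighbors j of n_l, which is where the O(n) reads per update come from. With a directed graph one would merge the incoming and outgoing adjacency lists. The class name is hypothetical, and the top-k list is recomputed with `heapq.nlargest` on demand rather than kept in a sorted structure, which is a simplification.

```python
import heapq
from collections import defaultdict


class CoordinatedTrendCounter:
    """Exact, incremental coordinated trendiness (hypothetical sketch).

    Assumes an undirected graph (dict node -> set of neighbors) and
    g(T_x) = sum over edges {i, j} of C_{i,x} * C_{j,x}.
    """

    def __init__(self, graph):
        self.graph = graph
        self.counts = defaultdict(dict)  # topic -> {node: C_{node,x}} (hashtable per topic)
        self.score = defaultdict(int)    # topic -> g(T_x)

    def receive(self, node, topic):
        """Process one mention <n_l, T_x> from the stream."""
        c = self.counts[topic]
        # One read per neighbor of n_l: this is the O(n) cost of an update.
        self.score[topic] += sum(c.get(j, 0) for j in self.graph.get(node, ()))
        c[node] = c.get(node, 0) + 1

    def top_k(self, k):
        """Current top-k topics as (topic, score) pairs."""
        return heapq.nlargest(k, self.score.items(), key=lambda kv: kv[1])


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
counter = CoordinatedTrendCounter(triangle)
for node, topic in [("a", "#x"), ("b", "#x"), ("c", "#x"), ("a", "#y"), ("a", "#y"), ("a", "#y")]:
    counter.receive(node, topic)
print(counter.top_k(2))  # [('#x', 3), ('#y', 0)]
```

Both topics have three mentions, so their traditional trendiness is equal; only #x, spread over connected users, gains a coordinated score.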
Reduction to counting local triangles
• Multigraph G′ = (N′, E′)
  o N′ = N ∪ T
  o E′ = {(u, v) | (u, v) ∈ E ∨ ⟨u, v⟩ ∈ S ∨ ⟨v, u⟩ ∈ S}
• Any three nodes u, v, w with (u, v), (v, w), (w, u) ∈ E′ form a triangle in G′
• g(T_x) = "number of triangles incident to topic node T_x in G′"
Building G′ out of G and S
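The construction can be checked on a toy example. The sketch below (hypothetical helper names) stores G′ as edge multiplicities keyed by unordered node pairs, adds one parallel edge per mention, and counts the triangles incident to a topic node with multiplicities. It assumes a symmetric graph and that user and topic names never collide. On a triangle graph with mentions ⟨a, #x⟩ twice and ⟨b, #x⟩ once, the count is 2, which equals C_{a,x} · C_{b,x} for the single edge {a, b}.

```python
from collections import Counter


def build_multigraph(graph, stream):
    """Edge multiplicities of G' = (N ∪ T, E'), keyed by unordered node pairs."""
    edges = Counter()
    for u, neighbors in graph.items():
        for v in neighbors:
            edges[frozenset((u, v))] = 1      # user-user edge from E (symmetric graph)
    for node, topic in stream:
        edges[frozenset((node, topic))] += 1  # one parallel edge per mention
    return edges


def triangles_at(edges, topic):
    """Triangles incident to a topic node, counting parallel edges separately."""
    # Users adjacent to the topic node, with their number of mention edges.
    spokes = {next(iter(pair - {topic})): m for pair, m in edges.items() if topic in pair}
    users = sorted(spokes)
    total = 0
    for idx, u in enumerate(users):
        for v in users[idx + 1:]:
            total += spokes[u] * spokes[v] * edges.get(frozenset((u, v)), 0)
    return total


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
edges = build_multigraph(triangle, [("a", "#x"), ("a", "#x"), ("b", "#x")])
print(triangles_at(edges, "#x"))  # 2
```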
Upper error bound for sampling
• Chebyshev's inequality plus a case distinction for Var(X_x)
  o X_x: estimated number of triangles incident to T_x
  o Δ_x: actual number of triangles incident to T_x
  o p_s: sampling rate
  o α_x: number of pairs of triangles that involve T_x and are not edge-disjoint
• The number of multiedges has a big influence on the quality of the estimate
  o as α_x grows, the error grows linearly
  o but as Δ_x grows, the estimate improves quadratically
=> the estimate is still better for trendy topics
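The sampling idea can be sketched under an explicit assumption: each mention is kept independently with probability p_s. A triangle incident to T_x uses two mention edges, so it survives with probability p_s², and dividing the sampled count by p_s² gives an unbiased estimate X_x of Δ_x. Chebyshev's inequality then bounds the relative error, P(|X_x − Δ_x| ≥ ε Δ_x) ≤ Var(X_x) / (ε² Δ_x²): triangles that share a mention edge are correlated and inflate the variance (the α_x term), while the Δ_x² in the denominator is why trendy topics are estimated well. Whether this matches the paper's exact sampling scheme is an assumption; the function name is hypothetical.

```python
import random


def sampled_coordinated_estimate(graph, stream, topic, p_s, rng):
    """Estimate Δ_x, the triangles incident to `topic`, from sampled mentions.

    Assumption: every mention is kept independently with probability p_s.
    A triangle uses two mentions, so it survives with probability p_s ** 2,
    and scaling the sampled count by 1 / p_s ** 2 makes the estimate unbiased.
    """
    kept = {}
    for node, t in stream:
        if t == topic and rng.random() < p_s:
            kept[node] = kept.get(node, 0) + 1
    sampled = 0
    for u, neighbors in graph.items():
        for v in neighbors:
            if u < v:  # each undirected edge once
                sampled += kept.get(u, 0) * kept.get(v, 0)
    return sampled / p_s ** 2


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
stream = [("a", "#x"), ("b", "#x"), ("c", "#x")]
rng = random.Random(1)
print(sampled_coordinated_estimate(triangle, stream, "#x", 1.0, rng))  # 3.0 (no sampling)
trials = [sampled_coordinated_estimate(triangle, stream, "#x", 0.5, rng) for _ in range(20000)]
print(sum(trials) / len(trials))  # should be close to the exact value 3
```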
Average Precision of sampling
AP of sampling for coordinated trends
AP of sampling for uncoordinated trends
• Average Precision stays high even for small sampling rates p_s
• Sampling works better for uncoordinated trends
6. Detecting uncoordinated trends
• Incremental counting algorithm:
  o on receiving ⟨n_l, T_x⟩, increment C_{l,x} by 1
  o update the score of T_x:
  o requires O(n) reads
  o could be optimized by keeping track of the traditional trendiness score
• Reduction to counting local triangles:
  o multigraph G′ = (N′, E′)
  o N′ = N ∪ T
  o E′ = {(u, v) | (u, v) ∉ E ∨ ⟨u, v⟩ ∈ S}
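The optimization via traditional trendiness can be sketched as follows, assuming an undirected graph and the reading of uncoordinated trendiness as the sum of C_{i,x} · C_{j,x} over non-adjacent pairs. When n_l mentions T_x, the score grows by the mentions of all other nodes minus those of n_l's neighbors; keeping the total number of mentions per topic (its traditional trendiness) means an update only reads n_l's neighbors instead of every node. The class name is hypothetical.

```python
from collections import defaultdict


class UncoordinatedTrendCounter:
    """Incremental uncoordinated trendiness (hypothetical sketch).

    Score: sum over non-adjacent node pairs {i, j} of C_{i,x} * C_{j,x}.
    Assumes an undirected graph given as dict node -> set of neighbors.
    """

    def __init__(self, graph):
        self.graph = graph
        self.counts = defaultdict(dict)  # topic -> {node: C_{node,x}}
        self.total = defaultdict(int)    # topic -> traditional trendiness
        self.score = defaultdict(int)    # topic -> uncoordinated score

    def receive(self, node, topic):
        """Process one mention <n_l, T_x> from the stream."""
        c = self.counts[topic]
        neighbor_mentions = sum(c.get(j, 0) for j in self.graph.get(node, ()))
        others = self.total[topic] - c.get(node, 0)  # mentions by all other nodes
        self.score[topic] += others - neighbor_mentions
        c[node] = c.get(node, 0) + 1
        self.total[topic] += 1


graph = {"n1": {"n2"}, "n2": {"n1"}, "n3": set()}
counter = UncoordinatedTrendCounter(graph)
for node in ["n1", "n2", "n3"]:
    counter.receive(node, "#x")
print(counter.score["#x"])  # 2: pairs (n1, n3) and (n2, n3)
```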
7. Sybil attacks
• Many virtual identities
• Highly connected among themselves
• Small number of connections to real users
• Findings:
  o coordinated rank > traditional rank > uncoordinated rank
  o the popularity breakpoint comes earlier than for a normal coordinated trend
  o the breakpoint is reached with a smaller set of nodes than normal
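A toy illustration, not the paper's experiment, of why a dense Sybil region affects the two definitions differently: assume k fake identities that form a clique each mention a topic once, and compare them with k unrelated real users doing the same. Traditional trendiness is k in both cases, but every Sybil pair is adjacent (counting toward the coordinated score), while every real-user pair is not (counting toward the uncoordinated score). The helpers are hypothetical.

```python
from itertools import combinations


def pair_scores(graph, mentioners):
    """(coordinated, uncoordinated) scores when each node in `mentioners`
    mentions the topic exactly once: adjacent vs non-adjacent pairs."""
    coordinated = uncoordinated = 0
    for i, j in combinations(sorted(mentioners), 2):
        if j in graph.get(i, set()):
            coordinated += 1
        else:
            uncoordinated += 1
    return coordinated, uncoordinated


def sybil_clique(k):
    """Hypothetical Sybil region: k fake identities, all connected to each other."""
    names = [f"sybil{i}" for i in range(k)]
    return {n: {m for m in names if m != n} for n in names}


k = 10
sybils = sybil_clique(k)
real_users = {f"user{i}": set() for i in range(k)}  # unrelated real users
print(pair_scores(sybils, sybils))          # (45, 0)
print(pair_scores(real_users, real_users))  # (0, 45)
```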
8. Summary
• New trend definitions
• Significantly different from traditional trends
• Focus on coordinated trends
• Characteristics of topics
• Online algorithms to detect trends
• Future research:
  o spam not limited to Sybil attacks
  o evolution of trends over time
  o trend definitions between coordinated and uncoordinated trends
9. Critique
• Good:
  o real-world example
  o implementation guidelines for the algorithms
  o two similarity measures: SRCC and AP
• Bad:
  o with SRCC, almost no divergence between traditional and uncoordinated trends
  o no explanation of why #apple is not a structural trend