Transcript
Page 1: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Collective Spammer Detection in Evolving Multi-Relational Social Networks
(To appear at KDD’15)

Shobeir Fakhraei*
University of Maryland
*Research performed while on internship at if(we) (Tagged Inc.) through the Graphlab program

In collaboration with:
James Foulds (UCSC, currently UCSD)
Madhusudana Shashanka (if(we) Inc., currently Niara Inc.)
Lise Getoor (UCSC)

Page 2: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Spam in Social Networks
• Recent study by Nexgate in 2013:
  • Spam grew by more than 300% in half a year
  • 1 in 200 social messages is spam
  • 5% of all social apps are spammy
• Social networks:
  • Spammers have more ways to interact with users, e.g., comments on photos, messages, and non-verbal signals
  • They can split content across several messages and forms
  • More information about users is available on their profiles

Page 3: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Our Approach
• Let's look at the graph:
  • Multi-relational social network
    Links = actions at time t (e.g., profile view, message, or poke)
  • Time-stamped links
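
As an illustration of the graph described above, here is a minimal sketch of how time-stamped, typed links could be held in memory and split into one graph per relation. The `Link` type, its field names, and `split_by_relation` are hypothetical names chosen for this example; the slides do not describe the actual storage format.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    """One time-stamped action from `src` to `dst` (field names are illustrative)."""
    src: str
    dst: str
    relation: str  # e.g. "profile_view", "message", "poke"
    time: int      # timestamp of the action


def split_by_relation(links):
    """Group links into one edge list per relation, each sorted by time."""
    graphs = defaultdict(list)
    for link in links:
        graphs[link.relation].append(link)
    for edges in graphs.values():
        edges.sort(key=lambda link: link.time)
    return dict(graphs)


links = [
    Link("u1", "u2", "message", 3),
    Link("u1", "u3", "profile_view", 1),
    Link("u2", "u1", "poke", 2),
    Link("u1", "u4", "message", 1),
]
graphs = split_by_relation(links)
```

Keeping one edge list per relation makes it straightforward to compute features separately in each relation graph, while the timestamps preserve the order of each user's actions.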

Page 4: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Tagged.com
• Tagged.com is a social network founded in 2004 that focuses on connecting people through different social features and games.
• It has over 300 million registered members
• Data sample for experiments (on a laptop):
  • 5.6 million users (3.9% labeled spammers)
  • 912 million links

Page 5: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Graph Structure
• Using Graphlab Create
  • Graph Analytics Toolkit to generate features:
  • In each relation graph we compute: PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count
• Gradient Boosted Trees:
  • Classification (Spam or Legitimate)
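
To make the "features per relation graph" idea concrete, the sketch below computes only the three degree features in plain Python and concatenates them across relations into one vector per user. It does not use Graphlab Create, and the function names `degree_features` and `feature_vector` are assumptions made for this example; the remaining features (PageRank, k-core, and so on) would be appended in the same way.

```python
from collections import defaultdict


def degree_features(edges):
    """Return {user: (in_degree, out_degree, total_degree)} for one relation graph."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    users = set(in_deg) | set(out_deg)
    return {u: (in_deg[u], out_deg[u], in_deg[u] + out_deg[u]) for u in users}


def feature_vector(user, relation_graphs):
    """Concatenate the degree features of `user` across relations, in sorted relation order."""
    vector = []
    for relation in sorted(relation_graphs):
        feats = degree_features(relation_graphs[relation])
        vector.extend(feats.get(user, (0, 0, 0)))
    return vector


relation_graphs = {
    "message": [("u1", "u2"), ("u1", "u3"), ("u2", "u1")],
    "poke": [("u3", "u1")],
}
vec_u1 = feature_vector("u1", relation_graphs)
```

Vectors of this shape, one per user, are the kind of input a classifier such as gradient boosted trees would be trained on.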

Page 6: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Graph Structure

Experiments                     AU-PR            AU-ROC
1 Relation, k Feature Types     0.187 ± 0.004    0.803 ± 0.001
n Relations, 1 Feature Type     0.285 ± 0.002    0.809 ± 0.001
n Relations, k Feature Types    0.328 ± 0.003    0.817 ± 0.001

Page 7: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Sequence of Actions
• Sequential k-gram Features:
  Short sequence segments of k consecutive actions, to capture the order of events
  User1 Actions: Message, Profile_view, Message, Friend_Request, …
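
The k-gram idea above can be sketched in a few lines: slide a window of length k over a user's action sequence and count each window. The function name `kgram_counts` is hypothetical, and the input reuses only the four User1 actions listed on the slide.

```python
from collections import Counter


def kgram_counts(actions, k=2):
    """Count every run of k consecutive actions in a user's action sequence."""
    return Counter(tuple(actions[i:i + k]) for i in range(len(actions) - k + 1))


user1_actions = ["Message", "Profile_view", "Message", "Friend_Request"]
bigrams = kgram_counts(user1_actions, k=2)
```

With k=2 this yields the bigram features; each distinct k-gram becomes one feature column whose value is its count for that user.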

Page 8: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Sequence of Actions
• Mixture of Markov Models (MMM):
  Also called a chain-augmented or tree-augmented naive Bayes model, to capture longer sequences
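
One way to read the chain-augmented naive Bayes description is: fit one first-order Markov chain over actions per class, then score a sequence by the class prior times the product of its transition probabilities. The sketch below follows that reading with add-one smoothing. The class name, the `START` marker, the smoothing choice, and the toy training data are all assumptions for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict

START = "<start>"  # hypothetical marker for the beginning of a sequence


class MarkovChainClassifier:
    """One first-order Markov chain per class, combined with class priors."""

    def __init__(self, alphabet, smoothing=1.0):
        self.alphabet = list(alphabet)
        self.smoothing = smoothing
        self.transitions = {}  # label -> {previous action -> {next action -> count}}
        self.class_counts = defaultdict(int)

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            table = self.transitions.setdefault(
                label, defaultdict(lambda: defaultdict(float)))
            prev = START
            for action in seq:
                table[prev][action] += 1
                prev = action
        return self

    def log_score(self, seq, label):
        """log P(label) plus the sum of smoothed log P(action | previous action, label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        table = self.transitions[label]
        prev = START
        for action in seq:
            row = table[prev]
            numer = row[action] + self.smoothing
            denom = sum(row.values()) + self.smoothing * len(self.alphabet)
            score += math.log(numer / denom)
            prev = action
        return score

    def predict(self, seq):
        """Return the label whose chain gives the sequence the highest score."""
        return max(self.class_counts, key=lambda label: self.log_score(seq, label))


# Toy data, invented for this example.
train = [
    (["Message", "Message", "Message"], "spammer"),
    (["Friend_Request", "Message", "Message"], "spammer"),
    (["Profile_view", "Message", "Profile_view"], "legitimate"),
    (["Profile_view", "Friend_Request", "Message"], "legitimate"),
]
model = MarkovChainClassifier(["Message", "Profile_view", "Friend_Request"])
model.fit([seq for seq, _ in train], [label for _, label in train])
```

Calling `model.predict(sequence)` returns the more likely label for a new action sequence; the per-class log scores could also be used as features for another classifier.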

Page 9: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Sequence of Actions

Experiments        AU-PR            AU-ROC
Bigram Features    0.471 ± 0.004    0.859 ± 0.001
MMM                0.246 ± 0.009    0.821 ± 0.003

Page 10: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Results
[Figure: Precision-Recall and ROC curves]

We can classify 70% of the spammers that need manual labeling with about 90% accuracy
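
For readers unfamiliar with the two metrics reported throughout (AU-PR and AU-ROC), the sketch below computes both from labels and classifier scores in plain Python. The slides do not say how the areas were computed; here AU-ROC uses the pairwise ranking definition and AU-PR is estimated by average precision, which is one common estimator. The labels and scores are invented toy values.

```python
def au_roc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


labels = [1, 0, 1, 0, 0]            # 1 = spammer, 0 = legitimate (toy data)
scores = [0.9, 0.8, 0.7, 0.3, 0.1]  # hypothetical classifier outputs
roc = au_roc(labels, scores)
ap = average_precision(labels, scores)
```

The pairwise AU-ROC loop is quadratic and meant only for illustration. With a small labeled spammer fraction such as the 3.9% in this data sample, AU-PR is usually the more sensitive of the two metrics.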

Page 11: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Deployment and Example Runtimes
• We can:
  • Run the model on short intervals, with new snapshots of the network
  • Update some of the features with events
• Example runtimes with Graphlab Create™ on a MacBook Pro:
  • 5.6 million nodes and 350 million links
  • PageRank: 6.25 minutes
  • Triangle counting: 17.98 minutes
  • k-core: 14.3 minutes
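
The "update some of the features with events" option can be illustrated with features that only need counters, such as per-relation degrees and action bigrams. The class below is a hypothetical sketch, not the deployed system; it assumes events arrive in time order and counts every event, including repeated links between the same pair of users.

```python
from collections import Counter, defaultdict


class StreamingFeatures:
    """Per-user features that can be updated one event at a time."""

    def __init__(self):
        self.out_degree = defaultdict(Counter)  # user -> relation -> count
        self.in_degree = defaultdict(Counter)   # user -> relation -> count
        self.bigrams = defaultdict(Counter)     # user -> (previous action, action) -> count
        self._last_action = {}

    def update(self, src, dst, relation):
        """Apply one event; assumes events arrive in time order."""
        self.out_degree[src][relation] += 1
        self.in_degree[dst][relation] += 1
        prev = self._last_action.get(src)
        if prev is not None:
            self.bigrams[src][(prev, relation)] += 1
        self._last_action[src] = relation


features = StreamingFeatures()
events = [("u1", "u2", "message"), ("u1", "u3", "profile_view"), ("u1", "u4", "message")]
for event in events:
    features.update(*event)
```

Features that depend on the whole graph would more naturally be refreshed on the periodic snapshot runs listed in the first bullet.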

Page 12: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Refining the Abuse Reporting Systems
• Abuse report systems are very noisy.
  • People have different standards.
  • Spammers report random people to increase noise.
  • Personal gain in social games.
• Goal is to clean up the system using
  • Reporters' previous history
  • Looking at all the reports collectively
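
A simple way to picture "using the reporter's previous history" is to weight each report by a credibility score derived from how often that reporter's past reports were confirmed. The sketch below is an assumption-laden illustration, not the model from the paper: the function names, the smoothing prior, and the toy history are all invented here.

```python
def credibility(confirmed, total, prior_confirmed=1.0, prior_total=2.0):
    """Smoothed fraction of a reporter's past reports that were confirmed."""
    return (confirmed + prior_confirmed) / (total + prior_total)


def weighted_report_score(reports, history):
    """Sum, per reported user, the credibility of every reporter who flagged them."""
    scores = {}
    for reporter, reported in reports:
        confirmed, total = history.get(reporter, (0, 0))
        scores[reported] = scores.get(reported, 0.0) + credibility(confirmed, total)
    return scores


# reporter -> (confirmed past reports, total past reports); toy values
history = {"alice": (9, 10), "mallory": (0, 20)}
reports = [("alice", "u7"), ("mallory", "u8"), ("bob", "u8")]
scores = weighted_report_score(reports, history)
```

In this toy data a single report from a reliable reporter outweighs two reports from an unreliable and an unknown reporter, which is the kind of noise reduction the slide is aiming for.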

Page 13: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Probabilistic Soft Logic (PSL)
• Probabilistic Soft Logic (PSL): a declarative language based on first-order logic to express collective probabilistic inference problems.

• Example:
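
The slide's own example rule is not present in this transcript. As a stand-in, the sketch below shows how PSL evaluates a rule over soft truth values in [0, 1] using the Lukasiewicz relaxation: a conjunction is max(0, a + b - 1), and a rule body -> head has distance to satisfaction max(0, body - head). The rule in the comment and the predicate names `Credible`, `Reported`, and `Spammer` are hypothetical, chosen only to match the reporting setting.

```python
def soft_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


# Hypothetical rule: Credible(R) & Reported(R, U) -> Spammer(U)
credible, reported = 0.9, 1.0
body = soft_and(credible, reported)

penalty_if_not_spammer = distance_to_satisfaction(body, head=0.3)
penalty_if_spammer = distance_to_satisfaction(body, head=1.0)
```

PSL inference chooses the unknown truth values (here, how much each user is a spammer) to minimize the weighted sum of these distances over all grounded rules, which is what makes the reasoning collective.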

Page 14: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Results of Classification Using Reports

Experiments                                     AU-PR            AU-ROC
Reports Only                                    0.674 ± 0.008    0.611 ± 0.007
Reports & Credibility                           0.869 ± 0.006    0.862 ± 0.004
Reports & Credibility & Collective Reasoning    0.884 ± 0.005    0.873 ± 0.004

Page 15: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Conclusions
• Strong signals in the multi-relational time-stamped graph
• Graph Structure:
  • Multi-relationality is more important than extracting more features from one relation
• Sequence:
  • Bigram features of actions capture most of the patterns
• Reports:
  • Incorporating credibility of reporting users and collective reasoning significantly improves the signal
• Fast experiment iterations with Graphlab Create™ facilitate finding these patterns

Page 16: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Acknowledgements
• Co-authors of the paper:
  James Foulds (UCSC)
  Madhusudana Shashanka (if(we) Inc., currently Niara Inc.)
  Lise Getoor (UCSC)
• If(we) Inc. (formerly Tagged Inc.):
  Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (formerly Graphlab):
  Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the data will be released soon:
  http://www.cs.umd.edu/~shobeir

Page 17: Collective Spammer Detection in Evolving Multi-Relational Social Networks

Thank you!

