Collective Spammer Detection in Evolving Multi-Relational Social Networks

Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD’15)

Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program

In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)

2

Spam in Social Networks• Recent study by Nexgate in 2013:• Spam grew by more than 300% in half a year• 1 in 200 of social messages are spam• 5% of all social apps are spammy

• Social networks:• Spammers have more ways to interact with users

e.g, Comments on photos, send message, and send not verbal signals

• They can split content in several messages and forms• More available info about users on their profiles!

3

Our Approach• Lets look at the graph:• Multi-relational social

network Links = actions at time t (e.g. profile view, message, or poke)

• Time-stamped Links

4

Tagged.com• Tagged.com is a social network founded in 2004 which

focuses on connecting people through different social features and games.

• It has over 300 million registered members

• Data sample for experiments (on a laptop):• 5.6 Million users (%3.9 Labeled Spammers)• 912 Million Links

5

Graph Structure

• Using Graphlab Create• Graph Analytics Toolkit to Generate Features:

• In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count

• Gradient Boosted Trees:• Classification (Spam or Legitimate)

6

Graph StructureExperiments AU-PR AU-ROC

1 Relation, k Feature type 0.187 ± 0.004 0.803 ±0.001

n Relations, 1 Feature Type 0.285 ± 0.002 0.809 ± 0.001

n Relations, k Features types 0.328 ± 0.003 0.817 ± 0.001

7

Sequence of Actions• Sequential k-gram Features:

Short sequence segment of k consecutive actions, to capture the order of events

User1 Actions: Message, Profile_view, Message, Friend_Request, ….

8

Sequence of Actions• Mixture of Markov Models (MMM):

Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences

9

Sequence of ActionsExperiments AU-PR AU-ROC

Bigram Features 0.471 ± 0.004 0.859 ± 0.001

MMM 0.246 ± 0.009 0.821 ± 0.003

10

ResultsPrecision-Recall ROC

We can classify 70% of the spammer that need manual labeling with about 90% accuracy

11

Deployment and Example Runtimes• We can: • Run the model on short intervals, with new snapshots of the network• Update some of the features with events.

• Example runtimes with Graphlab CreateTM on a Macbook Pro:• 5.6 million nodes and 350 million links• PageRank: 6.25 minutes• Triangle counting: 17.98 minutes• k-core: 14.3 minutes

12

Refining the abuse reporting systems• Abuse report systems are very noisy.• People have different standards.• Spammers report random people to increase noise.• Personal gain in social games.

• Goal is to clean up the system using• Reporters previous history• Looking at all the reports collectively

13

Probabilistic Soft Logic (PSL)• Probabilistic Soft Logic (PSL) Declarative language based on

first-order logic to express collective probabilistic inference problems.

• Example:

14

Results of Classification Using ReportsExperiments AU-PR AU-ROC

Reports Only 0.674 ± 0.008 0.611 ± 0.007

Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004

Reports & Credibility & Collective Reasoning 0.884 ± 0.005 0.873 ± 0.004

15

Conclusions• Strong signals in multi-relational time-stamped graph

• Graph Structure:• Multi-rationality is more important than extracting more features from one relations

• Sequence:• Bigrams features of actions capture most of the patterns

• Reports:• Incorporating credibility of reporting users and collective reasoning significantly improves

the signal

• Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern

16

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

http://www.cs.umd.edu/~shobeir

17

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

Thank you!

http://www.cs.umd.edu/~shobeir

Technology

Collective Spammer Detection in Evolving Multi-Relational Social Networks