Collective Spammer Detection in Evolving Multi-Relational Social Networks

Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD’15)

Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program

In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)

Spam in Social Networks• Recent study by Nexgate in 2013:• Spam grew by more than 300% in half a year• 1 in 200 of social messages are spam• 5% of all social apps are spammy

• Social networks:• Spammers have more ways to interact with users

e.g, Comments on photos, send message, and send not verbal signals

• They can split content in several messages and forms• More available info about users on their profiles!

Our Approach• Lets look at the graph:• Multi-relational social

network Links = actions at time t (e.g. profile view, message, or poke)

• Time-stamped Links

Tagged.com• Tagged.com is a social network founded in 2004 which

focuses on connecting people through different social features and games.

• It has over 300 million registered members

• Data sample for experiments (on a laptop):• 5.6 Million users (%3.9 Labeled Spammers)• 912 Million Links

Graph Structure

• Using Graphlab Create• Graph Analytics Toolkit to Generate Features:

• In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count

• Gradient Boosted Trees:• Classification (Spam or Legitimate)

Graph StructureExperiments AU-PR AU-ROC

1 Relation, k Feature type 0.187 ± 0.004 0.803 ±0.001

n Relations, 1 Feature Type 0.285 ± 0.002 0.809 ± 0.001

n Relations, k Features types 0.328 ± 0.003 0.817 ± 0.001

Sequence of Actions• Sequential k-gram Features:

Short sequence segment of k consecutive actions, to capture the order of events

User1 Actions: Message, Profile_view, Message, Friend_Request, ….

Sequence of Actions• Mixture of Markov Models (MMM):

Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences

Sequence of ActionsExperiments AU-PR AU-ROC

Bigram Features 0.471 ± 0.004 0.859 ± 0.001

MMM 0.246 ± 0.009 0.821 ± 0.003

ResultsPrecision-Recall ROC

We can classify 70% of the spammer that need manual labeling with about 90% accuracy

Deployment and Example Runtimes• We can: • Run the model on short intervals, with new snapshots of the network• Update some of the features with events.

• Example runtimes with Graphlab CreateTM on a Macbook Pro:• 5.6 million nodes and 350 million links• PageRank: 6.25 minutes• Triangle counting: 17.98 minutes• k-core: 14.3 minutes

Refining the abuse reporting systems• Abuse report systems are very noisy.• People have different standards.• Spammers report random people to increase noise.• Personal gain in social games.

• Goal is to clean up the system using• Reporters previous history• Looking at all the reports collectively

Probabilistic Soft Logic (PSL)• Probabilistic Soft Logic (PSL) Declarative language based on

first-order logic to express collective probabilistic inference problems.

• Example:

Results of Classification Using ReportsExperiments AU-PR AU-ROC

Reports Only 0.674 ± 0.008 0.611 ± 0.007

Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004

Reports & Credibility & Collective Reasoning 0.884 ± 0.005 0.873 ± 0.004

Conclusions• Strong signals in multi-relational time-stamped graph

• Graph Structure:• Multi-rationality is more important than extracting more features from one relations

• Sequence:• Bigrams features of actions capture most of the patterns

• Reports:• Incorporating credibility of reporting users and collective reasoning significantly improves

the signal

• Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

Thank you!

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Technology

Chapter 3: Relational Model · Chapter 3: Relational Model Structure of Relational Databases Relational Algebra Tuple Relational Calculus Domain Relational Calculus Extended Relational-Algebra-Operations

Time-Evolving Relational Classiﬁcation and Ensemble …ryanrossi.com/pubs/rossi-tenc-pakdd2012.pdfTime-Evolving Relational Classiﬁcation and Ensemble Methods RyanRossiandJenniferNeville

Relational Databases are Evolving To Support New Data Capabilities

Relational Algebra-Relational Calculus-SQL

The Relational Data Model and Relational Database Constraints · »The Relational Data Model »Relational Model Constraints »Update Operations »Relational Algebra . Relational Model

Relational Algebra and Relational Calculus

Chapter 3: Relational Model Structure of Relational Databases Relational Algebra Tuple Relational Calculus Domain Relational Calculus Extended Relational-Algebra-Operations

The Relational Algebra. Slide 6- 2 Outline Relational Algebra Unary Relational Operations Relational Algebra Operations From Set Theory Binary Relational

Web2.0 Spammer @ World: Follow me on Twitter!!!...Web2.0 Spammer @ World: Follow me on Twitter!!! Alexandru Cătălin Coşoi Senior Researcher / AntiSpam Laboratory BitDefender Twitter

Collective Spammer Detection in Evolving Multi-Relational ... · 1 a 2 a n-1 a n... y x 1 x 2 x n-1 x n... neighborhood characteristics of each user. Our goal is to capture the structural

Mining Content and Relations for Social Spammer Detection ...€¦ · social media. Complex network interactions and evolving content present great challenges for social spammer detection

Recommendations for Evolving Relational DatabasesRecommendations for Evolving Relational Databases 499 Issue 1: Relational database management systems (RDBMS) do not allow schema inconsistencies

Spammer Presentation Hope Sievert2

Advanced Mail. Computer Center, CS, NCTU 2 Introduction SPAM vs. non-SPAM Mail sent by spammer vs. non-spammer Problem of SPAM mail Over 99% of E-mails

Relational vs. Non-Relational

The Relational Model & Relational Algebra

+ Collective Spammer Detection in Evolving Multi-Relational Social Networks Shobeir Fakhraei (University of Maryland) James Foulds (University of California,

How is Life for a Table in an Evolving Relational Schema? Birth, Death & Everything in Between Panos Vassiliadis, Apostolos Zarras, Ioannis Skoulis Department

Don’t Look Like a Spammer

Relational Model. 2 Structure of Relational Databases Fundamental Relational-Algebra-Operations Additional Relational-Algebra-Operations Extended Relational-Algebra-Operations