Click here to load reader

Collective Spammer Detection in Evolving Multi-Relational Social Networks

  • View
    334

  • Download
    0

Embed Size (px)

Text of Collective Spammer Detection in Evolving Multi-Relational Social Networks

Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD15)

Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program

In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)

Hi,Shobeir, UMDSpam at Social Net.Work done @ ifwe through graphlab internshipPaper will apear at KDD this yearWork with collaborators.

1

Spam in Social NetworksRecent study by Nexgate in 2013:Spam grew by more than 300% in half a year1 in 200 of social messages are spam5% of all social apps are spammy

Social networks:Spammers have more ways to interact with users e.g, Comments on photos, send message, and send not verbal signals They can split content in several messages and formsMore available info about users on their profiles!2

Our ApproachLets look at the graph:Multi-relational social network Links = actions at time t (e.g. profile view, message, or poke)Time-stamped Links

3

Tagged.comTagged.com is a social network founded in 2004 which focuses on connecting people through different social features and games.It has over 300 million registered members

Data sample for experiments (on a laptop):5.6 Million users (%3.9 Labeled Spammers)912 Million Links4

Graph StructureUsing Graphlab CreateGraph Analytics Toolkit to Generate Features:In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle CountGradient Boosted Trees:Classification (Spam or Legitimate)5

Graph Structure6ExperimentsAU-PRAU-ROC1 Relation, k Feature type0.187 0.004 0.803 0.001 n Relations, 1 Feature Type0.285 0.002 0.809 0.001 n Relations, k Features types0.328 0.003 0.817 0.001

Sequence of ActionsSequential k-gram Features: Short sequence segment of k consecutive actions, to capture the order of events

7User1 Actions: Message, Profile_view, Message, Friend_Request, .

Sequence of ActionsMixture of Markov Models (MMM): Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences8

Sequence of Actions9ExperimentsAU-PRAU-ROCBigram Features0.471 0.004 0.859 0.001 MMM0.246 0.009 0.821 0.003

Results10Precision-Recall ROC

We can classify 70% of the spammer that need manual labeling with about 90% accuracy

Deployment and Example RuntimesWe can: Run the model on short intervals, with new snapshots of the networkUpdate some of the features with events.

Example runtimes with Graphlab CreateTM on a Macbook Pro:5.6 million nodes and 350 million linksPageRank: 6.25 minutesTriangle counting: 17.98 minutesk-core: 14.3 minutes11

Refining the abuse reporting systemsAbuse report systems are very noisy.People have different standards.Spammers report random people to increase noise.Personal gain in social games.

Goal is to clean up the system usingReporters previous historyLooking at all the reports collectively 12

Probabilistic Soft Logic (PSL)Probabilistic Soft Logic (PSL) Declarative language based on first-order logic to express collective probabilistic inference problems.

Example:

13

Results of Classification Using Reports14ExperimentsAU-PRAU-ROCReports Only0.674 0.008 0.611 0.007 Reports & Credibility0.869 0.006 0.862 0.004 Reports & Credibility & Collective Reasoning0.884 0.005 0.873 0.004

ConclusionsStrong signals in multi-relational time-stamped graph Graph Structure:Multi-rationality is more important than extracting more features from one relationsSequence:Bigrams features of actions capture most of the patternsReports:Incorporating credibility of reporting users and collective reasoning significantly improves the signal

Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern15

Acknowledgements Co-authors of the paper:James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon HillDato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir 16

Acknowledgements Co-authors of the paper:James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon HillDato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir 17Thank you!

Search related