Click here to load reader
View
334
Download
0
Embed Size (px)
Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD15)
Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program
In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)
Hi,Shobeir, UMDSpam at Social Net.Work done @ ifwe through graphlab internshipPaper will apear at KDD this yearWork with collaborators.
1
Spam in Social NetworksRecent study by Nexgate in 2013:Spam grew by more than 300% in half a year1 in 200 of social messages are spam5% of all social apps are spammy
Social networks:Spammers have more ways to interact with users e.g, Comments on photos, send message, and send not verbal signals They can split content in several messages and formsMore available info about users on their profiles!2
Our ApproachLets look at the graph:Multi-relational social network Links = actions at time t (e.g. profile view, message, or poke)Time-stamped Links
3
Tagged.comTagged.com is a social network founded in 2004 which focuses on connecting people through different social features and games.It has over 300 million registered members
Data sample for experiments (on a laptop):5.6 Million users (%3.9 Labeled Spammers)912 Million Links4
Graph StructureUsing Graphlab CreateGraph Analytics Toolkit to Generate Features:In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle CountGradient Boosted Trees:Classification (Spam or Legitimate)5
Graph Structure6ExperimentsAU-PRAU-ROC1 Relation, k Feature type0.187 0.004 0.803 0.001 n Relations, 1 Feature Type0.285 0.002 0.809 0.001 n Relations, k Features types0.328 0.003 0.817 0.001
Sequence of ActionsSequential k-gram Features: Short sequence segment of k consecutive actions, to capture the order of events
7User1 Actions: Message, Profile_view, Message, Friend_Request, .
Sequence of ActionsMixture of Markov Models (MMM): Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences8
Sequence of Actions9ExperimentsAU-PRAU-ROCBigram Features0.471 0.004 0.859 0.001 MMM0.246 0.009 0.821 0.003
Results10Precision-Recall ROC
We can classify 70% of the spammer that need manual labeling with about 90% accuracy
Deployment and Example RuntimesWe can: Run the model on short intervals, with new snapshots of the networkUpdate some of the features with events.
Example runtimes with Graphlab CreateTM on a Macbook Pro:5.6 million nodes and 350 million linksPageRank: 6.25 minutesTriangle counting: 17.98 minutesk-core: 14.3 minutes11
Refining the abuse reporting systemsAbuse report systems are very noisy.People have different standards.Spammers report random people to increase noise.Personal gain in social games.
Goal is to clean up the system usingReporters previous historyLooking at all the reports collectively 12
Probabilistic Soft Logic (PSL)Probabilistic Soft Logic (PSL) Declarative language based on first-order logic to express collective probabilistic inference problems.
Example:
13
Results of Classification Using Reports14ExperimentsAU-PRAU-ROCReports Only0.674 0.008 0.611 0.007 Reports & Credibility0.869 0.006 0.862 0.004 Reports & Credibility & Collective Reasoning0.884 0.005 0.873 0.004
ConclusionsStrong signals in multi-relational time-stamped graph Graph Structure:Multi-rationality is more important than extracting more features from one relationsSequence:Bigrams features of actions capture most of the patternsReports:Incorporating credibility of reporting users and collective reasoning significantly improves the signal
Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern15
Acknowledgements Co-authors of the paper:James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon HillDato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir 16
Acknowledgements Co-authors of the paper:James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon HillDato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir 17Thank you!