@spam: The Underground on 140 Characters or Less

Chris Grier, Vern Paxson, Michael Zhang
University of California, Berkeley
Kurt Thomas
University of Illinois, Urbana-Champaign
ACM CCS 2010
2011/3/22

Agenda
- Introduction
- Background
- Data Collection
- Spam On Twitter
- Spam Campaign
- Blacklist Performance
- Conclusion
Introduction
- Twitter has developed a following of 106 million users that post to the site over one billion times per month
- Threats:
  - Force guessing of weak passwords
  - Phishing
- Twitter currently lacks a filtering mechanism to prevent spam, with the exception of malware, which is blocked using Google's Safebrowsing API
- Twitter has developed a loose set of heuristics to quantify spamming activity, such as excessive account creation or requests to befriend other users

Introduction (cont.)
- Present the first in-depth look at spam on Twitter
- Find that 0.13% of users exposed to spam URLs click through to the spam web site
- Identify a diversity of spam campaigns exploiting a range of Twitter features to attract audiences
- Blacklists are currently too slow to stop harmful links
- Two types of spamming accounts on Twitter

Background
- Common techniques to filter email spam:
  - IP blacklisting
  - Domain and URL blacklisting
  - Filtering on email contents
- Social network spam requires a large social circle
- The challenges of a successful spam campaign on Twitter:
  - Obtaining enough accounts
  - URL shortening services on Twitter
  - Having enough fresh URLs
Background (cont.)
- Tweets
  - Twitter restricts these updates to 140 characters or less
  - URL shortening
- Followers
  - How to obtain a lot of followers
- Friends
  - Relationships in Twitter are not bidirectional
- Mentions, Retweets, Hashtags
Data collection
- Collect data from two separate taps:
  - One targets a random sample of Twitter activity
  - One specifically targets any tweets containing URLs
- Use a custom web crawler to follow each URL through HTTP status codes (301, 302) and META tag redirects until reaching the final landing page
- Redirect resolution removes any URL obfuscation that masks the domain of the final landing page
- Speaker note: http://antbsd.twbbs.org/~ant/wordpress/?p=1107

Data collection (cont.)
- We regularly check every landing page's URL in our data set against three blacklists:
  - Google Safebrowsing: phishing or malware
  - URIBL, Joewein: domain present in spam email
- Once a landing page is marked as spam, we analyze the associated spam tweets and users involved in the spam operation.
- We have found that URIBL and Joewein include domains that are not exclusively hosting spam
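The redirect resolution described in the data collection slides can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: the `fetch` callback, the `.example` hosts, the hop limit, and the exact set of redirect status codes are all assumptions made for the sketch (the slides mention only HTTP status codes such as 301 and 302, and META tag redirects).

```python
import re
from urllib.parse import urljoin

# HTTP status codes treated as redirects (assumed set; the slides name 301 and 302).
REDIRECT_CODES = {301, 302, 303, 307, 308}

# Loose pattern for <meta http-equiv="refresh" content="0; url=...">.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def resolve_landing_page(url, fetch, max_hops=10):
    """Follow redirects starting at `url` and return every URL visited.

    `fetch(url)` is a hypothetical caller-supplied function returning
    (status_code, headers_dict, body_text) for a single request, without
    following redirects itself. The last element of the returned list is
    the landing page that would be checked against the blacklists.
    """
    chain = [url]
    for _ in range(max_hops):
        status, headers, body = fetch(chain[-1])
        if status in REDIRECT_CODES and "Location" in headers:
            target = headers["Location"]
        else:
            match = META_REFRESH.search(body or "")
            if not match:
                break  # no further redirect: this is the landing page
            target = match.group(1)
        next_url = urljoin(chain[-1], target)  # handle relative targets
        if next_url in chain:
            break  # redirect loop: stop instead of spinning
        chain.append(next_url)
    return chain


# Made-up pages standing in for real HTTP responses.
pages = {
    "http://short.example/x": (301, {"Location": "http://hop.example/a"}, ""),
    "http://hop.example/a": (
        200, {}, '<meta http-equiv="refresh" content="0; url=/land">'),
    "http://hop.example/land": (200, {}, "<html>landing</html>"),
}
chain = resolve_landing_page("http://short.example/x", pages.__getitem__)
landing_page = chain[-1]
```

Passing `fetch` in as a parameter keeps the sketch runnable without network access; a real crawler would wrap an HTTP client there and would likely normalise header names, which this sketch does not.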
Data collection (cont.)
- During this time we gathered over 200 million tweets from the stream
  - Over 3 million tweets were identified as spam
- Crawled 25 million URLs
  - 8% of all unique links were identified as spam by blacklists
  - 5% were malware and phishing
  - 95% directed users towards scams
- Over 90% of Twitter users have public accounts
- Half of 120,000 users with public accounts have sent spam identified by our blacklists
Data collection (cont.)
- When bit.ly or an affiliated service is used to shorten a spam URL, we use the bit.ly API to download clickthrough statistics and click stream data, which allows us to identify highly successful spam pages and the rate of traffic

Spam On Twitter
- Spammers must coerce Twitter members into following spam accounts
  - Spamming bots
  - Compromised accounts
  - Unwitting participants in spam distribution
Spam On Twitter (cont.)
- Roughly 50% of spam was uncategorized due to using random terms
- This table is the other 50%
Spam On Twitter (cont.)
- Call outs: Mentions are used by spammers to personalize messages in an attempt to increase the likelihood a victim follows a spam link.
- Retweets: four sources of spam retweets:
  - Retweets purchased by spammers from respected Twitter members
  - Spam accounts retweeting other spam
  - Hijacked retweets
  - Users unwittingly retweeting spam
- Example: Win an iTouch AND a $150 Apple gift card @victim! http://spam.com
- Example: RT @scammer: check out the Ipads there having a giveaway http://spam.com

Spam On Twitter (cont.)
- Tweet hijacking: spammers can hijack tweets posted by other users and retweet them, prepending the tweet with spam URLs.
- Trend setting: the anomaly of 70% of phishing and malware spam containing hashtags can be explained by spammers attempting to create a trending topic.
- Trend hijacking: rather than generating a unique topic, spammers can append currently trending topics to their own spam.
- Example: http://spam.com RT @barackobama A great battle is ahead of us
- Example: Buy more followers! http://spam.com #fwlr
Spam On Twitter (cont.)
- Coefficient of correlation between clicks and each feature:
  - Accounts involved in spamming and the number of followers that receive a link (correlation > 0.7)
  - Hashtags (correlation = 0.74)
  - Retweets with hashtags (correlation = 0.55)
  - Number of times spam is tweeted (correlation = 0.28), indicating that repeatedly posting a link does little to increase traffic
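As an illustration of how a correlation between clicks and a feature such as follower count could be computed, here is a minimal sketch. It assumes the Pearson coefficient, which the slides do not name explicitly, and the per-URL numbers are made up purely for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-URL data: followers exposed to the link, and clicks received.
followers_exposed = [120, 950, 4000, 15000, 52000]
clicks = [0, 2, 3, 25, 60]
rho = pearson(followers_exposed, clicks)
```

A value near 1 would mean clicks rise almost linearly with the feature; a value like the 0.28 reported for repeat postings indicates only a weak linear relationship.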
Spam On Twitter (cont.)
- To understand the effectiveness of tweeting to entice a follower into visiting a spam URL:
  - reach = t × f
  - t: the total tweets sent
  - f: the followers exposed to each tweet
- Averaging clicks / reach over each of the 245,000 URLs in our bit.ly data set, we find roughly 0.13% of spam tweets generate a visit, orders of magnitude higher than the clickthrough rates of 0.003% to 0.006% reported for spam email

Spam On Twitter (cont.)
- A number of factors may degrade the quality of this estimate:
  - bit.ly URLs may carry an inherent bias of trust, as bit.ly is the most popular URL shortening service
  - Click data from bit.ly includes the entire history of a link, while our observation of a link's usage only accounts for one month of Twitter activity

Spam On Twitter (cont.)
- Twitter accounts:
  - Career spamming account
  - Compromised account: an account that was created by a legitimate user
- Tests:
  - χ² test on timestamps
  - Tweet text and link entropy
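The two tests named above can be sketched as follows. The slides give only the names of the tests, so everything else here is an assumption for illustration: binning timestamps by second-of-minute, comparing against a uniform distribution, and measuring entropy over whole tweet texts or links rather than over tokens.

```python
import math
from collections import Counter


def chi_square_uniform(values, bins):
    """Chi-square statistic of integer `values` in range(bins) against a
    uniform distribution over the bins (larger means less uniform)."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    expected = len(values) / bins
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(bins))


def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of `items`."""
    if not items:
        raise ValueError("need at least one item")
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical account: every tweet is posted at second 0 of the minute
# and carries the same text, as a scheduled bot might.
bot_seconds = [0] * 60
bot_texts = ["Buy more followers! http://spam.com"] * 60
bot_timing_stat = chi_square_uniform(bot_seconds, bins=60)
bot_text_entropy = shannon_entropy(bot_texts)

# Hypothetical account whose tweets are spread evenly and all differ.
human_seconds = list(range(60))
human_texts = [f"tweet number {i}" for i in range(60)]
human_timing_stat = chi_square_uniform(human_seconds, bins=60)
human_text_entropy = shannon_entropy(human_texts)
```

In practice the statistic would be compared against a chi-square critical value with `bins - 1` degrees of freedom, and the entropy against some threshold; neither cutoff is stated in the slides, so none is hard-coded here.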
Compromised spamming accounts
- An account could have been compromised by means of phishing, malware, or simple password guessing, currently a major trend in Twitter
- The Koobface botnet

Spam Tools
Spam Campaigns

Spam Campaigns (cont.)
- If an account participates in multiple campaigns, the algorithm will automatically group the campaigns into a single superset. This can happen when an account is:
  - Shared by two spammers
  - Used for multiple campaigns over time by a single spammer
  - Compromised by different services
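The merging behaviour described above, where one account appearing in several campaigns pulls them into a single superset, falls out naturally from any transitive grouping. The slides do not spell out the clustering algorithm, so the following is only a sketch under an assumed definition: two accounts belong to the same campaign if they tweet at least one spam URL in common, directly or through a chain of other accounts. The account and URL names are hypothetical.

```python
from collections import defaultdict


def group_campaigns(account_urls):
    """Group accounts that share any spam URL, transitively.

    `account_urls` maps an account name to the set of spam URLs it tweeted.
    Returns a sorted list of campaigns, each a sorted list of accounts.
    """
    parent = {account: account for account in account_urls}

    def find(account):
        # Walk to the representative, shortening the path as we go.
        while parent[account] != account:
            parent[account] = parent[parent[account]]
            account = parent[account]
        return account

    first_account_for_url = {}
    for account, urls in account_urls.items():
        for url in urls:
            if url in first_account_for_url:
                # Same URL seen before: merge the two accounts' groups.
                parent[find(account)] = find(first_account_for_url[url])
            else:
                first_account_for_url[url] = account

    clusters = defaultdict(set)
    for account in account_urls:
        clusters[find(account)].add(account)
    return sorted(sorted(members) for members in clusters.values())


# "b" tweets URLs from two otherwise separate groups, so they are merged.
campaigns = group_campaigns({
    "a": {"http://spam.example/1"},
    "b": {"http://spam.example/1", "http://spam.example/2"},
    "c": {"http://spam.example/2"},
    "d": {"http://spam.example/3"},
})
```

Here accounts `a` and `c` never share a URL directly, yet both end up in the same campaign because `b` links them, which is exactly the superset effect the slide warns about.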
Spam Campaigns (cont.)
- URLs being tweeted:
  - Single hop (shortened URL -> landing page)
  - Second hop (shortened URL -> affiliate link -> landing page)
  - Landing page itself appears in tweets
- Phishing for followers
  - Websites purporting to provide victims with followers if they reveal their account credentials
  - Phished accounts are used to further promote the phishing campaign
  - Defining features: the extensive use of hashtags in this campaign's tweets (73%)
Spam Campaigns (cont.)
- Personalized mentions (http://twitprize.com)
  - Spam within the campaign would target victims by using mentions and crafting URLs to include the victim's Twitter account name to allow for personalized greetings
  - Defining features:
    - 99% are a retweet or mention
    - This campaign passes the entropy tests, since each tweet contains a different username and the links point to distinct twitprize URLs

Spam Campaigns (cont.)
- Buying retweets
  - One such service: retweet.it
  - Defining features: unique feature present in all retweet.it
- Distributing malware
  - Defining features:
    - One difference from other campaigns is the use of redirects to mask the landing page (bit.ly -> intermediate -> malware landing site)
    - Nested URL shortening

Blacklist Performance
- Figure 6

Blacklist Performance (cont.)
Conclusion
- This paper presents the first study of spam on Twitter, including spam behavior, clickthrough, and the effectiveness of blacklists to prevent spam propagation
- By measuring the clickthrough of these campaigns, we find that Twitter spam is far more successful at coercing users into clicking on spam URLs than email, with an overall clickthrough rate of 0.13%
- If blacklists were integrated into Twitter, they would protect only a minority of users
- URLs posted to the site must be crawled to unravel potentially long chains of redirects, using the final landing page for blacklisting