DESCRIPTION
With the advent of online social media, phishers have started using social networks like Twitter, Facebook, and Foursquare to spread phishing scams. Twitter is an immensely popular micro-blogging network where people post short messages of up to 140 characters, called tweets. It has over 100 million active users who post about 200 million tweets every day. Because of this vast information dissemination, phishers have started using Twitter as a medium to spread phishing. Phishing on Twitter is also harder to detect than phishing in email: phishing links spread quickly through the network, the content is short, and URLs are obfuscated by shortening them to fit the 140-character tweet limit. Our technique, PhishAri, detects phishing on Twitter in realtime. We use Twitter-specific features along with URL features to detect whether a tweet posted with a URL is phishing or not. Some of the Twitter-specific features we use are the tweet content and its characteristics, such as length, hashtags, and mentions. Other Twitter features are characteristics of the user posting the tweet, such as the age of the account, the number of tweets, and the follower-followee ratio. These Twitter-specific features, coupled with URL-based features, prove to be a strong mechanism for detecting phishing tweets. Using machine learning classification techniques, we detect phishing tweets with an accuracy of 92.52%. We have deployed our system for end users as an easy-to-use Chrome browser extension. The extension works in realtime and classifies a tweet as phishing or safe when it appears in a user's Twitter timeline. In this research, we show that we can detect phishing tweets at zero hour with high accuracy, much faster than public blacklists as well as Twitter's own defense mechanism for detecting malicious content.
We also performed a quick user evaluation of PhishAri in a laboratory study, which shows that users like PhishAri and are happy to use it in the real world. To the best of our knowledge, this is the first realtime, comprehensive, and usable system to detect phishing on Twitter.
Automatic Realtime Phishing Detection on Twitter
Anupama Aggarwal, Ashwin Rajadesingan, Ponnurangam Kumaraguru
Motivation: Some Statistics
• $520 million was lost worldwide to phishing attacks in 2011 alone (RSA report)
• In 2012, around 20% of all phishing attacks targeted Facebook
• Phishing attacks on social networks jumped 221% during Q1 of 2012
Phishing Detection on OSM: Current State of the Art
• Offline Spam Characterization & Detection Studies
• No characterization of phishing on OSM
• Lack of Realtime detection mechanisms
• Absence of end-user deployed systems
• Dependence on Spam/Phishing Blacklists
What Did We Do to Fill the Gap?
• Built a mechanism to automatically detect phishing on Twitter in realtime
• No dependency on blacklists
• Deployed an end-user system for Twitter users: a Chrome extension
Twitter 101
• Tweets are limited to 140 characters, for example:
  • "Hey, I am in Puerto Rico attending @APWG eCrime research. Talking about #phishing on OSN"
  • "Earn Money #help #money http://bit.ly/Pw637z"
• @Tag: to mention or reply to a Twitter user
• #Tag: to mention a topic
• URL in tweet: to link external media
• Followers and followees: the users who follow an account, and the users that account follows
• Retweet (RT): sharing another user's tweet with one's own network ("Nice! I’ll share this tweet in my network!")
• Twitter timeline of a user (e.g. @Blue): tweets and retweets by followees, tweets and retweets by self, and tweets mentioning @Blue
Challenges of Phishing Detection on Twitter
• Only 140 characters: very little information
• Use of short URLs in tweets
• 100,000 tweets per minute: quick spread
• Phishing blacklists are slow: not reliable
Our Contribution
• PhishAri: Automatic realtime phishing detection mechanism for Twitter
• More efficient than the plain blacklisting method
• Better than Twitter’s own phishing detection mechanism
• Real-world implementation of the system: a Chrome extension for Twitter
Methodology
• Step 1: Classification model for phishing detection
  • Data collection
  • Feature extraction
  • Classification
• Step 2: Realtime end-user interface
  • Using the pre-trained classification model
  • Chrome browser extension
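The two-step split above (train once offline, then reuse the pre-trained model for every incoming tweet) can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for the random forest the slides report, the model is serialized as JSON, and the names NearestCentroidModel, train_offline, and load_model are hypothetical.

```python
import json
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class NearestCentroidModel:
    """Hypothetical stand-in classifier; PhishAri itself reports a random forest."""
    centroids: dict = field(default_factory=dict)

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroidModel":
        rows_by_label: dict = {}
        for features, label in zip(X, y):
            rows_by_label.setdefault(label, []).append(features)
        # Each class centroid is the per-feature mean of that class's training rows.
        self.centroids = {
            label: [fmean(column) for column in zip(*rows)]
            for label, rows in rows_by_label.items()
        }
        return self

    def predict(self, features: list[float]) -> str:
        def sq_dist(centroid: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        # Pick the label whose centroid is closest to the feature vector.
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))

    def to_json(self) -> str:
        return json.dumps(self.centroids)

    @classmethod
    def from_json(cls, blob: str) -> "NearestCentroidModel":
        return cls(centroids=json.loads(blob))


def train_offline(X: list[list[float]], y: list[str]) -> str:
    """Step 1 (hypothetical): train once on labeled tweets and serialize the model."""
    return NearestCentroidModel().fit(X, y).to_json()


def load_model(blob: str) -> NearestCentroidModel:
    """Step 2 (hypothetical): load the pre-trained model once at service start-up."""
    return NearestCentroidModel.from_json(blob)
```

Serializing to JSON rather than pickle means the realtime service never executes untrusted bytes when loading a model; a real random forest would need its own persistence format.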
Data Collection
[Figure: data collection pipeline, including a 3-day wait]
• 1,589 phishing tweets
• 903 unique phishing URLs

Features Used
• URL features: length, number of dots, characters, redirections
• WHOIS features: domain name, ownership period
• Tweet features: number of #tags, @mentions, length, trending topics
• Network features: follower/followee ratio, age of account, number of tweets
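Here is a sketch of how the URL, tweet, and network features listed above might be computed for one tweet. The Tweet and TwitterUser classes and their fields are hypothetical. The redirect chain is assumed to have been resolved beforehand, because expanding short URLs requires network access. WHOIS and trending-topic features are omitted because they depend on external lookups.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


@dataclass
class TwitterUser:  # hypothetical fields
    followers: int
    followees: int
    tweet_count: int
    created_at: datetime  # timezone-aware, e.g. tzinfo=timezone.utc


@dataclass
class Tweet:  # hypothetical fields
    text: str
    url: str                   # URL as posted, e.g. a short link
    redirect_chain: list[str]  # URLs reached after expanding `url` (assumed precomputed)
    user: TwitterUser


def extract_features(tweet: Tweet, now: datetime) -> dict[str, float]:
    # URL features are computed on the final landing URL (an assumption).
    final_url = tweet.redirect_chain[-1] if tweet.redirect_chain else tweet.url
    user = tweet.user
    return {
        # URL features
        "url_length": len(final_url),
        "url_dots": final_url.count("."),
        "redirections": len(tweet.redirect_chain),
        # Tweet features
        "tweet_length": len(tweet.text),
        "hashtags": len(HASHTAG_RE.findall(tweet.text)),
        "mentions": len(MENTION_RE.findall(tweet.text)),
        # Network features; max(..., 1) avoids dividing by zero followees
        "follower_followee_ratio": user.followers / max(user.followees, 1),
        "account_age_days": (now - user.created_at).days,
        "tweet_count": user.tweet_count,
    }
```

Each tweet becomes a fixed set of numeric features, which is the form a classifier such as a random forest expects as input.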
Classification Results
Evaluation Metric       Naive Bayes   Decision Tree   Random Forest
Accuracy                87.02%        89.28%          92.52%
Precision (Phishing)    89.21%        88.05%          95.24%
Precision (Safe)        92.12%        94.15%          97.23%
Recall (Phishing)       68.32%        74.51%          92.21%
Recall (Safe)           85.68%        89.20%          95.54%
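The metrics in the table follow the standard definitions for a two-class problem. The sketch below computes them from true and predicted labels. The function name evaluate is hypothetical, and the labels in the check are toy values, not PhishAri data.

```python
def evaluate(y_true: list[str], y_pred: list[str],
             positive: str = "phishing", negative: str = "safe") -> dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # phishing caught
    tn = sum(t == negative and p == negative for t, p in pairs)  # safe kept safe
    fp = sum(t == negative and p == positive for t, p in pairs)  # safe flagged as phishing
    fn = sum(t == positive and p == negative for t, p in pairs)  # phishing missed

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # avoid division by zero on empty classes

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision_phishing": ratio(tp, tp + fp),
        "precision_safe": ratio(tn, tn + fn),
        "recall_phishing": ratio(tp, tp + fn),
        "recall_safe": ratio(tn, tn + fp),
    }
```

For example, with 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, accuracy is 8/10 = 0.8 and phishing precision is 3/4 = 0.75.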
Evaluation
• Comparison with blacklists
  • PhishAri detected 80.6% more phishing tweets at zero hour; blacklists caught these tweets only after 3 days.
• Comparison with Twitter’s defense mechanism
  • PhishAri detected 84.6% more phishing tweets at zero hour; Twitter marked these tweets as suspicious only after 3 days.
Time Evaluation
• Used a 16-core Intel Xeon Ubuntu server with a 2.67 GHz processor and 32 GB RAM
• Used multiprocessing modules for faster processing
• Feature extraction and classification of a tweet take at most 0.522 seconds (min: 0.167 s, avg: 0.425 s, median: 0.384 s)
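Per-tweet latency figures like the min, average, median, and maximum above can be collected with a small timing harness. This sketch assumes a generic func that extracts features and classifies one tweet, which is hypothetical here. It uses a thread pool so the example stays self-contained. The setup above used multiprocessing, which suits CPU-bound work but requires picklable top-level functions.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def timed_call(func: Callable, item) -> float:
    """Run func(item) and return the wall-clock time it took, in seconds."""
    start = time.perf_counter()
    func(item)
    return time.perf_counter() - start


def time_in_parallel(func: Callable, items: Iterable, workers: int = 4) -> list[float]:
    """Process items concurrently and return per-item latencies in input order.

    Threads stand in for processes here; swap in ProcessPoolExecutor for
    CPU-bound work, with func defined at module top level so it can be pickled.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: timed_call(func, item), items))


def summarize(latencies: list[float]) -> dict[str, float]:
    """Min / average / median / max of the collected latencies."""
    return {
        "min": min(latencies),
        "avg": statistics.fmean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

Reporting the median alongside the average makes it visible when a few slow tweets (for example, long redirect chains) pull the mean upward.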
Text Analysis
[Figure: text of legitimate tweets vs. phishing tweets]
PhishAri: RESTful API
• The classification model above is used to build a RESTful API
• Clients query a tweet by sending a POST request to the API
• The pre-trained classifier model classifies new tweets
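Here is a minimal sketch of such an API, using only Python's standard library. The /classify path, the JSON request body, and the {"label": ...} response shape are assumptions, not PhishAri's documented interface. classify is any callable that wraps a pre-trained model.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

Classifier = Callable[[dict], str]  # maps a tweet (JSON object) to "phishing" or "safe"


def make_handler(classify: Classifier) -> type:
    class ClassifyHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/classify":  # hypothetical endpoint
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                tweet = json.loads(self.rfile.read(length))
            except ValueError:  # covers malformed JSON and bad encodings
                self.send_error(400, "request body must be JSON")
                return
            body = json.dumps({"label": classify(tweet)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the sketch quiet; a real deployment would log requests

    return ClassifyHandler


def start_server(classify: Classifier, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve the API in a background thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(classify))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(base_url: str, tweet: dict) -> str:
    """Client side: POST a tweet to the API and return the predicted label."""
    req = urllib.request.Request(
        base_url + "/classify",
        data=json.dumps(tweet).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["label"]
```

For example, start_server(my_classify) returns a running server whose port is server.server_address[1], and query(base_url, {"text": ...}) then returns the label. Loading the model once and passing it in through classify keeps per-request work limited to feature extraction and prediction.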
PhishAri Chrome Extension
• Red/green indicators in front of tweets with URLs
• Detects phishing tweets in:
  • User timelines
  • Twitter search results
  • Other users' profiles
  • Direct messages (limited support for now)
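The indicator behavior described above can be sketched as plain decision logic: only tweets containing URLs get a red or green mark. The actual extension runs in Chrome and is not reproduced here. The URL regex, the label-to-color mapping, and the function names are assumptions, and classify stands for a call to the classification API.

```python
import re
from typing import Callable, Optional

# Rough URL matcher (assumption): an http(s) scheme followed by non-space characters.
URL_RE = re.compile(r"https?://\S+")
# Label-to-indicator mapping (assumption).
INDICATOR_COLORS = {"phishing": "red", "safe": "green"}


def indicator_for(tweet_text: str, classify: Callable[[str], str]) -> Optional[str]:
    """Return "red" or "green" for a tweet with a URL, or None if it has no URL."""
    if not URL_RE.search(tweet_text):
        return None  # tweets without URLs get no indicator
    label = classify(tweet_text)
    return INDICATOR_COLORS[label]  # raises KeyError on an unexpected label


def annotate(tweets: list[str], classify: Callable[[str], str]) -> list[tuple[str, Optional[str]]]:
    """Pair every tweet shown on a page (timeline, search, profile) with its indicator."""
    return [(text, indicator_for(text, classify)) for text in tweets]
```

Skipping tweets without URLs avoids calling the classifier for content it cannot judge, since the model is built around tweets posted with a URL.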
PhishAri Chrome Extension: Demo
How Does the Extension Work?
• Integration of the API with the browser extension
PhishAri Extension: User Experience and Statistics
• 78 active users
• The user study shows that users:
  • want support for other browsers and mobile apps
  • found the extension useful
  • want more robustness
Conclusion
• “Phish” + “Ari” = Realtime Automatic Detection
• 92.52% accuracy with a random forest classifier
• Efficient: at most 0.522 seconds for the indicator to appear
• No dependency on blacklists
• Faster than blacklists
• Faster than Twitter’s own detection mechanism
Future Work
• Backend database for faster lookup
• Extend PhishAri from public tweets to all tweets
• Reduce the response time of PhishAri and the delay before indicators appear
• Support for other browsers and mobile apps
Thank You!
Questions? Suggestions?