PhishAri: Automatic Realtime Phishing Detection on Twitter

Anupama Aggarwal, Ashwin Rajadesingan, Ponnurangam Kumaraguru

Motivation: Some Statistics

• $520 million were lost worldwide from phishing attacks in 2011 alone. (RSA Report)

• In 2012, around 20% of all phishing attacks targeted Facebook

• Social network phishing attacks jumped 221% during Q1 of 2012

Phishing Detection on Online Social Media (OSM): Current State of the Art

• Offline Spam Characterization & Detection Studies

• No characterization of phishing on OSM

• Lack of Realtime detection mechanisms

• Absence of end-user deployed systems

• Dependence on Spam/Phishing Blacklists

What Did We Do to Fill the Gap?

• Built a mechanism to automatically detect phishing on Twitter in realtime

• No dependency on blacklists

• Deployed an end-user system for Twitter users - a Chrome extension

Twitter 101

Example tweets (at most 140 characters each):

• Hey, I am in Puerto Rico attending @APWG eCrime research

• Talking about #phishing on OSN

• Earn Money #help #money http://bit.ly/Pw637z

Elements of a tweet:

• @Tag - to mention or reply to a Twitter user

• #Tag - to mention a topic

• URL in tweet - to link to external media

Twitter 101

• Followers - users who follow an account (“We’ll follow Blue!”)

• Followees - accounts that a user follows (“I’ll follow Grey1!”)

• Retweet (RT) - sharing another user's tweet (“Nice! I’ll share this tweet in my network!”)

Twitter Timeline (shown for the user @Blue)

• Tweets and retweets by followees

• Tweets and retweets by self

• Tweets with @Blue

Challenges of Phishing Detection on Twitter

• Only 140 characters - very little information

• Use of short URLs in tweets

• 100,000 tweets per minute - quick spread

• Phishing blacklists are slow - not reliable

Our Contribution

• PhishAri: Automatic realtime phishing detection mechanism for Twitter

• More efficient than plain blacklisting method

• Better than Twitter’s own phishing detection mechanism

• Real-world implementation of the system - Chrome Extension for Twitter

Methodology

• Step 1: Classification model for phishing detection

    • Data collection

    • Feature extraction

    • Classification

• Step 2: Realtime end-user interface

    • Uses the pre-trained classification model

    • Chrome browser extension

Data Collection

[Figure: data collection workflow, including a "Wait for 3 days" step]

• 1,589 Phishing Tweets

• 903 Unique phishing URLs

Features Used

• URL features - length, number of dots, characters, redirections

• WHOIS features - domain name, ownership period

• Tweet features - number of #tags, @mentions, length, trending topics

• Network features - follower/followee ratio, age of account, number of tweets
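
As an illustration of the feature groups listed on this slide, here is a minimal sketch of how some URL, tweet, and network features could be computed. It is not the PhishAri implementation: the function names, dictionary keys, and the zero-followee fallback are assumptions made for this example. WHOIS features and trending topics are left out because they need external lookups, and the redirection count is passed in as a parameter because measuring it requires following the URL over the network.

```python
import re
from urllib.parse import urlparse


def url_features(url: str, redirections: int = 0) -> dict:
    """Lexical URL features. The redirection count is supplied by the
    caller (hypothetical parameter); it cannot be derived from the string."""
    host = urlparse(url).netloc
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "host": host,
        "num_redirections": redirections,
    }


def tweet_features(text: str) -> dict:
    """Counts of #tags and @mentions, plus the tweet length in characters."""
    return {
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "tweet_length": len(text),
    }


def network_features(followers: int, followees: int,
                     account_age_days: int, num_tweets: int) -> dict:
    """Account-level features. With zero followees the ratio falls back to
    the follower count (an assumption, to avoid division by zero)."""
    ratio = followers / followees if followees else float(followers)
    return {
        "follower_followee_ratio": ratio,
        "account_age_days": account_age_days,
        "num_tweets": num_tweets,
    }


if __name__ == "__main__":
    tweet = "Earn Money #help #money http://bit.ly/Pw637z"
    print(tweet_features(tweet))
    print(url_features("http://bit.ly/Pw637z"))
```

In a full pipeline, the dictionaries from each group would be merged into one feature vector per tweet before being passed to the classifier.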

Classification Results

Evaluation Metric      Naive Bayes   Decision Tree   Random Forest
Accuracy               87.02%        89.28%          92.52%
Precision (Phishing)   89.21%        88.05%          95.24%
Precision (Safe)       92.12%        94.15%          97.23%
Recall (Phishing)      68.32%        74.51%          92.21%
Recall (Safe)          85.68%        89.20%          95.54%
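
For readers less familiar with these metrics, the sketch below computes accuracy and per-class precision and recall from the four cells of a binary confusion matrix, treating "phishing" as the positive class. The counts in the usage example are hypothetical and are not taken from the PhishAri dataset; the function and key names are assumptions made for this example.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics for a phishing-vs-safe classifier.

    tp: phishing tweets labeled phishing
    fp: safe tweets labeled phishing
    tn: safe tweets labeled safe
    fn: phishing tweets labeled safe
    """
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision_phishing": _ratio(tp, tp + fp),
        "precision_safe": _ratio(tn, tn + fn),
        "recall_phishing": _ratio(tp, tp + fn),
        "recall_safe": _ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=90, fp=10, tn=180, fn=20).items():
        print(f"{name}: {value:.2%}")
```

Recall on the phishing class is the metric that suffers most when phishing tweets slip through as safe, which is why it is reported separately from overall accuracy.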

Evaluation

• Comparison with blacklists

    • PhishAri detected 80.6% more phishing tweets at zero hour; blacklists caught these tweets only after 3 days

• Comparison with Twitter’s defense mechanism

    • PhishAri detected 84.6% more phishing tweets at zero hour; Twitter marked these tweets as suspicious only after 3 days

Time Evaluation

• Ran on a 16-core Intel Xeon Ubuntu server with a 2.67 GHz processor and 32 GB RAM

• Multiprocessing modules used for faster processing

• Feature extraction and classification of a tweet take at most 0.522 seconds (min: 0.167 sec, avg: 0.425 sec, median: 0.384 sec)
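
A minimal sketch of how per-tweet timing statistics of this kind (min, max, average, median) could be gathered is shown below. The `process` callable stands in for the feature extraction and classification step and is an assumption; this sketch does not reproduce the multiprocessing setup or the hardware used in the evaluation.

```python
import statistics
import time
from typing import Callable, Iterable


def time_per_item(process: Callable[[str], object],
                  items: Iterable[str]) -> dict:
    """Run `process` on each item and summarize the wall-clock durations.

    Raises ValueError if `items` is empty, since min/max are undefined.
    """
    durations = []
    for item in items:
        start = time.perf_counter()
        process(item)  # hypothetical stand-in for extract + classify
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("items must contain at least one element")
    return {
        "min": min(durations),
        "max": max(durations),
        "avg": statistics.mean(durations),
        "median": statistics.median(durations),
    }


if __name__ == "__main__":
    sample = ["Talking about #phishing on OSN",
              "Earn Money #help #money http://bit.ly/Pw637z"]
    print(time_per_item(len, sample))
```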

Text Analysis

[Figure: text analysis of legitimate tweets vs. phishing tweets]

PhishAri: RESTful API

• The classification model above is exposed through a RESTful API

• A tweet is queried by making a POST request to the API

• The pre-trained classifier model classifies new tweets
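
To make the request flow concrete, the sketch below builds and sends a POST request for one tweet using only the Python standard library. The slides do not give the endpoint or the payload schema, so the URL `https://example.org/phishari/classify`, the JSON field names, and the JSON response are all hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint; the real PhishAri API URL is not given in the slides.
API_URL = "https://example.org/phishari/classify"


def build_query(tweet_text: str, tweet_id: str,
                endpoint: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying one tweet as JSON (assumed schema)."""
    payload = json.dumps({"tweet_id": tweet_id, "text": tweet_text})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_tweet(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode a JSON response (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_query("Earn Money #help #money http://bit.ly/Pw637z", "1")
    print(req.get_method(), req.full_url)
    # query_tweet(req) would perform the network call against a live server.
```

Separating request construction from sending keeps the payload logic testable without a running server.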

PhishAri Chrome Extension

• Red / green indicators in front of tweets with URLs

• Detects phishing tweets on

    • User timeline

    • Twitter search results

    • Profiles of other users

    • Direct messages (limited for now)

PhishAri Chrome Extension: Demo

How Does the Extension Work?

• Integration of the API with the browser extension
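
The decision logic behind the indicators can be sketched as follows: only tweets that contain a URL receive an indicator, and the color depends on the classifier's verdict. The actual extension runs in Chrome; this sketch uses Python only to illustrate the flow. The `is_phishing` callable is a hypothetical stand-in for a call to the PhishAri API, and the URL pattern is an assumption.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Simple pattern for http(s) links; an assumption for this sketch.
URL_PATTERN = re.compile(r"https?://\S+")


def indicator_for(tweet_text: str,
                  is_phishing: Callable[[str], bool]) -> Optional[str]:
    """Return "red", "green", or None when the tweet has no URL."""
    if not URL_PATTERN.search(tweet_text):
        return None  # tweets without URLs get no indicator
    return "red" if is_phishing(tweet_text) else "green"


def annotate_timeline(tweets: Iterable[str],
                      is_phishing: Callable[[str], bool]
                      ) -> List[Tuple[str, Optional[str]]]:
    """Pair every tweet on a page with its indicator."""
    return [(tweet, indicator_for(tweet, is_phishing)) for tweet in tweets]
```

Passing the classifier in as a callable keeps the display logic independent of how the verdict is obtained, for example from a remote API or from a cached result.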

PhishAri Extension: User Experience and Statistics

• 78 active users

• The user study shows that users

    • want support for other browsers and mobile apps

    • found the extension useful

    • want it to be more robust

Conclusion

• “Phish” + “Ari” = Realtime Automatic Detection

• 92.52% accuracy with the Random Forest classifier

• Efficient - takes at most 0.522 seconds for the indicator to appear

• No dependency on blacklists

• Faster than blacklists

• Faster than Twitter’s own detection mechanism

Future Work

• Backend database for faster lookup

• Extend the scope of PhishAri from public tweets to all tweets

• Reduce the response time of PhishAri so that indicators appear faster

• Support for other browsers and mobile apps

Thank You!

Questions? Suggestions?

For any further information, please write to [email protected]

precog.iiitd.edu.in
