35
A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 December 9, 2010 Spam Mitigation using Spatiotemporal Reputations from Blacklist History* * Note that for conciseness, this version omits the animation and graph annotations that existed in the actual conference version

Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

A.G. West, A.J. Aviv, J. Chang, and I. Lee

ACSAC `10 ‐ December 9, 2010

Spam Mitigation using Spatio‐temporal Reputations from Blacklist History* 

* Note that for conciseness, this version omits the animation and graph annotations that existed in the actual conference version

Page 2: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Motivation

2

• IP spam blacklists – Reactively compiled from major email providers

– Single IPs can be listed, de‐listed, re‐listed

– De‐listing policy? 

time

Spam

 sent

BL BL

d d• Blacklist/spam properties– High re‐listing rates

– Spam IPs are spatially clustered [5‐12]

– Just 20 ASes account for 42% of all spam [7]

Page 3: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA: 

3

Preventative Spatio‐Temporal Aggregation

• Traditional punishment mechanisms are reactive

ASSUM‐PTIONS: 

• Consistent behaviors (temporal) and spatial clustering

GIVEN:• User feedback and spatial grouping functions

PRODUCE:• An extended list of principals ‐‐ thought to be bad now

• The preventative identification of malicious users

PROBLEM

OUTPUT

Page 4: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

4

PreSTA Model &

Reputation Fundamentals

Page 5: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Reputation Algs.

5

• Reputation systems: – EBay, EigenTrust [1], Subjective Logic [2]

– Use cases: P2P networks, access‐control, anti‐spam [3]

– Enum. feedbacks; distributed calculation (transitivity)

• Recommendations: Voting, product suggestions

S

UBCPos: 2

Neg: 2Pos: 6Neg: 2

Pos:  1Neg: 0

Pos: 5Neg: 5

A

Users Objects

B

1

2+‐+

Page 6: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA Reputation

6

• PreSTA‐style reputation: – No positive feedback ‐‐ time‐decay to heal.

– Centralized and trusted feedback provider

– Quantify binary observations into reputations

time

maliciousne

ss

maliciousne

ss

time

Page 7: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA Reputation

7

AMERICA

User‐Space

TEXASAUS

Reputation Value for AUS

EntityBehaviorHistory

BroadBehaviorHistory

BroaderBehaviorHistory

Combination

Rep.Rep. Rep.

Single‐entitycalculation and rep. values are status quo (temporal)

Page 8: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Sample calc. (1)

8

Time Behavior Rep.

TS1

TS2

TS3

TS4

TS5

TS6

Calculate

User mis‐behaves

Calculate

User mis‐behaves

Calculate

Calculaterepu

tatio

ntime

0

1

Page 9: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Sample calc. (2)

9

Time Behavior Rep.

TS1

TS2

TS3

TS4

TS5

TS6

Calculate

User mis‐behaves

Calculate

User mis‐behaves

Calculate

Calculaterepu

tatio

ntime

0

1

Page 10: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Sample calc. (3)

10

Time Behavior Rep.

TS1

TS2

TS3

TS4

TS5

TS6

Calculate

User mis‐behaves

Calculate

User mis‐behaves

Calculate

Calculaterepu

tatio

ntime

0

1

Page 11: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Sample calc. (4)

11

Time Behavior Rep.

TS1

TS2

TS3

TS4

TS5

TS6

Calculate

User mis‐behaves

Calculate

User mis‐behaves

Calculate

Calculaterepu

tatio

ntime

0

1

Page 12: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA (spatial)

12

• Why spatial reputation:

• Exploit homophily

• Overcome the cold‐start problem (Sybil [4])

• Grouping functions define group membership

• Multiple groups/dims.

• Geo‐based/abstract

AMERICA

User‐Space

TEXASAUS

EntityBehaviorHistory

BroadBehaviorHistory

BroaderBehaviorHistory

Reputation Value for AUS

Rep.Rep. Rep.

Combination

Page 13: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA (spatial)

13

AMERICA

User‐Space

TEXASAUS

EntityBehaviorHistory

BroadBehaviorHistory

BroaderBehaviorHistory

Reputation Value for AUS

Combination

Rep.Rep.Rep.

Σall member rep.

group size

rep(group) =

TEXAS

Austin

DallasHouston

…..

Page 14: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA + Wikipedia[19]

14

PURPOSE: Detect vandalism edits to Wikipedia

• Vandal editors are probably repeat offenders• Frequently vandalized articles may be future targetsTEMPORAL

• Group editors by country  (geographical space)• Group articles by category (topical space)SPATIAL

• Gleamed from administrative “undo” functionFEEDBACK

• Live tool  ‐‐ STiki [20] ‐‐ 25,000 vandalisms undoneSUCCESS

Page 15: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

15

Applying PreSTAto spam mitigation

Spatio‐temporal props. of spam email

Page 16: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Temporal Props.

16

Of IPs removed from a popular blacklist, 26% are re‐listed within 10 days, and 47% are relisted within ten weeks.

Consistent listing length permits normalization

Page 17: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Spatial Props.

17

Page 18: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

18

Applying PreSTAto spam mitigationImplementing the model

Page 19: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Grouping Functions

19

• The IANA and RIR granularity are too broad to be of relevant use

IANA/RIR

• What AS(es) are broadcasting IP?• An IP may have 0, 1, or 2+ homesAS

• What  is /24 (256 IP) membership?• Estimation of subnetBLOCK

• Static IP addresses• Due to DHCP; multiple inhabitantsIP

AS|1000's| IPs

SubnetBLK‐Heuristic|768| IPs

IP‐level|1| IP

Page 20: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

PreSTA Workflow

20

Mail Body

Source IP

AS

BLOCK

IP

AS‐REP

BLK‐REP

IP‐REP

SpatialMapping

Plot into3‐D SpaceClassify (SVM)

SPAM or HAM

REPALG

Time‐Decay FN

BehaviorHistories

Page 21: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Data Sources

21

• Subscribe to Spamhaus [13] provider• Process diff between versions into DB

FEEDBACK

• Use RouteViews [14] data to map IP→ASAS‐MAP

• 5 months: 31 mil. UPenn mail headers• Proofpoint [15] for ground truth

EMAIL

Page 22: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

22

Applying PreSTAto spam mitigation

Results

Page 23: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Big‐Picture Result

23

Blacklist alone:91% avg.

PreSTA + Blacklist:94% avg.

Page 24: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Big‐Picture Result

24

Captures up to 50% of spam mails not caught by blacklist

Would have blocked an addl. 650k spam emails

Page 25: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Case Studies (1)

25

Temporal (single IP) example.

Offender sent 150 spam emails  and likely monitored own BL status [16].

Page 26: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Case Studies (2)

26

Temporal and spatial example (AS granularity).

Spam campaign involving  4,500 IP addresses

Page 27: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Other Results (1)

27

Page 28: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Other Results (2)

28

Page 29: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

29

• Intended Purpose – NLP is superior, but computationally expensive

– Initial and lightweight filter

• Scalability– Heavy caching

– All AS‐level reputations are cached offline

– 43%  cache hit rate for IPS, 57% for blocks 

– Handles 500,000+ emails/hour (commodity)

– One month’s BL history = 1 GB

Scalability

Page 30: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Gamesmanship

30

• Avoid FEEDBACK in the first place

• Temporal evasion? Nothing but patience

• Spatial evasion– Move around (reduces IP utility; increases cost)

– Prefix‐hijacking. Fortunately, mostly seen from bogon space [7]. Assign to a special “bad AS”

• Don’t let individuals control group size (sizing attack) and maintain persistent IDs (Sybil [4]) 

Page 31: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Contributions

31

• Formalization of a predictive spatio‐temporal reputation model (PreSTA)– A dynamic access‐control solution for: email, Wikipedia, web‐service mash‐ups, BGP routing

• Implementation of PreSTA for use as a lightweight initial filter for spam email– Blocking up to 50% of spam evading blacklists

– Extremely consistent blockage rates

– Scalability of 500k+ emails/hour

Page 32: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

References

32

[1]   Kamvar, S.D. et al. The EigenTrust Algorithm for Reputation Management in P2P Systems. In WWW, 2003.

[2]   Jøsang, A. et al. Trust Network Analysis with Subjective Logic. In 29th Australasian Computer Science Conference, 2006. 

[3]   Alperovitch, D. et al. Taxonomy of Email Reputation Systems. In Distributed Computing Systems Workshops, 2007.

[4]   Douceur, J. The Sybil Attack. In 1st IFTPS, March 2002.

[5] Krebs, B. Host of Internet Spam Groups is Cut Off. [online] http://www.washingtonpost.com/wp‐dyn/content/article/2008/11/12/AR2008111200658.html, November 2008 (McColo shut‐down).

[6]   Krebs, B. FTC Sues, Shuts Down N. California Web Hosting Firm. [online] http://voices.washingtonpost.com/securityfix/ 2009/06/ftc_sues_shuts_down_n_calif_we.html, June 2009 (3FN shut‐down).

[7]  Hao, S. et al. Detecting Spammers with SNARE: Spatio‐temporal Network…. In USENIX Security Symposium, 2009.

[8]  Ramachandran, A. et al. Understanding the Network‐level Behavior of Spammers. In SIGCOMM, 2006.

[9]   Ramachandran, A. et al. Filtering Spam with Behavioral Blacklisting. In CCS, 2007.

[10] Qian, Z. et al. On Network‐level Clusters for Spam Detection. In NDSS, 2010.

[11] Venkataraman, S. et al. Tracking Dynamic Sources of Malicious Activity at Internet Scale. In NIPS, 2009.

[12] Venkataraman, S. et al. Exploiting Network Structure for Proactive Spam Mitigation. In USENIX Security Symposium, 2007.

[13] Spamhaus Project. [online] http://www.spamhaus.org

[14] Univ. of Oregon Route Views. [online] http://www.routeviews.org

[15] Proofpoint, Inc. [online] http://www.proofpoint.com

[16] Ramachandran, A. et al. Revealing bot‐net membership using DNSBL Counter‐Intelligence. In USENIX Security Workshops: Steps to Reducing Unwanted Traffic on the Internet, 2006.

[17] IronPort Systems Inc.. Reputation‐based Mail Flow Control. White paper for the SenderBase system, 2002.

[18] Symantec Corporation. IP Reputation Investigation. [online] http://ipremoval.sms.symantec.com

[19] West, A. et al.. Detecting Wikipedia Vandalism via Spatio‐temporal Analysis of Revision Metadata. In EUROSEC, 2010.

[20] West, A. STiki: An Anti‐vandalism Tool for Wikipedia. [online] http://en.wikipedia.org/wiki/Wikipedia:STiki

Page 33: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

33

Additional slides

Page 34: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Decay Function

34

• Half‐life function is straightforward exponential decay using 10‐day half life.

• Half‐life was arrived at empirically (sum of CDF areas). 

• TAKEAWAY:  Long punishments lead to few false‐positives.  Don’t led bad guys off the hook too easily!

Page 35: Spam Mitigation using Spatio temporal Reputations from ...A.G. West, A.J. Aviv, J. Chang, and I. Lee ACSAC `10 ‐December 9, 2010 Spam Mitigation using Spatio‐temporal Reputations

Related Work

35

• SNARE (GA‐Tech, Hao et al. [7])– Identifies 13 spatio‐temporal metrics → ML classifier

• Temporally weak aggregation (i.e., mean and variance)

• “Doesn’t need blacklists” → Neither does PreSTA

– Not scalable. PreSTA uses just 1 metric.

• Similar commercial services– Symantec [17] and Ironport SenderBase [18]

– Closed source, but binary APIs indicate correlation