Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks

Wenke Lee and Nick Feamster, Georgia Tech

Botnet and Spam Detection in High-Speed Networks

Overview

• Problem: Botnet and Spam Detection in high-speed networks

• Common theme: Examine network-level properties and build classifier

• Two systems: BotMiner and SNARE
  – Overview
  – Integration with SMITE architecture

• Current integration status and plan

BotMiner: Structure and Protocol Independent

• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …

[Figure: two example botnet structures, (a) and (b), showing bots and a C&C server]

Definition of a Botnet

• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Hosts that have similar C&C-like traffic and similar malicious activities

• We need to monitor two planes
  – C-plane (C&C communication plane): “who is talking to whom”
  – A-plane (malicious activity plane): “who is doing what”

BotMiner Architecture

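[Figure: BotMiner architecture. Network traffic feeds the A-plane monitor (scan, spam, binary downloading, exploit, ...), which writes an activity log, and the C-plane monitor, which writes a flow log. A-plane clustering and C-plane clustering feed cross-plane correlation, which produces reports. The stages are labeled sensors, algorithms, and correlation.]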

Cross-plane Correlation

• Botnet score s(h) for every host h
  – A host has a higher score if it is in more activity clusters and in both activity and communication clusters
  – A host with a high score is considered a bot

• Similarity score between bot hosts h_i and h_j
  – Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
  – Each cluster is a botnet
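
The scoring idea can be sketched in a few lines. This is a simplified illustration, not the exact BotMiner formula: it assumes uniform cluster weights and uses Jaccard overlap between clusters, and the names `botnet_score`, `a_clusters`, and `c_clusters` are hypothetical.

```python
from itertools import combinations


def overlap(x, y):
    """Jaccard overlap between two host sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters):
    """Score a host by its cluster memberships (uniform weights assumed).

    a_clusters: list of (activity_type, set_of_hosts)
    c_clusters: list of set_of_hosts
    """
    mine_a = [(t, hosts) for t, hosts in a_clusters if host in hosts]
    mine_c = [hosts for hosts in c_clusters if host in hosts]
    score = 0.0
    # Reward membership in activity clusters of different types.
    for (t1, h1), (t2, h2) in combinations(mine_a, 2):
        if t1 != t2:
            score += overlap(h1, h2)
    # Reward appearing in both an activity and a communication cluster.
    for _, ha in mine_a:
        for hc in mine_c:
            score += overlap(ha, hc)
    return score


# Hypothetical clusters for illustration only.
a_clusters = [("scan", {"h1", "h2", "h3"}), ("spam", {"h1", "h2"})]
c_clusters = [{"h1", "h2"}, {"h4", "h5"}]
scores = {h: botnet_score(h, a_clusters, c_clusters) for h in ["h1", "h3", "h4"]}
```

In this sketch a host that appears in only one activity cluster (`h3`) or only in a communication cluster (`h4`) scores zero, while a host active in several planes (`h1`) scores highest; a threshold on the score would then separate likely bots.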

SMITE Integration: BotMiner

Integrating BotMiner and SMITE

• Sensors
  – Feature extraction for C-plane and A-plane clustering
  – C-flow temporal and statistical features (SMITE flow analysis sensors)
    • Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow
  – A-plane header and payload features (SMITE flow sensors, AVIES)
    • Destination IP addresses and ports, payload bytes/strings
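
As a rough sketch of the four C-flow statistics named above, the following aggregates flow records per endpoint pair. The record layout (`src`, `dst`, `packets`, `bytes`, `duration_s`) and the function name are assumptions for illustration, not the SMITE sensor format.

```python
from collections import defaultdict


def cflow_features(flows, window_hours):
    """Aggregate flow records per (src, dst) pair into four averages.

    flows: iterable of dicts with keys src, dst, packets, bytes, duration_s
    window_hours: length of the observation window in hours
    """
    groups = defaultdict(list)
    for f in flows:
        groups[(f["src"], f["dst"])].append(f)

    features = {}
    for pair, fl in groups.items():
        packets = sum(f["packets"] for f in fl)
        nbytes = sum(f["bytes"] for f in fl)
        seconds = sum(f["duration_s"] for f in fl)
        features[pair] = {
            "flows_per_hour": len(fl) / window_hours,
            "packets_per_flow": packets / len(fl),
            "bytes_per_packet": nbytes / packets if packets else 0.0,
            "bytes_per_second": nbytes / seconds if seconds else 0.0,
        }
    return features


# Hypothetical flow records for one endpoint pair over a 2-hour window.
sample_flows = [
    {"src": "10.0.0.1", "dst": "198.51.100.7", "packets": 10, "bytes": 1000, "duration_s": 2},
    {"src": "10.0.0.1", "dst": "198.51.100.7", "packets": 30, "bytes": 3000, "duration_s": 6},
]
sample_features = cflow_features(sample_flows, window_hours=2)
```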

Integrating BotMiner and SMITE

• Algorithms
  – C-plane clustering
    • Multi-step clustering based on statistical and temporal C-flow features
  – A-plane clustering
    • Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, and payload similarity
    • Analyze additional alerts from other detection algorithms
  – Bot scoring and botnet clustering methods
    • Scoring based on participation in C-plane and A-plane clusters
    • Clustering based on common memberships in the C-plane and A-plane clusters
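
One way to picture A-plane clustering is to group hosts by activity type and then link hosts whose destination-port sets are similar. The sketch below uses Jaccard similarity with single-linkage grouping; the similarity measure, the 0.5 threshold, and all names are illustrative assumptions rather than the measures BotMiner actually uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_similarity(items, threshold):
    """Single-linkage grouping of hosts by feature-set similarity.

    items: dict mapping host -> set of features (e.g. destination ports)
    """
    clusters = []
    for host, feats in items.items():
        # Find every existing cluster with at least one similar member.
        linked = [c for c in clusters
                  if any(jaccard(feats, items[m]) >= threshold for m in c)]
        merged = {host}
        for c in linked:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters


def a_plane_clusters(activity_log, threshold=0.5):
    """activity_log: dict mapping activity type -> {host: set of dst ports}."""
    return {activity: cluster_by_similarity(hosts, threshold)
            for activity, hosts in activity_log.items()}


# Hypothetical activity log for illustration only.
activity_log = {
    "scan": {"h1": {22, 23, 445}, "h2": {22, 445}, "h3": {8080}},
    "spam": {"h1": {25}, "h2": {25}},
}
a_result = a_plane_clusters(activity_log)
```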

Integrating BotMiner and SMITE

• Cross-plane correlation
  – Botnet detection involves both vertical and horizontal analysis/clustering:
    • Vertical: what activities a host has been involved in
      – Bot detection
    • Horizontal: what other hosts have similar (vertical) behavior patterns
      – Botnet detection
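
The horizontal step can be sketched as grouping hosts by shared cluster membership. This minimal example assumes two hosts belong together when they share at least one A-cluster and at least one C-cluster, and merges them with a small union-find; `group_bots` and its inputs are hypothetical names.

```python
from itertools import combinations


def membership(host, clusters):
    """Indices of the clusters that contain `host`."""
    return frozenset(i for i, c in enumerate(clusters) if host in c)


def group_bots(bots, a_clusters, c_clusters):
    """Merge bots that share an A-cluster and at least one C-cluster."""
    parent = {b: b for b in bots}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(bots, 2):
        share_a = membership(x, a_clusters) & membership(y, a_clusters)
        share_c = membership(x, c_clusters) & membership(y, c_clusters)
        if share_a and share_c:
            parent[find(x)] = find(y)

    groups = {}
    for b in bots:
        groups.setdefault(find(b), set()).add(b)
    return list(groups.values())


# Hypothetical inputs for illustration only.
botnets = group_bots(
    ["h1", "h2", "h6"],
    a_clusters=[{"h1", "h2", "h3"}, {"h6"}],
    c_clusters=[{"h1", "h2"}, {"h6", "h7"}],
)
```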

Network-Based Spam Detection

• Filter email based on how it is sent, in addition to simply what is sent.

• Network-level properties are less malleable than message content
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
  – Network location of sender and receiver
  – Set of target recipients

Finding the Right Features

• Goal: Sender reputation from a single packet header?
  – Low overhead
  – Fast classification
  – In-network
  – Perhaps more evasion resistant

• Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?

Network-Level Features

• Single-Packet
  – AS of sender’s IP
  – Distance to k nearest senders
  – Status of email service ports
  – Geodesic distance
  – Time of day

• Single-Message
  – Number of recipients
  – Length of message

• Aggregate (Multiple Message/Recipient)

Sender-Receiver Geodesic Distance

90% of legitimate messages travel 2,200 miles or less
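
A geodesic-distance feature of this kind could be computed with the haversine formula, as in the sketch below. It assumes sender and receiver coordinates are already available (for example from an IP geolocation lookup, not shown); the function name is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates for Atlanta and New York, for illustration.
atlanta_to_nyc = geodesic_miles(33.749, -84.388, 40.7128, -74.006)
```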

Density of Senders in IP Space

For spammers, k nearest senders are much closer in IP space
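
One simple reading of "distance to k nearest senders" treats IPv4 addresses as integers and averages the numeric gap to the k closest other senders. The sketch below follows that assumption; the exact distance definition used by SNARE may differ, and the function name is hypothetical.

```python
import ipaddress


def avg_distance_to_k_nearest(sender, other_senders, k):
    """Mean numeric distance from `sender` to its k closest senders in IPv4 space."""
    s = int(ipaddress.IPv4Address(sender))
    dists = sorted(abs(s - int(ipaddress.IPv4Address(o)))
                   for o in other_senders if o != sender)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")


# Hypothetical senders: two neighbors in the same /24, two farther away.
density = avg_distance_to_k_nearest(
    "10.0.0.5", ["10.0.0.1", "10.0.0.9", "10.0.1.5", "192.168.0.1"], k=2)
```

A small value means the sender sits in a densely populated region of IP space, which the slide associates with spammers.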

Local Time of Day at Sender

Spammers “peak” at different local times of day
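
The local-time feature needs only the message timestamp and the sender's UTC offset. The sketch below assumes the offset has already been estimated (for example from the sender's geolocation, not shown); `sender_local_hour` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def sender_local_hour(utc_timestamp, utc_offset_hours):
    """Hour of day (0-23) at the sender, given a Unix timestamp and UTC offset."""
    utc_time = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc)
    local = utc_time + timedelta(hours=utc_offset_hours)
    return local.hour
```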

Combining Features: RuleFit

• Put features into the RuleFit classifier

• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider

• Comparable performance to Spamhaus
  – Incorporating into the system can further reduce false positives (FPs)

• Using only network-level features

• Completely automated
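
The evaluation procedure, 10-fold cross validation, can be sketched independently of the classifier. The code below is a generic illustration: `fit` stands in for any training routine (RuleFit is not implemented here), it reports plain accuracy, and it assumes at least k samples.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(samples, labels, fit, k=10):
    """Average accuracy over k folds.

    fit(train_x, train_y) must return a function predict(x) -> label.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predict = fit([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```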

Sample Results

False positives reduced to 0.14%

Integrating SNARE and SMITE

[Figure: diagram with Sensors and Algorithms stages]

SMITE Integration Challenges

• Sources of labeled data
  – SNARE requires clean sources of labeled data for training

• Data collection
  – SNARE’s performance improves when behavior can be observed across multiple domains

• Availability of external data in RTEN testbed

SMITE Integration: Current Work

• Study pipeline architecture and code

• Modify flow-analyzer to dump 5-tuple flow information
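
For readers unfamiliar with 5-tuple flow records, the sketch below groups packets by (source IP, source port, destination IP, destination port, protocol) and totals packets and bytes per flow. The packet tuple layout and function name are assumptions for illustration; they do not describe the SMITE flow-analyzer's actual output.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets by 5-tuple and total their packet and byte counts.

    packets: iterable of (src_ip, src_port, dst_ip, dst_port, protocol, size_bytes)
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_ip, src_port, dst_ip, dst_port, proto, size in packets:
        flow = flows[(src_ip, src_port, dst_ip, dst_port, proto)]
        flow["packets"] += 1
        flow["bytes"] += size
    return dict(flows)


# Hypothetical packets: two belong to one TCP flow, one to a UDP flow.
sample_packets = [
    ("10.0.0.1", 40000, "198.51.100.7", 25, "tcp", 60),
    ("10.0.0.1", 40000, "198.51.100.7", 25, "tcp", 1500),
    ("10.0.0.2", 53000, "198.51.100.9", 53, "udp", 80),
]
flow_table = aggregate_flows(sample_packets)
```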

SMITE Integration: Step 1

• Modify flow-analyzer with SMITE team to generate 5-tuple flow information (mid-March)

• Spam/scan detection, flow aggregation in BotMiner; Spam feature extraction in SNARE (end of March)

• Clustering and correlation in BotMiner; Classifier in SNARE (end of April)

SMITE Integration: Step 2

• Evaluate performance of BotMiner and SNARE
  – How many hours to process one day of traffic, or what is the “lag” time between event and detection?

• Design real-time detection algorithms
  – A two-tier system: off-line module outputs lists of suspicious hosts, and real-time module inspects all packets of these hosts; or, off-line module outputs clusters

• Design algorithms to handle asymmetric traffic
  – Cluster on each direction of traffic and cross-correlate
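
The two-tier idea can be illustrated with a watchlist filter: the off-line module produces a set of suspicious hosts, and the real-time module inspects only packets that involve one of them. This is a minimal sketch of that division of labor; the names are hypothetical and the design is still an open item in the plan above.

```python
def make_realtime_filter(suspicious_hosts):
    """Return a predicate selecting packets that involve a watch-listed host."""
    watchlist = frozenset(suspicious_hosts)

    def should_inspect(src_ip, dst_ip):
        # Inspect traffic in either direction for a suspicious host.
        return src_ip in watchlist or dst_ip in watchlist

    return should_inspect


# Hypothetical output of the off-line module.
should_inspect = make_realtime_filter(["10.0.0.1", "10.0.0.2"])
```

Using a set keeps the per-packet check constant time, which matters if the real-time tier sits on a high-speed link.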