BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection

Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee

College of Computing, Georgia Institute of Technology

USENIX Security '08

Presented by Lei Wu

April 13th, 2009

Outline

Motivation and Background

System description

Experimental analysis

Conclusion

Motivation and Background

This paper proposes BotMiner, a general botnet detection framework that is independent of the botnet Command and Control (C&C) protocol and structure, and requires no a priori knowledge of specific botnets.

Motivation and Background

Bot: a malware instance that runs autonomously and automatically on a compromised computer (a “zombie”) without the owner’s consent

Botnet: a network of bots controlled by criminals. Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”

By some estimates, 25% of Internet PCs are part of a botnet.

Motivation and Background

Why BotMiner? Traditional detection methods are not enough.

Botnets can change their C&C content (e.g., through encryption), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models, …

Basic idea

Cluster similar communication traffic and similar malicious traffic, then perform cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns.
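As a toy illustration of this idea (not the BotMiner implementation), the sketch below assumes both clusterings are already available as sets of host addresses and simply intersects every C-cluster with every A-cluster. The function name `correlate`, the sample addresses, and the two-host minimum are hypothetical choices made only for illustration.

```python
from itertools import product


def correlate(c_clusters, a_clusters, min_hosts=2):
    """Return host groups that share both a C-cluster and an A-cluster.

    c_clusters / a_clusters: lists of sets of host identifiers.
    min_hosts: hypothetical threshold; a "coordinated group" needs at least two hosts.
    """
    suspects = []
    for c_set, a_set in product(c_clusters, a_clusters):
        # Hosts with similar communication AND similar malicious activity.
        common = c_set & a_set
        if len(common) >= min_hosts and common not in suspects:
            suspects.append(common)
    return suspects


c_clusters = [{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, {"10.0.0.7", "10.0.0.8"}]
a_clusters = [{"10.0.0.2", "10.0.0.3", "10.0.0.9"}, {"10.0.0.8"}]
print(correlate(c_clusters, a_clusters))
```

Here only 10.0.0.2 and 10.0.0.3 both talk alike and act alike, so they form the single suspicious group; 10.0.0.8 is alone in its intersection and is not reported.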

How does it work?

Revisit the definition of a botnet: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”

We need to monitor two planes:

C-plane (C&C communication plane): “who is talking to whom”

A-plane (malicious activity plane): “who is doing what”

Horizontal correlation:

Bots are for long-term use

Botnet: communication and activities are coordinated/similar

Outline

Motivation and Background

System description

Experimental analysis

Conclusion

Architecture overview

Simplified Architecture

[Diagram: Network Traffic → A-Plane Monitor + Clustering and C-Plane Monitor + Clustering → Cross-Plane Correlation → Report]

A-Plane

[Diagram: the simplified architecture (Network Traffic, A-Plane Monitor + Clustering, C-Plane Monitor + Clustering, Cross-Plane Correlation, Report), focusing on the A-plane]

A-Plane Monitor

Logs information on who is doing what

Monitors four types of malicious activities: scanning, spamming, binary downloading, and exploit attempts

Based on Snort; adapts some existing intrusion detection techniques (e.g., BotHunter, PEHunter)

A-Plane Clustering

Two-layer clustering on activity logs
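A minimal sketch of the two-layer idea, assuming each A-plane log entry is reduced to a hypothetical (host, activity type, feature) tuple. The first layer groups entries by activity type; the second layer groups hosts within each type by one activity-specific feature, such as the scanned destination port or the SMTP server contacted. Grouping by exact feature value is a simplifying stand-in for a real clustering algorithm, and the feature choices are assumptions.

```python
from collections import defaultdict

ACTIVITY_TYPES = {"scan", "spam", "binary", "exploit"}


def two_layer_cluster(logs):
    """logs: iterable of (host, activity_type, feature) tuples.

    Returns {activity_type: [set_of_hosts, ...]}.
    """
    # Layer 1: group log entries by activity type.
    by_type = defaultdict(list)
    for host, activity, feature in logs:
        if activity not in ACTIVITY_TYPES:
            raise ValueError(f"unknown activity type: {activity}")
        by_type[activity].append((host, feature))

    # Layer 2: within each type, group hosts that share the same feature value.
    clusters = {}
    for activity, entries in by_type.items():
        by_feature = defaultdict(set)
        for host, feature in entries:
            by_feature[feature].add(host)
        clusters[activity] = list(by_feature.values())
    return clusters


logs = [
    ("10.0.0.2", "scan", 445),
    ("10.0.0.3", "scan", 445),
    ("10.0.0.5", "scan", 22),
    ("10.0.0.2", "spam", "smtp.example.net"),
    ("10.0.0.3", "spam", "smtp.example.net"),
]
print(two_layer_cluster(logs))
```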

C-Plane

[Diagram: the simplified architecture (Network Traffic, A-Plane Monitor + Clustering, C-Plane Monitor + Clustering, Cross-Plane Correlation, Report), focusing on the C-plane]

C-Plane Monitor

Captures network flows and records information on who is talking to whom

Adapts an efficient network flow capture tool, fcapture, which is based on the Judy library

Each flow record contains the following information: time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in both directions

C-Plane Clustering

[Figure: Architecture of the C-plane clustering]

The first two steps (filtering and whitelisting) are not critical; however, they reduce the traffic workload and make the actual clustering process more efficient.

In the third step, given an epoch E (typically one day), all TCP/UDP flows that share the same protocol, source IP, destination IP, and destination port are aggregated into the same C-flow.
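The aggregation step can be sketched as follows, assuming flow records carry the fields listed for the C-Plane Monitor plus a `proto` field (an assumption; that field list does not mention the protocol, but aggregation needs it). The `Flow` class and `aggregate_cflows` function are hypothetical names, not part of fcapture.

```python
from collections import defaultdict
from dataclasses import dataclass

EPOCH_SECONDS = 24 * 3600  # epoch E, "typically one day"


@dataclass(frozen=True)
class Flow:
    """One flow record; `proto` is an assumed extra field."""
    time: float       # flow start time, in seconds
    duration: float   # seconds
    proto: str        # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    pkts_out: int
    pkts_in: int
    bytes_out: int
    bytes_in: int


def aggregate_cflows(flows, epoch_start):
    """Group one epoch's flows into C-flows keyed by (proto, src IP, dst IP, dst port).

    The source port is deliberately not part of the key, so repeated
    connections from ephemeral ports land in the same C-flow.
    """
    cflows = defaultdict(list)
    for f in flows:
        if epoch_start <= f.time < epoch_start + EPOCH_SECONDS:
            cflows[(f.proto, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)
```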

Feature Extraction

Extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors. For each C-flow, compute the discrete sample distribution of (currently) four random variables:

the number of flows per hour (FPH)

the number of packets per flow (PPF)

the average number of bytes per packet (BPP)

the average number of bytes per second (BPS)

Temporal statistical distribution information: FPH and BPS

Spatial statistical distribution information: BPP and PPF
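A sketch of how the raw samples of the four variables might be gathered for one C-flow. It assumes a simplified flow tuple (start time, duration, packets, bytes, with both directions summed), counts flows per hour over a 24-hour epoch including empty hours, and skips flows with zero packets or zero duration to avoid division by zero; all of these are illustrative assumptions, and `cflow_samples` is a hypothetical helper.

```python
from collections import Counter

HOUR = 3600  # seconds


def cflow_samples(flows, epoch_start, epoch_hours=24):
    """Raw samples of FPH, PPF, BPP and BPS for one C-flow.

    flows: list of (start_time_s, duration_s, packets, bytes) tuples, with packets
    and bytes summed over both directions (an assumed simplification).
    """
    # FPH: number of flows starting in each hour of the epoch; empty hours count as 0.
    per_hour = Counter(int((t - epoch_start) // HOUR) for t, _, _, _ in flows)
    fph = [per_hour[h] for h in range(epoch_hours)]
    # PPF: packets per flow (one sample per flow).
    ppf = [p for _, _, p, _ in flows]
    # BPP: average bytes per packet of each flow (flows without packets skipped).
    bpp = [b / p for _, _, p, b in flows if p > 0]
    # BPS: average bytes per second of each flow (zero-duration flows skipped).
    bps = [b / d for _, d, _, b in flows if d > 0]
    return {"FPH": fph, "PPF": ppf, "BPP": bpp, "BPS": bps}
```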

Feature Extraction Algorithm

For each random variable, compute its overall discrete sample distribution considering all the C-flows in the traffic for an epoch E, then describe that variable's (approximate) distribution within each C-flow as a vector of 13 elements.

Apply the same algorithm to all four random variables; each C-flow is therefore mapped into a pattern vector of d = 52 elements (13 × 4).
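The binning step can be sketched as below. The slides do not say how the 13 intervals are chosen, so this sketch assumes 12 evenly spaced quantiles of the overall (all-C-flow) distribution as cut points, and it normalizes each C-flow's histogram to fractions so C-flows with different sample counts stay comparable. `statistics.quantiles` is the Python standard-library function; `bin_edges`, `to_vector`, and `pattern_vector` are hypothetical helpers.

```python
from bisect import bisect_right
from statistics import quantiles

VARIABLES = ("FPH", "PPF", "BPP", "BPS")
N_BINS = 13


def bin_edges(all_samples):
    """12 cut points (13 intervals) from one variable's samples pooled over all C-flows.

    Evenly spaced quantiles are an assumption, not necessarily the paper's choice.
    """
    return quantiles(all_samples, n=N_BINS)


def to_vector(samples, edges):
    """One C-flow's samples as a normalized 13-element histogram over the intervals."""
    counts = [0] * (len(edges) + 1)
    for x in samples:
        counts[bisect_right(edges, x)] += 1
    total = len(samples)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pattern_vector(cflow_samples, edges_by_var):
    """Concatenate the four 13-element vectors into one d = 52 pattern vector."""
    vec = []
    for var in VARIABLES:
        vec.extend(to_vector(cflow_samples[var], edges_by_var[var]))
    return vec
```

Because the cut points come from the overall distribution, the same interval boundaries are shared by every C-flow in the epoch, which is what makes the resulting vectors comparable.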

Two-step Clustering of C-flows

Why multi-step? Clustering all C-flows directly in the full feature space is expensive, so a cheap coarse pass first splits them into smaller groups.

Coarse-grained clustering:

Uses a reduced feature space: the mean and variance of the distribution of FPH, PPF, BPP, and BPS for each C-flow (2 × 4 = 8 features)

Uses an efficient clustering algorithm: X-means

Fine-grained clustering:

Uses the full feature space (13 × 4 = 52 features)
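A sketch of the two-step scheme under stated assumptions: plain k-means with a fixed k stands in for X-means (which additionally chooses the number of clusters), each C-flow is assumed to carry its raw FPH/PPF/BPP/BPS samples and its 52-element pattern vector, and the fine-grained pass is run inside each coarse cluster. All function names are hypothetical.

```python
import random
from statistics import mean, pvariance

VARIABLES = ("FPH", "PPF", "BPP", "BPS")


def reduced_features(samples):
    """Coarse 8-D features: mean and variance of each variable's samples (2 * 4 = 8)."""
    feats = []
    for var in VARIABLES:
        xs = samples[var] or [0.0]  # guard against empty sample lists
        feats += [mean(xs), pvariance(xs)]
    return feats


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with fixed k (a stand-in; X-means would also pick k)."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([mean(col) for col in zip(*members)] if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def two_step_cluster(cflows, k_coarse, k_fine):
    """cflows: {cflow_id: (samples_dict, pattern_vector_52d)} -> list of sets of ids."""
    ids = list(cflows)
    # Step 1: coarse clustering on the cheap 8-D summaries.
    coarse = kmeans([reduced_features(cflows[i][0]) for i in ids], k_coarse)
    clusters = []
    for c in sorted(set(coarse)):
        members = [i for i, lab in zip(ids, coarse) if lab == c]
        # Step 2: fine-grained clustering of this coarse cluster on full 52-D vectors.
        fine = kmeans([cflows[i][1] for i in members], k_fine)
        for f in sorted(set(fine)):
            clusters.append({i for i, lab in zip(members, fine) if lab == f})
    return clusters
```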

Cross-Plane Correlation

[Diagram: the simplified architecture (Network Traffic, A-Plane Monitor + Clustering, C-Plane Monitor + Clustering, Cross-Plane Correlation, Report), focusing on Cross-Plane Correlation]

Cross-Plane Correlation

Botnet score s(h) for every host h: h receives a high score if it has performed multiple types of suspicious activities, and if other hosts that were clustered with h also show the same multiple types of activities

Similarity score between hosts h_i and h_j: two hosts that are in the same A-clusters and in at least one common C-cluster are clustered together
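One way to turn these two ideas into code, offered as a hedged sketch rather than the paper's exact formulas. The score sums, over pairs of A-clusters of different activity types that contain h, the Jaccard overlap of their host sets scaled by per-type weights, so both conditions above raise it. The similarity counts shared A-clusters and is zero unless the two hosts share at least one C-cluster. The weights and function names are assumptions.

```python
from itertools import combinations


def botnet_score(h, a_clusters, weights):
    """Hypothetical botnet score s(h).

    a_clusters: list of (activity_type, set_of_hosts) pairs.
    weights: activity_type -> weight (e.g., higher for more clearly malicious types).
    """
    mine = [(t, hosts) for t, hosts in a_clusters if h in hosts]
    score = 0.0
    for (t1, s1), (t2, s2) in combinations(mine, 2):
        if t1 != t2:  # only pairs of *different* activity types count
            # Jaccard overlap: how many of h's peers also performed both activities.
            score += weights[t1] * weights[t2] * len(s1 & s2) / len(s1 | s2)
    return score


def similarity(hi, hj, a_clusters, c_clusters):
    """Number of A-clusters shared by hi and hj, or 0 if they share no C-cluster."""
    if not any(hi in c and hj in c for c in c_clusters):
        return 0
    return sum(1 for _, hosts in a_clusters if hi in hosts and hj in hosts)
```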

Hierarchical clustering

Use the Davies-Bouldin (DB) validation index to find the best dendrogram cut, i.e., the one that produces the most compact and well-separated clusters
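A small self-contained sketch of choosing a dendrogram cut with the Davies-Bouldin index. It assumes hosts are embedded as numeric vectors compared by Euclidean distance and merged with average linkage; BotMiner's own host similarity could replace that distance. Lower DB values mean more compact, better-separated clusters, so the cut with the minimum DB is kept. The all-singleton cut (DB trivially zero) and the single-cluster cut (DB undefined) are not scored. Function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [mean(col) for col in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition with at least two clusters (lower is better)."""
    cents = [centroid(c) for c in clusters]
    # Scatter S_i: average distance of a cluster's points to its centroid.
    scatter = [mean(dist(p, m) for p in c) for c, m in zip(clusters, cents)]
    worst = []
    for i in range(len(clusters)):
        ratios = []
        for j in range(len(clusters)):
            if j != i:
                sep = dist(cents[i], cents[j])
                ratios.append((scatter[i] + scatter[j]) / sep if sep > 0 else float("inf"))
        worst.append(max(ratios))  # R_i: worst-case similarity to any other cluster
    return sum(worst) / len(worst)


def best_cut(points):
    """Average-linkage agglomerative clustering; return (clusters, DB) at the best cut.

    Cuts from n-1 down to 2 clusters are scored; returns (None, inf) if len(points) <= 2.
    """
    clusters = [[p] for p in points]
    best, best_db = None, float("inf")
    while len(clusters) > 2:
        # Merge the pair with the smallest average pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: mean(dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        score = davies_bouldin(clusters)
        if score < best_db:
            best, best_db = [list(c) for c in clusters], score
    return best, best_db
```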

Outline

Motivation and Background

System description

Experimental analysis

Conclusion

Data collected

Results

Outline

Motivation and Background

System description

Experimental analysis

Conclusion

Limitations and Discussion

Evading C-plane monitoring and clustering: misuse the whitelist; manipulate communication patterns

Evading A-plane monitoring and clustering: very stealthy activity; individualize bots’ communication/activity

Evading cross-plane analysis: extremely delayed tasks

Related Work

Contribution

Proposes a detection framework that is independent of botnet C&C protocol and structure, and requires no a priori knowledge of specific botnets

Builds a prototype system based on the general detection framework, and evaluates it with multiple real-world network traces, including normal traffic and several real-world botnet traces

Weaknesses

Offline system: long-term data collection and analysis; no incremental analysis capability

The experiments are not convincing enough: they only show the system's performance on day 2 (what about the other days?), and this is not a true “real-world experiment”

Improvements

Fast detection and online analysis

More efficient clustering, more robust features

More experiments in different, real network environments

Questions?