BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
College of Computing, Georgia Institute of Technology
USENIX Security '08
Presented by Lei Wu
April 13th, 2009
Motivation and Background
System description
Experimental analysis
Conclusion
Outline
The paper proposes BotMiner, a general detection framework that is independent of botnet Command and Control (C&C) protocol and structure, and that requires no a priori knowledge of botnets
Motivation and Background
Bot: a malware instance that runs autonomously and automatically on a compromised computer (zombie) without the owner's consent
Botnet: a network of bots controlled by criminals
Definition: "A coordinated group of malware instances that are controlled by a botmaster via some C&C channel"
By one often-cited estimate, up to 25% of Internet PCs are part of a botnet
Motivation and Background
Why BotMiner?
Traditional methods are not enough.
Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
Motivation and Background
Cluster similar communication traffic and similar malicious traffic, then perform cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns
Basic idea
Revisit the definition of a botnet: "A coordinated group of malware instances that are controlled by a botmaster via some C&C channel"
We need to monitor two planes
C-plane (C&C communication plane): "who is talking to whom"
A-plane (malicious activity plane): "who is doing what"
Horizontal correlation
Bots are for long-term use
Botnet: communication and activities are coordinated/similar
How does it work?
Architecture overview
Simplified Architecture (figure): Network Traffic -> A-Plane Monitor + Clustering and C-Plane Monitor + Clustering -> Cross-Plane Correlation -> Report
A-Plane
Log information on who is doing what
Monitor four types of malicious activities: scanning, spamming, binary downloading, and exploit attempts
Based on Snort; adapts some existing intrusion detection techniques (e.g. BotHunter, PEHunter)
A-Plane Monitor
Two-layer clustering on activity logs
A-Plane Clustering
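The slide does not say what the two layers are. The sketch below assumes one plausible reading: hosts are first grouped by activity type, then, within each type, by a specific feature of the activity (for example the scanned port). The function name, the tuple layout, and the choice of feature are hypothetical.

```python
from collections import defaultdict


def two_layer_clusters(activity_log):
    """Group hosts by activity type, then by an activity feature.

    activity_log: iterable of (host, activity_type, feature) tuples.
    The tuple layout is an assumption made for this sketch.
    """
    layers = defaultdict(lambda: defaultdict(set))
    for host, activity_type, feature in activity_log:
        # Layer 1 key: activity type; layer 2 key: activity feature.
        layers[activity_type][feature].add(host)
    return {t: dict(by_feature) for t, by_feature in layers.items()}


log = [
    ("10.0.0.1", "scan", "port 445"),
    ("10.0.0.2", "scan", "port 445"),
    ("10.0.0.3", "scan", "port 22"),
    ("10.0.0.1", "spam", "smtp"),
]
clusters = two_layer_clusters(log)
```

Hosts 10.0.0.1 and 10.0.0.2 end up in the same second-layer cluster because they share both the activity type and the feature.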
C-Plane
Captures network flows and records information on who is talking to whom
Adapts fcapture, an efficient network flow capture tool based on the Judy library
Each flow record contains the following information: time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in both directions
C-Plane Monitor
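As an illustration of the flow record described above, a minimal sketch follows. The class and field names are hypothetical and are not fcapture's actual format; the protocol field is an assumption, added because the later aggregation step groups flows by protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: float        # start time, seconds since the Unix epoch
    duration: float    # seconds
    protocol: str      # "tcp" or "udp" (assumed field, see lead-in)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets_out: int   # packets sent from source to destination
    packets_in: int    # packets sent from destination to source
    bytes_out: int
    bytes_in: int

    @property
    def total_packets(self) -> int:
        return self.packets_out + self.packets_in

    @property
    def total_bytes(self) -> int:
        return self.bytes_out + self.bytes_in


record = FlowRecord(1239580800.0, 2.5, "tcp", "10.0.0.1", 40000,
                    "192.0.2.7", 6667, 12, 8, 900, 1100)
```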
Architecture of the C-plane clustering
The first two steps are not critical; however, they reduce the traffic workload and make the actual clustering process more efficient
In the third step, given an epoch E (typically one day), all TCP/UDP flows that share the same protocol, source IP, destination IP, and destination port are aggregated into the same C-flow
C-Plane Clustering
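The aggregation step can be sketched as a group-by over the flows of one epoch. The Flow type and the function name are hypothetical; the grouping key (protocol, source IP, destination IP, destination port) and the one-day epoch follow the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    time: float      # start time in seconds
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def aggregate_cflows(flows, epoch_start, epoch_seconds=86400):
    """Group the flows that start inside the epoch into C-flows."""
    cflows = defaultdict(list)
    for f in flows:
        if epoch_start <= f.time < epoch_start + epoch_seconds:
            # The source port is deliberately not part of the key.
            cflows[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)


flows = [
    Flow(10.0, "tcp", "10.0.0.1", 40000, "192.0.2.7", 6667),
    Flow(3600.0, "tcp", "10.0.0.1", 40001, "192.0.2.7", 6667),
    Flow(20.0, "tcp", "10.0.0.1", 40002, "192.0.2.8", 80),
    Flow(90000.0, "tcp", "10.0.0.1", 40003, "192.0.2.7", 6667),  # next epoch
]
cflows = aggregate_cflows(flows, epoch_start=0.0)
```

The first two flows differ only in source port, so they fall into the same C-flow; the last flow starts after the epoch ends and is left out.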
Extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors
Compute the discrete sample distribution of (currently) four random variables:
the number of flows per hour (fph)
the number of packets per flow (ppf)
the average number of bytes per packet (bpp)
the average number of bytes per second (bps)
Feature Extraction
Temporal-related statistical distribution information: FPH and BPS
Spatial-related statistical distribution information: BPP and PPF
Compute the overall discrete sample distribution of the random variable over all the C-flows in the traffic for an epoch E, then describe the (approximate) distribution of that random variable as a vector of 13 elements.
Apply the same algorithm to all four random variables, so that each C-flow is mapped to a pattern vector of d = 52 elements
Feature Extraction Algorithm
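A minimal sketch of one way to obtain 13 elements per variable: derive bin boundaries from quantiles of the overall sample, then record what fraction of a C-flow's samples falls into each bin. The 12 quantile levels, the simple nearest-rank quantile, and all function names are assumptions of this sketch; the slide only fixes the sizes 13 and 52.

```python
from bisect import bisect_left

VARIABLES = ("fph", "ppf", "bpp", "bps")
# Assumed cut points: 12 quantile levels produce 13 bins.
QUANTILE_LEVELS = (0.05, 0.10, 0.15, 0.20, 0.25, 0.30,
                   0.40, 0.50, 0.60, 0.70, 0.80, 0.90)


def quantile_edges(all_samples, levels=QUANTILE_LEVELS):
    """Bin boundaries from the overall (all C-flows) non-empty sample."""
    ordered = sorted(all_samples)
    last = len(ordered) - 1
    return [ordered[round(level * last)] for level in levels]


def binned_distribution(samples, edges):
    """Fraction of samples per bin; len(edges) + 1 elements."""
    counts = [0] * (len(edges) + 1)
    for x in samples:
        counts[bisect_left(edges, x)] += 1
    total = len(samples)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pattern_vector(cflow, edges_by_var):
    """Concatenate the four 13-element vectors into 52 elements."""
    vector = []
    for name in VARIABLES:
        vector.extend(binned_distribution(cflow[name], edges_by_var[name]))
    return vector


edges = quantile_edges(range(1, 101))
one_variable = binned_distribution([1, 2, 50, 100], edges)
full = pattern_vector({v: [1, 2, 50, 100] for v in VARIABLES},
                      {v: edges for v in VARIABLES})
```

Using quantile-based boundaries keeps the bins roughly equally populated across the whole epoch, so a single C-flow's vector shows how its distribution deviates from the overall traffic.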
Why multi-step?
Coarse-grained clustering
Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8)
Efficient clustering algorithm: X-means
Fine-grained clustering
Using full feature space (13*4=52)
Two-step Clustering of C-flows
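The reduced feature space of the coarse-grained step can be sketched as follows. The function name and input layout are hypothetical, and the use of population variance is an assumption; the X-means clustering itself is not shown.

```python
from statistics import fmean, pvariance

VARIABLES = ("fph", "ppf", "bpp", "bps")


def coarse_features(cflow_samples):
    """Return [mean, variance] for each of the four variables (8 values).

    cflow_samples: dict mapping each variable name to a non-empty
    list of samples observed for one C-flow.
    """
    vector = []
    for name in VARIABLES:
        samples = cflow_samples[name]
        vector.extend([fmean(samples), pvariance(samples)])
    return vector


features = coarse_features({
    "fph": [1, 3],
    "ppf": [10, 10],
    "bpp": [100, 300],
    "bps": [5, 5, 5],
})
```

Clustering on 8 dimensions first is cheaper than on 52; the fine-grained step then only has to refine each coarse cluster using the full vectors.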
Cross-Plane Correlation
Botnet score s(h) for every host h
h will receive a high score if it has performed multiple types of suspicious activities, and if other hosts that were clustered with h also show the same multiple types of activities
Similarity score between hosts hi and hj
Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
Cross-Plane Correlation
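The slide gives no formula for s(h). The sketch below is one scoring that matches the description, under stated assumptions: overlap between clusters is measured with the Jaccard index, all clusters are weighted equally, and the function names and input layout are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters):
    """Score a host from the clusters it belongs to.

    a_clusters: list of (activity_type, set_of_hosts)
    c_clusters: list of sets of hosts
    """
    mine = [(t, hosts) for t, hosts in a_clusters if host in hosts]
    score = 0.0
    # Pairs of A-clusters of different activity types containing the host.
    for i, (type_i, hosts_i) in enumerate(mine):
        for type_j, hosts_j in mine[i + 1:]:
            if type_i != type_j:
                score += jaccard(hosts_i, hosts_j)
    # Pairs of one A-cluster and one C-cluster that both contain the host.
    for _, hosts_i in mine:
        for c_hosts in c_clusters:
            if host in c_hosts:
                score += jaccard(hosts_i, c_hosts)
    return score


a_clusters = [("scan", {"h1", "h2"}), ("spam", {"h1", "h2"})]
c_clusters = [{"h1", "h2", "h3"}]
score_h1 = botnet_score("h1", a_clusters, c_clusters)
score_h3 = botnet_score("h3", a_clusters, c_clusters)
```

Host h3 shares a communication pattern with the others but shows no malicious activity, so it scores zero; h1 performs two activity types alongside h2 and scores high.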
Use the Davies-Bouldin (DB) validation index to find the best dendrogram cut, i.e. the cut that produces the most compact and well-separated clusters
Hierarchical clustering
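For reference, the standard Davies-Bouldin index can be computed as below; lower values mean more compact, better separated clusters, so the best dendrogram cut is the candidate with the lowest index. This is a generic sketch with Euclidean distance, not the prototype's code; the function names are hypothetical, and it assumes at least two non-empty clusters with distinct centroids.

```python
from math import dist


def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """clusters: list of non-empty lists of equal-length numeric tuples."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [sum(dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(2, 0), (2, 4)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```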
Data collected
Results
Evading C-plane monitoring and clustering
Misuse whitelist
Manipulate communication patterns
Evading A-plane monitoring and clustering
Very stealthy activity
Individualize bots' communication/activity
Evading cross-plane analysis
Extremely delayed task
Limitation and Discussion
Related Work
Propose a detection framework which is independent of botnet C&C protocol and structure, and requires no a priori knowledge of specific botnets
Build a prototype system based on the general detection framework, and evaluate it with multiple real-world network traces including normal traffic and several real-world botnet traces
Contribution
Offline system
Long data collection and analysis time
No incremental analysis capability
The experiment is not convincing enough
Only shows the system performance on day-2; what about the other days?
Not a real "real world experiment"
Weakness
Fast detection and online analysis
More efficient clustering, more robust features
More experiments in different, real network environments
Improvement
Question?