Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Towards Reliable Traffic Classification UsingVisual Motifs
Wilson Lian1 John McHugh1,2 Fabian Monrose1
1University of North Carolina at Chapel Hill
2RedJack, LLC
FloCon 2010
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Overview
Background
Visual Motifs
Traffic Classification
Evaluation
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Motivation
Internet
Network Administrator
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Motivation
GET /index.ht...
d3b07384d113e...
d41d8cd98f00b...
7d8ad5cb9c940...
MAIL FROM: foo@...
d41d8cd98f00b...
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Goals
Port 22
36fd6d8c3f5af4...Port 22
fc2394c1a922...Port 22
f4d6d8c3f5a36...Port 22
222394c1a9fc...Port 22
5ad6d8c3ff436...Port 22
a92394c122fc...
Port 25
MAIL FROM: [email protected] 25
f98698466c3ef...Port 25
ef8698466c3f9...Port 25
DATA\r\nSubject: fo...Port 25
f98698466c3ef...
Port 80
b314caafaa3e...Port 80
b314caafaa3e...Port 80
POST /login.ph...Port 80
b314caafaa3e...Port 80
POST /AuthSv...Port 80
GET /index.ht...
Port 1214
113edec49eaa...Port 1214
006f7b3db8f4f...Port 1214
aa3edec49e11...Port 1214
4f006f7b3db8f0...Port 1214
9e3edec4aa11...Port 1214
b8006f7b3d4ff0...
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Assumptions
Reliable transport via TCP
Stream Cipher
No access to payloadLength preservation
Negligible packet loss & retransmission
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Related Work
Scatter (and other) Plots for Visualizing User Profiling Dataand Network Traffic, Goldring 2004.
Using Visual Motifs to Classify Encrypted Traffic, Wright etal. 2006
Intelligent Classification and Visualization of Network ScansMuelder et al. 2008.
FloVis: A Network Security Visualization Framework, Taylor2009.
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Timeline Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Unigram Heatmaps
Image credit: Wright et al. 2006
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Bigram Heatmaps
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Heatmap Construction
Time
SYN 48 bytes
SYN-ACK 48 bytes
ACK 40 bytesHTTP Request 891 bytes
1500 bytes
40 bytes
40 bytes
1500 bytes270 bytes40 bytes
Client Server
48-4840
891-40
-270-1500
40-1500
40
(48, -48)(-48, 40)(40, 891)(891, -40)(-40, -270)
(-270, -1500)(-1500, 40)(40, -1500)(-1500, 40)
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Heatmap Construction
(48, -48)(-48, 40)(40, 891)(891, -40)(-40, -270)
(-270, -1500)(-1500, 40)(40, -1500)(-1500, 40)
(40, 891)
(48, -48)
(-48, 40)(40, 891)
(891, -40)(-40, -270)(-270, -1500)
(-1500, 40)
(40, -1500)
(-1500, 40)
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Heatmap Construction
(48, -48)(-48, 40)(40, 891)(891, -40)(-40, -270)
(-270, -1500)(-1500, 40)(40, -1500)(-1500, 40)
(40, 891)
(48, -48)
(-48, 40)(40, 891)
(891, -40)(-40, -270)(-270, -1500)
(-1500, 40)
(40, -1500)
(-1500, 40)3/9 = 33.3%
2/9 = 22.2%
1/9 = 11.1%
3/9 = 33.3%
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Heatmap Construction
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Bigram Heatmaps
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Modeling Protocol Behavior
(40, 891)
(48, -48)
(-48, 40)(40, 891)
(891, -40)(-40, -270)(-270, -1500)
(-1500, 40)
(40, -1500)
(-1500, 40)3/9 = 33.3%
2/9 = 22.2%
1/9 = 11.1%
3/9 = 33%
1
3 4
2
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Modeling Protocol Behavior
0 1 2 3 4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Bin
Prob
abili
ty
.333
.111
.222
.333
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Comparing Protocol Models
1 2 3 4
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Bin
Prob
abili
ty
.333
.100 .111
.222
.333
.700
.150
.050
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Comparing Protocol Models
Atotal =n∑
k=1
Ak
Btotal =n∑
k=1
Bk
ScoreA↔B =n∑
i=1
∣∣∣∣ Ai
Atotal− Bi
Btotal
∣∣∣∣=
1
Atotal · Btotal
n∑i=1
|Ai · Btotal − Bi · Atotal |
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Comparing Protocol Models
50 1 2 3 4
Prob
abili
ty
.333
.100 .111
.222
.333
.700
.150
.050
Score = .233+.589+.072+.283 = 1.177
|.333-.100| = .233
|.111-.700| = .589
|.333-.050| = .283
|.222-.150| = .072
Diff
eren
ce
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Comparing Protocol Models
1 2 3 4
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Bin
Prob
abili
ty
.333
.400
.111
.222
.333
.150 .150
.300
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Comparing Protocol Models
50 1 2 3 4
-0.2
-0.1
-0
0.1
0.2
0.3
0.4
0.5
0.6
0.7Pr
obab
ility
.333
.400
.111
.222
.333
.150 .150
.300
Score = .067+.039+.072+.033 = .211
|.333-.400| = .067
|.111-.150| = .039 |.333-.300| = .033
|.222-.150| = .072
Diff
eren
ce
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Classifying Samples: Easy as 1-2-3
1 Create training models for desired protocols
2 Build distribution for sample network trace
3 Find training model with lowest difference score
ScoreA↔B =n∑
i=1
∣∣∣∣ Ai
Atotal− Bi
Btotal
∣∣∣∣=
1
Atotal · Btotal
n∑i=1
|Ai · Btotal − Bi · Atotal |
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Evaluation
How much traffic must be collected for:
TrainingTesting
Precision?
true positives
true positives + false positives
Recall?
true positives
true positives + false negatives
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Data
CRAWDAD Dataset
Weekdays: January 19, 2004 – February 6, 2004
Ports with sufficient traffic
≥ 1M packets0.3% of ports → 95.21% of packets
Keep top 10 ports by number of sessions observed
No ground truth
Total Packets 1.3 Billion
Traffic Volume 707 GB
Observed Ports 64,214
Sessions 5.2 Million
Port 80 Sessions 1.7 Million
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Methodology
Trial :=1 Randomly sample some percentage of available data for each
port and train classifier2 Randomly sample some number of the remaining data points
for each port and create testing samples3 Classify testing samples
50 Trials
80 110 445
TrainingData
TestingData
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Training Size Selection
160 2 4 6 8 10 12 14
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
Training Sample Size (%)
Aver
age
Reca
ll
.952
.884
Average Recall for Varying Training Size
6.8% improvement
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Training Size Selection
160 2 4 6 8 10 12 14
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
Training Sample Size (%)
Aver
age
Prec
isio
n
.887
.953
Average Precision for Varying Training Size
6.6% improvement
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Testing Size Selection
0 10,000 20,000 30,000 40,000 50,000
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
Testing Sample Size (Data Points)
Aver
age
Reca
ll
.898
.926
.954
Average Recall for Varying Testing Size
5.6% improvement
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Testing Size Selection
0 10,000 20,000 30,000 40,000 50,000
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
Testing Sample Size (Data Points)
Aver
age
Prec
isio
n
.961
.914
4.7% improvement
Average Precision for Varying Testing Size
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Results
50 Trials
15% TrainingSet Size
50,000 DataPoints TestingSet Size
96.5% Precision
96.0% Recall
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Classification Confidence Threshold
Goal: Eliminate close calls
Require 1st place candidate to lead 2nd place by certainamount to make decision
Standard deviation of scores
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Methodology v2.0
Randomly sample some percentage of available data for eachport and train classifier
Randomly sample some number of the remaining data pointsfor each port and create testing samples
Attempt to classify testing samples
If all testing samples reach threshold, done.If any testing sample fails, rebuild testing samples and tryagain.
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Classification Confidence
50 Trials
5% Training SetSize
35,000 DataPoints TestingSet Size
1.0 LeadThreshold
96.9% Precision
96.6% Recall
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Classification Confidence
0 0.25 0.5 0.75 1 1.25
0
500,000
1,000,000
1,500,000
2,000,000
2,500,000
3,000,000
3,500,000
4,000,000
4,500,000
5,000,000
5,500,000
Lead Threshold
Test
ing
Sess
ions
Sam
pled
1.0 M 1.0 M 1.2 M
2.4 M
Testing Session Sampled For 50 Trials35,000 Data Point Testing Samples
5.4 M
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Ground Truth Testing
MIT Lincoln Labs DARPA Data50 trials, 5% training sample size, 35,000 data point testingsample size, 1.25 lead threshold
Precision: 98.3%
Recall: 98.0%
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Results
50 Trials
5% Training SetSize
35,000 DataPoints TestingSize
1.25 LeadThreshold
Ground Truth
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Evasion
One might attempt to thwart our technique by padding all packetsto MTU.Reduces problem to 4-quadrant problem.Can still make decisions based on relative prevalence of eachquadrant.
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Current/Future Work
Packet loss/re-transmission may cause unpredictable results
On-line classification
Training and testing from separate datasets
UDP
Subcategorization
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Conclusion
Modeling protocol behavior using only packet size, direction,and order
Resistant to encryption and padding
Average precision and recall > 97%
Quick and reliable traffic inspection
Useful for pre-screening traffic for deeper analysis
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs
BackgroundVisual Motifs
Traffic ClassificationEvaluation
Questions?
Thanks for listening.Q & A
Wilson Lian, John McHugh, Fabian Monrose Towards Reliable Traffic Classification UsingVisual Motifs