FINDING THE NEEDLE IN THE IP STACK
Dr. Sven Krasser
McAfee, Inc.
Session ID: RR-403
Session Classification: Intermediate
AGENDA
Data Mining – A Human Approach
English Words
Bad Behavior
What’s in a File
Conclusions
Data Mining – A Human Approach
ANTHROPOMETRIC DATA
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
MEASUREMENTS
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
MEASUREMENTS (CONTINUED)
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
HEIGHT VERSUS WEIGHT
[Figure: scatter plot of weight (in pounds, 100 to 250) against height (in inches, 60 to 80)]
HEIGHT VERSUS WEIGHT (CONTINUED)
[Figure: scatter plot of weight (in pounds, 100 to 250) against height (in inches, 60 to 80), with women and men marked separately]
PUTTING WEIGHT AND HEIGHT INTO PERSPECTIVE
BEST GUESS FOR GENDER
[Figure: weight (in pounds) against height (in inches), shaded by best guess, ranging from 100% male / 0% female through 50% male / 50% female to 0% male / 100% female]
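One way to produce a best-guess map like this is to model each class with a Gaussian per feature and apply Bayes' rule. The sketch below is only an illustration of that idea: the slides do not state which method was used, the per-class means and standard deviations are made-up placeholder values (not taken from the anthropometric data set), and treating height and weight as independent is a simplifying assumption.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical (mean, std) per class and feature; NOT from the data set.
PARAMS = {
    "male":   {"height": (70.0, 3.0), "weight": (180.0, 25.0)},
    "female": {"height": (64.0, 3.0), "weight": (150.0, 25.0)},
}

def p_male(height_in, weight_lb, prior_male=0.5):
    """Posterior probability of 'male' under a naive Bayes model."""
    def likelihood(cls):
        h_mean, h_std = PARAMS[cls]["height"]
        w_mean, w_std = PARAMS[cls]["weight"]
        return (gaussian_pdf(height_in, h_mean, h_std)
                * gaussian_pdf(weight_lb, w_mean, w_std))
    male = prior_male * likelihood("male")
    female = (1.0 - prior_male) * likelihood("female")
    return male / (male + female)
```

Evaluating `p_male` over a grid of heights and weights would give a map of the kind shown on the slide, with the 50% line running where both classes are equally likely.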
ONE DIMENSION ONLY
[Figure: distribution of height (in inches, 55 to 75); vertical axis from 0.00 to 0.15]
BETTER FEATURES
Buttock Circumference: “The circumference of the body measured at the level of the maximum posterior protuberance of the buttocks.”
[Figure: scatter plot with weight (in pounds, 100 to 200) on the vertical axis and a horizontal axis from 800 to 1200 (units not shown)]
BEST GUESS FOR REVISED FEATURES
[Figure: best guess plotted over weight (in pounds) against buttock circumference]
FURTHER IMPROVING THE SEPARATION
• Signal to Noise: features with very different distributions per class
• Correlation: features with low correlation
• Dimensionality: consider more features at the same time
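The first two criteria can be made measurable. The sketch below uses two common choices that the slides do not name, so both are assumptions: a Fisher-style separation score (squared distance between class means over the summed variances) for "signal to noise", and the Pearson coefficient for correlation between two features. It assumes non-constant inputs, since zero variance would divide by zero.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def separation_score(class_a, class_b):
    """Squared distance between class means divided by the summed variances.
    Larger values mean the feature separates the two classes better."""
    return (mean(class_a) - mean(class_b)) ** 2 / (variance(class_a) + variance(class_b))

def pearson(xs, ys):
    """Pearson correlation between two features measured on the same samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A feature pair with a high separation score each and a correlation near zero adds more information than two features that largely repeat each other.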
EMAIL DATA IN THREE DIMENSIONS
SPARSE DATA
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 3 0 0 0 0 0 0 2 0 0 0 1 0 3 1 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
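Rows like these, where almost every entry is zero, are commonly stored as index-value pairs rather than as full arrays. The following is a minimal sketch of that idea using plain dictionaries; the function names are illustrative and nothing here is taken from the presentation itself.

```python
def to_sparse(row):
    """Keep only the nonzero entries of a dense row as {column_index: value}."""
    return {i: v for i, v in enumerate(row) if v != 0}

def sparse_dot(a, b):
    """Dot product of two sparse rows, visiting only the smaller row's entries."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

# Toy row: 85 columns, two nonzero counts.
dense = [0] * 85
dense[0] = 2
dense[67] = 1
compact = to_sparse(dense)  # two entries instead of 85
```

Storage and arithmetic then scale with the number of nonzero entries, which matters once the number of columns reaches the millions or billions.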
CLASSIFICATION ALGORITHMS
• Decision Trees
• Decision Forests
• Neural Networks
• Support Vector Machines
Combined: Final Verdict
English Words – And why do they look English?
SOME ENGLISH WORDS
• militate
• caterwaul
• deracinate
• arrant
• concinnity
• imprecation
• vertiginous
• profuse
SOME ENGLISH EXPLANATIONS
• militate: to have force or influence
• caterwaul: to make a harsh cry or screech
• deracinate: to uproot
• arrant: outright; thoroughgoing
• concinnity: elegance – used chiefly of literary style
• imprecation: a curse
• vertiginous: causing dizziness; also, giddy; dizzy
• profuse: plentiful; copious
Source: http://dictionary.reference.com/
TRANSITION PROBABILITIES
a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9
a .00 .07 .15 .02 .00 .10 .00 .00 .00 .00 .02 .05 .00 .17 .00 .02 .00 .05 .02 .27 .00 .05 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
b .14 .00 .00 .00 .29 .00 .00 .00 .00 .00 .00 .43 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
c .05 .03 .05 .00 .11 .00 .00 .08 .03 .00 .03 .00 .00 .00 .24 .00 .00 .03 .00 .22 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
d .17 .04 .04 .00 .17 .00 .00 .00 .17 .00 .00 .00 .04 .00 .04 .00 .00 .00 .17 .13 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
e .03 .02 .15 .11 .00 .01 .00 .03 .04 .01 .00 .01 .04 .11 .02 .02 .00 .12 .12 .09 .00 .02 .02 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
f .06 .00 .00 .00 .06 .24 .00 .00 .29 .00 .00 .00 .00 .00 .12 .00 .00 .12 .06 .06 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
g .00 .00 .00 .14 .14 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .29 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
h .16 .00 .00 .00 .53 .05 .00 .00 .11 .00 .00 .00 .00 .05 .11 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
i .02 .00 .08 .00 .04 .00 .00 .00 .00 .00 .00 .06 .00 .29 .15 .00 .02 .02 .15 .04 .00 .08 .00 .00 .00 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
j .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
k .00 .00 .00 .00 .50 .00 .00 .00 .00 .00 .00 .50 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
l .06 .00 .00 .06 .35 .00 .00 .00 .06 .00 .00 .06 .00 .00 .12 .00 .00 .00 .06 .06 .06 .00 .00 .00 .12 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
m .08 .00 .00 .00 .08 .00 .00 .00 .08 .00 .00 .00 .00 .00 .17 .33 .00 .00 .00 .17 .08 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
n .04 .00 .02 .18 .16 .00 .12 .00 .02 .00 .00 .00 .00 .00 .02 .00 .00 .02 .12 .24 .02 .00 .02 .00 .00 .00 .00 .00 .00 .00 .00 .02 .00 .00 .00 .00
o .00 .00 .02 .02 .04 .12 .02 .00 .00 .00 .00 .04 .12 .16 .00 .00 .02 .20 .08 .00 .12 .02 .02 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
p .00 .00 .00 .00 .08 .00 .00 .00 .00 .00 .00 .00 .00 .00 .08 .00 .00 .46 .00 .00 .38 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
q .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
r .15 .00 .07 .02 .24 .00 .00 .00 .07 .00 .00 .02 .00 .09 .11 .04 .00 .00 .07 .02 .07 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .02 .00 .00
s .08 .00 .04 .00 .17 .02 .00 .00 .10 .00 .00 .02 .00 .00 .12 .02 .00 .00 .04 .31 .06 .00 .00 .00 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
t .07 .00 .01 .00 .27 .00 .00 .16 .16 .00 .00 .00 .00 .00 .09 .01 .00 .13 .01 .04 .03 .00 .00 .00 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
u .00 .00 .03 .00 .03 .00 .00 .00 .03 .00 .00 .06 .03 .16 .00 .00 .00 .29 .13 .23 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
v .11 .00 .00 .00 .33 .00 .00 .00 .56 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
w .50 .00 .00 .00 .00 .00 .00 .00 .25 .00 .00 .00 .00 .00 .25 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
x .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
y .29 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .14 .00 .00 .14 .29 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
z .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
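A table of this kind can be estimated by counting which character follows which and normalizing each row. The sketch below shows the counting step; the slide does not say which text the table was built from, so the word list from the earlier slide is used here purely as a toy corpus and will not reproduce the numbers above.

```python
from collections import Counter, defaultdict

def transition_probabilities(words):
    """Estimate P(next char | current char) from adjacent character pairs."""
    counts = defaultdict(Counter)
    for word in words:
        for cur, nxt in zip(word, word[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        probs[cur] = {nxt: n / total for nxt, n in followers.items()}
    return probs

# Toy corpus only; the real table was presumably built from far more text.
WORDS = ["militate", "caterwaul", "deracinate", "arrant",
         "concinnity", "imprecation", "vertiginous", "profuse"]
table = transition_probabilities(WORDS)
```

Each row of `table` sums to 1, and characters that never follow a given character are simply absent, which corresponds to the .00 cells in the table.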
ACTIVE .COM DOMAINS
82 million active .com domains
MARKOV CHAINS
• Analysis of recent domain registrations
• Using second-order Markov chains to detect potentially malicious domain names
– bnkofpunjab.com is not legitimate
– ferrylines.com is legitimate
– ebay.com cannot be determined either way
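[Diagram: chain of two-character states for each name, with two values per transition, listed here in extraction order; the values are unlabeled in the extracted slide]
bnkofpunjab
bn → nk   .0073   .1733
nk → ko   .0641   .0872
ko → of   .0213   .2738
of → fp   .0912   .0431
fp → pu   .0912   .1534
pu → un   .0732   .0932
un → nj   .0014   .0714
nj → ja   .2175   .2936
ja → ab   .0143   .0437
ferrylines
fe → er   .2626   .1860
er → rr   .0301   .0196
rr → ry   .0939   .0371
ry → yl   .0322   .0291
yl → li   .2419   .1932
li → in   .3598   .1120
in → ne   .1457   .1269
ne → es   .0633   .0411
ebay
eb → ba   .1064   .4759
ba → ay   .0588   .2979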
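The chains above move from one two-character state to the next overlapping one, which amounts to asking how likely a character is given the two characters before it. Below is a minimal sketch of scoring a name that way. The training list, the add-alpha smoothing, and the use of an average log probability as the score are all assumptions made for illustration; the presentation does not describe these details, and hyphens are outside the 36-symbol alphabet assumed here.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def train(names):
    """Count how often each character follows each two-character state."""
    counts = defaultdict(Counter)
    for name in names:
        for i in range(len(name) - 2):
            counts[name[i:i + 2]][name[i + 2]] += 1
    return counts

def avg_log_prob(name, counts, alpha=1.0):
    """Mean log P(c3 | c1 c2) over the name, with add-alpha smoothing."""
    steps = len(name) - 2
    if steps <= 0:
        return None  # too short: no second-order transition to score
    total = 0.0
    for i in range(steps):
        followers = counts.get(name[i:i + 2], Counter())
        numer = followers[name[i + 2]] + alpha
        denom = sum(followers.values()) + alpha * len(ALPHABET)
        total += math.log(numer / denom)
    return total / steps
```

Trained on a large list of known-good names, a low average log probability would flag a name as unusual. Short names such as the ebay example offer only a couple of transitions, which fits the slide's point that some names cannot be decided from the character sequence alone.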
LIMITATIONS OF THE MARKOV MODEL
• Useful to detect malicious domain names
• Very effective for randomly generated names
• Detects some legitimate domain names as malicious domains
– Malicious names similar to legitimate ones (e.g. ebay.com phishing sites)
– International domain names and punycode
• Solution: add DNS-related features to the classification process
DNS FEATURES
Domain            Number of Name Servers
bnkofpunjab.com   15
ferrylines.com    2
ebay.com          4
1. The number of name servers that have hosted or currently host the domain
2. The average time one name server has hosted the domain
3. The maximum time one name server has hosted the domain
4. The minimum time one name server has hosted the domain
5. The number of name servers that hosted the domain before but are no longer active
6. Whether the domain is an international one
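The six features listed above can be derived from a hosting history per domain. The sketch below assumes a hypothetical record layout (name server, first day seen, last day seen or `None` if still hosting) and reads "non-activated" as "no longer hosting the domain"; neither is specified in the slides. The international check relies on the `xn--` prefix that Punycode-encoded labels carry.

```python
def is_international(domain):
    """True if any label is Punycode-encoded (starts with 'xn--')."""
    return any(label.startswith("xn--") for label in domain.lower().split("."))

def dns_features(domain, history, today):
    """history: non-empty list of (name_server, first_seen_day, last_seen_day),
    where last_seen_day is None if the server still hosts the domain.
    This record layout is hypothetical."""
    durations = []
    inactive = 0
    for _server, first_seen, last_seen in history:
        if last_seen is None:
            durations.append(today - first_seen)
        else:
            durations.append(last_seen - first_seen)
            inactive += 1
    return {
        "num_name_servers": len(history),
        "avg_hosting_days": sum(durations) / len(durations),
        "max_hosting_days": max(durations),
        "min_hosting_days": min(durations),
        "num_inactive_name_servers": inactive,
        "is_international": is_international(domain),
    }
```

A domain that has cycled through many name servers, each for a short time, would show a high count and low average duration, in line with the 15 name servers listed for bnkofpunjab.com.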
EXAMPLE FEATURE
[Figure: density (0.00 to 0.15) of the time of a domain on a name server (in days, 0 to 600)]
RESULTS ANALYSIS
[Figure: true positive rate plotted against false positive rate]
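A plot of true positive rate against false positive rate is obtained by sweeping a decision threshold over the classifier's scores. The sketch below shows that computation under two assumptions: higher scores mean "more suspicious", and both classes are present in the labels. It is not the evaluation code behind the slide.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.
    labels: True for malicious, False for legitimate."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points
```

Curves that rise steeply toward the top-left corner indicate that many malicious items are caught before legitimate ones start being flagged.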
Bad Behavior – Email and Spam
IP BLACKLIST LOOKUP
• Mail server looks up sender IP over DNS
• Simple classifier modeled on IP blacklist query logs
• Narrow data set – queried IP, source IP, timestamp
• Deep data set – billions of query records monthly
• More complex data can be included
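IP LOOKUPS
[Diagram: a sender (IP=Q) connects to a receiver (IP=S); the receiver asks the reputation server over DNS about Q (Q?) and gets an answer (Q=x); each lookup is recorded as the tuple <Q, S, T>]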
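DNS-based IP blacklists are conventionally queried by reversing the octets of the IPv4 address and appending the list's zone. The sketch below builds such a query name and the <Q, S, T> record that a lookup would leave behind. The zone `bl.example.org` is a placeholder, not the service discussed in the talk, and the record layout simply follows the queried IP, source IP, timestamp fields named earlier.

```python
import ipaddress

def blacklist_query_name(sender_ip, zone="bl.example.org"):
    """Build the DNS name for an IPv4 blacklist lookup: reversed octets + zone.
    The zone is a placeholder."""
    octets = str(ipaddress.IPv4Address(sender_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def log_record(queried_ip, source_ip, timestamp):
    """The <Q, S, T> tuple: queried IP, IP of the querying server, time of query."""
    return (queried_ip, source_ip, timestamp)
```

For example, a lookup for 192.0.2.1 would query the name `1.2.0.192.bl.example.org`.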
FEATURE EXTRACTION
Breadth features
– Number of messages
– Number of recipients
– Burstiness (data transmitted in short, uneven spurts)
– Sending sessions to individual recipients
– Global sending sessions to any recipient
Spectral features
– Periodicity over 24-hour window
– Average and standard deviation of low-frequency discrete Fourier transform (DFT) coefficients
– Average and standard deviation of high-frequency DFT coefficients
Distribution features
– Source IPs (thousands)
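Some of the breadth and spectral features above can be sketched directly from the lookup records for a single queried IP. The code below makes several assumptions that are not in the slides: each lookup stands for one message, each distinct source IP stands for one recipient, activity is bucketed by hour of day, and the boundary between "low" and "high" frequency (`split`) is an arbitrary choice.

```python
import cmath
import statistics

def hourly_counts(timestamps):
    """Bucket Unix timestamps (seconds) into 24 hour-of-day bins (UTC)."""
    bins = [0] * 24
    for t in timestamps:
        bins[int(t // 3600) % 24] += 1
    return bins

def dft_magnitudes(series):
    """Magnitudes of the discrete Fourier transform, computed directly."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(series)))
            for k in range(n)]

def features(records, split=4):
    """records: list of (queried_ip, source_ip, timestamp) for ONE queried IP."""
    mags = dft_magnitudes(hourly_counts([t for _q, _s, t in records]))
    half = mags[1:len(mags) // 2 + 1]  # skip the constant term; keep non-redundant bins
    low, high = half[:split], half[split:]
    return {
        "num_messages": len(records),
        "num_recipients": len({s for _q, s, _t in records}),
        "low_freq_mean": statistics.mean(low),
        "low_freq_std": statistics.pstdev(low),
        "high_freq_mean": statistics.mean(high),
        "high_freq_std": statistics.pstdev(high),
    }
```

A sender with a regular daily rhythm would concentrate energy in the low-frequency coefficients, while bursty traffic spreads it across the higher ones.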
SELECTION OF ADVANCED FEATURES
Geographic features
• Location of sender and receiver
• Distance
• Local time at sender and receiver
Static features
• Host name features
• Dial-up IPs
• Reputation of neighboring IPs
Content features
• Ratio of good and bad messages
• Number of “from” domains handled
• Persistent sender/receiver address pairs
• Message size distribution
Sparse distribution features
• Source devices (thousands)
• Extended HELO (EHLO) strings (millions)
• “From” domains (billions)
• “To” addresses (billions)
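For the geographic features, the distance between sender and receiver can be computed once both IPs have been mapped to coordinates. The sketch below uses the haversine formula on a spherical Earth; how the coordinates are obtained (for example from an IP geolocation database) is not covered in the slides and is left out here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The same coordinates also give an approximate local time at each end, which is the other geographic feature on the list.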
BREADTH FEATURES
[Figure: scatter plot of normalized number of messages per receiver (0.0 to 1.0) against normalized number of receivers (0.0 to 1.0); legend: Spam, Ham]
What’s in a File – A Look at Image Spam and Malware
IMAGE SPAM—COMPOSITION
CLOSE-UP OF GRADIENT
CLOSE-UP OF GRADIENT (CONTINUED)
GRADIENT FIELD OF PHOTO
GRADIENT DIRECTIONS
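The gradient slides survive only as titles, so the following is a generic sketch of what a gradient-direction feature can look like, not the method used in the talk. It assumes a grayscale image given as a 2D list, central differences for the gradient, and an eight-bin histogram of directions.

```python
import math

def gradient_directions(image):
    """image: 2D list of grayscale values. Returns the gradient angle (radians)
    of every interior pixel whose gradient is nonzero, via central differences."""
    angles = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if gx != 0 or gy != 0:
                angles.append(math.atan2(gy, gx))
    return angles

def direction_histogram(angles, bins=8):
    """Count angles per equal-width bin over [-pi, pi]."""
    hist = [0] * bins
    for a in angles:
        idx = min(int((a + math.pi) / (2 * math.pi) * bins), bins - 1)
        hist[idx] += 1
    return hist
```

The histogram counts, or statistics derived from them, could then serve as entries in a feature vector for an image.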
IMAGE FEATURE ANALYSIS
1:0 2:266 3:285 4:0.933333 5:9678 6:7.83323 7:1 8:0 9:0.038768 10:0.0286506 11:0.0242844 12:12.9656 13:0.688315 14:0.688289
15:0.688927 16:0.688345 17:1.47216 18:1.48728 19:1.45537 20:1.4721 21:0.998652 22:0.998907 23:0.998662 24:1 25:1 26:1 27:1
28:1 29:1 30:1 31:1 32:1 33:1 34:1 35:1 36:1 37:1 38:1 39:1 40:1 41:1 42:1 43:1 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:1 52:1
53:1 54:1 55:1 56:1 57:1 58:1 59:1 60:62895.6 61:62894.4 62:62923.5 63:62897 64:11.9708 65:0.439338 66:0.0768368
67:0.0533835 68:0.694764 69:285 70:97 71:106 72:99 73:97 74:69979 75:69484 76:68665 77:69365 78:1 79:0 80:0 81:0.0342435
82:0.0281361 83:0.025709 84:1327.37 85:35.0028 86:28.6605 87:0.818808 88:1 89:2.98484e+07 90:4.16282e+06 91:8.01424e+06
92:1.49028e+07 93:3.56203e+09 94:7.21651e+06 95:4.73602e+06 96:3.10232e+07 97:0.0083796 98:0.576846 99:1.69219 100:0.480375
101:3.61226e+09 102:3.74413e+07 103:1.22301e+07 104:1.17737e+07 105:3.6044e+07 106:3.47745e+09
1:0 2:403 3:328 4:1.22866 5:14076 6:9.39074 7:1 8:0 9:0.0107123 10:0.00245869 11:0.00118774 12:8.11821 13:0.437548
14:0.43765 15:0.437561 16:0.437535 17:1.50918 18:1.49392 19:1.50991 20:1.50827 21:0.487349 22:3.32315e-05 23:9.95995e-05
24:2 25:1 26:4 27:2 28:1 29:4 30:2 31:1 32:4 33:2 34:1 35:4 36:2 37:1 38:4 39:2 40:1 41:4 42:2 43:1 44:4 45:2 46:1 47:4 48:2
49:1 50:4 51:2 52:1 53:4 54:2 55:1 56:4 57:2 58:1 59:4 60:87436.3 61:87446.5 62:87437.6 63:87435 64:21.4308 65:0.770517
66:0.0444456 67:0.0244281 68:0.549617 69:328 70:98 71:98 72:103 73:90 74:105800 75:99639 76:109102 77:104674 78:1 79:0 80:0
81:0.00520487 82:0.00256461 83:0.00166435 84:771.479 85:20.5683 86:47.573 87:2.31293 88:1 89:1.2547e+07 90:1.11096e+06
91:3.35713e+06 92:4.41541e+06 93:2.70918e+09 94:2.06067e+06 95:2.66906e+06 96:1.28006e+07 97:0.0046313 98:0.539126 99:1.2578
100:0.344938 101:2.72749e+09 102:1.02016e+07 103:1.04445e+07 104:1.03338e+07 105:1.00934e+07 106:2.69858e+09
1:0 2:418 3:320 4:1.30625 5:18652 6:7.17135 7:1 8:0 9:0.0106459 10:0.00264653 11:0.000994318 12:14.1862 13:0.243456
14:0.243497 15:0.243457 16:0.243446 17:2.41721 18:2.4152 19:2.41193 20:2.41671 21:7.91675e-05 22:8.63708e-05 23:0.339384
24:4 25:1 26:8 27:3 28:1 29:8 30:2 31:1 32:8 33:4 34:1 35:8 36:3 37:1 38:8 39:2 40:1 41:8 42:4 43:1 44:8 45:3 46:1 47:8 48:2
49:1 50:8 51:4 52:1 53:8 54:3 55:1 56:8 57:2 58:1 59:8 60:65998.9 61:66004.4 62:65999 63:65997.5 64:10.224 65:0.127104
66:0.0635766 67:0.056407 68:0.88723 69:320 70:53 71:48 72:57 73:57 74:111983 75:115960 76:114435 77:113875 78:1 79:0 80:0
81:0.006407 82:0.00189145 83:0.000485945 84:964.421 85:33.207 86:64.7237 87:1.9491 88:1 89:1.76351e+07 90:2.50429e+06
91:6.24028e+06 92:1.09962e+07 93:3.00335e+09 94:3.5386e+06 95:5.21808e+06 96:1.85759e+07 97:0.00587181 98:0.707707 99:1.1959
100:0.591959 101:3.02005e+09 102:2.17951e+07 103:2.6213e+07 104:2.59369e+07 105:2.15655e+07 106:2.98824e+09
1:0 2:425 3:213 4:1.99531 5:0 6:inf 7:1 8:0 9:0.0204143 10:0.0121072 11:0.00813035 12:14.5448 13:0.574197 14:0.562077
15:0.0938837 16:0.106849 17:2.52864 18:2.29707 19:5.7086 20:5.11698 21:0.0739991 22:0.95797 23:0.951505 24:1 25:1 26:1 27:1
28:1 29:1 30:1 31:1 32:1 33:2 34:1 35:2 36:1 37:1 38:2 39:1 40:1 41:2 42:2 43:5 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:3
52:3.66667 53:5 54:1 55:1 56:5 57:1 58:1 59:3 60:68596 61:67868.2 62:27737.3 63:29590.7 64:11.4527 65:1.08368 66:0.077625
67:0.0372273 68:0.479579 69:213 70:256 71:256 72:256 73:255 74:83329 75:78194 76:72107 77:77795 78:0 79:1 80:0 81:0.0200608
82:0.0118089 83:0.00857222 84:1814.96 85:43.1429 86:37.0588 87:0.858977 88:1 89:3.50206e+07 90:3.97185e+06 91:7.57905e+06
92:1.92885e+07 93:2.92089e+09 94:5.71381e+06 95:5.4605e+06 96:3.81577e+07 97:0.0119897 98:0.695132 99:1.38798 100:0.505495
101:2.99697e+09 102:2.84841e+07 103:1.06295e+07 104:1.04169e+07 105:2.79142e+07 106:2.93701e+09
1:0 2:345 3:328 4:1.05183 5:12654 6:8.94263 7:1 8:0 9:0.197119 10:0.144919 11:0.130974 12:16.5426 13:0.213558 14:0.213561
15:0.213558 16:0.213541 17:2.58033 18:2.58009 19:2.58045 20:2.57963 21:0.00235566 22:8.63563e-05 23:8.24988e-05 24:5 25:1
26:10 27:4 28:1 29:10 30:2 31:1 32:10 33:5 34:1 35:10 36:4 37:1 38:10 39:2 40:1 41:10 42:4 43:1.25 44:9 45:3 46:1.33333 47:8
48:2 49:1 50:8 51:5 52:1 53:10 54:4 55:1 56:10 57:2 58:1 59:10 60:52293.8 61:52294.2 62:52293.9 63:52291.8 64:10.1244
65:0.115826 66:0.0834305 67:0.0747702 68:0.896197 69:328 70:171 71:154 72:169 73:129 74:16728 75:14297 76:14292 77:15012
78:1 79:0 80:0 81:0.167754 82:0.150486 83:0.14035 84:1517.35 85:38.3333 86:65.9516 87:1.72048 88:1 89:2.79228e+07
90:3.07939e+06 91:6.79947e+06 92:1.53908e+07 93:1.22e+08 94:5.30236e+06 95:5.54061e+06 96:2.88332e+07 97:0.228875
98:0.580758 99:1.22721 100:0.533785 101:1.49517e+08 102:3.08441e+07 103:3.45075e+07 104:2.88255e+07 105:2.57652e+07
106:1.24897e+08
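Each record above is a run of `index:value` pairs, 106 per image, wrapped across several lines. A minimal sketch for reading one record into a dictionary follows; it assumes the wrapped lines of a record have been joined into one string first, and that values are plain decimal, scientific notation, or `inf`.

```python
import math

def parse_feature_line(line):
    """Parse '1:0 2:266 ...' into {index: value}; 'inf' becomes math.inf."""
    features = {}
    for token in line.split():
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return features

record = parse_feature_line("1:0 2:425 3:213 4:1.99531 5:0 6:inf")
```

Non-finite values such as the `6:inf` entry in the fourth record would usually need clipping or replacing before the vector is handed to a classifier.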
IMAGE FEATURE ANALYSIS
[Figure: view of a two-dimensional subspace of the image features; legend: Ham, Spam]
MALWARE FEATURE ANALYSIS
Conclusions
CONCLUSIONS
• Heuristics are limited
• Mathematical descriptions
• Dimensionality
• Intuition
July 12, 2011 | TrustedSource Data Mining Technologies
RESEARCH PUBLICATIONS
http://www.trustedsource.org/en/resources/publications