Remote Hardware Fingerprinting:
A Statistical Approach
R. Fink ~ May, 2006
Problem
Identify specific machines via remote network fingerprinting:
  Passive
  Networked
  Physical properties of the machine
Use to:
  Identify endpoints in a communication
  Show that an endpoint participated in a transaction
  Show that an endpoint did not participate in a transaction
Challenges:
  What properties?
  Similar machines?
  Network delay factors?
TCP Timestamp Option
The 32-bit TS Value counts clock ticks, which are bound to the machine's crystal oscillator circuit
Present in most TCP packets by default (always on Linux; Windows can be tricked into sending it)
Best part: independent of network time server corrections!
[Figure: TCP header layout (Source Port, Destination Port, Sequence Number, Acknowledgement Number, Offset, Reserved, Flags, Window Size, Checksum, Urgent Pointer, Options + Padding). The Timestamps option is carried in Options: Kind=8, Len=10, TS Value (32 bits), TS Reply (32 bits).]
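The option layout described above (Kind=8, Len=10, then two 32-bit fields) can be pulled out of a raw options field with a few lines of Python. This is a minimal sketch, not the tooling used in the study: the function name `extract_timestamps` and the constant names are hypothetical, and it assumes the caller has already isolated the TCP options bytes from a captured packet.

```python
import struct

TCP_OPT_EOL = 0        # end of option list
TCP_OPT_NOP = 1        # single-byte padding
TCP_OPT_TIMESTAMP = 8  # timestamp option, total length 10 bytes


def extract_timestamps(options: bytes):
    """Return (ts_value, ts_reply) from raw TCP option bytes, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if kind == TCP_OPT_TIMESTAMP and length == 10:
            # Two unsigned 32-bit integers in network byte order.
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None
```

A typical options field pads the timestamp option with two NOP bytes, so `extract_timestamps(b"\x01\x01\x08\x0a" + struct.pack("!II", ts_value, ts_reply))` would return the two values as a tuple.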
Approach
Passively collect TS values, t_o, from the observed machine
  IP address identifies the machine during the collection phase
Record t_o along with the measurer's system time, t_m
Scatter-plot t_o versus t_m
Fit a regression line to the points
  Its slope is the clock skew of the observed machine: the amount of drift relative to the measurer per unit time
Group similar drifts to sort out individual machines
[Figure: scatter plot of t_o (vertical axis) versus t_m (horizontal axis), with fitted lines labeled Clock skew, Clock skew B, and Clock skew C.]
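The collection and fitting steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the code used in the study: the function name `estimate_skew_ppm` is hypothetical, the tick rate `hz` of the observed clock is assumed to be known, and the measurer's elapsed time is subtracted from the observed elapsed time before fitting so that the slope is the drift itself (in ppm) rather than one plus the drift.

```python
def estimate_skew_ppm(t_m, t_o, hz):
    """Least-squares clock skew of the observed machine, in ppm.

    t_m: measurer system times in seconds
    t_o: observed 32-bit TS values (integer ticks)
    hz:  assumed tick rate of the observed clock, ticks per second
    """
    n = len(t_m)
    if n < 2 or n != len(t_o):
        raise ValueError("need two or more paired samples")
    # Elapsed measurer time since the first sample.
    x = [t - t_m[0] for t in t_m]
    # Observed elapsed time minus measurer elapsed time; the mask
    # handles a single wraparound of the 32-bit counter.
    y = [((v - t_o[0]) & 0xFFFFFFFF) / hz - xi for v, xi in zip(t_o, x)]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("measurer times must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return 1e6 * sxy / sxx  # slope in parts per million
```

Machines could then be grouped by comparing their estimated slopes against a ppm tolerance.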
Previous Research
Kohno, Claffy, Broido
  63 campus machines
  38 days of data (12-hour spans)
  Convex hull method of fit
Posed, but did not address:
  Required sample size
  Effect of differing topology
Ignored statistical techniques
  Using a convex hull technique instead of a linear regression technique throws out the whole body of error analysis theory!
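To illustrate what regression error analysis provides, the sketch below fits a line by ordinary least squares and reports a confidence half-width for the slope using the textbook standard error. The function name `slope_with_interval` is hypothetical, and the multiplier `t_star=2.0` is an assumed approximation of the critical value for a roughly 95% interval.

```python
import math


def slope_with_interval(x, y, t_star=2.0):
    """Return (b1, half_width): OLS slope and t_star times its standard error."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need three or more paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard error of the slope estimate.
    s_b1 = s_e / math.sqrt(sxx)
    return b1, t_star * s_b1
```

Two machines whose slope intervals do not overlap can be told apart at the chosen confidence level; a convex hull fit offers no comparable interval.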
Current Work
Recreated the experiment
  4 identical Dell GX-150 machines, one observer
  Collected initial data on a fast switch
Extended the research
  Skew via linear regression algorithm
  Error analysis theory to estimate required number of samples
  Simulated WAN delay (via Linux Netfilter hacking), in progress
  Measured PCI bus with a frequency counter to verify the physical link to clock skew
Results
1. PCI bus clock speed is directly related to clock skew
2. Linear regression (in the LAN case) uniquely identifies machines to within a couple of parts per million (ppm)
3. Number of samples required is directly proportional to observed timestamp error and confidence interval, inversely proportional to collection interval and allowed ppm tolerance
   Validated on repeated population subsets
4. Showed clock skew varies with machine temperature
5. In progress: experiments on WAN data
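$$ s_{b_1} = \frac{\sqrt{12}\, s_e}{t\, n^{3/2}}, \qquad n = \left[ 2^{4} \cdot 3 \left( \frac{10^{6}\, t^{*}\, s_e}{t \cdot \mathrm{ppm}} \right)^{2} \right]^{1/3} $$

where
  b_1 = sample slope
  ppm = difference between extremes, in ppm
  s_e = sample y error
  t* = 2 (5% confidence limit)
  t = fixed collection interval, secs
  n = required number of samples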
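One way to turn the quantities defined above into a sample count is sketched below. It is an illustration under assumptions, not a confirmed transcription of the slide's formula: it assumes n evenly spaced samples t seconds apart (so the sum of squared deviations of the sample times is approximately t^2 n^3 / 12), and it assumes the ppm tolerance is the full width of the slope confidence interval, 2 t* s_b1. The function name `required_samples` is hypothetical.

```python
import math


def required_samples(s_e, t, ppm, t_star=2.0):
    """Estimate how many evenly spaced samples give the requested tolerance.

    s_e:    sample y error, in seconds
    t:      fixed collection interval between samples, in seconds
    ppm:    allowed full width of the slope confidence interval, in ppm
    t_star: confidence multiplier (2.0 assumed, roughly a 95% interval)
    """
    if s_e <= 0 or t <= 0 or ppm <= 0:
        raise ValueError("s_e, t and ppm must be positive")
    # Solve 2 * t_star * sqrt(12) * s_e / (t * n**1.5) = ppm * 1e-6 for n.
    n_pow = 2.0 * math.sqrt(12.0) * t_star * s_e * 1e6 / (t * ppm)
    # At least three samples are needed to estimate the residual error.
    return max(3, math.ceil(n_pow ** (2.0 / 3.0)))
```

Under these assumptions the count grows with the timestamp error and the confidence multiplier, and shrinks as the collection interval or the allowed tolerance increases.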
Summary
Highlights
  Clock skew is a repeatable way to fingerprint a specific machine
  Linear regression, a simple machine learning concept, is readily applied
  Statistical error analysis tells us how much to collect
Lowlights
  TCP timestamp options are, well, OPTIONAL: can just turn them off
Future Research
  Wireless mobile devices: effect of battery, topology, mobility, clock stepping
  Other protocol properties, not just timestamps