Keystroke Biometrics Study
Software Engineering Project Team
+ DPS Student
2
Keystroke Biometric
As with other biometrics, keystroke biometrics are becoming important for security applications
Advantage - inexpensive and easy to implement; the only hardware needed is a keyboard
Disadvantage - a behavioral rather than physiological biometric, so easy to disguise
One of the least studied biometrics, and thus a good topic for dissertation studies
3
Focus of Study
Previous studies mostly concerned short character string input
  Password hardening
  Short name strings
We focus on large text input
  200 or more characters per sample
4
Focus of Study (cont)
Applications of interest
  Identification
    1-of-n classification problem
    e.g., sender of inappropriate e-mail in a business environment with a limited number of employees
  Verification
    Binary classification problem, yes/no
    e.g., student taking online exam
5
Software Components
Raw Keystroke Data Capture over the Internet (Java applet)
Feature Extraction (SAS software)
Classification (SAS software)
  Training
  Testing
6
Keystroke Data Capture (Java Applet)
Raw data recorded for each entry
  Key's character
  Key's code text equivalent
  Key's location on keyboard (1 = standard, 2 = left, 3 = right)
  Time key was pressed (msec)
  Time key was released (msec)
  Number of left, right, double mouse clicks
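As an illustration only, one raw-data entry could be pictured as a small record like the one below. The applet's actual file format is not shown in these slides, so the class name, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """One raw entry with the fields listed above (names are hypothetical)."""
    char: str               # key's character
    code_text: str          # key's code text equivalent
    location: int           # 1 = standard, 2 = left, 3 = right
    press_ms: int           # time key was pressed (msec)
    release_ms: int         # time key was released (msec)
    left_clicks: int = 0    # number of left mouse clicks
    right_clicks: int = 0   # number of right mouse clicks
    double_clicks: int = 0  # number of double mouse clicks

    @property
    def duration_ms(self) -> int:
        # Key press duration: how long the key was held down.
        return self.release_ms - self.press_ms


# Made-up example entry for the letter "H".
event = KeyEvent(char="H", code_text="H", location=1,
                 press_ms=1000, release_ms=1085)
print(event.duration_ms)
```

Keeping both press and release times in the record is what makes duration and transition measurements possible later on.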
7
Keystroke Data Capture (Java Applet)
8
Aligned Raw Data File (Hello World!)
9
SAS Statistical Software: Feature Extraction & Classification
Powerful tool with its own programming language and development environment
Data management
  Relational database built-in
  Many data manipulation functions
Statistical analysis
  Library of procedures to do a wide variety of statistical analyses
10
Feature Extraction
10 Mean and 10 Std of key press durations
  8 most frequent alphabet letters (e, a, r, i, o, t, n, s)
  Space & shift keys
10 Mean and 10 Std of key transitions
  8 most common digrams (in, th, ti, on, an, he, al, er)
  Space-to-any-letter & any-letter-to-space
15 Total number of keypresses for
  Space, backspace, delete, insert, home, end, enter, ctrl, 4 arrow keys combined, shift (left), shift (right), total entry time, left, right, & double mouse clicks
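The study does this step in SAS; the Python sketch below only illustrates the idea for the 8 letters and 8 digrams. Several things here are assumptions, not taken from the slides: the (key, press_ms, release_ms) tuple layout, the definition of a transition as the press time of the second key minus the release time of the first, the use of the population standard deviation, and the 0.0 fallback for keys that never occur.

```python
from statistics import mean, pstdev

LETTERS = ["e", "a", "r", "i", "o", "t", "n", "s"]
DIGRAMS = ["in", "th", "ti", "on", "an", "he", "al", "er"]


def duration_features(events):
    """Mean and std of key press duration for each of the 8 letters."""
    feats = {}
    for letter in LETTERS:
        durs = [rel - prs for key, prs, rel in events if key == letter]
        # Fallback of (0.0, 0.0) for unseen letters is our own choice.
        feats[letter] = (mean(durs), pstdev(durs)) if durs else (0.0, 0.0)
    return feats


def transition_features(events):
    """Mean and std of the transition time for each of the 8 digrams.

    Assumption: transition = press of 2nd key minus release of 1st key.
    """
    times = {d: [] for d in DIGRAMS}
    for (k1, _, rel1), (k2, prs2, _) in zip(events, events[1:]):
        if k1 + k2 in times:
            times[k1 + k2].append(prs2 - rel1)
    return {d: (mean(t), pstdev(t)) if t else (0.0, 0.0)
            for d, t in times.items()}


# Made-up timings (msec) for the key sequence t, h, e, e.
events = [("t", 0, 80), ("h", 120, 200), ("e", 260, 360), ("e", 400, 480)]
durations = duration_features(events)
transitions = transition_features(events)
print(durations["e"], transitions["th"], transitions["he"])
```

The space, shift, and keypress-count features would follow the same pattern with different key filters.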
11
Feature Measurement Sample
12
Feature Extraction Preprocessing
Outlier removal
  Remove samples > 2 std from mean
  Prevents skewing of feature measurements caused by pausing of the keystroker
Standardization
  x' = (x - xmin) / (xmax - xmin)
  Scales to range 0-1 to give roughly equal weight to each feature
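A minimal Python sketch of the two preprocessing steps follows. It assumes that outlier removal is applied to the individual timing measurements of one key, and that the population standard deviation is used; the slides do not specify either. The function names and sample values are made up for illustration.

```python
from statistics import mean, pstdev


def remove_outliers(values, k=2.0):
    """Drop measurements more than k standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def min_max_scale(values):
    """x' = (x - xmin) / (xmax - xmin), mapping values into the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid dividing by zero (our own choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up durations (msec); the 2000 stands in for a long pause.
durations = [100, 105, 95, 110, 90, 100, 102, 98, 2000]
cleaned = remove_outliers(durations)
scaled = min_max_scale([90, 100, 110])
print(cleaned)
print(scaled)
```

Removing the pause before averaging matters because a single 2000 msec value would otherwise dominate both the mean and the std of that key.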
13
Classification
Identification
  Nearest neighbor classifier using Euclidean distance
  Input sample compared to every training sample
Verification
  Dichotomizer (feature difference model)
  Train with neural network
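The two approaches can be sketched in Python as below. The nearest neighbor part follows the description above directly. The dichotomizer part is an assumption: it reads "feature difference model" as taking the element-wise absolute difference of two feature vectors and labeling the pair as same-subject or different-subject. The neural network that would be trained on those pairs is omitted, and all names and data are hypothetical.

```python
import math


def nearest_neighbor(sample, training):
    """Return the label of the training sample closest to `sample`.

    `training` is a list of (label, feature_vector) pairs; the input is
    compared to every training sample using Euclidean distance.
    """
    label, _ = min(training, key=lambda item: math.dist(sample, item[1]))
    return label


def difference_vector(a, b):
    """Element-wise absolute difference of two feature vectors (assumed)."""
    return [abs(x - y) for x, y in zip(a, b)]


def dichotomy_pairs(training):
    """Build (difference_vector, same_subject) examples from all pairs."""
    pairs = []
    for i, (label_a, vec_a) in enumerate(training):
        for label_b, vec_b in training[i + 1:]:
            pairs.append((difference_vector(vec_a, vec_b),
                          label_a == label_b))
    return pairs


# Made-up, already standardized feature vectors for two subjects.
training = [("alice", [0.1, 0.2]), ("alice", [0.2, 0.2]), ("bob", [0.8, 0.9])]
who = nearest_neighbor([0.75, 0.8], training)
pairs = dichotomy_pairs(training)
print(who, len(pairs))
```

Under this reading, verification reduces a many-subject problem to a single yes/no question about a pair of samples, which matches the binary framing of the verification application.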
14
Experimental Design: Identification Experiment
15 subjects who know the purpose of the experiment
  Training – 5 reps of text a (approx. 600 char)
  Testing
    5 reps of text a
    5 reps of text b (same length as text a)
    5 reps of text c (half length of text a)
28 subjects who don't know the purpose of the input
  Subset of above training/testing data
  Also, arbitrary text input of reasonable length
15
Experimental Design: Instructions for Subjects
All subjects will be told to make any necessary corrections to the input data (texts a, b, and c are Aesop fables)
Knowing subjects will be told to input the data using their normal keystroke dynamics
The experiments are designed so that subjects leave at least a day between entering samples
16
Experimental Design: Text a – about 600 characters
This is an Aesop fable about the bat and the weasels. A bat who fell upon the ground and was caught by a weasel pleaded to be spared his life. The weasel refused, saying that he was by nature the enemy of all birds. The bat assured him that he was not a bird, but a mouse, and thus was set free. Shortly afterwards the bat again fell to the ground and was caught by another weasel, whom he likewise entreated not to eat him. The weasel said that he had a special hostility to mice. The bat assured him that he was not a mouse, but a bat, and thus a second time escaped. The moral of the story: it is wise to turn circumstances to good account.
17
Experimental Design (cont)
Verification
  Basically the same as for identification
  The training and testing data consist of various text input samples collected over a period of approximately 10 weeks
18
Expected Outcomes: Recognition Accuracy
Accuracy on text a > that on text b
  text a is the training text
Accuracy on text b > that on text c
  text b is longer than text c
Accuracy on texts a, b, c > arbitrary text
  texts a, b, & c are similar, all Aesop fables
Accuracy on knowing subjects > that on unknowing ones
  Knowing subjects are more likely to use their normal keystroke dynamics for all input
19
Expected Outcomes: Analysis of Experimental Results
Feature analysis – which are better?
  Key press durations or transitions
  More or less frequent letters/digrams
  Other feature measurements
Determine the spread (std) of feature measurements within versus across subjects
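One possible way to compare within-subject and across-subject spread for a single feature is sketched below in Python. The definitions used (average of each subject's own std for "within", std of the per-subject means for "across") are an assumption, as are the function name and the sample values.

```python
from statistics import mean, pstdev


def spread(samples_by_subject):
    """Return (within, across) spread for one feature.

    within: average of each subject's own std over their samples.
    across: std of the per-subject means.
    """
    within = mean([pstdev(vals) for vals in samples_by_subject.values()])
    across = pstdev([mean(vals) for vals in samples_by_subject.values()])
    return within, across


# Made-up mean "e" durations (msec) from two samples per subject.
data = {"s1": [90, 110], "s2": [190, 210]}
within, across = spread(data)
print(within, across)
```

A feature whose across-subject spread is large relative to its within-subject spread separates subjects well, which is one way to rank durations against transitions.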
20
Preliminary Results
Reduced identification experiment
  Smaller text input
    "The quick brown fox jumps over the lazy dog."
  Fewer subjects
    Three project team members
  Fewer feature measurements
    Mean and std for "e" and "o" key press durations
Accuracy of 80%, which is promising
21
Questions/Comments?
  Focus or applications?
  Software implementation?
  Experimental design?
  Expected experimental outcomes?