
Keyboard Acoustics Emanations Revisited


Keyboard Acoustics Emanations Revisited
Li Zhuang, Feng Zhou, J. D. Tygar, {zl,zf,tygar}@cs.berkeley.edu, University of California, Berkeley
http://redtea.cs.berkeley.edu/~zl/keyboard

Motivation
•Emanations of electronic devices leak information
•How much information is leaked by emanations?
•Apply statistical learning methods to security
•What is learned from the sound of typing on a keyboard?

Key Observation
•Build acoustic model for keyboard & typist
•Non-random typed text (English)
  •Limited number of words
  •Limited letter sequences (spelling)
  •Limited word sequences (grammar)
•Build language model
  •Statistical learning theory
  •Natural language processing

Acoustic Information: Previous and Ours
•Frequency information in the sound of each typed key
•Why do keystrokes make different sounds?
  •Different locations on the supporting plate
  •Each key is slightly different

(Overview components: Sample Collector, Feedback-based Training, Unsupervised Learning)

Some Experiment Results
4 data sets (12~27 mins of recordings)

•Feedback for more rounds of training
•Output: keystroke classifier
  •Language independent
  •Can be used to recognize random sequences of keys, e.g. passwords
•Representation of keystroke classifier
  •Neural networks, linear classification, Gaussian mixtures
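The self-improving feedback described above can be sketched as a round-based self-training loop. This toy version (a nearest-centroid classifier over synthetic one-dimensional "keystroke features") is an illustrative assumption, not the poster's implementation, and it omits the language-model correction that would filter labels between rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feedback loop: each round relabels the recordings with the current
# classifier, then "retrains" by recomputing per-key centroids from its
# own output. Keys and feature values are synthetic stand-ins.
KEYS = "abc"
true_centers = {"a": 0.0, "b": 1.0, "c": 2.0}
# recordings: (feature value, key that actually produced it)
samples = [(rng.normal(true_centers[k], 0.15), k) for k in KEYS * 40]

def classify(x, centers):
    """Assign x to the key with the nearest centroid."""
    return min(centers, key=lambda k: abs(x - centers[k]))

# start from deliberately poor centroid estimates
centers = {"a": 0.4, "b": 0.9, "c": 1.6}
for _ in range(5):                        # feedback rounds
    buckets = {k: [] for k in KEYS}
    for x, _ in samples:
        buckets[classify(x, centers)].append(x)
    # retrain: move each key's centroid to the mean of its assigned samples
    centers = {k: float(np.mean(v)) if v else centers[k]
               for k, v in buckets.items()}

accuracy = float(np.mean([classify(x, centers) == k for x, k in samples]))
print(round(accuracy, 2))
```

Each round sharpens the classifier using only its own (unlabeled) input, which is the essence of the self-improving feedback the poster contrasts with supervised, text-labeled training.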

(Figure: Alice typing a password)

                          Asonov and Agrawal (SSP'04)                Ours
Requirement               Text-labeling                              Direct recovery
Analogy in Crypto         Known-plaintext attack                     Known-ciphertext attack
Feature Extraction        FFT                                        Cepstrum
Initial training          Supervised learning with Neural Networks   Clustering (K-means, Gaussian), EM algorithm
Language Model            /                                          HMMs at different levels
Feedback-based Training   /                                          Self-improving feedback

(Figure: system overview.
Initial training: wave signal → Feature Extraction → Unsupervised Learning → Language Model Correction → Sample Collector → Classifier Builder → keystroke classifier, recovered keystrokes.
Subsequent recognition: wave signal → Feature Extraction → Keystroke Classifier (use trained classifiers for each key to recognize sound samples) → Language Model Correction.)

Feature Extraction
•How to represent a keystroke?
•Vector of features: FFT, Cepstrum
•Cepstrum features are better
•Also used in speech recognition
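A real-cepstrum feature vector can be computed roughly as follows; the synthetic click, sample rate, and 32-coefficient truncation are illustrative assumptions, not the poster's exact parameters.

```python
import numpy as np

def cepstrum(window):
    """Real cepstrum of one keystroke window:
    inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(window)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # avoid log(0)
    return np.fft.ifft(log_mag).real

# toy keystroke: a decaying 1 kHz click, 10 ms at 44.1 kHz
fs = 44100
t = np.arange(int(0.01 * fs)) / fs
click = np.sin(2 * np.pi * 1000 * t) * np.exp(-t * 400)

feats = cepstrum(click)[:32]   # keep low-quefrency coefficients
print(feats.shape)             # (32,)
```

The low-quefrency coefficients summarize the spectral envelope of the click, which is what distinguishes one key's sound from another's.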

Language Model Correction
•Group keystrokes into N clusters; assign each keystroke a label, 1, …, N
•Find the best mapping from cluster labels to characters
•Some character combinations are more common than others ("th" vs. "tj")
•Hidden Markov Models (HMMs)
•Example: cluster labels 5, 11, 2 → "t", "h", "e"
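The HMM idea above (cluster labels as observations, characters as hidden states, bigram statistics as transitions) can be sketched with a tiny Viterbi decoder. All probabilities below are invented for illustration; they are not the poster's trained parameters.

```python
import numpy as np

chars = ["t", "h", "e", "j"]
# emission[c][label]: P(cluster label | character) -- hypothetical values
emission = {
    "t": {5: 0.8, 11: 0.1, 2: 0.1},
    "h": {5: 0.1, 11: 0.8, 2: 0.1},
    "e": {5: 0.1, 11: 0.1, 2: 0.8},
    "j": {5: 0.1, 11: 0.7, 2: 0.2},   # "j" confusable with "h"
}
# bigram[a][b]: P(next char b | prev char a); "th" common, "tj" rare
bigram = {a: {b: 0.05 for b in chars} for a in chars}
bigram["t"]["h"] = 0.6
bigram["h"]["e"] = 0.6

def viterbi(labels):
    """Most likely character sequence for a cluster-label sequence."""
    # delta[c]: best log-probability of a path ending in character c
    delta = {c: np.log(emission[c].get(labels[0], 1e-6)) for c in chars}
    back = []
    for lab in labels[1:]:
        new, ptr = {}, {}
        for c in chars:
            scores = {p: delta[p] + np.log(bigram[p][c]) for p in chars}
            best = max(scores, key=scores.get)
            new[c] = scores[best] + np.log(emission[c].get(lab, 1e-6))
            ptr[c] = best                 # remember best predecessor
        delta, back = new, back + [ptr]
    # trace back the best path
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(viterbi([5, 11, 2]))   # "the": the bigram prior favors t-h over t-j
```

Even though "j" emits label 11 almost as strongly as "h" does, the language prior on "th" pulls the decoder to the correct character, which is exactly the correction effect described above.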

(Figure: recovered keystrokes, before and after spelling and grammar correction)

           Set 1 (%)     Set 2 (%)     Set 3 (%)     Set 4 (%)
           Word   Char   Word   Char   Word   Char   Word   Char
Initial    35     76     39     80     32     73     23     68
Final      90     96     89     96     83     95     80     92

3 different models of keyboards (12 mins of recording)
           Keyboard 1 (%)   Keyboard 2 (%)   Keyboard 3 (%)
           Word   Char      Word   Char      Word   Char
Initial    31     72        20     62        23     64
Final      82     93        82     94        75     90

3 different supervised learning methods in feedback

(Chart: word and character recovery rates for Neural Networks (NN), Linear Classification (LC), and Gaussian Mixtures (GM))

4/26/2006