
Keyboard Acoustics Emanations Revisited


Keyboard Acoustics Emanations Revisited
Li Zhuang, Feng Zhou, J. D. Tygar, {zl,zf,tygar}@cs.berkeley.edu, University of California, Berkeley
http://redtea.cs.berkeley.edu/~zl/keyboard

Motivation
•Emanations of electronic devices leak information
•How much information is leaked by emanations?
•Apply statistical learning methods to security
•What is learned from the sound of typing on a keyboard?

Key Observation
•Build acoustic model for keyboard & typist
•Non-random typed text (English)
  •Limited number of words
  •Limited letter sequences (spelling)
  •Limited word sequences (grammar)
•Build language model
  •Statistical learning theory
  •Natural language processing

Acoustic Information: Previous and Ours
•Frequency information in the sound of each typed key
•Why do keystrokes make different sounds?
  •Different locations on the supporting plate
  •Each key is slightly different

(Overview components: Sample Collector, Feedback-based Training, Unsupervised Learning)

Some Experiment Results
4 data sets (12~27 mins of recordings)

•Feedback for more rounds of training
•Output: keystroke classifier
  •Language independent
  •Can be used to recognize random sequences of keys, e.g. passwords
•Representation of keystroke classifier
  •Neural networks, linear classification, Gaussian mixtures
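The self-improving feedback described above can be sketched as a round-based self-training loop. This toy version (a nearest-centroid classifier over synthetic one-dimensional "keystroke features") is an illustrative assumption, not the poster's implementation, and it omits the language-model correction that would filter labels between rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feedback loop: each round relabels the recordings with the current
# classifier, then "retrains" by recomputing per-key centroids from its
# own output. Keys and feature values are synthetic stand-ins.
KEYS = "abc"
true_centers = {"a": 0.0, "b": 1.0, "c": 2.0}
# recordings: (feature value, key that actually produced it)
samples = [(rng.normal(true_centers[k], 0.15), k) for k in KEYS * 40]

def classify(x, centers):
    """Assign x to the key with the nearest centroid."""
    return min(centers, key=lambda k: abs(x - centers[k]))

# start from deliberately poor centroid estimates
centers = {"a": 0.4, "b": 0.9, "c": 1.6}
for _ in range(5):                        # feedback rounds
    buckets = {k: [] for k in KEYS}
    for x, _ in samples:
        buckets[classify(x, centers)].append(x)
    # retrain: move each key's centroid to the mean of its assigned samples
    centers = {k: float(np.mean(v)) if v else centers[k]
               for k, v in buckets.items()}

accuracy = float(np.mean([classify(x, centers) == k for x, k in samples]))
print(round(accuracy, 2))
```

Each round sharpens the classifier using only its own (unlabeled) input, which is the essence of the self-improving feedback the poster contrasts with supervised, text-labeled training.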

(Figure: Alice typing a password)

                          Asonov and Agrawal (SSP'04)                Ours
Requirement               Text-labeling                              Direct recovery
Analogy in Crypto         Known-plaintext attack                     Known-ciphertext attack
Feature Extraction        FFT                                        Cepstrum
Initial training          Supervised learning with Neural Networks   Clustering (K-means, Gaussian), EM algorithm
Language Model            /                                          HMMs at different levels
Feedback-based Training   /                                          Self-improving feedback

(Figure: system overview.
Initial training: wave signal → Feature Extraction → Unsupervised Learning → Language Model Correction → Sample Collector → Classifier Builder → keystroke classifier, recovered keystrokes.
Subsequent recognition: wave signal → Feature Extraction → Keystroke Classifier (use trained classifiers for each key to recognize sound samples) → Language Model Correction.)

Feature Extraction
•How to represent a keystroke?
•Vector of features: FFT, Cepstrum
•Cepstrum features are better
•Also used in speech recognition
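A real-cepstrum feature vector can be computed roughly as follows; the synthetic click, sample rate, and 32-coefficient truncation are illustrative assumptions, not the poster's exact parameters.

```python
import numpy as np

def cepstrum(window):
    """Real cepstrum of one keystroke window:
    inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(window)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # avoid log(0)
    return np.fft.ifft(log_mag).real

# toy keystroke: a decaying 1 kHz click, 10 ms at 44.1 kHz
fs = 44100
t = np.arange(int(0.01 * fs)) / fs
click = np.sin(2 * np.pi * 1000 * t) * np.exp(-t * 400)

feats = cepstrum(click)[:32]   # keep low-quefrency coefficients
print(feats.shape)             # (32,)
```

The low-quefrency coefficients summarize the spectral envelope of the click, which is what distinguishes one key's sound from another's.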

Language Model Correction
•Group keystrokes into N clusters; assign each keystroke a label, 1, …, N
•Find the best mapping from cluster labels to characters
•Some character combinations are more common than others ("th" vs. "tj")
•Hidden Markov Models (HMMs)
•Example: cluster labels 5, 11, 2 → "t", "h", "e"
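The HMM idea above (cluster labels as observations, characters as hidden states, bigram statistics as transitions) can be sketched with a tiny Viterbi decoder. All probabilities below are invented for illustration; they are not the poster's trained parameters.

```python
import numpy as np

chars = ["t", "h", "e", "j"]
# emission[c][label]: P(cluster label | character) -- hypothetical values
emission = {
    "t": {5: 0.8, 11: 0.1, 2: 0.1},
    "h": {5: 0.1, 11: 0.8, 2: 0.1},
    "e": {5: 0.1, 11: 0.1, 2: 0.8},
    "j": {5: 0.1, 11: 0.7, 2: 0.2},   # "j" confusable with "h"
}
# bigram[a][b]: P(next char b | prev char a); "th" common, "tj" rare
bigram = {a: {b: 0.05 for b in chars} for a in chars}
bigram["t"]["h"] = 0.6
bigram["h"]["e"] = 0.6

def viterbi(labels):
    """Most likely character sequence for a cluster-label sequence."""
    # delta[c]: best log-probability of a path ending in character c
    delta = {c: np.log(emission[c].get(labels[0], 1e-6)) for c in chars}
    back = []
    for lab in labels[1:]:
        new, ptr = {}, {}
        for c in chars:
            scores = {p: delta[p] + np.log(bigram[p][c]) for p in chars}
            best = max(scores, key=scores.get)
            new[c] = scores[best] + np.log(emission[c].get(lab, 1e-6))
            ptr[c] = best                 # remember best predecessor
        delta, back = new, back + [ptr]
    # trace back the best path
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(viterbi([5, 11, 2]))   # "the": the bigram prior favors t-h over t-j
```

Even though "j" emits label 11 almost as strongly as "h" does, the language prior on "th" pulls the decoder to the correct character, which is exactly the correction effect described above.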

(Figure: recovered keystrokes, before and after spelling and grammar correction)

           Set 1 (%)     Set 2 (%)     Set 3 (%)     Set 4 (%)
           Word   Char   Word   Char   Word   Char   Word   Char
Initial    35     76     39     80     32     73     23     68
Final      90     96     89     96     83     95     80     92

3 different models of keyboards (12 mins of recording)
           Keyboard 1 (%)   Keyboard 2 (%)   Keyboard 3 (%)
           Word   Char      Word   Char      Word   Char
Initial    31     72        20     62        23     64
Final      82     93        82     94        75     90

3 different supervised learning methods in feedback

(Chart: word and character recovery rates for Neural Networks (NN), Linear Classification (LC), and Gaussian Mixtures (GM))

4/26/2006