[16.06.14] Auto Correction for Mobile Typing

Auto Correction for Mobile Typing

2016320172 Chan Ho Jun

2016320177 Hyeon Min Park

2016160040 Sun Mook Choi

2016-06-14 1

Contents

Algorithm Research

Nota Keyboard

SwiftKey

Conclusion

Reference

Chapter 1: ALGORITHM RESEARCH

Ultimate Goal of Spelling Correction

Reducing spelling errors while the user types the same way as before

Reducing spelling errors that occur at borders between keys

Cause of Spelling Error

Differences in touch distribution from one individual to another

The mismatch between a key’s area of recognition and an individual’s touch distribution

Review

Machine Learning

Learn through training data

Supervised Learning

Knowing a user’s intention is the key to spelling correction

Supervised model

- Refined input & answer information

Review (Cont’d)

Problem

Difficult to tell which key the user intended when the touch lands on the border between keys

Other Algorithms

By tracking backspace

- Inferring the answer information

- Learning through supervised learning

Low accuracy

Semi-supervised Learning

Supervised learning

A small amount of labeled data (the answer information)

Unsupervised learning

A large amount of unlabeled data (the distribution of pressed keys)

A model that can learn without answer information when a user presses the border between keys

Clustering Algorithm

Grouping similar objects into the same group

Distribution-based clustering

Gaussian mixture models

- Using the Expectation-Maximization algorithm

Clustering Algorithm (Cont’d)

Data near the key center

Assumed to be intended for that key

Used directly to train the model

Data on key borders

Fed into the clustering algorithm

- Widens a key's area of recognition
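
The idea on the two clustering slides can be sketched in one dimension. This is a minimal illustration, not Nota Keyboard's implementation: it assumes each key is modeled as a single Gaussian over the touch x-coordinate, that touches near a key center act as labeled data, and that border touches are unlabeled and soft-assigned by Expectation-Maximization. The function names, the two keys, and all coordinates are invented for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_keys(labeled, unlabeled, iterations=50, min_var=1e-3):
    """Fit one Gaussian per key with EM, using labeled and unlabeled touches.

    labeled:   dict mapping key -> x-coordinates of touches near that key's center
    unlabeled: x-coordinates of touches near a border (intended key unknown)
    """
    keys = sorted(labeled)
    total = sum(len(v) for v in labeled.values()) + len(unlabeled)
    # Initialize each component from the labeled touches only.
    means = {k: sum(labeled[k]) / len(labeled[k]) for k in keys}
    variances = {
        k: max(sum((x - means[k]) ** 2 for x in labeled[k]) / len(labeled[k]), min_var)
        for k in keys
    }
    weights = {k: 1 / len(keys) for k in keys}

    for _ in range(iterations):
        # E-step: soft-assign every unlabeled touch to the keys.
        resp = []
        for x in unlabeled:
            scores = {k: weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in keys}
            norm = sum(scores.values())
            resp.append({k: scores[k] / norm for k in keys})
        # M-step: labeled touches count fully for their own key.
        for k in keys:
            n_k = len(labeled[k]) + sum(r[k] for r in resp)
            mean = (sum(labeled[k]) + sum(r[k] * x for r, x in zip(resp, unlabeled))) / n_k
            sq = sum((x - mean) ** 2 for x in labeled[k])
            sq += sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, unlabeled))
            means[k], variances[k], weights[k] = mean, max(sq / n_k, min_var), n_k / total
    return means, variances, weights

def predict_key(x, means, variances, weights):
    """Return the key whose weighted Gaussian gives x the highest density."""
    return max(means, key=lambda k: weights[k] * gaussian_pdf(x, means[k], variances[k]))

# Invented data in key-width units: 'q' is centered at 0.0 and 'w' at 1.0.
labeled = {"q": [-0.1, 0.0, 0.05, 0.1], "w": [0.9, 1.0, 1.05, 1.1]}
unlabeled = [0.5, 0.55, 0.58, 0.6, 0.62, 0.65]  # touches near the q/w border
means, variances, weights = fit_keys(labeled, unlabeled)
print(predict_key(0.62, means, variances, weights))
```

In this toy data the border touches pull the 'w' component toward the border and widen it, which is one way to read "widens a key's area of recognition". A real keyboard would presumably work in two dimensions and over all keys.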

Chapter 2: NOTA KEYBOARD

Statistics

Error rate: 5.52% → 4.12% (25.4% decrease)

Input speed: 292.0 presses/min → 306.1 presses/min (4.8% increase)

Backspace input: 9.19% → 7.02% (23.6% decrease)
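
The percentage changes on this slide follow from the paired values as relative changes. A small check, assuming the first value of each pair is the baseline (the helper name `relative_change` is only for illustration):

```python
def relative_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

changes = {
    "error rate": relative_change(5.52, 4.12),
    "input speed": relative_change(292.0, 306.1),
    "backspace input": relative_change(9.19, 7.02),
}
for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```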

Usage Map

5/8 ~ 6/10

Typing Video

Correction Moment

Problems or Limitations

Cannot suggest corrections based on context

When the data set is small, mistakenly entered false data leads to a high error rate

Chapter 3: SWIFTKEY

SwiftKey

Natural Language Processing (NLP) for predictions and spelling corrections

Retroactive correction

NLP – Types of Errors

Non-word error (NWE)

bannana → banana

Real-word error (RWE)

Typographical

- two → tow

Cognitive

- two → too

Correction

NWE: Detect error → Candidate generation → Candidate selection

RWE: Candidate generation → Candidate selection

Candidate Generation

Words with similar spelling

Words with similar pronunciation (for RWE)

The word itself (for RWE)

Candidate Generation: Words with similar spelling

Smallest edit distance between words, where the letter edits are

Deletion

Insertion

Substitution

Reversal (Transposition)

80% to 95% of errors are within edit distance 1
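
Candidate generation within edit distance 1 can be sketched as below. This is a minimal illustration, assuming a lowercase ASCII alphabet and a small hypothetical vocabulary; `edits1` is an illustrative name, not a function from any keyboard's code base. Note the direction: the code edits the typo to recover candidates, so deleting a letter from the typo undoes an insertion error, and so on.

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, insertion, substitution, or reversal away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    insertions = {left + ch + right for left, right in splits for ch in alphabet}
    substitutions = {left + ch + right[1:] for left, right in splits if right for ch in alphabet}
    reversals = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    return (deletions | insertions | substitutions | reversals) - {word}

# Hypothetical vocabulary; a real system would use a full dictionary.
vocabulary = {"actress", "cress", "caress", "access", "across", "acres", "banana"}
candidates = edits1("acress") & vocabulary
print(sorted(candidates))
```

Keeping only the generated strings that are in the vocabulary leaves the real words that could have been intended.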

Candidate Generation: Example

Typo: acress

Candidate   t_i   c_i   Type
actress     -     t     Deletion
cress       a     -     Insertion
caress      ac    ca    Reversal
access      r     c     Substitution
across      e     o     Substitution
acres       s     -     Insertion
acres       s     -     Insertion

Jurafsky 2012

Candidate Selection

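Select the candidate that maximizes the following:

\[
P(\text{candidate} \mid \text{typo})
= \frac{P(\text{typo} \mid \text{candidate})\, P(\text{candidate})}{P(\text{typo})}
\propto \underbrace{P(\text{typo} \mid \text{candidate})}_{\text{Error Model}} \; \underbrace{P(\text{candidate})}_{\text{Language Model}}
\]

The equality is Bayes’ Theorem. P(typo) is the same for every candidate, so it can be dropped when comparing candidates.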

Candidate Selection: Language Model

Unigram Model

$P(\text{candidate})$

The ratio of the frequency of the candidate to the total count of words in the training set

n-gram Model

$P(\text{candidate} \mid \text{word}_1, \ldots, \text{word}_{n-1})$

The ratio of the frequency of the candidate together with its n-1 surrounding words to the frequency of those n-1 words in the training set
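
Both language models reduce to counting. The sketch below estimates unigram and bigram probabilities from a tiny invented token list. It assumes plain relative frequencies with no smoothing and conditions the bigram on the preceding word only; the function names and the corpus are illustrative.

```python
from collections import Counter

def train(tokens):
    """Count single words and adjacent word pairs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def unigram_prob(word, unigrams):
    """Frequency of word divided by the total number of tokens."""
    return unigrams[word] / sum(unigrams.values())

def bigram_prob(word, previous, unigrams, bigrams):
    """Frequency of 'previous word' divided by the frequency of 'previous'."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]

# Invented toy corpus; a real model would be trained on millions of words.
tokens = "the versatile actress walked across the stage and the actress bowed".split()
unigrams, bigrams = train(tokens)
p_unigram = unigram_prob("actress", unigrams)
p_bigram = bigram_prob("actress", "the", unigrams, bigrams)
print(p_unigram, p_bigram)
```

Without smoothing, any unseen word or pair gets probability zero, which a production model would need to handle.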

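Candidate Selection: Error Model

Noisy Channel Model

Kernighan, Church, Gale 1990

\[
P(\text{typo} \mid \text{candidate}) \approx
\begin{cases}
\dfrac{\mathrm{del}[c_{i-1}, c_i]}{\mathrm{count}[c_{i-1} c_i]}, & \text{if deletion} \\[2ex]
\dfrac{\mathrm{add}[c_{i-1}, t_i]}{\mathrm{count}[c_{i-1}]}, & \text{if insertion} \\[2ex]
\dfrac{\mathrm{sub}[t_i, c_i]}{\mathrm{count}[c_i]}, & \text{if substitution} \\[2ex]
\dfrac{\mathrm{rev}[c_i, c_{i+1}]}{\mathrm{count}[c_i c_{i+1}]}, & \text{if reversal}
\end{cases}
\]

del[x,y] : count of xy typed as x
add[x,y] : count of x typed as xy
sub[x,y] : count of y typed as x
rev[x,y] : count of xy typed as yx

c_i : the edit letter in the correction
t_i : the edit letter in the typo

count[x] : count of x in the training set
count[xy] : count of xy in the training set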
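
The four cases of the error model map directly onto table lookups. The sketch below is a minimal illustration with invented confusion counts and character counts. The dictionaries, the function name `channel_probability`, and every number are assumptions, not values from Kernighan, Church, and Gale.

```python
def channel_probability(edit_type, x, y, confusion, counts):
    """P(typo | candidate) for a single edit.

    x and y are the two indices of the confusion table for that edit type:
      deletion:     x = c_{i-1}, y = c_i
      insertion:    x = c_{i-1}, y = t_i
      substitution: x = t_i,     y = c_i
      reversal:     x = c_i,     y = c_{i+1}
    """
    # String whose training-set count is the denominator for each edit type.
    context = {
        "deletion": x + y,
        "insertion": x,
        "substitution": y,
        "reversal": x + y,
    }[edit_type]
    denominator = counts.get(context, 0)
    if denominator == 0:
        return 0.0
    return confusion[edit_type].get((x, y), 0) / denominator

# Invented toy counts, for illustration only.
confusion = {
    "deletion": {("c", "t"): 3},      # "ct" typed as "c"
    "insertion": {},
    "substitution": {("e", "o"): 2},  # "o" typed as "e"
    "reversal": {("c", "a"): 1},      # "ca" typed as "ac"
}
counts = {"ct": 1000, "o": 5000, "ca": 2000}

p_actress = channel_probability("deletion", "c", "t", confusion, counts)
p_across = channel_probability("substitution", "e", "o", confusion, counts)
p_caress = channel_probability("reversal", "c", "a", confusion, counts)
print(p_actress, p_across, p_caress)
```

The three calls correspond to "actress", "across", and "caress" each being typed as "acress".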

Kernighan, Church, Gale 1990

Candidate Selection: Example (Language Model: Unigram, Error Model: Noisy Channel Model)

Candidate   Frequency   P(Candidate)   P(Typo|Candidate)   P(Typo|Candidate) P(Candidate)
actress     9321        .0000230573    .000117000          2.7000 × 10^-9
cress       220         .0000005442    .000001440          .00078 × 10^-9
caress      686         .0000016969    .000001640          .00280 × 10^-9
access      37038       .0000916207    .000000209          .01900 × 10^-9
across      120844      .0002989314    .000009300          2.8000 × 10^-9
acres       12874       .0000318463    .000032100          1.0000 × 10^-9
acres       12874       .0000318463    .000034200          1.0000 × 10^-9

Using the Corpus of Contemporary American English (400 million words) as the training set

Jurafsky 2012
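
The selection step for this example is a product and an argmax over the rows of the table. A short sketch using the slide's own probabilities (the variable names are illustrative):

```python
# (candidate, P(candidate), P(typo | candidate)) as listed on the slide.
rows = [
    ("actress", 0.0000230573, 0.000117),
    ("cress", 0.0000005442, 0.00000144),
    ("caress", 0.0000016969, 0.00000164),
    ("access", 0.0000916207, 0.000000209),
    ("across", 0.0002989314, 0.0000093),
    ("acres", 0.0000318463, 0.0000321),
    ("acres", 0.0000318463, 0.0000342),
]

# Score each candidate by error model times language model.
scored = [(word, p_typo * p_candidate) for word, p_candidate, p_typo in rows]
best_word, best_score = max(scored, key=lambda item: item[1])
print(best_word, best_score)
```

With the unigram model, "across" narrowly outscores "actress", because the unigram model ignores the surrounding words.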

Candidate Selection: Example (Language Model: Bigram)

“… a stellar and versatile acress whose combination of sass and glamour …”

Using the Corpus of Contemporary American English (400 million words) as the training set

P(actress|versatile) = .000021    P(whose|actress) = .0010

P(across|versatile) = .000021    P(whose|across) = .000006

P(versatile, actress, whose) = .000021 × .001000 = 210 × 10^-10

P(versatile, across, whose) = .000021 × .000006 ≈ 1 × 10^-10

Jurafsky 2012
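
The bigram comparison multiplies the probability of the candidate given the previous word by the probability of the next word given the candidate. A minimal sketch using the four probabilities from the slide (the dictionary layout and the function name are assumptions):

```python
# P(second | first) for the word pairs on the slide, keyed as (first, second).
bigram = {
    ("versatile", "actress"): 0.000021,
    ("actress", "whose"): 0.0010,
    ("versatile", "across"): 0.000021,
    ("across", "whose"): 0.000006,
}

def context_score(previous, candidate, following):
    """Score a candidate by the two bigrams it takes part in."""
    return bigram[(previous, candidate)] * bigram[(candidate, following)]

score_actress = context_score("versatile", "actress", "whose")
score_across = context_score("versatile", "across", "whose")
best = max(["actress", "across"], key=lambda c: context_score("versatile", c, "whose"))
print(best, score_actress, score_across)
```

Here the context makes "actress" the clear choice, by a factor of more than one hundred.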

Chapter 4: CONCLUSION

Nota Keyboard: Preventing typos

SwiftKey: Correcting typos

Appendix: REFERENCE

Reference

https://en.wikipedia.org/wiki/Semi-supervised_learning

https://en.wikipedia.org/wiki/Cluster_analysis#Algorithms

https://play.google.com/store/apps/details?id=com.notakeyboard&hl=ko

Kernighan, M. D., Church, K. W., & Gale, W. A. (1990). A Spelling Correction Program Based on a Noisy Channel Model.

Jurafsky, D. (2012). Spelling Correction and the Noisy Channel. Lecture. Retrieved June 10, 2016, from http://spark-public.s3.amazonaws.com/nlp/slides/spelling.pdf

Q&A

Thank You

You can view this presentation again at https://docs.com/kennyhm97/2659/16-06-14-auto-correction-for-mobile-typing
