Auto Correction for Mobile Typing
2016320172 Chan Ho Jun
2016320177 Hyeon Min Park
2016160040 Sun Mook Choi
2016-06-14 1
Ultimate Goal of Spelling Correction
Reducing spelling errors while the user types the same way
as before
Reducing spelling errors that occur at borders between keys
Cause of Spelling Error
Differences among individuals' touch distributions
The mismatch between a key's area of recognition and an
individual's touch distribution
Review
Machine Learning
Learn through training data
Supervised Learning
Knowing a user’s intention is the key to spelling correction
Supervised model
- Refined input & answer information
Review (Cont’d)
Problem
Difficult to determine which key the user intended when the touch
lands on the border between keys
Other Algorithms
By tracking backspace
- Inferring the answer information
- Learning through supervised learning
Low accuracy
Semi-supervised Learning
Supervised learning
A small amount of labeled data (the answer information)
Unsupervised learning
A large amount of unlabeled data (the distribution of pressed keys)
A model that can learn without answer information when
a user presses the border between keys
Clustering Algorithm
Grouping similar objects into the same group
Distribution-based clustering
Gaussian mixture models
- Using the Expectation-Maximization algorithm
Clustering Algorithm (Cont’d)
Data near the key center
The user is assumed to have intended that key
Used directly to train the model
Data on key borders
Fed into the clustering algorithm
- Widens a key's area of recognition
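The idea above can be sketched in one dimension. This is a minimal illustration, not the presenters' implementation: it assumes each key's touches follow a single Gaussian along the x-axis, treats touches near a key center as labeled, and lets the Expectation-Maximization algorithm soft-assign border touches. The function names `fit_touch_model` and `predict_key`, the two keys, and all coordinates are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_touch_model(labeled, unlabeled, n_iter=30, min_var=1e-4):
    """Fit one Gaussian per key with EM.

    labeled:   dict mapping key -> x-coordinates of touches near that key's center
    unlabeled: x-coordinates of touches on key borders (intended key unknown)
    Returns a dict mapping key -> (mean, variance, mixing weight).
    """
    keys = sorted(labeled)
    n_labeled = sum(len(xs) for xs in labeled.values())
    n_total = n_labeled + len(unlabeled)

    # Initialise each component from the labeled (key-center) touches only.
    params = {}
    for k in keys:
        xs = labeled[k]
        mean = sum(xs) / len(xs)
        var = max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var)
        params[k] = (mean, var, len(xs) / n_labeled)

    for _ in range(n_iter):
        # E-step: soft-assign every border touch to the keys.
        resp = []
        for x in unlabeled:
            joint = {k: params[k][2] * gaussian_pdf(x, params[k][0], params[k][1]) for k in keys}
            norm = sum(joint.values())
            resp.append({k: joint[k] / norm for k in keys})

        # M-step: labeled touches count fully for their key,
        # border touches count in proportion to their responsibility.
        new_params = {}
        for k in keys:
            mass = len(labeled[k]) + sum(r[k] for r in resp)
            mean = (sum(labeled[k]) + sum(r[k] * x for r, x in zip(resp, unlabeled))) / mass
            sq = sum((x - mean) ** 2 for x in labeled[k])
            sq += sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, unlabeled))
            new_params[k] = (mean, max(sq / mass, min_var), mass / n_total)
        params = new_params
    return params

def predict_key(x, params):
    """Return the key whose weighted Gaussian gives the touch the highest density."""
    return max(params, key=lambda k: params[k][2] * gaussian_pdf(x, params[k][0], params[k][1]))

# Hypothetical data: key "q" is centered at x = 0.5, key "w" at x = 1.5,
# and the nominal border between them is x = 1.0.
labeled = {"q": [0.30, 0.45, 0.50, 0.60, 0.70], "w": [1.30, 1.45, 1.50, 1.60, 1.70]}
border = [0.90, 0.95, 1.00, 1.05]
model = fit_touch_model(labeled, border)
```

Because the border touches pull each key's mean toward the border and enlarge its variance, the effective area of recognition of a key widens in the direction where the user actually touches.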
Statistics
Error rate: 5.52% → 4.12% (25.4% decrease)
Input speed: 292.0 → 306.1 press/min (4.8% increase)
Backspace input: 9.19% → 7.02% (23.6% decrease)
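The percentage changes follow from the before and after values as a relative change. The short check below is only an arithmetic illustration; the helper name `relative_change` is hypothetical.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent (negative means a decrease)."""
    return (after - before) / before * 100.0

error_rate = relative_change(5.52, 4.12)      # error rate, in %
input_speed = relative_change(292.0, 306.1)   # presses per minute
backspace = relative_change(9.19, 7.02)       # backspace input, in %
```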
Problems or Limitations
Cannot suggest corrections based on context
When the data set is small, mistakenly entered data causes a
high error rate
SwiftKey
Natural Language Processing (NLP) for predictions and
spelling corrections
Retroactive correction
NLP – Types of Errors
Non-word error (NWE)
bannana → banana
Real-word error (RWE)
Typographical
- two → tow
Cognitive
- two → too
Correction
NWE
RWE
Candidate generation
Candidate selection
Detect error
Candidate generation
Candidate selection
Candidate Generation
Words with similar spelling
Words with similar pronunciation (for RWE)
The word itself (for RWE)
Candidate Generation: Words with similar spelling
Candidates are the words at the smallest edit distance from the
typo, where an edit of a letter is one of:
Deletion
Insertion
Substitution
Reversal (Transposition)
80% to 95% of errors are within edit distance 1
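A minimal sketch of an edit distance that counts the four edits listed above. It assumes the "optimal string alignment" variant, in which a reversal only swaps two adjacent letters and each letter is edited at most once; the slides do not say which variant was used.

```python
def edit_distance(a, b):
    """Edit distance counting deletion, insertion, substitution and adjacent reversal."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Reversal of two adjacent letters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `edit_distance("two", "tow")` counts one reversal and `edit_distance("bannana", "banana")` counts one deletion, so both typos are within edit distance 1.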
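Candidate Generation: Example
Candidates for the typo "acress" (t_i: letter(s) in the typo, c_i: letter(s) in the correction)

| Candidate | t_i | c_i | Type |
| --- | --- | --- | --- |
| actress | - | t | Deletion |
| cress | a | - | Insertion |
| caress | ac | ca | Reversal |
| access | r | c | Substitution |
| across | e | o | Substitution |
| acres | s | - | Insertion |
| acres | s | - | Insertion |

Source: Jurafsky 2012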
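The candidates in this example can be reproduced by applying every single edit to the typo and keeping the results that are dictionary words. This is a sketch under assumptions: the tiny `vocabulary` set is hypothetical, and the function names `edits1` and `candidates` are illustrative. Note that the edit is applied to the typo here, so undoing an "Insertion" error means deleting a letter, and undoing a "Deletion" error means inserting one.

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    reversals = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + reversals + replaces + inserts)

def candidates(typo, vocabulary):
    """Dictionary words within edit distance 1 of the typo, in alphabetical order."""
    return sorted((edits1(typo) & vocabulary) - {typo})

# Hypothetical dictionary, for illustration only.
vocabulary = {"actress", "cress", "caress", "access", "across", "acres", "banana"}
found = candidates("acress", vocabulary)
```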
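Candidate Selection
Select the candidate for which the following is greatest (Bayes' Theorem):
\[
P(\text{candidate} \mid \text{typo})
= \frac{P(\text{typo} \mid \text{candidate})\, P(\text{candidate})}{P(\text{typo})}
\propto P(\text{typo} \mid \text{candidate})\, P(\text{candidate})
\]
P(typo) is the same for every candidate, so it can be dropped when comparing candidates.
Error Model: P(typo | candidate)
Language Model: P(candidate)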
Candidate Selection: Language Model
Unigram Model
P(candidate)
The frequency of candidate divided by the total count of words in
the training set
n-gram Model
P(candidate | word_1, ..., word_{n-1})
The frequency of candidate together with its n-1 surrounding words,
relative to the frequency of those words in the training set
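Both language models can be estimated by counting. The sketch below assumes plain maximum-likelihood estimates without smoothing, uses the bigram case (n = 2) with only the preceding word as context, and trains on a hypothetical one-sentence corpus; none of these choices are stated in the slides.

```python
from collections import Counter

def train_counts(tokens):
    """Count single words and adjacent word pairs in the training tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def unigram_prob(word, unigrams):
    """P(word): frequency of the word over the total count of words."""
    return unigrams[word] / sum(unigrams.values())

def bigram_prob(word, prev, unigrams, bigrams):
    """P(word | prev): frequency of "prev word" over the frequency of prev."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Hypothetical toy corpus, for illustration only.
tokens = "a versatile actress and a versatile writer walked across a stage".split()
unigrams, bigrams = train_counts(tokens)
```

With this toy corpus, `bigram_prob("actress", "versatile", unigrams, bigrams)` is 0.5, while `bigram_prob("across", "versatile", unigrams, bigrams)` is 0.0. A real system would smooth these estimates so that unseen word pairs do not get probability zero.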
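Candidate Selection: Error Model
Noisy Channel Model
Kernighan, Church, Gale 1990
\[
P(\text{typo} \mid \text{candidate}) \approx
\begin{cases}
\dfrac{\mathrm{del}[c_{i-1}, c_i]}{\mathrm{count}[c_{i-1} c_i]}, & \text{if deletion} \\[2ex]
\dfrac{\mathrm{add}[c_{i-1}, t_i]}{\mathrm{count}[c_{i-1}]}, & \text{if insertion} \\[2ex]
\dfrac{\mathrm{sub}[t_i, c_i]}{\mathrm{count}[c_i]}, & \text{if substitution} \\[2ex]
\dfrac{\mathrm{rev}[c_i, c_{i+1}]}{\mathrm{count}[c_i c_{i+1}]}, & \text{if reversal}
\end{cases}
\]
del[x,y] : count of xy typed as x
add[x,y] : count of x typed as xy
sub[x,y] : count of y typed as x
rev[x,y] : count of xy typed as yx
c_i : the edit letter in the correction
t_i : the edit letter in the typo
count[x] : count of x in the training set
count[xy] : count of xy in the training set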
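The four cases of the error model map directly onto table lookups. The sketch below assumes the confusion tables and letter counts are stored in plain dictionaries; every number in `confusion` and `counts` is made up for illustration and does not come from Kernighan, Church, and Gale's data, and the function name `channel_probability` is hypothetical.

```python
def channel_probability(edit, x, y, confusion, counts):
    """P(typo | candidate) for a single edit.

    edit: one of "del", "add", "sub", "rev"
    x, y: the two indices of that edit's confusion table
    """
    if edit == "del":      # correct "xy" typed as "x": x = c_{i-1}, y = c_i
        denominator = counts.get(x + y, 0)
    elif edit == "add":    # correct "x" typed as "xy": x = c_{i-1}, y = t_i
        denominator = counts.get(x, 0)
    elif edit == "sub":    # correct "y" typed as "x": x = t_i, y = c_i
        denominator = counts.get(y, 0)
    elif edit == "rev":    # correct "xy" typed as "yx": x = c_i, y = c_{i+1}
        denominator = counts.get(x + y, 0)
    else:
        raise ValueError("unknown edit type: " + edit)
    if denominator == 0:
        return 0.0
    return confusion[edit].get((x, y), 0) / denominator

# Hypothetical counts, for illustration only.
confusion = {
    "del": {("c", "t"): 3},  # "ct" typed as "c"   (actress -> acress)
    "add": {("s", "s"): 4},  # "s" typed as "ss"   (acres -> acress)
    "sub": {("e", "o"): 2},  # "o" typed as "e"    (across -> acress)
    "rev": {("c", "a"): 1},  # "ca" typed as "ac"  (caress -> acress)
}
counts = {"ct": 1000, "s": 8000, "o": 5000, "ca": 2000}
```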
Candidate Selection: Example (Language Model: Unigram, Error Model: Noisy Channel Model)

| Candidate | Frequency | Language model, P(Candidate) | Error model, P(Typo given Candidate) | Product (× 10^-9) |
| --- | --- | --- | --- | --- |
| actress | 9321 | .0000230573 | .000117000 | 2.7000 |
| cress | 220 | .0000005442 | .000001440 | .00078 |
| caress | 686 | .0000016969 | .000001640 | .00280 |
| access | 37038 | .0000916207 | .000000209 | .01900 |
| across | 120844 | .0002989314 | .000009300 | 2.8000 |
| acres | 12874 | .0000318463 | .000032100 | 1.0000 |
| acres | 12874 | .0000318463 | .000034200 | 1.0000 |

Using a training set from the Corpus of Contemporary American English (400 million words)
Source: Jurafsky 2012
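The selection step is then a product and a maximum. The sketch below copies the two probability columns from the example and ranks the candidates; the function name `rank_candidates` is illustrative.

```python
# (candidate, P(candidate), P(typo | candidate)) as given in the example.
rows = [
    ("actress", 0.0000230573, 0.000117),
    ("cress",   0.0000005442, 0.00000144),
    ("caress",  0.0000016969, 0.00000164),
    ("access",  0.0000916207, 0.000000209),
    ("across",  0.0002989314, 0.0000093),
    ("acres",   0.0000318463, 0.0000321),
    ("acres",   0.0000318463, 0.0000342),
]

def rank_candidates(rows):
    """Sort candidates by P(typo | candidate) * P(candidate), highest first."""
    scored = [(word, p_typo * p_word) for word, p_word, p_typo in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_candidates(rows)
best_word, best_score = ranking[0]
```

With the unigram language model, "across" narrowly beats "actress" (about 2.8 × 10^-9 against 2.7 × 10^-9), which is why the surrounding words are worth taking into account.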
Candidate Selection: Example (Language Model: Bigram)
"... a stellar and versatile acress whose combination of sass
and glamour ..."
Using a training set from the Corpus of Contemporary American English (400 million words)
P(actress|versatile) = .000021 P(whose|actress) = .0010
P(across|versatile) = .000021 P(whose|across) = .000006
P(versatile, actress, whose) = .000021 × .001000 = 210 × 10^-10
P(versatile, across, whose) = .000021 × .000006 ≈ 1 × 10^-10
With the surrounding words taken into account, "actress" scores far higher than "across".
Source: Jurafsky 2012
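The bigram comparison can be checked with a few lines of code. The probabilities are the ones given in the example; the dictionary layout and the function name `context_score` are assumptions made for this sketch.

```python
# Bigram probabilities from the example, keyed as (previous word, word).
bigram = {
    ("versatile", "actress"): 0.000021,
    ("actress", "whose"): 0.0010,
    ("versatile", "across"): 0.000021,
    ("across", "whose"): 0.000006,
}

def context_score(prev_word, candidate, next_word, bigram):
    """P(candidate | prev_word) * P(next_word | candidate); unseen pairs score 0."""
    left = bigram.get((prev_word, candidate), 0.0)
    right = bigram.get((candidate, next_word), 0.0)
    return left * right

scores = {c: context_score("versatile", c, "whose", bigram) for c in ("actress", "across")}
best = max(scores, key=scores.get)
```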
Reference
https://en.wikipedia.org/wiki/Semi-supervised_learning
https://en.wikipedia.org/wiki/Cluster_analysis#Algorithms
https://play.google.com/store/apps/details?id=com.notakeyboard&hl=ko
Kernighan, M. D., Church, K. W., & Gale, W. A. (1990). A Spelling Correction Program Based on a Noisy Channel Model.
Jurafsky, D. (2012). Spelling Correction and the Noisy Channel. Lecture. Retrieved June 10, 2016, from http://spark-public.s3.amazonaws.com/nlp/slides/spelling.pdf
Thank You
You can view this presentation again at https://docs.com/kennyhm97/2659/16-06-14-auto-correction-for-mobile-typing