Loop Investigation for Cursive Handwriting Processing and Recognition
By Tal Steinherz
Advanced Seminar (Spring 05)
Outline
Background on cursive handwriting
Introduction to loops
Pattern recognition and machine learning conflicts
Feature extraction solutions
Demonstrations and experimental results
Cursive Handwriting (J. C. Simon)
“Displacing a pen from left to right in an oscillating movement, with loops, descendants (legs), and ascendants (poles).”
Cursive vs. Character
Cursive – a continuous, concatenated set of strokes. Produced by a human being in a free style.
Character – a single standalone symbol. Produced by a machine, subject to numerous alternative fonts.
Online vs. Offline
Online – captured by pen-like devices. The input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)).
Offline – captured by scanning devices. The input format is a two-dimensional image of gray-scale values as a function of location, I(m*n). Strokes have significant width.
A Loop (T. Steinherz)
“A set of neighboring foreground pixels surrounding a hole, i.e., a connected blocked group of background pixels in the word’s image, where all foreground pixels are within stroke width distance from the hole.”
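The "hole" part of this definition can be illustrated with a small sketch. It assumes a binarized word image stored as a list of rows (1 = foreground, 0 = background) and treats a hole as a 4-connected background region that never touches the image border. The function name `find_holes` and the toy image are hypothetical, and the stroke-width condition on the surrounding foreground pixels is not checked here.

```python
from collections import deque

def find_holes(image):
    """Return the enclosed background regions of a binary image.

    image: list of rows, 1 = foreground (ink), 0 = background.
    Each hole is a set of (row, col) background pixels that are
    4-connected to each other and not connected to the image border.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(start):
        # Breadth-first flood fill over background pixels.
        region, touches_border = set(), False
        queue = deque([start])
        seen[start[0]][start[1]] = True
        while queue:
            r, c = queue.popleft()
            region.add((r, c))
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and image[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return region, touches_border

    holes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 and not seen[r][c]:
                region, touches_border = flood((r, c))
                if not touches_border:
                    holes.append(region)
    return holes

# Toy "o": a ring of ink around a single background pixel.
word = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
holes = find_holes(word)
```

Using 4-connectivity for the background is a common choice when strokes are treated as 8-connected, so that a diagonal gap in the ink does not leak the hole into the outside background.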
The importance of loops
Shared by many letters (especially a, d, e, g, o, p, q)
Byproduct of the continuous nature of cursive handwriting (as with b, f, h, j, k, l, s, t, y, z)
Elementary and prominent features
Carry additional information given by a set of descriptive parameters
The motivation to investigate loops
Character recognition - supports discrimination between letters.
Writer modeling (identification, examination) - contributes to applications in forensic science and graphology.
The output of loop investigation
Incomplete (open) loop identification
Hidden (collapsed) loop tracking - locating blobs that correspond to online loops
Multi (encapsulated) loops understanding - distinguishing natural from artificial loops
Temporal information recovery - retracing the original path of a pen
The Engineering Approach (J. C. Simon & T. Pavlidis)
“Requires understanding the structure of the objects to be recognized and apply the appropriate combination of (pattern recognition) techniques.”
Feature extraction dilemmas
Offline cursive word signal representation
Loop identification
Signal to noise ratio
Feature vector translation
The difficulties lie in feature extraction and preprocessing rather than in the machine learning / recognition engine phase.
Offline cursive word signal representation
We use the external upper and lower contours in conjunction with the internal contour of all visible loops.
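As a rough illustration of an upper/lower contour representation, the sketch below takes a simplified column-profile view: for each image column it records the topmost and bottommost ink row. This is an assumption made for illustration only; the slides do not say how the contours are traced, and a real contour follower would also handle overhangs and the internal contours of loops. The name `column_contours` is hypothetical.

```python
def column_contours(image):
    """Return (upper, lower) profiles of a binary word image.

    image: list of rows, 1 = foreground (ink), 0 = background.
    upper[c] is the smallest row index holding ink in column c,
    lower[c] is the largest; both are None for empty columns.
    """
    upper, lower = [], []
    for c in range(len(image[0])):
        ink_rows = [r for r in range(len(image)) if image[r][c] == 1]
        upper.append(min(ink_rows) if ink_rows else None)
        lower.append(max(ink_rows) if ink_rows else None)
    return upper, lower

# Toy "o": a ring of ink around a single background pixel.
word = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
upper, lower = column_contours(word)
```

Note that the hole in the middle column is invisible to these two profiles, which is one reason the internal contours of visible loops have to be kept alongside the external ones.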
Loop identification
Given a set of singular points, identification is provided by correlation between pieces of the same contour (around anchor points), of the opposite contours, and/or in association with subsets of internal contours.
Signal to noise ratio
In order to improve the signal’s parametric quantifiability and reduce noisy artifacts, the contour is transformed to a polygon.
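One common way to turn a contour into a polygon is the Ramer-Douglas-Peucker algorithm, sketched below. The slides do not name the polygonization method actually used, so this is a stand-in chosen for illustration; the function names and the `tolerance` parameter are assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        # Every interior point is within tolerance: keep only the endpoints.
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

# A slightly wobbly horizontal stroke followed by a vertical one.
contour = [(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0), (4, 1), (4, 2)]
polygon = simplify(contour, tolerance=0.1)
```

With this tolerance the small wobble is discarded as noise while the corner is kept as a vertex, which is the kind of trade-off between parametric simplicity and fidelity described above.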
Hidden loop tracking - an application to ascending (descending) loops
                             Writer#1  Writer#2  Writer#3  Writer#4  Writer#5  Writer#6  Total
Number of words                   223       219       223       170       215       223   1273
Number of characters             1130      1113      1130       835      1083      1130   6421
Number of Loops (all kinds)      1039      1272      1013       745      1332      1146   6547
Hidden loop tracking - an application to ascending (descending) loops (cont.)
          Online Loops    Offline Loops
          (Real Loops)    Encapsulated  Disqualified  Found  Total
Number    1006            259           186           519    964
Rate      100%            25.7%         18.5%         51.6%  95.8%
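The rates above appear to be each offline count divided by the number of online loops (1006 for the real-loops case), with the offline total being the sum of the encapsulated, disqualified, and found counts. A short check of that reading, using only the figures reported on this slide (the helper name `rate` is hypothetical):

```python
# Figures taken from the real-loops table on this slide.
online_loops = 1006
offline_counts = {"Encapsulated": 259, "Disqualified": 186, "Found": 519}

def rate(count, reference):
    """Percentage of `reference` represented by `count`, to one decimal."""
    return round(100.0 * count / reference, 1)

total = sum(offline_counts.values())
rates = {name: rate(count, online_loops)
         for name, count in offline_counts.items()}
total_rate = rate(total, online_loops)

for name, value in rates.items():
    print(f"{name}: {value}%")
print(f"Total: {total} ({total_rate}%)")
```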
Hidden loop tracking - an application to ascending (descending) loops (cont.)
          Online Loops        Offline Loops
          (Large Loops, >8)   Encapsulated  Disqualified  Found  Total
Number    856                 233           147           341    721
Rate      100%                27.2%         17.2%         39.8%  84.2%

          Online Loops        Offline Loops
          (Large Loops, >6)   Encapsulated  Disqualified  Found  Total
Number    1105                288           177           390    855
Rate      100%                26.1%         16.0%         35.3%  77.4%
Hidden loop tracking - an application to ascending (descending) loops (cont.)
Threshold  Small Loops  No Loops  Total
8          180          209       389
6          131          209       340
Multi loops understanding - a classifier of beginning a-s
More than 40 writers with 1-4 samples per writer.