Master in Computer Vision - Course 2006-2007
OPTICAL CHARACTER RECOGNITION
Outline
• Introduction
• Pre-processing (document level)
  – Binarization
  – Skew correction
• Segmentation
  – Layout analysis
  – Character segmentation
• Pre-processing (character level)
• Feature extraction
  – Image-based features
  – Statistical features
  – Transform-based features
  – Structural features
• Classification
• Post-processing
  – Classifier combination
  – Exploitation of context information
• Examples of OCR systems
• Bibliography
Optical Character Recognition
[Diagram: OCR draws its methods from pattern recognition (statistical and structural) and image processing, and finds its applications within document analysis.]
Introduction
Some examples
• Books, journals, reports
• Postal addresses
• Drawings, maps
• Identity cards
• License plates
• Quality control
• PDAs
• Cheques, bills
• Old documents
Introduction
Document Image Analysis
What is a document? Objects created expressly to convey information encoded as iconic symbols:
– Scanned images from paper documents
– Electronic documents
– Multimedia documents (video with text)
– …
Document image analysis is the subfield of digital image processing that aims at converting document images to symbolic form for modification, storage, retrieval, reuse and transmission.
Document image analysis is the theory and practice of recovering the symbol structure of digital images scanned from paper or produced by computer.
Introduction
G. Nagy: Twenty years of document image analysis in PAMI. IEEE Trans. on PAMI, vol. 22, nº 1, pp. 38-62 January 2000.
Applications of DIA
• Document Imaging: digitization, storage, compression, re-printing
• Document Understanding: recognition, interpretation, indexing, retrieval
Introduction
DIA tasks
• Document Imaging (mostly text): acquisition, binarization, filtering, skew correction
• Document Imaging (mostly graphics): acquisition, binarization, filtering, vectorization
• Document Understanding (mostly text): segmentation, layout analysis, OCR
• Document Understanding (mostly graphics): text-graphics separation, symbol recognition, interpretation
Introduction
Outline of the course
1. Acquisition
2. Pre-processing
   − Binarization
   − Skew correction
3. Layout analysis
4. Character segmentation
5. OCR
   − Feature extraction
   − Classification
   − Post-processing
Focus: Document understanding of mostly text documents
Introduction
Categorization of Character Recognition
• According to the type of writing: machine-printed vs. hand-written character recognition
• According to the type of acquisition: on-line vs. off-line character recognition
Introduction
Machine-printed character recognition
• Characters are totally defined by the font type:
  – Dimensions (segmentation)
    • Character width
    • Inter-character separation
    • Character height
  – Shape (recognition)
    • Typographic effects (boldface, italics, underline)
• Challenges:
  – Similar shapes among characters
  – Multiple fonts
  – Joined characters
  – Digitization noise: broken lines, random noise, heavy characters, etc.
  – Document degradation: old documents, photocopies, etc.
Introduction
Machine-printed character recognition
• Classification of machine-printed OCR systems:
  – Monofont: one single type of font
  – Multifont: recognition of a fixed and known set of fonts; it is necessary to identify and learn the differences between characters of all the types of fonts
  – Omnifont: recognition of any arbitrary type of font, even if it has not been previously learned
Introduction
Off-line hand-written character recognition
• Hand-written
• Off-line: acquisition by a scanner or a camera
• Challenges:
  – Shape variability among images of the same character
  – Character segmentation
• Subproblems:
  – Hand-written numeral recognition: digit recognition
  – Hand-printed character recognition: well-separated characters
  – Cursive character recognition: non-separated characters
Introduction
On-line hand-written character recognition
• On-line acquisition:
  – Digitizer tablets
  – Digital pen
  – Tablet PC
• Advantages with respect to off-line acquisition:
  – The image is acquired while the text is written
  – We can take advantage of dynamic information:
    • Temporal information: writing order, stroke segmentation, etc.
    • Writing speed
    • Pen pressure
• Subproblems:
  – Cursive script recognition
  – Signature verification/recognition
Introduction
Levels of difficulty in character recognition
Level 0: little shape variability, small number of characters, little noise
  0.0. Printed characters. Specific font. Constant size. Roman alphabets.
  0.1. Constrained hand-printed characters. Arabic numerals.
Level 1: medium variation in shape, medium noise
  1.0. Printed characters. Multiple fonts. No. of characters < 100
  1.1. Loosely constrained hand-printed characters. No. of characters < 100
  1.2. Chinese characters of few fonts
  1.3. Loosely constrained hand-printed characters. No. of characters ≈ 1000
Level 2: much variation in shape, heavy noise
  2.0. Printed characters of multiple fonts
  2.1. Unconstrained hand-printed characters
  2.2. Affine transformed characters
Level 3: non-segmented strings of characters
  3.0. Touching/broken characters
  3.1. Cursive handwriting characters
  3.2. Characters on a textured background
Introduction
• S. Mori, H. Nishida, H. Yamada: Optical Character Recognition. John Wiley and Sons, 1999.
• S.V. Rice, G. Nagy, T.A. Nartker: Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.
Levels of difficulty in character recognition
Level 0
0.0. Printed characters of a specific font with a constant size
  • Constant size
  • Connectivity of characters
  • Variation in the stroke thickness
  • Little noise
0.1. Constrained hand-printed characters
  • Characters are written according to some instructions or box guidelines
Solved problem
Introduction
Levels of difficulty in character recognition
Level 1
1.0. Printed characters of multiple fonts
1.1. Loosely constrained hand-printed characters
1.2. Chinese characters of few fonts
1.3. Loosely constrained hand-printed characters. No. of characters ≈ 1000
Solved problem
Introduction
Levels of difficulty in character recognition
Level 2
2.0. Printed characters of multiple fonts
2.1. Unconstrained hand-printed characters
2.2. Affine transformed characters
Introduction
Levels of difficulty in character recognition
Level 3
3.0. Touching or broken characters
3.1. Cursive handwriting characters
3.2. Characters on a textured background
Introduction
Databases for OCR
Off-line hand-written characters:
• CEDAR: 50,000 segmented numerals from zip codes; 5,000 zip codes; 5,000 city names; 9,000 state names
• CENPARMI: 17,000 manually segmented numerals from zip codes
• NIST: more than 1,000,000 characters from forms; several learning and test sets (more variability)
On-line characters:
• UNIPEN: 91,500 sentences with a dictionary; definition of a format to represent on-line data; 4,500,000 characters; segmented characters, words and sentences
Machine-printed documents:
• Univ. Washington: more than 1,500 pages of articles in English; more than 500 pages of articles in Japanese; originals, photocopies and pages with artificially generated noise; page segmentation into labeled zones
Introduction
Performance evaluation of OCR systems
• Hand-printed character recognition
  – Institute for Posts and Telecommunications Policy (IPIP), Japan, 1996
  – 5,000 hand-written numerals from Japanese zip codes
  – Performance of the best system: 97.94% (human performance: 99.84%)
• Machine-printed character recognition
  – "The Fifth Annual Test of OCR Accuracy". Information Science Research Institute, TR-96-01, April 1996. http://www.isri.unlv.edu
  – 5,000,000 characters from 2,000 pages of journals, newspapers, letters and technical reports
  – Performance on good quality documents: 99.77% - 99.13%
  – Performance on medium quality documents: 99.27% - 98.21%
  – Performance on low quality documents: 97.01% - 89.34%
• A performance of 99% => 30 errors/page (3,000 characters/page)
Introduction
Performance evaluation of OCR systems
Introduction

Classifier | Database  | N. of test samples | Recognition (%) | Error (%)
MLP        | NIST SD19 | 30,000             | 99.16           | 0.84
VSVM       | USPS      | 2,007              | 97.66           | 2.34
VSVM       | CENPARMI  | 2,000              | 98.7            | 1.3
POE        | MNIST     | 10,000             | 98.32           | 1.68
LeNet-5    | MNIST     | 10,000             | 99.18           | 0.82
VSV2       | MNIST     | 10,000             | 99.44           | 0.56
VSVMb      | MNIST     | 10,000             | 99.62           | 0.38
GPR        | MNIST     | 10,000             | 99.06           | 0.94
C.Y. Suen, J. Tan: Analysis of errors of handwritten digits made by a multitude of classifiers. PRL 26, pp. 369-379. 2005
Performance evaluation of OCR systems
Introduction

Errors by category for the same classifiers and databases:

Classifier     | Database  | N. of errors | Category 1 | Category 2 | Category 3
MLP            | NIST SD19 | 119          | 30         | 8          | 81
VSVM           | USPS      | 47           | 13         | 13         | 21
VSVM           | CENPARMI  | 26           | 6          | 4          | 16
POE            | MNIST     | 168          | 41         | 9          | 118
LeNet-5        | MNIST     | 82           | 17         | 14         | 51
VSV2           | MNIST     | 56           | 15         | 9          | 32
VSVMb          | MNIST     | 38           | 15         | 6          | 17
GPR            | MNIST     | 94           | 24         | 11         | 59
Sum            |           | 630          | 161        | 74         | 395
Percentage (%) |           | 100          | 25.56      | 11.75      | 62.70
Components of an OCR system
ACQUISITION → DOCUMENT PRE-PROCESSING → SEGMENTATION → CHARACTER PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFICATION → POST-PROCESSING
• Document pre-processing: filtering, binarization, skew correction
• Segmentation: layout analysis, text/graphics separation, character segmentation
• Character pre-processing: filtering, normalization
• Feature extraction: image-based features, statistical features, transform-based features, structural features
• Classification: uses the models produced by a learning stage
• Post-processing: exploitation of context information
Introduction
Acquisition
Acquisition. Scanners
A scanner is a linear camera with a lighting and a displacement system.
[Diagram: the lights illuminate the document; the scan line is imaged through an opening and a lens onto a CCD, and the video circuit produces the digital image.]
Important features of a scanner:
• Optical resolution / interpolated resolution
• Bits/pixel (depth)
• Speed (acquisition and calibration)
• Connection (parallel, USB, SCSI)
• Programming tools (TWAIN protocol, specific programming languages such as HP's SCL, etc.)
• Automatic feeding
Acquisition. Scanners
Types of scanners:
• Flatbed scanners: the CCD line is displaced along the paper
• Traction scanners: the paper is displaced over the CCD line
• Others: specific scanners for negative films, cards, passports, etc.
Acquisition
Acquisition. Scanners vs cameras
• Resolution determines:
  • Image quality
  • Image size
  • Acquisition speed
• Minimal resolution for OCR: 200 dpi
  • A 12-point character (approx. 2x3 mm) at 200 dpi generates a 16x24 image
• A4 page (297x210 mm):
  • Scanner at 200 dpi: 2376x1680 image
  • 1024x1024-pixel camera: resolution of about 90 dpi
• ID card (DNI, 85x55 mm):
  • Scanner at 200 dpi: 670x435 image
  • 1024x1024-pixel camera: resolution of about 300 dpi
[Example image: character 'B' scanned at 200 dpi, with noise]
Acquisition
Acquisition. Scanners vs cameras
• Advantages of scanners:
  – Cost
  – Resolution
  – Lighting is under control
  – Control of optical distortions
• Advantages of cameras:
  – Acquisition speed
  – More flexibility to adapt to the environment and to the material to read
Acquisition
On-line acquisition
• Input device: simulates pen and paper
  – The image is acquired while it is generated
  – A special input device provides x and y coordinates over time
• Components of the device:
  – Pen, paper and support. At least one of these components must be special
• Technical specifications:
  – Resolution
  – Sample frequency
  – Precision
• Devices: digitizer tablet without display, digitizer tablet with display, Tablet PC, digital pen and paper
Acquisition
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Pre-processing
Binarization
• Global methods:
  – Apply the same global threshold to all the pixels of the image
• Local adaptive methods:
  – Apply a different threshold to every pixel, depending on the local distribution of gray values
• Special case: binarization of textured backgrounds
Pre-processing
O.D. Trier, T. Taxt: Evaluation of binarization methods for document images. IEEE Trans on PAMI, vol. 17, nº 3, pp. 312-315. 1995
Binarization: Otsu
• Two classes of gray-scale levels: black pixels (foreground) and white pixels (background)
• Probabilistic criterion to select the threshold:
  – Maximize inter-class variability
  – Minimize intra-class variability
Pre-processing
N. Otsu: A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and Cybernetics, vol. 9, nº 1, pp. 62-66, 1979
Otsu
• The probability distribution of gray levels is defined as:

$p_i = \dfrac{n_i}{N}$

where $n_i$ is the number of pixels with gray level i and N is the total number of pixels.
• We want to define two classes of gray levels:
– C1: [1..k]
– C2: [k+1..L]
where k is the threshold.
• The class probabilities, means and variances are defined as:

$\omega_1 = \sum_{i=1}^{k} p_i, \quad \omega_2 = \sum_{i=k+1}^{L} p_i$

$P(i|C_1) = p_i/\omega_1, \quad P(i|C_2) = p_i/\omega_2$

$\mu_1 = \sum_{i=1}^{k} i\,P(i|C_1), \quad \mu_2 = \sum_{i=k+1}^{L} i\,P(i|C_2)$

$\sigma_1^2 = \sum_{i=1}^{k} (i-\mu_1)^2 P(i|C_1), \quad \sigma_2^2 = \sum_{i=k+1}^{L} (i-\mu_2)^2 P(i|C_2)$

$\mu_T = \sum_{i=1}^{L} i\,p_i, \quad \sigma_T^2 = \sum_{i=1}^{L} (i-\mu_T)^2 p_i$
Pre-processing
Otsu
• We define the within-class and between-class variances:

$\sigma_W^2 = \omega_1\sigma_1^2 + \omega_2\sigma_2^2$

$\sigma_B^2 = \omega_1(\mu_1-\mu_T)^2 + \omega_2(\mu_2-\mu_T)^2$

• The following criteria of class separability are equivalent:

$\lambda = \dfrac{\sigma_B^2}{\sigma_W^2}, \quad \kappa = \dfrac{\sigma_T^2}{\sigma_W^2}, \quad \eta = \dfrac{\sigma_B^2}{\sigma_T^2}$, with $\kappa = \lambda + 1$ and $\eta = \dfrac{\lambda}{\lambda+1}$

• Then, the optimal threshold is:

$k^* = \arg\max_{1 \le k \le L} \sigma_B^2(k)$
Pre-processing
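A minimal sketch of Otsu's threshold selection as described above, assuming a gray-level image stored as a NumPy array with values in [0, 255]; variable and function names are illustrative, not part of the original formulation.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the threshold k that maximizes the between-class variance."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist.astype(float) / hist.sum()          # p_i = n_i / N
    mu_T = np.sum(np.arange(levels) * p)         # global mean

    best_k, best_sigma_b = 0, -1.0
    omega_1, mu_acc = 0.0, 0.0
    for k in range(levels):
        omega_1 += p[k]                          # class C1 = [0..k]
        mu_acc += k * p[k]
        omega_2 = 1.0 - omega_1
        if omega_1 == 0.0 or omega_2 == 0.0:
            continue
        mu_1 = mu_acc / omega_1
        mu_2 = (mu_T - mu_acc) / omega_2
        sigma_b = omega_1 * (mu_1 - mu_T) ** 2 + omega_2 * (mu_2 - mu_T) ** 2
        if sigma_b > best_sigma_b:
            best_sigma_b, best_k = sigma_b, k
    return best_k

# usage: binary = image > otsu_threshold(image)
```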
Binarization: local adaptive binarization
• The threshold at every pixel depends on the local distribution of gray levels in the neighborhood of the pixel.
• Niblack's method:

$T(x,y) = m(x,y) + k \cdot s(x,y)$

  – m(x,y) and s(x,y) are the mean and standard deviation in a local neighborhood of the pixel
  – Window size: 15 x 15
  – k = -0.2
• Eikvil et al.'s method:
  – For every pixel, we define a small window S (3 x 3) and a large window L (15 x 15)
  – The threshold value is selected by applying Otsu to the large window L
  – The pixels in the small window S are thresholded using this value
Pre-processing
W. Niblack: An introduction to digital image processing, pp. 115-116, Prentice Hall, 1986
L. Eikvil, T. Taxt, K. Moen: A fast adaptive method for binarization of document images. International Conference on Document Analysis and Recognition 2001, pp. 435-443
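A sketch of Niblack's local thresholding with the window size and k value quoted above (15x15, k = -0.2); it uses only NumPy and a straightforward, unoptimized sliding window.

```python
import numpy as np

def niblack_binarize(image, window=15, k=-0.2):
    """Binarize with T(x,y) = m(x,y) + k * s(x,y) over a local window."""
    img = image.astype(float)
    h, w = img.shape
    half = window // 2
    padded = np.pad(img, half, mode='edge')
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            t = patch.mean() + k * patch.std()   # local threshold
            out[y, x] = 255 if img[y, x] > t else 0
    return out
```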
Binarization: evaluation
[Example images: binarization results with the global Otsu method and with the local Eikvil and Niblack methods.]
Binarization: evaluation
• Visual criteria of evaluation:
  – Broken line structures: gaps in lines
  – Broken symbols and text: symbols and text with gaps
  – Blurring of lines, symbols and text
  – Loss of complete objects
  – Noise in homogeneous areas: noisy spots and false objects in both background and print
• Scale of 1-5 for each criterion
Binarization: evaluation
[Example images: visual comparison of binarization methods.]
Binarization: Textured Backgrounds
1. Selection of candidate thresholds
   • Iterative application of Otsu
   • At each iteration, Otsu is applied to the part of the histogram with the lowest mean
   • The number of iterations depends on the number of peaks in the histogram
   • We get a set of possible threshold values
2. Computation of a set of texture features for each possible binarization
   • Based on the run-length histogram of the binarized image, R(i), i ∈ {1,..,L}
3. Selection of the optimal binarization
Pre-processing
Y. Liu, S. Srihari: Document image binarization based on texture features: IEEE Trans. on PAMI, vol. 19, nº 5, pp 540-544, 1997
Binarization: Textured Backgrounds
• Texture features (computed from the run-length histogram R(i) of each candidate binarization):
  – Stroke width: the run-length with the highest frequency

$SW = \arg\max_{i \ne 1} R(i)$

  – Stroke-like pattern noise: relevance of stroke-like patterns in the background between consecutive threshold selections. If it is high, it denotes stroke-like patterns due to texture.

$SPN = \dfrac{\max_{i \ne 1} R_j(i)}{\max_{i \ne 1} R_{j+1}(i)}$

  – Unit run noise: relevance of unit run-lengths. Ideally, it should be low.

$URN = \dfrac{R(1)}{\max_{i \ne 1} R(i)}$

  – Long run noise: relevance of long run-lengths. Ideally, it should be low.

$LRN = \dfrac{\sum_{i > L} R(i)}{\max_{i \ne 1} R(i)}$

  – Broken character: it will be high if characters are broken.

$BC = \dfrac{\min_{i \in I'} R(i)}{\max_{i \ne 1} R(i)}$, with $I' = \{0, \arg\max_i R(i)\}$
Pre-processing
Binarization: Textured Backgrounds
• Experimental results show that 2 iterations of Otsu are enough
• Decision tree to select the optimal threshold (between T1 and T2, with T1 > T2):
  1. Select the value with the larger stroke width feature.
  2. If T1 is the selected value:
     – If the background noise features are low, T1 is the selected threshold.
     – Otherwise, if T2 does not result in many broken characters, select T2.
  3. If T2 is the selected value:
     – If the broken character feature is low, select T2.
     – Otherwise, if the noise features are low with T1, select T1.
  4. If neither threshold is good enough, select the average between them.
Pre-processing
Binarization of textured backgrounds
Pre-processing
Skew Correction: Projection profiles
1. Compute the horizontal projection at several angles (depending on the desired resolution)
2. For every projection, compute a directional criterion that estimates the difference between maxima and minima in the projection:
   • Sum of squared differences between adjacent rows
   • Variance of the number of black pixels per scan line
3. Select the angle that maximizes the directional criterion
Pre-processing
W. Postl: Detection of linear oblique structures and skew scan in digitized documents. 8th International Conference on Pattern Recognition, pp. 687-689, 1986
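A sketch of projection-profile skew estimation as described above, using the variance of black pixels per scan line as the directional criterion; it assumes a binary image (text = 1), and the use of scipy.ndimage.rotate is an implementation choice, not part of the original method.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, max_angle=5.0, step=0.25):
    """Return the angle (degrees) whose horizontal projection has maximal variance."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(binary, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)            # black pixels per scan line
        score = profile.var()                    # directional criterion
        if score > best_score:
            best_score, best_angle = score, angle
    return best_angle
```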
Skew Correction: Projection profiles
• Modification of the previous algorithm:
  – Use only the bottom-centers of connected components to compute the projection profiles
  – This reduces the computation cost
H.S. Baird: The skew angle of printed documents. Proc. of the Society of Photographic Scientists and Engineers, pp. 14-21, 1987.
Analysis of neighboring connected components
1. Compute connected components.
2. For every connected component, search for the k nearest neighbours (k = 5).
3. For every pair of connected components, compute the angle between the centroids.
4. Compute the histogram of angles.
5. Estimation of the skew angle: maximum of the histogram.
Pre-processing
L. O’Gorman: The document spectrum for page layout analysis. IEEE Trans. on PAMI, vol. 15, nº 11, 1993
Analysis of neighboring connected components
• Accurate angle estimation:
  1. Find connected components in the same line (clusters of pairs of connected components with an angle near the rough estimation).
  2. Fit a straight line (using regression) to the centroids of the components in each line.
  3. Make the final estimation from these text lines.
Pre-processing
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Segmentation
Page Layout Analysis
• Layout analysis: segmentation of the image into several blocks with the same type of information: text, graphics, table, image, etc.
• Methods:
  – Run-length smearing
  – Analysis of connected components
Segmentation
Page Layout Analysis
Page Layout Analysis: Run-length smearing
1. Horizontal run-length smearing
   • Threshold: inter-character separation, estimated as the maximum of the histogram of widths of the white runs
2. Vertical smearing
   • Threshold: inter-line separation, estimated during skew correction
3. Logical AND between both images
4. Additional horizontal smearing
5. Connected components are the blocks
6. Computation of features for each connected component: aspect ratio, black pixel density, Euler number, perimeter length, perimeter-to-width ratio, perimeter-squared-to-area ratio
Example of horizontal smearing (runs of white pixels shorter than the threshold are filled):
1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1
1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Segmentation
J.L. Fisher, S. C. Hinds, D.P. D’amato: A rule-based system for document image segmentation. 10th International Conference on Pattern Recognition, pp. 567-572, 1990.
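A sketch of the run-length smearing algorithm (RLSA) steps 1-4 above, assuming a binary image where 1 denotes black; the threshold values are parameters that would normally be estimated as described in the slide.

```python
import numpy as np

def smear_rows(binary, threshold):
    """Fill runs of white (0) shorter than `threshold` in every row."""
    out = binary.copy()
    for row in out:
        run_start = None
        for x, v in enumerate(row):
            if v == 0 and run_start is None:
                run_start = x
            elif v == 1 and run_start is not None:
                if x - run_start < threshold:
                    row[run_start:x] = 1
                run_start = None
    return out

def rlsa(binary, h_thr=30, v_thr=30, final_h_thr=15):
    horiz = smear_rows(binary, h_thr)
    vert = smear_rows(binary.T, v_thr).T          # vertical smearing via transpose
    combined = np.logical_and(horiz, vert).astype(np.uint8)
    return smear_rows(combined, final_h_thr)      # additional horizontal smearing
```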
Page Layout Analysis: Run-length smearing
• A set of rules classifies each block as text or non-text, according to the features computed for each connected component
Page Layout Analysis: Run-length smearing
Segmentation
Page Layout Analysis: Analysis of connected components
1. Detection of connected components
2. Definition of distance and overlap among components:

$D_x(o_i,o_j) = \max[X_l(o_i), X_l(o_j)] - \min[X_u(o_i), X_u(o_j)]$
$D_y(o_i,o_j) = \max[Y_l(o_i), Y_l(o_j)] - \min[Y_u(o_i), Y_u(o_j)]$
$V_x(o_i,o_j) = \dfrac{-D_x(o_i,o_j)}{\min[W(o_i), W(o_j)]}$
$V_y(o_i,o_j) = \dfrac{-D_y(o_i,o_j)}{\min[H(o_i), H(o_j)]}$

where $X_l, X_u$ ($Y_l, Y_u$) are the lower and upper x (y) coordinates of the bounding box, and W, H are its width and height
3. Grouping of connected components in the same line:
   • $D_x$ below a distance threshold
   • $V_y$ above an overlap threshold
Segmentation
A.K. Jain, B. Yu: Document representation and its application to page decomposition. IEEE Trans. onPAMI, vol. 20, nº 3, pp. 294-308, 1998
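A sketch of the distance and overlap measures defined above for connected components represented by their bounding boxes; the BBox tuple layout and the threshold values are assumptions for illustration.

```python
from collections import namedtuple

# Bounding box: lower/upper coordinates in x and y
BBox = namedtuple('BBox', 'xl xu yl yu')

def dx(a, b):
    """Horizontal gap between two boxes (negative if they overlap)."""
    return max(a.xl, b.xl) - min(a.xu, b.xu)

def dy(a, b):
    return max(a.yl, b.yl) - min(a.yu, b.yu)

def vx(a, b):
    """Relative horizontal overlap."""
    return -dx(a, b) / min(a.xu - a.xl, b.xu - b.xl)

def vy(a, b):
    """Relative vertical overlap."""
    return -dy(a, b) / min(a.yu - a.yl, b.yu - b.yl)

def same_line(a, b, dist_thr=20, overlap_thr=0.5):
    """Group two components into the same text line (step 3 above)."""
    return dx(a, b) < dist_thr and vy(a, b) > overlap_thr
```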
Page Layout Analysis: Analysis of connected components
4. Classification of lines into text and non-text
   • Text lines:
     – Height below a threshold and horizontally aligned (standard deviation of the bottom edges of the CCs below a threshold)
     – Width over a threshold and all CCs with similar height (ratio between mean and standard deviation less than a threshold)
5. Grouping of text lines into text regions
   • Vertically close and horizontally overlapped
6. Grouping of non-text lines into non-text regions
   • Vertically close and horizontally overlapped
   • Horizontally close and vertically overlapped
7. Identification of image regions from non-text regions
   • Large regions
   • The ratio of black pixels is large
8. Identification of table regions
   • Detection of horizontal and vertical lines with similar orientations
   • Similar height of CCs
9. Remaining non-text regions: drawing regions
Segmentation
Page Layout Analysis: Analysis of connected components
Segmentation
Page Layout Analysis: Multiscale analysis
1. Application of wavelets to obtain a multiscale representation
2. Computation of local features (local moments) from the wavelet representation:

$w_n(f) = \left( \dfrac{1}{|W|} \sum_{x_i \in W} \big(f(x_i) - \mu_W\big)^n \right)^{1/n}$

3. Training of a neural network to classify each block as text, image or graphics according to these local features
4. Propagation of the classification through adjacent blocks and between different scales of the wavelet representation
Segmentation
Page Layout Analysis: Performance Evaluation
• Goals:
  – Evaluate the performance of several commercial OCR engines on a set of journal pages
  – Use the results of the evaluation to define methods for combining these engines in order to improve the overall performance
• Therefore, the evaluation must make it possible to determine the strengths and weaknesses of each method
Page Layout Analysis: Performance Evaluation
Segmentation
[Figure: comparison of the segmentation output with the ground truth, showing correct, misrecognized and unrecognized error zones.]
Page Layout Analysis: Performance Evaluation
• 5 types of zones: text, graphics, table, background, text over image
• Comparison between the ground-truth zones and the output zones of each engine, based on the overlap between them
Segmentation
Page Layout Analysis: Performance Evaluation
• Evaluation measures: more than 100 measures, grouped into six categories
  – Good recognition measures: percentage of ground-truth area (grouped by zone type) recognized as zones of the same type in the output, i.e. text zones in the ground truth recognized as text zones, graph zones recognized as graph zones, etc.
  – Unrecognition measures: relative area of zones of type text, graph, text-over-image or table in the ground truth recognized as background
  – Misrecognition measures: zones in the ground truth recognized as a different type, for example a text zone recognized as graph, text-over-image or table
  – Overlap measures: relative area (grouped by type) recognized twice by a zoning engine
  – Split and merge measures: how many zones are recognized and assigned in terms of splitting and merging errors
Segmentation
Page Layout Analysis: Performance Evaluation
• Experiments:
  – Creation of the ground truth for 100 journal pages
  – Evaluation of six OCR engines
  – Tests with two image formats (TIFF and JPEG) and 4 image resolutions (100 dpi, 200 dpi, 300 dpi and 400 dpi)
  – Evaluation of a simple combination scheme
Segmentation
Page Layout Analysis: Performance Evaluation
[Result charts comparing the evaluated OCR engines on sample pages.]
Page Layout Analysis: Performance Evaluation
[Charts: text correct recognition and text misrecognition rates (0-100%) for ABBYY 6, ABBYY 7, PODCORE, PODCORE2, SCANSOFT, XVISION and the combination (COMB), on TIFF and JPG images at 400, 300, 200 and 100 dpi.]
Segmentation
Page Layout Analysis: Performance Evaluation
[Charts: graph correct recognition and text-over-image (ToI) correct recognition rates (0-100%) for the same engines and combination, on TIFF and JPG images at 400, 300, 200 and 100 dpi.]
Segmentation
Page Layout Analysis: Performance Evaluation
• Conclusions:
  – The image format makes no difference in the final results
  – There is no ideal resolution:
    • Sometimes results are better at 300 dpi, sometimes at 400 dpi
    • Results are a little lower at 200 dpi or 100 dpi
  – Combination can improve the results, but more advanced combination schemes should be defined
Segmentation
Text/Graphics Separation
• Analysis of connected components:
  – Size: characters are smaller than graphic components
  – Aspect ratio x-y: characters are more "square" than graphic components
  – Pixel density: characters are denser than graphic components
• Grouping of characters: Hough transform and component proximity
• Difficulties:
  – Joined characters
  – Characters touching lines
Segmentation
L.A. Fletcher, R. Kasturi: A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. on PAMI, vol. 10 nº 5, pp. 910-918, 1988
Text/Graphics Separation
• Detection and removal of lines at given orientations: horizontal, vertical, ±22.5°, ±45°, ±67.5°
– Detection of long consecutive runs of black pixels after rotating the image at given orientations.
• Analysis of connected components to separate text and graphics
Z. Lu: “Detection of Text Regions From Digital Engineering Drawings”. IEEE Trans. on PAMI, Vol. 20, n.4, pp. 431-439. April 1998.
Specific methods
Text/Graphics Separation
•Vertical and horizontal run-length smearing to join components
•Classification of final components as text or graphics based on density and size of components
•Recovery of the original image from the enclosing rectangles of the text components
Character segmentation
• Segmentation of characters in blocks of text
• Levels of difficulty:
– Characters with uniform separation and fixed width
– Well separated characters with proportional width
– Broken characters
– Touching characters
– Broken and touching characters
– Cursive script
– Hand-printed words
– Handwritten cursive words.
Segmentation
• R.G. Casey, E. Lecolinet: A Survey of Methods and Strategies in Character Segmentation. IEEE Trans. on PAMI, vol. 18, nº 7, pp. 690-706, 1996.
• Y. Lu: Machine Printed Character Segmentation - An Overview. Pattern Recognition, vol. 28, nº 1, pp. 67-80, 1995.
• Y. Lu, M. Shridar: Character Segmentation in Handwritten Words - An Overview. Pattern Recognition, vol. 29, nº 1, pp. 77-96, 1996.
• Relevant features for segmentation– Character width and height
– Distance between characters
– Inter-character interval (distance between character centres)
– Aspect ratio.
– Baseline and Top Baseline
– Ascenders and Descenders
Character segmentation
Segmentation
Character segmentation: classification of methods
• External segmentation:
  – Segmentation before recognition; independent processes
  – The goal is to find the exact location of the character separations
  – Low performance with cursive script, touching characters or handwriting
• Internal segmentation:
  – Based on Sayre's paradox: a letter cannot be segmented without being recognized and cannot be recognized without being segmented
  – Segmentation and recognition are done at the same time
  – Recognition generates or validates segmentation hypotheses
• Holistic methods:
  – No character segmentation
  – Recognition tries to recognize words without recognizing individual characters
Segmentation
Character segmentation: classification of methods
[Taxonomy diagram: methods divide into analytical and holistic approaches; analytical methods are external (based on the image: dissection, post-processing into graphemes), internal (based on recognition: windowing or feature-based, the latter with Markov (hidden Markov model) and non-Markov variants, e.g. dynamic programming), or hybrid.]
Segmentation
External segmentation
• Image decomposition into sub-images using general features
• Each sub-image corresponds to a possible character
• Combination of several methods:
  – Connected component labelling
  – Run-length smearing
  – Projections and X-Y tree decomposition
  – Analysis of contours
  – Analysis of profiles
Segmentation
External segmentation: Connected component labelling
[Figure: original image, labelling according to neighbours, end of labelling, and unification of equivalent classes.]
Segmentation
External segmentation: Projections and X-Y trees
If text lines are perfectly separated, only one vertical projection is required. Otherwise, it is necessary to apply several vertical and horizontal projections.
[Figure: X-Y tree obtained by alternating horizontal and vertical projections of a text block.]
Segmentation
External segmentation: projections
It is more robust to use the second derivative of the projection, normalized by the value at each point of the histogram, since it enhances the projection minima:

$F(x) = \dfrac{V(x-1) - 2V(x) + V(x+1)}{V(x)}$
Segmentation
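A sketch of character segmentation by vertical projection, including the second-derivative criterion F(x) given above; the epsilon term and the peak-picking rule are illustrative choices, not part of the original slide.

```python
import numpy as np

def projection_cuts(binary, eps=1.0):
    """Candidate cut columns from the vertical projection of a binary word image."""
    v = binary.sum(axis=0).astype(float)                        # V(x): black pixels per column
    f = np.zeros_like(v)
    f[1:-1] = (v[:-2] - 2 * v[1:-1] + v[2:]) / (v[1:-1] + eps)  # F(x), eps avoids division by zero
    # columns with no ink, or local maxima of F, are candidate separation points
    return [x for x in range(1, len(v) - 1)
            if v[x] == 0 or (f[x] > f[x - 1] and f[x] > f[x + 1] and f[x] > 0)]
```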
External segmentation: Run-length smearing
• Run-lengths: sequences of consecutive pixels with the same colour in a row or column
• Smearing: inversion of runs with a length below a certain threshold
[Figure: vertical smearing and horizontal smearing of a word image, combined with a logical AND.]
Segmentation
External segmentation: analysis of profiles
• Determination of the point of separation between characters:
  – Follow the profile beginning at a minimum up to the following maximum
  – Distance between the upper and lower profiles
[Figure: upper and lower profiles of a word.]
Segmentation
External segmentation: post-processing
• Problem: broken and touching characters
  – Analysis of the bounding boxes: definition of several rules that permit joining or breaking them properly, based on:
    • Estimated character size (width and height)
    • Number of estimated characters
    • Component aspect ratio
    • Proximity/overlapping of bounding boxes
Segmentation
External segmentation: post-processing
• "Hit and Deflect" strategy:
  – Starting point: maximum of the lower profile / minimum of the upper profile
  – Contour following:
    • Vertical scan to find a contour point
    • Move the scan point according to the value of the neighbouring pixels:
      – To the right/left, if one pixel corresponds to the character and the other does not
      – Up, if both neighbouring pixels belong to the character
Segmentation
M. Shridhar, A. Badreldin. “Recognition of Isolated and Simply Connected Handwritten Numerals”. Pattern Recognition, Vol. 19, n. 1, pp. 1-12, 1986.
External segmentation: oversegmentation
• Segmentation of the image into sub-images that do not necessarily correspond to individual characters
• Analysis of the contour minima and maxima:
  – Detection of significant contour minima:
    • Cut points
    • Lower extreme of a character
  – Generation of possible cut points:
    • For each contour minimum, search to the left and to the right for points that correspond to a single vertical run with low density
  – Compaction of nearby cut points
Segmentation
R. Bozinovic, S. Srihari. “Off-line Cursive Script Word Recognition”. IEEE Trans. on PAMI, Vol. 11, n. 1, 1989
Character segmentation: classification of methods
[Taxonomy diagram repeated: analytical (external, internal, hybrid) vs. holistic segmentation methods.]
Segmentation
Internal segmentation
• Segmentation and recognition at the same time
• Two approaches:
  – Windowing:
    • Sequential scan of the image from left to right
    • Generation of segmentation hypotheses
    • Detection of the cut points with the best recognition performance (verification step)
  – Based on image features:
    • Feature detection
    • Generation of possible correspondences between features and letters
    • Search for the best possible combination among all correspondences
    • Two types of methods: Markov-based and non-Markov-based
Segmentation
Internal segmentation: windowing
• A mobile window is used to generate possible segmentation sequences
• Each possible segmentation is validated by the recognition process
• Search for a segmentation sequence that yields a valid final result
Segmentation
R.G. Casey, G. Nagy: Recursive segmentation and classification of composite patterns. 6th International Conference on Pattern Recognition, pp. 349-451, 1986
Internal segmentation: windowing
• Shortest Path Segmentation:
  – Representation of all segmentation possibilities with a graph:
    • Nodes: all possible combinations of pre-segmented zones
    • Edges: neighbour compatibility between pre-segmented zones
  – Using a neural network, each node is assigned a recognized character, with a measure of confidence (distance)
  – Finding the shortest path in the graph is equivalent to finding the best possible segmentation
Segmentation
C.J.C. Burges, J.I. Be, C.R. Nohl: Recognition of handwritten cursive postal words using neuralnetworks. Proc. USPS 5th Advanced Technology Conference, p. A-117, 1992
Internal segmentation: feature-based
• Graph-based:
  – Representation of the image skeleton with a graph
  – Subgraph matching to find possible correspondences with the character prototypes
  – Creation of a network:
    • Nodes: recognized prototypes labelled with the matching cost
    • Edges: adjacency relationships among nodes
  – Recognition: searching for the optimal path in the network
[Figure: example network of character hypotheses built over a word image.]
Segmentation
J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Character pre-processing
Character pre-processing
Some usual pre-processing operations in OCR:
• Filtering: noise reduction
• Thinning
• Binarization
• Normalization:
  – Reduce character variability
  – Convert the character to a normal shape:
    • Orientation
    • Slant
    • Size
    • Stroke thickness
Character pre-processing
Normalization
Inverse transforms are applied to reduce intra-class variance.
The most usual normalization transforms are:
• Rotation: rotated scans, text in graphic documents
• Slant: cursive fonts or handwriting
• Stroke thickness: bold fonts or very thin strokes, handwriting with different pen thickness
• Size: titles, footnotes, handwriting
Character pre-processing
Normalization
These normalization transforms can be expressed as affine transforms:

$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$

Type | Parameters | Meaning
Translation | $a_{ij} = 0$, $i,j = 1,2$ | $t_x$, $t_y$: translation factors
Scaling | $a_{11} = s_x$, $a_{12} = 0$, $a_{21} = 0$, $a_{22} = s_y$ | $s_x$, $s_y$: scaling factors
Rotation | $a_{11} = \cos\alpha$, $a_{12} = -\sin\alpha$, $a_{21} = \sin\alpha$, $a_{22} = \cos\alpha$ | $\alpha$: rotation angle
Slant | $a_{11} = 1$, $a_{12} = \tan\beta$, $a_{21} = 0$, $a_{22} = 1$ | $\beta$: slant angle
Character pre-processing
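A sketch of slant normalization using the shear transform from the table above; the slant angle β is assumed to have been estimated beforehand, and the use of scipy.ndimage.affine_transform (and the sign convention for β) are implementation choices, not part of the original slide.

```python
import numpy as np
from scipy.ndimage import affine_transform

def deslant(image, beta_degrees):
    """Apply the shear x' = x + tan(beta) * y to remove the estimated slant."""
    t = np.tan(np.radians(beta_degrees))
    # affine_transform maps output coords to input coords: in = M @ out (rows are (y, x))
    matrix = np.array([[1.0, 0.0],   # y_in = y_out
                       [t,   1.0]])  # x_in = t * y_out + x_out
    return affine_transform(image, matrix, order=1, mode='constant', cval=0)
```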
Normalization
• Rotation:
  – To determine the baseline of a set of aligned characters: projection analysis, Hough transform
  – To determine the orientation of a single character: inertia axes (second-order moments)
• Slant:
  – Approximation of the slant angle from the regression line of the character pixels
  – Approximation from the orientation of the "vertical" segments in a word
Character pre-processing
Normalization
• Size:
  – Normalization to a standard size
  – Normalization of the relation between the x size and the y size
  – Usually, it is done from the bounding box of the character
  – Some pixel resampling is required: interpolation to avoid aliasing
• Stroke thickness:
  – It is not straightforward to determine, because of the variability within the character itself
  – Approximation of the stroke thickness:
    • It is assumed that the character stroke has length l and width w
    • The width is estimated from the area and perimeter of the character
  – Then, morphological operations (dilation and erosion) are used to normalize the character to a standard width
Character pre-processing
Non-linear normalization
• Optical distortion:
  – Parameter estimation with a least-squares criterion, using a calibration image:

$\begin{pmatrix} x' \\ y' \end{pmatrix} = C_m \begin{pmatrix} x \\ y \end{pmatrix} + C_d \begin{pmatrix} x(x^2+y^2) \\ y(x^2+y^2) \end{pmatrix}$

  $C_m$: magnification coefficient
  $C_d$: distortion coefficient
Character pre-processing
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Feature Extraction
Feature-based Recognition
• The choice of the feature extraction method is probably the most important factor in achieving good recognition performance.
Which are the best features to discriminate between the characters?
FeatureExtraction
Feature Extraction
Goal: to extract from the image the most relevant information for classification, i.e., to minimize intra-class variability while maximizing inter-class variability.
• Selection of appropriate features:
  – It is a critical decision
  – It depends on the specific application
  – Features must be invariant to the character variations (depending on the application): rotation, degradation, noise, shape distortion
  – Low dimensionality, to avoid large learning sets
  – Features determine the type of information to work with: gray-level image, binary image, character contour, vectorization of the skeleton, etc.
  – Features also determine the type of classifier
FeatureExtraction
Feature Extraction
• Image-based features:
  – Projections
  – Profiles
  – Crossings
• Statistical features:
  – Moments
  – Zoning
• Global transforms and series expansions:
  – Karhunen-Loeve
  – Fourier descriptors
• Topological and geometric features; structural analysis:
  – Contour analysis
  – Skeleton analysis
  – Topological and geometric features
FeatureExtraction
O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
Image-based Features
• All the image as feature vector:
  – Classification by correlation
  – Very sensitive to noise, character distortion and similarity between classes
• x and/or y projections:
  – The accumulated projection can also be used
  – Sensitive to rotation, distortion and a large number of characters
• Peephole:
  – Coding with a binary number some pre-selected pixels of the image
  – The pre-selected pixels can vary depending on the character to be recognized
[Example: the character 'A' coded by the peephole bits 010111011]
Feature Extraction
Image-based Features
• Contour profiles:
  – Left (right) profile: minimum (maximum) x of the contour at every y value
  – Lower (upper) profile: minimum (maximum) y of the contour at every x value
  – Features:
    • Profile values
    • Differences between consecutive profile values
    • Maxima and minima of the profile
    • Maxima of the difference between profile values
Feature Extraction
F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. PatternRecognition, 24(10), pp. 969-983, 1991
Image-based Features
• Crossing method:
  – Features are computed from the number of times that the character is crossed by vectors along some orientations, for example 0°, 45°, 90°, 135°
  – Used in commercial systems because of its speed and low complexity
  – Robust to some distortions and noise
  – Sensitive to size variations
FeatureExtraction
Statistical Features
• Methods based on the statistical distribution of pixels in the image:
  – Geometric moments
  – Zoning
• Features are robust to distortion and, up to a certain extent, to some style variations
• Low computation time and easy to implement
• A learning step is needed to infer the model of characters
FeatureExtraction
Statistical Features: Zoning
• The image is divided into n x m cells.
• For each cell, the mean of the gray levels is computed, and all these values are joined in a feature vector of length n x m.
• We can also use information from the contour, or any other feature computed in every zone.
[Example figure: number of contour segments per orientation (0°, 45°, 90°, 135°) in each zone.]
FeatureExtraction
F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. PatternRecognition, 24(10), pp. 969-983, 1991
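A sketch of zoning features as described above, assuming a grayscale (or binary) character image; it returns the mean value of each of the n x m cells as a flat feature vector.

```python
import numpy as np

def zoning_features(image, n=4, m=4):
    """Mean gray level of each cell of an n x m grid, as a flat feature vector."""
    h, w = image.shape
    ys = np.linspace(0, h, n + 1, dtype=int)
    xs = np.linspace(0, w, m + 1, dtype=int)
    feats = [image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
             for i in range(n) for j in range(m)]
    return np.array(feats)
```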
Statistical Features: Geometric Moments
• Moments of order (p+q) of image f:

$m_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, x^p y^q$

  – $m_{00}$: character area (in binary images)
  – Center of gravity of the character: $\bar{x} = m_{10}/m_{00}$, $\bar{y} = m_{01}/m_{00}$
• Central moments (centering the character at the center of gravity):

$\mu_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, (x-\bar{x})^p (y-\bar{y})^q$

  – The central moments of order 2 ($\mu_{20}$, $\mu_{02}$, $\mu_{11}$) permit computing:
    • The main inertia axes
    • The character length
    • The character orientation:

$\theta = \dfrac{1}{2}\,\mathrm{atan}\!\left( \dfrac{2\mu_{11}}{\mu_{20}-\mu_{02}} \right)$
FeatureExtraction
M. Hu: Visual pattern recognition by moment invariants. IRE Trans. Inf. Theroy 8, pp. 179-187, 1962
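A sketch of the geometric moments, central moments and orientation angle defined above; it assumes a 2-D NumPy array whose first index is taken as x, and the helper names are illustrative.

```python
import numpy as np

def moment(img, p, q):
    """m_pq = sum_x sum_y f(x, y) x^p y^q (first array index taken as x)."""
    x, y = np.indices(img.shape)
    return np.sum(img * (x ** p) * (y ** q))

def central_moment(img, p, q):
    """mu_pq: moments computed about the center of gravity."""
    m00 = moment(img, 0, 0)
    xc, yc = moment(img, 1, 0) / m00, moment(img, 0, 1) / m00
    x, y = np.indices(img.shape)
    return np.sum(img * ((x - xc) ** p) * ((y - yc) ** q))

def orientation(img):
    """theta = 0.5 * atan(2*mu11 / (mu20 - mu02))."""
    mu11 = central_moment(img, 1, 1)
    mu20 = central_moment(img, 2, 0)
    mu02 = central_moment(img, 0, 2)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```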
Statistical Features: Geometric Moments
• Invariant moments:
  – Central moments $\mu_{pq}$ are translation-invariant
  – Scale invariants:

$\nu_{pq} = \dfrac{\mu_{pq}}{\mu_{00}^{(p+q)/2+1}}, \quad p+q \ge 2$

  – Rotation invariants (order 2):

$\phi_1 = \nu_{20} + \nu_{02}$
$\phi_2 = (\nu_{20} - \nu_{02})^2 + 4\nu_{11}^2$

  – Invariants to general linear transforms:

$I_1 = \mu_{20}\mu_{02} - \mu_{11}^2$
$I_2 = (\mu_{30}\mu_{03} - \mu_{21}\mu_{12})^2 - 4(\mu_{30}\mu_{12} - \mu_{21}^2)(\mu_{21}\mu_{03} - \mu_{12}^2)$
$\psi_1 = I_1 / \mu_{00}^4$

  – A set of moment invariants of different orders can be defined in a similar way
Feature Extraction
T.H. Reiss: The revised fundamental theorem of moment invariants. IEEE Trans. PAMI, vol. 13, nº 8, pp. 830-834, 1991
Statistical Features: Zernike Moments
• Geometric moments:
  – Projection of the function f(x,y) over the monomials $x^p y^q$ (no orthogonality => information redundancy)
• Zernike moments:
  – Change to polar coordinates to achieve orthogonality and rotation invariance
  – Projection of the image over the Zernike polynomials $V_{nm}$, which are orthogonal inside the unit circle $x^2+y^2=1$:

$V_{nm}(x,y) = V_{nm}(\rho,\theta) = R_{nm}(\rho)\, e^{jm\theta}$

where $\rho = \sqrt{x^2+y^2} \le 1$, $\theta = \tan^{-1}(y/x)$, $n \ge 0$, $|m| \le n$, $n-|m|$ even, and

$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \dfrac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\, \rho^{n-2s}$

Feature Extraction
A. Khotanzad, Y.H. Hong: Invariant image recognition by Zernike moments. IEEE Trans. PAMI, vol. 12, nº 5, pp. 489-497, 1990.
Statistical Features: Zernike Moments
• The image is decomposed over the Zernike polynomials:

$f(x,y) = \sum_n \sum_m A_{nm} V_{nm}(\rho,\theta)$

• The $A_{nm}$ coefficients are the Zernike moments of order n and repetition m:

$A_{nm} = \dfrac{n+1}{\pi} \sum_x \sum_y f(x,y)\, [V_{nm}(\rho,\theta)]^*$

where $x^2+y^2 \le 1$ (the image must be re-scaled to the unit circle) and * denotes the complex conjugate
• $|A_{nm}|$ is rotation invariant
• Relation between Zernike moments and geometric moments: for an image centered at its centroid, $A_{00} = \mu_{00}/\pi$ and $A_{11} = A_{1,-1} = 0$, while the order-2 Zernike moments are linear combinations of $\mu_{20}$, $\mu_{02}$ and $\mu_{11}$.
FeatureExtraction
Statistical Features: Zernike Moments
Rows 1 and 2: Zernike moments of order 1-13, displayed using:

$I_m(x,y) = \sum_n A_{nm} V_{nm}(x,y)$, with $x^2+y^2 \le 1$ and $n-|m|$ even

Rows 3 and 4: image reconstruction from Zernike moments of order 1-13.
• To reconstruct the image from the Zernike moments:

$f(x,y) = \lim_{N\to\infty} \sum_{n=0}^{N} \sum_{m:\ |m|\le n,\ n-|m|\ \mathrm{even}} A_{nm} V_{nm}(x,y)$

Orders 1 and 2, for example, represent orientation, height and width.
FeatureExtraction
Statistical Features: Zernike Moments
• Image reconstruction using moments up to order 10 (66 moments).
FeatureExtraction
Statistical Features: Zernike Pseudo-Moments
• Less noise-sensitive than Zernike moments
• Better recognition results
• Obtained by removing the constraint that $n-|m|$ be even; the radial polynomial becomes:

$R_{nm}(\rho) = \sum_{s=0}^{n-|m|} (-1)^s \dfrac{(2n+1-s)!}{s!\,(n-|m|-s)!\,(n+|m|+1-s)!}\, \rho^{n-s}$
FeatureExtraction
Statistical Features: Invariant Moments
• Experiments show that a robust OCR system needs at least 10-15 features, i.e., we need to define between 10 and 15 geometric invariants
• Handwritten digit recognition:
  – Moments up to order 6:
    • Regular moments (24 moments): 94%
    • Zernike moments (23 moments): 95%
    • Zernike pseudo-moments (44 moments): 91.5%
  – Moments of higher orders: decrease in recognition performance
FeatureExtraction
S.O. Belkasim, M. Shridhar, M. Ahmadi: Pattern Recognition with Moment Invariants: A Comparative Study and New Results. Pattern Recognition, Vol. 24, n. 12, pp. 1117-1138, 1991.
Transform-based features
• Instead of computing the feature vector directly from the character image, a linear transform is applied:

$g = T \cdot f$

where T is a matrix of constant values
• These transforms help to reduce the dimensionality of the feature vector, preserving the most relevant information about the shape of the character
• The original image can be reconstructed from the feature vector
• Features are invariant to some global deformations, such as translation and rotation
• High computational cost
• Some examples:
  – Karhunen-Loeve expansion
  – Fourier series
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• The KLT is defined as:

$g = T^t (f - \bar{f})$

where f is the feature vector of the image and $\bar{f}$ is the mean of all the samples representing the character:

$\bar{f} = \dfrac{1}{N} \sum_{i=1}^{N} f_i$

• Each column of T is an eigenvector of the covariance matrix:

$C = \dfrac{1}{N} \sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^t$

• Usually, only the M (M < d) eigenvectors corresponding to the largest eigenvalues are used; in this way, the dimensionality is reduced.
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• For each image, we get a feature vector x of dimension d
• The learning set is composed of n samples per class
• The set of samples is represented by the matrix X, of dimension n x d
• For each class, we can compute the covariance matrix R:

$R = \dfrac{1}{n-1} X^T X = \dfrac{1}{n-1} \sum_i (x_i - \bar{x})(x_i - \bar{x})^T$

• Then, the transform matrix T is built from the eigenvectors of R:

$T = [v_1 \cdots v_d]$, with $R v_i = \lambda_i v_i$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$

($v_i$: eigenvectors of the covariance matrix; $\lambda_i$: eigenvalues of the covariance matrix)
• The transformation of an image x is:

$y = T^T x$
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• Usually, the dimensionality is reduced and only the m eigenvectors of R with the greatest weight are kept
• Then, the transform matrix becomes:

$P = [v_1 \cdots v_m]$

• For each image $x_i$, the feature vector $y_i$ is:

$y_i = P^T x_i$

• Usually, m is selected in such a way that the eigenvectors explain some pre-specified percentage of the total variance (usually 0.9 or 0.95)
• The variance explained by the selected eigenvectors is given by:

$\sum_{i=1}^{m} \lambda_i$ (relative to the total variance $\sum_{i=1}^{d} \lambda_i$)
FeatureExtraction
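A sketch of the Karhunen-Loeve (PCA) feature extraction described above, keeping the m eigenvectors that explain a given fraction of the total variance; X is assumed to be an n x d matrix with one sample per row, and the names are illustrative.

```python
import numpy as np

def kl_transform(X, variance_kept=0.95):
    """Return (P, mean) such that the features of a sample x are y = P.T @ (x - mean)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    R = Xc.T @ Xc / (X.shape[0] - 1)              # covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(R)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum, variance_kept) + 1)
    return eigvecs[:, :m], mean

# usage: P, mu = kl_transform(X_train); y = (x - mu) @ P
```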
Transform-based features: Karhunen-Loeve Expansion
• Application of the KL transform to the NIST database:
  – Digit recognition: 96% - 97%
  – Uppercase recognition: 89% - 90%
  – Lowercase recognition: 77% - 82%
M. D. Garris, J.L. Blue, G. T. Candela, P. J. Grother, S. A. Janet, C.L. Wilson: NIST form-based handprint recognition system (release 2.0). Technical report NISTIR 5959, National Institute of Standards and Technology, USA, 1994.
Transform-based features: Fourier Descriptors
• Decomposition of a periodic function of period T as a Fourier series:

$f(t) = \sum_{n=-\infty}^{\infty} C_n\, e^{j 2\pi n t / T}$, with $C_n = \dfrac{1}{T}\int_0^T f(t)\, e^{-j 2\pi n t / T}\, dt$

or, in real form:

$f(t) = A_0 + \sum_{n=1}^{\infty} \left( a_n \cos\dfrac{2\pi n t}{T} + b_n \sin\dfrac{2\pi n t}{T} \right)$

$A_0 = \dfrac{1}{T}\int_0^T f(t)\, dt$, $\quad a_n = \dfrac{2}{T}\int_0^T f(t)\cos\dfrac{2\pi n t}{T}\, dt$, $\quad b_n = \dfrac{2}{T}\int_0^T f(t)\sin\dfrac{2\pi n t}{T}\, dt$
FeatureExtraction
Transform-based features: Fourier Descriptors
The shape contour can be described as a complex function of arc length:

$z(s) = x(s) + j\,y(s)$

The contour can be described as a function of the tangent angle:

$x(s) = x(0) + \int_0^s \cos\theta(\alpha)\, d\alpha$, $\quad y(s) = y(0) + \int_0^s \sin\theta(\alpha)\, d\alpha$

Defining the function of accumulation of the tangent angle:

$\Phi(l) = \theta(l) - \theta(0)$

This function is normalized to the range [0, 2π]:

$\Phi^*(t) = \Phi\!\left(\dfrac{Lt}{2\pi}\right) + t$

Finally, this function can be decomposed in Fourier descriptors as:

$\Phi^*(t) = a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$
FeatureExtraction
C. T. Zahn, R. Z. Roskies: Fourier descriptors for plane closed curves. IEEE Trans. on Computers, vol. C-21, nº 3, pp. 269-281, 1972
Transform-based features: Fourier Descriptors
• In the discrete case, for a contour given by the points $z(s_j) = x(s_j) + j\,y(s_j)$:

$\Delta\Phi_j = \tan^{-1}\!\left[ \dfrac{y(s_j) - y(s_{j-1})}{x(s_j) - x(s_{j-1})} \right]$, $\quad \Phi_j = \sum_{k=1}^{j} \Delta\Phi_k$

• Then, the descriptors are computed from the accumulated angle increments:

$a_0 = -\pi - \dfrac{1}{L}\sum_{k=1}^{m} l_k\,\Delta\Phi_k$, $\quad a_n = -\dfrac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \sin\dfrac{2\pi n l_k}{L}$, $\quad b_n = \dfrac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \cos\dfrac{2\pi n l_k}{L}$

• Fourier descriptors depend on the starting point
• For 64x64 images, it has been shown that 5 coefficients are enough to discriminate between "2" and "Z"
FeatureExtraction
Transform-based features: Fourier Descriptors
Elliptic Fourier Descriptors

$\hat{x}(t) = A_0 + \sum_{n=1}^{N} \left( a_n \cos\dfrac{2\pi n t}{T} + b_n \sin\dfrac{2\pi n t}{T} \right)$
$\hat{y}(t) = C_0 + \sum_{n=1}^{N} \left( c_n \cos\dfrac{2\pi n t}{T} + d_n \sin\dfrac{2\pi n t}{T} \right)$

where T is the contour length, and $\hat{x}(t) \equiv x(t)$, $\hat{y}(t) \equiv y(t)$ when $N \to \infty$.

In the discrete case, for a contour of m pixels with $\Delta x_i = x_i - x_{i-1}$, $\Delta y_i = y_i - y_{i-1}$, $\Delta t_i = \sqrt{\Delta x_i^2 + \Delta y_i^2}$, $t_i = \sum_{j=1}^{i} \Delta t_j$ and $T = t_m$:

$a_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta x_i}{\Delta t_i}\left[ \cos\dfrac{2\pi n t_i}{T} - \cos\dfrac{2\pi n t_{i-1}}{T} \right]$
$b_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta x_i}{\Delta t_i}\left[ \sin\dfrac{2\pi n t_i}{T} - \sin\dfrac{2\pi n t_{i-1}}{T} \right]$
$c_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta y_i}{\Delta t_i}\left[ \cos\dfrac{2\pi n t_i}{T} - \cos\dfrac{2\pi n t_{i-1}}{T} \right]$
$d_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta y_i}{\Delta t_i}\left[ \sin\dfrac{2\pi n t_i}{T} - \sin\dfrac{2\pi n t_{i-1}}{T} \right]$

Invariance to the starting point
The phase shift with respect to the main axis is computed, and the coefficients are rotated according to this angle:

$\theta_1 = \dfrac{1}{2}\tan^{-1}\!\left[ \dfrac{2(a_1 b_1 + c_1 d_1)}{a_1^2 + c_1^2 - b_1^2 - d_1^2} \right]$

$\begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix} = \begin{pmatrix} a_n & b_n \\ c_n & d_n \end{pmatrix} \begin{pmatrix} \cos n\theta_1 & -\sin n\theta_1 \\ \sin n\theta_1 & \cos n\theta_1 \end{pmatrix}$

Feature Extraction
Elliptic Fourier Descriptors
F.P. Kuhl, C.R. Giardina: Elliptic fourier features of a closed contour. Comput. Vis. Graphics ImageProcess, vol. 18, pp. 236-258, 1982
Transform-based features: Fourier Descriptors
Character '5' reconstructed using elliptic Fourier descriptors of order 1, 2, ..., 10; 15, 20, 30, 40, 50 and 100 respectively.

Rotation invariance
The orientation of the main semi-axis is computed, and the coefficients are rotated:

$\psi_1 = \tan^{-1}\!\left( \dfrac{c_1^*}{a_1^*} \right)$

$\begin{pmatrix} a_n^{**} & b_n^{**} \\ c_n^{**} & d_n^{**} \end{pmatrix} = \begin{pmatrix} \cos\psi_1 & \sin\psi_1 \\ -\sin\psi_1 & \cos\psi_1 \end{pmatrix} \begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix}$

Scale invariance
The coefficients are divided by the magnitude of the main semi-axis:

$E^* = \sqrt{a_1^{*2} + c_1^{*2}}$
FeatureExtraction
Transform-based features: Fourier Descriptors
• Experiments with handwritten digits (100 images per digit):
  – 12 elliptic descriptors: 99.7%
  – 12 non-elliptic descriptors: 99.5%
• Experiments with digits + lowercase letters:
  – 12 elliptic descriptors: 98.6%
  – 12 non-elliptic descriptors: 90.1%
FeatureExtraction
Evaluation
T. Taxt, J.B. Olafsdottir, M. Daehlen: Recognition of Handwritten Symbols, Pattern Recognition, Vol. 23, n. 11, pp. 1155-1166, 1990
Structural Analysis
Methods based on the analysis of the character structure, from the detection of some features and their relationships (the basic idea is to divide the character into its basic parts):
• Contour analysis
• Skeleton analysis
• Analysis of topological and geometric features
FeatureExtraction
Structural Analysis: Run-length encoding
• It is the simplest structural representation.
• Run-length encoding represents each image row as a sequence of pairs (l,g), where each pair represents a run of l consecutive pixels with gray level g.
• For binary images, only the sequence of run lengths is required (e.g. 2, 4, 5, 1, 4, 3, 1).
• Example:
  0 0 4 4 4 2 2 2 2 2 4 4 4 1 1 1 1 1 1 1 1 0 0 0 0 → (2,0), (3,4), (5,2), (3,4), (8,1), (4,0)
FeatureExtraction
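A sketch of run-length encoding of one image row as (length, gray level) pairs, reproducing the example above.

```python
def run_length_encode(row):
    """Encode a sequence of gray levels as (length, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(r) for r in runs]

row = [0, 0, 4, 4, 4, 2, 2, 2, 2, 2, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(run_length_encode(row))   # [(2, 0), (3, 4), (5, 2), (3, 4), (8, 1), (4, 0)]
```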
Structural Analysis: Run-length encoding
A graph is built on the run-length encoding, where:
• Nodes: run-lengths
• Edges: overlapping between runs in consecutive rows
[Figure: run-length graph of an image containing two regions.]
FeatureExtraction
Structural Analysis: Chain-code
Chain-codes or Freeman codes are the simplest angular approximation. They permit coding each vector $d_i$ between two consecutive points with a code between 0 and 7, one for each of the 8 neighbouring directions.
The codification of a string S is composed of 3 fields:
• Starting point of the segment $p_0(S) = (x_0, y_0)$
• Segment length $l(S)$
• A table of directions: $Dir(S) = [d_0, d_1, \ldots, d_{l(S)-1}]$, where $d_i \in [0,7]$, $\forall i \in [0, l(S)-1]$
[Figure: the 8 chain-code directions and the relative codification of consecutive direction changes.]
FeatureExtraction
Structural Analysis: Contour Analysis
• Contour pixels are codified according to their orientations, using the chain-code representation
• Classification can be done with structural recognition methods based on the string-edit distance algorithm by Wagner and Fischer
• Example chain code: 22221220201000070070
FeatureExtraction
Structural Analysis: Skeleton Analysis
• Image thinning and representation of the skeleton with some encoding method that allows comparing two shapes (sometimes it is necessary to vectorize the skeleton):
  – Chain codes
  – Graphs
  – Grammars
  – Zoning
  – Discrete features
• Skeleton problems:
  – Noise sensitivity
  – Variability of the representation
[Figure: character 'B' and its skeleton.]
FeatureExtraction
Structural Analysis: Skeleton Analysis
• Representation with graphs or grammars:
  – Based on the detection of characteristic skeleton points and a polygonal approximation of the skeleton
  – Two possibilities to represent the skeleton with a graph:
    • Nodes are the characteristic points, while edges are the segments joining the points
    • Nodes are the segments of the polygonal approximation, while edges represent the adjacency relations between the segments
[Figure: the two graph representations of a character skeleton.]
FeatureExtraction
Structural Analysis: Skeleton Analysis
Representation with graphs:
FeatureExtraction
J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995
Structural Analysis: Skeleton Analysis
• Zoning:
  – The skeleton is divided into zones (labelled A-I in the example)
  – Option 1: stroke length within each zone
  – Option 2: coding from the arcs between zones, e.g. ArC, ArD, CcF, DrF, DrG, FcI, GrI, where r = straight line and c = curve
• Discrete features:
  – Number of loops
  – Number of T joints and X joints
  – Number of terminal points, corner points and isolated points
  – Cross points with the horizontal and vertical axes
FeatureExtraction
Structural Analysis: Topological and geometric features
• Aspect ratio x-y
• Perimeter, area, center of gravity
• Minimal and maximal distance of the contour to the center of gravity
• Number of holes
• Euler number = (number of connected components) - (number of holes)
• Compactness = (perimeter)² / (4π · area)
• Information about contour curvature
• Ascenders and descenders
• Concavities and holes
• Loops
• Unions, terminal points, crossings with the horizontal and vertical axes
• Angular information: histogram of segment angles
FeatureExtraction
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Classification
Classification
• Different methods, depending on the model of feature representation
• Classification using feature vectors:
  – Correlation
  – Euclidean distance
  – Mahalanobis distance
  – k nearest neighbours
  – Bayes' classifier
  – Neural networks
• Classification with structural features:
  – Dichotomic search
  – String edit distance
  – Graph matching
  – Grammars
Classification
Classification using feature vectors
• Correlation:
  – The image is assigned to the class with the largest correlation value:

$S_i(f) = \dfrac{\iint_R f(x,y)\, g_i(x,y)\, dx\, dy}{\sqrt{\iint_R f^2(x,y)\, dx\, dy \; \iint_R g_i^2(x,y)\, dx\, dy}}$

• Minimal Euclidean distance:
  – Distance to the mean $m_i$ of the class:

$D_i(x) = (x - m_i)^T (x - m_i)$

  – It does not take into account differences in variance between classes
Classification
140
Classification using feature vectors
• Minimum quadratic distance (Mahalanobis)
– For each class i, the mean m_i and the covariance matrix S_i are computed from the set of samples
– The covariance matrix is taken into account when computing the distance from an image to class i
– The feature vector of the image x is projected onto the eigenvectors of the class
D_i(x) = (x - m_i)^T S_i^{-1} (x - m_i) = z^T z,   with   z = Λ_i^{-1/2} Ψ_i^T (x - m_i)

Λ_i : eigenvalues of S_i
Ψ_i : eigenvectors of S_i
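A possible NumPy sketch of the Mahalanobis rule above: the mean and covariance of each class are estimated from its samples, and the class with the smallest quadratic distance is chosen. The pseudo-inverse is an implementation choice, used here as a guard against singular covariance estimates.

```python
import numpy as np

def mahalanobis_classify(x, samples_per_class):
    """Assign x to the class minimizing (x - m_i)^T S_i^{-1} (x - m_i),
    with m_i and S_i estimated from the samples of each class."""
    best_label, best_dist = None, np.inf
    for label, samples in samples_per_class.items():
        m = samples.mean(axis=0)
        S = np.cov(samples, rowvar=False)
        # Pseudo-inverse guards against a singular covariance estimate.
        d = float((x - m) @ np.linalg.pinv(S) @ (x - m))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```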
Classification
71
141
Classification using feature vectors
• k-Nearest Neighbours
– Several sample models for each class
– Given an image, we take the k models nearest to the image
– The image is classified into the class with the most elements in the set of k nearest neighbours

• Weighted nearest neighbours
– k depends on the image: for each image x, the set V_x contains all the models whose distance to x is lower than α times the distance to the nearest model
V_x : set of the models nearest to x
k(x) = |V_x|
V_x^(i) : models of V_x that belong to class i

D_i(x) = Σ_{j ∈ V_x^(i)} d(x, x_j)²
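A minimal sketch of the plain k-nearest-neighbour rule described above (the weighted variant only changes how the neighbourhood V_x and the class scores are formed); the model vectors and labels are assumed to be given.

```python
import numpy as np
from collections import Counter

def knn_classify(x, models, labels, k=3):
    """Plain k-NN rule: x receives the majority label among the k models
    closest to it (Euclidean distance)."""
    distances = np.linalg.norm(models - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# models: (n_models, n_features) array; labels: list of class labels, one per model.
```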
Classification
142
Classification using feature vectors
• Bayes’ classifier
– An image x is classified into the class i that maximizes the posterior probability p(w_i|x)
– Applying Bayes' theorem:
– p(x) is constant and independent of the class i; therefore, it has no influence on the classification
– If all classes have the same prior probability p(w_i), we can discard it too. Then:
p(w_i|x) = p(x|w_i) p(w_i) / p(x)

p(x) : probability of observing the image vector x
p(x|w_i) : likelihood; probability of observing the image vector x given that it belongs to class i
p(w_i|x) : posterior probability that x belongs to class i

argmax_i p(w_i|x) = argmax_i p(x|w_i)
Classification
72
143
Classification using feature vectors
• If we assume a normal distribution for each class, with mean m_i and covariance S_i estimated from the set of samples of the class:

• Discarding constants and applying the logarithm, we can derive the following discriminant function:
p(x|w_i) = (2π)^{-n/2} |S_i|^{-1/2} exp( -(1/2) (x - m_i)^T S_i^{-1} (x - m_i) )

D_i(x) = -log|S_i| - (x - m_i)^T S_i^{-1} (x - m_i)
Classification
144
Classification using feature vectors
• Neural networks
– They can be applied directly to the image or to a feature vector previously computed from the image
– The most used neural networks are multi-layer feed-forward networks

Y. LeCun et al. Backpropagation applied to handwritten zip code recognition. Neural Computation, vol. 1, pp. 541-551, 1989.

Each layer represents a feature subvector of a higher level
Classification
73
145
Classification using feature vectors
• A neural network is organized into several layers. Each layer has a fixed number of nodes

• In the first layer, the nodes correspond to the values in the feature vector

• In the last layer, the nodes correspond to each of the classes

• Intermediate (hidden) layers represent feature subvectors of a higher level

• The value at each node in a given layer is computed by applying a propagation function to the values of the nodes in the previous layer, weighted by a vector of weights
x_j^k = σ( b_j^k + Σ_{i=1..N(k-1)} w_{ji}^k · x_i^{k-1} )

x_j^k : value of node j at layer k
N(k) : number of nodes of layer k
b_j^k : bias of node j at layer k
w_{ji}^k : weight of the connection between node i of layer k-1 and node j of layer k
Classification
146
Classification using feature vectors
• To design a neural network, we have to decide:
– The number of layers
– The number of nodes at each layer

• Learning step, using a set of samples
– The weights of each connection are automatically determined in such a way that the classification error is minimized

• Example of neural network
– Three-layer perceptron (the input counts as one layer). The final discriminant function for each class is:
D_i(x) = f( b_i^(2) + Σ_j w_{ij}^(2) · f( b_j^(1) + Σ_k w_{jk}^(1) · x_k ) )

f(x) = 1 / (1 + e^{-x})   (sigmoid function)
Classification
74
147
Classification using structural features
• Dichotomous search. The presence or absence of certain primitives is tested in several steps. Character models can be organized with decision trees

• String or graph matching. Application of string-edit algorithms or matching of attributed graphs

• Grammars. We test whether the character belongs to the language generated by the grammar that represents the model
Classification
148
Classification using Deformable models
• Deformable models
– We start with an ideal representation of the shape of the object

– This ideal shape is deformed using a set of rules or pre-defined operations, in such a way that all possible valid distortions of the object can be generated

– Given an image, we look for the deformation of the object that yields the best value of an energy function defined as the combination of two measures:

• Internal energy: it measures the degree of deformation with respect to the model of the object

• External energy: it measures the degree of similarity between the deformation and the image

– The image is classified into the model with the lowest global energy
Classification
75
149
Classification using Deformable models
• Based on a character prototype:
– We define a prototype of the character that can be deformed by applying a series of trigonometric transforms over the image space

– Basis of the transform:

– External energy based on the distance from the deformation to the image contour

– Bayesian combination of internal and external energy
e_{mn}^x = ( 2 sin(πnx) cos(πmy), 0 )
e_{mn}^y = ( 0, 2 cos(πmx) sin(πny) )
Classification
150
Classification using Deformable models
• Based on a set of point generators located on a spline:
– Internal energy:
• The character is represented by a spline; we can modify the control points of the spline
• A probability is generated based on the modification of these control points

– External energy:
• Image points are generated from generators located along the spline
• A probability is defined based on the distance from the image pixels to the point generators

– Minimization:
• Probabilistic combination
• EM algorithm
Classification
76
151
Classification using Deformable models
• Point distribution model:
– The model is represented as the mean of a set of points obtained from the skeletons of the learning samples

– PCA is applied to obtain the set of valid character deformations from the learning set

– For each image, we find the nearest deformation according to the space defined by PCA

– Internal energy:

– External energy: distance between the image and the image obtained using the nearest deformation
x = x̄ + P b
b_x ≈ P^T (x - x̄)
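A possible NumPy sketch of the point distribution model above: PCA of the skeleton points gives the mean shape and the deformation modes P, and a new shape is projected to its parameters b. Shapes are assumed to be already normalised and flattened; the number of modes is a placeholder.

```python
import numpy as np

def pdm_fit(shapes, n_modes=5):
    """Point distribution model: mean shape plus the main PCA deformation modes.
    `shapes` is an (N, 2K) array with the K skeleton points of each sample flattened."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues as deformation modes P.
    P = eigvecs[:, np.argsort(eigvals)[::-1][:n_modes]]
    return mean, P

def pdm_project(x, mean, P):
    """b = P^T (x - x_mean): deformation parameters of a new shape x;
    its reconstruction in the model space is x_mean + P b."""
    b = P.T @ (x - mean)
    return b, mean + P @ b
```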
Classification
152
Holistic methods
• Used for recognition of handwritten script
• Each word is a recognition unit. We try to recognize each word using its global features. Each word is a different class
• Based on psychological evidence
• Applications:
– Constrained domains with only a few words (bank applications, personal agendas, etc.)
– Filtering of large domains to reduce the set of possible words

• Usually, they are based on the application of HMMs or dynamic programming (string edit) to whole words (not to letters)
Classification
77
153
Holistic methods
• Global word features:
– Distribution of segment orientations (horizontal, vertical, diagonal)
– Terminal points
– Concavities
– Holes
– Loops
– Word length
– Ascenders and descenders
– Crossing points with the central line
– Fourier coefficients

• Feature representation:
– Feature vectors or matrices: the word is divided into zones. Each zone corresponds to a component of the feature vector, where the presence or absence of the features is tested
– Graphs: adjacency or neighbourhood relations between the detected features
Classification
154
Hidden Markov Models
• Hidden Markov Model
– It represents a double stochastic process where a Markov chain is not directly observable; it can only be inferred through the observation of another stochastic process

– In an HMM, a sequence of observations O = (o_1, ..., o_T) is produced by a sequence of states Q = (q_1, ..., q_T)
– An HMM is modelled by λ = {V, S, Π, A, B, Γ}:

V = {v_k ; 1 ≤ k ≤ M} : set of observable symbols
S = {s_i ; 1 ≤ i ≤ N} : set of states
Π = {π_i}, π_i = P(q_1 = s_i) : initial probability of each state
A = {a_ij}, a_ij = P(q_{t+1} = s_j | q_t = s_i) : transition probabilities between states
Γ = {γ_j}, γ_j = P(q_T = s_j) : probability of the final state
B = {b_j(k)}, b_j(k) = P(o_t = v_k | q_t = s_j) : probability of observing a symbol in a given state

Hidden Markov Models
78
155
Hidden Markov Models
Example: a Markov model of the weather

States: S = {rainy, sunny}
Set of observations (possible values of humidity): V = {0%, 25%, 50%, 75%, 100%}
Sequence of observations: O = {0%, 25%, 25%, 50%, 25%, 75%, 50%}

(Diagram: two states, sunny (s) and rainy (p), with transition probabilities P(s|s), P(s|p), P(p|s), P(p|p) and, for each state, the emission probabilities P(0%|·), P(25%|·), P(50%|·), P(75%|·), P(100%|·))

With the model we can:
• Compute the probability that the sequence of observations can be generated by the weather model
• Given a set of observations of humidity, estimate the model of the weather
• Decide whether the weather was rainy or sunny on each of the observed days

Hidden Markov Models
156
Hidden Markov Models
• Three problems related to an HMM:
– Given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find P(O|λ), the probability that the sequence of observations can be generated by the model
• Computed by summing the probabilities of the sequence of observations over all possible sequences of states
• Forward/backward propagation methods

– Learning problem: given a learning set O, find the parameters of the model λ that maximize P(O|λ)
• Baum-Welch algorithm

– Recognition problem: given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find the optimal sequence of states Q
• Viterbi algorithm (see the sketch below)
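A minimal NumPy sketch of the Viterbi algorithm for the recognition problem, using the Π, A, B notation introduced before; the observations are assumed to be indices into the symbol set V.

```python
import numpy as np

def viterbi(pi, A, B, observations):
    """Most probable state sequence for a discrete HMM.
    pi[i]   : initial probability of state i
    A[i, j] : transition probability from state i to state j
    B[i, k] : probability of emitting symbol k in state i"""
    N, T = len(pi), len(observations)
    delta = np.zeros((T, N))           # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = pi * B[:, observations[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # previous state -> current state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, observations[t]]
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack the best path
        path.insert(0, int(psi[t][path[0]]))
    return path
```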
Hidden Markov Models
79
157
Hidden Markov Models. OCR applications
1. External segmentation with context post-processing:
• The Markov model represents grapheme variations, based on bigram or trigram frequencies: probabilities of finding sequences of consecutive letters in the dictionary of the language

2. Internal segmentation:
• The Markov chain represents character features extracted from left to right along the text. These features are compared with the model and, during recognition, it can be decided where a character ends and the next one begins

3. Holistic methods:
• The Markov chain represents variations within a given word belonging to a lexicon of the valid words

Hidden Markov Models
158
Hidden Markov Models. OCR applications
• One state per letter

• Transition probabilities are the probabilities of finding two consecutive letters in the language

• The observations are the pre-segmented zones of the image

• The probabilities of the observations given a state are the probabilities that each pre-segmented zone corresponds to the letter associated with the state
(Diagram: states a, m, n, ... with transition probabilities p_aa, p_am, p_ma, p_mm, p_mn, p_nm, p_nn)

Hidden Markov Models
External segmentation with context post-process
80
159
Hidden Markov Models. OCR applications
• Model-discriminant HMM:
– An HMM for each letter

– Recognition is done letter by letter

– States within an HMM represent zones within the letter with different feature values

– Word recognition is done by finding the best combination of individual HMMs according to all segmentation possibilities

Hidden Markov Models
Internal segmentation
160
Hidden Markov Models. OCR applications
• Path-discriminant HMM
– One HMM

– Each state corresponds to a letter

– Transitions between states correspond to the probability of changing from one letter to another within a word

– Over-segmentation of the image. Each observation corresponds to a set of features extracted from each segment

– The probability of each observation for a given state is the probability of generating the set of features from a given letter

– Recognition looks for the sequence of states (letters) that best generates the set of features extracted from the image

(Diagram: fully connected states a, b, c, ...)

Hidden Markov Models
Internal segmentation
81
161
Hidden Markov Models. OCR applications
• Example: HMM for word recognition
– One HMM for each word
– Each letter is represented with four states. Each state is a possible result of a previous over-segmentation
– Recognition looks for the HMM (word) that maximizes the probability of generating the sequence of observations (features extracted from the image)

(Diagram: HMM for a word 'c a ...', with four states per letter)

Hidden Markov Models
Holistic methods
162
Hidden Markov Models. OCR applications
• Feature extraction:
– Computation of 9 features inside a sliding window:
• Number of pixels
• Center of gravity
• Moments of order 2
• Location of the upper and lower contours
• Orientation of the upper and lower contours
• Number of black-white transitions in the vertical direction
• Number of black pixels between the upper and lower contours

• One HMM for each character
• HMMs are concatenated to compose words; recognition finds the combination of HMMs with the highest probability
• Results:
– Vocabulary: 2296 words
– Test set: 3850 words, 80 people
– Recognition rate: 82.05%
S. Gunter, H. Bunke: HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition, vol. 37, pp. 2069-2070, 2004.
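A rough Python sketch of the sliding-window feature extraction described above, computing a subset of the nine features (pixel count, centre of gravity, second-order moments, upper/lower contour position and vertical transitions); the window width and the convention 1 = black pixel are assumptions.

```python
import numpy as np

def window_features(window):
    """A few of the features listed above for one binary window (1 = black pixel)."""
    ys, xs = np.nonzero(window)
    if len(ys) == 0:
        return np.zeros(8)
    cy, cx = ys.mean(), xs.mean()                                  # center of gravity
    m2y, m2x = ((ys - cy) ** 2).mean(), ((xs - cx) ** 2).mean()    # 2nd-order moments
    upper, lower = ys.min(), ys.max()                              # upper / lower contour rows
    transitions = int(np.abs(np.diff(window.astype(int), axis=0)).sum())
    return np.array([len(ys), cy, cx, m2y, m2x, upper, lower, transitions])

def observation_sequence(word_image, width=4):
    """Slide a window of `width` columns from left to right over the word image."""
    return [window_features(word_image[:, c:c + width])
            for c in range(0, word_image.shape[1], width)]
```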
82
163
Components of an OCR system
ACQUISITION

DOCUMENT PRE-PROCESSING
• Filtering
• Binarization
• Skew correction

SEGMENTATION
• Layout analysis
• Text/graphics separation
• Character segmentation

CHARACTER PRE-PROCESSING
• Filtering
• Normalization

FEATURE EXTRACTION
• Image-based features
• Statistical features
• Transform-based features
• Structural features

CLASSIFICATION (uses Models produced by a LEARNING step)

POSTPROCESSING
• Context information
Post-process
164
Post-process
• Post-processing to improve OCR results:
– Voting: combination of several classifiers

– Use of context information: analysis of the classification of individual characters in the context of adjacent characters
Post-process
83
165
Post-process. Classifier Combination
Combination of specific classifiers with good performance for some characters, fonts, etc.

(Diagram: the text is processed by Classifier 1, Classifier 2, ..., Classifier k; their individual outputs are combined by a voting module, guided by knowledge, into the final output)

• Integrity: how the voting algorithm controls the activation and configuration of the individual classifiers. High integrity => the voting algorithm decides the best classifiers in each situation

• Representation of the classification results:
– Abstract: each classifier simply gives the label of the class
– Ranked: each classifier gives several ranked labels
– Ranked with a degree of confidence: for each label, the classifier gives a level of confidence

• Combination of the classifier results: how to combine the results (see the sketch below)
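Two minimal sketches of the combination step mentioned in the last bullet: abstract-level majority voting and confidence-level weighted voting; the classifier outputs used here are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Abstract-level combination: each classifier contributes one label and
    the most voted label wins (ties resolved arbitrarily here)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(outputs):
    """Confidence-level combination: each classifier gives (label, confidence)
    pairs and the confidences are accumulated per label."""
    scores = Counter()
    for ranked in outputs:
        for label, confidence in ranked:
            scores[label] += confidence
    return scores.most_common(1)[0][0]

# Hypothetical outputs of three classifiers for the same character image.
print(majority_vote(['s', 'S', 's']))                                          # -> 's'
print(weighted_vote([[('s', 0.7), ('8', 0.2)], [('S', 0.6)], [('s', 0.5)]]))   # -> 's'
```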
Post-process
166
Post-process. Classifier Combination
(Example: the same character image x is labelled differently by several classifiers, e.g. classifier 17 gives 's', classifier 21 gives 'S', classifier 4 gives 'g' and classifier 9 gives '8'; the combination module must decide the final label)
Post-process
84
167
Post-process. Context information
• Context analysis tries to correct errors produced by decisions taken on the basis of local features

• In the presence of uncertainty, the hypotheses generated by the local classifier are complemented with hypotheses about the neighbouring characters

• Two points of view:
– Geometric context (typographic)
– Linguistic context

(Example: the same shape can be read as '13' or as 'B' depending on the context)
Post-process
168
Post-process. Context information
• Methods based on n-grams (combinations of n letters):
– Probability that an n-gram appears in the words of the dictionary

– When there are characters with uncertainty, the final decision is taken according to the n-gram with the highest probability

– Bottom-up techniques

– Viterbi algorithm and Markov methods

• Methods based on grammars:
– A grammar is used to validate the results of the OCR

– Similar to n-grams, but they allow variable-length strings and recursion to be considered
(Example, in Catalan: an uncertain word can be read as 'vint-cents' or 'vuit-cents'; the grammar Xifra → Desena '-' Unitat | Unitat '-' Centena (Number → Tens '-' Unit | Unit '-' Hundred) only accepts 'vuit-cents', i.e. eight hundred)

Post-process
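A minimal sketch of n-gram-based disambiguation: candidate readings of an uncertain word are scored with letter-bigram probabilities; the probability table and the candidates below are illustrative.

```python
def word_score(word, bigram_prob):
    """Score a candidate reading as the product of its letter-bigram
    probabilities; unseen bigrams get a small smoothing value."""
    p = 1.0
    for a, b in zip(word, word[1:]):
        p *= bigram_prob.get(a + b, 1e-6)
    return p

# Hypothetical bigram probabilities estimated from the dictionary of the language.
bigram_prob = {'vu': 0.02, 'ui': 0.03, 'it': 0.04, 'vi': 0.02, 'in': 0.03, 'nt': 0.05}
candidates = ['vint', 'vuit']     # alternative readings of an uncertain word
best = max(candidates, key=lambda w: word_score(w, bigram_prob))
```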
85
169
Post-process. Context information
• Methods based on a dictionary
– Creation of a dictionary with the set of correct words

– It permits orthographic correction of the text

– Use of string-edit algorithms

– Problem: words not included in the dictionary

– Requires data structures for representing the dictionary that provide quick access
Automaton for the dictionary {LLIBRE, LLIURE, CAURE, COURE, COST}
(Figure: a trie-like automaton that shares common prefixes and suffixes of the dictionary words)

(Figure: a hash function h maps a word ('paraula') to one of n buckets of the dictionary)

Post-process
170
Examples of OCR systems
• Printed characters:– Readstar (Innovatic).
– Cuneiform (Cognitive Technology).
– Word Scan (Calera).
– OmniPage (Caere).
– Text Bridge (Xerox).
– Neuro Talker OCR (Int. Neural Machines).
– OCR Master.
– Recognita Plus.
– TypeReader Professional.
– Etc.
• Hand-written characters:– Mitek.
Examples
86
171
Examples of OCR systems
• Inspection of surgical sachets
– Digit recognition: reference and date

– System requirements:
• Irregular surface: shadows and reflections

• Resolution: 175 dpi

• Acquisition with a B/W camera

• Diffuse lighting

• Detection of 16 defects in 400 ms

• Verification system
Examples
172
Examples of OCR systems
• Pre-processing
– Skew correction: combination of the angle of the upper contour and the segments of the box surrounding the word LOT

– Binarization: Otsu's method

– Thinning
Examples
87
173
Examples of OCR systems
• Segmentation
– Connected components from the skeleton

– Application of domain knowledge (character size and separation) to segment touching or broken characters:

• Divide wide components

• Join thin and nearby components
Examples
174
Examples of OCR systems
• Feature extraction: zoning
– The size of each zone is not constant: it is adapted to the image size

– Two versions:
• Version 1, value at each zone: a measure of the importance of the zone with respect to the whole character:
– 1 if the number of pixels is greater than a percentage of the total number of pixels in the image; 0 otherwise

– The central region is more important: its value is multiplied by 2

– Three values are added to combine the values of the most discriminant zones

• Version 2, value at each zone: percentage of white pixels in the zone

(Figure: example zone weights 0.0 / 0.5 / 2.0 and the resulting binary zone values for some sample digits)
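A possible sketch of the zoning descriptor (closest to version 2 above): the character image is divided into a grid whose zone size adapts to the image, and the fraction of foreground pixels is measured per zone; the grid size and the convention 1 = foreground are assumptions.

```python
import numpy as np

def zoning_features(char_image, rows=3, cols=3):
    """Fraction of foreground pixels in each zone of a rows x cols grid;
    zone boundaries adapt to the size of the character image."""
    h, w = char_image.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            zone = char_image[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]
            feats.append(zone.mean())   # fraction of pixels set to 1 in the zone
    return np.array(feats)
```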
Examples
88
175
Examples of OCR systems
• Classification
– Version 1:
• Model with the minimal distance

• If several models have the same minimal distance and the digit to verify is among the candidates, it is verified but marked as ambiguous

– Version 2:
• Mahalanobis distance

• Learning step to compute the mean and covariance for each digit

• Verification of the digit string:
– The string is rejected if more than one digit is mis-recognized or more than two digits are ambiguous
d = Σ_{j=1..n} | i_j - m_j |   (city-block distance between the image features i_j and the model features m_j)
Examples
176
Bibliography
• S. Mori, H. Nishida, H. Yamada. Optical Character Recognition. John Wiley and Sons, 1999.

• H. Bunke, P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997.
•S.V. Rice, G. Nagy, T.A. Nartker. Optical Character Recognition: An illustrated guide to the frontier. Kluwer Academic Publishers. 1999.
• S. Impedovo. Fundamentals in Handwriting Recognition. Springer-Verlag, 1994.
• A. Belaïd, Y. Belaïd. Reconnaissance des formes. Méthodes et Applications. Inter Editions, Paris, 1992.
• A.C. Downton, S. Impedovo. Progress in Handwriting Recognition. World Scientific Publishing Company, 1997.
•P.S.P. Wang. Character and Handwriting Recognition: Expanding Frontiers. Special Issue of IJPRAI, Vol. 5 nums. 1,2, 1991.
• T. Pavlidis, S. Mori. Optical Character Recognition. Special Issue of Proceedings of the IEEE, Vol. 80 no. 7, 1992.
• V.K. Govindan, A.P. Shivaprasad. Character Recognition - A Review. Pattern Recognition, Vol. 23, No. 7, pp. 671-683, 1990.
Bibliography
89
177
Bibliography
• O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
• R.G. Casey, E. Lecolinet. A Survey of Methods and Strategies in Character Segmentation. IEEE Transactions on PAMI, Vol. 18 no. 7, pp. 690-706, 1996.
• Y. Lu. Machine Printed Character Segmentation – An Overview. Pattern Recognition, Vol. 28, n. 1, pp. 67-80, 1995.
• Y. Lu, M. Shridhar. Character Segmentation in Handwritten Words – An Overview. Pattern Recognition, Vol. 29, no. 1, pp. 77-96, 1996.

• S.W. Lee. Advances in Handwriting Recognition. World Scientific, 1999.

• C.H. Chen, L.F. Pau, P.S.P. Wang. Handbook of Pattern Recognition and Computer Vision. World Scientific, 1993.

• J.L. Blue et al. Evaluation of Pattern Classifiers for Fingerprint and OCR Applications. Pattern Recognition, Vol. 27, no. 4, pp. 485-501, 1994.

• R. Plamondon, S. Srihari. On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 63-84, 2000.
• G. Nagy. Twenty Years of Document Image Analysis in PAMI. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 38-62, 2000.
• H.Bunke, T. Caelli. Hidden Markov Models. Applications in Computer Vision. World Scientific. 2001.
• R. Duda, P. Hart, D. Stork. Pattern Classification. 2nd ed., Wiley Interscience, 2000.
Bibliography
178
Practical work
Document image → Binarization (locally adaptive) → Binary image → Layout Analysis → N binary text images → Character Segmentation → N character images → Feature Extraction → Feature vector → Classification (multiple classifiers) → Character label
Groups of 2 people for each task