OCR: a survey
Csink László, 2009
Problems to Solve
Recognize good quality printed text
Recognize neatly written handprinted text
Recognize omnifont machine-printed text
Deal with degraded, bad quality documents
Recognize unconstrained handwritten text
Lower substitution error rates
Lower rejection rates
OCR according to Nature of Input
Feature Extraction
A large number of feature extraction methods are available in the literature for OCR.
Which method suits which application?
A Typical OCR System
1. Gray-level scanning (300-600 dpi)
2. Preprocessing
   – Binarization using a global or locally adaptive method
   – Segmentation to isolate individual characters
   – (optional) conversion to another character representation (e.g. skeleton or contour curve)
3. Feature extraction
4. Recognition using classifiers
5. Contextual verification or post-processing
Feature Extraction (Devijver and Kittler)
Feature extraction = the problem of extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class variability while enhancing the between-class pattern variability.
Extracted features must be invariant to the expected distortions and variations.
Curse of dimensionality = if the training set is small, the number of features cannot be high either.
Rule of thumb: number of training patterns = 10 × (dimension of the feature vector)
Some issues
Do the characters have known orientation and size?
Are they handwritten, machine-printed or typed?
Degree of degradation?
If a character may be written in two ways (e.g. ‘a’ or ‘α’), it might be represented by two patterns.
Variations of the same character
Size invariance can be achieved by normalization, but norming can cause discontinuities in the character
Rotation invariance is important if characters may appear in any orientation (P or d ?)
Skew invariance is important for hand-printed text or multifont machine-printed text
Features Extracted from Grayscale Images
Goal: locate candidate characters. If the image is binarized, one may find the connected components of expected character size by a flood-fill type algorithm (4-way recursive method, 8-way recursive method, non-recursive scanline method etc.; see http://www.codeproject.com/KB/GDI/QuickFill.aspx).
Then the bounding box is found.
A grayscale method is typically used when recognition based on the binary representation fails. Then the localization may be difficult. Assuming that there is a standard size for a character, one may simply try all possible locations.
In a good case, after localization one has a subimage containing one character and no other objects.
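As an illustration (not from the slides), a minimal stack-based flood fill that labels 4-connected components and returns their bounding boxes might look like this; `img` is assumed to be a binarized image stored as nested lists with 1 = foreground:

```python
def connected_components(img):
    # Label 4-connected foreground components with an explicit stack
    # (avoids deep recursion) and return each component's bounding box
    # as (x0, y0, x1, y1), inclusive.
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1 and not seen[y][x]:
                stack = [(y, x)]
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while stack:
                    cy, cx = stack.pop()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes
```

Components whose boxes are far from the expected character size can then be discarded.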
Template Matching
(not often used in OCR systems for grayscale characters)
No feature extraction is used; the template character image itself is compared to the input character image:
D_j = Σ_{i=1}^{M} [Z(x_i, y_i) − T_j(x_i, y_i)]²

where the character Z and the template T_j are of the same size and the summation is taken over all M pixels of Z. The problem is to find the j for which D_j is minimal; then Z is identified with T_j.
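A direct transcription of this rule as a sketch (Z and the templates are plain nested lists of gray values; all names are illustrative):

```python
def template_distance(Z, T):
    # D_j: sum of squared pixel differences between character Z and template T
    return sum((z - t) ** 2
               for z_row, t_row in zip(Z, T)
               for z, t in zip(z_row, t_row))

def classify(Z, templates):
    # Pick the index j for which D_j is minimal
    return min(range(len(templates)),
               key=lambda j: template_distance(Z, templates[j]))
```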
Limitations of Template Matching
Characters and templates must be of the same size.
The method is not invariant to changes in illumination.
Very vulnerable to noise.
In template matching, all pixels are used as features. It is a better idea to apply unitary (distance-preserving) transforms to the character images, obtaining a reduction of features while preserving most of the information about the character shape.
The Radon Transform
The Radon transform computes projections of an image matrix along specified directions.
A projection of a two-dimensional function f(x,y) is a set of line integrals. The Radon function computes the line integrals from multiple sources along parallel paths, or beams, in a certain direction.
The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes multiple, parallel-beam projections of the image from different angles by rotating the source around the center of the image. The following figure shows a single projection at a specified rotation angle.
Projections to Various Axes
Zoning
Consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the average gray level in each part, yielding a 25-length feature vector.
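A minimal sketch of this 5×5 zoning over a grayscale subimage (plain nested lists; the exact way the grid cells are cut is an assumption, and the subimage is assumed to be at least n pixels in each dimension):

```python
def zoning_features(Z, n=5):
    # Average gray level in each cell of an n-by-n grid over the subimage,
    # yielding an n*n-length feature vector (25 features for n = 5).
    h, w = len(Z), len(Z[0])
    feats = []
    for r in range(n):
        for c in range(n):
            ys = range(r * h // n, (r + 1) * h // n)
            xs = range(c * w // n, (c + 1) * w // n)
            cell = [Z[y][x] for y in ys for x in xs]
            feats.append(sum(cell) / len(cell))
    return feats
```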
Thinning
Thinning is possible both for grayscale and for binary images.
Thinning = skeletonization of characters.
Advantage: few features, easy to extract.
The informal definition of a skeleton is a line representation of an object that: i) is one pixel thick, ii) passes through the "middle" of the object, and iii) preserves the topology of the object.
When No Skeleton Exists
a) It is impossible to generate a one-pixel-wide skeleton lying in the middle
b) No pixel can be left out while preserving the connectedness
Possible Defects
Specific defects of the data may cause misrecognition:
Small holes → loops in the skeleton
Single-element irregularities → false tails
Acute angles → false tails
How Thinning Works
Most thinning algorithms rely on the erosion of the boundary while maintaining connectivity; see http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Morpholo.html for mathematical morphology.
To avoid defects, preprocessing is desirable.
As an example, in a black and white application:
– Remove very small holes
– Remove black elements having fewer than 3 black neighbours and having connectivity 1
An Example of Noise Removal
This pixel will be removed (N=1; has 1 black neighbour)
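One plausible reading of the removal rule above as code (a hypothetical helper, not from the slides; the neighbour ordering and the interpretation of "connectivity 1" as a single run of black neighbours around the pixel are assumptions):

```python
def is_removable(nbrs):
    # nbrs: the 8 neighbour values of a black pixel, listed clockwise
    # around the pixel (1 = black, 0 = white).
    black = sum(nbrs)
    # Connectivity: number of 0 -> 1 transitions around the ring,
    # i.e. the number of distinct runs of black neighbours.
    runs = sum(1 for i in range(8)
               if nbrs[i] == 0 and nbrs[(i + 1) % 8] == 1)
    # Remove the pixel if it has fewer than 3 black neighbours
    # and its black neighbours form a single connected run.
    return black < 3 and runs == 1
```

In the figure's example (N = 1, a single black neighbour) the pixel is removed.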
Generation of Feature Vectors Using Invariant Moments
Given a grayscale subimage Z containing a character candidate, the moments of order p+q are defined by

m_pq = Σ_{i=1}^{M} x_i^p y_i^q Z(x_i, y_i)
where the sum is taken over all M pixels of the subimage.
The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:
μ_pq = Σ_{i=1}^{M} (x_i − x̄)^p (y_i − ȳ)^q Z(x_i, y_i)

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00.
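The two definitions translate directly into a sketch over a nested-list subimage Z (illustrative names, not from the slides):

```python
def moment(Z, p, q):
    # m_pq = sum over all pixels of x^p * y^q * Z(x, y)
    return sum(x ** p * y ** q * Z[y][x]
               for y in range(len(Z)) for x in range(len(Z[0])))

def central_moment(Z, p, q):
    # mu_pq: shift the origin to the center of gravity (xbar, ybar)
    m00 = moment(Z, 0, 0)
    xbar = moment(Z, 1, 0) / m00
    ybar = moment(Z, 0, 1) / m00
    return sum((x - xbar) ** p * (y - ybar) ** q * Z[y][x]
               for y in range(len(Z)) for x in range(len(Z[0])))
```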
Hu’s (1962) Central Moments
η_pq = μ_pq / μ_00^γ, where γ = (p+q)/2 + 1 and p+q = 2, 3, …

The η_pq are invariant to scale.
The M_i (Hu’s moment invariants) are invariant to rotation.
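The scale normalization is a one-liner, assuming μ_pq and μ_00 have already been computed (illustrative helper, not from the slides):

```python
def eta(mu_pq, mu00, p, q):
    # Scale-normalized central moment: eta_pq = mu_pq / mu00**gamma,
    # with gamma = (p + q) / 2 + 1.
    return mu_pq / mu00 ** ((p + q) / 2 + 1)
```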
K-Nearest Neighbor Classification
Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3, it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5, it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).
Disadvantage in practice: the distances from the green circle to all blue squares and all red triangles have to be computed, which may take much time.
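A minimal brute-force k-NN matching the figure's description (squared Euclidean distance, which orders neighbours the same as Euclidean, and a majority vote; all names are illustrative):

```python
def knn_classify(test, samples, k):
    # samples: list of (feature_vector, label) pairs.
    # Sort all samples by distance to the test point (the brute-force
    # step the slide warns about), then vote among the k nearest.
    ranked = sorted(samples,
                    key=lambda s: sum((a - b) ** 2
                                      for a, b in zip(test, s[0])))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)
```

With the slide's configuration (2 triangles and 1 square nearby, 3 squares further out), k = 3 yields the triangle class and k = 5 the square class.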
From now on we will deal with binary (black and white) images only.
Projection Histograms
These methods are typically used for
– segmenting characters, words and text lines
– detecting if a scanned text page is rotated
But they can also provide features for recognition!
• Using the same number of bins on each axis – and dividing by the total number of pixels – the features can be made scale independent
• Projection to the y-axis is slant invariant, but projection to the x-axis is not
• Histograms are very sensitive to rotation
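A sketch of scale-independent projection features along both axes (the exact binning scheme is an assumption; `img` is a binary nested-list image):

```python
def projection_histograms(img, bins):
    # Project black-pixel counts onto the y-axis (row sums) and the
    # x-axis (column sums), resample each into a fixed number of bins,
    # and divide by the total pixel count for scale independence.
    h, w = len(img), len(img[0])
    total = sum(sum(row) for row in img) or 1
    rows = [sum(img[y][x] for x in range(w)) for y in range(h)]
    cols = [sum(img[y][x] for y in range(h)) for x in range(w)]

    def rebin(v, n):
        return [sum(v[i * len(v) // n:(i + 1) * len(v) // n]) / total
                for i in range(n)]

    return rebin(rows, bins) + rebin(cols, bins)
```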
Comparison of Histograms
It seems plausible to compare two histograms y1 and y2 (where n is the number of bins) in the following way:

d = Σ_{i=1}^{n} |y1(x_i) − y2(x_i)|

However, a dissimilarity based on cumulative histograms is less sensitive to errors. Define the cumulative histogram Y as follows:

Y(x_k) = Σ_{i=1}^{k} y(x_i)

For the cumulative histograms Y1 and Y2, define D as:

D = Σ_{i=1}^{n} |Y1(x_i) − Y2(x_i)|
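Both dissimilarities side by side as a sketch (illustrative names; note how a one-bin shift costs less under the cumulative version, which is the point of the slide):

```python
def hist_distance(y1, y2):
    # d: naive bin-by-bin dissimilarity
    return sum(abs(a - b) for a, b in zip(y1, y2))

def cumulative_distance(y1, y2):
    # D: the same dissimilarity computed on cumulative histograms,
    # which is less sensitive to small bin shifts.
    def cumulate(y):
        out, s = [], 0
        for v in y:
            s += v
            out.append(s)
        return out
    return sum(abs(a - b) for a, b in zip(cumulate(y1), cumulate(y2)))
```

For example, for the one-bin shift y1 = [0, 1, 0] vs. y2 = [1, 0, 0], d = 2 but D = 1.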
Zoning for Binary Characters 1
Contour extraction or thinning may be unusable for self-touching characters.
This kind of error often occurs in degraded machine-printed texts (generations of photocopying).
The self-touching problem may be healed by morphological opening.
Zoning for Binary Characters 2
Similarly to the grayscale case, we consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the number of black pixels in each part, yielding a 25-length feature vector.
Generation of Moments in the Binary Case
Given a binary subimage Z containing a character candidate, the moments of order p+q are defined by

m_pq = Σ_{black pixels} x_i^p y_i^q
where the sum is taken over all black pixels of the subimage
The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:
μ_pq = Σ_{black pixels} (x_i − x̄)^p (y_i − ȳ)^q

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00.
The central moments can be used similarly to the grayscale case.
η_pq = μ_pq / μ_00^γ, where γ = (p+q)/2 + 1 and p+q = 2, 3, …

The η_pq are invariant to scale.
The M_i are invariant to rotation.
Contour Profiles
The profiles may be outer profiles or inner profiles. To construct profiles, find the uppermost and lowermost pixels on the contour. The contour is split at these points. To obtain the outer profiles, for each y select the outermost x on each contour half.
Profiles to the other axis can be constructed similarly.
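A sketch of extracting the outermost black pixels row by row (the "split at the uppermost and lowermost points" step is simplified here to a per-row min/max; `img` is a binary nested-list image):

```python
def outer_profiles(img):
    # For each row y that contains black pixels, record the leftmost
    # and rightmost black x: (y, x_L, x_R). These are the left and
    # right outer profiles with respect to the y-axis.
    profiles = []
    for y, row in enumerate(img):
        xs = [x for x, v in enumerate(row) if v == 1]
        if xs:
            profiles.append((y, min(xs), max(xs)))
    return profiles
```

Swapping the roles of rows and columns gives the profiles for the other axis.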
Features Generated by Contour Profiles
First differences of profiles: x'_L(y) = x_L(y+1) − x_L(y)
Width: w(y) = x_R(y) − x_L(y)
Height / max_y(w(y))
Location of minima and maxima of the profiles
Location of peaks in the first differences (which may indicate discontinuities)
Zoning on Contour Curves 1 (Kimura & Sridhar)
Enlarged zone
A feature vector of size (4×4)×4 is generated
Zoning on Contour Curves 2 (Takahashi)
Contour codes were extracted from inner contours (if any) as well as outer contours; the feature vector had dimension (4×6×6×6)×4×(2)
(size × four directions × (inner and outer))
Zoning on Contour Curves 3 (Cao)
When the contour curve is close to a zone border, small variations in the curve may lead to large variations in the feature vector
Solution: fuzzy border
Zoning of Skeletons
Features: length of the character graph in each zone (9 or 3).
By dividing the length with the total length of the graph, size independence can be achieved.
Additional features: the presence or absence of junctions or endpoints
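A sketch of the length-fraction features, treating the skeleton as a list of (x, y) pixels inside a w×h bounding box and approximating the graph length in a zone by its pixel count (the grid indexing is an assumption):

```python
def skeleton_zone_features(pixels, w, h, n=3):
    # Fraction of skeleton pixels falling in each cell of an n-by-n grid.
    # Dividing by the total number of skeleton pixels makes the features
    # size independent, as noted above.
    counts = [0] * (n * n)
    for x, y in pixels:
        counts[(y * n // h) * n + (x * n // w)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]
```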
The Neural Network Approach for Digit Recognition
Le Cun et al.:
• Each character is scaled to a 16×16 grid
• Three intermediate hidden layers
• Training on a large set
Advantage:
• feature extraction is automatic
Disadvantage:
• We do not know how it works
• The output set (here 0-19) is small