
Page 1: OCR  a survey

OCR a survey

Csink László 2009

Page 2: OCR  a survey

Problems to Solve

– Recognize good quality printed text
– Recognize neatly written handprinted text
– Recognize omnifont machine-printed text
– Deal with degraded, bad quality documents
– Recognize unconstrained handwritten text
– Lower substitution error rates
– Lower rejection rates

Page 3: OCR  a survey

OCR according to Nature of Input

Page 4: OCR  a survey

Feature Extraction

A large number of feature extraction methods are available in the literature for OCR.

Which method suits which application?

Page 5: OCR  a survey

A Typical OCR System

1. Gray-level scanning (300-600 dpi)
2. Preprocessing
   – Binarization using a global or locally adaptive method
   – Segmentation to isolate individual characters
   – (optional) conversion to another character representation (e.g. skeleton or contour curve)
3. Feature extraction
4. Recognition using classifiers
5. Contextual verification or post-processing
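The global binarization in step 2 can be sketched with Otsu's method, a standard global-thresholding technique; the slide does not name a specific method, so this is a minimal NumPy sketch under that assumption (function names are illustrative):

```python
import numpy as np

def otsu_threshold(gray):
    """Global threshold maximizing between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # probability of class 0 at each threshold
    mu = np.cumsum(p * np.arange(256))   # cumulative mean
    mu_t = mu[-1]
    # between-class variance for every candidate threshold
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def binarize(gray):
    """Foreground = pixels brighter than the global threshold."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

A locally adaptive method would instead compute a separate threshold per image window; the global variant suffices for clean scans.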

Page 6: OCR  a survey

Feature Extraction (Devijver and Kittler)

Feature extraction = the problem of extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class variability while enhancing the between-class pattern variability.

Extracted features must be invariant to the expected distortions and variations.

Curse of dimensionality: if the training set is small, the number of features cannot be high either.

Rule of thumb: number of training patterns ≈ 10 × (dimension of the feature vector); e.g. a 25-dimensional feature vector calls for roughly 250 training patterns.

Page 7: OCR  a survey

Some issues

– Do the characters have known orientation and size?
– Are they handwritten, machine-printed or typed?
– Degree of degradation?
– If a character may be written in two ways (e.g. ‘a’ or ‘α’), it might be represented by two patterns.

Page 8: OCR  a survey

Variations of the same character

Size invariance can be achieved by normalization, but norming can cause discontinuities in the character.

Rotation invariance is important if characters may appear in any orientation (‘p’ or ‘d’?).

Skew invariance is important for hand-printed text or multifont machine-printed text.

Page 9: OCR  a survey

Features Extracted from Grayscale Images

Goal: locate candidate characters. If the image is binarized, one may find the connected components of expected character size by a flood-fill type algorithm (4-way recursive method, 8-way recursive method, non-recursive scanline method etc.; see http://www.codeproject.com/KB/GDI/QuickFill.aspx). Then the bounding box is found.

A grayscale method is typically used when recognition based on the binary representation fails. Then the localization may be difficult. Assuming that there is a standard size for a character, one may simply try all possible locations.

In a good case, after localization one has a subimage containing one character and no other objects.
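The non-recursive flood-fill idea can be sketched with a queue-based 4-way fill that also tracks each component's bounding box (a minimal NumPy sketch; names are illustrative):

```python
from collections import deque
import numpy as np

def connected_components(binary):
    """4-way connected components of a 0/1 image via a non-recursive
    (queue-based) flood fill; returns a list of bounding boxes
    (top, left, bottom, right), inclusive coordinates."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                top, left, bottom, right = sy, sx, sy, sx
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes
```

Components whose boxes are far from the expected character size can then be discarded as noise or non-text.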

Page 10: OCR  a survey

Template Matching
(not often used in OCR systems for grayscale characters)

No feature extraction is used; the template character image itself is compared to the input character image:

D_j = Σ_{i=1}^{M} [ Z(x_i, y_i) − T_j(x_i, y_i) ]²

where the character Z and the template T_j are of the same size and the summation is taken over all the M pixels of Z. The problem is to find the j for which D_j is minimal; then Z is identified with T_j.
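The matching rule above can be sketched directly (a minimal NumPy sketch; the function name is illustrative):

```python
import numpy as np

def best_template(Z, templates):
    """Return the index j minimizing D_j = sum over all pixels
    of (Z - T_j)^2, identifying Z with template T_j."""
    dists = [np.sum((Z.astype(float) - T.astype(float)) ** 2)
             for T in templates]
    return int(np.argmin(dists))
```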

Page 11: OCR  a survey

Limitations of Template Matching

– Characters and templates must be of the same size.
– The method is not invariant to changes in illumination.
– Very vulnerable to noise.

In template matching, all pixels are used as features. It is a better idea to apply unitary (distance-preserving) transforms to character images, obtaining a reduction of features while preserving most of the information of the character shape.

Page 12: OCR  a survey

The Radon Transform

The Radon transform computes projections of an image matrix along specified directions.

A projection of a two-dimensional function f(x,y) is a set of line integrals. The Radon function computes the line integrals from multiple sources along parallel paths, or beams, in a certain direction.

The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes multiple, parallel-beam projections of the image from different angles by rotating the source around the center of the image. The following figure shows a single projection at a specified rotation angle.

Page 13: OCR  a survey

Projections to Various Axes

Page 14: OCR  a survey


Page 15: OCR  a survey

Zoning

Consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the average gray level in each part, yielding a 25-length feature vector.
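The zoning recipe above can be sketched as follows (a minimal NumPy sketch; uneven cell sizes are handled by rounding the cell boundaries):

```python
import numpy as np

def zoning_features(gray, zones=5):
    """Average gray level in each cell of a zones×zones grid over
    the character's bounding-box subimage -> zones² feature vector."""
    h, w = gray.shape
    ys = np.linspace(0, h, zones + 1).astype(int)  # row cell boundaries
    xs = np.linspace(0, w, zones + 1).astype(int)  # column cell boundaries
    feats = [gray[ys[i]:ys[i+1], xs[j]:xs[j+1]].mean()
             for i in range(zones) for j in range(zones)]
    return np.array(feats)
```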

Page 16: OCR  a survey

Thinning

– Thinning is possible both for grayscale and for binary images.
– Thinning = skeletonization of characters.
– Advantage: few features, easy to extract.

The informal definition of a skeleton is a line representation of an object that is: i) one pixel thick, ii) through the "middle" of the object, and iii) preserves the topology of the object.

Page 17: OCR  a survey

When No Skeleton Exists

a) Impossible to generate a one-pixel-wide skeleton that runs through the middle

b) No pixel can be left out while preserving the connectedness

Page 18: OCR  a survey

Possible Defects

Specific defects of data may cause misrecognition:
– Small holes → loops in skeleton
– Single-element irregularities → false tails
– Acute angles → false tails

Page 19: OCR  a survey

How Thinning Works

Most thinning algorithms rely on the erosion of the boundary while maintaining connectivity; see http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Morpholo.html for mathematical morphology.

To avoid defects, preprocessing is desirable. As an example, in a black and white application:
– They remove very small holes
– They remove black elements having fewer than 3 black neighbours and having connectivity 1

Page 20: OCR  a survey

An Example of Noise Removal

This pixel will be removed (N=1; has 1 black neighbour)
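This removal rule can be sketched as follows. Here "connectivity 1" is interpreted as at most one black run among the 8 neighbours, counted as 0→1 transitions while walking the neighbours in a circle; that is one common reading, so this is an illustrative sketch rather than the exact rule of the cited system:

```python
import numpy as np

def denoise(binary):
    """Remove black (1) pixels with fewer than 3 black 8-neighbours
    and connectivity <= 1 (at most one black run in the neighbour
    ring), per the preprocessing rule on the previous slide."""
    padded = np.pad(binary, 1)
    out = binary.copy()
    # the 8 neighbours in circular order
    offs = [(-1, 0), (-1, 1), (0, 1), (1, 1),
            (1, 0), (1, -1), (0, -1), (-1, -1)]
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            if not binary[y, x]:
                continue
            ring = [padded[y + 1 + dy, x + 1 + dx] for dy, dx in offs]
            n_black = sum(ring)
            transitions = sum(ring[i] == 0 and ring[(i + 1) % 8] == 1
                              for i in range(8))
            if n_black < 3 and transitions <= 1:
                out[y, x] = 0  # e.g. the N=1 pixel in the figure
    return out
```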

Page 21: OCR  a survey

Generation of Feature Vectors Using Invariant Moments

Given a grayscale subimage Z containing a character candidate, the moments of order p+q are defined by

m_pq = Σ_{i=1}^{M} x_i^p · y_i^q · Z(x_i, y_i)

where the sum is taken over all M pixels of the subimage.

The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:

μ_pq = Σ_{i=1}^{M} (x_i − x̄)^p · (y_i − ȳ)^q · Z(x_i, y_i)

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00.

Page 22: OCR  a survey

Hu’s (1962) Central Moments

η_pq = μ_pq / μ_00^γ, where γ = (p+q)/2 + 1, p+q = 2, 3, …

The η_pq are invariant to scale.

The M_i (moment invariants built from the η_pq) are invariant to rotation.
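The raw, central and scale-normalized moments of the last two slides can be sketched as follows (a minimal NumPy sketch; note that on discrete images the scale invariance of η_pq is only approximate):

```python
import numpy as np

def raw_moment(Z, p, q):
    """m_pq = sum over all pixels of x^p * y^q * Z(x, y)."""
    ys, xs = np.mgrid[:Z.shape[0], :Z.shape[1]]
    return float(np.sum(xs**p * ys**q * Z))

def central_moment(Z, p, q):
    """mu_pq: moments about the center of gravity (translation invariant)."""
    m00 = raw_moment(Z, 0, 0)
    xbar = raw_moment(Z, 1, 0) / m00
    ybar = raw_moment(Z, 0, 1) / m00
    ys, xs = np.mgrid[:Z.shape[0], :Z.shape[1]]
    return float(np.sum((xs - xbar)**p * (ys - ybar)**q * Z))

def scaled_moment(Z, p, q):
    """eta_pq = mu_pq / mu_00^gamma with gamma = (p+q)/2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(Z, p, q) / central_moment(Z, 0, 0)**gamma
```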

Page 23: OCR  a survey

K-Nearest Neighbor Classification

Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).

Disadvantage in practice: the distance of the green circle to all blue squares and to all red triangles has to be computed; this may take much time.
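A minimal k-NN sketch (Euclidean distance assumed; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training
    patterns. Note the practical drawback from the slide: the
    distance to *every* training pattern is computed."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Index structures (k-d trees, etc.) can reduce the cost of the exhaustive distance computation.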

Page 24: OCR  a survey

From now on we will deal with binary (black and white) images only.

Page 25: OCR  a survey

Projection Histograms

These methods are typically used for
– segmenting characters, words and text lines
– detecting if a scanned text page is rotated
But they can also provide features for recognition!

– Using the same number of bins on each axis, and dividing by the total number of pixels, the features can be made scale independent.
– Projection onto the y-axis is slant invariant, but projection onto the x-axis is not.
– Histograms are very sensitive to rotation.
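The scale-independent projection features can be sketched as follows (a minimal NumPy sketch; the bin count is an illustrative parameter):

```python
import numpy as np

def projection_histograms(binary, bins=8):
    """Black-pixel counts projected onto each axis, resampled to a
    fixed number of bins and divided by the total black-pixel count,
    making the 2*bins features scale independent."""
    total = binary.sum()
    feats = []
    for axis in (0, 1):   # axis=0: projection onto x; axis=1: onto y
        proj = binary.sum(axis=axis).astype(float)
        # resample to `bins` bins by summing equal slices
        edges = np.linspace(0, proj.size, bins + 1).astype(int)
        feats.extend(proj[edges[i]:edges[i+1]].sum() / total
                     for i in range(bins))
    return np.array(feats)
```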

Page 26: OCR  a survey

Comparison of Histograms

It seems plausible to compare two histograms y1 and y2 (where n is the number of bins) in the following way:

d = Σ_{i=1}^{n} | y1(x_i) − y2(x_i) |

However, the dissimilarity using cumulative histograms is less sensitive to errors. Define the cumulative histogram Y as follows:

Y(x_k) = Σ_{i=1}^{k} y(x_i)

For the cumulative histograms Y1 and Y2, define D as:

D = Σ_{i=1}^{n} | Y1(x_i) − Y2(x_i) |
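Both dissimilarities on this slide can be sketched in a few lines; the example below also shows why the cumulative variant is gentler on small bin shifts (a minimal NumPy sketch; absolute differences assumed):

```python
import numpy as np

def hist_distance(y1, y2):
    """d: bin-by-bin absolute-difference dissimilarity."""
    return float(np.abs(np.asarray(y1) - np.asarray(y2)).sum())

def cumulative_hist_distance(y1, y2):
    """D: the same dissimilarity computed on cumulative histograms;
    mass shifted to a neighbouring bin changes D much less than d."""
    return float(np.abs(np.cumsum(y1) - np.cumsum(y2)).sum())
```

For y1 = [0, 4, 0, 0] and y2 = [0, 0, 4, 0] (the same mass, one bin over), d doubles the full mass while D charges only the one-bin shift.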

Page 27: OCR  a survey

Zoning for Binary Characters 1

Contour extraction or thinning may be unusable for self-touching characters.

This kind of error often occurs with degraded machine-printed texts (generations of photocopying).

The self-touching problem may be healed by morphological opening.
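Morphological opening (erosion followed by dilation) can be sketched with plain NumPy; the 3×3 cross structuring element is an illustrative choice:

```python
import numpy as np

def erode(img, se):
    """Binary erosion: a pixel stays 1 only if the structuring
    element fits entirely inside the foreground there."""
    pad = se.shape[0] // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = p[y:y + se.shape[0], x:x + se.shape[1]]
            out[y, x] = np.all(win[se == 1] == 1)
    return out

def dilate(img, se):
    """Binary dilation: a pixel becomes 1 if the structuring
    element hits any foreground pixel there."""
    pad = se.shape[0] // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = p[y:y + se.shape[0], x:x + se.shape[1]]
            out[y, x] = np.any(win[se == 1] == 1)
    return out

def opening(img, se):
    """Opening = erosion then dilation; removes strokes thinner
    than the structuring element, e.g. a self-touching bridge."""
    return dilate(erode(img, se), se)
```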

Page 28: OCR  a survey

2828

Zoning for Binary Characters 2

Similarly to the grayscale case, we consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the number of black pixels in each part, yielding a 25-length feature vector.

Page 29: OCR  a survey

Generation of Moments in the Binary Case

Given a binary subimage Z containing a character candidate, the moments of order p+q are defined by

m_pq = Σ_{black pixels} x_i^p · y_i^q

where the sum is taken over all black pixels of the subimage.

The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:

μ_pq = Σ_{black pixels} (x_i − x̄)^p · (y_i − ȳ)^q

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00.

Page 30: OCR  a survey

The Central Moments can be used similarly to the grayscale case

η_pq = μ_pq / μ_00^γ, where γ = (p+q)/2 + 1, p+q = 2, 3, …

The η_pq are invariant to scale.

The M_i are invariant to rotation.

Page 31: OCR  a survey

Contour Profiles

The profiles may be outer profiles or inner profiles. To construct profiles, find the uppermost and lowermost pixels on the contour. The contour is split at these points. To obtain the outer profiles, for each y select the outermost x on each contour half.

Profiles to the other axis can be constructed similarly.
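The outer-profile construction can be sketched per row; for solid characters, taking the leftmost and rightmost black pixel of each row gives the same outer profiles as splitting the contour at its extreme points (a minimal NumPy sketch; names are illustrative):

```python
import numpy as np

def outer_profiles(binary):
    """For each row y: x_L(y) = leftmost black pixel, x_R(y) =
    rightmost black pixel (np.nan for rows with no black pixels)."""
    h = binary.shape[0]
    left = np.full(h, np.nan)
    right = np.full(h, np.nan)
    for y in range(h):
        xs = np.flatnonzero(binary[y])
        if xs.size:
            left[y], right[y] = xs[0], xs[-1]
    return left, right
```

The width feature of the next slide is then simply `right - left`, and first differences are `np.diff(left)`.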

Page 32: OCR  a survey

Features Generated by Contour Profiles

– First differences of the profiles: x'_L(y) = x_L(y+1) − x_L(y)
– Width: w(y) = x_R(y) − x_L(y)
– Height / max_y(w(y))
– Location of minima and maxima of the profiles
– Location of peaks in the first differences (which may indicate discontinuities)

Page 33: OCR  a survey

Zoning on Contour Curves 1 (Kimura & Sridhar)

Enlarged zone

A feature vector of size (4×4) × 4 is generated.

Page 34: OCR  a survey

Zoning on Contour Curves 2 (Takahashi)

Contour codes were extracted from inner contours (if any) as well as outer contours; the feature vector had dimension (4×6×6×6) × 4 × (2)

(size ×four directions × (inner and outer))

Page 35: OCR  a survey

Zoning on Contour Curves 3 (Cao)

When the contour curve is close to a zone border, small variations in the curve may lead to large variations in the feature vector.

Solution:

Fuzzy border

Page 36: OCR  a survey

Zoning of Skeletons

Features: length of the character graph in each zone (9 or 3).

By dividing the length by the total length of the graph, size independence can be achieved.

Additional features: the presence or absence of junctions or endpoints.

Page 37: OCR  a survey

The Neural Network Approach for Digit Recognition

Le Cun et al:

• Each character is scaled to a 16×16 grid

• Three intermediate hidden layers

• Training on a large set

Advantage:

• feature extraction is automatic

Disadvantage:

• We do not know how it works

• The output set (here 0-9) is small
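As a rough illustration of the shape of such a network (not Le Cun's actual architecture, which used local receptive fields and shared weights rather than plain dense layers), a minimal forward pass over a 16×16 input with three hidden layers; all layer sizes and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    """One dense layer with small random weights (illustrative only;
    a real system would train these by backpropagation)."""
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    """Forward pass: tanh hidden activations, linear output scores."""
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b   # one raw score per digit class

# 16x16 input -> three hidden layers -> 10 digit outputs
net = [layer(256, 64), layer(64, 32), layer(32, 16), layer(16, 10)]
scores = forward(rng.normal(size=256), net)
```

Feature extraction is implicit: the first layers learn their own features from the scaled character image, which is exactly the advantage (and the opacity) noted above.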