Master in Computer Vision - Course 2006-2007
OPTICAL CHARACTER RECOGNITION
Outline
• Introduction
• Pre-processing (document level)
  – Binarization
  – Skew correction
• Segmentation
  – Layout analysis
  – Character segmentation
• Pre-processing (character level)
• Feature extraction
  – Image-based features
  – Statistical features
  – Transform-based features
  – Structural features
• Classification
• Post-processing
  – Classifier combination
  – Exploitation of context information
• Examples of OCR systems
• Bibliography
Optical Character Recognition
[Diagram: OCR draws its methods from pattern recognition (statistical and structural) and image processing, and finds its applications within document analysis.]
Introduction
Some examples
• Books, journals, reports
• Postal addresses
• Drawings, maps
• Identity cards
• License plates
• Quality control
• PDAs
• Cheques, bills
• Old documents
Introduction
Document Image Analysis
What is a document? Objects created expressly to convey information encoded as iconic symbols:
– Scanned images from paper documents
– Electronic documents
– Multimedia documents (video with text)
– …
Document image analysis is the subfield of digital image processing that aims at converting document images to symbolic form for modification, storage, retrieval, reuse and transmission.
Document image analysis is the theory and practice of recovering the symbol structure of digital images scanned from paper or produced by computer.
Introduction
G. Nagy: Twenty years of document image analysis in PAMI. IEEE Trans. on PAMI, vol. 22, nº 1, pp. 38-62 January 2000.
Applications of DIA
• Document Imaging: digitization, storage, compression, re-printing
• Document Understanding: recognition, interpretation, indexing, retrieval
Introduction
DIA tasks
• Document Imaging (mostly text): acquisition, binarization, filtering, skew correction
• Document Imaging (mostly graphics): acquisition, binarization, filtering, vectorization
• Document Understanding (mostly text): segmentation, layout analysis, OCR
• Document Understanding (mostly graphics): text-graphics separation, symbol recognition, interpretation
Introduction
Outline of the course
1. Acquisition
2. Pre-processing
   − Binarization
   − Skew correction
3. Layout analysis
4. Character segmentation
5. OCR
   − Feature extraction
   − Classification
   − Post-processing
Focus: Document understanding of mostly text documents
Introduction
Categorization of Character Recognition
• According to the type of writing: machine-printed vs. hand-written character recognition
• According to the type of acquisition: on-line vs. off-line character recognition
Introduction
Machine-printed character recognition
• Characters are totally defined by the font type:
  – Dimensions (segmentation)
    • Character width
    • Inter-character separation
    • Character height
  – Shape (recognition)
    • Typographic effects (boldface, italics, underline)
• Challenges:
  – Similar shapes among characters
  – Multiple fonts
  – Joined characters
  – Digitization noise: broken lines, random noise, heavy characters, etc.
  – Document degradation: old documents, photocopies, etc.
Introduction
Machine-printed character recognition
• Classification of machine-printed OCR systems:
  – Monofont: one single type of font
  – Multifont: recognition of a fixed and known set of fonts; it is necessary to identify and learn the differences between characters of all the types of fonts
  – Omnifont: recognition of any arbitrary type of font, even if it has not been previously learned
Introduction
Off-line hand-written character recognition
• Hand-written
• Off-line: acquisition by a scanner or a camera
• Challenges:
  – Shape variability among images of the same character
  – Character segmentation
• Subproblems:
  – Hand-written numeral recognition: digit recognition
  – Hand-printed character recognition: well-separated characters
  – Cursive character recognition: non-separated characters
Introduction
On-line hand-written character recognition
• On-line acquisition:
  – Digitizer tablets
  – Digital pen
  – Tablet PC
• Advantages with respect to off-line acquisition:
  – The image is acquired while the text is written
  – We can take advantage of dynamic information:
    • Temporal information: writing order, stroke segmentation, etc.
    • Writing speed
    • Pen pressure
• Subproblems:
  – Cursive script recognition
  – Signature verification/recognition
Introduction
Levels of difficulty in character recognition
Level 0: little shape variability, small number of characters, little noise
  0.0. Printed characters. Specific font. Constant size. Roman alphabets.
  0.1. Constrained hand-printed characters. Arabic numerals.
Level 1: medium variation in shape, medium noise
  1.0. Printed characters. Multiple fonts. No. of characters < 100
  1.1. Loosely constrained hand-printed characters. No. of characters < 100
  1.2. Chinese characters of few fonts
  1.3. Loosely constrained hand-printed characters. No. of characters ≈ 1000
Level 2: much variation in shape, heavy noise
  2.0. Printed characters of multiple fonts
  2.1. Unconstrained hand-printed characters
  2.2. Affine transformed characters
Level 3: non-segmented strings of characters
  3.0. Touching/broken characters
  3.1. Cursive handwriting characters
  3.2. Characters on a textured background
Introduction
• S. Mori, H. Nishida, H. Yamada: Optical Character Recognition. John Wiley and Sons, 1999.
• S.V. Rice, G. Nagy, T.A. Nartker: Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.
Levels of difficulty in character recognition
Level 0
0.0. Printed characters of a specific font with a constant size
  • Constant size
  • Connectivity of characters
  • Variation in the stroke thickness
  • Little noise
0.1. Constrained hand-printed characters
  • Characters are written according to some instructions or box guidelines
Solved problem
Introduction
Levels of difficulty in character recognition
Level 1
1.0. Printed characters of multiple fonts
1.1. Loosely constrained hand-printed characters
1.2. Chinese characters of few fonts
1.3. Loosely constrained hand-printed characters. No. of characters ≈ 1000
Solved problem
Introduction
Levels of difficulty in character recognition
Level 2
2.0. Printed characters of multiple fonts
2.1. Unconstrained hand-printed characters
2.2. Affine transformed characters
Introduction
Levels of difficulty in character recognition
Level 3
3.0. Touching or broken characters
3.1. Cursive handwriting characters
3.2. Characters on a textured background
Introduction
Databases for OCR
Off-line hand-written characters:
• CEDAR: 50,000 segmented numerals from zip codes; 5,000 zip codes; 5,000 city names; 9,000 state names
• CENPARMI: 17,000 manually segmented numerals from zip codes
• NIST: more than 1,000,000 characters from forms; several learning and test sets (more variability)
On-line characters:
• UNIPEN: 91,500 sentences with a dictionary; definition of a format to represent on-line data; 4,500,000 characters; segmented characters, words and sentences
Machine-printed documents:
• Univ. Washington: more than 1,500 pages of articles in English; more than 500 pages of articles in Japanese; originals, photocopies and pages with artificially generated noise; page segmentation into labeled zones
Introduction
Performance evaluation of OCR systems
• Hand-printed character recognition
  – Institute for Posts and Telecommunications Policy (IPIP), Japan, 1996
  – 5,000 hand-written numerals from Japanese zip codes
  – Performance of the best system: 97.94% (human performance: 99.84%)
• Machine-printed character recognition
  – "The Fifth Annual Test of OCR Accuracy". Information Science Research Institute, TR-96-01, April 1996. http://www.isri.unlv.edu
  – 5,000,000 characters from 2,000 pages of journals, newspapers, letters and technical reports
  – Performance on good quality documents: 99.77% - 99.13%
  – Performance on medium quality documents: 99.27% - 98.21%
  – Performance on low quality documents: 97.01% - 89.34%
• A performance of 99% => 30 errors/page (3,000 characters/page)
Introduction
Performance evaluation of OCR systems
Introduction

Classifier | Database  | N. of test samples | Recognition (%) | Error (%)
MLP        | NIST SD19 | 30,000             | 99.16           | 0.84
VSVM       | USPS      | 2,007              | 97.66           | 2.34
VSVM       | CENPARMI  | 2,000              | 98.7            | 1.3
POE        | MNIST     | 10,000             | 98.32           | 1.68
LeNet-5    | MNIST     | 10,000             | 99.18           | 0.82
VSV2       | MNIST     | 10,000             | 99.44           | 0.56
VSVMb      | MNIST     | 10,000             | 99.62           | 0.38
GPR        | MNIST     | 10,000             | 99.06           | 0.94
C.Y. Suen, J. Tan: Analysis of errors of handwritten digits made by a multitude of classifiers. PRL 26, pp. 369-379. 2005
Performance evaluation of OCR systems
Introduction

Errors by category for the same classifiers and databases:

Classifier     | Database  | N. of errors | Category 1 | Category 2 | Category 3
MLP            | NIST SD19 | 119          | 30         | 8          | 81
VSVM           | USPS      | 47           | 13         | 13         | 21
VSVM           | CENPARMI  | 26           | 6          | 4          | 16
POE            | MNIST     | 168          | 41         | 9          | 118
LeNet-5        | MNIST     | 82           | 17         | 14         | 51
VSV2           | MNIST     | 56           | 15         | 9          | 32
VSVMb          | MNIST     | 38           | 15         | 6          | 17
GPR            | MNIST     | 94           | 24         | 11         | 59
Sum            |           | 630          | 161        | 74         | 395
Percentage (%) |           | 100          | 25.56      | 11.75      | 62.70
Components of an OCR system
ACQUISITION → DOCUMENT PRE-PROCESSING → SEGMENTATION → CHARACTER PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFICATION → POST-PROCESSING
• Document pre-processing: filtering, binarization, skew correction
• Segmentation: layout analysis, text/graphics separation, character segmentation
• Character pre-processing: filtering, normalization
• Feature extraction: image-based features, statistical features, transform-based features, structural features
• Classification: uses the models produced by a learning stage
• Post-processing: exploitation of context information
Introduction
Acquisition
Acquisition. Scanners
A scanner is a linear camera with a lighting and a displacement system.
[Diagram: the lights illuminate the document; the scan line is imaged through an opening and a lens onto a CCD, and the video circuit produces the digital image.]
Important features of a scanner:
• Optical resolution / interpolated resolution
• Bits/pixel (depth)
• Speed (acquisition and calibration)
• Connection (parallel, USB, SCSI)
• Programming tools (TWAIN protocol, specific programming languages such as HP's SCL, etc.)
• Automatic feeding
Acquisition. Scanners
Types of scanners:
• Flatbed scanners: the CCD line is displaced along the paper
• Traction scanners: the paper is displaced over the CCD line
• Others: specific scanners for negative films, cards, passports, etc.
Acquisition
Acquisition. Scanners vs cameras
• Resolution determines:
  • Image quality
  • Image size
  • Acquisition speed
• Minimal resolution for OCR: 200 dpi
  • A 12-point character (approx. 2x3 mm) at 200 dpi generates a 16x24 image
• A4 page (297x210 mm):
  • Scanner at 200 dpi: 2376x1680 image
  • 1024x1024-pixel camera: resolution of about 90 dpi
• ID card (DNI, 85x55 mm):
  • Scanner at 200 dpi: 670x435 image
  • 1024x1024-pixel camera: resolution of about 300 dpi
[Example image: character 'B' scanned at 200 dpi, with noise]
Acquisition
Acquisition. Scanners vs cameras
• Advantages of scanners:
  – Cost
  – Resolution
  – Lighting is under control
  – Control of optical distortions
• Advantages of cameras:
  – Acquisition speed
  – More flexibility to adapt to the environment and to the material to read
Acquisition
On-line acquisition
• Input device: simulates pen and paper
  – The image is acquired while it is generated
  – A special input device provides x and y coordinates over time
• Components of the device:
  – Pen, paper and support. At least one of these components must be special
• Technical specifications:
  – Resolution
  – Sample frequency
  – Precision
• Devices: digitizer tablet without display, digitizer tablet with display, Tablet PC, digital pen and paper
Acquisition
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Pre-processing
Binarization
• Global methods:
  – Apply the same global threshold to all the pixels of the image
• Local adaptive methods:
  – Apply a different threshold to every pixel, depending on the local distribution of gray values
• Special case: binarization of textured backgrounds
Pre-processing
O.D. Trier, T. Taxt: Evaluation of binarization methods for document images. IEEE Trans on PAMI, vol. 17, nº 3, pp. 312-315. 1995
Binarization: Otsu
• Two classes of gray-scale levels: black pixels (foreground) and white pixels (background)
• Probabilistic criterion to select the threshold:
  – Maximize inter-class variability
  – Minimize intra-class variability
Pre-processing
N. Otsu: A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and Cybernetics, vol. 9, nº 1, pp. 62-66, 1979
Otsu
• The probability distribution of gray levels is defined as:

$p_i = \dfrac{n_i}{N}$

where $n_i$ is the number of pixels with gray level i and N is the total number of pixels.
• We want to define two classes of gray levels:
– C1: [1..k]
– C2: [k+1..L]
where k is the threshold.
• The class probabilities, means and variances are defined as:

$\omega_1 = \sum_{i=1}^{k} p_i, \quad \omega_2 = \sum_{i=k+1}^{L} p_i$

$P(i|C_1) = p_i/\omega_1, \quad P(i|C_2) = p_i/\omega_2$

$\mu_1 = \sum_{i=1}^{k} i\,P(i|C_1), \quad \mu_2 = \sum_{i=k+1}^{L} i\,P(i|C_2)$

$\sigma_1^2 = \sum_{i=1}^{k} (i-\mu_1)^2 P(i|C_1), \quad \sigma_2^2 = \sum_{i=k+1}^{L} (i-\mu_2)^2 P(i|C_2)$

$\mu_T = \sum_{i=1}^{L} i\,p_i, \quad \sigma_T^2 = \sum_{i=1}^{L} (i-\mu_T)^2 p_i$
Pre-processing
Otsu
• We define the within-class and between-class variances:

$\sigma_W^2 = \omega_1\sigma_1^2 + \omega_2\sigma_2^2$

$\sigma_B^2 = \omega_1(\mu_1-\mu_T)^2 + \omega_2(\mu_2-\mu_T)^2$

• The following criteria of class separability are equivalent:

$\lambda = \dfrac{\sigma_B^2}{\sigma_W^2}, \quad \kappa = \dfrac{\sigma_T^2}{\sigma_W^2}, \quad \eta = \dfrac{\sigma_B^2}{\sigma_T^2}$, with $\kappa = \lambda + 1$ and $\eta = \dfrac{\lambda}{\lambda+1}$

• Then, the optimal threshold is:

$k^* = \arg\max_{1 \le k \le L} \sigma_B^2(k)$
Pre-processing
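A minimal sketch of Otsu's threshold selection as described above, assuming a gray-level image stored as a NumPy array with values in [0, 255]; variable and function names are illustrative, not part of the original formulation.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the threshold k that maximizes the between-class variance."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist.astype(float) / hist.sum()          # p_i = n_i / N
    mu_T = np.sum(np.arange(levels) * p)         # global mean

    best_k, best_sigma_b = 0, -1.0
    omega_1, mu_acc = 0.0, 0.0
    for k in range(levels):
        omega_1 += p[k]                          # class C1 = [0..k]
        mu_acc += k * p[k]
        omega_2 = 1.0 - omega_1
        if omega_1 == 0.0 or omega_2 == 0.0:
            continue
        mu_1 = mu_acc / omega_1
        mu_2 = (mu_T - mu_acc) / omega_2
        sigma_b = omega_1 * (mu_1 - mu_T) ** 2 + omega_2 * (mu_2 - mu_T) ** 2
        if sigma_b > best_sigma_b:
            best_sigma_b, best_k = sigma_b, k
    return best_k

# usage: binary = image > otsu_threshold(image)
```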
Binarization: local adaptive binarization
• The threshold at every pixel depends on the local distribution of gray levels in the neighborhood of the pixel.
• Niblack's method:

$T(x,y) = m(x,y) + k \cdot s(x,y)$

  – m(x,y) and s(x,y) are the mean and standard deviation in a local neighborhood of the pixel
  – Window size: 15 x 15
  – k = -0.2
• Eikvil et al.'s method:
  – For every pixel, we define a small window S (3 x 3) and a large window L (15 x 15)
  – The threshold value is selected by applying Otsu to the large window L
  – The pixels in the small window S are thresholded using this value
Pre-processing
W. Niblack: An introduction to digital image processing, pp. 115-116, Prentice Hall, 1986
L. Eikvil, T. Taxt, K. Moen: A fast adaptive method for binarization of document images. International Conference on Document Analysis and Recognition 2001, pp. 435-443
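A sketch of Niblack's local thresholding with the window size and k value quoted above (15x15, k = -0.2); it uses only NumPy and a straightforward, unoptimized sliding window.

```python
import numpy as np

def niblack_binarize(image, window=15, k=-0.2):
    """Binarize with T(x,y) = m(x,y) + k * s(x,y) over a local window."""
    img = image.astype(float)
    h, w = img.shape
    half = window // 2
    padded = np.pad(img, half, mode='edge')
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            t = patch.mean() + k * patch.std()   # local threshold
            out[y, x] = 255 if img[y, x] > t else 0
    return out
```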
Binarization: evaluation
[Example images: binarization results with the global Otsu method and with the local Eikvil and Niblack methods.]
Binarization: evaluation
• Visual criteria of evaluation:
  – Broken line structures: gaps in lines
  – Broken symbols and text: symbols and text with gaps
  – Blurring of lines, symbols and text
  – Loss of complete objects
  – Noise in homogeneous areas: noisy spots and false objects in both background and print
• Scale of 1-5 for each criterion
Binarization: evaluation
[Example images: visual comparison of binarization methods.]
Binarization: Textured Backgrounds
1. Selection of candidate thresholds
   • Iterative application of Otsu
   • At each iteration, Otsu is applied to the part of the histogram with the lowest mean
   • The number of iterations depends on the number of peaks in the histogram
   • We get a set of possible threshold values
2. Computation of a set of texture features for each possible binarization
   • Based on the run-length histogram of the binarized image, R(i), i ∈ {1,..,L}
3. Selection of the optimal binarization
Pre-processing
Y. Liu, S. Srihari: Document image binarization based on texture features: IEEE Trans. on PAMI, vol. 19, nº 5, pp 540-544, 1997
Binarization: Textured Backgrounds
• Texture features (computed from the run-length histogram R(i) of each candidate binarization):
  – Stroke width: the run-length with the highest frequency

$SW = \arg\max_{i \ne 1} R(i)$

  – Stroke-like pattern noise: relevance of stroke-like patterns in the background between consecutive threshold selections. If it is high, it denotes stroke-like patterns due to texture.

$SPN = \dfrac{\max_{i \ne 1} R_j(i)}{\max_{i \ne 1} R_{j+1}(i)}$

  – Unit run noise: relevance of unit run-lengths. Ideally, it should be low.

$URN = \dfrac{R(1)}{\max_{i \ne 1} R(i)}$

  – Long run noise: relevance of long run-lengths. Ideally, it should be low.

$LRN = \dfrac{\sum_{i > L} R(i)}{\max_{i \ne 1} R(i)}$

  – Broken character: it will be high if characters are broken.

$BC = \dfrac{\min_{i \in I'} R(i)}{\max_{i \ne 1} R(i)}$, with $I' = \{0, \arg\max_i R(i)\}$
Pre-processing
Binarization: Textured Backgrounds
• Experimental results show that 2 iterations of Otsu are enough
• Decision tree to select the optimal threshold (between T1 and T2, with T1 > T2):
  1. Select the value with the larger stroke width feature.
  2. If T1 is the selected value:
     – If the background noise features are low, T1 is the selected threshold.
     – Otherwise, if T2 does not result in many broken characters, select T2.
  3. If T2 is the selected value:
     – If the broken character feature is low, select T2.
     – Otherwise, if the noise features are low with T1, select T1.
  4. If neither threshold is good enough, select the average between them.
Pre-processing
Binarization of textured backgrounds
Pre-processing
Skew Correction: Projection profiles
1. Compute the horizontal projection at several angles (depending on the desired resolution)
2. For every projection, compute a directional criterion that estimates the difference between maxima and minima in the projection:
   • Sum of squared differences between adjacent rows
   • Variance of the number of black pixels per scan line
3. Select the angle that maximizes the directional criterion
Pre-processing
W. Postl: Detection of linear oblique structures and skew scan in digitized documents. 8th International Conference on Pattern Recognition, pp. 687-689, 1986
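A sketch of projection-profile skew estimation as described above, using the variance of black pixels per scan line as the directional criterion; it assumes a binary image (text = 1), and the use of scipy.ndimage.rotate is an implementation choice, not part of the original method.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, max_angle=5.0, step=0.25):
    """Return the angle (degrees) whose horizontal projection has maximal variance."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(binary, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)            # black pixels per scan line
        score = profile.var()                    # directional criterion
        if score > best_score:
            best_score, best_angle = score, angle
    return best_angle
```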
Skew Correction: Projection profiles
• Modification of the previous algorithm:
  – Use only the bottom-centers of connected components to compute the projection profiles
  – This reduces the computation cost
H.S. Baird: The skew angle of printed documents. Proc. of the Society of Photographic Scientists and Engineers, pp. 14-21, 1987.
Analysis of neighboring connected components
1. Compute connected components.
2. For every connected component, search for the k nearest neighbours (k = 5).
3. For every pair of connected components, compute the angle between the centroids.
4. Compute the histogram of angles.
5. Estimation of the skew angle: maximum of the histogram.
Pre-processing
L. O’Gorman: The document spectrum for page layout analysis. IEEE Trans. on PAMI, vol. 15, nº 11, 1993
Analysis of neighboring connected components
• Accurate angle estimation:
  1. Find connected components in the same line (clusters of pairs of connected components with an angle near the rough estimation).
  2. Fit a straight line (using regression) to the centroids of the components in each line.
  3. Make the final estimation from these text lines.
Pre-processing
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Segmentation
Page Layout Analysis
• Layout analysis: segmentation of the image into several blocks with the same type of information: text, graphics, table, image, etc.
• Methods:
  – Run-length smearing
  – Analysis of connected components
Segmentation
Page Layout Analysis
Page Layout Analysis: Run-length smearing
1. Horizontal run-length smearing
   • Threshold: inter-character separation, estimated as the maximum of the histogram of widths of the white runs
2. Vertical smearing
   • Threshold: inter-line separation, estimated during skew correction
3. Logical AND between both images
4. Additional horizontal smearing
5. Connected components are the blocks
6. Computation of features for each connected component: aspect ratio, black pixel density, Euler number, perimeter length, perimeter-to-width ratio, perimeter-squared-to-area ratio
Example of horizontal smearing (runs of white pixels shorter than the threshold are filled):
1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1
1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Segmentation
J.L. Fisher, S. C. Hinds, D.P. D’amato: A rule-based system for document image segmentation. 10th International Conference on Pattern Recognition, pp. 567-572, 1990.
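A sketch of the run-length smearing algorithm (RLSA) steps 1-4 above, assuming a binary image where 1 denotes black; the threshold values are parameters that would normally be estimated as described in the slide.

```python
import numpy as np

def smear_rows(binary, threshold):
    """Fill runs of white (0) shorter than `threshold` in every row."""
    out = binary.copy()
    for row in out:
        run_start = None
        for x, v in enumerate(row):
            if v == 0 and run_start is None:
                run_start = x
            elif v == 1 and run_start is not None:
                if x - run_start < threshold:
                    row[run_start:x] = 1
                run_start = None
    return out

def rlsa(binary, h_thr=30, v_thr=30, final_h_thr=15):
    horiz = smear_rows(binary, h_thr)
    vert = smear_rows(binary.T, v_thr).T          # vertical smearing via transpose
    combined = np.logical_and(horiz, vert).astype(np.uint8)
    return smear_rows(combined, final_h_thr)      # additional horizontal smearing
```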
Page Layout Analysis: Run-length smearing
• A set of rules classifies each block as text or non-text, according to the features computed for each connected component
Page Layout Analysis: Run-length smearing
Segmentation
Page Layout Analysis: Analysis of connected components
1. Detection of connected components
2. Definition of distance and overlap among components:

$D_x(o_i,o_j) = \max[X_l(o_i), X_l(o_j)] - \min[X_u(o_i), X_u(o_j)]$
$D_y(o_i,o_j) = \max[Y_l(o_i), Y_l(o_j)] - \min[Y_u(o_i), Y_u(o_j)]$
$V_x(o_i,o_j) = \dfrac{-D_x(o_i,o_j)}{\min[W(o_i), W(o_j)]}$
$V_y(o_i,o_j) = \dfrac{-D_y(o_i,o_j)}{\min[H(o_i), H(o_j)]}$

where $X_l, X_u$ ($Y_l, Y_u$) are the lower and upper x (y) coordinates of the bounding box, and W, H are its width and height
3. Grouping of connected components in the same line:
   • $D_x$ below a distance threshold
   • $V_y$ above an overlap threshold
Segmentation
A.K. Jain, B. Yu: Document representation and its application to page decomposition. IEEE Trans. onPAMI, vol. 20, nº 3, pp. 294-308, 1998
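A sketch of the distance and overlap measures defined above for connected components represented by their bounding boxes; the BBox tuple layout and the threshold values are assumptions for illustration.

```python
from collections import namedtuple

# Bounding box: lower/upper coordinates in x and y
BBox = namedtuple('BBox', 'xl xu yl yu')

def dx(a, b):
    """Horizontal gap between two boxes (negative if they overlap)."""
    return max(a.xl, b.xl) - min(a.xu, b.xu)

def dy(a, b):
    return max(a.yl, b.yl) - min(a.yu, b.yu)

def vx(a, b):
    """Relative horizontal overlap."""
    return -dx(a, b) / min(a.xu - a.xl, b.xu - b.xl)

def vy(a, b):
    """Relative vertical overlap."""
    return -dy(a, b) / min(a.yu - a.yl, b.yu - b.yl)

def same_line(a, b, dist_thr=20, overlap_thr=0.5):
    """Group two components into the same text line (step 3 above)."""
    return dx(a, b) < dist_thr and vy(a, b) > overlap_thr
```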
Page Layout Analysis: Analysis of connected components
4. Classification of lines into text and non-text
   • Text lines:
     – Height below a threshold and horizontally aligned (standard deviation of the bottom edges of the CCs below a threshold)
     – Width over a threshold and all CCs with similar height (ratio between mean and standard deviation less than a threshold)
5. Grouping of text lines into text regions
   • Vertically close and horizontally overlapped
6. Grouping of non-text lines into non-text regions
   • Vertically close and horizontally overlapped
   • Horizontally close and vertically overlapped
7. Identification of image regions from non-text regions
   • Large regions
   • The ratio of black pixels is large
8. Identification of table regions
   • Detection of horizontal and vertical lines with similar orientations
   • Similar height of CCs
9. Remaining non-text regions: drawing regions
Segmentation
Page Layout Analysis: Analysis of connected components
Segmentation
Page Layout Analysis: Multiscale analysis
1. Application of wavelets to obtain a multiscale representation
2. Computation of local features (local moments) from the wavelet representation:

$w_n(f) = \left( \dfrac{1}{|W|} \sum_{x_i \in W} \big(f(x_i) - \mu_W\big)^n \right)^{1/n}$

3. Training of a neural network to classify each block as text, image or graphics according to these local features
4. Propagation of the classification through adjacent blocks and between different scales of the wavelet representation
Segmentation
Page Layout Analysis: Performance Evaluation
• Goals:
  – Evaluate the performance of several commercial OCR engines on a set of journal pages
  – Use the results of the evaluation to define methods for combining these engines in order to improve the overall performance
• Therefore, the evaluation must make it possible to determine the strengths and weaknesses of each method
Page Layout Analysis: Performance Evaluation
Segmentation
[Figure: comparison of the segmentation output with the ground truth, showing correct, misrecognized and unrecognized error zones.]
Page Layout Analysis: Performance Evaluation
• 5 types of zones: text, graphics, table, background, text over image
• Comparison between the ground-truth zones and the output zones of each engine, based on the overlap between them
Segmentation
Page Layout Analysis: Performance Evaluation
• Evaluation measures: more than 100 measures, grouped into six categories
  – Good recognition measures: percentage of ground-truth area (grouped by zone type) recognized as zones of the same type in the output, i.e. text zones in the ground truth recognized as text zones, graph zones recognized as graph zones, etc.
  – Unrecognition measures: relative area of zones of type text, graph, text-over-image or table in the ground truth recognized as background
  – Misrecognition measures: zones in the ground truth recognized as a different type, for example a text zone recognized as graph, text-over-image or table
  – Overlap measures: relative area (grouped by type) recognized twice by a zoning engine
  – Split and merge measures: how many zones are recognized and assigned in terms of splitting and merging errors
Segmentation
Page Layout Analysis: Performance Evaluation
• Experiments:
  – Creation of the ground truth for 100 journal pages
  – Evaluation of six OCR engines
  – Tests with two image formats (TIFF and JPEG) and 4 image resolutions (100 dpi, 200 dpi, 300 dpi and 400 dpi)
  – Evaluation of a simple combination scheme
Segmentation
Page Layout Analysis: Performance Evaluation
[Result charts comparing the evaluated OCR engines on sample pages.]
Page Layout Analysis: Performance Evaluation
[Charts: text correct recognition and text misrecognition rates (0-100%) for ABBYY 6, ABBYY 7, PODCORE, PODCORE2, SCANSOFT, XVISION and the combination (COMB), on TIFF and JPG images at 400, 300, 200 and 100 dpi.]
Segmentation
Page Layout Analysis: Performance Evaluation
[Charts: graph correct recognition and text-over-image (ToI) correct recognition rates (0-100%) for the same engines and combination, on TIFF and JPG images at 400, 300, 200 and 100 dpi.]
Segmentation
Page Layout Analysis: Performance Evaluation
• Conclusions:
  – The image format makes no difference in the final results
  – There is no ideal resolution:
    • Sometimes results are better at 300 dpi, sometimes at 400 dpi
    • Results are a little lower at 200 dpi or 100 dpi
  – Combination can improve the results, but more advanced combination schemes should be defined
Segmentation
Text/Graphics Separation
• Analysis of connected components:
  – Size: characters are smaller than graphic components
  – Aspect ratio x-y: characters are more "square" than graphic components
  – Pixel density: characters are denser than graphic components
• Grouping of characters: Hough transform and component proximity
• Difficulties:
  – Joined characters
  – Characters touching lines
Segmentation
L.A. Fletcher, R. Kasturi: A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. on PAMI, vol. 10 nº 5, pp. 910-918, 1988
Text/Graphics Separation
• Detection and removal of lines at given orientations: horizontal, vertical, ±22.5°, ±45°, ±67.5°
– Detection of long consecutive runs of black pixels after rotating the image at given orientations.
• Analysis of connected components to separate text and graphics
Z. Lu: “Detection of Text Regions From Digital Engineering Drawings”. IEEE Trans. on PAMI, Vol. 20, n.4, pp. 431-439. April 1998.
Specific methods
Text/Graphics Separation
•Vertical and horizontal run-length smearing to join components
•Classification of final components as text or graphics based on density and size of components
•Recovery of the original image from the enclosing rectangles of the text components
Character segmentation
• Segmentation of characters in blocks of text
• Levels of difficulty:
– Characters with uniform separation and fixed width
– Well separated characters with proportional width
– Broken characters
– Touching characters
– Broken and touching characters
– Cursive script
– Hand-printed words
– Handwritten cursive words.
Segmentation
• R.G. Casey, E. Lecolinet: A Survey of Methods and Strategies in Character Segmentation. IEEE Trans. on PAMI, vol. 18, nº 7, pp. 690-706, 1996.
• Y. Lu: Machine Printed Character Segmentation - An Overview. Pattern Recognition, vol. 28, nº 1, pp. 67-80, 1995.
• Y. Lu, M. Shridar: Character Segmentation in Handwritten Words - An Overview. Pattern Recognition, vol. 29, nº 1, pp. 77-96, 1996.
• Relevant features for segmentation– Character width and height
– Distance between characters
– Inter-character interval (distance between character centres)
– Aspect ratio.
– Baseline and Top Baseline
– Ascenders and Descenders
Character segmentation
Segmentation
Character segmentation: classification of methods
• External segmentation:
  – Segmentation before recognition; independent processes
  – The goal is to find the exact location of the character separations
  – Low performance with cursive script, touching characters or handwriting
• Internal segmentation:
  – Based on Sayre's paradox: a letter cannot be segmented without being recognized and cannot be recognized without being segmented
  – Segmentation and recognition are done at the same time
  – Recognition generates or validates segmentation hypotheses
• Holistic methods:
  – No character segmentation
  – Recognition tries to recognize words without recognizing individual characters
Segmentation
Character segmentation: classification of methods
[Taxonomy diagram: methods divide into analytical and holistic approaches; analytical methods are external (based on the image: dissection, post-processing into graphemes), internal (based on recognition: windowing or feature-based, the latter with Markov (hidden Markov model) and non-Markov variants, e.g. dynamic programming), or hybrid.]
Segmentation
External segmentation
• Image decomposition into sub-images using general features
• Each sub-image corresponds to a possible character
• Combination of several methods:
  – Connected component labelling
  – Run-length smearing
  – Projections and X-Y tree decomposition
  – Analysis of contours
  – Analysis of profiles
Segmentation
External segmentation: Connected component labelling
[Figure: original image, labelling according to neighbours, end of labelling, and unification of equivalent classes.]
Segmentation
External segmentation: Projections and X-Y trees
If text lines are perfectly separated, only one vertical projection is required. Otherwise, it is necessary to apply several vertical and horizontal projections.
[Figure: X-Y tree obtained by alternating horizontal and vertical projections of a text block.]
Segmentation
External segmentation: projections
It is more robust to use the second derivative of the projection, normalized by the value at each point of the histogram, since it enhances the projection minima:

$F(x) = \dfrac{V(x-1) - 2V(x) + V(x+1)}{V(x)}$
Segmentation
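A sketch of character segmentation by vertical projection, including the second-derivative criterion F(x) given above; the epsilon term and the peak-picking rule are illustrative choices, not part of the original slide.

```python
import numpy as np

def projection_cuts(binary, eps=1.0):
    """Candidate cut columns from the vertical projection of a binary word image."""
    v = binary.sum(axis=0).astype(float)                        # V(x): black pixels per column
    f = np.zeros_like(v)
    f[1:-1] = (v[:-2] - 2 * v[1:-1] + v[2:]) / (v[1:-1] + eps)  # F(x), eps avoids division by zero
    # columns with no ink, or local maxima of F, are candidate separation points
    return [x for x in range(1, len(v) - 1)
            if v[x] == 0 or (f[x] > f[x - 1] and f[x] > f[x + 1] and f[x] > 0)]
```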
External segmentation: Run-length smearing
• Run-lengths: sequences of consecutive pixels with the same colour in a row or column
• Smearing: inversion of runs with a length below a certain threshold
[Figure: vertical smearing and horizontal smearing of a word image, combined with a logical AND.]
Segmentation
External segmentation: analysis of profiles
• Determination of the point of separation between characters:
  – Follow the profile beginning at a minimum up to the following maximum
  – Distance between the upper and lower profiles
[Figure: upper and lower profiles of a word.]
Segmentation
External segmentation: post-processing
• Problem: broken and touching characters
  – Analysis of the bounding boxes: definition of several rules that permit joining or breaking them properly, based on:
    • Estimated character size (width and height)
    • Number of estimated characters
    • Component aspect ratio
    • Proximity/overlapping of bounding boxes
Segmentation
External segmentation: post-processing
• "Hit and Deflect" strategy:
  – Starting point: maximum of the lower profile / minimum of the upper profile
  – Contour following:
    • Vertical scan to find a contour point
    • Move the scan point according to the value of the neighbouring pixels:
      – To the right/left, if one pixel corresponds to the character and the other does not
      – Up, if both neighbouring pixels belong to the character
Segmentation
M. Shridhar, A. Badreldin. “Recognition of Isolated and Simply Connected Handwritten Numerals”. Pattern Recognition, Vol. 19, n. 1, pp. 1-12, 1986.
External segmentation: oversegmentation
• Segmentation of the image into sub-images that do not necessarily correspond to individual characters
• Analysis of the contour minima and maxima:
  – Detection of significant contour minima:
    • Cut points
    • Lower extreme of a character
  – Generation of possible cut points:
    • For each contour minimum, search to the left and to the right for points that correspond to a single vertical run with low density
  – Compaction of nearby cut points
Segmentation
R. Bozinovic, S. Srihari. “Off-line Cursive Script Word Recognition”. IEEE Trans. on PAMI, Vol. 11, n. 1, 1989
Character segmentation: classification of methods
[Taxonomy diagram repeated: analytical (external, internal, hybrid) vs. holistic segmentation methods.]
Segmentation
Internal segmentation
• Segmentation and recognition at the same time
• Two approaches:
  – Windowing:
    • Sequential scan of the image from left to right
    • Generation of segmentation hypotheses
    • Detection of the cut points with the best recognition performance (verification step)
  – Based on image features:
    • Feature detection
    • Generation of possible correspondences between features and letters
    • Search for the best possible combination among all correspondences
    • Two types of methods: Markov-based and non-Markov-based
Segmentation
Internal segmentation: windowing
• A mobile window is used to generate possible segmentation sequences
• Each possible segmentation is validated by the recognition process
• Search for a segmentation sequence that yields a valid final result
Segmentation
R.G. Casey, G. Nagy: Recursive segmentation and classification of composite patterns. 6th International Conference on Pattern Recognition, pp. 349-451, 1986
Internal segmentation: windowing
• Shortest Path Segmentation:
  – Representation of all segmentation possibilities with a graph:
    • Nodes: all possible combinations of pre-segmented zones
    • Edges: neighbour compatibility between pre-segmented zones
  – Using a neural network, each node is assigned a recognized character, with a measure of confidence (distance)
  – Finding the shortest path in the graph is equivalent to finding the best possible segmentation
Segmentation
C.J.C. Burges, J.I. Be, C.R. Nohl: Recognition of handwritten cursive postal words using neuralnetworks. Proc. USPS 5th Advanced Technology Conference, p. A-117, 1992
Internal segmentation: feature-based
• Graph-based:
  – Representation of the image skeleton with a graph
  – Subgraph matching to find possible correspondences with the character prototypes
  – Creation of a network:
    • Nodes: recognized prototypes labelled with the matching cost
    • Edges: adjacency relationships among nodes
  – Recognition: searching for the optimal path in the network
[Figure: example network of character hypotheses built over a word image.]
Segmentation
J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Character pre-processing
Character pre-processing
Some usual pre-processing operations in OCR:
• Filtering: noise reduction
• Thinning
• Binarization
• Normalization:
  – Reduce character variability
  – Convert the character to a normal shape:
    • Orientation
    • Slant
    • Size
    • Stroke thickness
Character pre-processing
Normalization
Inverse transforms are applied to reduce intra-class variance.
The most usual normalization transforms are:
• Rotation: rotated scans, text in graphic documents
• Slant: cursive fonts or handwriting
• Stroke thickness: bold fonts or very thin strokes, handwriting with different pen thickness
• Size: titles, footnotes, handwriting
Character pre-processing
Normalization
These normalization transforms can be expressed as affine transforms:

$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$

Type | Parameters | Meaning
Translation | $a_{ij} = 0$, $i,j = 1,2$ | $t_x$, $t_y$: translation factors
Scaling | $a_{11} = s_x$, $a_{12} = 0$, $a_{21} = 0$, $a_{22} = s_y$ | $s_x$, $s_y$: scaling factors
Rotation | $a_{11} = \cos\alpha$, $a_{12} = -\sin\alpha$, $a_{21} = \sin\alpha$, $a_{22} = \cos\alpha$ | $\alpha$: rotation angle
Slant | $a_{11} = 1$, $a_{12} = \tan\beta$, $a_{21} = 0$, $a_{22} = 1$ | $\beta$: slant angle
Character pre-processing
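A sketch of slant normalization using the shear transform from the table above; the slant angle β is assumed to have been estimated beforehand, and the use of scipy.ndimage.affine_transform (and the sign convention for β) are implementation choices, not part of the original slide.

```python
import numpy as np
from scipy.ndimage import affine_transform

def deslant(image, beta_degrees):
    """Apply the shear x' = x + tan(beta) * y to remove the estimated slant."""
    t = np.tan(np.radians(beta_degrees))
    # affine_transform maps output coords to input coords: in = M @ out (rows are (y, x))
    matrix = np.array([[1.0, 0.0],   # y_in = y_out
                       [t,   1.0]])  # x_in = t * y_out + x_out
    return affine_transform(image, matrix, order=1, mode='constant', cval=0)
```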
Normalization
• Rotation:
  – To determine the baseline of a set of aligned characters: projection analysis, Hough transform
  – To determine the orientation of a single character: inertia axes (second-order moments)
• Slant:
  – Approximation of the slant angle from the regression line of the character pixels
  – Approximation from the orientation of the "vertical" segments in a word
Character pre-processing
Normalization
• Size:
  – Normalization to a standard size
  – Normalization of the relation between the x size and the y size
  – Usually, it is done from the bounding box of the character
  – Some pixel resampling is required: interpolation to avoid aliasing
• Stroke thickness:
  – It is not straightforward to determine, because of the variability within the character itself
  – Approximation of the stroke thickness:
    • It is assumed that the character stroke has length l and width w
    • The width is estimated from the area and perimeter of the character
  – Then, morphological operations (dilation and erosion) are used to normalize the character to a standard width
Character pre-processing
Non-linear normalization
• Optical distortion:
  – Parameter estimation with a least-squares criterion, using a calibration image:

$\begin{pmatrix} x' \\ y' \end{pmatrix} = C_m \begin{pmatrix} x \\ y \end{pmatrix} + C_d \begin{pmatrix} x(x^2+y^2) \\ y(x^2+y^2) \end{pmatrix}$

  $C_m$: magnification coefficient
  $C_d$: distortion coefficient
Character pre-processing
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Feature Extraction
Feature-based Recognition
• The choice of the feature extraction method is probably the most important factor in achieving good recognition performance.
Which are the best features to discriminate between the characters?
FeatureExtraction
Feature Extraction
Goal: to extract from the image the most relevant information for classification, i.e., to minimize intra-class variability while maximizing inter-class variability.
• Selection of appropriate features:
  – It is a critical decision
  – It depends on the specific application
  – Features must be invariant to the character variations (depending on the application): rotation, degradation, noise, shape distortion
  – Low dimensionality, to avoid large learning sets
  – Features determine the type of information to work with: gray-level image, binary image, character contour, vectorization of the skeleton, etc.
  – Features also determine the type of classifier
FeatureExtraction
Feature Extraction
• Image-based features:
  – Projections
  – Profiles
  – Crossings
• Statistical features:
  – Moments
  – Zoning
• Global transforms and series expansions:
  – Karhunen-Loeve
  – Fourier descriptors
• Topological and geometric features; structural analysis:
  – Contour analysis
  – Skeleton analysis
  – Topological and geometric features
FeatureExtraction
O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
Image-based Features
• All the image as feature vector:
  – Classification by correlation
  – Very sensitive to noise, character distortion and similarity between classes
• x and/or y projections:
  – The accumulated projection can also be used
  – Sensitive to rotation, distortion and a large number of characters
• Peephole:
  – Coding with a binary number some pre-selected pixels of the image
  – The pre-selected pixels can vary depending on the character to be recognized
[Example: the character 'A' coded by the peephole bits 010111011]
Feature Extraction
Image-based Features
• Contour profiles:
  – Left (right) profile: minimum (maximum) x of the contour at every y value
  – Lower (upper) profile: minimum (maximum) y of the contour at every x value
  – Features:
    • Profile values
    • Differences between consecutive profile values
    • Maxima and minima of the profile
    • Maxima of the difference between profile values
Feature Extraction
F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. PatternRecognition, 24(10), pp. 969-983, 1991
Image-based Features
• Crossing method:
  – Features are computed from the number of times that the character is crossed by vectors along some orientations, for example 0°, 45°, 90°, 135°
  – Used in commercial systems because of its speed and low complexity
  – Robust to some distortions and noise
  – Sensitive to size variations
FeatureExtraction
Statistical Features
• Methods based on the statistical distribution of pixels in the image:
  – Geometric moments
  – Zoning
• Features are robust to distortion and, up to a certain extent, to some style variations
• Low computation time and easy to implement
• A learning step is needed to infer the model of characters
FeatureExtraction
Statistical Features: Zoning
• The image is divided into n x m cells.
• For each cell, the mean of the gray levels is computed, and all these values are joined in a feature vector of length n x m.
• We can also use information from the contour, or any other feature computed in every zone.
[Example figure: number of contour segments per orientation (0°, 45°, 90°, 135°) in each zone.]
FeatureExtraction
F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. PatternRecognition, 24(10), pp. 969-983, 1991
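A sketch of zoning features as described above, assuming a grayscale (or binary) character image; it returns the mean value of each of the n x m cells as a flat feature vector.

```python
import numpy as np

def zoning_features(image, n=4, m=4):
    """Mean gray level of each cell of an n x m grid, as a flat feature vector."""
    h, w = image.shape
    ys = np.linspace(0, h, n + 1, dtype=int)
    xs = np.linspace(0, w, m + 1, dtype=int)
    feats = [image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
             for i in range(n) for j in range(m)]
    return np.array(feats)
```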
Statistical Features: Geometric Moments
• Moments of order (p+q) of image f:

$m_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, x^p y^q$

  – $m_{00}$: character area (in binary images)
  – Center of gravity of the character: $\bar{x} = m_{10}/m_{00}$, $\bar{y} = m_{01}/m_{00}$
• Central moments (centering the character at the center of gravity):

$\mu_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, (x-\bar{x})^p (y-\bar{y})^q$

  – The central moments of order 2 ($\mu_{20}$, $\mu_{02}$, $\mu_{11}$) permit computing:
    • The main inertia axes
    • The character length
    • The character orientation:

$\theta = \dfrac{1}{2}\,\mathrm{atan}\!\left( \dfrac{2\mu_{11}}{\mu_{20}-\mu_{02}} \right)$
FeatureExtraction
M. Hu: Visual pattern recognition by moment invariants. IRE Trans. Inf. Theroy 8, pp. 179-187, 1962
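A sketch of the geometric moments, central moments and orientation angle defined above; it assumes a 2-D NumPy array whose first index is taken as x, and the helper names are illustrative.

```python
import numpy as np

def moment(img, p, q):
    """m_pq = sum_x sum_y f(x, y) x^p y^q (first array index taken as x)."""
    x, y = np.indices(img.shape)
    return np.sum(img * (x ** p) * (y ** q))

def central_moment(img, p, q):
    """mu_pq: moments computed about the center of gravity."""
    m00 = moment(img, 0, 0)
    xc, yc = moment(img, 1, 0) / m00, moment(img, 0, 1) / m00
    x, y = np.indices(img.shape)
    return np.sum(img * ((x - xc) ** p) * ((y - yc) ** q))

def orientation(img):
    """theta = 0.5 * atan(2*mu11 / (mu20 - mu02))."""
    mu11 = central_moment(img, 1, 1)
    mu20 = central_moment(img, 2, 0)
    mu02 = central_moment(img, 0, 2)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```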
Statistical Features: Geometric Moments
• Invariant moments:
  – Central moments $\mu_{pq}$ are translation-invariant
  – Scale invariants:

$\nu_{pq} = \dfrac{\mu_{pq}}{\mu_{00}^{(p+q)/2+1}}, \quad p+q \ge 2$

  – Rotation invariants (order 2):

$\phi_1 = \nu_{20} + \nu_{02}$
$\phi_2 = (\nu_{20} - \nu_{02})^2 + 4\nu_{11}^2$

  – Invariants to general linear transforms:

$I_1 = \mu_{20}\mu_{02} - \mu_{11}^2$
$I_2 = (\mu_{30}\mu_{03} - \mu_{21}\mu_{12})^2 - 4(\mu_{30}\mu_{12} - \mu_{21}^2)(\mu_{21}\mu_{03} - \mu_{12}^2)$
$\psi_1 = I_1 / \mu_{00}^4$

  – A set of moment invariants of different orders can be defined in a similar way
Feature Extraction
T.H. Reiss: The revised fundamental theorem of moment invariants. IEEE Trans. PAMI, vol. 13, nº 8, pp. 830-834, 1991
Statistical Features: Zernike Moments
• Geometric moments:
  – Projection of the function f(x,y) over the monomials $x^p y^q$ (no orthogonality => information redundancy)
• Zernike moments:
  – Change to polar coordinates to achieve orthogonality and rotation invariance
  – Projection of the image over the Zernike polynomials $V_{nm}$, which are orthogonal inside the unit circle $x^2+y^2=1$:

$V_{nm}(x,y) = V_{nm}(\rho,\theta) = R_{nm}(\rho)\, e^{jm\theta}$

where $\rho = \sqrt{x^2+y^2} \le 1$, $\theta = \tan^{-1}(y/x)$, $n \ge 0$, $|m| \le n$, $n-|m|$ even, and

$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \dfrac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\, \rho^{n-2s}$

Feature Extraction
A. Khotanzad, Y.H. Hong: Invariant image recognition by Zernike moments. IEEE Trans. PAMI, vol. 12, nº 5, pp. 489-497, 1990.
Statistical Features: Zernike Moments
• The image is decomposed over the Zernike polynomials:

$f(x,y) = \sum_n \sum_m A_{nm} V_{nm}(\rho,\theta)$

• The $A_{nm}$ coefficients are the Zernike moments of order n and repetition m:

$A_{nm} = \dfrac{n+1}{\pi} \sum_x \sum_y f(x,y)\, [V_{nm}(\rho,\theta)]^*$

where $x^2+y^2 \le 1$ (the image must be re-scaled to the unit circle) and * denotes the complex conjugate
• $|A_{nm}|$ is rotation invariant
• Relation between Zernike moments and geometric moments: for an image centered at its centroid, $A_{00} = \mu_{00}/\pi$ and $A_{11} = A_{1,-1} = 0$, while the order-2 Zernike moments are linear combinations of $\mu_{20}$, $\mu_{02}$ and $\mu_{11}$.
FeatureExtraction
Statistical Features: Zernike Moments
Rows 1 and 2: Zernike moments of order 1-13, displayed using:

$I_m(x,y) = \sum_n A_{nm} V_{nm}(x,y)$, with $x^2+y^2 \le 1$ and $n-|m|$ even

Rows 3 and 4: image reconstruction from Zernike moments of order 1-13.
• To reconstruct the image from the Zernike moments:

$f(x,y) = \lim_{N\to\infty} \sum_{n=0}^{N} \sum_{m:\ |m|\le n,\ n-|m|\ \mathrm{even}} A_{nm} V_{nm}(x,y)$

Orders 1 and 2, for example, represent orientation, height and width.
FeatureExtraction
Statistical Features: Zernike Moments
• Image reconstruction using moments up to order 10 (66 moments).
FeatureExtraction
Statistical Features: Zernike Pseudo-Moments
• Less noise-sensitive than Zernike moments
• Better recognition results
• Obtained by removing the constraint that $n-|m|$ be even; the radial polynomial becomes:

$R_{nm}(\rho) = \sum_{s=0}^{n-|m|} (-1)^s \dfrac{(2n+1-s)!}{s!\,(n-|m|-s)!\,(n+|m|+1-s)!}\, \rho^{n-s}$
FeatureExtraction
Statistical Features: Invariant Moments
• Experiments show that a robust OCR system needs at least 10-15 features, i.e., we need to define between 10 and 15 geometric invariants
• Handwritten digit recognition:
  – Moments up to order 6:
    • Regular moments (24 moments): 94%
    • Zernike moments (23 moments): 95%
    • Zernike pseudo-moments (44 moments): 91.5%
  – Moments of higher orders: decrease in recognition performance
FeatureExtraction
S.O. Belkasim, M. Shridhar, M. Ahmadi: Pattern Recognition with Moment Invariants: A Comparative Study and New Results. Pattern Recognition, Vol. 24, n. 12, pp. 1117-1138, 1991.
Transform-based features
• Instead of computing the feature vector directly from the character image, a linear transform is applied:

$g = T \cdot f$

where T is a matrix of constant values
• These transforms help to reduce the dimensionality of the feature vector, preserving the most relevant information about the shape of the character
• The original image can be reconstructed from the feature vector
• Features are invariant to some global deformations, such as translation and rotation
• High computational cost
• Some examples:
  – Karhunen-Loeve expansion
  – Fourier series
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• The KLT is defined as:

$g = T^t (f - \bar{f})$

where f is the feature vector of the image and $\bar{f}$ is the mean of all the samples representing the character:

$\bar{f} = \dfrac{1}{N} \sum_{i=1}^{N} f_i$

• Each column of T is an eigenvector of the covariance matrix:

$C = \dfrac{1}{N} \sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^t$

• Usually, only the M (M < d) eigenvectors corresponding to the largest eigenvalues are used; in this way, the dimensionality is reduced.
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• For each image, we get a feature vector x of dimension d
• The learning set is composed of n samples per class
• The set of samples is represented by the matrix X, of dimension n x d
• For each class, we can compute the covariance matrix R:

$R = \dfrac{1}{n-1} X^T X = \dfrac{1}{n-1} \sum_i (x_i - \bar{x})(x_i - \bar{x})^T$

• Then, the transform matrix T is built from the eigenvectors of R:

$T = [v_1 \cdots v_d]$, with $R v_i = \lambda_i v_i$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$

($v_i$: eigenvectors of the covariance matrix; $\lambda_i$: eigenvalues of the covariance matrix)
• The transformation of an image x is:

$y = T^T x$
FeatureExtraction
Transform-based features: Karhunen-Loeve Expansion
• Usually, the dimensionality is reduced and only the m eigenvectors of R with the greatest weight are kept
• Then, the transform matrix becomes:

$P = [v_1 \cdots v_m]$

• For each image $x_i$, the feature vector $y_i$ is:

$y_i = P^T x_i$

• Usually, m is selected in such a way that the eigenvectors explain some pre-specified percentage of the total variance (usually 0.9 or 0.95)
• The variance explained by the selected eigenvectors is given by:

$\sum_{i=1}^{m} \lambda_i$ (relative to the total variance $\sum_{i=1}^{d} \lambda_i$)
FeatureExtraction
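A sketch of the Karhunen-Loeve (PCA) feature extraction described above, keeping the m eigenvectors that explain a given fraction of the total variance; X is assumed to be an n x d matrix with one sample per row, and the names are illustrative.

```python
import numpy as np

def kl_transform(X, variance_kept=0.95):
    """Return (P, mean) such that the features of a sample x are y = P.T @ (x - mean)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    R = Xc.T @ Xc / (X.shape[0] - 1)              # covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(R)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum, variance_kept) + 1)
    return eigvecs[:, :m], mean

# usage: P, mu = kl_transform(X_train); y = (x - mu) @ P
```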
Transform-based features: Karhunen-Loeve Expansion
• Application of the KL transform to the NIST database:
  – Digit recognition: 96% - 97%
  – Uppercase recognition: 89% - 90%
  – Lowercase recognition: 77% - 82%
M. D. Garris, J.L. Blue, G. T. Candela, P. J. Grother, S. A. Janet, C.L. Wilson: NIST form-based handprint recognition system (release 2.0). Technical report NISTIR 5959, National Institute of Standards and Technology, USA, 1994.
Transform-based features: Fourier Descriptors
• Decomposition of a periodic function of period T as a Fourier series:

$f(t) = \sum_{n=-\infty}^{\infty} C_n\, e^{j 2\pi n t / T}$, with $C_n = \dfrac{1}{T}\int_0^T f(t)\, e^{-j 2\pi n t / T}\, dt$

or, in real form:

$f(t) = A_0 + \sum_{n=1}^{\infty} \left( a_n \cos\dfrac{2\pi n t}{T} + b_n \sin\dfrac{2\pi n t}{T} \right)$

$A_0 = \dfrac{1}{T}\int_0^T f(t)\, dt$, $\quad a_n = \dfrac{2}{T}\int_0^T f(t)\cos\dfrac{2\pi n t}{T}\, dt$, $\quad b_n = \dfrac{2}{T}\int_0^T f(t)\sin\dfrac{2\pi n t}{T}\, dt$
FeatureExtraction
Transform-based features: Fourier Descriptors
The shape contour can be described as a complex function of arc length:

$z(s) = x(s) + j\,y(s)$

The contour can be described as a function of the tangent angle:

$x(s) = x(0) + \int_0^s \cos\theta(\alpha)\, d\alpha$, $\quad y(s) = y(0) + \int_0^s \sin\theta(\alpha)\, d\alpha$

Defining the function of accumulation of the tangent angle:

$\Phi(l) = \theta(l) - \theta(0)$

This function is normalized to the range [0, 2π]:

$\Phi^*(t) = \Phi\!\left(\dfrac{Lt}{2\pi}\right) + t$

Finally, this function can be decomposed in Fourier descriptors as:

$\Phi^*(t) = a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$
FeatureExtraction
C. T. Zahn, R. Z. Roskies: Fourier descriptors for plane closed curves. IEEE Trans. on Computers, vol. C-21, nº 3, pp. 269-281, 1972
Transform-based features: Fourier Descriptors
• In the discrete case, for a contour given by the points $z(s_j) = x(s_j) + j\,y(s_j)$:

$\Delta\Phi_j = \tan^{-1}\!\left[ \dfrac{y(s_j) - y(s_{j-1})}{x(s_j) - x(s_{j-1})} \right]$, $\quad \Phi_j = \sum_{k=1}^{j} \Delta\Phi_k$

• Then, the descriptors are computed from the accumulated angle increments:

$a_0 = -\pi - \dfrac{1}{L}\sum_{k=1}^{m} l_k\,\Delta\Phi_k$, $\quad a_n = -\dfrac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \sin\dfrac{2\pi n l_k}{L}$, $\quad b_n = \dfrac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \cos\dfrac{2\pi n l_k}{L}$

• Fourier descriptors depend on the starting point
• For 64x64 images, it has been shown that 5 coefficients are enough to discriminate between "2" and "Z"
FeatureExtraction
Transform-based features: Fourier Descriptors
Elliptic Fourier Descriptors

$\hat{x}(t) = A_0 + \sum_{n=1}^{N} \left( a_n \cos\dfrac{2\pi n t}{T} + b_n \sin\dfrac{2\pi n t}{T} \right)$
$\hat{y}(t) = C_0 + \sum_{n=1}^{N} \left( c_n \cos\dfrac{2\pi n t}{T} + d_n \sin\dfrac{2\pi n t}{T} \right)$

where T is the contour length, and $\hat{x}(t) \equiv x(t)$, $\hat{y}(t) \equiv y(t)$ when $N \to \infty$.

In the discrete case, for a contour of m pixels with $\Delta x_i = x_i - x_{i-1}$, $\Delta y_i = y_i - y_{i-1}$, $\Delta t_i = \sqrt{\Delta x_i^2 + \Delta y_i^2}$, $t_i = \sum_{j=1}^{i} \Delta t_j$ and $T = t_m$:

$a_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta x_i}{\Delta t_i}\left[ \cos\dfrac{2\pi n t_i}{T} - \cos\dfrac{2\pi n t_{i-1}}{T} \right]$
$b_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta x_i}{\Delta t_i}\left[ \sin\dfrac{2\pi n t_i}{T} - \sin\dfrac{2\pi n t_{i-1}}{T} \right]$
$c_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta y_i}{\Delta t_i}\left[ \cos\dfrac{2\pi n t_i}{T} - \cos\dfrac{2\pi n t_{i-1}}{T} \right]$
$d_n = \dfrac{T}{2n^2\pi^2} \sum_{i=1}^{m} \dfrac{\Delta y_i}{\Delta t_i}\left[ \sin\dfrac{2\pi n t_i}{T} - \sin\dfrac{2\pi n t_{i-1}}{T} \right]$

Invariance to the starting point
The phase shift with respect to the main axis is computed, and the coefficients are rotated according to this angle:

$\theta_1 = \dfrac{1}{2}\tan^{-1}\!\left[ \dfrac{2(a_1 b_1 + c_1 d_1)}{a_1^2 + c_1^2 - b_1^2 - d_1^2} \right]$

$\begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix} = \begin{pmatrix} a_n & b_n \\ c_n & d_n \end{pmatrix} \begin{pmatrix} \cos n\theta_1 & -\sin n\theta_1 \\ \sin n\theta_1 & \cos n\theta_1 \end{pmatrix}$

Feature Extraction
Elliptic Fourier Descriptors
F.P. Kuhl, C.R. Giardina: Elliptic fourier features of a closed contour. Comput. Vis. Graphics ImageProcess, vol. 18, pp. 236-258, 1982
Transform-based features: Fourier Descriptors
Character '5' reconstructed using elliptic Fourier descriptors of order 1, 2, ..., 10; 15, 20, 30, 40, 50 and 100 respectively.

Rotation invariance
The orientation of the main semi-axis is computed, and the coefficients are rotated:

$\psi_1 = \tan^{-1}\!\left( \dfrac{c_1^*}{a_1^*} \right)$

$\begin{pmatrix} a_n^{**} & b_n^{**} \\ c_n^{**} & d_n^{**} \end{pmatrix} = \begin{pmatrix} \cos\psi_1 & \sin\psi_1 \\ -\sin\psi_1 & \cos\psi_1 \end{pmatrix} \begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix}$

Scale invariance
The coefficients are divided by the magnitude of the main semi-axis:

$E^* = \sqrt{a_1^{*2} + c_1^{*2}}$
FeatureExtraction
Transform-based features: Fourier Descriptors
• Experiments with handwritten digits (100 images per digit):
  – 12 elliptic descriptors: 99.7%
  – 12 non-elliptic descriptors: 99.5%
• Experiments with digits + lowercase letters:
  – 12 elliptic descriptors: 98.6%
  – 12 non-elliptic descriptors: 90.1%
FeatureExtraction
Evaluation
T. Taxt, J.B. Olafsdottir, M. Daehlen: Recognition of Handwritten Symbols, Pattern Recognition, Vol. 23, n. 11, pp. 1155-1166, 1990
Structural Analysis
Methods based on the analysis of the character structure, from the detection of some features and their relationships (the basic idea is to divide the character into its basic parts):
• Contour analysis
• Skeleton analysis
• Analysis of topological and geometric features
FeatureExtraction
Structural Analysis: Run-length encoding
• It is the simplest structural representation.
• Run-length encoding represents each image row as a sequence of pairs (l,g), where each pair represents a run of l consecutive pixels with gray level g.
• For binary images, only the sequence of run lengths is required (e.g. 2, 4, 5, 1, 4, 3, 1).
• Example:
  0 0 4 4 4 2 2 2 2 2 4 4 4 1 1 1 1 1 1 1 1 0 0 0 0 → (2,0), (3,4), (5,2), (3,4), (8,1), (4,0)
FeatureExtraction
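A sketch of run-length encoding of one image row as (length, gray level) pairs, reproducing the example above.

```python
def run_length_encode(row):
    """Encode a sequence of gray levels as (length, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(r) for r in runs]

row = [0, 0, 4, 4, 4, 2, 2, 2, 2, 2, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(run_length_encode(row))   # [(2, 0), (3, 4), (5, 2), (3, 4), (8, 1), (4, 0)]
```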
Structural Analysis: Run-length encoding
A graph is built on the run-length encoding, where:
• Nodes: run-lengths
• Edges: overlapping between runs in consecutive rows
[Figure: run-length graph of an image containing two regions.]
FeatureExtraction
Structural Analysis: Chain-code
Chain-codes or Freeman codes are the simplest angular approximation. They permit coding each vector $d_i$ between two consecutive points with a code between 0 and 7, one for each of the 8 neighbouring directions.
The codification of a string S is composed of 3 fields:
• Starting point of the segment $p_0(S) = (x_0, y_0)$
• Segment length $l(S)$
• A table of directions: $Dir(S) = [d_0, d_1, \ldots, d_{l(S)-1}]$, where $d_i \in [0,7]$, $\forall i \in [0, l(S)-1]$
[Figure: the 8 chain-code directions and the relative codification of consecutive direction changes.]
FeatureExtraction
Structural Analysis: Contour Analysis
• Contour pixels are codified according to their orientations, using the chain-code representation
• Classification can be done with structural recognition methods based on the string-edit distance algorithm by Wagner and Fischer
• Example chain code: 22221220201000070070
FeatureExtraction
Structural Analysis: Skeleton Analysis
• Image thinning and representation of the skeleton with some encoding method that allows comparing two shapes (sometimes it is necessary to vectorize the skeleton):
  – Chain codes
  – Graphs
  – Grammars
  – Zoning
  – Discrete features
• Skeleton problems:
  – Noise sensitivity
  – Variability of the representation
[Figure: character 'B' and its skeleton.]
FeatureExtraction
Structural Analysis: Skeleton Analysis
• Representation with graphs or grammars:
  – Based on the detection of characteristic skeleton points and a polygonal approximation of the skeleton
  – Two possibilities to represent the skeleton with a graph:
    • Nodes are the characteristic points, while edges are the segments joining the points
    • Nodes are the segments of the polygonal approximation, while edges represent the adjacency relations between the segments
[Figure: the two graph representations of a character skeleton.]
FeatureExtraction
Structural Analysis: Skeleton Analysis
Representation with graphs:
FeatureExtraction
J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995
Structural Analysis: Skeleton Analysis
• Zoning:
  – The skeleton is divided into zones (labelled A-I in the example)
  – Option 1: stroke length within each zone
  – Option 2: coding from the arcs between zones, e.g. ArC, ArD, CcF, DrF, DrG, FcI, GrI, where r = straight line and c = curve
• Discrete features:
  – Number of loops
  – Number of T joints and X joints
  – Number of terminal points, corner points and isolated points
  – Cross points with the horizontal and vertical axes
FeatureExtraction
Structural Analysis: Topological and geometric features
• Aspect ratio x-y
• Perimeter, area, center of gravity
• Minimal and maximal distance of the contour to the center of gravity
• Number of holes
• Euler number = (number of connected components) - (number of holes)
• Compactness = (perimeter)² / (4π · area)
• Information about contour curvature
• Ascenders and descenders
• Concavities and holes
• Loops
• Unions, terminal points, crossings with the horizontal and vertical axes
• Angular information: histogram of segment angles
FeatureExtraction
Components of an OCR system (recap): acquisition → document pre-processing → segmentation → character pre-processing → feature extraction → classification → post-processing
Classification
Classification
• Different methods, depending on the model of feature representation
• Classification using feature vectors:
  – Correlation
  – Euclidean distance
  – Mahalanobis distance
  – k nearest neighbours
  – Bayes' classifier
  – Neural networks
• Classification with structural features:
  – Dichotomic search
  – String edit distance
  – Graph matching
  – Grammars
Classification
Classification using feature vectors
• Correlation:
  – The image is assigned to the class with the largest correlation value:

$S_i(f) = \dfrac{\iint_R f(x,y)\, g_i(x,y)\, dx\, dy}{\sqrt{\iint_R f^2(x,y)\, dx\, dy \; \iint_R g_i^2(x,y)\, dx\, dy}}$

• Minimal Euclidean distance:
  – Distance to the mean $m_i$ of the class:

$D_i(x) = (x - m_i)^T (x - m_i)$

  – It does not take into account differences in variance between classes
Classification
140
Classification using feature vectors
• Minimum quadratic distance (Mahalanobis)
– For each class i, the mean m_i and the covariance matrix S_i are computed from the set of samples
– The covariance matrix is taken into account when computing the distance from an image to class i
– The feature vector of the image x is projected onto the eigenvectors of the class
D_i(x) = (x - m_i)^T S_i^{-1} (x - m_i) = z^T z,   with   z = Λ_i^{-1/2} Ψ_i^T (x - m_i)

Λ_i : eigenvalues of S_i
Ψ_i : eigenvectors of S_i
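A possible NumPy sketch of the Mahalanobis rule above: the mean and covariance of each class are estimated from its samples, and the class with the smallest quadratic distance is chosen. The pseudo-inverse is an implementation choice, used here as a guard against singular covariance estimates.

```python
import numpy as np

def mahalanobis_classify(x, samples_per_class):
    """Assign x to the class minimizing (x - m_i)^T S_i^{-1} (x - m_i),
    with m_i and S_i estimated from the samples of each class."""
    best_label, best_dist = None, np.inf
    for label, samples in samples_per_class.items():
        m = samples.mean(axis=0)
        S = np.cov(samples, rowvar=False)
        # Pseudo-inverse guards against a singular covariance estimate.
        d = float((x - m) @ np.linalg.pinv(S) @ (x - m))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```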
Classification
71
141
Classification using feature vectors
• k-Nearest Neighbours
– Several sample models for each class
– Given an image, we take the k models nearest to the image
– The image is classified into the class with the most elements in the set of k nearest neighbours

• Weighted nearest neighbours
– k depends on the image: for each image x, the set V_x contains all the models whose distance to x is lower than α times the distance to the nearest model
V_x : set of the models nearest to x
k(x) = |V_x|
V_x^(i) : models of V_x that belong to class i

D_i(x) = Σ_{j ∈ V_x^(i)} d(x, x_j)²
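A minimal sketch of the plain k-nearest-neighbour rule described above (the weighted variant only changes how the neighbourhood V_x and the class scores are formed); the model vectors and labels are assumed to be given.

```python
import numpy as np
from collections import Counter

def knn_classify(x, models, labels, k=3):
    """Plain k-NN rule: x receives the majority label among the k models
    closest to it (Euclidean distance)."""
    distances = np.linalg.norm(models - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# models: (n_models, n_features) array; labels: list of class labels, one per model.
```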
Classification
142
Classification using feature vectors
• Bayes’ classifier
– An image x is classified into the class i that maximizes the posterior probability p(w_i|x)
– Applying Bayes' theorem:
– p(x) is constant and independent of the class i; therefore, it has no influence on the classification
– If all classes have the same prior probability p(w_i), we can discard it too. Then:
p(w_i|x) = p(x|w_i) p(w_i) / p(x)

p(x) : probability of observing the image vector x
p(x|w_i) : likelihood; probability of observing the image vector x given that it belongs to class i
p(w_i|x) : posterior probability that x belongs to class i

argmax_i p(w_i|x) = argmax_i p(x|w_i)
Classification
72
143
Classification using feature vectors
• If we assume a normal distribution for each class, with mean m_i and covariance S_i estimated from the set of samples of the class:

• Discarding constants and applying the logarithm, we can derive the following discriminant function:
p(x|w_i) = (2π)^{-n/2} |S_i|^{-1/2} exp( -(1/2) (x - m_i)^T S_i^{-1} (x - m_i) )

D_i(x) = -log|S_i| - (x - m_i)^T S_i^{-1} (x - m_i)
Classification
144
Classification using feature vectors
• Neural networks
– They can be applied directly to the image or to a feature vector previously computed from the image
– The most used neural networks are multi-layer feed-forward networks

Y. LeCun et al. Backpropagation applied to handwritten zip code recognition. Neural Computation, vol. 1, pp. 541-551, 1989.

Each layer represents a feature subvector of a higher level
Classification
73
145
Classification using feature vectors
• A neural network is organized into several layers. Each layer has a fixed number of nodes

• In the first layer, the nodes correspond to the values in the feature vector

• In the last layer, the nodes correspond to each of the classes

• Intermediate (hidden) layers represent feature subvectors of a higher level

• The value at each node in a given layer is computed by applying a propagation function to the values of the nodes in the previous layer, weighted by a vector of weights
x_j^k = σ( b_j^k + Σ_{i=1..N(k-1)} w_{ji}^k · x_i^{k-1} )

x_j^k : value of node j at layer k
N(k) : number of nodes of layer k
b_j^k : bias of node j at layer k
w_{ji}^k : weight of the connection between node i of layer k-1 and node j of layer k
Classification
146
Classification using feature vectors
• To design a neural network, we have to decide:
– The number of layers
– The number of nodes at each layer

• Learning step, using a set of samples
– The weights of each connection are automatically determined in such a way that the classification error is minimized

• Example of neural network
– Three-layer perceptron (the input counts as one layer). The final discriminant function for each class is:
D_i(x) = f( b_i^(2) + Σ_j w_{ij}^(2) · f( b_j^(1) + Σ_k w_{jk}^(1) · x_k ) )

f(x) = 1 / (1 + e^{-x})   (sigmoid function)
Classification
74
147
Classification using structural features
• Dichotomous search. The presence or absence of certain primitives is tested in several steps. Character models can be organized with decision trees

• String or graph matching. Application of string-edit algorithms or matching of attributed graphs

• Grammars. We test whether the character belongs to the language generated by the grammar that represents the model
Classification
148
Classification using Deformable models
• Deformable models
– We start with an ideal representation of the shape of the object

– This ideal shape is deformed using a set of rules or pre-defined operations, in such a way that all possible valid distortions of the object can be generated

– Given an image, we look for the deformation of the object that yields the best value of an energy function defined as the combination of two measures:

• Internal energy: it measures the degree of deformation with respect to the model of the object

• External energy: it measures the degree of similarity between the deformation and the image

– The image is classified into the model with the lowest global energy
Classification
75
149
Classification using Deformable models
• Based on a character prototype:
– We define a prototype of the character that can be deformed by applying a series of trigonometric transforms over the image space

– Basis of the transform:

– External energy based on the distance from the deformation to the image contour

– Bayesian combination of internal and external energy
e_{mn}^x = ( 2 sin(πnx) cos(πmy), 0 )
e_{mn}^y = ( 0, 2 cos(πmx) sin(πny) )
Classification
150
Classification using Deformable models
• Based on a set of point generators located on a spline:
– Internal energy:
• The character is represented by a spline; we can modify the control points of the spline
• A probability is generated based on the modification of these control points

– External energy:
• Image points are generated from generators located along the spline
• A probability is defined based on the distance from the image pixels to the point generators

– Minimization:
• Probabilistic combination
• EM algorithm
Classification
76
151
Classification using Deformable models
• Point distribution model:
– The model is represented as the mean of a set of points obtained from the skeletons of the learning samples

– PCA is applied to obtain the set of valid character deformations from the learning set

– For each image, we find the nearest deformation according to the space defined by PCA

– Internal energy:

– External energy: distance between the image and the image obtained using the nearest deformation
x = x̄ + P b
b_x ≈ P^T (x - x̄)
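A possible NumPy sketch of the point distribution model above: PCA of the skeleton points gives the mean shape and the deformation modes P, and a new shape is projected to its parameters b. Shapes are assumed to be already normalised and flattened; the number of modes is a placeholder.

```python
import numpy as np

def pdm_fit(shapes, n_modes=5):
    """Point distribution model: mean shape plus the main PCA deformation modes.
    `shapes` is an (N, 2K) array with the K skeleton points of each sample flattened."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues as deformation modes P.
    P = eigvecs[:, np.argsort(eigvals)[::-1][:n_modes]]
    return mean, P

def pdm_project(x, mean, P):
    """b = P^T (x - x_mean): deformation parameters of a new shape x;
    its reconstruction in the model space is x_mean + P b."""
    b = P.T @ (x - mean)
    return b, mean + P @ b
```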
Classification
152
Holistic methods
• Used for recognition of handwritten script
• Each word is a recognition unit. We try to recognize each word using its global features. Each word is a different class
• Based on psychological evidence
• Applications:
– Constrained domains with only a few words (bank applications, personal agendas, etc.)
– Filtering of large domains to reduce the set of possible words

• Usually, they are based on the application of HMMs or dynamic programming (string edit) to whole words (not to letters)
Classification
77
153
Holistic methods
• Global word features:
– Distribution of segment orientations (horizontal, vertical, diagonal)
– Terminal points
– Concavities
– Holes
– Loops
– Word length
– Ascenders and descenders
– Crossing points with the central line
– Fourier coefficients

• Feature representation:
– Feature vectors or matrices: the word is divided into zones. Each zone corresponds to a component of the feature vector, where the presence or absence of the features is tested
– Graphs: adjacency or neighbourhood relations between the detected features
Classification
154
Hidden Markov Models
• Hidden Markov Model
– It represents a double stochastic process where a Markov chain is not directly observable; it can only be inferred through the observation of another stochastic process

– In an HMM, a sequence of observations O = (o_1, ..., o_T) is produced by a sequence of states Q = (q_1, ..., q_T)
– An HMM is modelled by λ = {V, S, Π, A, B, Γ}:

V = {v_k ; 1 ≤ k ≤ M} : set of observable symbols
S = {s_i ; 1 ≤ i ≤ N} : set of states
Π = {π_i}, π_i = P(q_1 = s_i) : initial probability of each state
A = {a_ij}, a_ij = P(q_{t+1} = s_j | q_t = s_i) : transition probabilities between states
Γ = {γ_j}, γ_j = P(q_T = s_j) : probability of the final state
B = {b_j(k)}, b_j(k) = P(o_t = v_k | q_t = s_j) : probability of observing a symbol in a given state

Hidden Markov Models
78
155
Hidden Markov Models
Example: a Markov model of the weather

States: S = {rainy, sunny}
Set of observations (possible values of humidity): V = {0%, 25%, 50%, 75%, 100%}
Sequence of observations: O = {0%, 25%, 25%, 50%, 25%, 75%, 50%}

(Diagram: two states, sunny (s) and rainy (p), with transition probabilities P(s|s), P(s|p), P(p|s), P(p|p) and, for each state, the emission probabilities P(0%|·), P(25%|·), P(50%|·), P(75%|·), P(100%|·))

With the model we can:
• Compute the probability that the sequence of observations can be generated by the weather model
• Given a set of observations of humidity, estimate the model of the weather
• Decide whether the weather was rainy or sunny on each of the observed days

Hidden Markov Models
156
Hidden Markov Models
• Three problems related to an HMM:
– Given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find P(O|λ), the probability that the sequence of observations can be generated by the model
• Computed by summing the probabilities of the sequence of observations over all possible sequences of states
• Forward/backward propagation methods

– Learning problem: given a learning set O, find the parameters of the model λ that maximize P(O|λ)
• Baum-Welch algorithm

– Recognition problem: given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find the optimal sequence of states Q
• Viterbi algorithm (see the sketch below)
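A minimal NumPy sketch of the Viterbi algorithm for the recognition problem, using the Π, A, B notation introduced before; the observations are assumed to be indices into the symbol set V.

```python
import numpy as np

def viterbi(pi, A, B, observations):
    """Most probable state sequence for a discrete HMM.
    pi[i]   : initial probability of state i
    A[i, j] : transition probability from state i to state j
    B[i, k] : probability of emitting symbol k in state i"""
    N, T = len(pi), len(observations)
    delta = np.zeros((T, N))           # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = pi * B[:, observations[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # previous state -> current state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, observations[t]]
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack the best path
        path.insert(0, int(psi[t][path[0]]))
    return path
```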
Hidden Markov Models
79
157
Hidden Markov Models. OCR applications
1. External segmentation with context post-processing:
• The Markov model represents grapheme variations, based on bigram or trigram frequencies: probabilities of finding sequences of consecutive letters in the dictionary of the language

2. Internal segmentation:
• The Markov chain represents character features extracted from left to right along the text. These features are compared with the model and, during recognition, it can be decided where a character ends and the next one begins

3. Holistic methods:
• The Markov chain represents variations within a given word belonging to a lexicon of the valid words

Hidden Markov Models
158
Hidden Markov Models. OCR applications
• One state per letter

• Transition probabilities are the probabilities of finding two consecutive letters in the language

• The observations are the pre-segmented zones of the image

• The probabilities of the observations given a state are the probabilities that each pre-segmented zone corresponds to the letter associated with the state
(Diagram: states a, m, n, ... with transition probabilities p_aa, p_am, p_ma, p_mm, p_mn, p_nm, p_nn)

Hidden Markov Models
External segmentation with context post-process
80
159
Hidden Markov Models. OCR applications
• Model-discriminant HMM:
– An HMM for each letter

– Recognition is done letter by letter

– States within an HMM represent zones within the letter with different feature values

– Word recognition is done by finding the best combination of individual HMMs according to all segmentation possibilities

Hidden Markov Models
Internal segmentation
160
Hidden Markov Models. OCR applications
• Path-discriminant HMM
– One HMM

– Each state corresponds to a letter

– Transitions between states correspond to the probability of changing from one letter to another within a word

– Over-segmentation of the image. Each observation corresponds to a set of features extracted from each segment

– The probability of each observation for a given state is the probability of generating the set of features from a given letter

– Recognition looks for the sequence of states (letters) that best generates the set of features extracted from the image

(Diagram: fully connected states a, b, c, ...)

Hidden Markov Models
Internal segmentation
81
161
Hidden Markov Models. OCR applications
• Example: HMM for word recognition
– One HMM for each word
– Each letter is represented with four states. Each state is a possible result of a previous over-segmentation
– Recognition looks for the HMM (word) that maximizes the probability of generating the sequence of observations (features extracted from the image)

(Diagram: HMM for a word 'c a ...', with four states per letter)

Hidden Markov Models
Holistic methods
162
Hidden Markov Models. OCR applications
• Feature extraction:
– Computation of 9 features inside a sliding window:
• Number of pixels
• Center of gravity
• Moments of order 2
• Location of the upper and lower contours
• Orientation of the upper and lower contours
• Number of black-white transitions in the vertical direction
• Number of black pixels between the upper and lower contours

• One HMM for each character
• HMMs are concatenated to compose words; recognition finds the combination of HMMs with the highest probability
• Results:
– Vocabulary: 2296 words
– Test set: 3850 words, 80 people
– Recognition rate: 82.05%
S. Gunter, H. Bunke: HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition, vol. 37, pp. 2069-2070, 2004.
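A rough Python sketch of the sliding-window feature extraction described above, computing a subset of the nine features (pixel count, centre of gravity, second-order moments, upper/lower contour position and vertical transitions); the window width and the convention 1 = black pixel are assumptions.

```python
import numpy as np

def window_features(window):
    """A few of the features listed above for one binary window (1 = black pixel)."""
    ys, xs = np.nonzero(window)
    if len(ys) == 0:
        return np.zeros(8)
    cy, cx = ys.mean(), xs.mean()                                  # center of gravity
    m2y, m2x = ((ys - cy) ** 2).mean(), ((xs - cx) ** 2).mean()    # 2nd-order moments
    upper, lower = ys.min(), ys.max()                              # upper / lower contour rows
    transitions = int(np.abs(np.diff(window.astype(int), axis=0)).sum())
    return np.array([len(ys), cy, cx, m2y, m2x, upper, lower, transitions])

def observation_sequence(word_image, width=4):
    """Slide a window of `width` columns from left to right over the word image."""
    return [window_features(word_image[:, c:c + width])
            for c in range(0, word_image.shape[1], width)]
```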
82
163
Components of an OCR system
ACQUISITION

DOCUMENT PRE-PROCESSING
• Filtering
• Binarization
• Skew correction

SEGMENTATION
• Layout analysis
• Text/graphics separation
• Character segmentation

CHARACTER PRE-PROCESSING
• Filtering
• Normalization

FEATURE EXTRACTION
• Image-based features
• Statistical features
• Transform-based features
• Structural features

CLASSIFICATION (uses Models produced by a LEARNING step)

POSTPROCESSING
• Context information
Post-process
164
Post-process
• Post-processing to improve OCR results:
– Voting: combination of several classifiers

– Use of context information: analysis of the classification of individual characters in the context of adjacent characters
Post-process
83
165
Post-process. Classifier Combination
Combination of specific classifiers with good performance for some characters, fonts, etc.

(Diagram: the text is processed by Classifier 1, Classifier 2, ..., Classifier k; their individual outputs are combined by a voting module, guided by knowledge, into the final output)

• Integrity: how the voting algorithm controls the activation and configuration of the individual classifiers. High integrity => the voting algorithm decides the best classifiers in each situation

• Representation of the classification results:
– Abstract: each classifier simply gives the label of the class
– Ranked: each classifier gives several ranked labels
– Ranked with a degree of confidence: for each label, the classifier gives a level of confidence

• Combination of the classifier results: how to combine the results (see the sketch below)
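Two minimal sketches of the combination step mentioned in the last bullet: abstract-level majority voting and confidence-level weighted voting; the classifier outputs used here are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Abstract-level combination: each classifier contributes one label and
    the most voted label wins (ties resolved arbitrarily here)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(outputs):
    """Confidence-level combination: each classifier gives (label, confidence)
    pairs and the confidences are accumulated per label."""
    scores = Counter()
    for ranked in outputs:
        for label, confidence in ranked:
            scores[label] += confidence
    return scores.most_common(1)[0][0]

# Hypothetical outputs of three classifiers for the same character image.
print(majority_vote(['s', 'S', 's']))                                          # -> 's'
print(weighted_vote([[('s', 0.7), ('8', 0.2)], [('S', 0.6)], [('s', 0.5)]]))   # -> 's'
```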
Post-process
166
Post-process. Classifier Combination
(Example: the same character image x is labelled differently by several classifiers, e.g. classifier 17 gives 's', classifier 21 gives 'S', classifier 4 gives 'g' and classifier 9 gives '8'; the combination module must decide the final label)
Post-process
84
167
Post-process. Context information
• Context analysis tries to correct errors produced by decisions taken on the basis of local features

• In the presence of uncertainty, the hypotheses generated by the local classifier are complemented with hypotheses about the neighbouring characters

• Two points of view:
– Geometric context (typographic)
– Linguistic context

(Example: the same shape can be read as '13' or as 'B' depending on the context)
Post-process
168
Post-process. Context information
• Methods based on n-grams (combinations of n letters):
– Probability that an n-gram appears in the words of the dictionary

– When there are characters with uncertainty, the final decision is taken according to the n-gram with the highest probability

– Bottom-up techniques

– Viterbi algorithm and Markov methods

• Methods based on grammars:
– A grammar is used to validate the results of the OCR

– Similar to n-grams, but they allow variable-length strings and recursion to be considered
(Example, in Catalan: an uncertain word can be read as 'vint-cents' or 'vuit-cents'; the grammar Xifra → Desena '-' Unitat | Unitat '-' Centena (Number → Tens '-' Unit | Unit '-' Hundred) only accepts 'vuit-cents', i.e. eight hundred)

Post-process
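A minimal sketch of n-gram-based disambiguation: candidate readings of an uncertain word are scored with letter-bigram probabilities; the probability table and the candidates below are illustrative.

```python
def word_score(word, bigram_prob):
    """Score a candidate reading as the product of its letter-bigram
    probabilities; unseen bigrams get a small smoothing value."""
    p = 1.0
    for a, b in zip(word, word[1:]):
        p *= bigram_prob.get(a + b, 1e-6)
    return p

# Hypothetical bigram probabilities estimated from the dictionary of the language.
bigram_prob = {'vu': 0.02, 'ui': 0.03, 'it': 0.04, 'vi': 0.02, 'in': 0.03, 'nt': 0.05}
candidates = ['vint', 'vuit']     # alternative readings of an uncertain word
best = max(candidates, key=lambda w: word_score(w, bigram_prob))
```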
85
169
Post-process. Context information
• Methods based on a dictionary
– Creation of a dictionary with the set of correct words

– It permits orthographic correction of the text

– Use of string-edit algorithms

– Problem: words not included in the dictionary

– Requires data structures for representing the dictionary that provide quick access
Automaton for the dictionary {LLIBRE, LLIURE, CAURE, COURE, COST}
(Figure: a trie-like automaton that shares common prefixes and suffixes of the dictionary words)

(Figure: a hash function h maps a word ('paraula') to one of n buckets of the dictionary)

Post-process
170
Examples of OCR systems
• Printed characters:– Readstar (Innovatic).
– Cuneiform (Cognitive Technology).
– Word Scan (Calera).
– OmniPage (Caere).
– Text Bridge (Xerox).
– Neuro Talker OCR (Int. Neural Machines).
– OCR Master.
– Recognita Plus.
– TypeReader Professional.
– Etc.
• Hand-written characters:– Mitek.
Examples
86
171
Examples of OCR systems
• Inspection of surgical sachets
– Digit recognition: reference and date

– System requirements:
• Irregular surface: shadows and reflections

• Resolution: 175 dpi

• Acquisition with a B/W camera

• Diffuse lighting

• Detection of 16 defects in 400 ms

• Verification system
Examples
172
Examples of OCR systems
• Pre-processing
– Skew correction: combination of the angle of the upper contour and the segments of the box surrounding the word LOT

– Binarization: Otsu's method

– Thinning
Examples
87
173
Examples of OCR systems
• Segmentation
– Connected components from the skeleton

– Application of domain knowledge (character size and separation) to segment touching or broken characters:

• Divide wide components

• Join thin and nearby components
Examples
174
Examples of OCR systems
• Feature extraction: zoning
– The size of each zone is not constant: it is adapted to the image size

– Two versions:
• Version 1, value at each zone: a measure of the importance of the zone with respect to the whole character:
– 1 if the number of pixels is greater than a percentage of the total number of pixels in the image; 0 otherwise

– The central region is more important: its value is multiplied by 2

– Three values are added to combine the values of the most discriminant zones

• Version 2, value at each zone: percentage of white pixels in the zone

(Figure: example zone weights 0.0 / 0.5 / 2.0 and the resulting binary zone values for some sample digits)
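A possible sketch of the zoning descriptor (closest to version 2 above): the character image is divided into a grid whose zone size adapts to the image, and the fraction of foreground pixels is measured per zone; the grid size and the convention 1 = foreground are assumptions.

```python
import numpy as np

def zoning_features(char_image, rows=3, cols=3):
    """Fraction of foreground pixels in each zone of a rows x cols grid;
    zone boundaries adapt to the size of the character image."""
    h, w = char_image.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            zone = char_image[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]
            feats.append(zone.mean())   # fraction of pixels set to 1 in the zone
    return np.array(feats)
```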
Examples
88
175
Examples of OCR systems
• Classification
– Version 1:
• Model with the minimal distance

• If several models have the same minimal distance and the digit to verify is among the candidates, it is verified but marked as ambiguous

– Version 2:
• Mahalanobis distance

• Learning step to compute the mean and covariance for each digit

• Verification of the digit string:
– The string is rejected if more than one digit is mis-recognized or more than two digits are ambiguous
d = Σ_{j=1..n} | i_j - m_j |   (city-block distance between the image features i_j and the model features m_j)
Examples
176
Bibliography
• S. Mori, H. Nishida, H. Yamada. Optical Character Recognition. John Wiley and Sons, 1999.

• H. Bunke, P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997.
•S.V. Rice, G. Nagy, T.A. Nartker. Optical Character Recognition: An illustrated guide to the frontier. Kluwer Academic Publishers. 1999.
• S. Impedovo. Fundamentals in Handwriting Recognition. Springer-Verlag, 1994.
• A. Belaïd, Y. Belaïd. Reconnaissance des formes. Méthodes et Applications. Inter Editions, Paris, 1992.
• A.C. Downton, S. Impedovo. Progress in Handwriting Recognition. World Scientific Publishing Company, 1997.
•P.S.P. Wang. Character and Handwriting Recognition: Expanding Frontiers. Special Issue of IJPRAI, Vol. 5 nums. 1,2, 1991.
• T. Pavlidis, S. Mori. Optical Character Recognition. Special Issue of Proceedings of the IEEE, Vol. 80 no. 7, 1992.
• V.K. Govindan, A.P. Shivaprasad. Character Recognition - A Review. Pattern Recognition, Vol. 23, No. 7, pp. 671-683, 1990.
Bibliography
89
177
Bibliography
• O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
• R.G. Casey, E. Lecolinet. A Survey of Methods and Strategies in Character Segmentation. IEEE Transactions on PAMI, Vol. 18 no. 7, pp. 690-706, 1996.
• Y. Lu. Machine Printed Character Segmentation – An Overview. Pattern Recognition, Vol. 28, n. 1, pp. 67-80, 1995.
• Y. Lu, M. Shridhar. Character Segmentation in Handwritten Words – An Overview. Pattern Recognition, Vol. 29, no. 1, pp. 77-96, 1996.

• S.W. Lee. Advances in Handwriting Recognition. World Scientific, 1999.

• C.H. Chen, L.F. Pau, P.S.P. Wang. Handbook of Pattern Recognition and Computer Vision. World Scientific, 1993.

• J.L. Blue et al. Evaluation of Pattern Classifiers for Fingerprint and OCR Applications. Pattern Recognition, Vol. 27, no. 4, pp. 485-501, 1994.

• R. Plamondon, S. Srihari. On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 63-84, 2000.
• G. Nagy. Twenty Years of Document Image Analysis in PAMI. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 38-62, 2000.
• H.Bunke, T. Caelli. Hidden Markov Models. Applications in Computer Vision. World Scientific. 2001.
• R. Duda, P. Hart, D. Stork. Pattern Classification. 2nd ed., Wiley Interscience, 2000.
Bibliography
178
Practical work
Document image → Binarization (locally adaptive) → Binary image → Layout Analysis → N binary text images → Character Segmentation → N character images → Feature Extraction → Feature vector → Classification (multiple classifiers) → Character label
Groups of 2 people for each task