Background & MotivationShape ContextFast Matching

Shape Context Matching For Efficient OCR

Sudeep Pillai

May 14, 2012

Sudeep Pillai Shape Context Matching For Efficient OCR


Table of contents

1 Background & Motivation
    Motivation
    Background

2 Shape Context
    What is a Shape Context?
    Matching Shape Contexts
    Similarity Measure

3 Fast Matching
    Dimensionality Reduction
    Matching Shape Contexts via Pyramid Matching
    Efficient Matching


Motivation

Automatic translation/transcription of handwritten/printed text

Printed text has several geometric constraints that can be utilized for improved performance

Prior work has pushed hard for accuracy, with far less attention to optimization


Optical Character Recognition

MNIST database performance

Digits are size-normalized and centered in a fixed-size image; 60,000 training examples, 10,000 test examples

Classifier                        Preprocessing               Test error rate (%)

Linear classifiers
  Linear classifier (1-layer NN)  None                        12.0
  Pairwise linear classifier      Deskewing                   7.6

K-nearest neighbors
  K-NN, Euclidean (L2)            None                        3.09
  K-NN, L3                        Deskewing, noise removal    1.22
  K-NN, shape context matching    Shape context extraction    0.63


Optical Character Recognition

MNIST database performance (continued)

Classifier                                   Preprocessing    Test error rate (%)

SVMs
  SVM, Gaussian kernel                       None             1.4
  Virtual SVM, deg-9 poly, 2-pixel jittered  None             0.56

Neural nets
  Deep convex net, unsup. pre-training       None             0.83

Convolutional nets
  Committee of 35 conv. nets                 Normalization    0.23


Optical Character Recognition

Figure: A few digits from the MNIST database




What is a Shape Context?

Definition (Shape)

A shape is represented as a sequence of boundary points:

$$P = \{p_1, \ldots, p_n\}, \quad p_i \in \mathbb{R}^2$$

Definition (Shape Context)

The shape context of an interest point $p_i$ is a descriptor: a histogram of the positions of the remaining points relative to $p_i$,

$$h_i(k) = \#\{\, p_j : j \neq i,\ p_j - p_i \in \mathrm{bin}(k) \,\},$$

in which the bins are uniformly divided in log-polar space
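The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind these slides: the function name `shape_contexts`, the 5 x 12 bin layout (chosen only because it yields the 60-dimensional histograms mentioned later), the radial range, and the choice to fold out-of-range distances into the end bins are all assumptions.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Return an (n, n_r * n_theta) array whose row i is the histogram h_i.

    Bin layout and radial range are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=2)
    dist = dist / dist.mean()                        # normalize by the mean over the n^2 pairs
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Radial bin edges are uniform in log(r); angular bins are uniform in theta.
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.searchsorted(r_edges, dist, side="right") - 1
    r_bin = np.clip(r_bin, 0, n_r - 1)               # assumption: fold out-of-range distances into end bins
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    not_self = ~np.eye(n, dtype=bool)                # enforce j != i
    hist = np.zeros((n, n_r * n_theta), dtype=int)
    for i in range(n):
        k = (r_bin[i] * n_theta + t_bin[i])[not_self[i]]
        hist[i] = np.bincount(k, minlength=n_r * n_theta)
    return hist
```

Each row counts the other n - 1 points, so every histogram sums to n - 1, and because only normalized offsets are binned, translating or uniformly scaling the input leaves the histograms unchanged.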


Shape Context Representation

Figure: Graphical representation of shape context bins


Shape Context Histogram

Figure: Graphical representation of shape context histograms in $\mathbb{R}^{60}$


Matching Shape Contexts

The cost of matching point $p_i$ on the first shape to point $q_j$ on the second shape is the chi-square distance between their histograms:

$$C_{ij} = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$$

Minimize the total matching cost over permutations $\pi$:

$$\sum_i C(p_i, q_{\pi(i)})$$

Optimal matching

One way to solve this assignment problem is the Hungarian method, which runs in $O(n^3)$ time
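As a rough sketch of the optimal matching step, the chi-square cost matrix can be built with NumPy and handed to SciPy's `linear_sum_assignment`. Note that SciPy solves the same assignment problem with a Jonker-Volgenant-style algorithm rather than the classical Hungarian method, and the helper names `chi2_cost_matrix` and `match_shapes` are hypothetical. Bins that are empty in both histograms are assumed to contribute zero cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost_matrix(H1, H2):
    """C[i, j] = 0.5 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))."""
    A = np.asarray(H1, dtype=float)[:, None, :]
    B = np.asarray(H2, dtype=float)[None, :, :]
    num = (A - B) ** 2
    den = A + B
    # Assumption: a bin that is empty in both histograms contributes 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * terms.sum(axis=2)

def match_shapes(H1, H2):
    """Return (pi, total): pi[i] is the index in H2 matched to row i of H1."""
    C = chi2_cost_matrix(H1, H2)
    rows, cols = linear_sum_assignment(C)   # minimizes the summed cost
    return cols, C[rows, cols].sum()
```

When the second set of histograms is a shuffled copy of the first, the recovered matching has total cost zero and undoes the shuffle.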


Properties of shape contexts

Invariant to translation (all offsets are relative to the reference point) and to scale (distances are normalized by the mean distance over the $n^2$ point pairs)

Can be made invariant to rotation (by measuring angles relative to the local tangent orientation)

Tolerant to small affine distortion (log-polar bins give spatial blur proportional to $r$)


Similarity Measure

Definition

After applying a cubic spline transformation $T$ to align the two shapes, their similarity can be measured via a weighted sum

$$D = a\,D_{ac} + D_{sc} + b\,D_{be}$$

$D_{sc}$  Shape context distance

$D_{ac}$  Appearance cost

$D_{be}$  Bending energy, or transformation cost


Dimensionality Reduction

Approximate matching is possible with the full shape context feature

A low-dimensional feature descriptor is desirable for performance

With uniform bins, matching accuracy declines as the feature dimension $d$ grows

Multiple modalities can still be represented in a reduced subspace

Use Principal Component Analysis (PCA) to determine the bases that define this shape context subspace

Approximate matching can be performed faster once all $\mathbb{R}^{60}$ vectors are projected onto $\mathbb{R}^3$
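The projection step can be illustrated with a plain SVD-based PCA. This is a sketch under stated assumptions: `fit_pca` and `project` are hypothetical helper names, and the input is assumed to be an array with one 60-dimensional shape context histogram per row.

```python
import numpy as np

def fit_pca(H, d=3):
    """Learn the mean and the top-d principal directions of the histograms in H."""
    H = np.asarray(H, dtype=float)
    mean = H.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(H - mean, full_matrices=False)
    return mean, Vt[:d]

def project(H, mean, basis):
    """Map histograms to their d-dimensional subspace coefficients."""
    return (np.asarray(H, dtype=float) - mean) @ basis.T
```

The basis would typically be fitted once on a training set of histograms and then reused to project the descriptors of every query shape.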


Dimensionality Reduction

Figure: Projecting histograms of contour points onto the shape context subspace. The points on the human figure on the right are colored according to their 3-D shape context subspace feature values


Dimensionality Reduction

Figure: Visualization of the feature subspace constructed from shape context histograms for two different data sets. The RGB channels of each point on the contours are colored according to its histogram's 3-D PCA coefficient values. Set matching in this feature space means that contour points of similar color have a low matching cost, while highly contrasting colors incur a high matching cost


Dimensionality Reduction Tradeoffs

The larger $d$ is:

  the smaller the PCA reconstruction error
  the larger the distortion induced by the $L_1$ embedding
  the larger the complexity of computing the embedding

Do we really need an $\mathbb{R}^{60}$ feature vector to represent a shape?

  Shapes are almost never exactly alike
  Approximate measures make more sense
  Extract only the most discriminating dimensions as the descriptor


Pyramid Matching

$X$ and $Y$ are two sets of vectors in the feature space $\mathbb{R}^d$

Find an approximate correspondence between X and Y


Pyramid Matching Overview


Pyramid Matching Kernels

Construct a sequence of grids at resolutions $0, \ldots, L$, where the grid at resolution $l$ has $D = 2^{dl}$ cells.

Compute the histograms $H_X^l$ and $H_Y^l$, where

  $H_X^l$ and $H_Y^l$ are the histograms of $X$ and $Y$ at resolution $l$

  $H_X^l(i)$ and $H_Y^l(i)$ are the numbers of points of $X$ and $Y$ in the $i$-th cell

Compute the number of matches at each resolution using:

$$I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min\left(H_X^l(i), H_Y^l(i)\right)$$


Pyramid Matching Kernels

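Sum the $I^l$ over all levels, giving more weight to the finer resolutions:

$$K(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l$$

where $I^l = I(H_X^l, H_Y^l)$ and $I^l - I^{l+1}$ is the number of new matches found at level $l$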
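The kernel can be sketched directly from the two formulas: build a histogram per level, intersect, and combine with the level weights. This is only an illustration under assumptions: the feature vectors are taken to be pre-scaled into the unit cube $[0, 1)^d$, the names `level_histogram` and `pyramid_match` are hypothetical, and the dense histograms used here are practical only for small $d$ and $L$ (for example the 3-D PCA coefficients).

```python
import numpy as np

def level_histogram(points, level):
    """Histogram of points in [0, 1)^d over the grid with 2**level bins per axis."""
    points = np.asarray(points, dtype=float)
    side = 2 ** level
    d = points.shape[1]
    # Cell index along each axis; clamp values that land exactly on the upper edge.
    cells = np.minimum((points * side).astype(int), side - 1)
    flat = np.ravel_multi_index(tuple(cells.T), (side,) * d)
    return np.bincount(flat, minlength=side ** d)   # D = 2**(d * level) cells

def pyramid_match(X, Y, L):
    """K(X, Y) = I^0 / 2^L + sum_{l=1..L} I^l / 2^(L - l + 1)."""
    I = [np.minimum(level_histogram(X, l), level_histogram(Y, l)).sum()
         for l in range(L + 1)]                     # histogram intersection per level
    return I[0] / 2 ** L + sum(I[l] / 2 ** (L - l + 1) for l in range(1, L + 1))
```

With these weights a set matched against itself scores exactly its own size, while two single points that share only the coarsest cell score $1/2^L$.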


Pyramid Matching (l = 0)


Pyramid Matching (l = 1)


Pyramid Matching (l = 2)


Pyramid Matching


Comparison with Optimal Matching


Vocabulary-guided Matching

Figure: The bins are concentrated on decomposing the space where features cluster, particularly for high-dimensional features (in this figure $\mathbb{R}^2$). Features are small points in red, bin centers are larger black points, and blue lines denote bin boundaries. The vocabulary-guided bins are irregularly shaped Voronoi cells.


Performance

Computing partial matching

  Earth Mover's Distance   $O(d m^3 \log m)$
  Hungarian method         $O(d m^3)$
  Greedy matching          $O(d m^2 \log m)$
  Pyramid match            $O(d m L)$

for sets with $O(m)$ features in $\mathbb{R}^d$ and pyramids with $L$ levels


Affine Constraints - RANSAC

Figure: Interest points computed on image 1

Figure: Interest points computed on image 2


Affine Constraints - RANSAC

Figure: Find correspondences between interest points


Affine Constraints - RANSAC

Figure: Outlier removal via RANSAC (Random Sample Consensus)
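Outlier removal of this kind might look like the following sketch, which fits a 2-D affine map to putative correspondences. It is an illustration under assumptions, not the implementation used for the figures: the helper names, the iteration count, and the inlier tolerance `tol` are hypothetical, and `src[i]` is assumed to correspond to `dst[i]`.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 3x2 matrix A such that [src, 1] @ A approximates dst."""
    homog = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(homog, dst, rcond=None)
    return A

def ransac_affine(src, dst, n_iters=200, tol=0.05, seed=0):
    """Return (A, inliers): an affine estimate and a boolean inlier mask."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    homog = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Three correspondences are the minimal sample for a 2-D affine map.
        sample = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(homog @ A - dst, axis=1)
        inliers = residuals < tol
        if inliers.sum() > best.sum():              # keep the largest consensus set
            best = inliers
    # Refit on the whole consensus set for a more stable final estimate.
    return fit_affine(src[best], dst[best]), best
```

The returned mask marks the correspondences kept after outlier removal, and the refitted matrix serves as the initial affine estimate.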


Additional improvements

RANSAC gives an initial estimate of the affine transformation between the canonical set of points and the query points

Utilize the affine transformation estimate to perform vocabulary/geometrically guided searching/matching

Could use MLESAC/PROSAC to perform probabilistic searching

Ability to add constraints to the pyramid matching scheme to reduce query time and improve robustness to partial matching


Conclusions

Investigated and implemented a shape descriptor invariant to rotation and scale

Integrated an approximate matching scheme with time complexity linear in the number of features

The scheme scales well as the database of descriptors grows

Significant improvement in speed with little tradeoff in accuracy

Source code available soon


Conclusions

Thanks!
