33
Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b

Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Gradient-Based Learning Applied to Document Recognition

Douglas HohenseeCOS 598b

Page 2: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Pattern Recognition, in General

Page 3: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Feature extraction•

Represent input with low-dimensional vectors

Tends to be hand-crafted

Success of algorithm depends on the chosen class of features

Page 4: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Gradient-Based Learning•

Classifiers calculate some function

Yp

= F(Zp, W)

Zp

= the p-th

input patternW = adjustable model parametersYp

= class label•

Loss Function:

Ep

= D(Dp, F(Zp, W))

Quantify discrepancy between Dp

and Yp

Page 5: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Gradient-Based Learning•

Theoretical performance limits ([3],[4],[5])]

As # training examples increases,

P = # of training samplesh = “effective capacity”

([6],[7])0.5 <= α

<= 1.0k = constant

Ex: Structural Risk Minimization: find min of )(WkHEtrain +

Page 6: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Gradient-Based Learning•

Gradient-based minimization procedure

In practice, local minima seem not to be a problem•

“somewhat of a theoretical mystery”

Page 7: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Handwriting Recognition•

Problem 1: recognize individual characters

Problem 2: separate out characters from neighbors

Heuristic Over-Segmentation–

Generate a lot of potential cuts, and select the best combination of cuts based on scores for each candidate character

But: Is half of ‘4’

a ‘1’? Is half of ‘8’

a ‘3’?

Page 8: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Handwriting Recognition•

Again, most systems = multiple modules–

Field locator –

Field segmenter–

Recognizer–

Contextual post-processor

Usually, each module trained separately!

Presenter
Presentation Notes
1. (find regions of interest) 2. (cuts image into images of candidate chars) 3. (classify & score candidates) 4. (find best grammatically correct answer)
Page 9: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Globally Trainable System

Start with a set of modules:Xn

= Fn

(Wn

,Xn-1

)Xn

= output of n-th

moduleWn

= parameters of n-th

module

Page 10: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Globally Trainable System

if is known, then

Page 11: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Back-Propagation

Calculate error in each output neuron. •

For each neuron, calculate local error.

Adjust weights of each neuron to lessen local error. •

Assign "blame" for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights.

Repeat the steps above on the neurons at the previous level, using each one's "blame" as its error.

(wikipedia)

Page 12: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5

1. Local receptive fields2. Shared weights

f(a) = A tanh(Sa)

3. Sub-sampling

Presenter
Presentation Notes
Convolution/sub-sampling combination inspired by Hubel and Wiesel’s notions of “simple” and “complex” cells, as in Fukushima’s “Neocognitron” OUTPUT layer: Euclidean Radial Basis Function units (RBF). Output of each RBF unit is the sum of squared Euclidean distance from a prototype
Page 13: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5

Page 14: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5•

RBF’s

can adapt•

But, E(W) has trivial

solution, with all RBF

identical and F6 const.

Page 15: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5

Page 16: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5

Page 17: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

LeNet-5

Page 18: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Limitations of convolutional networks

State information passed between modules is fixed-size

Can’t deal with variable-sized input, e.g.–

Continuous speech recognition

Handwritten word recognition

Page 19: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Multi-module Systems

Presenter
Presentation Notes
Object-oriented approach
Page 20: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Graph Transformer Networks (GTN)

Example:

Acoustic vectors-> phonemes lattice-> words lattice-> single sequence

of words (result)

Page 21: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

GTN: Word Segmentation

Candidate cuts at–

Minima in vertical projection profile

Minima of the distance between upper & lower contours

Presenter
Presentation Notes
Generates a DAG: directed acyclic graph
Page 22: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

GTN: Recognition Transformer

Page 23: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

GTN: Viterbi

algorithm

vn

: viterbi

penaltyvstart

= 0

ci

= penalty association w/arc i

Un

: {set of arcs w/destination n}

Page 24: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

GTN: Viterbi

training

Minimize:–

Penalty of the path of the correct sequence

Page 25: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Space Displacement Neural Network (SDNN)

Alternative to Heuristic Over-Segmentation

Page 26: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

SDNN

Presenter
Presentation Notes
SDNN gives a sequence of vectors which encode the likelihoods or finding characters at particular locations
Page 27: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

SDNN

Page 28: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

SDNN

Page 29: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

AMAP•

On-line handwriting recognition

“annotated image”

Each pixel is a 5-element feature vector–

4 features associated with 4 orientations of the pen trajectory–

1 feature associated with local curvature in area near pixel

Page 30: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

AMAP

Page 31: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

AMAP

Page 32: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

GTN with Bank Checks•

Multi-module system

“50% correct / 49% reject /

1% error”

criterion•

Reject low-confidence

outputs

Results: 82/17/1

Page 33: Gradient-Based Learning Applied to Document Recognition · 2008-04-01 · Gradient-Based Learning Applied to Document Recognition Douglas Hohensee COS 598b. ... Feature extraction

Conclusion: global training = good

Segmentation and recognition modules shouldn’t learn independently

It’s easier to make datasets for globally trained systems

Globally trained systems perform better