Character Recognition Based on a Probability Tree Model

Presenter: Huang Kaizhu
Outline
Introduction: how can probability be used in character recognition?
What is a probability tree model?
Two improvement directions:
  Integrate prior knowledge
  Relax the tree structure into a hyper tree
Experiments in character recognition
Disease Diagnosis problem
How does a doctor decide whether a patient has a cold?
A. Does the patient have a headache?
B. Does the patient have a sore throat?
C. Does the patient have a fever?
D. Can the patient breathe well through his nose?
Now a patient has the following symptoms: A is no, B is yes, C is no, D is yes.
What is the hidden principle the doctor uses to make the judgment?
Disease Diagnosis problem (cont.)
A good doctor reaches an answer by comparing:
P1 = P(Cold=true, A=N, B=Y, C=N, D=Y)
vs.
P2 = P(Cold=false, A=N, B=Y, C=N, D=Y)
If P1 > P2, the patient is judged to have a cold; if P2 > P1, the patient is judged to have no cold.
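This decision rule can be sketched in a few lines; the probability values below are hypothetical placeholders, not estimates from data:

```python
# Decision rule: judge "cold" iff P1 = P(Cold=true, symptoms)
# exceeds P2 = P(Cold=false, symptoms).
def diagnose(p_cold_true: float, p_cold_false: float) -> bool:
    """Return True ("has a cold") iff P1 > P2."""
    return p_cold_true > p_cold_false

# Symptoms: A=N, B=Y, C=N, D=Y (toy numbers, for illustration only)
P1 = 0.03  # hypothetical P(Cold=true,  A=N, B=Y, C=N, D=Y)
P2 = 0.01  # hypothetical P(Cold=false, A=N, B=Y, C=N, D=Y)
print("cold" if diagnose(P1, P2) else "no cold")  # -> cold
```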
What is a Probability Model Classifier?
A probability model classifier is a classifier based on probabilistic inference.
The focus now shifts to how to calculate:
P(Cold=true, A=N, B=Y, C=N, D=Y)
and
P(Cold=false, A=N, B=Y, C=N, D=Y)
A classification problem is thus changed into a distribution estimation problem.
Used in character recognition
How can the probability model be used in character recognition? (This is similar to the Disease Diagnosis Problem.)
Find a probability distribution of the features for every type of character:
P('a', f1, f2, f3, ..., fn), P('b', f1, f2, f3, ..., fn), ..., P('z', f1, f2, f3, ..., fn)
Compute the probability that an unknown character belongs to each type of character, and classify the character into the class with the highest probability.
For example, if P('a', fu1, fu2, ..., fun) > P(C, fu1, fu2, ..., fun) for C = 'b', 'c', ..., 'z',
we judge the unknown character to be 'a'.
How can we estimate the joint probability P(C, f1, f2, f3, ..., fn), C = 'a', 'b', ..., 'z'?
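The highest-probability rule above can be sketched as follows; `joint_prob` stands in for a hypothetical estimator of P(C, f1, ..., fn) supplied by the caller:

```python
import string

def classify(features, joint_prob):
    """Return the character class c in 'a'..'z' maximizing P(c, f1, ..., fn)."""
    return max(string.ascii_lowercase, key=lambda c: joint_prob(c, features))

# Toy usage: a lookup table standing in for a learned joint distribution.
toy = {("a", (1, 0)): 0.5, ("b", (1, 0)): 0.2}
joint = lambda c, f: toy.get((c, tuple(f)), 0.0)
print(classify([1, 0], joint))  # -> a
```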
Estimate the joint Probability
1. Estimation based on direct counting
P(Cold=true, A=N, B=Y, C=N, D=Y)
= Num(Cold=true, A=N, B=Y, C=N, D=Y) / TotalNum
Impractical!
Reason: a huge number of samples is needed. If the number of features is n, at least 2^n samples are needed for binary features.
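The direct-counting estimator itself is only a few lines; the sample records below are invented for illustration:

```python
from collections import Counter

def joint_by_counting(samples):
    """Direct counting: P(config) = Num(config) / TotalNum. With n binary
    features there are 2**n configurations, so reliable counts need on
    the order of 2**n samples."""
    counts = Counter(map(tuple, samples))
    total = len(samples)
    return {cfg: c / total for cfg, c in counts.items()}

# Toy records: (Cold, A, B, C, D), each Y/N
data = [("Y", "N", "Y", "N", "Y"),
        ("N", "N", "Y", "N", "Y"),
        ("Y", "N", "Y", "N", "Y")]
p = joint_by_counting(data)
print(p[("Y", "N", "Y", "N", "Y")])  # 2 of 3 records -> 2/3
```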
2. Estimation based on the dependence relationships between features
Advantage: the joint probability can be written in product form.
P(A, B, C, D) = P(C) P(A|C) P(D|C) P(B|C)
By estimating each factor of the product with the same counting process, we avoid the sample-explosion problem.
The probability tree model is a model based on this principle.
Probability tree model
It assumes that the dependence relationships among the features can be represented as a tree.
It seeks the tree structure that represents the dependence relationships optimally, so that the joint probability can be written as:
P(v1, v2, ..., vm) = Π_{i=1..m} P(v_li | v_Pa(li)),
where Pa(li) is the parent node of li (for the root, the factor is its marginal probability).
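Given a tree and its conditional tables, the factored joint probability is a simple product. A minimal sketch, using the P(A,B,C,D) = P(C) P(A|C) P(D|C) P(B|C) example from above with made-up toy tables:

```python
def tree_joint(values, parent, cond):
    """values: {node: value}; parent: {node: parent node, or None for the root};
    cond[node](v, parent_value) -> P(v_node = v | v_parent = parent_value)
    (for the root, parent_value is None and cond returns the marginal)."""
    p = 1.0
    for node, v in values.items():
        pa = parent[node]
        p *= cond[node](v, values[pa] if pa is not None else None)
    return p

# Tree: C is the root; A, B, D each depend only on C (toy tables).
parent = {"C": None, "A": "C", "B": "C", "D": "C"}
cond = {
    "C": lambda v, _: {"Y": 0.3, "N": 0.7}[v],
    "A": lambda v, pv: 0.9 if v == pv else 0.1,
    "B": lambda v, pv: 0.8 if v == pv else 0.2,
    "D": lambda v, pv: 0.6 if v == pv else 0.4,
}
print(tree_joint({"C": "Y", "A": "Y", "B": "N", "D": "Y"}, parent, cond))
# = 0.3 * 0.9 * 0.2 * 0.6 = 0.0324
```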
Algorithm
1. Obtain P(vi) and P(vi, vj) for each pair (vi, vj) by counting over the samples (vi is the i-th feature).
2. Calculate the mutual information I(vi, vj) for each pair.
3. Use a maximum-spanning-tree algorithm to find the optimal tree structure, in which the edge weight between nodes vi and vj is I(vi, vj).
This algorithm was proved to be optimal in [1].
I(vi, vj) = Σ_{vi, vj} P(vi, vj) log( P(vi, vj) / (P(vi) P(vj)) )
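Steps 1–3 can be sketched end to end as follows. This is a toy implementation (function names are mine), using a Kruskal-style maximum spanning tree over the mutual-information weights:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """I(vi, vj) = sum over (vi, vj) of P(vi, vj) log(P(vi, vj) / (P(vi) P(vj)))."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    return sum((c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_edges(samples, num_features):
    """Maximum spanning tree over mutual-information edge weights (Kruskal)."""
    edges = sorted(((mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(num_features), 2)),
                   reverse=True)
    comp = list(range(num_features))  # minimal union-find by relabeling
    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            comp[ri] = rj
            tree.append((i, j))
    return tree

# Features 0 and 1 are perfectly correlated, feature 2 is independent of both,
# so the edge (0, 1) carries the highest weight and is picked first.
samples = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
print(chow_liu_edges(samples, 3))  # edge (0, 1) is in the resulting tree
```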
For example, for the tree with root v4, edges v4-v3, v4-v5, v3-v1, and v3-v2:
P(v1, v2, v3, v4, v5)
= P(v4) P(v1, v2, v3, v5 | v4)
= P(v4) P(v3 | v4) P(v1, v2, v5 | v3, v4)
= P(v4) P(v3 | v4) P(v5 | v1, v2, v3, v4) P(v1, v2 | v3, v4)
= P(v4) P(v3 | v4) P(v5 | v4) P(v1, v2 | v3, v4)        (v5 depends only on its parent v4)
= P(v4) P(v3 | v4) P(v5 | v4) P(v1, v2 | v3)            (v1, v2 depend only on their parent v3)
= P(v4) P(v3 | v4) P(v5 | v4) P(v1 | v2, v3) P(v2 | v3)
= P(v4) P(v3 | v4) P(v5 | v4) P(v1 | v3) P(v2 | v3)     (v1 depends only on v3)
Two problems of tree model
Can’t process sparse data or missing dataFor example, if the samples are too sparse, maybe nose problem never happens in all the records of the patients with cold and nose problem happens 2 times in all the records of the patients without coldThus no matter what symptom a patient has, a “cold=FALSE” judgment will be made since the
P(cold=true,A,B,C,D =FALSE)= P( cold=true,D=false|C)*…=0 < P(cold=false,A,B,C,D =FALSE);
Can’t perform well in multi-dependence relationship
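Problem 1 can be seen numerically. The counts below (0 of 40 cold=true records and 2 of 60 cold=false records with D=false) are hypothetical, as are the other factors:

```python
# Zero count in the cold=true class: P(D=false | C) is estimated as 0/40 = 0,
# so the whole product P(cold=true, ...) collapses to 0, whatever the rest is.
p1 = 0.5 * 0.7 * (0 / 40) * 0.9   # P(cold=true,  ..., D=false): exactly 0
p2 = 0.5 * 0.3 * (2 / 60) * 0.1   # P(cold=false, ..., D=false): small but > 0
print(p1 < p2)  # -> True: the verdict is always "no cold"
```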
Our two improvements
To Problem 1: introduce prior knowledge to overcome it.
For the example on the last slide, the class-conditional probability is estimated as
P_c(A|B) = Counts_c(A, B) / Counts_c(B)
and when Counts_c(A, B) = 0, it is replaced by the proportion in the whole database N:
P_c(A|B) = Counts_N(A, B) / TotalNum
Applied to the example:
P_cold=true(D=N | C) = Counts_cold=true(D=N, C) / Counts_cold=true(C) = 0
After introducing the prior knowledge:
P_cold=true(D=N | C) = Counts_N(D=N, C) / TotalNum ≠ 0
Key point of Technique 1
When a variable (feature) always takes the same value within one class, we replace its class-conditional probability with the variable's proportion of occurrence in the whole database.
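A sketch of this replacement rule, assuming the fallback is the feature's proportion in the whole database (argument names are mine):

```python
def smoothed_cond_prob(counts_c_ab, counts_c_b, counts_all_ab, total_num):
    """Class-conditional estimate P_c(A|B) with a whole-database fallback:
    if (A, B) was never seen within class c, use its overall proportion."""
    if counts_c_ab == 0:
        return counts_all_ab / total_num  # prior-knowledge replacement
    return counts_c_ab / counts_c_b

print(smoothed_cond_prob(0, 40, 2, 1000))   # zero count -> 2/1000 = 0.002
print(smoothed_cond_prob(15, 40, 2, 1000))  # normal case -> 15/40 = 0.375
```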
To Problem 2: introduce large-node methods to overcome it.
LNCLT (Large-Node Chow-Liu Tree) vs. CLT (Chow-Liu Tree)
Algorithm
1. Find the tree model.
2. Refine the tree model based on frequent itemsets.
Basic idea: the more frequently two variables occur together, the more likely they are to be combined into one large node.
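The frequent-itemset idea can be sketched as a one-level scan over feature pairs; the binary encoding and the support threshold are assumptions:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(samples, min_support=0.5):
    """Pairs (i, j) of binary features that are both 1 in at least a
    min_support fraction of the samples; such frequently co-occurring
    pairs are candidates to be merged into a large node."""
    n = len(samples)
    num_feats = len(samples[0])
    support = Counter()
    for s in samples:
        for i, j in combinations(range(num_feats), 2):
            if s[i] == 1 and s[j] == 1:
                support[(i, j)] += 1
    return [pair for pair, c in support.items() if c / n >= min_support]

samples = [(1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 1, 0)]
print(frequent_pairs(samples, min_support=0.6))  # -> [(0, 1)]
```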
Experiment 1---Handwritten Digit Lib
Database setup:
1. 60000-digit training lib, 10000-digit test lib
2. The database is not sparse
Purpose: evaluate the technique for Problem 2.
The digits recognized correctly by LNCLT are wrongly recognized by CLT as the digits shown at the bottom right.
Experiment 2---Printed Character Lib
Database setup:
1. 8270-character training lib
2. The database is sparse
Purpose: evaluate the technique for Problem 1 (sparse data).
Before introducing prior knowledge: recognition rate on the training data: 86.9%
After introducing prior knowledge: recognition rate on the training data: 97.7%
Demo