23
1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute of Technology Date:2005/12/10

1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

Embed Size (px)

Citation preview

Page 1: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

1

Template-Based Classification Method for

Chinese Character Recognition

Presenter: Tienwei Tsai

Department of Informaiton Management,

Chihlee Institute of Technology

Date:2005/12/10

Page 2: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

2

Outline1. Introduction2. The proposed classification

approach 3. Experiments and Discussions4. Conclusions

Page 3: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

3

1. Introduction Paper documents -> Computer

codes OCR(Optical Character Recognition) The design of classification systems

consists of two subproblems: Feature extraction Classification

Page 4: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

4

Classification

Classification of objects (or patterns) into a number of predefined classes has been extensively studied in wide variety of applications such as Optical character recognition (OCR) Speech recognition Face recognition

Page 5: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

5

Feature extraction Features are functions of the

measurements that enable a class to be distinguished from other classes.

It has not found a general solution in most applications.

Our purpose is to design a general classification scheme, which is less dependent on domain-specific knowledge.

Page 6: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

6

Two philosophies of classification Statistical

The measurements that describe an object are treated only formally as statistical variables, neglecting their “meaning”

Structural Regard objects as compositions of

structural units, usually called primitives.

Page 7: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

7

Template matching “Template matching” is one of the most popular

techniques for visual pattern recognition. We can store a template in the computer for each

distinct class of characters to be recognized, and to compare the unknown characters with the stored set to find the best matching.

Suppose that the character images are two-tone – all black on white backgrounds. The matching degree can be calculated by counting their matching bits, which is known as the Hamming distance criterion.

However, this kind of global Boolean template is unreliable. It is because the bits along the edge of a character image are often subject to unpredictable variations and noises.

Page 8: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

8

This paper presents a template-based classification approach to recognize the characters in the rare books transcribed by ancient calligraphers.

Compared to traditional approaches, which first use some feature extraction methods, like the thinning method, to extract the features of the characters and then recognize them by these features, we apply the original character images directly to achieve this goal.

The system is operated in two phases: training and classification. In the training phase, we superimpose a number of training

samples belonging to the same character class and calculate the fraction of time that a given bit in the character bitmap is 1 to construct the template of the character class.

Then, in the classification phase, an unknown character can be recognized by finding the character class whose template is best fitted for the unknown character via the L1 (or Manhattan) distance criterion.

Page 9: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

9

2. The proposed classification approach

The ultimate goal of classification is to classify an unknown pattern x to one of M possible classes (c1, c2,…, cM).

Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.

Page 10: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

10

2.1 System Architecture

Figure 1. The framework of our classification approach.

Page 11: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

11

2.2 Template Generation The template for a character class is the

representative image for the character class. Basically, templates are generated from the training samples in the training phase.

For the purpose of generating the templates that are not only representative but also easy to be matched, we developed two types of templates: the statistical templates the average templates.

Page 12: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

12

Figure 2. Some training samples of character class “心” .

Page 13: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

13

2.2.1 Statistical templates

To generate a template that summarizes the characteristics of the training samples of the same character class, we superimpose the images of the training samples belonging to the same class to obtain the probability of each bit (or pixel) being white in the presence of that class of samples. We call this kind of templates the statistical templates.

The statistical templates can be generated by the following equation:

iN

jij

i

Si yxf

NyxT

1

),(1

),(

Page 14: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

14

Figure 3. The statistical template of character class “心” .

Page 15: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

15

2.2.2 Average templates

We intend to develop a binary template that can represent the majority of the training samples belonging to the same character class.

Mathematically, the binary templates, called the average templates, can be generated by the following equation:

5.0),(,0

5.0),(,1),(

yxTwhen

yxTwhenyxT

Si

SiA

i

Page 16: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

16

Page 17: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

17

2.3 Template Matching

2.3.1 Distance measurement Basically, two major types of distance

measure can be applied: • L1 (or Manhattan) distance• L2 (or Euclidean) distance

Page 18: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

18

Then the L1-metric-based distance between f and can be defined as

Similarly, the distance between the test character image f

and the average template of class ci, can be defined as

The distance between f and can also be calculated by the Hamming distance criterion, i.e., counting their matching bits, which is computationally efficient than the L1 distance criterion.

1

0

1

0

),(),(),(N

x

N

y

Si

Si yxTyxfTfD

1

0

1

0

),(),(),(N

x

N

y

Ai

Ai yxTyxfTfD

SiT

AiT

Page 19: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

19

2.3.2 Decision rules To decide the expected class of f, the following decision

rules can be applied: 1) Rule MSMD (Minimum Statistical Matching Distance):

E(x) = arg min 1 i M { },

2) Rule MAMD (Minimum Average Matching Distance):

E(x) = arg min 1 i M { }.

Therefore, the class whose template matches the unknown character image most will be regarded as the expected class of the character.

),( SiTfD

),( AiTfD

Page 20: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

20

3. Experimental Results

A famous handwritten rare book, Kin-Guan bible (金剛經 ) 18,600 samples. 640 classes. Each character image was transformed

into a 48×48 bitmap. 1000 of the 18600 samples are used for

testing; the others are used for training.

Page 21: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

21Figure 5. The accuracy rate of each decision rule.

Page 22: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

22

4. Conclusions This paper presents a template-based classification

approach for recognizing handwritten characters in Chinese paleography.

Instead of extracting features from the images of the characters, our template matching method apply the original character images directly to recognize the characters.

In our approach, two types of templates (i.e., the statistical templates and the average templates) are generated in the training phase, and two decision rules (i.e., MSMD and MAMD) based on the two types of templates are applied to recognize the unknown characters in the classification phase.

Both decision rules are better than the PMD decision rule in terms of the accuracy rate.

Page 23: 1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute

23

Future Works

In the future, some works can be done to improve this system: since features of different types complement

one another in classification performance, by using features of different types simultaneously, classification accuracy could be improved;

In order to alleviate the load of the character recognition, a coarse classification scheme needs to be involved in our system.