Project report - Bengali digit recognition using SVM

CMPUT 551 Project Report

Bengali Handwritten Digit Recognition with Support Vector Machines

Submitted By

Mohammad Saiful Islam

Student Id: 1270123

Date of Submission

21st December, 2010

Introduction:

Character recognition originated as a subfield of pattern recognition, but the need to recognize characters in many application areas has in turn driven progress in pattern recognition and image analysis [1]. Character recognition can be classified into two types, online and offline. In online character recognition, the system follows the dynamic motion of the pen and recognizes the character while it is being written; in offline character recognition, static text is used as the input. From another perspective, character recognition can be divided into machine-printed and handwriting recognition.

Bengali is an eastern Indic language. It is native to the region of eastern South Asia known as Bengal, which comprises present-day Bangladesh and the Indian state of West Bengal, along with parts of the Indian states of Tripura and Assam. With 300 million native speakers, it is ranked 6th by number of native speakers [2].

In this project I decided to work on handwriting recognition for the Bengali language, because a lot of work has already been done on machine-printed character recognition but comparatively little on handwriting recognition. I restricted the project to Bengali digits; this is the tradeoff I had to make because of limited time. A project of this kind requires a good dataset for training and testing, and my first and most difficult problem was to find one for Bengali alphabets and numerals. After an exhaustive search I found only a small numeral dataset, and given the time constraint I was able to build only a small dataset of my own, so I work on digit recognition only.

The work of digit recognition can be divided into several blocks, each of which contains several internal steps. The block diagram of the whole process is given in Figure 1.

[Block diagram: Document Input -> Pre-processing -> Feature Extraction -> Classification -> Post Processing -> Output]

Figure 1: Process of Digit Recognition.

In the current project I have worked mainly on the classification step: the system is given a set of training feature vectors of digits to train itself, and when a test feature vector of a digit is then given, it classifies the digit into its respective class.

Literature Review:

In the early stages of OCR, template matching based techniques were used. These templates were designed using a small number of samples, and as the number of samples grew the technique failed to give good results. Researchers then turned to methods based on learning from examples, such as artificial neural networks. Support vector machines are applied in modern recognition tasks with great accuracy. Non-parametric statistical methods such as 1-nearest neighbor (1-NN), k-NN and decision trees are costly, since all the training samples have to be stored and compared.

Research on OCR systems for recognizing Bengali characters started in the mid-1980s, and a variety of approaches have been applied. Some of these works were complete systems, while others addressed only one part of a complete system, such as preprocessing, feature extraction, classification or post-processing. Researchers have used many types of classifiers for OCR, such as nearest neighbor [3], feature-based tree classifiers [4], template matching [5], distance-based classifiers [6], neural networks [7], hidden Markov models [8] and support vector machines [11].

Hasnat et al. [8] developed a hidden Markov model based OCR with multi-font support, with a separate HMM model for each segmented character or word. The system uses the HTK toolkit for data preparation, model training and recognition, and transforms the raw pixel values using the Discrete Cosine Transform. Rahan et al. [12] proposed a multistage method for Bengali handwriting recognition. They argued that the alphabet can be grouped by the distinct characteristics present in it, and that building a multistage classifier from these groups makes the classifier more robust. The multistage classifier can outperform a single-stage classifier because of its ability to detect extreme variance in the training and test examples. Arora et al. [9] compared the two most popular methods for handwriting recognition, ANN and SVM, on Devanagari characters, which are similar to Bengali, and found that SVM works as well as ANN, which is widely used in handwriting recognition. Bhowmik et al. [10] made a comparative study of the multilayer perceptron, the radial basis function network and SVM for Bengali character recognition and found that SVM outperforms the other two methods. They proposed that a hierarchical learning architecture (HLA) based on SVM will perform better than a single-stage SVM; for the first stage of the classifier they formed groups on the basis of the confusion matrix obtained by SVM.

Chanda et al. [11] used SVM to automatically identify an individual from handwriting in the Bengali language. They experimented with discrete directional features and gradient features and obtained satisfactory results with gradient features. Umapada et al. [13] used SVM for recognizing multi-oriented Bengali printed characters. For recognition of multi-sized and multi-oriented characters, the features were computed from angular information obtained from the external and internal contour pixels of the characters, in such a way that they do not depend on the size and rotation of the characters. Circular and convex hull rings were used to divide a character into smaller zones to get zone-wise features for higher recognition results. Liu et al. [14] compared six classifiers (MLP, MQDF, DLQDF, PNC, CFPC and SVM) for Bengali handwritten digit classification and found that SVM produces the highest classification accuracy. They concluded that better results can be obtained by classifying gray scale images, using gray scale normalization, than binary images, and by moment or bi-moment normalization. Maji et al. [15] found that although SVMs with polynomial kernels are mainly used for digit recognition with raw pixels, they are impractical because of their high complexity at runtime, so they proposed using improved features with a low-complexity classifier. Their experiments with standard digit databases showed high accuracy compared to complex classifiers using RBF kernels. Edson et al. [19] showed that SVM performs better than HMM for offline handwriting recognition.

Methods:

For this project I have chosen to use the Multiclass Support Vector Machine (MSVM). The original binary Support Vector Machine (SVM) was invented by Vladimir Vapnik, and the soft margin case was proposed by Corinna Cortes and Vladimir Vapnik [16]. The MSVM is an extension of the binary SVM that classifies data into multiple classes. In this project I have used both the linear and the nonlinear version of MSVM; the nonlinear version uses kernels, as proposed by Bernhard Boser, Isabelle Guyon and Vapnik [17].

In general, an SVM is a non-probabilistic binary linear classifier which constructs a hyperplane, or a set of hyperplanes, in a high-dimensional space for use in classification. A special property of SVMs is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers [18]. In this project the basic SVM is implemented using soft margins. Cortes and Vapnik suggested the soft margin to allow for mislabelled examples [16]. If no hyperplane exists that can fully separate the two classes, the soft margin method chooses a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. The kernel trick is used to transform the feature space. The transformation may be nonlinear, so the classifier may be a hyperplane in the higher-dimensional space but nonlinear in the original input space. Data that are not separable in the original space may become linearly separable in the higher-dimensional space after this transformation.
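For reference, the soft margin idea described above is usually written as the following optimization problem for a binary task with labels $y_i \in \{+1, -1\}$. This is the standard textbook form, not a formula taken from this report: the symbols $w$, $b$, $\xi_i$, $C$ and the feature map $\phi$ are assumed notation, and the report does not state how its regularization parameter beta relates to $C$.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w\cdot\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0 .
```

The slack variables $\xi_i$ measure how far each example falls on the wrong side of its margin, so the second term penalizes mislabelled or overlapping examples while the first term keeps the margin wide.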

A multiclass problem can be solved by reducing it to binary problems. The original problem is transformed into several binary classification problems. Each of these yields a binary classifier, which is assumed to produce an output function that gives relatively large values for examples from the positive class and relatively small values for examples from the negative class. There are two common methods for solving multiclass problems with binary classifiers: the one-versus-all method and the one-versus-one method.

Suppose we have C classes. The one-versus-all method creates C distinct classifiers. The ith classifier is trained using the data points from class i as positive examples and all others as negative examples. A new data point is assigned to the class whose classifier gives the highest value. The one-versus-one method needs C(C-1)/2 binary classifiers. Classifier Cij treats class i as positive and class j as negative. A new example is classified by majority voting: after every classifier has been applied to the data point, it is assigned to the class with the largest number of votes.
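The classifier counts and the one-versus-one voting rule can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names `num_classifiers`, `one_vs_one_predict` and the toy pairwise classifiers in `toy` are hypothetical and are not part of the project code, and the toy classifiers act on a single number rather than on image features.

```python
from itertools import combinations


def num_classifiers(num_classes):
    """Return (one-versus-all count, one-versus-one count) for C classes."""
    return num_classes, num_classes * (num_classes - 1) // 2


def one_vs_one_predict(x, pairwise, num_classes):
    """Majority vote over pairwise classifiers.

    pairwise maps (i, j), i < j, to a function f(x) that is positive when
    x looks like class i and non-positive when it looks like class j.
    """
    votes = [0] * num_classes
    for i, j in combinations(range(num_classes), 2):
        winner = i if pairwise[(i, j)](x) > 0 else j
        votes[winner] += 1
    # Ties are broken in favour of the smallest class index.
    return max(range(num_classes), key=lambda c: votes[c])


# Hypothetical toy classifiers on a scalar input: class c is centred at the
# value c, and each pairwise classifier prefers the closer centre.
toy = {
    (i, j): (lambda x, i=i, j=j: abs(x - j) - abs(x - i))
    for i, j in combinations(range(3), 2)
}
```

For the ten digit classes, `num_classifiers(10)` gives 10 one-versus-all classifiers against 45 one-versus-one classifiers.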

For this project I have implemented the one-versus-all method. The reason for choosing it over the other method is training cost. There are 10 classes in this problem, so I build 10 classifiers; with the one-versus-one method I would need to train 45 classifiers. Although the one-versus-one method may give more accurate results, I judged that the time needed to train 45 classifiers was too high compared to the time required for one-versus-all, and that I would still get good accuracy with the latter.

The three types of kernels used in the project are

1. Linear kernel: $K_{lin}(x, y) = x \cdot y$

2. Polynomial kernel: $K_{poly}(x, y) = (x \cdot y + 1)^{d}$

3. RBF kernel: $K_{rbf}(x, y) = \exp\left(-\lVert x - y \rVert^{2} / (2\sigma^{2})\right)$
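The three kernels can be written as short Python functions. This is a minimal sketch, assuming that feature vectors are plain lists of numbers and that the RBF exponent is $-\lVert x - y \rVert^{2} / (2\sigma^{2})$, the usual Gaussian form; the function names are illustrative and are not taken from the project code.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d):
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, sigma):
    # Squared Euclidean distance divided by 2 * sigma^2, then negated.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

Note that with d = 0 the polynomial kernel equals 1 for every pair of inputs, so it carries no information about the data; this is consistent with the chance-level accuracy reported for d = 0 in the result tables.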

In the implementation, the learner function takes two inputs, X and y, as the training data and outputs a model which the classifier uses to classify new data. For the current problem y is a vector of numbers ranging from 0 to 9. The binary classifier $c_i$, however, needs a label vector $Y_i$ such that

$$Y_{ij} = \begin{cases} +1 & \text{if } y_j = i \\ -1 & \text{otherwise} \end{cases}$$

First the label vector y is transformed into 10 separate label vectors, one for each classifier. Then the training data is provided to each of the 10 classifiers to find 10 weight vectors ($l_i$) and offsets ($b_i$). These values together with the original training data comprise the model.
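The label transformation for the one-versus-all classifiers can be sketched as follows. The function name `one_vs_all_labels` is hypothetical; the sketch only assumes that y is a list of digit labels from 0 to 9.

```python
def one_vs_all_labels(y, num_classes=10):
    """Build one +1/-1 label vector per class from the digit labels y.

    Entry [i][j] is +1 when y[j] == i and -1 otherwise.
    """
    return [
        [1 if label == i else -1 for label in y]
        for i in range(num_classes)
    ]
```

For example, with `y = [0, 3, 3, 9]` the label vector for classifier 3 is `[-1, 1, 1, -1]`, and the vector for a digit that never occurs, such as 5, is all -1.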

In the classify function, the new data is classified using the 10 separate classifiers. At first I used the sign of each classifier's output as its result. This produced inaccurate results because of ambiguous cases, in which more than one classifier, or none of them, returns a positive sign. The remedy, proposed by Vapnik [4], is to use the continuous values of the SVM decision functions rather than their signs. A data point is assigned to whichever class has the decision function with the highest value, regardless of the sign.

Figure 2: MSVM with continuous decision function
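The difference between the two decision rules can be seen in a small Python sketch. The decision values below are invented for illustration and are not outputs of the trained classifiers; the function names are likewise hypothetical.

```python
def classify_by_sign(scores):
    """Return every class whose binary classifier gives a positive sign."""
    return [i for i, s in enumerate(scores) if s > 0]


def classify_by_value(scores):
    """Return the class whose decision function has the highest value."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical decision values for one test image, one per digit class.
ambiguous = [-0.9, 0.4, -1.2, 0.7, -0.3, -1.0, -0.8, -1.1, -0.6, -0.5]
none_positive = [-0.9, -0.4, -1.2, -0.1, -0.3, -1.0, -0.8, -1.1, -0.6, -0.5]
```

With the sign rule, `ambiguous` is claimed by two classes (1 and 3) and `none_positive` by no class at all, whereas the continuous rule picks class 3 in both cases.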

One of the difficult phases of this project was choosing the classifier. From the literature review I learned that for handwriting recognition, and especially Bengali handwriting recognition, the most popular classifiers are Artificial Neural Networks (ANN), Hidden Markov Models (HMM) and Support Vector Machines (SVM). There are also some mixed multi-layer approaches.

A comparison between ANN and SVM on different properties is given below [9].

Complexity of training: The parameters of neural classifiers are generally adjusted by gradient

descent. By feeding the training samples a fixed number of sweeps, the training time is linear

with the number of samples. SVMs are trained by quadratic programming (QP), and the training

time is generally proportional to the square of number of samples. Some fast SVM training

algorithms with nearly linear complexity are available.

Model selection: The generalization performance of neural classifiers is sensitive to the size of

structure, and the selection of an appropriate structure relies on cross-validation. The

convergence of neural network training suffers from local minima of error surface. On the other

hand, the QP learning of SVMs guarantees finding the global optimum. The performance of

SVMs depends on the selection of kernel type and kernel parameters, but this dependence is less

influential.

Classification accuracy: SVMs have demonstrated superior classification accuracy to neural classifiers in many experiments.

Storage and execution complexity: SVM learning by QP often results in a large number of support vectors (SVs), which must be stored and evaluated at classification time. Neural classifiers have far fewer parameters, and the number of parameters is easy to control. In short, neural classifiers consume less storage and computation than SVMs.

Unlike ANN, the computational complexity of SVM does not depend on the dimensionality of the input space. ANNs use empirical risk minimization, while SVMs use structural risk minimization. SVMs often outperform ANNs because they are less prone to over-fitting. For these reasons I preferred SVM over ANN.

The HMM has attracted the attention of many researchers in pattern recognition, including handwriting, speech and signature verification. This statistical model has the ability to absorb both the variability and the similarity between patterns. It is based on the empirical risk minimization (ERM) principle, the simplest of the induction principles, in which a decision rule is chosen based on a finite number of known examples (the training set). There are some problems related to HMM. The first is finding the probability of an observation sequence given the model, which is very expensive to compute even with dynamic programming (the forward-backward procedure). The second is adjusting the parameters to maximize the probability of the observations: there is no way to find the global maximum analytically, so training can get stuck in a local maximum. Finally, determining the number of states in each model and the number of models is an important task, because the performance of the classifier depends on it.

Several of the papers I reviewed state that SVM performs well on handwritten character recognition, especially for Bengali character and digit recognition. Some of them compared the performance of SVM, HMM and ANN and showed that SVM can sometimes even outperform the other two methods in handwriting recognition. Last but not least, SVM is a newer approach to classification in machine learning than the other methods, and it has created great interest in both academia and industry. I wanted to explore this new field in this project to gain a deeper understanding of the method.

An important part of any handwritten character recognition system is preprocessing. In this part, continuous text is first segmented to find the individual characters; the individual characters are then read in monochrome or grayscale mode to obtain the raw features used in the training and testing steps. Often several intermediate steps, such as filtering, are applied to improve the raw features and ensure greater classification accuracy. Segmentation and filtering are themselves a huge research area, so I skip this part in my project and assume that I am given a set of segmented images of digits. Most of the previous literature uses filters to improve the features, but I wanted to test the accuracy on raw pixels.

Recognition of Bengali characters is very difficult for several reasons. There are 13 vowels, which can take modified forms when connected with consonants. Some of the characters have half forms when connected together. These compound characters make character segmentation very difficult. All the individual characters are joined by a head line called “Matra”. This makes it difficult to isolate individual characters from the words. There are various isolated dots, which are vowel modifiers, namely “Anuswar”, “Visarga” and “Chandra Bindu”, which add to the confusion. Recognition of ascenders and descenders is also complex. Again, there is no database to use, so I would have had to build one of my own, which would take a lot of time. Given these difficulties I preferred to work on the digits only.

Hypotheses:

For this project I have a set of hypotheses, which I intend to test using experiments. They are

1. SVM can show good performance in Bengali handwritten digit recognition.

2. Use of RBF kernels will boost the performance compared to linear and polynomial kernels.

3. Using raw pixels we can achieve good accuracy on the recognition.

4. Training the classifier using samples from one person and testing with samples from different persons will reduce the accuracy of recognition.

Experimental design:

To test the stated hypotheses, I have planned to run a set of experiments. For the experiments I first have to select the dataset. On the internet I found only a small dataset of grayscale images, but I wanted to test with monochrome images too, so I built a dataset of my own. The dataset was created by using a tablet to write single digits one at a time in paint software and saving each image in a monochrome bitmap format. Each image has a dimension of 20 by 20 pixels, and as the images are saved in a monochrome format they consist only of 0s and 1s: 1 represents the white background and 0 represents the actual digit. Two persons wrote all the digits, and there are 700 digits written, 70 for each of the digits. 500 digits are used for training and 200 are used for testing. A sample set of digits is given below.

All the training samples are written by one person but the 200 test samples are written by two

persons, 100 each, to test the last hypothesis.

The images are read using Octave to get a 20 by 20 matrix of 0s and 1s. Each matrix is then reshaped into a 1 by 400 vector which represents an image. The 700 vectors are stacked to form a 700 by 400 feature matrix, and each row is labeled with its digit, from 0 to 9.
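The reshaping step can be sketched in Python as follows. The project itself used Octave, so this is only an illustration: the function names are hypothetical, and the row-by-row flattening order is an assumption (Octave's `reshape` works column by column). Any fixed order gives an equivalent feature vector as long as it is applied consistently to every image.

```python
def flatten_image(image):
    """Flatten a 20x20 matrix (a list of rows) into a 400-element vector."""
    return [pixel for row in image for pixel in row]


def build_dataset(images, labels):
    """Stack flattened images into a feature matrix X with labels y."""
    X = [flatten_image(img) for img in images]
    y = list(labels)
    if len(X) != len(y):
        raise ValueError("each image needs exactly one label")
    return X, y
```

Applied to the 700 images of the self-built dataset, `build_dataset` would return a list of 700 vectors of length 400 together with 700 labels.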

The dataset found on the internet was the ISI Bengali numeral dataset [20]. The original dataset has 19,392 training samples and 4000 test samples; the images are gray scale with a noisy background, and the gray level of the foreground varies considerably. I was only able to get a partial dataset, because obtaining the full one would have required more time. The partial set has 500 samples, 50 for each digit. I used the first 40 samples of each digit as the training set and the last 10 as the test set. An example of the dataset is given below.

The TIF format images are read using Octave. The images had various sizes, so I rescaled them to 20 by 20 pixels. They are all gray scale, so pixel values range from 0 to 255, where 0 denotes the darkest color and 255 denotes the white background. The width of the stroke is greater than one pixel.
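The rescaling to 20 by 20 pixels can be illustrated with a nearest-neighbour resize. The report does not say which interpolation method was used in Octave, so nearest-neighbour sampling and the function name `resize_nearest` are assumptions made purely for illustration.

```python
def resize_nearest(image, out_h=20, out_w=20):
    """Resize a 2-D image (list of rows) by nearest-neighbour sampling.

    Each output pixel copies the input pixel at the proportionally
    scaled row and column index.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

The same function handles both larger and smaller source images, since the index scaling always stays inside the bounds of the input.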

For each of the three kernels a set of experiments is done using varying regularization parameter

beta and kernel parameter d/sigma. For each experiment the classifier is trained using the

training sample and then tested using the test samples. Next the recognition accuracy is recorded

for result analysis.

Because I wanted to observe the effect of the regularization parameter beta and the kernel parameters d and sigma directly, no cross-validation was used. Again, no feature selection method was applied. For each of the kernels 10 values of beta are used, from $2^{-5}$ to 16 ($2^{-5}, 2^{-4}, \ldots, 2^{4}$). For the RBF kernel 15 values of sigma are used, from $2^{-10}$ to 16 ($2^{-10}, 2^{-9}, \ldots, 2^{4}$), matching the rows of the RBF result tables. For the polynomial kernel 10 values of d are used, from 0 to 9.
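The parameter grid can be generated with a few lines of Python. This is a sketch under stated assumptions: the sigma range is taken from the rows of the RBF result tables (0.000977, which is 2^-10, up to 16), and the function name `grid` is hypothetical.

```python
# Regularization values beta: 2^-5, 2^-4, ..., 2^4 (10 values).
betas = [2.0 ** k for k in range(-5, 5)]

# RBF widths sigma: 2^-10, ..., 2^4 (15 values, as in the result tables).
sigmas = [2.0 ** k for k in range(-10, 5)]

# Polynomial degrees d: 0, 1, ..., 9 (10 values).
degrees = list(range(10))


def grid(kernel):
    """Return all (beta, kernel parameter) pairs tried for a kernel."""
    if kernel == "linear":
        return [(b, None) for b in betas]
    if kernel == "poly":
        return [(b, d) for d in degrees for b in betas]
    if kernel == "rbf":
        return [(b, s) for s in sigmas for b in betas]
    raise ValueError("unknown kernel: " + kernel)
```

This gives 10 runs for the linear kernel, 100 for the polynomial kernel and 150 for the RBF kernel on each dataset, which matches the sizes of the result tables.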

Experiments:

The first set of experiments is done using 100 test samples from one person. The percentage accuracy for different beta using the linear kernel is given in Table 1. From the table it can be seen that the linear kernel shows very good accuracy, and that the performance is almost independent of the regularization parameter beta: it is 99% everywhere except at beta = 16, where it is 97%.

Beta         0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
% Accuracy   99       99      99     99    99   99  99  99  99  97

Table 1: Percentage accuracy for different beta using linear kernel - self-built data, one person

The percentage accuracy for different beta and d using the polynomial kernel is given in Table 2 (rows are the degree d, columns are beta). From the table it can be observed that the polynomial kernel does not perform well for all d. For smaller degrees (1-3) the classifier shows good performance, but for larger degrees the performance drops dramatically. Beta affects the performance mainly for the intermediate degrees, most clearly d = 3 and also d = 4.

d \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0          10       10      10     10    10   10  10  10  10  10
1          99       99      99     99    99   99  99  99  99  97
2          88       98      98     99    99   99  99  99  99  99
3          10       63      88     89    97   99  99  99  99  99
4          10       10      10     10    10   10  38  94  98  98
5          10       10      10     10    10   10  10  10  10  10
6          10       10      10     10    10   10  10  10  10  10
7          10       10      10     10    10   10  10  10  10  10
8          10       10      10     10    10   10  10  10  10  10
9          10       10      10     10    10   10  10  10  10  10

Table 2: Percentage accuracy for different beta and d using polynomial kernel

The percentage accuracy for different beta and sigma using the RBF kernel is given in Table 3 (rows are sigma, columns are beta). From the table it can be seen that with larger sigma (2 and above, and especially 4, 8 and 16) the classifier gives good results, but for smaller sigma the performance drops. Again, increasing beta has a negative effect on the classifier.

sigma \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0.000977       10       10      10     10    10   10  10  10  10  10
0.001953       10       10      10     10    10   10  10  10  10  10
0.003906       10       10      10     10    10   10  10  10  10  10
0.007812       10       10      10     10    10   10  10  10  10  10
0.015625       10       10      10     10    10   10  10  10  10  10
0.03125        10       10      10     10    10   10  10  10  10  10
0.0625         10       10      10     10    10   10  10  10  10  10
0.125          30       30      30     30    30   30  10  10  10  10
0.25           92       92      92     92    92   92  10  10  10  10
0.5            92       92      92     92    92   92  10  10  10  10
1              14       15      18     27    88   21  10  10  10  10
2              90       90      90     90    90   91  92  13  10  10
4              98       98      98     98    98   96  96  96  95  95
8              98       98      98     99    99   97  95  95  95  95
16             99       99      97     97    95   94  94  94  95  95

Table 3: Percentage accuracy for different beta and sigma using RBF kernel

From the above experiments it is clear that an SVM classifier using raw pixel features can achieve good performance on Bengali handwritten digit recognition. It was surprising, however, that the non-linear kernels (polynomial or RBF) did not boost the performance; rather, they show lower performance for some parameter values. This can be explained by the training set used. The number of features is 400 and the number of training examples is 500, so introducing non-linearity into the feature space brings no benefit here. Rather, non-linear functions can make the classifier prone to over-fitting in such cases, which explains the performance degradation. Because of this we can see that the highly regularized versions of the kernels perform better.

The next set of experiments is done with the gray scale data. The percentage accuracy for different beta using the linear kernel is given in Table 4. From the table it can be seen that the linear kernel gives moderate accuracy, around 80%. For beta of 0.25 and above the performance is almost independent of the regularization parameter, but it falls for smaller beta, down to 10% at the smallest value.

Beta         0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
% Accuracy   10       62      75     80    79   81  81  80  80  80

Table 4: Percentage accuracy for different beta using linear kernel - ISI data

From Table 5 we can see that the polynomial kernel gives reasonable accuracy only for degree d = 1. For that degree the performance is almost independent of the regularization parameter once beta is 0.25 or above.

d \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0          10       10      10     10    10   10  10  10  10  10
1          10       67      77     80    79   81  81  80  80  80
2          10       10      10     10    10   10  10  10  10  10
3          10       10      10     10    10   10  10  10  10  10
4          10       10      10     10    10   10  10  10  10  10
5          10       10      10     10    10   10  10  10  10  10
6          10       10      10     10    10   10  10  10  10  10
7          10       10      10     10    10   10  10  10  10  10
8          10       10      10     10    10   10  10  10  10  10
9          10       10      10     10    10   10  10  10  10  10

Table 5: Percentage accuracy for different beta and d using polynomial kernel - ISI data


The next set of experiments is done with different sigma and beta using the RBF kernel; the results are presented in Table 6. Here we can see that the RBF kernel performs consistently badly for all beta and all sigma: the accuracy is always 10%, which is the chance level for ten equally represented classes.

sigma \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0.000977       10       10      10     10    10   10  10  10  10  10
0.001953       10       10      10     10    10   10  10  10  10  10
0.003906       10       10      10     10    10   10  10  10  10  10
0.007812       10       10      10     10    10   10  10  10  10  10
0.015625       10       10      10     10    10   10  10  10  10  10
0.03125        10       10      10     10    10   10  10  10  10  10
0.0625         10       10      10     10    10   10  10  10  10  10
0.125          10       10      10     10    10   10  10  10  10  10
0.25           10       10      10     10    10   10  10  10  10  10
0.5            10       10      10     10    10   10  10  10  10  10
1              10       10      10     10    10   10  10  10  10  10
2              10       10      10     10    10   10  10  10  10  10
4              10       10      10     10    10   10  10  10  10  10
8              10       10      10     10    10   10  10  10  10  10
16             10       10      10     10    10   10  10  10  10  10

Table 6: Percentage accuracy for different beta and sigma using RBF kernel - ISI data

The main cause of the non-linear classifiers performing poorly on this dataset is over-fitting due to the small sample size. The dataset also has some ambiguity. Many Bengali digits can easily be confused with each other because of the writing styles of different people. Some sources of confusion are given below.

Another source of error is background noise and varying gray levels; the foreground gray levels vary as well. Normalizing the images using linear normalization or moment normalization can remove the noise and thus provide better results. Again, the sizes of the digits differ, so the feature vectors differ for the same digit. As I am using raw features without any normalization, this affects performance. Using gradient features that are independent of the size or orientation of the image should give better results.

The last set of experiments is done with the self-built data, but with the training and test samples taken from different persons. The goal is to see how the performance changes with an individual person's handwriting style. From Table 7 we can observe that the accuracy drops to around 53% for the linear kernel, and that it is independent of beta.

Beta         0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
% Accuracy   53       52      53     53    53   53  53  53  54  54

Table 7: Percentage accuracy for different beta using linear kernel - different person

From Table 8 it can be seen that the performance again drops to roughly 50-55%, and that only the lowest degrees (most consistently d = 1) give reasonable results. Here too beta affects the results mainly for the intermediate degrees, most visibly d = 3 and d = 4.

d \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0          10       10      10     10    10   10  10  10  10  10
1          53       53      53     53    53   53  53  53  54  55
2          43       49      51     53    54   54  54  54  55  54
3          24       22      43     49    53   53  53  53  53  53
4          10       10      10     10    10   10  19  33  42  52
5          10       10      10     10    10   10  10  10  10  10
6          10       10      10     10    10   10  10  10  10  10
7          10       10      10     10    10   10  10  10  10  10
8          10       10      10     10    10   10  10  10  10  10
9          10       10      10     10    10   10  10  10  10  10

Table 8: Percentage accuracy for different beta and d using polynomial kernel - different person


For the RBF kernel the only reasonable results come from the largest sigma values (around sigma = 8), and the higher the beta the lower the performance.

sigma \ beta   0.03125  0.0625  0.125  0.25  0.5  1   2   4   8   16
0.000977       10       10      10     10    10   10  10  10  10  10
0.001953       10       10      10     10    10   10  10  10  10  10
0.003906       10       10      10     10    10   10  10  10  10  10
0.007812       10       10      10     10    10   10  10  10  10  10
0.015625       10       10      10     10    10   10  10  10  10  10
0.03125        10       10      10     10    10   10  10  10  10  10
0.0625         10       10      10     10    10   10  10  10  10  10
0.125          11       11      11     11    11   11  10  10  10  10
0.25           44       44      44     44    44   44  10  10  10  10
0.5            46       46      46     46    46   46  10  10  10  10
1              10       10      11     11    11   11  10  10  10  10
2              43       43      43     43    43   43  40  10  10  10
4              51       51      51     51    51   49  44  39  35  38
8              54       54      54     54    53   49  47  46  45  43
16             56       55      50     50    46   45  46  46  46  46

Table 9: Percentage accuracy for different beta and sigma using RBF kernel - different person

The performance drop can be easily explained through the samples. There is a considerable

difference between the handwriting of two persons.

the feature vector differs between the training and the test set.

be done in raw feature to improve performance.

person and testing it with samples from another person degrades the performance. Again, overfitting due to the small number of training samples degrades the performance of the non-linear kernels.

A few comparisons between the two sample sets are given below. The upper ten digits are from the training set and the lower ten samples are from the test set.

From the above experiments I can say that SVM can perform well for handwritten digit recognition, but the digits need to be preprocessed before being given as input to the system. Normalization needs to be done to discard background noise and gray-level variability. Raw pixels can work well if all the samples are normalized to the same size; otherwise some kind of oriented features should be used. A good training set is needed so that the system can cope with the high variance in the handwriting of different people; otherwise the performance drops significantly. Lastly, non-linear kernels are only useful when the number of training samples is greater than the number of features; otherwise overfitting can reduce performance, in which case linear kernels give good performance. If the system needs to be built for one person only, a small sample for each digit is enough to train the classifier, and it would give very high accuracy.
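The normalization suggested above can be illustrated with a minimal sketch. This is not the code used in the experiments. It assumes a grayscale digit stored as a list of rows, with ink brighter than the background. The function names normalize_digit and to_feature_vector, the 16 x 16 target size and the 0.1 noise threshold are hypothetical choices made only for this illustration.

```python
def normalize_digit(image, size=16, background_threshold=0.1):
    """Return a size x size version of a grayscale digit image.

    `image` is a list of equal-length rows of gray levels in any numeric
    range; ink is assumed to be brighter than the background.
    All parameter defaults are illustrative assumptions.
    """
    # 1. Gray-level normalization: rescale to [0, 1] so that scans with
    #    different brightness or contrast become comparable.
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * size for _ in range(size)]  # blank image, no ink
    scaled = [[(p - lo) / (hi - lo) for p in row] for row in image]

    # 2. Discard faint background noise below the threshold.
    scaled = [[p if p > background_threshold else 0.0 for p in row]
              for row in scaled]

    # 3. Bounding box of the remaining ink.
    rows = [r for r, row in enumerate(scaled) if any(row)]
    cols = [c for c in range(len(scaled[0])) if any(row[c] for row in scaled)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1

    # 4. Nearest-neighbour resize of the cropped box to a fixed grid.
    #    Note that this stretches the digit and ignores its aspect ratio.
    return [
        [scaled[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


def to_feature_vector(image, size=16):
    """Flatten the normalized image into a raw-pixel feature vector."""
    return [p for row in normalize_digit(image, size) for p in row]
```

With this kind of preprocessing, the same stroke written at two different sizes and scanned with different gray levels maps to the same raw-pixel feature vector, so the features of the training and test sets become directly comparable.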

Conclusion:

Handwriting recognition is a large research area within pattern recognition and image processing because of its wide applicability. SVM is a state-of-the-art method for handwriting recognition and can provide very good accuracy for general systems. In this project we learnt how SVM can be applied to Bengali digit recognition. We have seen that, with a proper set of training data and good image processing techniques, oriented features can provide a high level of accuracy in digit recognition for Bengali script using SVM.

References:

1. Line Eikvil, "Optical Character Recognition",

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.3684

2. "Statistical Summaries". Ethnologue. 2005. Retrieved 2007-03-03.

3. A. K. Roy and B. Chatterjee, "Design of a Nearest Neighbor Classifier for Bengali

Character Recognition", J. IETE, vol. 30, 1984.

4. U. Pal and B. B. Chaudhuri, "OCR in Bangla: An Indo-Bangladeshi Language", Proc. of

12th Int. Conf. on Pattern Recognition, IEEE Computer Society Press, pp. 269-274, 1994.

5. B. B. Chaudhuri and U. Pal, "An OCR System To Read Two Indian Language Scripts:

Bangla And Devnagari (Hindi)", Proc. Fourth ICDAR, 1997.

6. Veena Bansal and R.M.K. Sinha, "A Devanagari OCR and A Brief Overview of OCR Research for Indian Scripts", in Proceedings of STRANS01, held at IIT Kanpur, 2001.

7. A. A. Chowdhury, Ejaj Ahmed, S. Ahmed, S. Hossain and C. M. Rahman, "Optical

Character Recognition of Bangla Characters using neural network: A better approach".

2nd ICEE 2002, Khulna, Bangladesh.

8. Md. Abul Hasnat, S. M. Murtoza Habib, and Mumit Khan, Segmentation free Bangla

OCR using HMM: Training and Recognition, Proc. of 1st DCCA2007, Irbid, Jordan,

2007.

9. S. Arora, D. Bhattacharjee, M. Nasipuri, L. Malik, M. Kundu, D. K. Basu, "Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition", CoRR, 2010.

10. T. K. Bhowmik, P. Ghanty, A. Roy and S. K. Parui, "SVM-based hierarchical architectures for handwritten Bangla character recognition", International Journal on Document Analysis and Recognition, vol. 12(2), pp. 97-108.

11. Sukalpa Chanda, Katrin Franke, Umapada Pal, Tetsushi Wakabayashi, "Text Independent Writer Identification for Bengali Script", 20th International Conference on Pattern Recognition (ICPR), pp. 2005-2008, 2010.

12. A. F. R. Rahman, R. Rahman, M. C. Fairhurst, Recognition of handwritten Bengali

characters: a novel multistage approach, Pattern Recognition, Volume 35, Issue 5, May

2002, Pages 997-1006

13. Umapada Pal, Partha Pratim Roy, Nilamadhaba Tripathy, Josep Llados, Multi-oriented

Bangla and Devnagari text recognition, Pattern Recognition, Volume 43, Issue 12,

December 2010, Pages 4124-4136

14. Cheng-Lin Liu, Ching Y. Suen, A new benchmark on the recognition of handwritten

Bangla and Farsi numeral characters, Pattern Recognition, Volume 42, Issue 12, New

Frontiers in Handwriting Recognition, December 2009, Pages 3287-3295

15. Subhransu Maji and Jitendra Malik, "Fast and Accurate Digit Classification", http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-159.pdf

16. Corinna Cortes and V. Vapnik, "Support-Vector Networks", Machine Learning, 20, 1995.

17. B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers.

In D. Haussler, editor, 5th Annual ACM Workshop on COLT, pages 144-152, Pittsburgh, PA,

1992. ACM Press

18. http://en.wikipedia.org/wiki/Support_vector_machine

19. Edson J.R. Justino, Flavio Bortolozzi, Robert Sabourin, A comparison of SVM and

HMM classifiers in the off-line signature verification, Pattern Recognition Letters,

Volume 26, Issue 9, 1 July 2005, Pages 1377-1385

20. ISI Bengali Numerals. http://www.isical.ac.in/~ujjwal/download/BanglaNumeral.html
