Recognition of Hand Written English Characters & Numerals

CS771 Machine Learning: Tools, Techniques & Application

Gaurav Krishna (Y9227224), Harshit Maheshwari (10290), Pulkit Jain (10543), Sayantan Marik (13111057)



Outline

Preprocessing
Feature Extraction
Classification Techniques
Brief Discussion of Results
  ◦ Parameter Selection
  ◦ Comparative Results for Different Techniques
Final Results


Preprocessing

We have 55 examples for each of the 62 classes.

Problems handled and the corresponding preprocessing steps:
  ◦ Varied size of characters: resized the character images to 32x32 pixels
  ◦ Ill-centered characters: centered the images
  ◦ Varied thickness of strokes: thinning (done only for the 13 Point Feature)

These steps were done in MATLAB.
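The size and centering steps can be illustrated with a small sketch. The original work was done in MATLAB and the slides do not show the exact procedure, so this is only one plausible approach, written in plain Python: crop the binary image to the bounding box of its ink pixels (which centers the character) and resize the crop to 32x32 with nearest-neighbour sampling. The function names `bounding_box` and `crop_and_resize` are hypothetical, and thinning is not shown.

```python
def bounding_box(img):
    """Return (top, bottom, left, right), inclusive, of the ink pixels (value 1)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_and_resize(img, size=32):
    """Crop to the ink bounding box, then nearest-neighbour resize to size x size.

    Assumption: the aspect ratio is not preserved; the character is
    stretched to fill the whole output image.
    """
    top, bottom, left, right = bounding_box(img)
    height = bottom - top + 1
    width = right - left + 1
    out = []
    for r in range(size):
        src_r = top + (r * height) // size  # source row for output row r
        out.append([img[src_r][left + (c * width) // size] for c in range(size)])
    return out
```

Cropping before resizing removes the position and scale differences in one pass; a variant that pads the crop to a square first would keep the aspect ratio instead.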


Feature Extraction

The following features are considered.

SET 1*
  ◦ Haralick Texture Features
  ◦ Zoning Feature
  ◦ Eccentricity
  ◦ Raw Moment
  ◦ Covariance

SET 2
  ◦ Contour Feature
  ◦ Histogram Feature
  ◦ 13 Point Feature
  ◦ Holes Feature

We used Java (the jFeatureLib library) to extract these features.

* In the graphs, "all features" means Set 1.
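As an illustration of three of the Set 1 features (raw moments, covariance and eccentricity), here is a minimal Python sketch. The slides give no formulas and the jFeatureLib implementations may differ, so this assumes the common moment-based definitions: the covariance entries are the normalised second-order central moments, and eccentricity is derived from the two eigenvalues of that 2x2 covariance matrix. The function names are hypothetical.

```python
import math


def raw_moment(img, p, q):
    """Raw moment M_pq: sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def covariance_and_eccentricity(img):
    """Return ((mu20, mu11, mu02), eccentricity) for a binary image."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image contains no ink pixels")
    cx = raw_moment(img, 1, 0) / m00  # centroid x
    cy = raw_moment(img, 0, 1) / m00  # centroid y
    mu20 = raw_moment(img, 2, 0) / m00 - cx * cx
    mu02 = raw_moment(img, 0, 2) / m00 - cy * cy
    mu11 = raw_moment(img, 1, 1) / m00 - cx * cy
    # Eigenvalues of the covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_trace = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    if lam_major <= 0:
        return (mu20, mu11, mu02), 0.0  # a single pixel has no elongation
    ecc = math.sqrt(max(0.0, 1.0 - lam_minor / lam_major))
    return (mu20, mu11, mu02), ecc
```

Under this definition a straight stroke such as "l" has eccentricity close to 1, while a roughly round shape such as "o" is close to 0.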


Classification Techniques Used

Random Forest Classifier
Neural Network
  ◦ Single Hidden Layer
  ◦ Double Hidden Layer
SVM Classifier (using the SMO algorithm)
K-Nearest Neighbour


BRIEF DISCUSSION OF RESULTS


Effect of Scaling the Images

The features used are the image pixel values.

[Figure: two series (Series1, Series2) plotted over x = 1 to 16, with the y-axis running from 0 to 0.7]


Determining Zoning Parameter
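The zoning parameter here is the number of zones the image is divided into. The slides do not define the zoning feature, so the following Python sketch assumes the common definition: split the square image into a grid of equal zones and use the fraction of ink pixels in each zone as one feature. With a 32x32 image and an 8x8 grid this yields 64 features. The function name `zoning_features` and its parameter are hypothetical.

```python
def zoning_features(img, zones_per_side=8):
    """Ink density per zone for a square binary image, in row-major zone order."""
    size = len(img)
    if size % zones_per_side != 0:
        raise ValueError("image size must be divisible by zones_per_side")
    step = size // zones_per_side  # side length of one zone in pixels
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(img[r][c]
                      for r in range(zr * step, (zr + 1) * step)
                      for c in range(zc * step, (zc + 1) * step))
            features.append(ink / (step * step))
    return features
```

Fewer zones give a coarser but more stroke-tolerant description; more zones approach raw pixel values, which is why the zone count is worth tuning.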


Deciding SVM Complexity Parameter

Features: Zoning, Haralick Features and Eccentricity, without thinning.

Effect of SVM Complexity Parameter

Complexity parameter   2         3         4         5         6         7
Accuracy (%)           78.2991   78.8856   78.8856   78.2991   77.7126   77.7126


SVM Classification on Individual Features

Feature                     Accuracy (%)
Haralick                    9.6774
Histogram                   1.7595
Raw moments + Covariance    7.0381
Eccentricity                5.8651
Holes                       2.346
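Of the individual features above, the holes feature is the simplest to sketch. The slides do not say how it was computed, so this Python example assumes it counts the enclosed background regions of a binary character image (for instance 1 for "O" and 2 for "8"). It flood-fills each background region with 4-connectivity and counts those that never reach the image border. The function name `count_holes` is hypothetical.

```python
from collections import deque


def count_holes(img):
    """Count background (0) regions that do not touch the image border."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]

    def region_touches_border(start_r, start_c):
        """Flood-fill one background region; report whether it reaches the border."""
        touches = False
        queue = deque([(start_r, start_c)])
        seen[start_r][start_c] = True
        while queue:
            r, c = queue.popleft()
            if r in (0, height - 1) or c in (0, width - 1):
                touches = True
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height and 0 <= nc < width
                        and not seen[nr][nc] and img[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return touches

    holes = 0
    for r in range(height):
        for c in range(width):
            if img[r][c] == 0 and not seen[r][c]:
                if not region_touches_border(r, c):
                    holes += 1
    return holes
```

A hole count takes only a few distinct values across 62 classes, which would be consistent with it being weak on its own and more useful in combination with other features.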


Using Different Feature Sets on SVM

Feature set                                   Accuracy (%)
Holes + Zoning                                76.8328
Holes + Zoning + Haralick                     78.0059
Contour                                       72.7273
Contour + Haralick                            74.1935
Contour + Zoning                              78.0059
Histogram                                     34.8974
Histogram + Zoning + Thinning                 69.2082
13 Point Feature + Thinning                   60.4106
Zoning                                        75.5681
Haralick + Zoning                             78.0059
Haralick + Zoning + Raw moment + Covariance   77.1261


Using all the extracted features with SVM

Configuration                            Accuracy (%)
Pixel values + SVM, without thinning     74.4868
Using 64 zones                           76.8328
64 zones + image in 32x32 pixels         62.7566
SVM with 64 zones                        72.7273
Neural Network with 64 zones             61.8768


Nearest Neighbors in K-NN

Features: Haralick, Eccentricity, Zoning
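For reference, the role of the number of neighbours k can be seen in a minimal k-NN classifier. This is a generic Python sketch, not the implementation used for the slides: it assumes Euclidean distance on the feature vectors and a plain majority vote, with ties going to the label seen first among the nearest neighbours. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    if k < 1 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

A small k follows individual training examples closely, while a large k smooths the decision; with only 55 examples per class, a large k quickly starts pulling in neighbours from other classes.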


Using Neural Network

Feature Set 1: {Zoning, Haralick, Eccentricity, Raw Moments}

Configuration                              Accuracy (%)
Single Hidden Layer                        71.85
Double Hidden Layer + Feature Set 1        57.478
Contour + Zoning + Single Hidden Layer     70.3812


Using Random Forest Classifier

Feature set                 Accuracy (%)
Zoning                      68.0352
Contour feature + Zoning    77.7126


Final Results

We divided the data into a 90% training set and a 10% test set.

We obtained an accuracy of 78.8% using SVM on the Haralick, Zoning (8x8) and Eccentricity features.

Ten-fold cross-validation gives an accuracy of 77.39%.

A training error of 5% was obtained.
  ◦ Training and testing on the whole dataset gave 95% accuracy.
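The two evaluation protocols mentioned here, a 90/10 hold-out split and ten-fold cross-validation, can be sketched as index generators. The slides do not say whether the splits were shuffled or stratified by class, so this Python sketch simply assumes a seeded random shuffle; the function names and the `seed` parameter are hypothetical.

```python
import random


def train_test_split(n, test_fraction=0.1, seed=0):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k cross-validation folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# 55 examples for each of 62 classes, as stated in the preprocessing slide.
n_examples = 55 * 62
train_idx, test_idx = train_test_split(n_examples)
```

With 3410 examples in total, a 10% test set holds 341 of them. Under that assumption an accuracy of 78.8856% would correspond to 269 correct predictions out of 341, and each single test example moves the accuracy by about 0.29 percentage points.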



Thanks
