

Joint Marathi Character Recognition using KNN Classifier

(1) S. D. Bhosale (2) Dr. U. B. Shinde

(1) Research Scholar, National Institute of Electronics & Information Technology, Aurangabad, India.

(2) Department of Electronics and Telecommunication Engineering, CSMSS Chh. Shahu College of Engineering, Aurangabad, India.

ABSTRACT

Compound characters, a feature of the Marathi script (which is derived from Devanagari), occur frequently in everyday Marathi. Recognizing these characters is a difficult challenge for researchers because of their complex structure. This paper presents an approach for the recognition of Marathi compound characters. A compound character is written as a combination of two or more basic characters of the alphabet, and it may be formed from different sequences of vowels, consonants, or both. Compound characters are used more frequently in Marathi than in other languages, which makes the recognition task more challenging. Researchers have used various classification techniques for this task, such as neural networks, soft computing, and multiclass SVM classifiers. In this work, recognition is carried out using KNN. The initial stages of feature extraction are based on structural features, and the characters are classified accordingly.

Keywords: Digital image processing, Marathi compound character, OCR, segmentation, classification, KNN, neural network

I. INTRODUCTION:

Handwritten character recognition, irrespective of the script, has potential applications in automation across fields such as postal automation, bank automation, and form filling. Joint character recognition for the Marathi script is a challenging task for researchers. This is due to characteristics of the script such as its large character set, complex shapes, the presence of modifiers and compound characters, and the similarity between characters, as many compound characters look alike. The Marathi script is derived from Devanagari, and Marathi is the official language of Maharashtra. It is the fourth most spoken language in India and the fifteenth most spoken language in the world. The Marathi script consists of 16 vowels and 36 consonants, for a total of 52 letters. Marathi is written from left to right and, unlike English, has no upper and lower case. Every character has a horizontal line at the top called the header line, which joins the characters in a word. Vowels are combined with consonants using specific characteristic marks. These marks occur in line with, above, or below a character in a word. Marathi also has a complex system of compound characters in which two or more consonants are combined to form a new special character. Compound characters occur more frequently in the Marathi script than in other similar languages.

Joint character classification is a form of pattern recognition. The presence of unwanted objects or disoriented patterns affects the recognition accuracy. The most basic way of recognizing patterns is to use probabilistic methods. It is very difficult to achieve 100% accuracy in the recognition of joint characters. There is a variety of writing styles, because different people write the same character differently and using different methods. Joint character recognition has many possible applications in fields such as postal automation, bank automation, and form filling.

Joint character recognition is an important field of Optical Character Recognition (OCR). The objective of OCR is the automatic reading of optically sensed document text, translating human-readable characters into machine-readable codes. OCR is popular for its application potential in banks, library automation, post offices, and various other organizations. Joint character recognition aims at converting handwritten characters in images into text that can be stored and edited.

This field of research finds applications in various areas that aim at automation to reduce human effort, such as postal automation, bank automation, and form filling. Joint character recognition for Indian scripts is a challenging task for several reasons. One of the Indian scripts is the Marathi script. Marathi is the fourth most widely used language and is used by more than 50 million people. Unconstrained Marathi handwriting is more complex than English because of the possible variations in the shape, number, and direction of the constituent strokes. Marathi character recognition is a complicated process due to the presence of multiple conjuncts, lower and upper modifiers, and a number of disconnected and multi-stroke characters in a word where all characters are connected through the Shirorekha.

JASC: Journal of Applied Science and Computations

Volume VI, Issue I, January/2019

ISSN NO: 1076-5131

Page No:1403

The consonants in compound characters are joined in various patterns in Marathi. One way of creating a compound character is to remove the vertical line of a character and then join it to the other character on its left-hand side. This type of joining is the most commonly used. Another way is to join the characters side by side or one above the other. In some compound characters, one of the consonants completely changes its form before being joined to the other consonant. Compound characters therefore vary not only in shape but also in aspect ratio, since their characteristics change with the joining strategy. Because of these varied joining strategies, features such as aspect ratio or the number of end points cannot serve as efficient features for attaining acceptable recognition accuracy. There are two ways of recognizing a compound character.

II. BLOCK DIAGRAM OF PROPOSED WORK:

Fig1: Block Diagram of proposed method

Recognition can be done by first separating the constituent characters of the compound character, or it can be done without separation. Attempting recognition after separating the constituent characters would pose more difficulties, because there are a number of joining strategies, as noted in the earlier survey. This paper proposes a system for compound character recognition without separation of the characters. However, if the consonants in a compound character get separated or split during pre-processing, the compound character can still be recognized by the proposed system.

In our proposed system, we aim at recognizing compound characters in the Marathi language. This is done using KNN. The above figure shows the block diagram of the proposed system, which consists of several phases, starting with character input, pre-processing, structural classification, and character resizing.

III. ALGORITHM & DESCRIPTION:

1. Image Acquisition

Image acquisition in image processing can be broadly defined as the action of retrieving an image from some source, typically a hardware-based source, so that it can be passed through whatever processes need to occur afterward. Image acquisition is always the first step in the workflow sequence. The input image obtained is completely unprocessed, like the original. One of the ultimate goals of this step is to have a source of input that operates within such controlled and measured guidelines that the same image can be reproduced under the same conditions, so that the relevant factors are easier to locate.

2. Preprocessing

A point operator converts the gray-scale character images to binary. This operator separates pixels whose values lie within a specified range (the object) from the rest (the background). This is done by choosing a threshold that separates the object from the background. In our method, the threshold is chosen by uniform thresholding after normalization. In uniform thresholding, pixels above the chosen threshold are set to white and those below it are set to black. Uniform thresholding requires knowledge of the gray levels; otherwise the target features might not be selected, or might be misclassified, after thresholding. The joint characters were tested and checked for the global features at various threshold values before a threshold was finalized. On testing about one third of the characters in the database, a normalized threshold value of 0.75 was found to be an optimum that gave correct selection of global features in most cases. Pre-processing plays a very important role in handwritten character recognition, as in any other pattern recognition task. Joint characters show various undesirable effects, such as unwanted strokes, gaps, or breaks, which occur due to binarization. Often a joint character is thinner at its curvature than at its other parts, and this point is more likely to break during binarization.
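
The uniform thresholding step described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB code. It assumes that "normalization" means a linear rescaling of the gray levels to the range [0, 1]; the function names and the sample patch are hypothetical. Only the threshold value 0.75 comes from the text.

```python
def normalize(gray):
    """Scale gray levels linearly to the range [0, 1] (assumed normalization)."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    return [[(p - lo) / span for p in row] for row in gray]


def uniform_threshold(gray, threshold=0.75):
    """Binarize: pixels above the threshold become white (1), the rest black (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in normalize(gray)]


# Hypothetical 3x4 gray-scale patch (0 = dark ink, 255 = bright paper).
patch = [
    [250, 240, 30, 245],
    [255, 20, 25, 250],
    [245, 200, 10, 0],
]
binary = uniform_threshold(patch)
for row in binary:
    print(row)
```

With a single global threshold, a faint stroke pixel such as the value 200 in the sample patch ends up on the background side, which is one way the gaps and breaks mentioned above can arise.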

This paper proposes a system to recognize compound Marathi characters. Character recognition is a very vast field, and various methods are used for it. Marathi characters are more complex than English characters due to their various characteristics. Classical methods of character recognition are not considered as successful for Marathi characters, because these characters differ in size, shape, and style from person to person, and from time to time even for the same writer. Hence this project uses the KNN technique with an evolutionary computational algorithm for character recognition.

The main advantage of this technique is that it provides feature extraction and detection suitable for character recognition, with improved efficiency over earlier methods. Neural networks have proved more successful in character recognition than classical methods when used with evolutionary computational algorithms. Preprocessing aims to produce data that the recognition system can process easily and accurately. It enhances the image features, reducing the effect of variations in writing and rendering the image suitable for further processing. In this project, the operations performed during preprocessing are noise reduction, normalization, binarization, edge detection, dilation, and filling. First, the dataset image is loaded. The required character image (i.e., the character to be recognized) is then cropped manually. The cropped image is converted to a gray-scale image for further processing, after which binarization is carried out. After this step, the edges of the binary image are detected. Image dilation and filling of holes are performed after binarization.
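
The dilation and hole-filling steps at the end of this pipeline can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' MATLAB implementation. It assumes that ink pixels are stored as 1 and background as 0, that dilation uses a 3x3 square structuring element, and that holes are background regions not 4-connected to the image border; none of these choices is stated in the source, and the names `dilate` and `fill_holes` are hypothetical.

```python
from collections import deque


def dilate(img):
    """Dilate with a 3x3 square: a pixel becomes 1 if it or any 8-neighbour is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
    return out


def fill_holes(img):
    """Set to 1 every background pixel that is not 4-connected to the border."""
    h, w = len(img), len(img[0])
    outside = [[False] * w for _ in range(h)]
    # Seed the flood fill with all background pixels on the image border.
    queue = deque(
        (y, x) for y in range(h) for x in range(w)
        if (y in (0, h - 1) or x in (0, w - 1)) and img[y][x] == 0
    )
    for y, x in queue:
        outside[y][x] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[1 if img[y][x] or not outside[y][x] else 0 for x in range(w)]
            for y in range(h)]


# A stroke with a one-pixel break, and a closed ring with a one-pixel hole.
broken = [[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 0, 0, 0, 0]]
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(dilate(broken)[1])    # the middle row after dilation
print(fill_holes(ring)[2])  # the middle row after hole filling
```

Dilation bridges the break in the stroke at the cost of thickening it, which is why it is a plausible remedy for the binarization breaks discussed earlier.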

3. Segmentation

Text-line segmentation methods can generally be classified into two types: bottom-up and top-down. In the bottom-up approach, neighboring components are grouped using simple rules based on the geometric relationship between neighboring blocks. Projection-based methods are top-down algorithms and are among the most successful methods for machine-printed text. They are also successful for handwritten text where the text lines are straight or easily separable. However, because people's writing styles differ, text-line segmentation remains a very challenging task. In general, text-line segmentation techniques are script-independent. In the proposed system, we use a projection-based method for segmentation of lines and characters.
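
A projection-based segmentation of the kind mentioned above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the page is a binary image with ink stored as 1, text lines are separated by completely blank rows, and the function names are hypothetical. The source does not give the details of its projection method.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img):
    """Split a binary page into text lines using the horizontal projection."""
    row_profile = [sum(row) for row in img]
    return runs(row_profile)


def segment_columns(img, top, bottom):
    """Split rows top..bottom-1 into column spans using the vertical projection."""
    col_profile = [sum(img[y][x] for y in range(top, bottom))
                   for x in range(len(img[0]))]
    return runs(col_profile)


# Hypothetical page with two text lines; the first line holds two blobs.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
lines = segment_lines(page)
first_line_spans = segment_columns(page, *lines[0])
print(lines)
print(first_line_spans)
```

Because the header line joins the characters of a Marathi word, a plain vertical projection of a text line yields word-level spans; separating individual characters would additionally require locating and removing the header line, which this sketch does not attempt.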

4. Feature Extraction

KNN is used for feature extraction. Features are extracted using single-level decomposition, as discussed earlier. The approximation coefficients are obtained for every character after single-level decomposition. Modified KNN features are also generated in order to improve the recognition results. Feature extraction is the next step after preprocessing the data. After preprocessing, a feature set with various properties is extracted. The features extracted from the character images are used, with the help of various algorithms, to train the neural network. In this stage, the features of the characters that are used to classify them at the recognition stage are extracted. The edges and end points of the image are detected and are considered as features for the neural network.
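
End-point detection, one of the structural features named above, can be sketched as follows. This is a minimal pure-Python illustration which assumes that the character has already been thinned to a one-pixel-wide skeleton (a step the source does not describe) and that an end point is an ink pixel with exactly one ink pixel among its 8 neighbours. The function name `end_points` is hypothetical.

```python
def end_points(skeleton):
    """Return (row, col) of every ink pixel that has exactly one 8-neighbour."""
    h, w = len(skeleton), len(skeleton[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not skeleton[y][x]:
                continue
            neighbours = sum(
                skeleton[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) - 1  # subtract the pixel itself
            if neighbours == 1:
                points.append((y, x))
    return points


# Hypothetical L-shaped stroke, one pixel wide (1 = ink).
stroke = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(end_points(stroke))
```

The number and positions of the end points could then be appended to the feature vector, keeping in mind the earlier remark that end-point counts alone vary with the joining strategy.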

Features are extracted using KNN, as discussed earlier. The single-level decomposition yields the approximation features. Modified features are also generated in order to improve the recognition results. The modified features are obtained by convolving the approximation features with themselves.
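
The two feature sets described above can be sketched as follows. The source does not name the transform behind its "single level decomposition", so this sketch assumes a single-level 2-D Haar wavelet transform, whose approximation coefficient for each 2x2 block is the block sum divided by 2. It also assumes that the approximation features are flattened to a 1-D vector before being convolved with themselves. All function names are hypothetical.

```python
def haar_approximation(img):
    """Single-level 2-D Haar approximation: sum of each 2x2 block divided by 2."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2  # drop an odd last row/column
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 2
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def self_convolve(features):
    """Full discrete convolution of a 1-D feature vector with itself."""
    n = len(features)
    out = [0.0] * (2 * n - 1)
    for i, a in enumerate(features):
        for j, b in enumerate(features):
            out[i + j] += a * b
    return out


def feature_vectors(img):
    """Return (approximation features, modified features) for one character image."""
    approx = [v for row in haar_approximation(img) for v in row]
    return approx, self_convolve(approx)


# Hypothetical 4x4 binary character image (1 = ink).
char_img = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
approx, modified = feature_vectors(char_img)
print(approx)
print(modified)
```

Note that self-convolution turns a vector of length n into one of length 2n - 1, so the modified feature set is almost twice as long as the approximation features it is derived from.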

5. Recognition

Finally, character recognition is carried out using a neural network in the proposed system. In this process, a character to be recognized is presented to the network, and its label is passed to the next step. Several other patterns of the same character are taught to the neural network by the computational algorithm under the same name and characteristics. Hence the system learns several variations of the handwritten characters and adapts to them. At the start of training, the weight matrix of the neural network is initialized to 0. During the training process, the input assigned to the matrix is defined. Each character possesses a corresponding weight matrix. As learning of the character progresses, the weights for that character are updated. To train the network, the features previously extracted from the character are given to the neural network. Instructions to recognize these patterns are then given to the network, and the weight matrix is updated accordingly. Each time, the weights are adjusted so as to give an output closer to the desired output than before. In this project, KNN is used. If the features of the character do not match the target, the error is back-propagated and the weights are updated. The process continues until the features of the selected character match the target. Finally, the output is displayed. The work is carried out in MATLAB.
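
For reference, a plain k-nearest-neighbour classifier of the kind named in the title can be sketched as follows. Unlike the weight-update procedure described above, a basic KNN classifier has no weights to train: it stores the labelled feature vectors and classifies a new character by a majority vote among its closest stored neighbours. This is a minimal pure-Python sketch, not the authors' MATLAB implementation. Euclidean distance and k = 3 are assumptions, and the feature vectors and labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two compound-character classes.
train = [
    ((0.0, 0.0), "kta"), ((0.1, 0.2), "kta"), ((0.2, 0.1), "kta"),
    ((1.0, 1.0), "tra"), ((0.9, 1.1), "tra"), ((1.1, 0.9), "tra"),
]
print(knn_predict(train, (0.15, 0.10)))
print(knn_predict(train, (0.95, 1.00)))
```

In practice the feature vectors would be the structural or decomposition features extracted earlier, and the value of k would be chosen by validation on held-out characters.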

IV. RESULT & CONCLUSION:

Compound characters are one of the features of the Marathi script and are commonly used. This paper presents a system for compound character recognition for the Marathi script. For this we used a Marathi-language OCR together with KNN for better results. The recognition of characters is done using a KNN recognition scheme.

Fig2: GUI design for project work

Fig3: Importing an image

Fig4: Binarization Process

Fig5: Line by line segmentation

Fig6: Word to word segmentation

Fig7: Recognition and displaying of compound characters

A KNN approach is presented for the recognition of compound Marathi characters. Using KNN, we get better results compared to other methods.
