

www.ijifr.com Copyright © IJIFR 2015

Reviewed Paper

International Journal of Informative & Futuristic Research ISSN (Online): 2347-1697

Volume 2 Issue 12 August 2015

Abstract Character recognition plays an important role in many recent applications. Recently, there has been a growing trend among researchers worldwide to recognize handwritten words of many languages and scripts. However, most of the current work in this area is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. There are various feature extraction techniques, such as gradient feature extraction, the stroke method, Fourier descriptors and chain code histograms, and a proper feature extraction technique can increase the recognition rate. In this work, the curvelet transform and shape moments are investigated for a Marathi and Sanskrit handwritten word identification system. The curvelet transform represents edges and curve discontinuities well. After the curvelet transform, several groups of curvelet coefficients are generated at different scales and angles. These coefficients are used with the gray-level co-occurrence matrix (GLCM) to compute features such as the contrast, correlation, homogeneity and energy of the word image. A genetic algorithm is chosen for optimization, to find the number of hidden nodes, which is used as an input to the neural network in the classification step. After optimization, classification techniques are used for training and testing. Researchers have used various classifiers for the recognition of online handwritten words, such as the Hidden Markov Model, K-Nearest Neighbour and Support Vector Machine. It is found that the neural network gives better recognition accuracy.

Marathi And Sanskrit Word Identification By Using Genetic Algorithm

Paper ID IJIFR/ V2/ E12/ 049 Page No. 4588-4598 Subject Area Computer Engineering

Key Words Wiener filter, Curvelet transform, Genetic algorithm, Neural Network

Received On 20-08-2015 Accepted On 28-08-2015 Published On 30-08-2015

Priyanka Pradip Kulkarni 1

Research Scholar, Department of Computer Engineering, G.H. Raisoni Institute of Engineering and Management, Jalgaon-Maharashtra

Sonal Patil 2

Assistant Professor, Department of Computer Engineering, G.H. Raisoni Institute of Engineering and Management, Jalgaon-Maharashtra

Ganesh Dhanokar 3

Assistant Professor, Department of Computer Engineering, G.H. Raisoni Institute of Engineering and Management, Jalgaon-Maharashtra


ISSN (Online): 2347-1697 International Journal of Informative & Futuristic Research (IJIFR)

Volume - 2, Issue - 12, August 2015 24th Edition, Page No: 4588-4598

Priyanka Pradip Kulkarni, Sonal Patil , Ganesh Shankar :: Marathi And Sanskrit Word Identification By Using Genetic Algorithm Geography

1. Introduction

Handwritten character recognition (HCR) is a part of offline character recognition. Handwritten characters vary enormously in style from one person to another, and this wide variability makes them difficult for a machine to recognize. Although research in Optical Character Recognition (OCR) has been going on for the last few decades, the goal of this area is still out of reach. Most researchers have tried to solve the problem using image processing and pattern recognition techniques. OCR is an active field of research in pattern recognition. OCR methodologies can be classified by two criteria: the data acquisition process, which can be online or offline, and the type of text, which can be printed or handwritten [1]. Devanagari is the most popular Indian script, used by more than 500 million people, and it forms the basis of several Indian languages including Hindi, Sanskrit, Kashmiri and Marathi. English character recognition has been studied extensively, and various commercial systems are available for it. For Indian languages, however, research is very limited because of the complex structure of the scripts. HCR can normally be divided into four steps: pre-processing, feature extraction, segmentation and classification. The pre-processing stage produces a clean character image that can be used directly and efficiently by the feature extraction stage. The feature extraction stage removes redundancy from the data. The segmentation stage increases the efficiency of the next stage. The classification stage recognizes characters or words. Feature extraction is a very important field of image processing and object recognition [2].

1.1 Devanagari Character Set

Devanagari is an Indian syllabic-alphabetic script that is used to write several languages, such as Sanskrit, Hindi, Marathi, Bhojpuri, Nepali, Konkani, Sindhi, Marwari, Pali and Maithili, and many other languages spoken in various parts of India. The word Devanagari is a combination of two words: deva, which means God, and nagari, which means urban establishment. Put together, the words mean "Script of the Gods" or "Script of the urban establishment". The basic set of symbols of the Devanagari script consists of 13 vowels (swar) and 37 consonants (vyanjan), as shown in Figure 1.1.

Figure 1.1: (a). Devanagari Vowels (b). Devanagari Consonants

1.2 Word Identification Architecture


Optical word identification involves several steps to completely recognize a word and produce machine-encoded text. These phases are pre-processing, segmentation, feature extraction and classification. The architecture of these phases is shown in Figure 1.2, and each phase is described briefly below [3].

Figure 1.2: Architecture of System

Pre-processing

The pre-processing phase normally includes techniques for binarization, noise removal, skew detection, slant correction, normalization, contour making and skeletonization, which make it easier to extract relevant features from the character image and to recognize it efficiently.
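Binarization is usually the first of these steps. As a minimal sketch, assuming a grayscale word image stored as a 2-D NumPy array of 8-bit intensities with dark ink on light paper, Otsu's global threshold (the binarization method used in [10]; this paper does not state which method it uses) picks the threshold that maximizes the between-class variance of the histogram:

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return Otsu's global threshold t for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)              # probability of class 0 (values <= t)
    mu = np.cumsum(prob * levels)        # cumulative mean up to t
    mu_total = mu[-1]                    # mean intensity of the whole image
    denom = omega * (1.0 - omega)
    # Between-class variance; set to 0 where one of the two classes is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Return 1 for ink and 0 for paper, assuming dark ink on light paper."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

A single global threshold assumes roughly uniform illumination; unevenly lit scans may need a locally adaptive threshold instead.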

Feature Extraction

Feature extraction is used to extract the features on which the recognition of characters is based. Features are first computed and extracted, and the most relevant ones are then selected to construct the feature vector that is eventually used for recognition. Features can be computed using structural, statistical, directional, moment-based and transform-based approaches.

Classification

Each pattern, represented by its feature vector, is assigned to one of a set of predefined classes by a classifier. Classifiers are first trained on a training set of pattern samples to build a model, which is later used to recognize the test samples. The training data should contain a wide variety of samples so that all possible samples can be recognized during testing. Commonly used classifiers include the Support Vector Machine (SVM), K-Nearest Neighbour (K-NN), Artificial Neural Network (ANN) and Probabilistic Neural Network (PNN).
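To make the train-then-test idea concrete, here is a minimal k-nearest-neighbour classifier, one of the classifiers listed above, written with NumPy. It is an illustrative sketch only (the classifier used in this work is an MLP), and the Euclidean distance and simple majority vote are assumptions:

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Label each row of test_x by majority vote among its k nearest
    training feature vectors (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    preds = []
    for x in test_x:
        dists = np.linalg.norm(train_x - x, axis=1)  # distance to every training sample
        nearest = np.argsort(dists)[:k]              # indices of the k closest samples
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

"Training" a k-NN model is just storing the labelled feature vectors; all the work happens at test time, which is why it becomes slow for large training sets.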


2. Literature Survey

Studies have been carried out on the recognition of different words. The most advanced and efficient OCR systems have been designed for scripts and languages such as English, Chinese and Japanese. For Indian languages and scripts, significant research work has been proposed only recently. Most Indian work on character recognition is dominated by the Devanagari script, which is used to write Hindi, Marathi, Nepali and Sanskrit. Work has also been done on compound words, extracting different features for recognition and classification, and different classifiers have been compared to increase the recognition accuracy.

Namita Dwivedi described the recognition of Sanskrit words using Prewitt's operator to extract features from an image, with a thinning process applied during pre-processing. Thinning is an important pre-processing step in OCR: its purpose is to delete redundant information while retaining the characteristic features of the image. Freeman chain code, a representation technique that is useful in image processing, shape analysis and pattern recognition, is used with a heuristic approach for feature extraction. A genetic algorithm is used for the non-linear segmentation of multiple characters, and the recognition model is built from SVM classifiers to achieve higher classification accuracy [1].
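Freeman chain coding, as used in [1], encodes a boundary as the sequence of directions between successive boundary pixels, and a chain code histogram counts how often each direction occurs. A minimal sketch, assuming the boundary has already been traced into an ordered list of 8-connected (row, col) pixels (the tracing step is not shown) and using the common convention 0 = east, counting counter-clockwise:

```python
from collections import Counter

# 8-direction Freeman codes: 0 = east, counting counter-clockwise.
# Offsets are (d_row, d_col) with rows increasing downwards, as in image arrays.
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary, closed=True):
    """Return the Freeman chain code of an ordered list of 8-connected pixels."""
    pts = list(boundary) + ([boundary[0]] if closed else [])
    codes = []
    for (r0, c0), (r1, c1) in zip(pts, pts[1:]):
        # Raises KeyError if consecutive pixels are not 8-neighbours.
        codes.append(FREEMAN[(r1 - r0, c1 - c0)])
    return codes

def chain_code_histogram(codes):
    """Normalized 8-bin histogram of direction codes."""
    counts = Counter(codes)
    return [counts.get(d, 0) / len(codes) for d in range(8)]
```

For a 2-pixel square traced as (0,0), (0,1), (1,1), (1,0), the closed code is [0, 6, 4, 2], and each of those four directions gets a histogram weight of 0.25.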

U. Pal, Wakabayashi and Kimura also presented a comparative study of Devanagari handwritten character recognition using different features and classifiers [4]. They used four sets of features based on curvature and gradient information obtained from binary as well as gray-scale images, and compared the results of 10 different classifiers. The best results, 74.74% and 75.17% for features extracted from binary and gray-scale images respectively, were obtained with the Mirror Image Learning (MIL) classifier.

Sarbajit Pal et al. [5] described a projection-based statistical approach for handwritten character recognition. They proposed four-sided projections of characters, which were smoothed by polygon approximation.

Nikita Gaur and Dayashankar Singh et al. [6] described a gradient feature extraction approach for the recognition of Sanskrit words, using the Sobel operator for edge detection. The Sobel operator is widely used in image processing, particularly in edge detection algorithms. Technically, it is a discrete differentiation operator that computes an approximation of the gradient of the image intensity function. The skeletonized and normalized binary pixels of English characters were used as the inputs of the MLP network. The structural analysis showed that as the number of hidden nodes increases, the number of epochs taken to recognize the handwritten characters also increases. Considerable effort was made to reach an accuracy as high as 94%.
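The Sobel operator described above approximates the horizontal and vertical intensity derivatives with two 3x3 kernels and combines them into a gradient magnitude. A minimal NumPy sketch; replicating border pixels is an assumption, since [6] does not say how image borders are handled:

```python
import numpy as np

# Sobel kernels for the horizontal (x) and vertical (y) derivatives.
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T

def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    rows, cols = img.shape
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out

def sobel_magnitude(img):
    """Approximate gradient magnitude sqrt(Gx^2 + Gy^2)."""
    return np.hypot(filter3x3(img, KX), filter3x3(img, KY))
```

Correlation and convolution differ here only by the sign of Gx and Gy, so the magnitude is the same either way. On a vertical step from 0 to 10, the magnitude is 40 on the two columns adjacent to the edge and 0 elsewhere.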

Brijmohan Singh, Ankush Mittal, M.A. Ansari and Debashis Ghosh [7] described a holistic system for offline handwritten Devanagari word recognition, in which a Curvelet feature extractor is combined with SVM and k-NN classifiers.

Vedgupt Saraf and D.S. Rao et al. [8] proposed a genetic algorithm scheme for the recognition of offline Devanagari handwritten characters. They tested the proposed system on samples from different individuals and obtained 98.78% recognition accuracy.

Prachi Patil and Saniya Ansari et al. [9] proposed Online Handwritten Devnagari Word Recognition using an HMM-based technique. Features of the input are extracted using Android technology, and the HMM recognizes the word from those features. They tested the proposed system on different word images and obtained 95.70% recognition accuracy.

Gaurav Kumar and Pradeep Kumar Bhatia et al. [10] proposed a Neural Network based Approach for Recognition of Text Images. They used Otsu's method for binarization, followed by a median filter for noise removal, and extracted features with the Fourier transform. A multilayer feed-forward neural network trained with the back-propagation algorithm is used for classification and achieves 93.58% accuracy.

Sushama Shelke and Shaila Apte et al. [11] proposed A Multistage Handwritten Marathi Compound Character Recognition Scheme using Neural Networks and Wavelet Features, in which recognition is carried out with a multistage feature extraction and classification scheme. The average recognition rates were 96.14% and 94.22% for training and testing samples respectively with wavelet approximation features, and 98.68% and 96.23% respectively with modified wavelet features.

Sandhya Arora, Debotosh Bhattacharjee et al. [12] presented a Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition. They extracted shadow features, chain code histogram features, view-based features and longest-run features, which were then fed to a neural classifier and to a support vector machine for classification. They achieved accuracy rates of 99.62% with SVM and 99.50% with NN.

M. N. Sandhya Arora, D. Bhattacharjee et al. [13] proposed the Recognition of non-compound handwritten Devnagari characters using a combination of MLP and minimum edit distance. They used two well-known and established pattern recognition techniques, one based on neural networks and the other on minimum edit distance, with characters represented by shadow features and chain code histograms. The method was evaluated on a database of 7154 samples, and the overall recognition rate was 90.74%.

3. Steps for Identification of Marathi and Sanskrit Word

The basic steps of the handwritten Devanagari Marathi and Sanskrit word recognition system are pre-processing, feature extraction and recognition.

3.1 Dataset

A standard (benchmark) database for Indian scripts is available neither freely nor commercially; hence, samples of Devanagari handwritten words were collected from people at schools and colleges. The contributors have different educational backgrounds (matriculation, graduate and post-graduate qualifications) and different professions, including students, teachers and children. The database contains digital images and consists of 624 training samples collected from 26 writers of different profiles and educational backgrounds. All the samples were written on paper in an isolated manner. The writers wrote the words in different sizes using different pens, so the words have different character widths. Some words also contain distortions, whose amount depends on the quality of the pen ink and on the writing speed of the writer.

3.2 Image pre-processing

The raw data is subjected to a number of preliminary processing steps to make it usable in the descriptive stages of character analysis. Pre-processing aims to produce data on which the character recognition system can operate accurately.


3.2.1 Smoothing

Smoothing operations are used to blur the image and reduce noise. Blurring is used in pre-processing steps such as the removal of small details from an image. In binary images, smoothing operations are used to reduce noise or to straighten the edges of the characters, for example, to fill small gaps or to remove small bumps in the edges (contours) of the characters. Filters are generally used to remove unwanted components from an image in the spatial domain. In digital image processing, images are often affected by various kinds of noise. The main objective of filtering is to improve the quality of the image, enhancing the interpretability of the information it contains for human viewers. In this system, the Wiener filter is used for smoothing.

3.2.2 Gaussian filter

Noise in digital images can be described as random fluctuations in brightness and color. It corrupts the information in an image, so that the intensity at each pixel is a combination of the true signal and the noise. Although many scanners and cameras have some built-in noise reduction, most digital images display some degree of noise as a result of scanning, transmission or conversion. Two of the most common types of noise are additive Gaussian noise and salt-and-pepper noise. Gaussian noise, or amplifier noise, impairs the image by a linear addition of white noise, meaning that it is independent of the image itself and is usually evenly distributed over the frequency domain. As the name suggests, the intensity of the noise at each pixel follows a Gaussian (normal) distribution. It is most apparent at high frequencies and can in most cases be reduced by some kind of low-pass filter. A Gaussian blur (also known as Gaussian smoothing) is the result of blurring an image with a Gaussian function to reduce image noise and detail. Mathematically, applying a Gaussian blur to an image is the same as convolving the image with a Gaussian function.
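Because Gaussian smoothing is convolution with a sampled Gaussian, it can be sketched in a few lines. The sketch below is an illustration under stated assumptions (a default sigma of 1, a kernel radius of ceil(3 sigma) and border replication are arbitrary choices, not settings from this paper). It uses the fact that the 2-D Gaussian is separable, so blurring along rows and then along columns with a normalized 1-D kernel is equivalent to convolving with the 2-D kernel:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Sampled, normalized 1-D Gaussian; radius defaults to ceil(3*sigma)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()  # weights sum to 1, so flat regions are unchanged

def gaussian_blur(img, sigma=1.0):
    """Blur a 2-D image with a separable Gaussian, replicating border pixels."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    out = img.astype(float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        # Weighted sum of shifted copies = 1-D convolution along this axis
        # (the kernel is symmetric, so no flip is needed).
        out = sum(w * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, w in enumerate(k))
    return out
```

Larger sigma values remove more noise but also blur thin strokes, so for handwriting the kernel should stay small relative to the stroke width.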

3.2.3 Wiener Filter

Inverse filtering is a restoration technique for deconvolution: when the image is blurred by a known low-pass filter, it is possible to recover the image by inverse filtering or generalized inverse filtering. However, inverse filtering is very sensitive to additive noise. Reducing one degradation at a time allows a restoration algorithm to be developed for each type of degradation, and the algorithms can then simply be combined. Wiener filtering executes an optimal trade-off between inverse filtering and noise smoothing: it removes the additive noise and inverts the blurring simultaneously. The Wiener filter is optimal in terms of the mean square error; in other words, it minimizes the overall mean square error in the process of inverse filtering and noise smoothing. Wiener filtering is a linear estimation of the original image [14], based on a stochastic framework. The orthogonality principle implies that the Wiener filter in the Fourier domain can be expressed as follows:

$$W(f_1,f_2) = \frac{H^{*}(f_1,f_2)\,S_{xx}(f_1,f_2)}{|H(f_1,f_2)|^{2}\,S_{xx}(f_1,f_2) + S_{\eta\eta}(f_1,f_2)} = \frac{1}{H(f_1,f_2)} \cdot \frac{|H(f_1,f_2)|^{2}\,S_{xx}(f_1,f_2)}{|H(f_1,f_2)|^{2}\,S_{xx}(f_1,f_2) + S_{\eta\eta}(f_1,f_2)}$$

where $S_{xx}(f_1,f_2)$ and $S_{\eta\eta}(f_1,f_2)$ are respectively the power spectra of the original image and the additive noise, and $H(f_1,f_2)$ is the blurring filter. The factored form on the right shows that the Wiener filter has two separate parts: an inverse filtering part, $1/H(f_1,f_2)$, and a noise smoothing part. It not only performs deconvolution by inverse filtering (high-pass filtering) but also removes the noise with a compression operation (low-pass filtering).
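Dividing the numerator and denominator of the Fourier-domain Wiener filter by $S_{xx}$ gives $W = H^{*} / (|H|^{2} + S_{\eta\eta}/S_{xx})$, which can be sketched with NumPy's FFT. This sketch makes several assumptions not taken from the paper: the blur kernel h is known, the unknown spectral ratio $S_{\eta\eta}/S_{xx}$ is replaced by a single hypothetical constant K, and the blur is treated as circular (periodic) convolution. The locally adaptive Wiener smoothing filters found in image-processing libraries are a related but different estimator.

```python
import numpy as np

def wiener_deconvolve(blurred, h, K=0.01):
    """Restore an image blurred by kernel h with the Wiener filter
    W = conj(H) / (|H|^2 + K), where K stands in for S_nn / S_xx."""
    H = np.fft.fft2(h, s=blurred.shape)   # blur transfer function, zero-padded
    G = np.fft.fft2(blurred)              # spectrum of the degraded image
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(W * G))
```

K controls the trade-off described above: as K approaches 0 the filter approaches the pure inverse filter 1/H, while larger values suppress frequencies where the blur response is weak and noise would otherwise be amplified.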


3.3 Feature Extraction Technique

Any given image can be decomposed into several features, and the feature extraction technique must retrieve the features of characters accurately. The extracted features are organized in a database, which is the input to the recognition phase of the classifier. Feature extraction is a very important part of a recognition system because its output is what the classifier uses to classify the data.

3.3.1 Curvelet Transform

A feature extraction scheme based on the digital Curvelet transform has been used in [14]. In this work, the words are extracted from the sample images using conventional methods. A characteristic feature of handwritten text is the orientation of the text written by the writer. Each sample is cropped to its edges and resized to a standard width and height suitable for the digital Curvelet transform. The digital Curvelet transform at a single scale is applied to each sample to obtain Curvelet coefficients as features. For a large set of characters, as in the Devanagari script, automatic curve matching is highly useful. For this reason, the curvelet transform is used here: it represents edges and singularities along curves more precisely with needle-shaped basis elements, which have very high directional sensitivity and capture smooth contours efficiently. Since curvelets are two-dimensional waveforms that provide a new architecture for multiscale analysis, they can be used to distinguish similar-looking characters better.

The Curvelet frame preserves important properties such as parabolic scaling, tightness and sparse representation of surface-like singularities of co-dimension one. Many of the characters in a word contain not only edge discontinuities but also curve discontinuities. The widely used Wavelet transform works well with edge discontinuities, but a curve discontinuity affects a large number of Wavelet coefficients. In contrast, the curve discontinuities in a character or word are handled well by the Curvelet transform with very few coefficients. Hence, Curvelet-based features are likely to work well for Devanagari character and word recognition. The curvelet transform includes four stages:

i.) Sub-band decomposition: the image is divided into resolution layers, each containing details of different frequencies.

ii.) Smooth partitioning: each sub-band is smoothly windowed into "squares" of an appropriate scale.

iii.) Renormalization: each resulting square is renormalized to a unit square.

iv.) Ridgelet analysis: each square is analyzed in the ortho-ridgelet system.
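The parabolic scaling property of curvelets can be stated compactly. In the standard curvelet construction (this relation comes from the general curvelet literature, not from this paper), an element at dyadic scale index j is concentrated in a needle-shaped region whose width is roughly the square of its length:

```latex
\text{width} \approx 2^{-j}, \qquad
\text{length} \approx 2^{-j/2}
\quad\Longrightarrow\quad
\text{width} \approx \text{length}^{2}
```

Since the angular width of an element is roughly width/length, about $2^{-j/2}$, the number of orientations needed at scale j grows roughly like $2^{j/2}$. This growth in orientations at finer scales is the source of the directional sensitivity described above.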

Curvelets are appropriate bases for representing images (or other functions) that are smooth apart from singularities along smooth curves, where the curves have bounded curvature, i.e., where objects in the image have a minimum length scale. The features extracted from the input image using the curvelet transform are contrast, correlation, homogeneity and energy, computed from the gray-level co-occurrence matrix (GLCM) of the curvelet coefficients.

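a. Contrast:- In general, contrast is the difference in luminance or color that makes an object (or its representation in an image or display) distinguishable. On the GLCM, it measures the intensity difference between a pixel and its neighbour over the whole image, $\sum_{i,j}(i-j)^2\,p(i,j)$, where $p(i,j)$ is the normalized co-occurrence frequency of levels $i$ and $j$; it is 0 for a constant image.

b. Correlation:- how strongly a pixel is linearly related to its neighbour over the whole image, $\sum_{i,j}(i-\mu_i)(j-\mu_j)\,p(i,j)/(\sigma_i\sigma_j)$, with values from -1 to 1.

c. Homogeneity:- how closely the GLCM elements are concentrated along its diagonal, $\sum_{i,j} p(i,j)/(1+|i-j|)$; it is 1 for a diagonal GLCM.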


d. Energy:- the sum of the squared GLCM elements, $\sum_{i,j} p(i,j)^2$ (also called the angular second moment); it is 1 for a constant image.
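The four GLCM statistics can be computed directly from their definitions. A minimal NumPy sketch, assuming the input (for example, a block of curvelet coefficients) has already been quantized to integer levels 0..levels-1, and using a single horizontal offset (0, 1) and a non-symmetric matrix. The offset, the number of levels and the quantization are illustrative choices, not settings reported in this paper. Energy is taken here as the sum of squared entries; some libraries report its square root instead.

```python
import numpy as np

def glcm(q, levels, offset=(0, 1)):
    """Normalized GLCM: P[i, j] = fraction of pixel pairs (x, x + offset)
    whose quantized values are (i, j). Offsets must be non-negative here."""
    dr, dc = offset
    rows, cols = q.shape
    ref = q[:rows - dr, :cols - dc]   # reference pixels
    nbr = q[dr:, dc:]                 # their neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)
    return P / P.sum()

def glcm_features(P):
    """Contrast, correlation, homogeneity and energy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    contrast = np.sum((i - j) ** 2 * P)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    energy = np.sum(P ** 2)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    # Correlation is undefined (division by zero) for a constant input.
    correlation = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j)
    return {"contrast": contrast, "correlation": correlation,
            "homogeneity": homogeneity, "energy": energy}
```

In practice, several offsets (directions and distances) are often computed and their features concatenated or averaged, which makes the feature vector less sensitive to stroke orientation.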

3.3.2 Shape Moment Feature

In fact, the main problem in a character recognition system is the large variation in shapes within a class of characters. This variation depends on font styles, document noise, photometric effects, document skew and poor image quality. The large variation in shapes makes it difficult to determine, before the model is built, how many features are appropriate. Various shape-based and boundary-based features are therefore taken from each individual character. Moment-based features such as kurtosis, skewness, moments, percentiles and quartiles are calculated from the character images.
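The moment-based features listed above can be computed from a word image's pixel values. This is a sketch under assumptions the paper does not spell out: the statistics are taken over the gray-level values of all pixels, population (biased) estimators are used, "moment" is interpreted as the second central moment, and the 90th percentile is a hypothetical choice of percentile.

```python
import numpy as np

def moment_features(img):
    """Moment-style statistics of all pixel values in img (population estimators).
    Skewness = m3 / m2**1.5; kurtosis = m4 / m2**2 (3 for a normal distribution).
    Both are undefined (division by zero) for a constant image."""
    x = np.asarray(img, dtype=float).ravel()
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))  # central moments
    q1, median, q3 = np.percentile(x, [25, 50, 75])     # quartiles
    return {
        "moment2": m2,               # second central moment (variance)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "q1": q1, "median": median, "q3": q3,
        "p90": np.percentile(x, 90),  # hypothetical choice of percentile
    }
```

For the values 1, 2, 3, 4, 5, the sketch gives a second central moment of 2, a skewness of 0, a kurtosis of 1.7 and quartiles 2, 3 and 4.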

3.4 Optimization

Genetic algorithms are a very good means of optimization in such problems. They optimize the desired property by generating hybrid solutions from the existing solutions. These hybrid solutions are added to the solution pool and may be used to generate more hybrids, which may be better than the solutions already generated. All this is done by genetic operators, which are defined and applied over the problem [15]. In the proposed work, a genetic algorithm is used to compute the optimum number of hidden nodes, which is then used as a parameter of the Multilayer Perceptron Neural Network. The system selects as optimum the number of hidden nodes for which the mean square error is minimum when the network has been trained up to the maximum number of epochs.

Fitness Function:- In genetic algorithms, the fitness function is used to test the goodness of a solution: when applied to any solution in the solution pool, it returns that solution's level of goodness.
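The search for the number of hidden nodes can be sketched as a small binary-coded genetic algorithm with truncation selection, single-point crossover, bit-flip mutation and elitism. Everything concrete here is an assumption for illustration: the search range, population size and rates, and especially the fitness function. In a real system the fitness would train the MLP with n hidden nodes and return its mean square error (lower is better); the hypothetical `stand_in_mse` below replaces that so the sketch runs on its own.

```python
import random

def decode(bits, lo, hi):
    """Map a bit list to an integer in lo..hi (values past hi are clamped)."""
    value = int("".join(map(str, bits)), 2)
    return min(lo + value, hi)

def genetic_search(fitness, lo, hi, n_bits=6, pop_size=12, generations=25,
                   p_mut=0.05, seed=0):
    """Minimize fitness(n) over integers n in lo..hi with a binary-coded GA.
    Returns the best n and the best fitness value of every generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    score = lambda chrom: fitness(decode(chrom, lo, hi))
    history = []
    for _ in range(generations):
        pop.sort(key=score)                      # lower fitness = better
        history.append(score(pop[0]))
        parents = pop[: pop_size // 2]           # truncation selection
        nxt = [pop[0][:]]                        # elitism: best survives unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)     # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=score)
    history.append(score(best))
    return decode(best, lo, hi), history

def stand_in_mse(n):
    """Hypothetical stand-in for 'MSE of an MLP trained with n hidden nodes'."""
    return (n - 20) ** 2 / 400 + 0.01

best, history = genetic_search(stand_in_mse, lo=1, hi=64)
```

Because the best chromosome is always copied into the next generation, the best fitness per generation can never get worse. This matters when each fitness evaluation is an expensive network training run.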

3.5 Classification

The classification stage is the decision-making part of a recognition system, and it uses the features extracted in the previous stage. There are various classification methods, such as K-Nearest Neighbor (KNN), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Several classification methods have been applied successfully to offline Devanagari Marathi and Sanskrit word recognition, and on handwritten Devanagari words, Neural Network classification gives better results than the other methods.

3.6 Artificial Neural Network

The network must first be trained with some predefined standard character patterns to perform the recognition task. The back-propagation neural network (BPNN) algorithm is used for this. Back-propagation is a supervised learning method: the connection weights are adjusted during training so as to reduce the error between the network outputs and the target outputs. Random values are initially assigned to all the connection weights, and during the training process these values converge to fixed values [16].

Artificial neural networks are one of the most popular classification techniques because of their learning and generalization abilities, and they have traditionally been used in character recognition applications. Of the various architectures, the multilayer perceptron (MLP) is widely used. The MLP is a fully connected network in which every neuron in a layer is connected to every neuron in the next layer by a weighted link through which the state of the neuron is transmitted. It consists of an input layer, a hidden layer and an output layer. A bias is similar to a weight: it acts exactly as a weight on a connection from a unit whose activation is always one. Each neuron in the hidden layer includes a nonlinear activation function [11].

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The MLP is trained with a supervised learning technique called backpropagation. It is a modification of the standard linear perceptron and can distinguish data that are not linearly separable.
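To illustrate the last point, the following sketch trains a small MLP with batch backpropagation on XOR, the classic problem that a single linear perceptron cannot solve. It uses plain gradient descent on the mean squared error rather than the SCG optimizer used in this work; the tanh hidden layer, sigmoid output, hidden size, learning rate, epoch count and seed are illustrative assumptions.

```python
import numpy as np

def train_xor_mlp(hidden=8, lr=2.0, epochs=20000, seed=0):
    """Train a 2-hidden-1 MLP on XOR with batch backpropagation on the MSE."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR: not linearly separable
    W1, b1 = rng.normal(size=(2, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)
    n = len(X)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output neuron.
        H = np.tanh(X @ W1 + b1)
        Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        # Backward pass: propagate d(mse)/dY back through both layers.
        dZ2 = (2.0 * (Y - T) / n) * Y * (1.0 - Y)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)
        # Gradient-descent updates for weights and biases.
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    H = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(H @ W2 + b2)))).ravel()

outputs = train_xor_mlp()
print((outputs > 0.5).astype(int))
```

After training, thresholding the outputs at 0.5 should reproduce the XOR truth table (0, 1, 1, 0); as with any MLP, whether and how fast this happens depends on the random initial weights.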

Once the network weights and biases are initialized, the network is ready for training. A multilayer feedforward network can be trained for function approximation (nonlinear regression) or for pattern recognition. The training process requires a set of examples of proper network behavior: network inputs p and target outputs t. Training a neural network involves tuning the values of its weights and biases to optimize network performance, as defined by the network performance function net.performFcn. The default performance function for feedforward networks is the mean squared error mse, the average squared error between the network outputs a and the target outputs t. For N output values it is defined as follows:

$$F = \mathrm{mse} = \frac{1}{N}\sum_{i=1}^{N} e_i^2 = \frac{1}{N}\sum_{i=1}^{N} (t_i - a_i)^2$$
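The mse performance function described above amounts to a few lines of NumPy. This is an illustrative re-implementation, not the toolbox's own `mse` function, and the target and output vectors are hypothetical values.

```python
import numpy as np

def mse(t, a):
    """Mean squared error between target outputs t and network outputs a."""
    e = np.asarray(t, dtype=float) - np.asarray(a, dtype=float)  # errors e_i = t_i - a_i
    return float(np.mean(e ** 2))                                # average squared error

targets = [1.0, 0.0, 0.0]   # hypothetical one-hot target for a 3-class problem
outputs = [0.8, 0.1, 0.1]   # hypothetical network outputs
print(mse(targets, outputs))  # approximately 0.02
```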

When training large networks, and when training pattern recognition networks, the training functions trainscg (scaled conjugate gradient) and trainrp (resilient backpropagation) are good choices. Their memory requirements are relatively small, and yet they are much faster than standard gradient descent algorithms.
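Resilient backpropagation keeps only one step size per weight and uses just the sign of each partial derivative, which explains its small memory footprint. The sketch below implements the iRprop- variant on a generic gradient function; the choice of variant, the growth and shrink factors 1.2 and 0.5, the initial step 0.07, the step bounds and the toy objective are assumptions for illustration, not a description of how trainrp is implemented internally.

```python
import numpy as np

def irprop_minus(grad, w, steps=200, delta0=0.07, inc=1.2, dec=0.5,
                 delta_min=1e-6, delta_max=50.0):
    """Minimize a function with the iRprop- variant of resilient backpropagation."""
    w = np.asarray(w, dtype=float).copy()
    delta = np.full_like(w, delta0)   # one step size per weight
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        same = g * g_prev > 0         # sign unchanged: grow the step
        flipped = g * g_prev < 0      # sign flipped: minimum overshot, shrink the step
        delta[same] = np.minimum(delta[same] * inc, delta_max)
        delta[flipped] = np.maximum(delta[flipped] * dec, delta_min)
        g = np.where(flipped, 0.0, g)  # iRprop-: skip the update after a sign flip
        w -= np.sign(g) * delta        # only the gradient sign is used
        g_prev = g
    return w

def toy_grad(w):
    # Gradient of f(w) = (w0 - 3)^2 + 10 (w1 + 1)^2, minimum at (3, -1).
    return np.array([2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)])

w_opt = irprop_minus(toy_grad, [0.0, 0.0])
print(np.round(w_opt, 4))
```

Because the update ignores gradient magnitude, the poorly scaled second coordinate (factor 10) does not slow convergence the way it would for plain gradient descent with a single learning rate.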

The optimization technique used to train this architecture was the Scaled Conjugate Gradient (SCG) method. SCG was chosen because it gave better results and because it has been found to solve the optimization problems encountered when training an MLP network more efficiently than the gradient descent and conjugate gradient methods [17].
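SCG replaces the costly line search of ordinary conjugate gradient with a finite-difference curvature estimate along the search direction, plus a Levenberg-Marquardt style scale parameter that keeps this estimate positive. Below is a compact NumPy sketch of Møller's SCG algorithm (the method behind trainscg), exercised on a small quadratic instead of an MLP so the answer can be checked exactly. The parameter values sigma0, lam and tol are illustrative assumptions.

```python
import numpy as np

def scg(f, grad, w, max_iter=200, sigma0=1e-4, lam=1e-6, tol=1e-6):
    """Minimize f with Moller's scaled conjugate gradient (SCG) method."""
    w = np.asarray(w, dtype=float).copy()
    n = w.size
    r = -grad(w)          # steepest-descent direction
    p = r.copy()          # current search direction
    lam_bar, success = 0.0, True
    for k in range(1, max_iter + 1):
        if np.linalg.norm(r) < tol:
            break
        p_norm2 = p @ p
        if success:
            # Curvature along p from a finite difference of gradients.
            sigma = sigma0 / np.sqrt(p_norm2)
            s = (grad(w + sigma * p) - grad(w)) / sigma
            delta = p @ s
        delta += (lam - lam_bar) * p_norm2     # scale the curvature estimate
        if delta <= 0:
            # Force a positive-definite Hessian estimate.
            lam_bar = 2.0 * (lam - delta / p_norm2)
            delta = -delta + lam * p_norm2
            lam = lam_bar
        mu = p @ r
        alpha = mu / delta                     # step size without a line search
        w_new = w + alpha * p
        # Comparison parameter: actual vs. predicted reduction of f.
        Delta = 2.0 * delta * (f(w) - f(w_new)) / mu ** 2
        if Delta >= 0:
            r_new = -grad(w_new)
            lam_bar, success = 0.0, True
            if k % n == 0:
                p = r_new.copy()               # periodic restart
            else:
                beta = (r_new @ r_new - r_new @ r) / mu
                p = r_new + beta * p
            w, r = w_new, r_new
            if Delta >= 0.75:
                lam *= 0.25                    # good quadratic fit: trust it more
        else:
            lam_bar, success = lam, False      # reject the step
        if Delta < 0.25:
            lam += delta * (1.0 - Delta) / p_norm2  # poor fit: increase scaling
    return w

# Toy quadratic f(w) = 0.5 w^T A w - b^T w; its minimizer solves A w = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_min = scg(lambda w: 0.5 * w @ A @ w - b @ w, lambda w: A @ w - b, np.zeros(2))
print(w_min)  # close to np.linalg.solve(A, b) = [0.2, 0.4]
```

For MLP training, f would be the network's mse as a function of the flattened weight vector and grad its backpropagated gradient; only gradients and a few vectors are stored, which keeps memory use low.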

4. Results

The experimental results are described below in detail.

4.1 Experimental Results

To train the classifiers, a set of training Marathi and Sanskrit words is required, together with their features extracted in the feature extraction step. Pattern classification has two basic phases: training and testing. In the training phase, data are repeatedly presented to the classifier while the weights are updated to obtain the desired response. In the testing phase, the trained system is applied to data it has never seen in order to check the performance of the classification. Hence, the classifier is designed by partitioning the total data set. For the neural network, the whole data set is divided into training, validation and testing subsets before training, and this division is repeated each time the network is trained. Each training run can therefore converge to a different solution, because it starts from different initial weight and bias values and uses a different division of the data into training, validation and test sets. That is why the network can give different outputs for the same input from one training run to the next. Table 4.1 below shows the percentage accuracy for five sample word images from the Marathi word dataset, and Table 4.2 shows the percentage accuracy for five sample word images from the Sanskrit word dataset. The accuracy for both Marathi and Sanskrit words differs each time the neural network is trained.
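The run-to-run variation described above comes partly from the random data division. The sketch below splits the 624-sample word database randomly into training, validation and test indices. The 70/15/15 ratios (the defaults of MATLAB's dividerand) are an assumption, since the ratios used in this work are not stated, and `divide_random` is a hypothetical helper name.

```python
import numpy as np

def divide_random(n_samples, train_ratio=0.70, val_ratio=0.15, rng=None):
    """Randomly split sample indices into training, validation and test sets.

    The ratios are illustrative assumptions; whatever remains after the
    training and validation portions becomes the test set.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    n_val = int(round(val_ratio * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Two training runs with different random states split the same database differently.
train_a, val_a, test_a = divide_random(624, rng=np.random.default_rng(1))
train_b, val_b, test_b = divide_random(624, rng=np.random.default_rng(2))
print(len(train_a), len(val_a), len(test_a))  # 437 94 93
print(set(test_a) == set(test_b))             # False: the two test sets differ
```

Since each run also starts from different random initial weights, fixing the random state (or averaging accuracy over several runs) is a common way to make reported results comparable.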

Table 4.1: Accuracy Result Table for Marathi Word Sample

Table 4.2: Accuracy Result Table for Sanskrit Word Sample

5. Conclusion

Because no standard dataset of Marathi and Sanskrit words is publicly available, we first prepared a database of 624 Marathi and Sanskrit word samples, taking into account all possible constraints such as variations in writing style and similar word samples containing some noise. The words were written by people of different ages. The technique discussed above, based on the curvelet transform and shape moments, extracts shape features of the word image. Feature extraction for compound words is difficult, but the curvelet transform handles edges and curve discontinuities well. Hence, the curvelet transform proves to be useful in a Marathi and Sanskrit word recognition system. This work presents a neural-network-based classification system for the recognition of offline handwritten Marathi and Sanskrit words. Among the methods considered, the neural network (NN) was found to give better recognition accuracy.

The accuracy of the proposed scheme could be further improved by increasing the number of training samples or by applying the scheme at different resolutions. The recognition system can also be extended to the recognition of other words, sentences and documents.

6. References

[1] N. Dwivedi and N. Arya, “Sanskrit Word Recognition using Prewitt's Operator and Support Vector Classification,” IEEE ICECCN, 2013.

[2] B. Indira and R. Saqib, “Devanagari Character Recognition: A Short Review,” International Journal of Computer Applications, vol. 4, December 2010.

[3] S. Kubatur and M. Sid-Ahmed, “A Neural Network Approach to Online Devanagari Handwritten Character Recognition,” IEEE, pp. 209–214, December 2012.

[4] U. Pal and F. Kimura, “Comparative Study of Devanagari Handwritten Character Recognition using Different Feature and Classifiers,” 10th International Conference on Document Analysis and Recognition, 2009.

[5] S. Pal and J. Mitra, “A Projection Based Statistical Approach for Handwritten Character Recognition,” Proceedings of International Conference on Computational Intelligence and Multimedia Applications, vol. 2, pp. 404–408, 2007.

[6] N. Gaur and D. Singh, “Sanskrit Word Recognition using Gradient Feature Extraction,” VSRD-IJCSIT, vol. 2, pp. 167–174, 2012.

[7] B. Singh and A. Mittal, “Handwritten Devanagari Word Recognition: A Curvelet Transform Based Approach,” International Journal on Computer Science and Engineering (IJCSE), vol. 3, April 2011.

[8] V. Saraf and D. Rao, “Handwritten Devanagari Character Recognition using Gradient Features,” International Journal of Soft Computing and Engineering (IJSCE), vol. 2, April 2013.

[9] P. Patil and S. Ansari, “Online Handwritten Devnagari Word Recognition using HMM Based Technique,” International Journal of Computer Applications, vol. 17, June 2014.

[10] G. Kumar and P. Kumar Bhatia, “Neural Network Based Approach for Recognition of Text Images,” International Journal of Computer Applications, vol. 14, January 2013.

[11] S. Shelke and S. Apte, “A Multistage Handwritten Marathi Compound Character Recognition Scheme using Neural Networks and Wavelet Features,” International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 1, March 2011.

[12] S. Arora and D. Bhattacharjee, “Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition,” IJCSI International Journal of Computer Science, vol. 7, May 2010.

[13] S. Arora and D. Bhattacharjee, “Recognition of Non-compound Handwritten Devnagari Characters using A Combination of MLP and Minimum Edit Distance,” IJCSS.

[14] S. Pannirselvam and S. Ponmani, “Preprocessing of Handwritten Documents using Various Filters – A Survey,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, July 2013.

[15] R. Kala and H. Vazirani, “Offline Handwriting Recognition using Genetic Algorithm,” IJCSI International Journal of Computer Science Issues, vol. 1, March 2010.

[16] R. Dineshkumar and J. Suganthi, “Sanskrit Character Recognition System using Neural Network,” Indian Journal of Science and Technology, vol. 8, pp. 65–69, January 2015.

[17] M. Abdella and T. Marwala, “The Use of Genetic Algorithms and Neural Networks to Approximate Missing Data in Database,” Computing and Informatics, vol. 24, pp. 577–589, 2005.