Modi script character recognition


By Neha Kulkarni

PICT, Pune

CAN YOU READ THIS ????

TABLE OF CONTENTS
» Introduction
» Aim
» Motivation
» Objectives
» Challenges
» Related work done
» Benefits
» Architecture
» UML diagrams
» Implementation details
» Demo of working modules
» Conclusion
» References

MODI SCRIPT

INTRODUCTION
• Modi is an ancient script
• Crores of Modi documents exist
• Origin: 12th century; in use until the 20th century
• No machine transliterator is available
• Documents are deteriorating
• OCR techniques have recently been used for revival

CHARACTER SET OF MODI SCRIPT

STYLES OF MODI SCRIPT

BAHAMANIKALIN

CHITNISI

PESHVEKALIN

ANGLAKALIN

AIM
» The aim of this project is to recognize individual Modi characters from a Modi document.

MOTIVATION
» Modi is an ancient script (13th century to 1950).
» An article in the Sakal newspaper dated 9th July 2014 was the driving force behind this project.
» Because of the immense importance of historical research, there is a need to transliterate Modi script documents into Devanagari script.
» Manual transliteration is extremely time-consuming and costly (approx. Rs. 2500/- per page).

OBJECTIVES
» Study of existing systems
» Study of Modi script
» Collecting sample Modi documents from various people and experts
» Processing these inputs with image processing algorithms and recognising the characters using neural networks

CHALLENGES
» Negligible research on Modi script in Information Technology
» No previous knowledge of Modi script
» Handwriting differs from person to person
» Modi, being a cursive script, is difficult to process algorithmically
» No punctuation in the script

RELATED WORK
» “An Approach for Recognizing Modi Lipi using Otsu’s Binarization Algorithm and Kohenen Neural Network” is a proposed system in alpha stage that claims an accuracy of 70%.
» Its drawback is that only 22 Modi script characters were considered, and it proved less effective at recognising similar-looking characters.
» No commercially viable Modi script recognition system is available.

BENEFITS
» The “7/12 cha utara”, or land records, which are mostly in Modi script, would be transliterated.
» Many long-standing legal disputes could be settled as a result.
» Many historical secrets would be unearthed because research work in Modi script would become easier.
» New light would be thrown on the governance, economy and rule of our ancestors, which would be beneficial for everyone.

ARCHITECTURE

UML DIAGRAMS

» CLASS DIAGRAM

» STATE DIAGRAM

» USE-CASE DIAGRAM

» SEQUENCE DIAGRAM – FULL SYSTEM

» SEQUENCE DIAGRAM – FAILURE

» SEQUENCE DIAGRAM – HCR

» ACTIVITY DIAGRAM

IMPLEMENTATION DETAILS

INPUT

SYSTEM

GREY-SCALE

BINARIZE

CHAIN CODE FOR FEATURE EXTRACTION

KOHONEN NEURAL NETWORK

OUTPUT

PHASES OF MSCR

IMAGE ACQUISITION

GREYSCALING

OTSU THRESHOLDING

CHAIN CODE FEATURE EXTRACTION

RECOGNITION

IMAGE ACQUISITION PHASE
• Acquire a scanned image
• Store it in a buffer
• Forward it to the preprocessing phase

Image acquired using scanner

PREPROCESSING PHASE
Purpose:
• Suppress unwanted distortions
• Enhance image quality
In MSCR, preprocessing includes:
• Grayscale conversion
• Otsu’s binarization

GRAYSCALE CONVERSION

• Single intensity value for each pixel

Gray = 0.2126 * R + 0.7152 * G + 0.0722 * B
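The weighted sum above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function name `to_grayscale` and the nested-list image layout (rows of `(R, G, B)` tuples with values 0 to 255) are assumptions made for the example.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of gray levels (0-255).

    Uses the weights from the slide: Gray = 0.2126*R + 0.7152*G + 0.0722*B.
    The nested-list image layout is an assumption made for this sketch.
    """
    return [
        [round(0.2126 * r + 0.7152 * g + 0.0722 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# Tiny 2x2 sample image: white, black, pure red, pure green.
sample = [[(255, 255, 255), (0, 0, 0)],
          [(255, 0, 0), (0, 255, 0)]]
gray = to_grayscale(sample)
print(gray)  # [[255, 0], [54, 182]]
```

Green contributes most to the result and blue least, which is why the pure green pixel ends up much brighter than the pure red one.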

BINARIZATION
• Converts a grayscale image to a bi-level image
• Each pixel takes one of two possible values: 0 or 1
• Performance of MSCR depends on the accuracy of this process
• Purpose: extract text from the image, remove noise and reduce the size of the image

OTSU’S THRESHOLDING

• Converts a grayscale image to monochrome
• Algorithm:
o Iterate through all possible threshold values
o For each threshold, calculate a measure of spread (variance) for the pixel levels on each side of the threshold (background and foreground)
o Calculate the within-class variance, i.e. the weighted sum of the background and foreground spreads
o Select as the final threshold the value at which the within-class variance is minimum
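The steps above can be sketched as an exhaustive search over thresholds. This is a minimal, unoptimised illustration under stated assumptions: pixels arrive as a flat list of integer gray levels, the names `otsu_threshold` and `binarize` are hypothetical, and dark pixels (ink) are mapped to 1, which is a convention chosen for the example rather than something the slides specify.

```python
def otsu_threshold(gray_pixels, levels=256):
    """Return the threshold t with the smallest weighted within-class variance.

    Pixels below t form one class and pixels at or above t form the other.
    """
    hist = [0] * levels
    for p in gray_pixels:
        hist[p] += 1
    total = len(gray_pixels)

    best_t, best_score = 0, float("inf")
    for t in range(1, levels):
        back, fore = hist[:t], hist[t:]
        w_b, w_f = sum(back), sum(fore)
        if w_b == 0 or w_f == 0:
            continue  # one class is empty, so this split is meaningless
        mean_b = sum(i * c for i, c in enumerate(back)) / w_b
        mean_f = sum((i + t) * c for i, c in enumerate(fore)) / w_f
        var_b = sum(c * (i - mean_b) ** 2 for i, c in enumerate(back)) / w_b
        var_f = sum(c * (i + t - mean_f) ** 2 for i, c in enumerate(fore)) / w_f
        within = (w_b * var_b + w_f * var_f) / total
        if within < best_score:
            best_t, best_score = t, within
    return best_t


def binarize(gray_pixels, threshold):
    """Map dark pixels (assumed to be ink) to 1 and light pixels to 0."""
    return [1 if p < threshold else 0 for p in gray_pixels]


pixels = [10, 12, 11, 200, 205, 198]  # three dark and three light pixels
t = otsu_threshold(pixels)
bits = binarize(pixels, t)
```

Production implementations usually maximise the between-class variance with running sums instead, which gives the same threshold in a single pass over the histogram.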

Histogram for 6 level gray image

Result of Otsu’s method

CHAIN CODE ALGORITHM FOR FEATURE EXTRACTION

This representation is based on the 4-connectivity or 8-connectivity of the boundary segments.

The boundary is followed in a clockwise direction, and a direction code is assigned to the segment connecting every pair of adjacent pixels.
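As an illustration of the idea, the sketch below assigns 8-connectivity direction codes to an already-traced boundary. It assumes the contour is given as an ordered list of `(row, col)` pixel coordinates (the tracing step itself is not shown), uses the common Freeman numbering with 0 = east and codes increasing counter-clockwise, and summarises the codes in a normalised histogram. The numbering, the histogram feature and the function names are assumptions for the example, not details taken from the slides.

```python
# Freeman direction codes for 8-connectivity, keyed by (d_row, d_col).
# Rows grow downwards, so (-1, 0) means "one pixel up" (north).
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(contour):
    """Return the direction code for each step between consecutive contour pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Fraction of steps taken in each of the 8 directions (a fixed-length feature)."""
    total = len(codes) or 1
    return [codes.count(d) / total for d in range(8)]


# Clockwise walk around a 2x2 block of pixels, returning to the start.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = chain_code(square)
features = direction_histogram(codes)
print(codes)  # [0, 6, 4, 2]
```

A fixed-length vector such as the histogram is convenient because characters with boundaries of different lengths can then be fed to the same network.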

RECOGNITION PHASE
• Recognition is the process of matching segmented characters against the data set used to train the network
• When a character image matches an entry in the data set, recognition is successful
• Recognition is done using a Kohonen neural network, trained on actual hand-drawn letters, to recognise Modi characters in the input
• For a given set of input neurons, only one output neuron wins
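The "one winning output neuron" behaviour can be illustrated with a stripped-down competitive-learning sketch. It is not the project's network: it omits the neighbourhood function of a full Kohonen map (only the winner is updated), and the function names, the two-dimensional toy features, the initial weights and the learning-rate schedule are all assumptions chosen to keep the example small and deterministic.

```python
def winner(weights, x):
    """Index of the output neuron whose weight vector is closest to input x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(neuron, x)) for neuron in weights]
    return dists.index(min(dists))


def train(weights, samples, epochs=20, rate=0.5):
    """Winner-take-all training: move only the winning neuron towards each sample.

    A full Kohonen map would also update the winner's neighbours; that step
    is left out of this sketch.
    """
    weights = [list(neuron) for neuron in weights]  # do not modify the caller's list
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)  # learning rate decays linearly
        for x in samples:
            k = winner(weights, x)
            weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


# Toy feature vectors forming two clusters (stand-ins for two characters).
samples = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
initial = [[0.6, 0.4], [0.4, 0.6]]  # hypothetical starting weights
trained = train(initial, samples)

first = winner(trained, [0.95, 0.05])
second = winner(trained, [0.05, 0.95])
```

After training, each output neuron responds to one cluster; in a recogniser, each winning neuron would then be mapped to a character label using the labelled training samples.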

DEMO OF WORKING MODULES
» Home Page

» Handwritten Character Recognition Page

» Input image recognition page

» Text editor

» Help page

CONCLUSION
» The system has an overall recognition rate of 85%, compared with the 72% rate of the previously proposed Modi character recognition system using Otsu binarization and Kohonen neural networks.
» This improvement is attributed to the additional use of the chain code algorithm.
» The system has wide applications for historians, farmers, research enthusiasts and the general public alike.
» While recognition of handwritten characters is an important task, it is not the final stage in linguistic research. Transliteration of the recognized Modi characters into the common and easily readable Devanagari script is the next logical step. We have begun work on this using a SAX parser and XML, and the outlook for this field is promising.

REFERENCES
» Sidra Anam, Saurabh Gupta, “An Approach for Recognizing Modi Lipi using Ostu’s Binarization Algorithm and Kohenen Neural Network”, International Journal of Computer Applications (0975 – 8887), Volume 111 – No 2, February

» Gupta, A., Srivastava, M. , Mahanta, C. , “Offline handwritten character recognition using neural network” , Computer Applications and Industrial Electronics (ICCAIE), 2011 IEEE International Conference on Date of Conference: 4-7 Dec. 2011, Print ISBN: 978-1-4577-2058-1

» Prof. Mrs. Snehal R. Rathi, Rohini H. Jadhav, Rushikesh A. Ambildhok, “Recognition and Conversion of Handwritten Modi Characters”, International Journal of Technical Research and Applications, e-ISSN: 2320-8163, www.ijtra.com, Volume 3, Issue 1 (Jan-Feb 2015), PP. 128-131

» D. N. Besekar, Dr. R. J. Ramteke, “A Chain Code Approach for Recognizing Modi Script Numerals”, Research Paper, ISSN – 2249-555X

» Amritha Sampath, C. Tripti, V. Govindaru, “Online Handwritten Character Recognition for Malayalam”, CCSEIT '12 Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, Pages 661-664, ACM New York, NY, USA ©2012 , ISBN: 978-1-4503-1310-0

» Bindu S. Moni, G. Raju, “Handwritten Character Recognition System using a Simple Feature”, ICACCI '12 Proceedings of the International Conference on Advances in Computing, Communications and Informatics, Pages 728-734, ACM New York, NY, USA ©2012, ISBN: 978-1-4503-1196-0

» Cinthia O. de A. Freitas, Luiz S. Oliveira, Simone B. K. Aires, Flávio Bortolozzi, “Zoning and Metaclasses for Character Recognition”, SAC '07 Proceedings of the 2007 ACM symposium on Applied computing, Pages 632-636, ACM New York, NY, USA ©2007, ISBN:1-59593-480-4

» Samit Kumar Pradhan, Atul Negi, “A syntactic PR approach to Telugu handwritten character recognition”, DAR '12 Proceeding of the workshop on Document Analysis and Recognition, Pages 147-153, ACM New York, NY, USA ©2012, ISBN: 978-1-4503-1797-9

» Dayashankar Singh, Maitrayee Dutta, Sarvpal H. Singh, “Neural network based handwritten hindi character recognition system”, COMPUTE '09 Proceedings of the 2nd Bangalore Annual Compute Conference, Article No. 15, ACM New York, NY, USA ©2009, ISBN: 978-1-60558-476-8

» Manisha S. Deshmukh, Manoj P. Patil, Satish R. Kolhe, “Off-line Handwritten Modi Numerals Recognition using Chain Code”, WCI '15 Proceedings of the Third International Symposium on Women in Computing and Informatics, Pages 388-393, ACM New York, NY, USA ©2015,ISBN: 978-1-4503-3361-0

» The Times of India, Pune Edition, “Band of researchers, enthusiasts strive to keep Modi script alive”, TNN | Feb 21,2014, 05.48 AM IST, “timesofindia.indiatimes.com/city/pune/Band-of-researchers-enthusiasts-strive-to-keep-Modi-script-alive/articleshow/30761335.cms”, Accessed 8 March 2015

» Sakal News Paper(9th July 2014) , Accessed 8 March 2014

» Lulu C. Munggaran, Suryarini Widodo, Cipta A.M and Nuryuliani, “Handwritten Pattern Recognition Using Kohonen Neural Network Based on Pixel Character”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No. 11, 2014.

A MODI Document

Thank You !!!
