“MODI SCRIPT CHARACTER RECOGNITION”
BY-Neha Kulkarni
PICT, Pune
CAN YOU READ THIS ????
TABLE OF CONTENTS
• Introduction
• Aim
• Motivation
• Objectives
• Challenges
• Related work done
• Benefits
• Architecture
• UML diagrams
• Implementation details
• Demo of working modules
• Conclusion
• References
MODI SCRIPT
INTRODUCTION
• Modi is an ancient script
• Crores of Modi documents exist
• Origin: 12th century; used until the 20th century
• No machine transliterator is available
• Documents are deteriorating
• Recent OCR techniques are being used for revival
CHARACTER SET OF MODI SCRIPT
STYLES OF MODI SCRIPT
BAHAMANIKALIN
CHITNISI
PESHVEKALIN
ANGLAKALIN
AIM
» The aim of this project is to recognize individual Modi characters from a Modi document.
MOTIVATION
» Modi is an ancient script (13th century to 1950).
» An article in the Sakal newspaper dated 9th July 2014 was the driving force behind this project.
» Because of the immense importance of historical research, there is a need to transliterate Modi script documents into Devanagari script.
» Manual transliteration is extremely time-consuming and costly (approx. Rs. 2500/- per page).
OBJECTIVES
» Study existing systems
» Study the Modi script
» Collect sample Modi documents from various people and experts
» Process these inputs with image processing algorithms and recognise characters using neural networks
CHALLENGES
» Negligible research on Modi script in Information Technology
» No previous knowledge of Modi script
» Handwriting differs from person to person
» Modi, being a cursive script, is difficult to process algorithmically
» No punctuation in the script
RELATED WORK
» “An Approach for Recognizing Modi Lipi using Otsu’s Binarization Algorithm and Kohenen Neural Network” is a proposed system in alpha stage that claims an accuracy of 70%.
» A drawback of this system is that it considers only 22 Modi script characters, and it proved less efficient at recognising similar-looking characters.
» No commercially viable Modi script recognition system is available.
BENEFITS
» The “7/12 cha utara”, or land records, which are mostly in Modi script, would be transliterated.
» Many long-standing legal disputes would be settled as a result.
» Many historical secrets would be unearthed because research work in Modi script would become easier.
» New light would be thrown on the governance, economy and rule of our ancestors, which would be beneficial for everyone.
ARCHITECTURE
UML DIAGRAMS
» CLASS DIAGRAM
» STATE DIAGRAM
» USE-CASE DIAGRAM
» SEQUENCE DIAGRAM – FULL SYSTEM
» SEQUENCE DIAGRAM – FAILURE
» SEQUENCE DIAGRAM – HCR
» ACTIVITY DIAGRAM
IMPLEMENTATION DETAILS
INPUT
SYSTEM
GREY-SCALE
BINARIZE
CHAIN CODE FOR FEATURE EXTRACTION
KOHONEN NEURAL NETWORK
OUTPUT
PHASES OF MSCR
IMAGE ACQUISITION
GREYSCALING
OTSU THRESHOLDING
CHAIN CODE FEATURE EXTRACTION
KOHONEN NEURAL NETWORK
RECOGNITION
IMAGE ACQUISITION PHASE
• Acquire a scanned image
• Store it in a buffer
• Forward it to the preprocessing phase
Image acquired using scanner
PREPROCESSING PHASE
PURPOSE:
• Suppress unwanted distortions
• Enhance image quality
• In MSCR, preprocessing includes:
  o Grayscale conversion
  o Otsu’s binarization
GRAYSCALE CONVERSION
• Single intensity value for each pixel
Gray = 0.2126 * R + 0.7152 * G + 0.0722 * B
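The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name to_grayscale and the list-of-rows image layout are assumptions made for the example, not part of MSCR.

```python
def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples into rows of integer gray levels (0-255).

    Uses the weights quoted on the slide: 0.2126 R + 0.7152 G + 0.0722 B.
    The list-of-rows layout is an assumption made for this sketch.
    """
    return [
        [round(0.2126 * r + 0.7152 * g + 0.0722 * b) for (r, g, b) in row]
        for row in pixels
    ]


# A tiny 2x2 test image: white, black, pure red, pure green.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
gray = to_grayscale(image)
```

Green contributes most to the gray level and blue least, which is why a pure green pixel comes out much brighter than a pure red one.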
BINARIZATION
• Converts the grayscale image to a bi-level image
• Each pixel takes one of two possible values: 0 or 1
• Performance of MSCR depends on the accuracy of this process
• Purpose: extract text from the image, remove noise and reduce the size of the image
OTSU’S THRESHOLDING
• Converts a grayscale image to monochrome
• Algorithm:
  o Iterate through all possible threshold values
  o For each threshold, calculate a measure of spread (variance) for the pixel levels on each side of the threshold (background and foreground)
  o Calculate the within-class variance, i.e. the weighted sum of the background and foreground spreads
  o Select as the final threshold the value at which the within-class variance is minimum
Histogram for 6 level gray image
Result of Otsu’s method
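The threshold search described for Otsu's method can be sketched as follows. This is a minimal, unoptimised illustration, assuming the image has been flattened to a list of integer gray levels; the names otsu_threshold and binarize are hypothetical and do not come from the MSCR code.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the threshold t that minimises the weighted within-class variance.

    Pixels with value < t form the background class, pixels >= t the foreground.
    gray_values is assumed to be a flat list of integers in range(levels).
    """
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)

    best_t, best_var = 0, float("inf")
    for t in range(1, levels):  # try every possible threshold
        w_b = sum(hist[:t])     # number of background pixels
        w_f = total - w_b       # number of foreground pixels
        if w_b == 0 or w_f == 0:
            continue            # one class is empty, nothing to compare
        mean_b = sum(i * hist[i] for i in range(t)) / w_b
        mean_f = sum(i * hist[i] for i in range(t, levels)) / w_f
        var_b = sum(hist[i] * (i - mean_b) ** 2 for i in range(t)) / w_b
        var_f = sum(hist[i] * (i - mean_f) ** 2 for i in range(t, levels)) / w_f
        within = (w_b * var_b + w_f * var_f) / total  # weighted sum of spreads
        if within < best_var:
            best_t, best_var = t, within
    return best_t


def binarize(gray_values, t):
    """Map each gray level to 1 (>= t) or 0 (< t)."""
    return [1 if v >= t else 0 for v in gray_values]


# Two clearly separated clusters of gray levels: dark and bright.
values = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(values)
bits = binarize(values, t)
```

For this toy input the chosen threshold falls between the dark and the bright cluster, so the dark pixels map to 0 and the bright ones to 1. Whether ink should end up as 0 or 1 is a convention the slides do not state.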
CHAIN CODE ALGORITHM FOR FEATURE EXTRACTION
This representation is based on 4-connectivity or 8-connectivity of the segments.
The boundary is traversed in a clockwise direction, and a direction is assigned to the segment connecting every pair of adjacent pixels.
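A small sketch of an 8-connected chain code is given below. It assumes the boundary pixels have already been traced and are supplied in clockwise order as (row, column) pairs, with rows increasing downward. The direction numbering (0 = east, then counter-clockwise, as in the common Freeman convention) and the normalised direction histogram used as a feature vector are assumptions for illustration; the slides do not say which numbering or feature encoding MSCR uses.

```python
# Map a (row step, column step) between neighbouring pixels to a direction code.
# Assumed numbering: 0 = east, 1 = north-east, 2 = north, ... 7 = south-east.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(boundary):
    """Return the direction codes for a closed boundary.

    boundary is an ordered list of (row, col) pixels; each pixel must be an
    8-neighbour of the next one, and the last pixel connects back to the first.
    """
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:] + boundary[:1]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return codes


def direction_histogram(codes):
    """Fraction of boundary steps taken in each of the 8 directions."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    return [count / len(codes) for count in counts]


# Boundary of a 2x2 block of pixels, traversed clockwise from the top-left.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
codes = chain_code(square)
features = direction_histogram(codes)
```

The histogram has a fixed length of 8 regardless of how long the boundary is, which makes it convenient as input to a neural network with a fixed number of input neurons.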
RECOGNITION PHASE
• Process of matching segmented characters with the data set used to train the network
• When a character image matches the data set, recognition is successful
• Recognition is done using a Kohonen neural network, trained on actual drawn letters, to recognise Modi characters from input characters
• Only one output neuron fires (the winner) for a given set of input neurons
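The winner-take-all behaviour of a Kohonen network can be sketched as below. This is a deliberately simplified version: it updates only the winning neuron and omits the neighbourhood function of a full self-organising map. The function names, learning rate, epoch count and toy two-dimensional inputs are all assumptions for illustration, not values taken from MSCR.

```python
def winner(weights, x):
    """Index of the output neuron whose weight vector is closest to input x."""
    distances = [
        sum((w - xi) ** 2 for w, xi in zip(neuron, x)) for neuron in weights
    ]
    return distances.index(min(distances))


def train(weights, samples, rate=0.5, epochs=20):
    """Pull the winning neuron's weights toward each training sample.

    Simplification: only the winner is updated (no neighbourhood function).
    The weights list is modified in place and also returned.
    """
    for _ in range(epochs):
        for x in samples:
            k = winner(weights, x)
            weights[k] = [w + rate * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


# Two output neurons and four toy training vectors forming two groups.
initial = [[0.2, 0.8], [0.8, 0.2]]
samples = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
trained = train(initial, samples)

# Unseen inputs are assigned to whichever single neuron wins.
first = winner(trained, [0.05, 0.95])
second = winner(trained, [0.95, 0.05])
```

In a character recogniser, each input vector would be a feature vector extracted from a character image, and each output neuron would be associated with one character class after training.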
DEMO OF WORKING MODULES
» Home page
» Handwritten character recognition page
» Input image recognition page
» Text editor
» Help page
CONCLUSION
» The system has an overall recognition rate of 85%, compared with the 72% efficiency rate of the previously proposed Modi character recognition system using Otsu binarization and Kohonen neural networks.
» This improvement is due to the additional use of the chain code algorithm.
» The system has wide application for historians, farmers, research enthusiasts and the common man alike.
» While recognition of handwritten characters is an important task, it is not the final stage in linguistic research. Transliteration of the recognized Modi characters into the common and easily readable Devanagari script is the next logical step. We have begun work on this using a SAX parser and XML, and the future of this field looks very bright.
REFERENCES
» Sidra Anam, Saurabh Gupta, “An Approach for Recognizing Modi Lipi using Ostu’s Binarization Algorithm and Kohenen Neural Network”, International Journal of Computer Applications (0975 – 8887), Volume 111, No. 2, February
» A. Gupta, M. Srivastava, C. Mahanta, “Offline handwritten character recognition using neural network”, 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), 4-7 Dec. 2011, Print ISBN: 978-1-4577-2058-1
» Snehal R. Rathi, Rohini H. Jadhav, Rushikesh A. Ambildhok, “Recognition and Conversion of Handwritten Modi Characters”, International Journal of Technical Research and Applications, e-ISSN: 2320-8163, www.ijtra.com, Volume 3, Issue 1 (Jan-Feb 2015), pp. 128-131
» D. N. Besekar, R. J. Ramteke, “A Chain Code Approach for Recognizing Modi Script Numerals”, Research Paper, ISSN: 2249-555X
» Amritha Sampath, C. Tripti, V. Govindaru, “Online Handwritten Character Recognition for Malayalam”, CCSEIT '12 Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, pp. 661-664, ACM New York, NY, USA, 2012, ISBN: 978-1-4503-1310-0
» Bindu S. Moni, G. Raju, “Handwritten Character Recognition System using a Simple Feature”, ICACCI '12 Proceedings of the International Conference on Advances in Computing, Communications and Informatics, pp. 728-734, ACM New York, NY, USA, 2012, ISBN: 978-1-4503-1196-0
» Cinthia O. de A. Freitas, Luiz S. Oliveira, Simone B. K. Aires, Flávio Bortolozzi, “Zoning and Metaclasses for Character Recognition”, SAC '07 Proceedings of the 2007 ACM Symposium on Applied Computing, pp. 632-636, ACM New York, NY, USA, 2007, ISBN: 1-59593-480-4
» Samit Kumar Pradhan, Atul Negi, “A syntactic PR approach to Telugu handwritten character recognition”, DAR '12 Proceeding of the Workshop on Document Analysis and Recognition, pp. 147-153, ACM New York, NY, USA, 2012, ISBN: 978-1-4503-1797-9
» Dayashankar Singh, Maitrayee Dutta, Sarvpal H. Singh, “Neural network based handwritten hindi character recognition system”, COMPUTE '09 Proceedings of the 2nd Bangalore Annual Compute Conference, Article No. 15, ACM New York, NY, USA, 2009, ISBN: 978-1-60558-476-8
» Manisha S. Deshmukh, Manoj P. Patil, Satish R. Kolhe, “Off-line Handwritten Modi Numerals Recognition using Chain Code”, WCI '15 Proceedings of the Third International Symposium on Women in Computing and Informatics, pp. 388-393, ACM New York, NY, USA, 2015, ISBN: 978-1-4503-3361-0
» The Times of India, Pune Edition, “Band of researchers, enthusiasts strive to keep Modi script alive”, TNN, Feb 21, 2014, 05.48 AM IST, timesofindia.indiatimes.com/city/pune/Band-of-researchers-enthusiasts-strive-to-keep-Modi-script-alive/articleshow/30761335.cms, Accessed 8 March 2015
» Sakal newspaper (9th July 2014), Accessed 8 March 2014
» Lulu C. Munggaran, Suryarini Widodo, Cipta A.M and Nuryuliani, “Handwritten Pattern Recognition Using Kohonen Neural Network Based on Pixel Character”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No. 11, 2014
A MODI Document
Thank You !!!