Statistical Machine Translation (SMT) for Indian Language
Presented By:
Nakul Sharma, Parteek Bhatia.
Thapar University, Patiala.
Main Agenda
• Introduction to SMT.
• Tools.
• Popular Machine Translation Systems.
• Machine Translation Projects in India.
• Machine Translation Tools and Punjabi Language.
• Conclusion and future work.
• References.
Introduction
• Part of corpus-based Machine Translation.
• The system consists of 3 components:
– Language Model (LM).
– Translation Model (TM).
– Decoder.
System Architecture
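[Figure: system architecture. A source sentence S is passed to the Decoder, which uses the Language Model P(T) and the Translation Model P(S|T) to produce the target sentence T.]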
Language Model (LM)
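• Gives the probability of each word given the words that precede it in the sentence.
• N-gram model.
• Chain rule for a sentence s = w_1 w_2 ... w_n:
$$P(s) = P(w_1, w_2, \ldots, w_n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2)\, P(w_4 \mid w_1 w_2 w_3) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$$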
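As an illustration of the N-gram idea, the sketch below estimates an unsmoothed bigram model, which approximates each chain-rule factor by P(w_i | w_{i-1}). The three-sentence corpus, the `<s>`/`</s>` boundary markers, and the function names are assumptions made for this example; real toolkits add smoothing and back-off.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> are boundary markers assumed for this sketch.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in bigram_counts.items()}

def sentence_probability(lm, sentence):
    """Multiply the bigram factors; unseen bigrams get probability 0."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= lm.get((prev, word), 0.0)
    return prob

# Toy corpus, invented for illustration only.
corpus = ["she slept in garden", "she slept in house", "he slept in garden"]
lm = train_bigram_lm(corpus)
p_seen = sentence_probability(lm, "she slept in garden")
p_unseen = sentence_probability(lm, "garden slept she")
```

Because the model is unsmoothed, any sentence containing an unseen bigram scores zero, which is why practical language models apply smoothing.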
Translation Model (TM)
• Computes the conditional probability P(S|T).
• Breaks the process into smaller units (words, phrases, ...).
• Here T: target language, S: source language.
• For example, (aUH baag wYWch s/UN gaYI | she slept in garden).
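One common way to obtain word-level translation probabilities is expectation-maximization in the style of IBM Model 1. The sketch below is a simplified version under stated assumptions: no NULL word, a fixed number of iterations, and a three-pair toy corpus invented for the example. It writes the probabilities as t(s | t), matching the P(S|T) direction used in the system architecture.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """pairs: list of (source_tokens, target_tokens).
    Returns a dict mapping (s, t) to an estimate of P(s | t)."""
    source_vocab = {s for src, _ in pairs for s in src}
    # Uniform initialisation over the source vocabulary.
    t_prob = defaultdict(lambda: 1.0 / len(source_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each source word over the target words it co-occurs with.
        for src, tgt in pairs:
            for s in src:
                norm = sum(t_prob[(s, t)] for t in tgt)
                for t in tgt:
                    frac = t_prob[(s, t)] / norm
                    count[(s, t)] += frac
                    total[t] += frac
        # M-step: renormalise the expected counts per target word.
        for (s, t), c in count.items():
            t_prob[(s, t)] = c / total[t]
    return dict(t_prob)

# Toy parallel corpus (source side, target side), invented for illustration.
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm_model1(pairs)
```

After a few iterations the mass for "the" concentrates on "das" and the mass for "book" on "buch", even though no alignments were given.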
Decoder
• Searches for the sentence T that maximizes P(T|S), i.e.
– $\hat{T} = \arg\max_T P(T)\, P(S \mid T)$.
• Starts with a null hypothesis, i.e. an empty output sequence that is then extended.
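The search described above can be sketched on a tiny scale by scoring a fixed list of candidate translations and keeping the best one. The candidate sentences and all probability values below are hypothetical numbers chosen for illustration. Actual decoders build hypotheses incrementally, typically with beam search, rather than enumerating complete sentences.

```python
import math

def decode(candidates, lm_prob, tm_prob):
    """Return the candidate T that maximises log P(T) + log P(S|T)."""
    def score(target):
        return math.log(lm_prob[target]) + math.log(tm_prob[target])
    return max(candidates, key=score)

# Hypothetical model scores for one fixed source sentence S.
candidates = ["she slept in garden", "she garden in slept", "she sleeps garden"]
lm_prob = {  # P(T): fluency of each candidate
    "she slept in garden": 0.04,
    "she garden in slept": 0.0001,
    "she sleeps garden": 0.01,
}
tm_prob = {  # P(S|T): how well each candidate explains the source
    "she slept in garden": 0.3,
    "she garden in slept": 0.3,
    "she sleeps garden": 0.1,
}
best = decode(candidates, lm_prob, tm_prob)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.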
Tools for SMT
• LM Tools
– CMU Statistical Language Modeling (SLM) Toolkit
– SRILM
• TM Tools
– GIZA++
– MGIZA++
• Decoder
– Moses
– ISI ReWrite Decoder
– Pharaoh
LM Tools
• CMU Statistical Language Modeling (SLM) Toolkit.
– Set of Unix software tools.
– Written by Roni Rosenfeld.
• SRILM
– Developed by the SRI Speech Technology and Research Laboratory.
– Used for building and applying language models.
Architecture for LM
[Figure: architecture of LM.]
TM Tools
• GIZA++
– Implements different alignment models such as the HMM model.
– Performs word alignment.
• MGIZA++
– Multi-threaded word alignment.
– Memory optimization.
• Format of the t3 final file:
– First column: IDs of source words.
– Second column: IDs of target words.
– Third column: probability of the aligned word pair.
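The three-column layout described for the t3 final file can be read with a few lines of code. The sketch below assumes whitespace-separated columns, as the description suggests. The sample lines are invented for illustration and are not taken from a real GIZA++ run.

```python
def parse_t3_lines(lines):
    """Map (source_word_id, target_word_id) to probability,
    skipping lines that do not have exactly three columns."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        source_id, target_id, prob = int(parts[0]), int(parts[1]), float(parts[2])
        table[(source_id, target_id)] = prob
    return table

# Invented sample lines: source id, target id, probability.
sample = ["0 3 0.25", "2 3 0.75", "", "2 5 1.0"]
t3_table = parse_t3_lines(sample)
```

To read a file on disk, the same function can be given an open file handle, since iterating over a handle yields its lines.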
Decoder Tools
• Moses
– Automatic training of translation models for any language pair.
– Works with SRILM and GIZA++.
• ISI ReWrite Decoder
– Performs the search step in an SMT system.
– Works with the CMU Statistical Language Modeling toolkit and GIZA++.
Popular Machine Translation Systems
• Google Translator.
• Bing Translator.
• Systran.
• Hindi to Punjabi Machine Translation System.
• METAL.
Machine Translation Projects in India
• Anglabharti and Anubharti
• Anusaaraka
• MaTra
• Mantra
• UCSG-based English-Kannada MT
• UNL-based MT between English, Hindi and Marathi
• Tamil-Hindi Anusaaraka and English-Tamil MT
• English-Hindi SMT.
Machine Translation Tools and Punjabi Language
• Punjabi University.
– Online Hindi-Punjabi & Punjabi-Hindi Machine Translation.
• Thapar University.
– Punjabi language server which includes a Punjabi-UNL EnConverter and a UNL-Punjabi DeConverter.
Conclusion and Future Work
• There are applications supporting regional language translation.
• Future research directions include tree-to-string alignment templates and clause-based restructuring.
• Combining various MT techniques can lead to efficient translation.