Dr. Hemant Darbari, Programme Co-ordinator
Applied Artificial Intelligence Group, & ACTS Advanced Computing Training School
C-DAC, Pune
TAG Based Parsing for Machine Translation - English to Indian Language
WELCOME
Outline
• MANTRA: Introduction
• Parsing Process in TAG: An Overview
• Workflow of TAG Parser
• Generation Process in MANTRA
• Generation Process in MANTRA for Multilingual Translation
• Sample Outputs of MANTRA
• Samples of Constructions Solved through TAG
• Issues Regarding Structural Differences and Translation Accuracy
• System Specifications
• MANTRA: Achievements
MANTRA: Introduction
MANTRA is an acronym of MAchiNe assisted TRAnslation tool.
A Tree Adjoining Grammar (TAG) based Machine Translation System of the Applied AI Group of C-DAC, Pune
MANTRA translates English documents into Hindi and other Indian languages, such as Oriya <O>, Tamil <T>, Urdu <U>, Marathi <M> and Bangla <B>
MANTRA covers the following domains: Administration, Finance, Agriculture, Small Scale Industries, Information Technology and Healthcare, Tourism, and Proceedings and documents of Rajya Sabha
Parsing Process in TAG - An Overview
TAG stands for Tree Adjoining Grammar
• The formalism is based on the research of Aravind Joshi (1987)
• Trees are the basic building blocks of this formalism
• In contrast to other formalisms, where dependencies are defined between the elements (nodes) of a rule, in TAG dependencies are defined between different trees.
Tree Adjoining Grammar (TAG)
A TAG is defined as a 5-tuple
G = (N, T, S, I, A), where
• N is a finite set of non-terminal symbols
• T is a finite set of terminals
• S is a distinguished non-terminal,
• I is a finite set of trees called initial trees and
• A is a finite set of trees called auxiliary trees
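The 5-tuple can be written down directly as a small data structure. The following is a minimal sketch in Python, not MANTRA's actual implementation: the class names `Node` and `TAG`, the field names, and the sanity checks in `check` are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an elementary tree (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)
    is_foot: bool = False    # foot node of an auxiliary tree (marked *)
    is_subst: bool = False   # frontier non-terminal marked for substitution


@dataclass
class TAG:
    """G = (N, T, S, I, A)."""
    nonterminals: set[str]       # N
    terminals: set[str]          # T
    start: str                   # S
    initial: dict[str, Node]     # I, keyed by an illustrative tree name
    auxiliary: dict[str, Node]   # A, keyed by an illustrative tree name

    def check(self) -> bool:
        """Sanity checks assumed for this sketch."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False
        # Every auxiliary tree has exactly one foot node; initial trees have none.
        feet_ok = all(count_feet(t) == 1 for t in self.auxiliary.values())
        no_feet = all(count_feet(t) == 0 for t in self.initial.values())
        return feet_ok and no_feet


def count_feet(node: Node) -> int:
    """Count the foot nodes in the tree rooted at `node`."""
    return int(node.is_foot) + sum(count_feet(c) for c in node.children)


# Illustrative grammar fragment: one initial tree and one auxiliary tree.
alpha_boy = Node("NP", [Node("D", is_subst=True), Node("N", [Node("boy")])])
beta_good = Node("N", [Node("adj", [Node("good")]), Node("N", is_foot=True)])
grammar = TAG(
    nonterminals={"S", "NP", "VP", "N", "D", "adj"},
    terminals={"boy", "good"},
    start="S",
    initial={"alpha_boy": alpha_boy},
    auxiliary={"beta_good": beta_good},
)
print(grammar.check())
```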
Tree Adjoining Grammar (TAG)
• The TAG parser is an LR parser
• It combines top-down and bottom-up operations, which is why it is called a hybrid parser
• It supports multiple parallel parses
Tree Adjoining Grammar (TAG)
A state S is defined as a 10-tuple,
S = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]
where:
• α: the current tree being parsed
• dot: the current position of the dot in the tree α
• side: the side of the symbol the dot is on, side ∈ {left, right}
• pos: the position of the dot, pos ∈ {above, below}
• l: the latest index in the input lexical array
• f_l (foot_l): the index of the input lexical array found before the foot node
• f_r (foot_r): the index of the input lexical array found after the foot node
• star: the position of the most recently adjoined node
• t_l*: the index of the input lexical array corresponding to the point of adjunction at star
• b_l*: the index of the input lexical array found just before the foot node at star
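As an illustration, the 10-tuple maps naturally onto an immutable record. This is a sketch under assumptions rather than the parser's real data structure: the class name `State`, the use of a path of child indices for `dot` and `star`, and the use of `None` for components that are not yet set are all choices made here for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """Parser state [tree, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]."""
    tree: str                     # name of the elementary tree being parsed
    dot: tuple[int, ...]          # dotted node, as a path of child indices from the root (assumption)
    side: str                     # "left" or "right"
    pos: str                      # "above" or "below"
    l: int                        # latest index in the input lexical array
    f_l: Optional[int] = None     # input index found before the foot node
    f_r: Optional[int] = None     # input index found after the foot node
    star: Optional[tuple[int, ...]] = None  # most recently adjoined node
    t_l_star: Optional[int] = None  # input index at the point of adjunction at star
    b_l_star: Optional[int] = None  # input index just before the foot node at star

    def __post_init__(self) -> None:
        # Enforce the two enumerated components from the definition.
        if self.side not in ("left", "right"):
            raise ValueError(f"side must be 'left' or 'right', got {self.side!r}")
        if self.pos not in ("above", "below"):
            raise ValueError(f"pos must be 'above' or 'below', got {self.pos!r}")


# A starting state: dot to the left of and above the root, nothing read yet.
initial = State(tree="alpha1", dot=(), side="left", pos="above", l=0)
print(initial)
```

Making the record frozen keeps states hashable, so a set of states can be used to avoid adding the same state twice while several parses are explored in parallel.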
Tree Adjoining Grammar (TAG)
There are two fundamental kinds of trees in TAG:
• Initial tree
• Auxiliary tree
Sentences are represented by a derived tree, constructed from initial and auxiliary trees through adjunction and/or substitution operations.
Tree Adjoining Grammar (TAG)
Initial tree
• Initial trees represent the basic syntactic relations in a sentence
• Every interior node of an initial tree is labeled with a non-terminal symbol
• Every frontier node is labeled either with a terminal symbol or with a non-terminal symbol marked for substitution (conventionally with ↓)
• A derivation starts with an initial tree, which is combined with other trees via substitution or adjunction
Tree Adjoining Grammar (TAG)
INITIAL TREES
[Figure: three initial trees, α1, α2 and α3, with root nodes S_r, D_r and NP_r, interior nodes V, NP, VP, N and D, and the lexical items "the", "left" and "boy". The figure labels the non-terminal nodes, the terminal nodes and the frontier nodes.]
Tree Adjoining Grammar (TAG)
Initial tree
Example for initial tree: Ram has arrived
[Figure: elementary trees for the example, with node labels NP, N, S, NP0, VP, V and VP* (NA) and the lexical items "Ram", "has" and "arrived".]
Note: nodes marked with ( ↓ ) are substitution nodes, indicating where an initial tree is attached.
Tree Adjoining Grammar (TAG)
Substitution
• Substitution is a simple attachment operation
• Substitution replaces a frontier node with another tree whose top node has the same label
• The result of a substitution is a derived tree
• Only initial or derived trees can be substituted into another tree
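The substitution operation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the MANTRA code: the `Node` class, the `subst` flag and the helper names are assumptions, and the root suffix `_r` used in the slides' figures is dropped so that labels can be compared directly.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; `subst` marks a frontier node open for substitution."""
    label: str
    children: list[Node] = field(default_factory=list)
    subst: bool = False


def substitute(target: Node, initial: Node) -> Node:
    """Return a copy of `target` in which the leftmost substitution node
    whose label equals the root label of `initial` is replaced by a copy
    of `initial`. The inputs are left unchanged."""
    result = copy.deepcopy(target)
    if not _substitute_in_place(result, copy.deepcopy(initial)):
        raise ValueError(f"no substitution node labeled {initial.label!r}")
    return result


def _substitute_in_place(node: Node, initial: Node) -> bool:
    for i, child in enumerate(node.children):
        # A substitution site is a childless node with the subst mark.
        if child.subst and not child.children and child.label == initial.label:
            node.children[i] = initial
            return True
        if _substitute_in_place(child, initial):
            return True
    return False


def bracket(node: Node) -> str:
    """Render a tree in bracketed notation for inspection."""
    if not node.children:
        return node.label
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


# Initial tree for "the" and an initial tree for "boy" with an open D node.
alpha_the = Node("D", [Node("the")])
alpha_boy = Node("NP", [Node("D", subst=True), Node("N", [Node("boy")])])

derived = substitute(alpha_boy, alpha_the)
print(bracket(derived))  # (NP (D the) (N boy))
```

The function copies both trees before combining them, so the same elementary tree can be reused in several derivations.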
Tree Adjoining Grammar (TAG)
Auxiliary Trees
[Figure: three auxiliary trees, β1, β2 and β3, with node labels NP, NP*, adj, adj*, adv, VP and VP* and the lexical items "pretty" and "today".]
Note: nodes marked with ( * ) are foot nodes.
Tree Adjoining Grammar (TAG)
Substitution operation
[Figure: the initial tree rooted at D_r (lexical item "the") is substituted at the D node of the initial tree rooted at NP_r (lexical item "boy"), giving the derived tree NP_r with children D ("the") and N ("boy").]
Tree Adjoining Grammar (TAG)
Adjunction
• Adjunction is an insertion operation.
• Adjunction inserts an auxiliary tree into another tree.
• The foot node label of the auxiliary tree must match the label of the node at which it adjoins.
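A short Python sketch may help make the insertion concrete. It is an illustration under assumptions, not the MANTRA implementation: the `Node` class, the `foot` flag, and addressing the adjunction site by a path of child indices are choices made for this example, and the sketch additionally checks that the auxiliary tree's root label matches the site.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; `foot` marks the foot node of an auxiliary tree."""
    label: str
    children: list[Node] = field(default_factory=list)
    foot: bool = False


def find_foot(node: Node) -> Node | None:
    """Return the foot node of the tree rooted at `node`, if any."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, address: tuple[int, ...], auxiliary: Node) -> Node:
    """Return a copy of `target` with `auxiliary` adjoined at the node
    reached by following `address` (child indices) from the root."""
    result = copy.deepcopy(target)
    aux = copy.deepcopy(auxiliary)
    site = result
    for index in address:
        site = site.children[index]
    foot = find_foot(aux)
    if foot is None or not (aux.label == foot.label == site.label):
        raise ValueError("root and foot labels must match the adjunction site")
    # The subtree that hung below the site moves under the foot node.
    foot.children = site.children
    foot.foot = False
    # The auxiliary tree's material takes the place of the site's children.
    site.children = aux.children
    return result


def bracket(node: Node) -> str:
    """Render a tree in bracketed notation for inspection."""
    if not node.children:
        return node.label
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


# Derived tree for "the boy" and an auxiliary tree for "good".
the_boy = Node("NP", [Node("D", [Node("the")]), Node("N", [Node("boy")])])
beta_good = Node("N", [Node("adj", [Node("good")]), Node("N", foot=True)])

# Adjoin at the N node, which is child 1 of the root.
good_boy = adjoin(the_boy, (1,), beta_good)
print(bracket(good_boy))  # (NP (D the) (N (adj good) (N boy)))
```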
Tree Adjoining Grammar (TAG)
Adjunction operation between initial and auxiliary tree
The derived tree from the substitution operation now becomes the initial tree for adjunction.
[Figure: the tree α, NP_r with children D ("the") and N ("boy"), shown next to the auxiliary tree β3, N with children adj ("good") and the foot node N*. β3 is inserted below the N node of α, and the sub-tree under that node is substituted at the foot node.]
Tree Adjoining Grammar (TAG)
Derived tree after adjunction
[Figure: NP_r with children D ("the") and N, where N has children adj ("good") and N ("boy").]
Tree Adjoining Grammar (TAG)
WORK FLOW DIAGRAM OF TAG PARSER
Dot Traversal of TAG Parser
[Figure: the dot traverses a tree with nodes labeled A to I, from "start" to "end".]
[Figure: the Predict, Scanner and Complete operations, illustrated on trees with nodes S_r (NA), S, S (NA) and S* (NA) and terminals a, b, c, d and e, ending in the derived tree.]
• The Earley-type recognizer for TAGs applies the following seven operations to each state
s = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]:
1. Scanner
2. Move dot down
3. Move dot up
4. Left Predictor
5. Left Completor
6. Right Predictor
7. Right Completor
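The order in which the dot visits a single tree can be sketched as a tree walk over (node, side, pos) positions. The following Python sketch is a simplification under stated assumptions, not the recognizer itself: it ignores adjunction (the predictor and completor operations), assumes an interior node has four dot positions and a leaf only the two "above" positions, and uses hypothetical names throughout.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def dot_traversal(node: Node) -> Iterator[tuple[str, str, str]]:
    """Yield (label, side, pos) for each dot position, in visiting order."""
    yield (node.label, "left", "above")
    if node.children:
        # "Move dot down": from left/below to the leftmost child.
        yield (node.label, "left", "below")
        for child in node.children:
            yield from dot_traversal(child)
        # "Move dot up": after the last child, back to right/below.
        yield (node.label, "right", "below")
    # For a leaf, this step corresponds to scanning the terminal.
    yield (node.label, "right", "above")


tree = Node("A", [Node("B"), Node("C")])
for position in dot_traversal(tree):
    print(position)
```

The walk starts to the left of and above the root and ends to the right of and above it, which matches the "start" and "end" points of the dot traversal.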
Generation Process in MANTRA
STEP 1:
The TAG generator selects a sentence initial tree for the corresponding target language.
STEP 2:
The TAG generator performs the synthesis according to the target language structure (sentence order).
STEP 3:
The TAG generator performs the following operations: substitution, adjunction, node anchoring and node embedding.
Generation Process in MANTRA for Multilingual Translation
Multilingual translation through TAG based parsing and generation in MANTRA
Generator input: Jaipur is the pink city of India
Generator output: English - Hindi
Generator output: English - Oriya
Generator output: English - Urdu
Generator output: English - Marathi
Generator output: English - Tamil
Sample Outputs of MANTRA
Sample Outputs For English - Hindi
Sample Outputs For English - Marathi
Sample Outputs For English - Oriya
Sample Outputs For English - Urdu
Sample Outputs For English - Bangla
Sample Outputs For English - Tamil
Samples of Constructions Solved through TAG
Passive constructions: The deputation of officers to the post will be governed by the OM referred to above.
Stative Constructions: The leave sanctioned to Shri Bhat stands cancelled.
Transposing and reframing of clause order and phrase order:
Officers possessing experience of the post are hereby promoted...
[Hindi translation] (relative clause formation)
[Hindi translation] (shifting of clause or phrase order)
Changing of verb class: transitive verb to linking verb
The post carries a special pay. (transitive)
[Hindi translation] (linking verb)
He will be designated as Secretary (Finance).
[Hindi translation] (linking verb)
[Hindi translation] (transitive)
Hanging frozen expressions:
Orders have been issued vide Office Memorandum No dol/08/1a to all the rajbhasha officials
[Hindi translation]
Issues Regarding Structural Differences and Translation Accuracy
Plural adjectives requiring singular nouns:
Adjectives such as "all" and "both" take the singular noun form in the sentence rather than the plural.
Ex: Rajasthan State Transport Corporation (RSTC) has bus services to all the major destinations of north India.
Sentences with relative pronouns show syntactic variation in the output:
Ex: Bikaner is also one major hub for the tourists looking for an adventurous Camel ride, which gives an insight into the exquisite lifestyle of remote Rajasthan.
English to Oriya
Honorific problem:
It is not possible to assign honorific marking on the basis of context.
Ex: The majestic Ashoka pillar records visit of emperor Ashoka to Sarnath.
English to Oriya
Accuracy in Translation from English to Oriya is 50%
Postposition not joined to the root
Jaipur , popularly-known-as the Pink-City , is the capital of Rajasthan-state , India
Position of clause
Kaziranga National Park is best known for the one-horned Rhinoceros.
English to Marathi
Accuracy in Translation from English to Marathi is 30%
English to Urdu
Urdu is an inflectional or isolating language like Hindi. The main distinguishing feature of Urdu is variation in lexical choice.
Problems identified at the syntactic level:
• Arrangement of clauses
• Activisation of the passive sentence
Accuracy in Translation from English to Urdu is 40%
System Specification in MANTRA
Available Platforms : Technology
Web Based Solution (Internet) : Java, EJB
Enterprise Solution (Intranet) : VC++
Desktop solutions (Standalone) : VC++
Desktop solutions (Standalone)
• SQL versions (Normal, Encrypted)
• MySQL versions (Normal, Encrypted)
• Access version (Normal)
• SQL Express version (Normal)
• MSDE version (Normal)
MANTRA: Achievements
MANTRA Technology is a recipient of the Computerworld Smithsonian Award and is a part of the “1999 Innovation Collection” in the National Museum of American History.
MANTRA: Achievements
Launched on 14th Sept 2007 by the Honorable Minister of Home Affairs, GOI
MANTRA: Achievements
Papers to be Laid on the Table [PLOT]
List Of Business [LOB]
Parliamentary Bulletin Part-I
MANTRA: Achievements
Launched on 29th August 2007 by the Honorable Vice-President of India
Thank You!