17/7/2002 (Papillon-02) Translation in Papillon (Ch. Boitet) 1/24
The translation of examples, citations, definitions and glosses
in the Papillon project
PAPILLON-02 international seminar, NII, Tokyo, 16-18 July 2002
Christian Boitet
GETA, CLIPS, IMAG, CNRS, INPG & UJF
Grenoble, France
Outline
• The problem:
  given the “pivot” architecture of monolingual dictionaries,
  translate all “free language elements” into all languages
  & store the results, respecting the overall structure
• Proposed solutions:
  Storing: use auxiliary lexies and axies
  Translating-1: shared tools for human translation
  Translating-2: partial MT using UNL
• Perspectives
Internal architecture of the database
Architecture derived from Gilles Sérasset’s Ph.D. thesis
[Diagram: French Dictionary | Interlingual Dictionary | Japanese Dictionary]
French Dictionary: Vocable carte n.f.; Lexie carte à jouer; Lexie carte géographique
Interlingual Dictionary: Acception 343, UNL: card(icl>play); Acception 345, UNL: map(fld>geography)
Japanese Dictionary: 地図; カード
PAPILLON scenario & diagram
Interlingual links motivated by translations = "AXIEs"
Possibility to link 1 lexie to >1 acception
Links to other representations: AXIE -1:n-> UW
[Diagram: monolingual DiCos connected through interlingual links]
French DiCo: Vocable carte n.f.; lexie carte.1 (carte à jouer); lexie carte.2 (carte géographique)
Japanese DiCo: 地図; カード
English DiCo: Vocable card N; lexie card.1 (playing card); lexie card.2 (money card); Vocable = lexie map
Thai DiCo
Interlingual links:
  Acception 343, UNL: card(icl>play), card(icl>thing)…
  Acception 345, UNL: map(fld>geography)
  Acception 1002, UNL: card(fld>money)
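A minimal Python sketch of the pivot lookup suggested by the diagram: lexies of each language point to interlingual axies, and a translation is found by going lexie -> axie -> lexie. The class name `Axie`, the function `translate`, the language codes and the exact pairing of lexies with acceptions are assumptions made for illustration; they are not the Papillon database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Axie:
    """Interlingual acception: a pure link record, no linguistic content."""
    ident: int
    uws: list[str]  # links to UNL UWs (1:n)
    lexies: dict[str, list[str]] = field(default_factory=dict)  # language -> lexie ids


# Links read off the diagram; codes and pairings are assumptions.
AXIES = [
    Axie(343, ["card(icl>play)", "card(icl>thing)"],
         {"fra": ["carte.1"], "eng": ["card.1"], "jpn": ["カード"]}),
    Axie(345, ["map(fld>geography)"],
         {"fra": ["carte.2"], "eng": ["map"], "jpn": ["地図"]}),
    Axie(1002, ["card(fld>money)"], {"eng": ["card.2"]}),
]


def translate(src_lang, lexie_id, tgt_lang, axies=AXIES):
    """Follow lexie -> axie -> lexie links; a lexie may sit in several axies."""
    found = []
    for axie in axies:
        if lexie_id in axie.lexies.get(src_lang, []):
            for target in axie.lexies.get(tgt_lang, []):
                if target not in found:
                    found.append(target)
    return found


print(translate("fra", "carte.1", "jpn"))
print(translate("fra", "carte.2", "eng"))
```

Because the axies hold only links, adding a language means adding entries to existing axies; no bilingual dictionary is stored anywhere.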
A monolingual DiCo entry (again)
1. Name of the lexical unit: MEURTRE
2. Grammatical properties: nom, masc
3. Semantic formula: action de tuer: ~ PAR L'individu X DE L'individu Y
4. Government pattern: X = I = de N, A-poss; Y = II = de N, A-poss
5. (Quasi-)synonyms: {QSyn} assassinat, homicide#1; crime
6. Semantic derivations & collocations:
   {V0} tuer
   {A0} meurtrier-adj
   /*Nom pour X*/ {S1} auteur [de ART Ø] // meurtrier-n
   /*Nom pour Y*/ {S2} victime [de ART Ø]
   /*Très choquant*/
7. Examples: La mésentente pourrait être le mobile du meurtre.
8. Full idioms: appel au meurtre; crier au meurtre
Structure derived from Alain Polguère’s work on DiCo
Fixed and free language elements
• Fixed
Stereotyped definition in semantic formula: action de tuer:
Logical argument frame: ~ PAR L'individu X DE L'individu Y
Grammatical properties: nom, masc
• Free
Examples: La mésentente pourrait être le mobile du meurtre.
Citations (e.g. for SPIRIT): the spirit is willing, but the flesh is weak (Bible, ref.XXX)
Free definitions in semantic formula (e.g. for a medical noun such as LEUCOCYTE): a kind of cell contained in the blood that attacks infectious agents
Glosses (sometimes = quasi-synonyms): character (mood)
The problem (1)
• Necessity to translate all free language elements
  The translation into L2 of an example for X (in L1) is not, in general, a good example for Y, the translation of X in L2
• Il utilise souvent des cartes IGN
  *He often uses IGN roadmaps/maps
  He often uses AA maps
  IGN = Institut Géographique National; AA = Automobile Association
Hence, the size of the problem is quadratic in the number of languages!
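The quadratic growth can be checked with a few lines of Python. This is only an illustration: the function name `translation_directions` and the language codes are assumptions, and the count is for a single free language element.

```python
from itertools import permutations


def translation_directions(languages):
    """All ordered (source, target) pairs: one translation job per pair."""
    return list(permutations(languages, 2))


langs = ["fra", "eng", "jpn", "tha"]
pairs = translation_directions(langs)
print(len(pairs))  # n * (n - 1) = 12 for n = 4
```

With n languages, each example, citation, definition or gloss written in one language needs n - 1 translations, and elements exist in all n languages, hence n(n - 1) directions overall.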
The problem (2)
• Where to store these translations?
• Not in the lexies, which must remain monolingual
• Not in the axies, which must remain pure links
Solution for the storing problem
• Use auxiliary lexies and axies
  terminology: x-lexie, x-axie
  x ∈ {def, cit, ex, glo}
• Each free language element becomes an x-lexie
  cit-lexies and ex-lexies are simpler than normal lexies
• X-lexies are linked through x-axies
  An x-axie contains lists of x-lexies
  and, in case of an external reference to UNL,
  a UNL graph (if x ≠ glo), or a UW (glo-axie)
Multilingual links = AXIES
• Normal axies
  for each language L, 0:n links to lexies of L
  for each semantic system S available, 0:n links to entities of S
  (UNL UWs, WordNet synsets, NTT SemCat, Ontos concepts, LexiQuest Lex-concepts…)
• Auxiliary axies for examples, citations…
  for each language L, 0:n links to lexies of L
  if UNL-annotated, 1 UNL graph
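The storage proposal can be sketched as two small Python record types. This is a hypothetical model, not the Papillon schema: the names `XLexie`, `XAxie`, the `owner` field and the language codes are assumptions. It only encodes the constraints stated in the slides: x ∈ {def, cit, ex, glo}, an x-axie groups x-lexies per language, and the external UNL reference is a graph unless x = glo, in which case it is a UW.

```python
from dataclasses import dataclass, field
from typing import Optional

KINDS = {"def", "cit", "ex", "glo"}


@dataclass
class XLexie:
    kind: str   # one of KINDS
    lang: str
    text: str
    owner: str  # id of the normal lexie it belongs to (hypothetical field)


@dataclass
class XAxie:
    kind: str
    members: dict[str, list[XLexie]] = field(default_factory=dict)
    unl_graph: Optional[str] = None  # allowed when kind != "glo"
    uw: Optional[str] = None         # allowed when kind == "glo"

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.kind == "glo" and self.unl_graph is not None:
            raise ValueError("a glo-axie refers to a UW, not to a UNL graph")
        if self.kind != "glo" and self.uw is not None:
            raise ValueError("only a glo-axie refers to a UW")

    def add(self, xlexie):
        """Link an x-lexie of the same kind under its language."""
        if xlexie.kind != self.kind:
            raise ValueError("kind mismatch between x-lexie and x-axie")
        self.members.setdefault(xlexie.lang, []).append(xlexie)


ex_axie = XAxie(kind="ex")
ex_axie.add(XLexie("ex", "fra", "Il utilise souvent des cartes IGN.", "carte.2"))
ex_axie.add(XLexie("ex", "eng", "He often uses IGN maps.", "carte.2"))
print(sorted(ex_axie.members))
```

Both x-lexies keep `carte.2` as owner: the English sentence is stored as the translation of the French example, not as an example for the English lexie map, so normal lexies stay monolingual and normal axies stay pure links.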
A « Montaigne » environment for Human Translation
• Idea: let users SHARE translation memory & tools on a server
  Specs in 1995 around Eurolang Optimizer™
  no funding although « Francophony » interested…
  Internet version: see www.yakushite.net (OKI)
• First version built for Lao
  see www.laosoftware.com (V. Berment)
• Future: bilingual editor as applet
  use of Papillon server architecture (private spaces etc.)
Scenario & possible GUI
… …
source segment N-2
translated segment (done)
source segment N-1
translated segment (done)
suggestion(s) from the TM
source segment N
translated segment (currently being created)
source segment N+1
source segment N+2
dictionary suggestions
Typical layout of a bilingual editor in a TSS
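The "suggestion(s) from the TM" pane can be approximated by a fuzzy lookup in a shared translation memory. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; that choice, the function name `tm_suggestions`, the threshold and the sample memory (including the English rendering of the second sentence) are assumptions for illustration, not the Montaigne design.

```python
from difflib import SequenceMatcher


def tm_suggestions(segment, memory, threshold=0.7, limit=3):
    """Return (score, source, target) triples for stored segments
    whose source text is similar enough to the segment being translated."""
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Illustrative shared memory: (source segment, translated segment) pairs.
MEMORY = [
    ("Il utilise souvent des cartes IGN.", "He often uses IGN maps."),
    ("La mésentente pourrait être le mobile du meurtre.",
     "Disagreement could be the motive for the murder."),
]

for score, source, target in tm_suggestions(
        "Il utilise parfois des cartes IGN.", MEMORY):
    print(f"{score:.2f}  {source}  ->  {target}")
```

A real translation support system would segment the text, index the memory and use a match measure tuned for translation; the linear scan here only shows where the suggestions come from.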
Design & implementation issues
• Peer-to-peer architecture (Papillon, Montaigne)
• Possibility to modify input text & segmentation
• Integrate with private lexicon(s)
• Open to plug-ins (voice input…)
Automating translation using UNL
• UNL = a project,
  a language to represent NL utterance meanings,
  a format for multilingual documents (html, xml)
• Elements of the UNL language
  UWs: headword(restrictions), e.g. book(icl>do)
  Attributes: @future, @past, @complete…, @entry
  Relations: agt, aoj, mod, obj, tim…
  (Hyper)graphs: a subgraph is connected & has an entry node
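A small Python sketch of how a UW written as headword(restrictions) followed by attributes can be split into its parts. The regular expression covers only the simplified notation used in these slides, not the full UNL specification; `UW` and `parse_uw` are hypothetical names.

```python
import re
from typing import NamedTuple


class UW(NamedTuple):
    headword: str
    restrictions: tuple
    attributes: tuple


# headword, optional "(r1,r2,...)", then zero or more ".@attr"
UW_RE = re.compile(
    r"^(?P<head>[^()]+?)(?:\((?P<restr>[^()]*)\))?(?P<attrs>(?:\.@\w+)*)$"
)


def parse_uw(text):
    match = UW_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a UW in the simplified notation: {text!r}")
    restr = match.group("restr")
    restrictions = tuple(r.strip() for r in restr.split(",")) if restr else ()
    attributes = tuple(a for a in match.group("attrs").split(".") if a)
    return UW(match.group("head").strip(), restrictions, attributes)


print(parse_uw("book(icl>do)"))
print(parse_uw("score(icl>event,agt>human,fld>sport).@entry.@past.@complete"))
```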
A simple UNL input graph
[Figure: UNL graph of the sentence below]
Nodes: score(icl>event,agt>human,fld>sport).@entry.@past.@complete; Ronaldo(icl>human); goal(icl>abstract thing, fld>sport); head(pof>body).@def; corner(icl>body); left(aoj<thing); goal(icl>concrete thing, fld>sport)
Relation labels: agt, obj, met, plt, mod, pos (twice)
{fr}Ronaldo marqua un but avec la tête dans le coin gauche des buts.{/fr}
{en}Ronaldo scored a goal with his head into the left corner of the goal.{/en}
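One possible way to hold such a graph in Python is a list of (relation, from, to) triples plus a table of attributes, on which the two structural conditions (one entry node, connected graph) can be checked. The node names are shortened, and the attachment of each relation to a pair of nodes is a plausible reading of the sentence, not something recoverable from the figure: treat the edges as assumptions.

```python
from collections import defaultdict

# (relation, from_node, to_node); edge attachments are assumed, see above.
EDGES = [
    ("agt", "score", "Ronaldo"),
    ("obj", "score", "goal.abstract"),
    ("met", "score", "head"),
    ("plt", "score", "corner"),
    ("mod", "corner", "left"),
    ("pos", "head", "Ronaldo"),
    ("pos", "corner", "goal.concrete"),
]
ATTRS = {"score": {"@entry", "@past", "@complete"}, "head": {"@def"}}


def entry_nodes(attrs):
    """Nodes carrying the @entry attribute."""
    return sorted(node for node, marks in attrs.items() if "@entry" in marks)


def is_connected(edges):
    """True if all nodes are reachable from one another, ignoring direction."""
    adjacent = defaultdict(set)
    for _, source, target in edges:
        adjacent[source].add(target)
        adjacent[target].add(source)
    if not adjacent:
        return True
    start = next(iter(adjacent))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacent[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(adjacent)


print(entry_nodes(ATTRS), is_connected(EDGES))
```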
Possible interactive disambiguation at analysis time
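Le capitaine a rapporté un vase de chine. (ambiguous: "from China" or "made of china")
  De Chine, le capitaine a rapporté un vase. (From China, the captain brought back a vase.)
  Le capitaine a rapporté (un vase de chine). (The captain brought back a china vase.)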
Interactive disambiguation (2)
capitaine
Officier qui commande une compagnie d'infanterie, un escadron de cavalerie, une batterie d'artillerie (officer commanding an infantry company, a cavalry squadron or an artillery battery)
Officier qui commande un navire de commerce (officer commanding a merchant ship)
Chef d'une équipe sportive (captain of a sports team)
- gives a correct unique multilevel concrete (UMC) tree
- then a correct unique multilevel abstract (UMA) tree
- and finally a correct UNL graph
Possible text-graph « coedition » at reading time
• applicable if there is a UNL graph associated with a segment one wants to modify
• goal: share the revisions across languages, by reflecting them on the UNL graph
• Ex: FB2004 (Forum Barcelona 2004): « Une cité retrouvera une zone côtière après un forum » (A city will retrieve a coastal zone after a forum)
  • add ".@def" on the nodes for "city", "forum"
  • transform "forum" into "Forum"
  • replace "retrieve" by "recover"
  • add ".@complete" on the node containing it
  Result: « La cité récupérera une zone côtière après le Forum » (The city will recover a coastal zone after the Forum)
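The four revisions of this example can be written as operations on the graph rather than on the text, which is what allows them to be shared across languages. The sketch below is hypothetical: the node identifiers, the bare headwords without restrictions, the initial `@future` attribute and the helper names are assumptions, since the actual FB2004 graph is not reproduced in the slides.

```python
# Toy graph: node id -> headword and attributes (no relations needed here).
graph = {
    "n1": {"uw": "retrieve", "attrs": {"@entry", "@future"}},
    "n2": {"uw": "city", "attrs": set()},
    "n3": {"uw": "zone", "attrs": set()},
    "n4": {"uw": "forum", "attrs": set()},
}


def add_attr(g, node, attr):
    g[node]["attrs"].add(attr)


def set_headword(g, node, headword):
    g[node]["uw"] = headword


def render(node):
    """Headword followed by its attributes in alphabetical order."""
    return node["uw"] + "".join("." + a for a in sorted(node["attrs"]))


# The revisions listed in the example, kept as a replayable log.
REVISIONS = [
    (add_attr, "n2", "@def"),
    (add_attr, "n4", "@def"),
    (set_headword, "n4", "Forum"),
    (set_headword, "n1", "recover"),
    (add_attr, "n1", "@complete"),
]
for operation, node, value in REVISIONS:
    operation(graph, node, value)

print([render(graph[n]) for n in sorted(graph)])
```

Once the graph has been revised, it can be sent again to the deconverter of each language, so the correction made while reading French also reaches the other versions.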
Principles of coedition (1)
• It is impossible in principle to deduce the modification on the graph from a modification on the text
  For example, replacing "un" ("a") by "le" ("the") does not entail that the following noun is determined (.@def), because it can also be generic
  • "il aime la montagne" = "he likes mountains"
• Revision is not done by modifying directly the text, but by using a menu system
• Menu items have a "language side" and a hidden "UNL side"
Principles of coedition (2)
• when a menu item is chosen, only the graph is transformed; the action to be done on the text is delayed and shown
• at any time, the new graph may be deconverted
• If the result is satisfactory, that shows that errors were due to the graph and not to the deconverter, and the graph may be sent to deconverters in other languages.
• Versions in some other languages known by the user may be displayed, so that improvement sharing is visible and encouraging.
Conclusion
• Need for a translation task (#7) in Papillon
• Seamless integration with x-lexies + x-axies
• Possible combination of TA & MT
• Mutualization spirit (Papillon, Montaigne) for TA
• Use of UNL (2 « pivot » architectures) for MT
• Mutualization again in MT part (humans involved)
  Interactive disambiguation
  Coedition of text & UNL graph
  … & of course lexical data contribution through Papillon!