17/7/2002 (Papillon-02) Translation in Papillon (Ch. Boitet) 1/24 The translation of examples, citations, definitions and glosses in the Papillon project PAPILLON-02 international seminar, NII, Tokyo, 16-18 July 2002 Christian Boitet GETA, CLIPS, IMAG, CNRS, INPG & UJF Grenoble, France [email protected]


The translation of examples, citations, definitions and glosses

in the Papillon project

PAPILLON-02 international seminar, NII, Tokyo, 16-18 July 2002

Christian Boitet

GETA, CLIPS, IMAG, CNRS, INPG & UJF

Grenoble, France

[email protected]


Outline

• The problem:

given the “pivot” architecture of monolingual dictionaries

translate all “free language elements” into all languages

& store the results, respecting the overall structure

• Proposed solutions:

Storing: use auxiliary lexies and axies

Translating-1: shared tools for human translation

Translating-2: partial MT using UNL

• Perspectives


Internal architecture of the database

French dictionary: vocable carte n.f.

lexie carte à jouer → acception 343 (UNL: card(icl>play)) ← カード

lexie carte géographique → acception 345 (UNL: map(fld>geography)) ← 地図

(Acceptions 343 and 345 belong to the interlingual dictionary; カード and 地図 belong to the Japanese dictionary.)

Architecture derived from Gilles Sérasset’s Ph.D. Thesis


• Interlingual links motivated by translations = "AXIEs"

• Possibility of linking one lexie to more than one acception

• Links to other representations: AXIE 1:n UW (one axie may be linked to several UWs)

PAPILLON scenario & diagram

French DiCo: vocable carte n.f.
lexie carte.1 (carte à jouer)
lexie carte.2 (carte géographique)

English DiCo: vocable card N
lexie card.1 (playing card)
lexie card.2 (money card)
vocable = lexie map

Japanese DiCo: 地図, カード

Thai DiCo

Interlingual links:
Acception 343, UNL: card(icl>play), card(icl>thing)…
Acception 345, UNL: map(fld>geography)
Acception 1002, UNL: card(fld>money)
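To make the pivot mechanism concrete, here is a minimal Python sketch that "translates" a lexie by going through the acceptions. The lexie-to-acception links are a hypothetical reading of the diagram (its arrows did not survive extraction), the three-letter language codes are an assumption, and this is not Papillon's actual database schema.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical reading of the diagram: (language, lexie) -> acception numbers.
LEXIE_TO_ACCEPTIONS: Dict[Tuple[str, str], Set[int]] = {
    ("fra", "carte.1"): {343},
    ("fra", "carte.2"): {345},
    ("eng", "card.1"): {343},
    ("eng", "card.2"): {1002},
    ("eng", "map"): {345},
    ("jpn", "カード"): {343},
    ("jpn", "地図"): {345},
}


def build_index(links):
    """Invert the links: acception -> {language: [lexies]}."""
    index = defaultdict(lambda: defaultdict(list))
    for (lang, lexie), acceptions in links.items():
        for acception in acceptions:
            index[acception][lang].append(lexie)
    return index


def translate(lang_from: str, lexie: str, lang_to: str,
              links=LEXIE_TO_ACCEPTIONS) -> List[str]:
    """Source lexie -> its acception(s) -> lexies of the target language."""
    index = build_index(links)
    result: List[str] = []
    for acception in sorted(links.get((lang_from, lexie), set())):
        result.extend(index[acception].get(lang_to, []))
    return result


print(translate("fra", "carte.2", "jpn"))  # ['地図']
print(translate("eng", "card.2", "fra"))   # [] : no French lexie is linked to 1002
```

Because every dictionary only points to the pivot, adding a language means adding links to acceptions, not building a new bilingual dictionary for each pair.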


A monolingual DiCo entry (again)

1. Name of the lexical unit: MEURTRE ("murder")
2. Grammatical properties: nom, masc (noun, masculine)
3. Semantic formula: action de tuer: ~ PAR L'individu X DE L'individu Y (act of killing: ~ by individual X of individual Y)
4. Government pattern: X = I = de N, A-poss; Y = II = de N, A-poss
5. (Quasi-)synonyms: {QSyn} assassinat, homicide#1; crime
6. Semantic derivations & collocations:
{V0} tuer
{A0} meurtrier-adj
/*Nom pour X*/ {S1} auteur [de ART Ø] // meurtrier-n
/*Nom pour Y*/ {S2} victime [de ART Ø]
/*Très choquant*/
7. Examples: La mésentente pourrait être le mobile du meurtre. ("Disagreement could be the motive for the murder.")
8. Full idioms: appel au meurtre; crier au meurtre

Structure derived from Alain Polguère’s work on DiCo


Fixed and free language elements

• Fixed

Stereotyped definition in semantic formula: action de tuer

Logical argument frame: ~ PAR L'individu X DE L'individu Y

Grammatical properties: nom, masc

• Free

Examples: La mésentente pourrait être le mobile du meurtre.

Citations (e.g. for SPIRIT): the spirit is strong, but the flesh is weak (Bible, ref.XXX)

Free definitions in semantic formula (e.g. for a medical noun such as LEUCOCYTE): sort of cell contained in the blood and attacking infectious agents

Glosses (sometimes = quasi-synonyms): character (mood)


The problem (1)

• All free language elements must be translated: the translation into L2 of an example for X (in L1) is not, in general, a good example for Y, the translation of X in L2

• Il utilise souvent des cartes IGN.
*He often uses IGN roadmaps/maps
He often uses AA maps
(IGN = Institut Géographique National; AA = Automobile Association)

Hence, the size of the problem is quadratic in the number of languages!
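To see why the size of the problem grows quadratically, count the translations needed when every free language element written in one of n languages must be translated into the other n - 1 languages. This is a minimal sketch; the per-language counts are made-up inputs, not Papillon figures.

```python
from typing import Dict


def translation_jobs(elements_per_language: Dict[str, int]) -> int:
    """Number of translations needed if each free language element written in
    one language must be translated into every other language."""
    n = len(elements_per_language)
    return sum(count * (n - 1) for count in elements_per_language.values())


# Illustrative input: 1000 native examples in each language (made-up numbers).
for n in (2, 5, 10):
    langs = {f"lang{i}": 1000 for i in range(n)}
    print(n, translation_jobs(langs))
# 2 2000
# 5 20000
# 10 90000
```

With e elements per language the total is e·n·(n - 1): doubling the number of languages from 5 to 10 multiplies the work by 4.5, not by 2.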


The problem (2)

• Where to store these translations?

• Not in the lexies, which must remain monolingual

• Not in the axies, which must remain pure links


Solution for the storing problem

• Use auxiliary lexies and axies
terminology: x-lexie, x-axie, with x ∈ {def, cit, ex, glo} (definition, citation, example, gloss)

• Each free language element becomes an x-lexie
cit-lexies and ex-lexies are simpler than normal lexies

• X-lexies are linked through x-axies
An x-axie contains lists of x-lexies and, in case of an external reference to UNL, a UNL graph (if x ≠ glo) or a UW (glo-axie)
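The storage rules above can be sketched as two small Python dataclasses. This is only an illustration under stated assumptions: the class and field names, the language codes and the string encoding of UNL references are hypothetical, not Papillon's XML schema. The one constraint it enforces, at construction time, is the one stated on the slide: a glo-axie refers to a UW, the other kinds to a UNL graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

KINDS = {"def", "cit", "ex", "glo"}  # x in {def, cit, ex, glo}


@dataclass
class XLexie:
    kind: str  # one of KINDS
    lang: str  # language code, e.g. "fra" (the code scheme is an assumption)
    text: str  # the free language element itself


@dataclass
class XAxie:
    kind: str
    lexies: Dict[str, List[XLexie]] = field(default_factory=dict)  # lang -> x-lexies
    unl_graph: Optional[str] = None  # external UNL reference for def/cit/ex
    uw: Optional[str] = None         # external UNL reference for a glo-axie

    def __post_init__(self) -> None:
        # Checked once, when the axie is created.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.kind == "glo" and self.unl_graph is not None:
            raise ValueError("a glo-axie refers to a UW, not to a UNL graph")
        if self.kind != "glo" and self.uw is not None:
            raise ValueError(f"a {self.kind}-axie refers to a UNL graph, not to a UW")

    def add(self, lexie: XLexie) -> None:
        """Link an x-lexie of the same kind, grouped by language."""
        if lexie.kind != self.kind:
            raise ValueError(f"cannot put a {lexie.kind}-lexie in a {self.kind}-axie")
        self.lexies.setdefault(lexie.lang, []).append(lexie)


axie = XAxie("ex")
axie.add(XLexie("ex", "fra", "La mésentente pourrait être le mobile du meurtre."))
axie.add(XLexie("ex", "eng", "Disagreement could be the motive for the murder."))
print(sorted(axie.lexies))  # ['eng', 'fra']
```

The lexies themselves stay monolingual and the axie only holds links, which is exactly the separation required by the storing problem.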


Multilingual links = AXIES

• Normal axies
for each language L, 0:n links to lexies of L
for each semantic system S available, 0:n links to entities of S (UNL UWs, WordNet synsets, NTT SemCat, Ontos concepts, LexiQuest Lex-concepts…)

• Auxiliary axies for examples, citations…
for each language L, 0:n links to lexies of L
if UNL-annotated, 1 UNL graph
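A normal axie can be sketched as two maps with 0:n links each: one keyed by language, one keyed by semantic system. In this Python sketch the key strings ("fra", "UNL") and lexie identifiers are hypothetical; other systems such as WordNet would simply be further keys.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Axie:
    """A normal axie: 0:n links per language and 0:n links per semantic system."""
    lexies: Dict[str, Set[str]] = field(default_factory=dict)    # language -> lexie ids
    entities: Dict[str, Set[str]] = field(default_factory=dict)  # system -> entity ids

    def link_lexie(self, lang: str, lexie_id: str) -> None:
        self.lexies.setdefault(lang, set()).add(lexie_id)

    def link_entity(self, system: str, entity_id: str) -> None:
        self.entities.setdefault(system, set()).add(entity_id)

    def lexies_in(self, lang: str) -> Set[str]:
        # 0 links is legal: a language may have no lexie for this axie yet.
        return self.lexies.get(lang, set())


# Acception 343 from the diagram (the links are an illustrative reading).
axie_343 = Axie()
for lang, lexie in [("fra", "carte.1"), ("eng", "card.1"), ("jpn", "カード")]:
    axie_343.link_lexie(lang, lexie)
for uw in ["card(icl>play)", "card(icl>thing)"]:
    axie_343.link_entity("UNL", uw)

print(axie_343.lexies_in("tha"))  # set()
```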


A « Montaigne » environment for Human Translation

• Idea: let users SHARE translation memory & tools on a server
Specs in 1995 around Eurolang Optimizer™
no funding, although the « Francophony » was interested…
Internet version: see www.yakushite.net (OKI)

• First version built for Lao: see www.laosoftware.com (V. Berment)

• Future: bilingual editor as an applet, using the Papillon server architecture (private spaces, etc.)


Scenario & possible GUI

Typical layout of a bilingual editor in a TSS:

… …
source segment N-2 | translated segment (done)
source segment N-1 | translated segment (done)
suggestion(s) from the TM
source segment N | translated segment (currently being created)
source segment N+1
source segment N+2
dictionary suggestions
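The "suggestion(s) from the TM" pane can be illustrated with a minimal fuzzy lookup in Python. The similarity measure is an assumption: it uses the standard library's `difflib.SequenceMatcher` character ratio as a stand-in, which is not how Montaigne or any particular TSS scores matches; `threshold` and `limit` are hypothetical parameters.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Memory = List[Tuple[str, str]]  # (source segment, translated segment)


def tm_suggestions(segment: str, memory: Memory,
                   threshold: float = 0.6, limit: int = 3) -> List[Tuple[float, str, str]]:
    """Return up to `limit` (score, source, target) entries whose source side
    is similar to `segment`, best match first."""
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


memory: Memory = [
    ("Le capitaine a rapporté un vase de chine.", "The captain brought back a china vase."),
    ("La mésentente pourrait être le mobile du meurtre.",
     "Disagreement could be the motive for the murder."),
]
for score, source, target in tm_suggestions("Le capitaine a rapporté un vase.", memory):
    print(f"{score:.2f}  {source}  ->  {target}")
```

Sharing the memory on a server, as in the Montaigne idea, only changes where `memory` lives; the lookup itself stays the same.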


Design & implementation issues

• Peer-to-peer architecture between Papillon and Montaigne

• Possibility of modifying the input text & segmentation

• Integrate with private lexicon(s)

• Open to plug-ins (voice input…)


Automating translation using UNL

• UNL = a project

a language to represent the meaning of NL utterances

a format for multilingual documents (html, xml)

• Elements of the UNL language
UWs (Universal Words): headword(restrictions), e.g. book(icl>do)
Attributes: @future, @past, @complete…, @entry
Relations: agt, aoj, mod, obj, tim…
(Hyper)graphs: each subgraph is connected & has an entry node
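To show how a node label decomposes into headword, restrictions and attributes, here is a small Python parser. It handles only a simplified subset of UNL syntax, which is an assumption of this sketch: restrictions are split on commas and must not contain nested parentheses, and the headword must not contain "(" or ".".

```python
import re
from typing import List, NamedTuple


class UWNode(NamedTuple):
    headword: str
    restrictions: List[str]
    attributes: List[str]


# headword, optional "(restrictions)", then zero or more ".@attribute" suffixes
_NODE = re.compile(
    r"^(?P<head>[^(.]+)(?:\((?P<restr>[^)]*)\))?(?P<attrs>(?:\.@[A-Za-z-]+)*)$"
)


def parse_node(label: str) -> UWNode:
    m = _NODE.match(label.strip())
    if m is None:
        raise ValueError(f"not a (simplified) UW node label: {label!r}")
    restr = m.group("restr")
    restrictions = [r.strip() for r in restr.split(",")] if restr else []
    attributes = [a for a in m.group("attrs").split(".@") if a]
    return UWNode(m.group("head").strip(), restrictions, attributes)


print(parse_node("book(icl>do)"))
# UWNode(headword='book', restrictions=['icl>do'], attributes=[])
print(parse_node("score(icl>event,agt>human,fld>sport).@entry.@past.@complete"))
# UWNode(headword='score', restrictions=['icl>event', 'agt>human', 'fld>sport'], attributes=['entry', 'past', 'complete'])
```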


A simple UNL input graph

[Figure: UNL graph. Entry node: score(icl>event,agt>human,fld>sport).@entry.@past.@complete. Other nodes: Ronaldo(icl>human), goal(icl>abstract thing, fld>sport), head(pof>body).@def, corner(icl>body), left(aoj<thing), goal(icl>concrete thing, fld>sport). Relation labels: agt, obj, met, plt, mod, pos, pos.]

{fr}Ronaldo marqua un but avec la tête dans le coin gauche des buts.{/fr}
{en}Ronaldo scored a goal with his head into the left corner of the goal.{/en}
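The figure's arcs did not survive extraction. One plausible reading, reconstructed from the English sentence and the relation labels (an assumption, not the original drawing), is sketched below as a list of binary relations, together with a check of the well-formedness condition stated earlier: the graph is connected and has exactly one entry node.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

SCORE = "score(icl>event,agt>human,fld>sport).@entry.@past.@complete"
RONALDO = "Ronaldo(icl>human)"
GOAL_ABSTRACT = "goal(icl>abstract thing, fld>sport)"
HEAD = "head(pof>body).@def"
CORNER = "corner(icl>body)"
LEFT = "left(aoj<thing)"
GOAL_CONCRETE = "goal(icl>concrete thing, fld>sport)"

Relation = Tuple[str, str, str]  # (relation label, from node, to node)

# Assumed arcs, read off the sentence; not taken from the slide itself.
RELATIONS: List[Relation] = [
    ("agt", SCORE, RONALDO),         # Ronaldo scored
    ("obj", SCORE, GOAL_ABSTRACT),   # scored a goal
    ("met", SCORE, HEAD),            # with his head
    ("pos", HEAD, RONALDO),          # his (Ronaldo's) head
    ("plt", SCORE, CORNER),          # into the corner
    ("mod", CORNER, LEFT),           # left corner
    ("pos", CORNER, GOAL_CONCRETE),  # corner of the goal
]


def entry_node(relations: List[Relation]) -> str:
    nodes = {n for _, source, target in relations for n in (source, target)}
    entries = [n for n in nodes if ".@entry" in n]
    if len(entries) != 1:
        raise ValueError(f"expected exactly one entry node, found {len(entries)}")
    return entries[0]


def is_connected(relations: List[Relation]) -> bool:
    """Ignore arc direction and check every node is reachable from the entry node."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for _, source, target in relations:
        adjacency[source].add(target)
        adjacency[target].add(source)
    start = entry_node(relations)
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == set(adjacency)


def to_unl_text(relations: List[Relation]) -> str:
    """Serialise as one rel(from, to) line per arc."""
    return "\n".join(f"{rel}({a}, {b})" for rel, a, b in relations)


print(is_connected(RELATIONS))  # True
print(to_unl_text(RELATIONS))
```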


Possible interactive disambiguation at analysis time

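Le capitaine a rapporté un vase de chine. (ambiguous: "The captain brought back a china vase" or "a vase from China")

De Chine, le capitaine a rapporté un vase. ("From China, the captain brought back a vase.")

Le capitaine a rapporté (un vase de chine). ("The captain brought back (a china vase).")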


Interactive disambiguation (2)

capitaine

Officier qui commande une compagnie d'infanterie, un escadron de cavalerie, une batterie d'artillerie (officer commanding an infantry company, a cavalry squadron or an artillery battery)

Officier qui commande un navire de commerce (officer commanding a merchant ship)

Chef d'une équipe sportive (captain of a sports team)

- gives a correct unique multilevel concrete (UMC) tree
- then a correct unique multilevel abstract (UMA) tree
- and finally a correct UNL graph
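The two disambiguation steps (attachment of "de chine", then the sense of "capitaine") can be sketched as successive questions that filter a set of candidate readings. This Python sketch is a simplification under stated assumptions: the feature names and keys are hypothetical, and the candidates are a plain product of independent choices, whereas a real analyser would produce them as alternative UMC trees.

```python
from itertools import product
from typing import Callable, Dict, List

# Paraphrases shown to the user for each reading (taken from the slides).
QUESTIONS: Dict[str, Dict[str, str]] = {
    "attachment": {
        "noun": "Le capitaine a rapporté (un vase de chine).",
        "verb": "De Chine, le capitaine a rapporté un vase.",
    },
    "sense": {
        "army": "Officier qui commande une compagnie d'infanterie, "
                "un escadron de cavalerie, une batterie d'artillerie",
        "ship": "Officier qui commande un navire de commerce",
        "sport": "Chef d'une équipe sportive",
    },
}

# The user interface: given a feature and the paraphrases still possible, return a key.
Chooser = Callable[[str, Dict[str, str]], str]


def candidate_readings() -> List[Dict[str, str]]:
    features = list(QUESTIONS)
    return [dict(zip(features, values))
            for values in product(*(QUESTIONS[f] for f in features))]


def disambiguate(choose: Chooser) -> List[Dict[str, str]]:
    readings = candidate_readings()
    for feature, paraphrases in QUESTIONS.items():
        still_open = {r[feature] for r in readings}
        if len(still_open) < 2:
            continue  # nothing left to ask about this feature
        offered = {k: v for k, v in paraphrases.items() if k in still_open}
        picked = choose(feature, offered)
        readings = [r for r in readings if r[feature] == picked]
    return readings


# Simulated user: the vase comes from China, the captain commands a merchant ship.
answers = {"attachment": "verb", "sense": "ship"}
print(disambiguate(lambda feature, offered: answers[feature]))
```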


Possible text-graph « coedition » at reading time

• applicable if there is a UNL graph associated with the segment one wants to modify

• goal: share the revisions across languages by reflecting them on the UNL graph

• Ex: FB2204 (Forum Barcelona 2004) « Une cité retrouvera une zone côtière après un forum » ("A city will regain a coastal zone after a forum")

- add ".@def" on the nodes for "city" and "forum"
- transform "forum" into "Forum"
- replace "retrieve" with "recover"
- add ".@complete" on the node containing "recover"

« La cité récupérera une zone côtière après le Forum » ("The city will recover a coastal zone after the Forum")
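The four revisions of this example can be expressed as operations on graph nodes rather than on the text. In this minimal Python sketch, the node ids, the operation names and the bare headwords (real UWs would carry restrictions) are hypothetical; only the three nodes touched by the revisions are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    headword: str
    attributes: Set[str] = field(default_factory=set)


Graph = Dict[str, Node]
Revision = Tuple[str, str, str]  # (operation, node id, argument)


def apply_revisions(graph: Graph, revisions: List[Revision]) -> Graph:
    """Apply the revisions to the graph only (in place). Regenerating the text
    in each language would be the deconverters' job and is not modelled here."""
    for op, node_id, arg in revisions:
        node = graph[node_id]
        if op == "add_attr":
            node.attributes.add(arg)
        elif op == "set_headword":
            node.headword = arg
        else:
            raise ValueError(f"unknown revision: {op}")
    return graph


graph: Graph = {"city": Node("city"), "verb": Node("retrieve"), "forum": Node("forum")}
revisions: List[Revision] = [
    ("add_attr", "city", "@def"),
    ("add_attr", "forum", "@def"),
    ("set_headword", "forum", "Forum"),
    ("set_headword", "verb", "recover"),
    ("add_attr", "verb", "@complete"),
]
apply_revisions(graph, revisions)
print(graph["verb"])  # Node(headword='recover', attributes={'@complete'})
```

Because the revisions live on the graph, the same list benefits every language for which a deconverter exists.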


Principles of coedition (1)

• It is impossible, in principle, to deduce the modification of the graph from a modification of the text. For example, replacing "un" ("a") with "le" ("the") does not entail that the following noun is definite (.@def), because it can also be generic:
"il aime la montagne" = "he likes mountains"

• Revision is not done by directly modifying the text, but through a menu system

• Menu items have a "language side" and a hidden "UNL side"


Principles of coedition (2)

• when a menu item is chosen, only the graph is transformed; the action to be done on the text is delayed and shown

• at any time, the new graph may be deconverted

• if the result is satisfactory, that shows that the errors were due to the graph and not to the deconverter, and the graph may be sent to deconverters in other languages.

• Versions in some other languages known by the user may be displayed, so that improvement sharing is visible and encouraging.
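These principles suggest a simple shape for menu items: a visible language side, a hidden UNL side that transforms the graph, and a text action that is only recorded and shown. The Python sketch below is hypothetical throughout (class names, the dictionary-of-attribute-sets graph, the example label); it does not call any deconverter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]  # simplified: node id -> set of UNL attributes


@dataclass
class MenuItem:
    label: str                          # "language side": what the user reads
    unl_side: Callable[[Graph], None]   # hidden "UNL side": transforms the graph
    text_action: str                    # edit to perform on the text, delayed


@dataclass
class CoeditionSession:
    graph: Graph
    pending_text_actions: List[str] = field(default_factory=list)

    def choose(self, item: MenuItem) -> None:
        """Transform only the graph; record the text edit so it can be shown."""
        item.unl_side(self.graph)
        self.pending_text_actions.append(item.text_action)


session = CoeditionSession(graph={"city": set(), "forum": set()})
make_city_definite = MenuItem(
    label="une cité -> la cité (this particular city, not cities in general)",
    unl_side=lambda g: g["city"].add("@def"),
    text_action='replace "une cité" with "la cité"',
)
session.choose(make_city_definite)
print(session.graph["city"], session.pending_text_actions)
```

Since the label states the intended reading explicitly, the user never has to type "la" and leave the system guessing between a definite and a generic interpretation.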


Conclusion

• Need for a translation task (#7) in Papillon
• Seamless integration with x-lexies + x-axies
• Possible combination of TA & MT
• Mutualization spirit (Papillon, Montaigne) for TA
• Use of UNL (2 « pivot » architectures) for MT
• Mutualization again in MT part (humans involved):
- interactive disambiguation
- text ↔ UNL graph coedition

… & of course lexical data contribution through Papillon!