ETAP-3: State of the Art, Options, and Prospects of Development Leonid Iomdin Institute for Information Transmission Problems Russian Academy of Sciences [email protected]


ETAP-3: State of the Art, Options, and Prospects of Development

Leonid Iomdin
Institute for Information Transmission Problems
Russian Academy of Sciences
[email protected]


Prague, May 12, 2008

Theoretical Background

Igor Mel’čuk: «Meaning ⇔ Text» theory

Jurij Apresjan: Integrated Theory of Linguistic Description and Systemic Lexicography


ETAP-3 Options

• Machine translation
• SynTagRus: the tagged corpus of Russian texts
• Generation from and to UNL
• (Quasi)synonymous paraphrasing
• Computer-aided language learning tool


Machine Translation

Russian-English
• 120,000-strong morphological dictionaries
• 95,000-strong combinatorial dictionaries

Russian-German prototype
Russian-French prototype
Russian-Korean prototype
Russian-Spanish prototype
Arabic-English prototype


Major Features of ETAP Environment

• Rule-based approach
• Stratificational approach
• Syntactic dependencies
• Lexicalistic approach
• Self-tuning
• Maximum reusability of linguistic resources


General Layout of Translation Process

OBJECTS and STAGES (stages in brackets):

Source phrase
→ [Morphological analysis] → MorphS (source)
→ [Parsing] → SyntS (source)
→ [Normalization] → NormS (source)
→ [Transfer] → NormS (target)
→ [Expansion] → SyntS (target)
→ [Syntactic synthesis] → MorphS (target)
→ [Morphological synthesis] → Target phrase

DICTIONARIES:

• Source morphological dictionary
• Source combinatorial dictionary
• Target combinatorial dictionary
• Target morphological dictionary
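
The stratificational layout on this slide can be sketched as an ordered chain of stages, each turning one representation level into the next. This is only an illustration: the stage order (in particular, Expansion placed after Transfer) is a reading of the flattened diagram, and `STAGES` and `trace_pipeline` are hypothetical names, not part of ETAP-3.

```python
from typing import List, Tuple

# (stage name, object it produces), in the order read off the slide.
# Assumption: Expansion follows Transfer on the target side.
STAGES: List[Tuple[str, str]] = [
    ("Morphological analysis", "MorphS (source)"),
    ("Parsing", "SyntS (source)"),
    ("Normalization", "NormS (source)"),
    ("Transfer", "NormS (target)"),
    ("Expansion", "SyntS (target)"),
    ("Syntactic synthesis", "MorphS (target)"),
    ("Morphological synthesis", "Target phrase"),
]


def trace_pipeline(start: str = "Source phrase") -> List[str]:
    """Return one readable step per stage: input object, stage, output object."""
    steps = []
    current = start
    for stage, produced in STAGES:
        steps.append(f"{current} --[{stage}]--> {produced}")
        current = produced
    return steps


if __name__ == "__main__":
    for step in trace_pipeline():
        print(step)
```

Keeping the stages in a plain list makes the symmetry visible: three analysis steps on the source side, Transfer in the middle, and three synthesis steps on the target side.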


Dependency Syntactic Structure

They made a general remark that it was true.


Self-Tuning: Grammar vs. Dictionary

General regularities: general rules that apply to very large classes of words and occur very often.
• Example: agreement Adj + N

Restricted-scope regularities: specific rules that apply to restricted classes of words and have limited occurrence.
• Example: compound numerals


Multiple Translation

They made a general remark that …

• (a) ‘they remarked in a general way that…’

• (b) ‘they forced a general to remark that…’
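
The two readings can be made concrete as two dependency analyses of the same words. The sketch below is illustrative only: the part-of-speech tags and head assignments are simplified assumptions, not the relation inventory ETAP-3 actually uses, and `differing_words` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# word -> (part of speech, head word); tags are illustrative, not ETAP-3 labels
Reading = Dict[str, Tuple[str, str]]

# (a) 'they remarked in a general way': "general" is an adjective on "remark"
READING_A: Reading = {
    "they": ("PRON", "made"),
    "made": ("VERB", "ROOT"),
    "a": ("DET", "remark"),
    "general": ("ADJ", "remark"),
    "remark": ("NOUN", "made"),
}

# (b) 'they forced a general to remark': "general" is a noun, "remark" a verb
READING_B: Reading = {
    "they": ("PRON", "made"),
    "made": ("VERB", "ROOT"),
    "a": ("DET", "general"),
    "general": ("NOUN", "made"),
    "remark": ("VERB", "made"),
}


def differing_words(a: Reading, b: Reading) -> List[str]:
    """Words whose tag or head differs between the two analyses."""
    return sorted(word for word in a if a[word] != b[word])


if __name__ == "__main__":
    print(differing_words(READING_A, READING_B))
```

Each complete analysis yields its own translation, which is why a single input sentence can produce more than one output.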


Synonymous Paraphrasing

The director ordered John to write a report

The director gave John an order to write a report

John was ordered by the director to write a report

John received an order from the director to write a report


Lexical Functions

Substitute LF
• synonyms, antonyms, converse terms, derivatives

Collocate LF
• MAGN = 'a high degree of what is denoted by X'
• OPER/FUNC
• ...


Lexical Functions

MAGN (disease) = grave
MAGN (fog) = heavy
MAGN (control) = strict


Oper / Func Family of LF

[Diagram: INVITATION with two participants. Participant 1 (the minister) issues the invitation; participant 2 (the ambassador) receives it.]


Examples of LF Oper

Oper1 (invitation) = issue

Oper2 (invitation) = receive

Oper1 (defeat) = suffer

Oper2 (resistance) = encounter

Oper2 (respect) = enjoy
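
Seen as data, each line above is one entry in a mapping from a (function, keyword) pair to a support verb. A minimal sketch, assuming a flat in-memory table; `OPER_VALUES` and `support_verb` are hypothetical names and do not reflect how the ETAP-3 combinatorial dictionary is stored.

```python
from typing import Dict, Optional, Tuple

# (lexical function, keyword) -> support verb, copied from the slide
OPER_VALUES: Dict[Tuple[str, str], str] = {
    ("Oper1", "invitation"): "issue",
    ("Oper2", "invitation"): "receive",
    ("Oper1", "defeat"): "suffer",
    ("Oper2", "resistance"): "encounter",
    ("Oper2", "respect"): "enjoy",
}


def support_verb(lf: str, keyword: str) -> Optional[str]:
    """Return the verb recorded for lf(keyword), or None if no value is listed."""
    return OPER_VALUES.get((lf, keyword.lower()))


if __name__ == "__main__":
    print(support_verb("Oper1", "invitation"))
    print(support_verb("Oper2", "invitation"))
```

The index on the function name says which participant becomes the subject: with Oper1 it is the first participant (the minister issues an invitation), with Oper2 the second (the ambassador receives an invitation).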


Examples of LF Func

Func1 (fear) = possess

Func2 (decision) = concern

Func1 (responsibility) = rest (with)

Func2 (vengeance) = fall (upon)


General Properties of Lexical Functions

Universality

Intralinguistic idiomaticity
• grave disease, heavy fog
• *heavy disease, *grave fog

Cross-linguistic idiomaticity
• Rus. тяжелая болезнь ‘heavy disease’
• Rus. густой туман ‘dense fog’
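
Because the function is universal while its values are language-specific, a collocation can be translated by going through the function rather than word by word. The sketch below illustrates that idea with the examples from this slide; the tables and `translate_magn_phrase` are hypothetical, and the function assumes a two-word "collocate keyword" phrase.

```python
from typing import Dict, Optional

# MAGN values per language, taken from the slide
MAGN_EN: Dict[str, str] = {"disease": "grave", "fog": "heavy"}
MAGN_RU: Dict[str, str] = {"болезнь": "тяжелая", "туман": "густой"}

# Keyword translations (illustrative bilingual table)
KEYWORD_EN_RU: Dict[str, str] = {"disease": "болезнь", "fog": "туман"}


def translate_magn_phrase(phrase: str) -> Optional[str]:
    """Translate 'collocate keyword' via MAGN; None if it is not a MAGN pair."""
    collocate, keyword = phrase.lower().split()
    if MAGN_EN.get(keyword) != collocate:
        return None  # e.g. *heavy disease: not a recorded MAGN collocation
    ru_keyword = KEYWORD_EN_RU[keyword]
    return f"{MAGN_RU[ru_keyword]} {ru_keyword}"


if __name__ == "__main__":
    print(translate_magn_phrase("heavy fog"))
    print(translate_magn_phrase("grave disease"))
```

The adjective is never translated directly: "heavy" in "heavy fog" becomes whatever Russian records as MAGN of туман.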


General Properties of Lexical Functions (cont.)

Paraphrasing Potential:

• He respects [X] his teachers

• He has [OPER1 (S0 (X))] respect [S0 (X)] for his teachers

• He treats [LABOR12 (S0 (X))] his teachers with respect

• His teachers enjoy [OPER2 (S0 (X))] his respect
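
The four sentences above can be generated from one small table of LF values for "respect". This is a toy sketch: the values are stored already inflected, the prepositions "for" and "with" are hard-coded, and `LF_VALUES` and `paraphrases` are hypothetical names rather than ETAP-3 rules.

```python
from typing import Dict, List

# LF values for X = "respect", stored in inflected form for brevity
LF_VALUES: Dict[str, str] = {
    "X": "respects",
    "S0": "respect",
    "Oper1": "has",
    "Labor12": "treats",
    "Oper2": "enjoy",
}


def paraphrases(subject: str, possessive: str, obj: str) -> List[str]:
    """Build the four paraphrases from the slide for the given participants."""
    lf = LF_VALUES
    obj_initial = obj[0].upper() + obj[1:]  # object moves to sentence start
    return [
        f"{subject} {lf['X']} {obj}",
        f"{subject} {lf['Oper1']} {lf['S0']} for {obj}",
        f"{subject} {lf['Labor12']} {obj} with {lf['S0']}",
        f"{obj_initial} {lf['Oper2']} {possessive} {lf['S0']}",
    ]


if __name__ == "__main__":
    for sentence in paraphrases("He", "his", "his teachers"):
        print(sentence)
```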


LF in Practical Applications

Syntactic and Lexical Ambiguity Resolution in Parsers

Idiomatic Translation of a Large Class of Set Expressions in Machine Translation

Sentence Paraphrasing


Lexical Ambiguity Resolution

to draw a distinction - provodit' razlichie

Both verbs are extremely ambiguous:

• draw - more than 50 meanings

• provodit’ - more than 10 meanings


Syntactic Ambiguity Resolution

support of the parliament

• 'support by the parliament'

• 'support (given) to the parliament'

The president had [Y=OPER2(X)] the support [X] of the parliament

The fear [X] of his wife possessed [Y = FUNC1 (X)] Peter

The fears of his wife infected Peter.
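
One way to read these examples: when the governing verb is an LF value of the noun, that LF already assigns one of the noun's participants to the subject or object, so the of-phrase must fill the other one. The sketch below encodes that inference. The rule is an interpretation of the slide, and `LF_VERBS` and `of_phrase_actant` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# (noun, verb lemma) -> lexical function linking them, as marked on the slide
LF_VERBS: Dict[Tuple[str, str], str] = {
    ("support", "have"): "Oper2",
    ("fear", "possess"): "Func1",
}


def of_phrase_actant(noun: str, verb_lemma: str) -> Optional[int]:
    """Return which participant (1 or 2) the of-phrase fills, or None.

    Assumption: the LF index names the participant already expressed by the
    verb's subject or object, so the of-phrase gets the remaining one.
    """
    lf = LF_VERBS.get((noun, verb_lemma))
    if lf is None:
        return None  # no LF evidence: the ambiguity is not resolved here
    taken = int(lf[-1])
    return 2 if taken == 1 else 1


if __name__ == "__main__":
    print(of_phrase_actant("support", "have"))   # of the parliament
    print(of_phrase_actant("fear", "possess"))   # of his wife
    print(of_phrase_actant("fear", "infect"))
```

For "had the support of the parliament" the result is participant 1, the reading 'support by the parliament'. For "the fear of his wife possessed Peter" it is participant 2, so the wife is what is feared. With "infected", no LF links verb and noun, and the sketch leaves the choice open.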


Idiomatic translation: LF Temp

March: in - март: в2
Tuesday: on - вторник: в1
dawn: at - рассвет: на2
moment: at - момент: в1
Easter: at - пасха: на1


Sentence Paraphrasing

X = CONV12 (X)
This group consists of 20 persons – Twenty persons comprise this group

X + Y = ANTI1(X) + ANTI2(Y)
He began to observe the rules – He stopped violating the rules

X = LABOR12 + S0(X)
He respects his parents – He treats his parents with respect


Sample Dictionary Entry (Excerpt): CHANCE

CHANCE1

POR:S

SYNT:COUNT,PREDTO,PREDTHAT

DES:'FACT','ABSTRACT'


CHANCE

D1.1:OF,'PERSON'

D2.1:OF,'FACT'

D2.2:TO2

D2.3:THAT1


CHANCE

SYN1: OPPORTUNITY

MAGN: GOOD1, FAIR1, EXCELLENT

ANTIMAGN: SLIGHT, SLIM, POOR, LITTLE1, SMALL

OPER1: HAVE, STAND1

REAL1-M: TAKE


CHANCE

ANTIREAL1-M: MISS1

INCEPOPER1: GET

FINOPER1: LOSE

CAUSFUNC1: GIVE <TO1>

ZONE:R

TRANS:ШАНС/СЛУЧАЙ


CHANCE

REG:TRADUCT2.00
TAKE:X
LOC:R
R:COMPOS/MODIF/POSSES
CHECK
1.1 DEP-LEXA(X,Z,PREPOS,BY1)
N:01
CHECK
1.1 DOM(X,*,R)
DO
1 ZAMRUZ:Z(PO1)
2 ZAMRUZ:X(SLUCHAJNOST')


CHANCE

N:02
CHECK
2.1 DOM(X,*,*)
DO
1 ZAMRUZ:Z(SLUCHAJNO)
2 STERUZ:X

TRAF:RA-EXPANS.16
LA:THAT1

TRAF:RA-EXPANS.22
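
The lexical-function part of the CHANCE entry has a regular "NAME: VALUE, VALUE" shape, so it can be read into a simple mapping. The sketch below assumes exactly that line format as printed on the slides and covers only the plain LF lines; `parse_zone` is a hypothetical helper, not an ETAP-3 tool, and it does not interpret the rule section (REG, TRAF).

```python
from typing import Dict, List

# LF lines copied from the CHANCE entry as shown on the slides
ENTRY_LINES: List[str] = [
    "SYN1: OPPORTUNITY",
    "MAGN: GOOD1, FAIR1, EXCELLENT",
    "ANTIMAGN: SLIGHT, SLIM, POOR, LITTLE1, SMALL",
    "OPER1: HAVE, STAND1",
    "REAL1-M: TAKE",
    "ANTIREAL1-M: MISS1",
    "INCEPOPER1: GET",
    "FINOPER1: LOSE",
]


def parse_zone(lines: List[str]) -> Dict[str, List[str]]:
    """Map each function name to its list of values."""
    zone: Dict[str, List[str]] = {}
    for line in lines:
        name, _, values = line.partition(":")
        zone[name.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return zone


if __name__ == "__main__":
    zone = parse_zone(ENTRY_LINES)
    print(zone["OPER1"])
    print(zone["ANTIMAGN"])
```

With the entry in this form, a phrase such as "stand a chance" can be recognized as OPER1 of CHANCE, and "slim chance" as ANTIMAGN.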


What is UNL?

UNL is a formal language for meaning representation

A minimal unit of UNL is a UNL expression

A UNL expression corresponds to a natural-language sentence in the amount of information it conveys


UNL Architecture

[Diagram: the UNL System on the Internet connects four groups of users, labelled French People, Spanish People, Chinese People, and Hindu People, through French, Spanish, Chinese, and Hindi.]


How is UNL made?

UNL is a formal language of meaning representation

A minimal unit of UNL is a UNL graph

The amount of meaning rendered by a UNL graph corresponds to a natural language sentence


Two MT architectures: Transfer vs. Interlingua

[Diagram: two routes from Source text to Target text, one labelled Transfer and one passing through an Interlingua.]


UNL approach to lexical design

Semantic units of UNL (universal words, UW) are designed on the basis of natural language (English) words which can be semantically modified if need be


UNL strategy

Lexical meanings of the natural language are represented by UWs.

1. Lexical meaning coincides with the meaning of an unambiguous English word

2. Lexical meaning coincides with one of the senses of an ambiguous English word

3. Lexical meaning does not coincide with any of the lexical meanings of English


Disambiguation of natural word senses

Coach: bus, trainer, train, drill,...

• coach(icl>bus>transport)

• coach(icl>person,obj>sportsman)

• coach(icl>do,obj>sportsman)

• coach(icl>do,obj>student)
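
Each UW above is an English headword followed by a parenthesized list of restrictions that single out one sense. A minimal sketch of splitting a UW into those parts, covering only the flat shapes shown on this slide (it does not handle nested parentheses or other features of full UNL); `parse_uw` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# headword, then a parenthesized, comma-separated restriction list
UW_PATTERN = re.compile(r"^(?P<head>[^()]+)\((?P<body>.*)\)$")

Restriction = Tuple[str, List[str]]


def parse_uw(uw: str) -> Tuple[str, List[Restriction]]:
    """Split a UW into its headword and (relation, chain of values) pairs."""
    match = UW_PATTERN.match(uw.strip())
    if match is None:
        return uw.strip(), []  # bare headword, no restrictions
    restrictions: List[Restriction] = []
    for part in match.group("body").split(","):
        relation, *chain = part.strip().split(">")
        restrictions.append((relation, chain))
    return match.group("head"), restrictions


if __name__ == "__main__":
    print(parse_uw("coach(icl>bus>transport)"))
    print(parse_uw("coach(icl>do,obj>sportsman)"))
```

The first restriction already separates the noun senses (icl>bus, icl>person) from the verb senses (icl>do); the obj restriction then distinguishes training a sportsman from tutoring a student.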


Formation of new UWs

• Rus. прибежать ‘come running’: come(met>run)
• Rus. прилететь ‘come by air’: come(met>plane)
• Rus. приплыть ‘come swimming’: come(met>swim)
• Rus. приползти ‘come crawling’: come(met>crawl)


Formation of new UWs

• Rus. жениться ‘marry (said of a man)’: marry(agt>man)
• Rus. выходить замуж ‘marry (said of a woman)’: marry(agt>woman)