© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS) ICUKL-2002, Goa, 25- 29/11/02 Proposals for solving some problems in UNL encoding International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002 Christian BOITET GETA, CLIPS, IMAG, Grenoble [email protected]


Proposals for solving some problems in UNL encoding

International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002

Christian BOITET

GETA, CLIPS, IMAG, Grenoble

[email protected]


Which problems?

What Igor said "remains to be done":

1. representation of multi-word concepts (« long UWs »);

2. elliptical expressions;

3. treatment of arguments, both in the UW dictionary and in the UNL expressions;

and:

1. conventions about attributes;

2. XML formats for UNL documents.


Representation of multi-word concepts (long UWs) — 1

Problematic examples of "UNKNOWN LONG UWs":

"Institute of Advanced studies (UNU/IAS)"(icl>…)

"East-Asia cooperation office"

East-Asia cooperation office

east-asia cooperation office(icl>…)

"Tokyo University"

"University of Kyoto"

"World Bank(icl>…)"


Representation of long UWs — 2

What are the problems?

1. No hope of including all these long UWs in our UNL-LLL dictionaries, because the number of such UWs is potentially immense and unbounded. Maybe never more than 5% or 10% of them in open domains.

2. Necessity to include an analyzer of English compounds in order to translate "unknown long UWs" piece by piece, but such compounds are extremely ambiguous.


Let us think a bit more

Proper nouns CAN be decomposed. This is NOT to say that their translation is always compositional.

Compositional: World Bank ==> Banque du Monde (false)

Idiomatic: World Bank ==> Banque mondiale (correct)

So we should have a solution allowing BOTH:

Compositional deconversion if the long UW is unknown

Idiomatic deconversion after it has been put in the UNL-LLL dictionary


Proposal of a solution

Origin: proposed by H. Uchida at a meeting in Tokyo (1999?). Not yet included, but still needed and still the best.

Principle: the headword encodes a UNL representation of the compound.

Possible syntax:

"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)

"(mod(bank(icl>entity).@entry,world))"(icl>entity)

… or a better one!
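
A minimal sketch, assuming the two syntax variants shown above, of how a tool might split such a headword into its embedded UNL expression, its optional scope label, and its outer restriction. The name `split_long_uw` and the regular expression are illustrative assumptions, not part of any UNL specification.

```python
import re

# Hypothetical pattern for the syntax sketched on this slide:
#   "(<embedded UNL expression>[:<scope label>])"(<restriction>)
LONG_UW = re.compile(
    r'^"\((?P<expr>.*?)(?::(?P<scope>\d+))?\)"\((?P<restr>[^()]*)\)$'
)

def split_long_uw(uw):
    """Return (expression, scope label or None, restriction).

    Returns None when uw does not follow the long-UW syntax assumed here.
    """
    m = LONG_UW.match(uw)
    if m is None:
        return None
    return m.group("expr"), m.group("scope"), m.group("restr")

if __name__ == "__main__":
    print(split_long_uw('"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)'))
    print(split_long_uw('"(mod(bank(icl>entity).@entry,world))"(icl>entity)'))
```

An ordinary UW such as `bank(icl>institution)` does not match the pattern, so the function returns `None` for it and the caller can treat it as a normal dictionary entry.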


How to deconvert

Case 1:

"(mod(bank(icl>institution).@entry,world))"(icl>institution) is not in the UNL-FR dictionary

==> the French deconverter "unwraps" mod(bank(icl>institution).@entry,world) into a scope of the UNL-graph
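
The lookup-then-unwrap strategy could be sketched as follows. This is only an illustration under stated assumptions: the function name `deconvert_long_uw` and the toy dictionary are hypothetical, scope labels such as `:01` are not handled, and the actual generation of French from the unwrapped expression is left to the deconverter.

```python
def deconvert_long_uw(uw, dictionary):
    """Try an idiomatic lookup first; otherwise unwrap the embedded expression.

    Returns ("idiomatic", translation), ("compositional", embedded_expression),
    or ("unknown", uw) when uw is neither in the dictionary nor a long UW.
    """
    if uw in dictionary:
        return "idiomatic", dictionary[uw]
    # A long UW is assumed to look like "(<expression>)"(<restriction>)
    head, sep, _restriction = uw.rpartition('"(')
    if sep and head.startswith('"(') and head.endswith(")"):
        # Drop the leading '"(' and the trailing ')' to expose the expression
        return "compositional", head[2:-1]
    return "unknown", uw

if __name__ == "__main__":
    uw = '"(mod(bank(icl>institution).@entry,world))"(icl>institution)'
    print(deconvert_long_uw(uw, {}))                       # long UW not in the dictionary
    print(deconvert_long_uw(uw, {uw: "Banque mondiale"}))  # toy dictionary entry
```

With an empty dictionary the embedded expression is handed back for compositional treatment as a scope; once the long UW has an entry, the same call returns the idiomatic translation.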


Another example

«"(mod(university.@entry,Tokyo(icl>town)):01)"(icl>entity)»

Compositional deconversion:

Université de Tokyo

University of Tokyo

Universität von Tokyo

Tokyo no daigaku (or Tokyo ni daigaku)

Idiomatic deconversion:

Université de Tokyo (or Todai!)

Tokyo University / University of Tokyo

Universität Tokyo

Tokyo daigaku / Todai


Elliptical expressions

Example:

Do you prefer the first or the second solution? I prefer the first.

Je préfère le premier? Je préfère la première?

==> A bad deconversion will be very misleading.

Possible solution: encode the elided element and put .@eld on it. That is equivalent to "preediting" the input text:

I prefer the first <eld>solution</eld>.

…and in the spirit of the new idea by H. Uchida of preediting for semantic relations
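
A small sketch of how a pre-edited sentence might be processed: the elided words are extracted from the `<eld>` tags so that the enconverter can encode them and mark their nodes with `.@eld`. The function names and the UW `solution(icl>thing)` are hypothetical; only the `<eld>` tag and the `.@eld` attribute come from the proposal above.

```python
import re

ELD = re.compile(r"<eld>(.*?)</eld>")

def split_preedited(text):
    """Return (surface text without the elided words, list of elided words)."""
    elided = ELD.findall(text)
    surface = ELD.sub("", text)
    # Tidy the spacing left behind by the removed spans
    surface = re.sub(r"\s+([.,;:?!])", r"\1", surface)
    surface = re.sub(r"\s{2,}", " ", surface).strip()
    return surface, elided

def mark_elided(uw):
    """Attach the proposed .@eld attribute to a UW string."""
    return uw + ".@eld"

if __name__ == "__main__":
    print(split_preedited("I prefer the first <eld>solution</eld>."))
    print(mark_elided("solution(icl>thing)"))  # hypothetical UW
```

A deconverter that sees `.@eld` on the node can use the noun for agreement (for example, choosing "la première" in French) without generating the noun itself.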


Treatment of arguments

in the UW dictionary

in the UNL expressions

See talk by I. Boguslavskij. The solution proposed entails:

1. a very small change in the UNL syntax: allow attributes .@A, .@B, .@C, .@D on arcs, hence also on restrictions by semantic relations;

2. a discipline in UW creation: all arguments should appear as restrictions.


"Argument-full" + "readable" UW

Argument-full:

look(icl>do,agt.@A>person,obj.@B>thing);

look(icl>do,agt.@A>person,gol.@B>thing);

look(icl>do,agt.@A>person,dst.@B>thing);

Readable:

look(icl>do, agt.@A>person, obj.@B>thing); look for something

Even more readable:

look for(icl>do, agt.@A>person, obj.@B>thing); look for something
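
To illustrate how such argument-full UWs could be read by a program, here is a minimal parsing sketch. It assumes flat restrictions of the form `rel>target` or `rel.@X>target` with X in A to D, as in the examples above; the names `parse_uw` and `argument_slots` are hypothetical, and nested restrictions are not handled.

```python
import re

# Hypothetical pattern for one restriction: relation, optional .@X label, '>' target
RESTRICTION = re.compile(
    r"^(?P<rel>\w+)(?:\s*\.@(?P<arg>[A-D]))?\s*>\s*(?P<target>.+)$"
)

def parse_uw(uw):
    """Split 'headword(r1,r2,...)' into the headword and its restrictions."""
    headword, _, rest = uw.rstrip(";").partition("(")
    restrictions = []
    for part in rest.rstrip(")").split(","):
        m = RESTRICTION.match(part.strip())
        if m is None:
            raise ValueError(f"unrecognised restriction: {part!r}")
        restrictions.append((m.group("rel"), m.group("arg"), m.group("target")))
    return headword.strip(), restrictions

def argument_slots(uw):
    """Map each argument label (A to D) to its (relation, target) pair."""
    return {arg: (rel, target) for rel, arg, target in parse_uw(uw)[1] if arg}

if __name__ == "__main__":
    print(parse_uw("look for(icl>do,agt.@A>person, obj.@B>thing);"))
    print(argument_slots("look(icl>do,agt.@A>person,gol.@B>thing);"))
```

Because every argument appears as a labelled restriction, the mapping from argument position (A, B, ...) to semantic relation can be read directly off the UW, which is the point of the discipline proposed above.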


Continuing that list…

look for(icl>do,agt.@A>person, obj.@B>thing); look for something

look at(icl>do,agt.@A>person, plt.@B>thing); look at something

or

look at(icl>do,agt.@A>person, obj.@B>thing); look at something

look like(icl>do,agt.@A>person, cmp.@B>thing); look like something

look like(icl>do,agt.@A>person, obj.@B>thing); look like something

might also cover "look as" in "he looks as a good man"

or

look as if(icl>do,agt.@A>person, obj.@B>thing); it looks as if…

look(icl>do,agt.@A>person, obj.@B>thing); look for something


Attributes

The problem:

lion(icl>mammal).@plur

==> un lion, les lions, lions?

We don't know whether definiteness

has been computed ==> it is .@undef ==> use it

or not ==> it is UNKNOWN ==> compute default

Solution: for every attribute XXXX, put

.@XXXX for +XXXX (1 or true)

.@unXXXX for -XXXX (0 or false)

nothing for XXXX unknown (? or undefined)
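
The three-valued convention can be sketched in a few lines. The function names are hypothetical, and the parser assumes that `.@` occurs only in the attribute list after the UW (not inside its restrictions).

```python
def parse_attributes(node):
    """Collect the attribute names following the UW, e.g. 'lion(icl>mammal).@plur'."""
    # Assumes '.@' appears only after the UW itself
    return set(node.split(".@")[1:])

def attribute_value(attributes, name):
    """Three-valued reading: True for .@name, False for .@un<name>, None if absent."""
    if name in attributes:
        return True
    if "un" + name in attributes:
        return False
    return None  # unknown: the deconverter should compute a default

if __name__ == "__main__":
    for node in ("lion(icl>mammal).@plur",
                 "lion(icl>mammal).@plur.@undef",
                 "lion(icl>mammal).@plur.@def"):
        print(node, "-> def =", attribute_value(parse_attributes(node), "def"))
```

Only in the first case, where neither `.@def` nor `.@undef` is present, does the deconverter need to fall back on its own default for definiteness.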


XML formats for UNL documents

A minimal UNL-xml format, strictly equivalent to UNL-html

– proposed & used by Tsai W.J. for the SWIIVRE-UNL web site & his Ph.D.

Methodology for defining and using other, more detailed UNL-xml-xyz formats:

– xyz is an application (e.g. a graphical editor, a statistics-gathering tool, etc.),

– automatic parsing of the basic UNL-xml format introduces new tags,

– a document object model (DOM) suitable for application xyz can then be defined and used.