© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS) ICUKL-2002, Goa, 25-29/11/02
Proposals for solving some problems in UNL encoding
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002
Christian BOITET
GETA, CLIPS, IMAG, Grenoble
Which problems?
What Igor said "remains to be done":
1. representation of multi-word concepts (« long UWs »);
2. elliptical expressions;
3. treatment of arguments, both in the UW dictionary and in the UNL expressions;
and:
1. conventions about attributes;
2. XML formats for UNL documents.
Representation of multi-word concepts (long UWs) — 1
Problematic examples of "UNKNOWN LONG UWs":
"Institute of Advanced studies (UNU/IAS)"(icl>…)
"East-Asia cooperation office"
East-Asia cooperation office
east-asia cooperation office(icl>…)
"Tokyo University"
"University of Kyoto"
"World Bank(icl>…)"
Representation of long UWs — 2
What are the problems?
1. No hope of including all these long UWs in our UNL-LLL dictionaries, because the number of such UWs is potentially immense and unbounded. Maybe never more than 5% or 10% of them in open domains.
2. Necessity to include an analyzer of English compounds in order to translate "unknown long UWs" piece by piece, but such compounds are extremely ambiguous.
Let us think a bit more
Proper nouns CAN be decomposed. This is NOT to say that their translation is always compositional.
Compositional: World Bank ==> Banque du Monde (false)
Idiomatic: World Bank ==> Banque mondiale (correct)
So we should have a solution allowing BOTH:
Compositional deconversion if the long UW is unknown
Idiomatic deconversion after it has been put in the UNL-LLL dictionary
Proposal of a solution
Origin: Proposed by H.Uchida at a meeting in Tokyo (1999?). Not yet included, but still needed and still the best.
Principle: Headword encodes a UNL representation of the compound.
Possible syntax:
"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)
"(mod(bank(icl>entity).@entry,world))"(icl>entity)
… or a better one!
How to deconvert
Case 1: "(mod(bank(icl>institution).@entry,world))"(icl>institution) is not in the UNL-FR dictionary
==> French deconverter "unwraps" mod(bank(icl>institution).@entry,world) into a scope of the UNL-graph
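The choice between idiomatic and compositional deconversion can be sketched as a dictionary lookup with a fallback. This is a minimal illustration, not the deconverter's actual code: the function names (`unwrap`, `resolve`), the regular expressions, and the plain Python dict standing in for a UNL-FR dictionary are all assumptions, and only a single binary relation inside the headword is handled.

```python
import re

# Assumed shape of a long UW: a quoted headword holding one binary relation,
# followed by ordinary restrictions, e.g. "(mod(a.@entry,b):01)"(icl>entity).
LONG_UW = re.compile(r'^"\((?P<body>.*)\)"\((?P<restr>[^()]*)\)$')
# relation(arg1,arg2), optionally followed by a scope label such as ":01".
RELATION = re.compile(r"^(?P<rel>\w+)\((?P<args>.*)\)(?::\d+)?$")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def unwrap(long_uw):
    """Return the (relation, node1, node2) triple encoded in the headword."""
    outer = LONG_UW.match(long_uw)
    if outer is None:
        raise ValueError("not a long UW: %r" % long_uw)
    inner = RELATION.match(outer.group("body"))
    if inner is None:
        raise ValueError("headword is not a single relation: %r" % long_uw)
    args = split_top_level(inner.group("args"))
    if len(args) != 2:
        raise ValueError("expected a binary relation: %r" % long_uw)
    return (inner.group("rel"), args[0], args[1])


def resolve(long_uw, dictionary):
    """Idiomatic entry if the long UW is known, else the unwrapped scope."""
    if long_uw in dictionary:
        return ("idiomatic", dictionary[long_uw])
    return ("scope", unwrap(long_uw))
```

With an empty dictionary, `resolve` hands the relation back to the ordinary deconverter as a scope; once the long UW has been entered (for example with "Banque mondiale"), the same call returns the idiomatic rendering.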
Another example
«"(mod(university.@entry,Tokyo(icl>town)):01)"(icl>entity)»
Compositional deconversion:
Université de Tokyo
University of Tokyo
Universität von Tokyo
Tokyo no daigaku (or Tokyo ni daigaku)
Idiomatic deconversion:
Université de Tokyo (or Todai!)
Tokyo University / University of Tokyo
Universität Tokyo
Tokyo daigaku / Todai
Elliptical expressions
Example:
Do you prefer the first or the second solution?
I prefer the first.
Je préfère le premier? Je préfère la première?
==> A bad deconversion will be very misleading.
Possible solution: Encode the elided element and put .@eld on it. That is equivalent to "preediting" the input text:
I prefer the first <eld>solution</eld>.
…and in the spirit of the new idea by H.Uchida of preediting for semantic relations
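A small sketch of how the elided element could guide agreement without being printed. Everything here is illustrative: the tokenizer, the `FR_GENDER` table (a hypothetical lookup from English noun to French gender), and the function names are assumptions, not part of any UNL tool.

```python
import re

# Matches either a preedited elided word or an ordinary word.
TOKEN = re.compile(r"<eld>(\w+)</eld>|(\w+)")

# Hypothetical table: French gender of the translation of an English noun.
FR_GENDER = {"solution": "f", "problem": "m"}


def to_nodes(preedited):
    """Turn preedited text into (word, attributes) pairs; elided words get .@eld."""
    nodes = []
    for m in TOKEN.finditer(preedited):
        if m.group(1):
            nodes.append((m.group(1), (".@eld",)))
        else:
            nodes.append((m.group(2), ()))
    return nodes


def french_first(nodes):
    """Pick the French form of 'the first' from the gender of the elided noun."""
    elided = [word for word, attrs in nodes if ".@eld" in attrs]
    gender = FR_GENDER.get(elided[0]) if elided else None
    forms = {"f": "la première", "m": "le premier"}
    # Without an encoded elided element the deconverter can only guess.
    return forms.get(gender, "le premier / la première ?")
```

The elided noun never appears in the output; it only supplies the gender that the ordinal must agree with.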
Treatment of arguments
in the UW dictionary and in the UNL expressions
See talk by I.Boguslavskij. The solution proposed entails:
1. a very small change in the UNL syntax: allow attributes .@A, .@B, .@C, .@D on arcs, hence also on restrictions by sem.rel.
2. a discipline in the UW creation: all arguments should appear as restrictions
"Argument-full" + "readable" UW
Argument-full:
look(icl>do,agt.@A>person,obj.@B>thing);
look(icl>do,agt.@A>person,gol.@B>thing);
look(icl>do,agt.@A>person,dst.@B>thing);
Readable:
look(icl>do, agt.@A>person, obj.@B>thing); look for something
Even more readable:
look for(icl>do, agt.@A>person, obj.@B>thing); look for something
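To make the proposed notation concrete, here is a minimal parsing sketch for argument-full UWs. The regular expressions and the names `parse_uw` and `arguments` are assumptions made for illustration; restriction values containing nested parentheses or commas are not handled.

```python
import re

# headword(restriction, restriction, ...), where the headword may contain spaces.
UW = re.compile(r"^(?P<head>[^()]+)\((?P<restr>.*)\)$")
# relation, optional argument slot (.@A to .@D), then ">" and the value.
RESTRICTION = re.compile(
    r"^(?P<rel>\w+)\s*(?:\.@(?P<slot>[A-D]))?\s*>\s*(?P<value>.+)$"
)


def parse_uw(uw):
    """Return (headword, [(relation, slot or None, value), ...])."""
    m = UW.match(uw.strip().rstrip(";"))
    if m is None:
        raise ValueError("not a restricted UW: %r" % uw)
    restrictions = []
    for part in m.group("restr").split(","):
        r = RESTRICTION.match(part.strip())
        if r is None:
            raise ValueError("bad restriction: %r" % part)
        restrictions.append((r.group("rel"), r.group("slot"), r.group("value")))
    return m.group("head").strip(), restrictions


def arguments(uw):
    """Map each argument slot letter to its (relation, value) pair."""
    _, restrictions = parse_uw(uw)
    return {slot: (rel, value) for rel, slot, value in restrictions if slot}
```

For instance, `arguments("look at(icl>do,agt.@A>person, plt.@B>thing)")` maps slot A to the agent and slot B to the `plt` restriction, which is the information a dictionary tool would need to align arguments across languages.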
Continuing that list…
look for(icl>do,agt.@A>person, obj.@B>thing); look for something
look at(icl>do,agt.@A>person, plt.@B>thing); look at something
or
look at(icl>do,agt.@A>person, obj.@B>thing); look at something
look like(icl>do,agt.@A>person, cmp.@B>thing); look like something
look like(icl>do,agt.@A>person, obj.@B>thing); look like something
might also cover "look as" in "he looks as a good man"
or
look as if(icl>do,agt.@A>person, obj.@B>thing); it looks as if…
look(icl>do,agt.@A>person, obj.@B>thing); look for something
Attributes
The problem:
lion(icl>mammal).@plur
==> un lion, les lions, lions?
We don't know whether definiteness has been computed ==> it is .@undef ==> use it
or not ==> it is UNKNOWN ==> compute default
Solution: for every attribute XXXX, put
.@XXXX for +XXXX (1 or true)
.@unXXXX for -XXXX (0 or false)
nothing for XXXX unknown (? or undefined)
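The three-valued convention can be read off a node with a few lines of code. This is a sketch under assumptions: the helper names are invented, attributes are assumed to be written as `.@name` suffixes on the node, and an attribute whose own name begins with "un" would need special handling that is not shown.

```python
from typing import Optional, Set, Tuple


def split_attributes(node: str) -> Tuple[str, Set[str]]:
    """Separate a node such as 'lion(icl>mammal).@plur' into UW and attribute set."""
    uw, *attrs = node.split(".@")
    return uw, set(attrs)


def tri_state(attrs: Set[str], name: str) -> Optional[bool]:
    """True for .@name, False for .@unname, None when nothing was encoded."""
    positive = name in attrs
    negative = ("un" + name) in attrs
    if positive and negative:
        raise ValueError("contradictory values for attribute %r" % name)
    if positive:
        return True
    if negative:
        return False
    return None  # unknown: the deconverter should compute a default
```

A deconverter would then treat `None` as "not computed" and fall back to its own default, while `False` is a value the enconverter actually committed to.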
XML formats for UNL documents
A minimal UNL-xml format, strictly equivalent to UNL-html
– proposed & used by Tsai W.J. for the SWIIVRE-UNL web site & his Ph.D.
Methodology for defining and using other, more detailed UNL-xml-xyz formats:
– xyz is an application (e.g. a graphical editor, or statistics-gathering tool, etc.),
– Automatic parsing of the basic UNL-xml format introduces new tags,
– A document object model (DOM) suitable for application xyz can then be defined and used.