CSIRO Mathematical and Information Sciences
Defining Comparison in a Computational Linguistics Framework
Maria Milosavljevic
Intelligent Interactive Technologies Group,
CSIRO Mathematical and Information Sciences North Ryde, NSW
http://www.cmis.csiro.au/Maria.Milosavljevic/
Overview
» The Basic Ideas
• Language Technology
• Some Definitions
• An Ontology of Comparisons
• Comparison in Context
• Conclusions and Future Directions
The Basic Idea
• Learning is incremental…
we augment existing knowledge with new knowledge in order to maximise the extent to which the new knowledge coheres with our existing knowledge
The User’s Knowledge
• Teaching should capitalise on the user’s existing knowledge:
– maximise the hearer’s conceptual coherence of that entity
– prevent the hearer from forming misconceptions about that entity
• Most NLG systems utilise a model of the user’s knowledge to prevent repetition
• It should also be used to build on her existing understanding
Overview
• The Basic Ideas
» Language Technology
• Some Definitions
• An Ontology of Comparisons
• Comparison in Context
• Conclusions and Future Directions
Natural Language Technology
[Diagram: Text → Natural Language Analysis → ‘Meaning’ → Natural Language Generation → Text]
Objectives of Text Generation
• Reduce information overload by constructing appropriate presentations on-demand
• Tailor text to the individual’s knowledge, needs, abilities, situation, language, previous interactions, etc.
• Decrease document construction and maintenance costs: texts are updated as underlying knowledge changes
Overview
• The Basic Ideas
• Language Technology
» Some Definitions
• An Ontology of Comparisons
• Comparison in Context
• Conclusions and Future Directions
Some Definitions
• A property p of an entity is an ordered pair <a, v> consisting of an attribute a and its corresponding value v, for example <colour, red>.
• A focused entity is the topic of a text, or the entity being discussed in a text.
Some Definitions
• A proposition is a predication of a property to an entity, or the relationship which holds between two entities (for example, <part-of, mouth-piece, clarinet>).
• A description of a focused entity is defined as the linguistic realisation of a set of one or more propositions, the purpose of which is to allow the hearer to build a mental model of the focused entity.
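The definitions of property and proposition can be sketched as simple data structures. This is only a minimal illustration in Python, not the representation used in the work itself: the class names Property, HasProp and Relation are assumptions introduced here, and the entity name some-entity is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Property:
    """An ordered pair <attribute, value>, for example <colour, red>."""
    attribute: str
    value: Any


@dataclass(frozen=True)
class HasProp:
    """A proposition that predicates a property of an entity."""
    entity: str
    prop: Property


@dataclass(frozen=True)
class Relation:
    """A proposition stating a relationship between two entities."""
    name: str
    source: str
    target: str


# A proposition is either kind of statement (assumed union type).
Proposition = Union[HasProp, Relation]

colour = Property("colour", "red")
fact = HasProp("some-entity", colour)  # hypothetical entity name
part_of = Relation("part-of", "mouth-piece", "clarinet")
```

Frozen dataclasses are used so that propositions compare by value and can be collected in sets, which suits the later definition of a comparison as a set of propositions.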
Definition: Comparative proposition
• A comparative proposition is a proposition which states the existence of a difference or a similarity between two entities. For example, the comparative proposition below states that there is a difference between the entities dromedary camel and bactrian camel. Note that the attributes match (number-of-humps). This is important in order to draw similarities and differences together.
(difference
(hasprop dromedary-camel (number-of-humps 1)),
(hasprop bactrian-camel (number-of-humps 2)))
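The attribute-matching step can be sketched as follows. This is a hedged illustration, not the author's algorithm: the function name comparative_propositions is an assumption, and treating equal values as a similarity and unequal values as a difference is a simplification made here.

```python
from typing import Any, Dict, List, Tuple

PropertyMap = Dict[str, Any]


def comparative_propositions(
    a: str, a_props: PropertyMap, b: str, b_props: PropertyMap
) -> List[Tuple]:
    """Pair up properties whose attributes match and label each pair.

    Assumption: equal values give a similarity, unequal values a difference.
    Attributes known for only one entity are skipped.
    """
    result = []
    for attribute in sorted(a_props.keys() & b_props.keys()):
        va, vb = a_props[attribute], b_props[attribute]
        kind = "similarity" if va == vb else "difference"
        result.append(
            (kind, ("hasprop", a, (attribute, va)), ("hasprop", b, (attribute, vb)))
        )
    return result


props = comparative_propositions(
    "dromedary-camel", {"number-of-humps": 1},
    "bactrian-camel", {"number-of-humps": 2},
)
```

With the camel data from the slide, props holds a single difference proposition over the shared attribute number-of-humps.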
Definition: Comparison
• A comparative clause is the linguistic realisation of a comparative proposition.
• A comparison is a set of one or more comparative propositions which together express the differences and/or similarities between two entities.
• A comparative text is the linguistic realisation of a comparison.
• For convenience, we will also use the term comparison to mean comparative text.
• A comparator entity is the entity which is being compared to the focused entity within a comparative text.
Definition: Uni/Bi-Focal
• A uni-focal comparison is a comparative text which has one primary focused entity.
“Hearing aids have the same basic components as any public-address system, but all the components are miniature and the amplified sound is delivered to the ear of the hearing-aid user only.”
• A bi-focal comparison is a comparative text which has two equally-important foci.
Bi-focal Comparison
Rabbits and Hares, common name for certain small, furry mammals with long ears and short tails. Although the names rabbit and hare are often used interchangeably, in zoological classification the species called rabbits are characterised by the helplessness of their offspring, which are born naked and with closed eyes, and by their gregarious habit of living in colonies in underground burrows. (The exception is the cottontail of North America, which does not dig burrows; its nest is on the surface, usually in dense vegetation, and it is not social.) Species designated zoologically as hares are born furred and with open eyes, and the adults merely construct a simple nest and rarely live socially. Furthermore, the hare is generally larger than the rabbit and has longer ears with characteristic black markings. Moreover, the skulls of rabbits and hares are distinctly different. ... (Encarta Encyclopedia)
Definition: Multi-Focal
• An n-focal comparison is a comparative text which has n equally-important foci. We will use the term multi-focal comparison to refer to a comparison with more than two foci.
“The buffeo, the smallest dolphin, is less than 1.2 m (less than 4 ft) long; the largest, the bottle-nosed dolphin, reaches a length of 3 m (10 ft). The killer whale is considered a dolphin despite its much greater length of 9 m (30 ft).”
Overview
• The Basic Ideas
• Language Technology
• Some Definitions
» An Ontology of Comparisons
• Comparison in Context
• Conclusions and Future Directions
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
Direct Comparison 1
Rabbits and Hares, common name for certain small, furry mammals with long ears and short tails. Although the names rabbit and hare are often used interchangeably, in zoological classification the species called rabbits are characterised by the helplessness of their offspring, which are born naked and with closed eyes, and by their gregarious habit of living in colonies in underground burrows. (The exception is the cottontail of North America, which does not dig burrows; its nest is on the surface, usually in dense vegetation, and it is not social.) Species designated zoologically as hares are born furred and with open eyes, and the adults merely construct a simple nest and rarely live socially. Furthermore, the hare is generally larger than the rabbit and has longer ears with characteristic black markings. Moreover, the skulls of rabbits and hares are distinctly different. ... (Encarta Encyclopedia)
Direct Comparison 2
Microsoft Carpoint example: comparison between a BMW and an Audi
Choose a model        1998 BMW 3-Series          1998 Audi A4
Price Range           $21,390 - $41,500          $23,790 - $30,040
Airbags               Driver, Passenger, Side    Driver, Passenger, Side
Choose a trim         318i                       2.8
Base Price (MSRP)     $26,150                    $28,390
Base Invoice          $22,930                    $24,944
Destination Charge    $570                       $500
Driver Airbag         Standard                   Standard
Passenger Airbag      Standard                   Standard
...
Definition: Direct
• A direct comparison is a bi-focal comparative text which exists as an entire text, and whose purpose is to: (i) highlight that the two foci exist and are highly similar; (ii) describe the foci; and (iii) distinguish the two foci.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
Objective: distinguish
Significant Type Comparison
In colder climates, ground squirrels commonly hibernate; tree squirrels do not. (Encarta Encyclopedia)
The feathers of the male bird may be different in appearance from those of the female bird of the same species. (Encarta Encyclopedia)
Definition: Significant Type
• The significant types of an entity are the partitionings of that entity into groups or parts of some kind. For example, an animal class can be partitioned into groups such as: male and female; captive and free; and young and adult, or into sub-parts such as head, body and tail, and so on.
• A significant type comparison is a multi-focal comparative text which is used within a description of a focused entity in order to: (i) inform the reader of the presence of some or all of the significant types of the focused entity; and (ii) provide the most relevant distinction(s) between these significant types.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
Objective: distinguish
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Definition: Domain-based
• A domain-based comparison is a uni-focal comparative text which occurs within a description of a focused entity, and which draws the hearer's attention to another similar entity within the domain in order to: (i) exemplify the uniqueness or non-uniqueness of the focused entity; and (ii) prevent the hearer from forming misconceptions about the similarity or otherwise of the two entities.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
Objective: distinguish
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Set Complement Comparison
... the claws are short and lack the sheath that covers retracted claws in other cat species. (Grolier Encyclopedia)
Being a cylindrical pipe stopped at one end, the clarinet overblows to the interval of a 12th above the fundamental pitch (unlike flutes and oboes, which overblow to the octave). (Encarta Encyclopedia)
Definition: Set Complement
• A contrast set is any form of grouping to which the focused entity belongs, such as its parent class or supertype in a generalisation hierarchy.
• A set complement comparison is a domain-based comparison, between a focused entity and its complement in a contrast set to which it belongs.
• NOTE: In order to determine the uniqueness of the focused entity in the contrast set, we need to compare the focused entity to its complement in the contrast set, since the focused entity is not different to itself.
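The note above can be sketched in code: remove the focused entity from its contrast set, then keep only those of its properties that no remaining member shares. This is a minimal illustration under stated assumptions, not the author's implementation. The function names set_complement and unique_properties and the attribute names overblows-to and family are introduced here; the instrument values follow the clarinet example, with the woodwind family added as a shared property for contrast.

```python
from typing import Any, Dict, Set

Knowledge = Dict[str, Dict[str, Any]]


def set_complement(focused: str, contrast_set: Set[str]) -> Set[str]:
    """Return the contrast set without the focused entity."""
    if focused not in contrast_set:
        raise ValueError(f"{focused!r} is not in the contrast set")
    return contrast_set - {focused}


def unique_properties(
    focused: str, contrast_set: Set[str], knowledge: Knowledge
) -> Dict[str, Any]:
    """Properties of the focused entity that no complement member shares.

    Assumption: a complement member with no recorded value for an
    attribute is treated as not sharing that property.
    """
    others = set_complement(focused, contrast_set)
    return {
        attribute: value
        for attribute, value in knowledge[focused].items()
        if all(knowledge[o].get(attribute) != value for o in others)
    }


knowledge = {
    "clarinet": {"overblows-to": "12th", "family": "woodwind"},
    "flute": {"overblows-to": "octave", "family": "woodwind"},
    "oboe": {"overblows-to": "octave", "family": "woodwind"},
}
unique = unique_properties("clarinet", {"clarinet", "flute", "oboe"}, knowledge)
```

Comparing against the complement rather than the whole set matters: if the clarinet stayed in the set, it would always share every one of its own properties and nothing would ever count as unique.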
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
                Clarificatory comparison
Objective: distinguish
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Clarificatory Comparison
Track bikes are similar in appearance and construction to road racing bicycles, except that they lack brakes, have no variable gear mechanism, and weigh about 7 to 9 kg (about 15 to 20 lbs). Mountain bikes are built to withstand the rigorous conditions of off-road riding. Although their frames are commonly constructed of the same materials as other racing bikes, they have sturdier tubing. (Encarta Encyclopedia)
Sheep, are hollow-horned ruminants belonging to the genus Ovis, suborder Ruminata, family Bovidae. Similar to goats, sheep differ in their stockier bodies, the presence of scent glands in face and hind feet, and the absence of beards in the males. Domesticated sheep are also more timid and prefer to flock and follow a leader. (Grolier Encyclopedia)
Definition: Clarificatory
• A potential confusor of a focused entity is an entity which is highly similar to the focused entity, and which the hearer might confuse for the focused entity.
• A clarificatory comparison is a domain-based comparison, between a focused entity and its potential confusor. The purpose of the comparison is to distinguish the focused entity clearly from the potential confusor, thus preventing the hearer from forming misconceptions about the similarity (or otherwise) of the entities.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
                Clarificatory comparison
            Familiarity-based
Objective: distinguish
Comparator: known. Objective: better understanding
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Definition: Familiarity-based
• A familiarity-based comparison is a uni-focal comparative text which occurs within a description of a focused entity, and which draws the hearer's attention to the similarities and/or differences between the focused entity and another entity with which the hearer is familiar, in order to allow the hearer to form a conceptual model of the focused entity more easily.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
                Clarificatory comparison
            Familiarity-based
                Like-entity comparison
Objective: distinguish
Comparator: known. Objective: better understanding
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Like-entity Comparison
Sheep, are hollow-horned ruminants belonging to the genus Ovis, suborder Ruminata, family Bovidae. Similar to goats, sheep differ in their stockier bodies, the presence of scent glands in face and hind feet, and the absence of beards in the males. Domesticated sheep are also more timid and prefer to flock and follow a leader. (Grolier Encyclopedia)
All spiders are alike in some ways. Spiders have eight legs. Their bodies have two parts. Some people think that spiders are insects. But insects have six legs, and their bodies have three parts. Spiders and insects are two different kinds of animals. (National Geographic Encyclopedia K-2)
Definition: Like-entity
• A like-entity comparison is a familiarity-based comparison between the focused entity and a highly similar comparator entity.
Ontology of Comparisons
Comparative text
    Whole text (user-initiated)
        Bi-focal
            Direct comparison
    Partial text (system-initiated)
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
                Clarificatory comparison
            Familiarity-based
                Illustrative comparison
                Like-entity comparison
Objective: distinguish
Comparator: known. Objective: better understanding
Comparator: potential confusor. Objective: 1. misconception prevention; 2. express uniqueness
Illustrative Comparison
Tachyglossus aculeatus, found in many habitats across Australia and Tasmania, is 35 to 53 cm long and has spines like a hedgehog's. (Encyclopedia Britannica)
They are about the size of a large cat and have long, bushy tails, a shaggy brown coat, and large ears. (Aye-aye, Encarta Encyclopedia)
Slightly larger than chinchillas, the mountain viscachas have long, rabbitlike ears and a long squirrel-like tail. (Encarta Encyclopedia)
Definition: Illustrative
• An illustrative comparison is a familiarity-based comparison whose purpose is to enhance the hearer's understanding of an attribute of the focused entity, by gauging the value for that attribute against the value of the same attribute for another entity which the hearer is familiar with.
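Gauging one attribute value against another can be sketched as below. This is a rough illustration only: the function name gauge, the 10% tolerance and the three output phrases are assumptions made here, not part of the definition. The lengths 5.25 m and 3.75 m are borrowed from the crocodile and alligator example later in this talk purely as sample numbers, assuming for illustration that the hearer is familiar with the alligator.

```python
def gauge(focused_value: float, familiar_value: float, tolerance: float = 0.1) -> str:
    """Describe focused_value relative to familiar_value for one shared attribute.

    Assumption: values within `tolerance` (as a fraction of the familiar
    value) are reported as roughly equal.
    """
    if familiar_value <= 0:
        raise ValueError("familiar_value must be positive")
    ratio = focused_value / familiar_value
    if abs(ratio - 1.0) <= tolerance:
        return "about the same as"
    return "greater than" if ratio > 1.0 else "less than"


# Sample numbers: crocodile length 5.25 m, alligator length 3.75 m.
relation = gauge(5.25, 3.75)
sentence = f"The crocodile's length is {relation} the alligator's."
```

A real generator would choose attribute-specific wording such as "longer than" or "the size of"; the generic phrases here only show where the comparison of values happens.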
Ontology of Comparisons
Comparative text
    Whole text
        Bi-focal
            Direct comparison
    Partial text
        Multi-focal
            Significant type comparison
        Uni-focal
            Domain-based
                Set complement comparison
                Clarificatory comparison
            Familiarity-based
                Illustrative comparison
                Like-entity comparison
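The complete ontology can be sketched as a nested dictionary, with a helper that returns the chain of categories leading to a given comparison type. The dictionary layout and the function name path_to are assumptions made for illustration; the node names and parent-child links follow the definitions given in this talk.

```python
from typing import Dict, List, Optional, Sequence

ONTOLOGY: Dict[str, dict] = {
    "Comparative text": {
        "Whole text": {
            "Bi-focal": {"Direct comparison": {}},
        },
        "Partial text": {
            "Multi-focal": {"Significant type comparison": {}},
            "Uni-focal": {
                "Domain-based": {
                    "Set complement comparison": {},
                    "Clarificatory comparison": {},
                },
                "Familiarity-based": {
                    "Illustrative comparison": {},
                    "Like-entity comparison": {},
                },
            },
        },
    }
}


def path_to(
    node: str, tree: Dict[str, dict] = ONTOLOGY, prefix: Sequence[str] = ()
) -> Optional[List[str]]:
    """Return the path from the root to `node`, or None if it is absent."""
    for name, children in tree.items():
        here = [*prefix, name]
        if name == node:
            return here
        found = path_to(node, children, here)
        if found is not None:
            return found
    return None


clarificatory_path = path_to("Clarificatory comparison")
```

Reading a path from left to right recovers the chain of definitions: a clarificatory comparison is a domain-based, uni-focal comparison that forms part of a larger text.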
Overview
• The Basic Ideas
• Language Technology
• Some Definitions
• An Ontology of Comparisons
» Comparison in Context
• Conclusions and Future Directions
Context of Discourse
(Firth 1957, Hymes 1962, Lewis 1972, Brown & Yule 1983)
[Diagram labels: Participants; Speaker; Hearer; Audience; Topic; Setting; Message form; Discourse; Purpose; Background Knowledge; Previous Discourse; Event; Time; Place; Objects; Familiarity-based comparison; Domain-based comparison]
Objects and the Hearer
• Relationship to the focused entity:
– similarity
– relatedness
– spatial proximity
• Hearer information:
– goals
– knowledge
– perceivability
Creating the Context
[Diagram labels: Liken; Distinguish; Distinguishing Characteristics; Opportunistic Links; User Knowledge]
Example - Animal Domain
[Diagram labels: Animal Domain; Cheetah; Leopard, Cat Class; Domestic Cat; Domestic Dog; Human; Liken; Distinguish; Distinguishing Characteristics]
Example - Jewellery Domain
[Diagram labels: Jewellery Domain; Jewel Instance; Supertype classes; Similar jewels; Jewels sharing some property; Other jewels in this case; Discourse History; Distinguishing Characteristics; Opportunistic Links; Liken; Distinguish]
Example
• The Alligator is a member of the Crocodylidae Family that has a broad, flat, rounded snout. It is similar in appearance to the related Crocodile. The Crocodile is a member of the Crocodylidae Family that has a narrow snout. The Crocodile is much longer than the Alligator (5.25 m vs 3.75 m). The Alligator has longer teeth on the lower jaw which cannot be seen when its mouth is closed whereas the Crocodile has one longer tooth on each side of the lower jaw which can be seen sticking up when its jaw is closed.
Overview
• The Basic Ideas
• Language Technology
• Some Definitions
• An Ontology of Comparisons
• Comparison in Context
» Conclusions and Future Directions
Conclusions
• Analysis of types of comparison
• Ontology & Definitions
• Description via comparison
– Improving hearer’s conceptual coherence
– Preventing hearer misconceptions
• Comparisons in context