Motion - Preps Verbs Pustejovsky

8/2/2019 Motion - Preps Verbs Pustejovsky


1 Introduction

1.1 Overview

1.1.1 Motivation

Natural language abounds with descriptions of motion. This is hardly surprising,

since our environment teems with slithering, swimming, flying, and cruising creatures

that navigate in a world with natural elements that can spin, flow, slide, whirl,

etc. Our experience of our own motion, and our perception of motion in the world,

together have given human languages substantial means to verbally express many

different aspects of movement, including its temporal circumstances and its spatial

trajectory and its manner. In every language on earth, verbalizations of motion can

specify changes in the spatial position of an object over time. In addition to when and

where the motion takes place, languages characterize how the motion

takes place: its path, its manner, how it was caused, etc. The path of motion, in

particular, involves conceptualizations of the various spatial relationships that an

object can have to other objects in the space it moves in.

Physicists and philosophers have long theorized about the nature of space and

spatial relationships. Newton (1995) believed that space has an existence independent

of physical objects, an absolute space that will remain always similar and immovable

(Newton 1995, Scholium 3). Objects, in his account, occupy places that are part of

absolute space, which affords a universal coordinate system with objects and their

relationships being characterizable in terms of Euclidean geometry. This sort of model

of space underlies most of the classical, pre-relativistic analyses of motion in physics.

The conception of space found in natural languages is quite different. As we shall

see, it allows for positioning objects in terms of coordinate systems, but does not have

built-in a universal, absolute coordinate system that allows for precise specification of

object positions. (Of course, languages can in many cases specify relatively precise

positions by importing absolute coordinate systems.) Typically, a figure object is

expressed as being in a particular orientation (left, east, under, etc.) with respect

to another reference or ground object and possibly a third object, the viewer

(Levinson 2003). A figure object can also be positioned in terms of topological

relations (inside, separate from, etc.) along with distance from a ground object.



When objects are positioned without a reference object, the descriptions can indicate

paths in a coordinate system (to the east or seaward). Space in language, at least

in terms of the way it is revealed by the use of closed-class terms for topology and

orientation, seems to be parasitic on objects and the relations between them, and can

be broadly described as incorporating a relational view of space.1

This book articulates a new computational linguistics approach to understanding

natural language descriptions of motion. Our goals are theoretical as well as pragmatic.

From a theoretical standpoint, we aim to provide a semantic theory of motion

expressions that can be used for computation. This sort of theory involves mapping

motion descriptions in natural language to formal representations that computers

can automatically reason with. As we shall see, such reasoning uses qualitative

models of space and time, making inferences about changes in the positions of

objects over time. From an empirical standpoint, we want our theory to mesh well

with natural language data, and so we allow our computational methods to draw on

information found in text corpora.

The ability to create computer programs that can automatically process large

corpora containing descriptions of motion has an important practical consequence:

it allows us to map from texts to data representations that can be of immense value in

everyday life. For example, a system could take a set of verbal directions for getting to

a particular place, and automatically transform it into a map with trajectories marked

on it. Narratives of journeys taken today and long ago could be parsed into logs that

record where, when, and how the various segments of the journey were carried out.

Documents involving media such as pictures and videos that have associated linguistic

annotations can be analyzed so as to retrieve spatial, temporal, and motion-related

information from collections of such media on the Web.

In this chapter, we will first discuss the challenges in linguistic analysis and

inference that are faced by such systems. After outlining our technical approach,

we highlight two key insights that inform our work. The challenges and our approach

give rise to a set of requirements that have to be met, in our view, in order to achieve

success; this constitutes a short list of desiderata. Last but not least, all research builds

on the labor of others; to situate our work and convince the reader that we have

something interesting and plausible to say, we compare and contrast our work with

previous research in linguistics on spatial prepositions and motion verbs.

1 The natural language-derived relational view of space that we have sketched is often viewed as being in conformity with Leibniz's philosophy of space. Leibniz denies the reality of an absolute space out there, arguing that space is a mental construct arising from an ordering of physical objects (like time, which he views as a mental construct arising from an ordering of events). Specifically, an object's physical location is determined by its relation to that of fixed (what we might call ground) objects: "Particularly, that place is that, which is the same in different moments to different existent things, when their relations of co-existence with certain other existentes, which are supposed to continue fixed from one of those moments to the other, agree entirely together. [ . . . ] Lastly, space is that which results from places taken together." (Clarke 1717, p. 199; my elisions indicated by [ . . . ]). Leibniz's places are thus defined in terms of relations between objects, similar to the situation revealed in natural language usage. However, natural language and its analysis have nothing to say about the metaphysical question as to whether space exists or is a mental construct.

1.1.2 Challenges

In order to interpret motion expressions in natural language, each sentence has to be

first parsed along with morphological analysis, and once a syntactic structure is

arrived at and disambiguated from among alternative parses, the predicates and

their semantic arguments have to be identified, with the latter classified in terms of

their semantic roles (the agent of the event, the theme, the manner and path of

motion, etc.). To carry this out, the system must have knowledge of the morphology

and syntax of the language, as well as the mapping between the semantic arguments

of different lexical predicates on one hand, and on the other, the syntactic constituents

(arguments) these predicates can combine with (i.e., subcategorize for) as well as

additional phrases (adjuncts) that co-occur with them in the sentence. This sort of

information is usually represented in a lexicon for the particular language. In

addition, the events must be anchored to the times they are purported to occur in.

For example, in the sentence "The Princess of Wales arrived at a Christmas concert

last night", the syntactic subject "The Princess of Wales" has to be identified as the

Theme of the predicate "arrive", "at a Christmas concert" as its Goal, and "last night"

as its Time. In addition, "last night" must be pegged to a time that is on the previous

night with respect to the speech time (which could of course be on the same day as

the speech time).
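As a rough illustration of what this analysis is meant to produce, the role assignment and time anchoring for the example sentence can be sketched in Python. The MotionFrame class, its field names, the invented speech time, and the rule that "last night" maps to the calendar day before the speech time are all assumptions made for this sketch. In particular, the rule ignores the case just noted, where the previous night falls on the same day as the speech time.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class MotionFrame:
    """Hypothetical container for a predicate and its semantic roles."""
    predicate: str
    theme: str
    goal: str
    time_expr: str


def anchor_last_night(speech_time: datetime) -> date:
    """Simplifying assumption: 'last night' is the night that began on the
    calendar day before the speech time."""
    return (speech_time - timedelta(days=1)).date()


frame = MotionFrame(
    predicate="arrive",
    theme="The Princess of Wales",
    goal="a Christmas concert",
    time_expr="last night",
)
speech_time = datetime(2019, 12, 18, 9, 30)  # invented speech time
anchored = anchor_last_night(speech_time)
```

A fuller system would derive the frame from a parse and a lexicon rather than by hand, and would need context to choose between competing anchorings.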

Here tense has to be recognized. Some languages (like the Bantu language Chi-

Bemba) have several past and future tenses; some, like Mandarin Chinese, do not

have grammatical tense; and still others like Burmese distinguish only between

ongoing or past events and others. These apparent linguistic peculiarities (which

are in fact entirely normal for the speakers of those languages) have to be taken into

account, along with context, to situate the event with respect to the speech time.

Events also have to be ordered with respect to each other, which can be non-trivial

when events are narrated in an order different from that of their occurrence. The

results of these inferences have to be represented in terms of an inventory of temporal

relations that is drawn from some calculus that deals with orderings in time. Time

expressions must also be resolved, to calendar times where possible.
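One well-known calculus of this kind is Allen's interval algebra, which distinguishes thirteen relations between two time intervals. The sketch below computes that relation for two intervals whose endpoints are known. Choosing Allen's algebra at this point, and representing intervals as numeric (start, end) pairs, are assumptions of the illustration rather than commitments made in this section.

```python
def allen_relation(a, b):
    """Return the Allen relation holding between intervals a and b.

    Each interval is a (start, end) pair with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    # Remaining cases are the inverses of before, meets, and overlaps.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[allen_relation(b, a)]
```

Narratives rarely supply endpoints like these, which is why inference over the qualitative relations themselves matters: an event narrated second may still stand in the "before" relation to the one narrated first.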

These inferential tasks can be fairly challenging for computational approaches,

because most narratives will not explicitly date each event, and when time and

date expressions are used, they may be anaphoric, i.e., relative to times introduced

earlier in the discourse (as in "arrived on Tuesday"). Further, the inventory of

temporal relations in the calculus used must be expressive enough to capture the

distinctions between temporal relations found in any natural language; and it is also


desirable to be able to carry out efficient computations using the calculus. This

reflects an important desideratum: the semantic representations need to be expressive

enough for natural languages, but also must be amenable to inference methods that

can be used in practical systems.

Turning to spatial information, spatial references in the form of place names

(toponyms) mentioned in text must be identified and, when geographic in nature,

resolved to particular entities such as countries, mountain ranges, cities, etc., and when

construed as points, resolved to geo-coordinates where possible. This resolution

process can involve considerable disambiguation, as humans naturally tend to reuse

names when naming places as well as other entities. Spatial relationships involving

topological, orientation, and distance relations between places must be recognized.

This too can be challenging, due in part to the ambiguity of prepositions and adverbials.

The unraveling of directions, in particular, can be notoriously difficult, as any

driver navigating from others' helpful verbal directions can attest. In addition, some

languages have fairly elaborate inventories of closed-class terms for representing spatial

relations. For example, Talmy (2000) cites the (now extinct) Californian language

Atsugewi which has a set of suffixes appearing on the verb that mark some 50

distinctions of Ground geometries and the paths that relate to them. Some dozen of

these suffixes represent distinctions covered by the English preposition "into", which

does not itself reflect such finer subdivisions (ibid., p. 192). As with time, these spatial

relations must be represented in terms of some calculus that characterizes orderings in

space. Such a calculus must, of course, also satisfy the desideratum above.
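To make the toponym disambiguation problem concrete, here is a minimal sketch of resolving a reused place name against a gazetteer. The two-entry gazetteer, its approximate coordinates and populations, and the heuristic itself (prefer a candidate whose country is mentioned in the context, otherwise take the most populous one) are all assumptions chosen for illustration; practical resolvers draw on much richer evidence.

```python
# Toy gazetteer; coordinates and populations are approximate, for illustration only.
GAZETTEER = {
    "Paris": [
        {"country": "France", "lat": 48.86, "lon": 2.35, "population": 2_100_000},
        {"country": "United States", "lat": 33.66, "lon": -95.56, "population": 25_000},
    ],
}


def resolve_toponym(name, context_words):
    """Pick a gazetteer entry for name, or None if the name is unknown.

    Heuristic (an assumption of this sketch): prefer a candidate whose country
    is mentioned in the context; otherwise fall back to the most populous one.
    """
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    context = " ".join(context_words).lower()
    for entry in candidates:
        if entry["country"].lower() in context:
            return entry
    return max(candidates, key=lambda entry: entry["population"])
```

With no disambiguating context, "Paris" resolves to the entry in France; a mention of the United States nearby switches it to the other candidate.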

The above inferences are just prerequisites for interpreting motion expressions.

Once the events are anchored to times, and the objects participating in the events are

located with respect to other objects in terms of spatial relations, motion events have

to be analyzed. In particular, information from the lexicon such as the class of the

motion verb must be brought to bear on the analysis; for example, "run" is a manner-of-motion

verb, while "arrive" is a path verb. This will allow the system to characterize

motion events in terms of the event or situation involved in the change of

location, the object that is undergoing movement (the figure), the region (or path)

traversed through the motion, a distinguished point or region of the path (the

ground), the manner in which the change of location is carried out, and the medium

through which the motion takes place. Once the motion is grounded in this way by

linguistic analysis, qualitative reasoning tools must operate on the underlying representation,

allowing inferences to be made. Maps and other visualizations that track

the movements of entities may also be generated from the representation.
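The components of a motion event listed above can be pictured as slots in a simple record. The sketch below is one possible encoding, not the representation developed later in the book: the MotionEvent class, the analyze function, and the two-entry lexicon are hypothetical names, and only the class assignments for "run" and "arrive" come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lexicon; only these two class assignments come from the text.
VERB_CLASS = {"run": "manner", "arrive": "path"}


@dataclass
class MotionEvent:
    verb: str
    verb_class: str
    figure: str                    # the object undergoing movement
    ground: Optional[str] = None   # distinguished point or region of the path
    path: Optional[str] = None     # region traversed through the motion
    manner: Optional[str] = None   # how the change of location is carried out
    medium: Optional[str] = None   # what the motion takes place through


def analyze(verb, figure, ground=None):
    """Build an event frame, filling only what the lexicon licenses."""
    verb_class = VERB_CLASS.get(verb, "unknown")
    event = MotionEvent(verb=verb, verb_class=verb_class, figure=figure, ground=ground)
    if verb_class == "manner":
        event.manner = verb  # a manner verb itself names the manner
    return event
```

Slots left as None are simply unspecified by the sentence, which is the normal case: most motion descriptions leave path, manner, or medium implicit.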

1.1.3 Approach

These requirements present a set of formidable problems for automatic interpretation

of motion expressions in language. However, writing in the second decade of the




21st century, we believe computational approaches have started to address these

challenges. The goal of our book is to flesh out a computational approach, addressing

for the first time in a systematic manner the integration of the language of motion

with qualitative reasoning. This integration is evaluated in terms of the desideratum

above, discussed in Chapters 3 and 4, highlighting gaps and outstanding problems.

We also indicate along the way, in Chapter 5, the performance accuracies of practical

systems.

Our approach integrates the linguistic conceptualizations with the formal

methods, mapping one to the other in the context of natural language processing.

Our approach is empirical, driven by instances of language use found in text collections

(or corpora), especially the newsletters, travel blogs, route directions, etc., found

on the Web. In terms of methodology, these corpora are first annotated by humans

with features reflecting the kinds of linguistic distinctions and analyses mentioned

above. Computers then mine the annotated corpora to learn automatically how to

reproduce the annotations, using a variety of machine-learning tools. These annotations

are then mapped to the representations used by the formal models, allowing

reasoning to be carried out over motion information captured from natural language.

Throughout, the goal of satisfying the above desideratum is addressed to the extent

possible. The details of this methodology are described in Chapter 5.

The automatic systems that result from training on the annotated data offer both a

working embodiment of the theory and the modularity that it defines, as well as

practical tools that can interpret motion expressions in language and generate

visualizations including maps and sketches. From a theoretical standpoint, this

methodology allows linguistic theories to be tested empirically, both in terms of the

breadth of their applicability when faced with actual language use, as well as the

precise linguistic representation that should result for each example. This test also

involves measuring the reliability of humans in terms of the annotations that they

produce. In practical terms, the approach results in systems with a text-to-sketch

capability that can display tracks on a map of where a moving object has been at

particular times. For example, given a biker's travel blog as input, a map with tracks

could be generated as output. The resulting systems can be evaluated and compared

with each other, stimulating in turn the development of new and better methods.

In a nutshell, we offer an integrated perspective on how language structures

concepts of motion, and how the world shapes the way in which motion is linguistically

expressed. The book's approach is two-pronged: analysis of the details of

language use in different contexts (based on the exploitation of linguistic corpora),

along with theoretical modeling and formal reasoning (based on qualitative

representations).

While there has been a great deal of linguistics research on the semantics of motion

verbs as well as locative constructions, and considerable research on qualitative spatial

reasoning, there has been little interdisciplinary effort on trying to connect these two


fields in a systematic way. This is the first book, we believe, to analyze concepts of

motion in language while integrating these two fundamental points-of-view.

In the rest of this chapter, we outline two key insights that inform our approach.

After discussing our desiderata, to further situate our approach, we differentiate our

framework from other work in linguistics, as well as compare our classifications and

semantics for motion with other relevant approaches.

1.2 Key insights

1.2.1 Spatial abstractions

One of the key insights from prior research has to do with the types of conceptualization

needed to understand spatial language, e.g., Miller and Johnson-Laird (1976), Herskovits

(1986), Talmy (1983, 2000), among others. For example, research by Talmy (1983, 2000)

has characterized various primitive templates or schemas for representing motion. In a

description like (1), a complex spatial scene is abstracted as a geometric point (the

figure) moving towards another point (the ground) for a bounded temporal extent.

Likewise, a moving object may be described as a point moving along a path that is a line

(2), or as a line moving coaxially along the linear path (3).

(1) The ball rolled toward the lamp for 10 seconds.

(2) The ball rolled across the railway bed.

(3) The trickle flowed along the ledge.

The idealization is such that the speaker is able to abstract away from irrelevant

details such as the length or orientation of the path, representing each spatial scene

using a schema, and the hearer in turn is able to recreate the scenes from the schema.2

Talmy points out that these representations do not rely on Euclidean geometry and

the properties of metric spaces, emphasizing instead topological relations that remain

invariant irrespective of changes in sizes, distances, and shapes of the objects. He also

points out that while the expressions for the geometries of figure objects tend to be

limited in variety, the geometries of ground objects, by contrast, are less constrained

and vary considerably with the language, including bounded planes (e.g., the bike

sped across the field/around the track), cylindrical forms (the bike sped through

the tunnel), a wide variety of different types of enclosures (I crawled out the

window, I ran in the house), etc.

2 The use of such intuitive geometries raises the question as to whether the points being idealized are in fact mathematical points. After all, natural language does not typically construe points in space or time as being dimensionless; instead, they are all conceived as having extent.

A related set of findings has to do with the differences across languages in the way

one can specify a figure object as being in a particular orientation (left, east,

under, etc.) with respect to another reference or ground object and possibly a third

object, the viewer. Studies of speakers across a wide variety of languages have

revealed a basic inventory of three types of geometric coordinate systems (frames

of reference) whose types are unevenly distributed, along with a variety of idiosyncratic

instantiations, across languages (Levinson 2003). The human ability to refer to

and pick out objects in space relies on these particular frames of reference. These are

discussed in more detail in Chapter 3.

While understanding spatial descriptions appears to rely on interpreting such

topological and geometrical relationships, it is important to note that it does not

require precise geometries. Humans, after all, communicate successfully by and large

without specifying the relatively exact (e.g. GPS) positions of objects and their shapes.

We are able to describe and understand fairly elaborate motions, without needing to

drill down into equations that characterize the physical motions signaled by these

verbs. The use of imprecise and often incomplete qualitative geometric descriptions

(instead of quantitative ones such as specifying the coordinates and shapes of every

object) allows human communication to be highly efficient. Our communication

relies on a rich commonsense model of the world that has proved sufficient for

humans to survive and evolve until now.

In turn, this fact has hardly gone unnoticed in artificial intelligence research.

Having an artificial agent reason qualitatively allows for reasoning to be more

efficient in some situations, since abstracting away from numerical details allows

the agent to focus on more compact representations that isolate just the relevant

information needed to solve a particular problem. AI approaches to qualitative

reasoning have developed a rich set of geometric primitives for representing time,

space (including distance, orientation, and topological relations involving notions

such as contact and containment), and together with those, motion. The results of

such research have yielded a wide variety of spatial and temporal reasoning logics

and tools. Qualitative Spatial Reasoning has been successfully applied to military

sketch maps (Forbus et al. 2003), meteorology (Bailey-Kellogg and Zhao 2004), robot

navigation (Moratz and Wallgrün 2003), integration of sensor information for

environmental monitoring (Jung and Nittel 2008), etc.

In contrast, the primitives specified in the linguistic approaches above are not

expressive enough for formal computational reasoning. To address this gap, in

Chapter 3, we map the geometric and topological primitives and calculi used in

qualitative reasoning in a systematic manner to natural language. Our work thus

allows for more formal and expressive models to be constructed for linguistic

representations. Our innovations are similar in spirit to Miller and Johnson-Laird

(1976) and Johnson-Laird (1977), who argued that understanding of language involves

translating a sentence into an executable program. We are thus committed to

providing computationally expressive ways of representing motion expressed in

natural language, in particular subscribing to the idea that understanding motion


in language involves assembling and executing programs. However, the programming

framework we use, discussed in Chapter 4, involves precise formal logics

developed in computer science, rather than Miller and Johnson-Laird's early and

somewhat ad hoc procedural semantics.3 In section 1.3.3, we compare our approach

to the semantics of motion with several other approaches.

1.2.2 Motion semantics: action- versus location-based predicates

Motion verbs, according to Talmy (1985, 1991, 2000), occur in syntactic constructions

that express several semantic components: (i) a Figure object that moves with respect

to (ii) a Ground object, along a spatial region, called (iii) the Path. There are also two

additional components (called co-events, in keeping with his view that they are

construable as distinct events): (iv) the Manner of the movement and (v) the Cause

that is responsible for the motion.

A further distinction that Talmy makes (one that is largely borne out by cross-linguistic

research) is that languages have two distinct strategies for expressing

concepts of motion. In satellite-framing, commonly used in English and other

Germanic languages, as well as Slavic languages, also called manner-type languages,

the main verb conflates (i.e., contains a morpheme that encodes) the manner or

cause of motion, while path information is expressed in satellites.4 Here a satellite is

any constituent other than a noun-phrase or prepositional-phrase complement that

is in a sister relation to the verb root (Talmy 2000, p. 102), and includes particles,

affixes, etc.5 Thus, in (4a), the language represents the motion as an action of

bouncing, with "slid"/"rolled"/"bounced" expressing the manner of the motion,

and the path being expressed by the satellite down.6 In contrast, in verb-framing,

found in Turkish, Romance, Semitic, and other languages, also called path-type

languages, the verb conflates the path, whereas the manner is optionally expressed

by adjuncts, as in the Spanish (4b).

3 The procedural semantics of Miller and Johnson-Laird (1976) is based on primitive routines such as finding in a search domain an entity referred to by a natural language description, testing if the particular properties predicated by the description hold of it, and acting so as to make the description be true of the entity.

4 Such manner-of-motion verbs are extremely common in English, as attested by the long list of such verbs in the verb classification of Levin (1993).

5 Talmy (1991) characterized satellites in more detail: "The satellite, which can be either a bound affix or a free word, is thus intended to encompass all of the following grammatical forms, which traditionally have been largely treated independently of each other: English verb particles, German separable and inseparable verb prefixes, Latin or Russian verb prefixes, Chinese verb complements, Caddo incorporated nouns and Atsugewi polysynthetic affixes around the verb root." (Talmy 1991, p. 486).

6 Likewise, in "the napkin blew off the table", the verb conflates the Cause of the motion, with the path being expressed by the satellite "off". In addition to Manner/Cause and Path conflation, Talmy (1985) points out that verbs can also conflate Figure information, as in the Atsugewi verb root -caq-, which means "for a slimy lumpish object (e.g., a toad, a cow-dropping) to move/be located".




(4a) The rock slid/rolled/bounced down the hill.

(4b) La botella entró a la cueva (flotando)

the bottle moved-in to the cave (floating)

"The bottle floated into the cave."

Here the language represents the motion as a change of location. Note that there

are exceptions; English has Romance-derived verbs like enter, arrive, ascend

etc. that encode path. As Talmy (1985) points out, the (small number of) verbs in

English that conflate Path are mostly Romance borrowings.

Now, various scholars including Talmy have recognized that this classification is

not quite disjoint. For example, in languages involving serial verb compounds, like

Lahu, Thai, and Mandarin Chinese (Slobin 2004), it is unclear which one is the main

verb; and in Native American language families such as Hokan and Penutian, path

and manner morphemes together form part of a verb complex, with neither one

being classifiable as a main verb or satellite (Delancey 1989). Also, in the Australian

language Jaminjung, motion is expressed by one of five core verbs combined with

preverbs that encode both path and manner with neither one being of subordinate

status (Schultze-Berndt 2000). All such languages have been designated by Slobin

(2004) as belonging to a third category instantiating equipollent-framing, where

both manner and path are equally salient. In response, Talmy (2009) has accepted

that cases of equipollent framing definitely exist. For example, based on a set of

linguistic criteria for what constitutes a main verb, he points out that in the case of

Mandarin serial verbs, the verb in the first position is clearly the main verb, while the

verb in second position is sometimes viewed as subordinate, and sometimes as a main

verb, in the latter case demonstrating equipollent framing. However, such instances,

he shows, are relatively rare.

Given this qualified but fundamental linguistic distinction,7 the semantic representations

for verbs can involve two classes of logical predicates: action-based

predicates (e.g., manner-of-motion verbs found in satellite-framing patterns, like

"bike", "drive", "fly", etc.) and location-based predicates (e.g., for path verbs found in

verb-framing patterns, such as "arrive", "depart", etc.). Action-based predicates do

not make reference to distinguished locations, but rather to the assignment and

reassignment of locations of the object, through the action. Since the location-based

predicates focus on points on a path, we view them as making reference to a

distinguished location, and the location of the moving object is tested to check

its relation to this distinguished value.
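The contrast between the two predicate classes can be sketched as follows. The function names, the use of integer positions on a line, and the particular truth condition given for "arrive" are simplifying assumptions for illustration; the formal treatment is the one referred to Chapter 4.

```python
def action_based(start, step, n_steps):
    """Action-based sketch (e.g. 'bike'): reassign the location n_steps times.

    No location is singled out; the result is just the trace of assignments.
    """
    trace = [start]
    for _ in range(n_steps):
        trace.append(trace[-1] + step)
    return trace


def location_based_arrive(trace, goal):
    """Location-based sketch (e.g. 'arrive'): test a trace against a
    distinguished location.

    Assumed truth condition: the object starts away from the goal and
    ends at it.
    """
    return trace[0] != goal and trace[-1] == goal


trace = action_based(start=0, step=2, n_steps=3)
```

The first function never mentions a goal, while the second does nothing but compare locations with one. A sentence such as "She biked to the office" would need both, which is the situation footnote 7 describes for equipollent languages.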

7 For equipollent languages, our semantic representation will thus have to make use of a combination of action- and location-based predicates.

The predicate semantics makes use of Dynamic Interval Temporal Logic (DITL)

from Pustejovsky and Moszkowicz (2011), which in turn blends dynamic logic (Harel

1984) with a first-order linear temporal logic (Allen, 1984; Moszkowski, 1986; Manna

and Pnueli, 1995; Kröger and Merz, 2008). DITL is a hybrid, first-order dynamic logic

where events are modeled as either dynamic processes or static situations. Here event

expressions refer to simple or complex programs, and states refer to preconditions or

post-conditions of these programs. Assignment-of-location is modeled as an atomic

program, and change-of-location is modeled as a compound program, whose

relation is determined compositionally by the relations denoted by its atomic parts.

This approach to modeling the semantics of motion is discussed in more depth in

Chapter 4.
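The idea that a compound program's relation is built from the relations of its atomic parts can be illustrated with the standard relational semantics of dynamic logic, in which a program denotes a set of (before, after) state pairs and sequencing is relational composition. The sketch below is not DITL itself: it omits the temporal dimension entirely, and the three-location state space and all function names are hypothetical.

```python
# Hypothetical state space: a state is just the object's current location.
LOCATIONS = {"home", "park", "office"}


def assign(target):
    """Atomic program loc := target, as a set of (before, after) pairs."""
    return {(before, target) for before in LOCATIONS}


def check(condition):
    """Test program: stays in the same state, and only where condition holds."""
    return {(s, s) for s in LOCATIONS if condition(s)}


def seq(first, second):
    """Sequential composition, computed as relational composition."""
    return {(s, u) for (s, t) in first for (t2, u) in second if t == t2}


def change_of_location(source, goal):
    """Compound program: check the object is at source, then reassign it to goal."""
    return seq(check(lambda s: s == source), assign(goal))
```

Here change_of_location("home", "park") denotes a single transition from "home" to "park", obtained purely by composing the relations of its two atomic parts.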

There are obvious subtypes of action-based predicates, due, for example, to the

type of vehicle involved in the motion (bike, drive, etc.). Just as important are

aspects of manner defined in terms of topological constraints between the objects

throughout the motion. Consider a figure object that is moving with respect to a

ground object. Here we can consider four subclasses, based on the orientation of the

figure with respect to the ground, whether the topological relation is constant

throughout the process of motion, whether it involves all of the figure or only a

part thereof, and characteristics of the medium in which the figure moves.

Similarly, location-based predicates can be differentiated according to how many

formal qualitative dimensions are involved in their definitions. For example, the

simplest path is merely an implicit line associated with a distinguished end or

start point, as in the case of the topological path verbs "arrive", "exit", "take off",

etc. This can be further refined to make reference to orientation or direction, as

in the orientation path verbs climb and descend, metric information, as in

the topometric verbs approach, near, etc., or a combination of both, as in the

topometric orientation expressions just below or just above.
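One way to picture this classification is as a combination of optional qualitative dimensions layered on a topological base. In the sketch below, the class names and example expressions are taken from the paragraph above, while the encoding as feature sets and the names FEATURES, CLASS_NAME, and path_class are assumptions of the illustration.

```python
# Which extra qualitative dimensions each expression draws on (examples from the text).
FEATURES = {
    "arrive": set(),
    "exit": set(),
    "take off": set(),
    "climb": {"orientation"},
    "descend": {"orientation"},
    "approach": {"metric"},
    "near": {"metric"},
    "just below": {"orientation", "metric"},
    "just above": {"orientation", "metric"},
}

# Name of the subclass for each combination of dimensions.
CLASS_NAME = {
    frozenset(): "topological",
    frozenset({"orientation"}): "orientation",
    frozenset({"metric"}): "topometric",
    frozenset({"orientation", "metric"}): "topometric orientation",
}


def path_class(expression):
    """Look up the location-based subclass of a listed expression."""
    return CLASS_NAME[frozenset(FEATURES[expression])]
```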

In this book, we will examine how these categories and subcategories of motion

predicates are expressed through qualitative spatial and temporal models. In the next

section, we critically assess, in the light of our approach, prior work on the semantics

of spatial prepositions, verb classification, and motion verb semantics.

1.3 Desiderata

The challenges we identified earlier can only be met if we constrain our approach to

meet some strict requirements. These have to be borne in mind when we assess any

technical approach, both ours as well as that of other research. We list these now,

while delving into them further throughout this chapter and book.

1. As mentioned earlier: the semantic representations need to be expressive

enough for natural languages, but also must be amenable to inference methods

that can be used in practical systems.




2. The semantic theory must be denotational, i.e. provide a mapping in terms of a

model of things in the world.

3. The semantic analysis must be compositional, i.e., the meaning of sentences

must be built up systematically from the meanings of the constituent phrases

and in turn the lexical elements in them, in tandem with the syntactic operations

that assemble them.

4. The representations used have to support qualitative reasoning.

5. The systems built must be evaluated to be accurate and efficient enough to

support practical applications.

1.4 Theoretical background

1.4.1 Spatial prepositions

1.4.1.1 Classic studies There has been considerable prior research on motion verbs

(e.g. run), spatial prepositions (across), adjectives (narrow), adverbs (far),

nouns (lake), proper names (San Francisco), and other locative constructions.

We focus here on spatial prepositions and adpositions. Two key issues emerge from

the prior research. The first issue is the nature of the spatial representations involved,

and the second issue is what exactly differentiates the different senses to produce

polysemy. Underlying them both is a third issue, the characteristics and properties of

a theory of meaning.

Prepositions are traditionally classified as either directional or locative (Miller and

Johnson-Laird 1976; Herskovits 1986; Zwarts and Winter 2000). Directional ones involve

a path and/or movement, and include across, around, from, into, onto, and

to. Locative prepositions are sub-classified into projective ones, which involve a point-

of-view (e.g. above, behind, below, beside, in front of, over, under) and

non-projective ones (e.g. at, between, in, inside, on, outside, near).

The work of Miller and Johnson-Laird (1976) represents a significant advance in the

modeling of the semantics of spatial prepositions. Consider their analysis of in as in (5):

(5a) a city in Sweden

(5b) the coffee in the cup

(5c) the spoon in the cup

(5d) the scratch in the surface

(5e) the bone in the leg

In (5a,b), the figure is entirely enclosed within the ground object, whereas in (5c)

part of the figure need not be enclosed in the ground. In (5b,c), the ground object is

conceptualized as some form of container. In (5d,e), the figure is entirely enclosed in the ground object, with (5d) dealing with two-dimensional (2D) objects and (5e)

dealing with three-dimensional (3D) objects. To handle these cases, Miller and


Johnson-Laird develop a semantic theory of parthood and topological relations, i.e.

mereotopology. In their account, in has a common meaning in the above uses: the

figure has a part that is totally inside the ground object.8 Providing a theory of

mereotopology, built, say, on primitive notions of connection and parthood, is essential, we believe, to characterizing spatial relations. Such a theory will be

discussed more in Chapter 2 and formalized in Chapter 3.
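As an informal illustration of the common meaning proposed for in, the following Python sketch models regions as sets of grid cells. The cell-set encoding, the example regions, and the names has_part_inside and entirely_enclosed are assumptions made for this example; they are not part of Miller and Johnson-Laird's theory or of the formalization developed later in the book.

```python
# Regions are modeled as frozensets of (x, y) grid cells. This encoding is an
# illustrative assumption, not a formal mereotopology.
Region = frozenset


def has_part_inside(figure: Region, interior: Region) -> bool:
    """True if some non-empty part of the figure lies wholly in the interior."""
    return len(figure & interior) > 0


def entirely_enclosed(figure: Region, interior: Region) -> bool:
    """True if the figure is non-empty and every part of it is in the interior."""
    return len(figure) > 0 and figure <= interior


# Hypothetical scene loosely based on examples (5b) and (5c).
cup_interior = Region((x, y) for x in range(1, 4) for y in range(0, 3))
coffee = Region((x, y) for x in range(1, 4) for y in range(0, 2))
spoon = Region((2, y) for y in range(0, 6))    # handle extends above the rim
saucer = Region((x, -1) for x in range(0, 5))  # lies below the cup
```

Under these assumptions, both the coffee and the spoon satisfy has_part_inside, but only the coffee satisfies entirely_enclosed, which mirrors the contrast between (5b) and (5c).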

Likewise, consider the uses of on in (6).

(6a) the scratch on the surface

(6b) the picture on the wall

(6c) the lamp on the table

(6d) the house on the river

(6e) the boat on the river

Miller and Johnson-Laird point out that in (6a–c), the relation is between surfaces.

In (6b), part of the figure is over a part of the ground (such as a hook), and the latter

part supports the rest of the figure. In (6c), if the table is on a rug, which is on the

floor, it is fine to say "the table is on the floor", because the region of interaction with

the floor includes the table legs. But the transitivity is limited: we cannot say in (6c)

that the lamp is on the floor. Searching the region of interaction with the floor will

not reveal the lamp.

Functional notions such as support and regions of interaction (or affordances

of objects (Gibson 1977)) are part and parcel of a theory of spatial relations; in this

book, though we will take note of their presence, we will not be formally representing

functional notions, as they presuppose a great deal of commonsense knowledge that

is difficult to acquire and represent in a general way for use in practical systems. Of

course, in specific domains, it is possible to enumerate object-specific functional

properties (including shape). For example, in their natural language-driven scene

rendering system, Coyne and Sproat (2001) associate 3D regions called spatial tags

with objects, so that the object representing daisy has a stem spatial tag and

likewise test-tube a cup spatial tag. Given the input expression the daisy is in the test tube, the graphical output has the daisy's stem inserted into the test tube's

cupped opening. A similar approach could be used to represent the meaning of (5c).

However, his daisy is in the scrapbook would presumably require an entirely

different spatial tag for daisy, begging the question of the enumeration of

domain-independent functional properties for each object.

Regarding (6d), it involves a path that is potentially ambiguous between being on

the edge of the ground object (the river) and being on the surface of the ground object

(where the surface is that part of the object that will reflect light to the eye or that can

8 In their semantic framework, the relations are between percepts of figure and ground, rather than between things in the world.




be explored by touch), with a strong preference for the former (in contrast to (6e)).

Based on this and other evidence, Miller and Johnson-Laird argue that on has two

spatial meanings: either the figure is part of the region of interaction with the surface

of the ground object, with the ground supporting the figure, or else the figure object is construed as being in a path relation with the ground object.

In subsequent research, Herskovits (1986) proposed underlying geometric mean-

ings for spatial prepositions in English involving geometric relations between figure

and ground objects; these relations are between objects construed as points, lines,

surfaces, volumes, and vectors. The preposition on in (7a), for example, involves

concepts of contiguity (the figure is next to and touches the ground object) and (as

we have seen) support (the ground object supports the figure). However, in (7b),

contrary to Miller and Johnson-Laird, she argues that support is not involved.

(7a) The book on the table.

(7b) The wrinkles on his forehead.

In addition, the objects related by a preposition must be modeled in terms of their

geometric properties, expressed as geometric functions that define characteristics of

the space occupied by the object. For example, a table is geometrically constrained to

be bounded and definite in shape, whereas water is not. Other geometric functions

include idealizations (approximations to a point, line, surface, or plane), parts (e.g.

edges, bases, surfaces, etc.), axes, volumes, projections, and what she calls good form.

For example, in (8a), good form provides the Gestalt closure on the tree such

that a bird can be contained in the space occupied by that form, shown in (8b), from

Pustejovsky (1989).

(8a) The bird in the tree.

(8b) Included-in (Part (Place (Bird)), Interior (Outline (VisiblePart (Place (Tree))))).

Turning to the issue of polysemy, Herskovits argues that (7a) above expresses an

ideal meaning of on, whose sense is shifted in (7b). Senses can also shift due to a

pragmatic degree of tolerance, i.e. to handle fuzzy cases of (7a) where the book is on a table cloth which is in turn on the table. As a result, while an ideal meaning is semantic,

the actual senses in use are produced as pragmatic alterations to the ideal meaning.

From the standpoint of a theory of meaning, Herskovits' account rejects the notion

of a compositional theory. Further, although there is a sketch of a mereotopology,

there is no precise theory of how exactly the pragmatic alterations occur, resulting in

a lack of applicability to computational processes.

1.4.1.2 Cognitive linguistics Along with Herskovits' work, there has been a great

deal of activity in cognitive linguistics on the semantics of spatial prepositions. Here we will consider some of the core work from this area, while deferring a discussion of

Jackendoff's contributions to the next section.


One of the fundamental tenets of this rather diverse field is that human concepts

are embodied, i.e., the concepts we have access to and the nature of the reality we

think and talk about are a function of our embodiment (Evans et al. 2007, p. 7).

Following (Johnson 1987; Lakoff and Johnson 1980; Brugman 1981; Mandler 2004; Evans, op. cit.), basic topological concepts like contact and inclusion (in the spatial

sense of enclosure) are formed through the infant's interaction with objects. In this

account, it is the schema of the container which underlies both the enclosure or

inclusion sense of in in (9a) and its metaphorical extension in (9b).

(9a) The cat is in the house.

(9b) The cat is in trouble.

The nature of polysemy is a contentious issue in cognitive linguistics. Consider the

preposition over, which has been the subject of considerable discussion. The classic

account of Lakoff (1987) makes fine-grained sense distinctions for the preposition

based on characteristics of the figure and ground object. In (10a), the landmark (i.e.,

ground object) is an extended object, but not so in (10b) (examples from Tyler and

Evans 2001):

(10a) The helicopter hovered over the ocean.

(10b) The hummingbird hovered over the flower.

Likewise, in (11a) there is contact with the wall, whereas there is not in (11b); in (11c), there is covering and occlusion of the ground. These differences would

warrant, in the classic account, different senses for over.9

(11a) The boy climbed over the wall.

(11b) The tennis ball flew over the wall.

(11c) Joan nailed a board over the hole in the ceiling.

(11d) The heavy rains caused the river to flow over its banks.

In general, this sort of argument by appeal to arbitrary spatial distinctions proliferates

senses in a somewhat unprincipled manner. There is no underlying mereotopological theory, and hence no way of building up spatial concepts from more primitive ones.

Researchers have struggled to constrain the number of senses, using (quite sensi-

bly) dictionaries, lexical resources, and various theoretical criteria. For example,

Tyler and Evans (2001) take their cue from Herskovits and propose a proto-sense

(or primary sense) of every preposition that they argue is the diachronically earliest

sense;10 the proto-sense of over means above, except that, unlike above, there is

potential contact with the ground. Notably, this sense does not contain path

9 Examples in (11) from Tyler and Evans (2001, pp. 728, 732, 757).

10 Postulating the diachronically earliest sense as more basic in every case does not seem at all correct

given modern usage.




information. The above and across interpretation in (11a) and (11b), which does

include the path, is not a different sense of over, but arises in conjunction with the

meaning of the verb and the figure and ground objects. In (11c), however, a non-primary

sense of over is differentiated, as it involves the distinct spatial notion of covering.

In (11d), the sense is distinguished based on a supposedly distinct spatial

notion of excess given by a cognitive scenario of a container overflowing, with the

figure rising higher than the top of the ground object.

The Tyler and Evans proposal suffers from the same problems we observed with

Herskovits' account. Appealing to potential contact between figure and ground only

serves as a way of grouping together disjunctions. Further, (11d) does not seem to

warrant a different sense, given the contribution of the verb flow. In addition, as

Cuyckens (2007) points out, consider (12a) and (12b).

(12a) The cat jumped over the wall.

(12b) The cat jumped up on the wall.

The only syntactic difference is the preposition, but (12a) results in a different path

than (12b): the cat ends up on the wall in the latter, but on the other side of the wall

in the former. Thus over must involve a path meaning. Having said that, the

question arises as to the set of spatial properties that should be considered when

distinguishing spatial senses of a preposition. Unless these properties are drawn from

a structured domain, in particular geometric or topological domains that can be made mathematically precise, pretty much any set of spatial properties that sound

relevant might be used, since the theory has no way of evaluating them except by

arguments based on linguistic tests.

In general, the inability to find reliable criteria to differentiate word senses is also a

reflection of the lack of empirical, corpus-based methodology in the cognitive

linguistics approach. Corpus-level annotation of word senses is a well-established

task in computational linguistics, e.g. SENSEVAL-1 (Kilgarriff and Palmer 2000). In

these annotation efforts, fine-grained lexical resources such as WordNet (Fellbaum

1998), where different senses of words are grouped into synonym classes called synsets (with the classes being linked by conceptual relations such as hypernymy

and part-whole relations), have been used as sense inventories for annotating open-

class terms in large corpora. Certain senses will of course be more frequent than

others, and the more frequent ones may coincide with notions of central or more

salient meanings for a given word. (As it happens, WordNet provides a ranking of

different senses based on frequencies in the British National Corpus.) This sort of

project also has the practical benefit of dividing the problem of polysemy into those

word senses that are easy to agree on and those that aren't, focusing attention on the

ones that pose challenges, and perhaps suggesting revisions or limitations to the

sense inventory. In SENSEVAL-3 (Mihalcea and Edmonds 2004), annotators agreed

with each other almost two-thirds of the time.


Turning to the theory of meaning, cognitive linguistics is an inherently mentalistic

theory of meaning.11 In contrast, denotational theories12 are important for several

reasons: (i) Truth and reference are important for successful communication, as

work in discourse modeling, e.g. Kamp and Reyle (1993), indicates. (ii) Mentalistic theories tend not to tell us what role in understanding the things communicated

about play. As Putnam (1975) points out, a person may not have the conceptual

knowledge to tell the difference between a beech and an elm, even though the two

terms clearly refer to different things in the world. (iii) Using a logical representation

allows for logical inferences to be made, for formal properties of computation to be

studied systematically, etc. The latter property is of course of considerable interest to

computational approaches.

1.4.1.3 Jackendoff In our earlier linguistic analyses, we mentioned paths. In addition to Talmy, another cognitive linguist who provides a rich representation for paths is

Jackendoff (1983, 1990). In his theory of Lexical Conceptual Structure (LCS), the verbs

of location and motion are viewed as fundamentally spatial, with non-spatial senses

being an extension of the spatial senses. Jackendoff gives distinguished status to

places and paths in LCS.

Paths can be bounded, where the ground is the start- or end-point of the path.

Another type of path is a direction, as in (13a), where the ground object does not fall

on the path, but would if the path were extended some unspecified distance (ibid.,

p. 165). A third kind is a route, where the ground object is related to some point in the

interior of the path, as in (14a). Unlike Herskovits' account, Jackendoff's semantics

has an implicit mereotopology and is compositional. He relies on functions to

assemble meanings of words together to form meanings of phrases. A place-function

(e.g. IN, ON, INSIDE, UNDER, etc.) takes a Thing and returns a Place, while a path-

function (FROM, TO, TOWARD, AWAY-FROM, and VIA) takes either a Thing or

a Place and returns a Path. Examples of place- and path-functions are shown in the

prepositional phrase meanings in (13b) and (14b).

(13a) [John ran] toward the house.

(13b) [Path TOWARD ([Thing house])]

(14a) [The car passed] through the tunnel.

(14b) [Path VIA ([Place INSIDE ([Thing tunnel])])]
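The typing discipline described above (a place-function maps a Thing to a Place, and a path-function maps a Thing or a Place to a Path) can be sketched in Python. This is only an illustration: the class names, the render helper, and the decision to treat the listed place-functions as a closed set are assumptions of the sketch, not part of Jackendoff's LCS.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Thing:
    name: str


@dataclass(frozen=True)
class Place:
    function: str
    ground: Thing


@dataclass(frozen=True)
class Path:
    function: str
    argument: Union[Thing, Place]


# The text lists these place-functions with "etc."; the set is partial.
PLACE_FUNCTIONS = {"IN", "ON", "INSIDE", "UNDER"}
PATH_FUNCTIONS = {"FROM", "TO", "TOWARD", "AWAY-FROM", "VIA"}


def place(function: str, ground: Thing) -> Place:
    """Apply a place-function: Thing -> Place."""
    if function not in PLACE_FUNCTIONS:
        raise ValueError(f"unknown place-function: {function}")
    return Place(function, ground)


def path(function: str, argument: Union[Thing, Place]) -> Path:
    """Apply a path-function: Thing or Place -> Path."""
    if function not in PATH_FUNCTIONS:
        raise ValueError(f"unknown path-function: {function}")
    return Path(function, argument)


def render(node: Union[Thing, Place, Path]) -> str:
    """Print a structure in the bracketed notation used in (13b) and (14b)."""
    if isinstance(node, Thing):
        return f"[Thing {node.name}]"
    if isinstance(node, Place):
        return f"[Place {node.function} ({render(node.ground)})]"
    return f"[Path {node.function} ({render(node.argument)})]"


toward_house = path("TOWARD", Thing("house"))
through_tunnel = path("VIA", place("INSIDE", Thing("tunnel")))
```

Here render(toward_house) reproduces the bracketing of (13b), and through_tunnel nests a Place inside a Path in the manner of (14b).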

11 Mentalistic, or representational theories of meaning, are concerned mainly with understanding the relation between linguistic expressions and things in the speaker's mind, namely, explaining what goes on in people's minds when they use language.

12 Denotational theories of meaning (i.e. as found in model-theoretic semantics) are concerned mainly with the correspondence between expressions and things in the environment, and thus this enterprise aims at a theory of truth and reference. Such theories represent the environment in terms of a formal model for the denotation of expressions.




While the semantics of LCS is obviously compositional, it is not intended to be

truth-conditional, and is thus in keeping with cognitive semantics precepts. Since it

has no basis in logic, Conceptual Structure cannot be used to make logical inferences,

and as such cannot account for entailments between sentences.13

Another drawback is that the primitives corresponding to prepositions, such as IN, ON, TOWARD,

INSIDE, etc. are not further elaborated to support reasoning; they are functors in a

compositional syntax, but are not differentiated from each other in terms of semantics.

Finally, unlike, say, the work of Talmy (2000), the geometry used is far too

abstract to be relevant to computational modeling of spatial reference and motion.

1.4.1.4 Vector representations It must be acknowledged that Jackendoff's ontology

of paths and places and the differentiation between place- and path-functions

constitute one of the more expressive accounts of the semantics of spatial prepositions offered within an entirely compositional semantics. His basic notions of paths

have been further elaborated by others, most notably within a denotational semantics

by Zwarts (2003). In the latter's work, a spatial preposition denotes a set of paths,

where a path is defined as a continuous function from the real interval [0, 1] to points

(or regions) in space. The denotation of a prepositional phrase (PP) of the form into

the room is a set of paths whose end-point is inside the room. Zwarts associates

events with paths via a function that takes an event and returns its path. Accordingly,

the denotation of a verb phrase (VP) of the form enter the room is a set of events

such that (only) the end-point of the event's path is inside the room.
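A small Python sketch may help make this path-based denotation concrete. It is a simplification under stated assumptions: space is two-dimensional, the room is a hypothetical open rectangle, paths are straight lines, and the VP test checks only that the end-point is inside, leaving the "(only)" qualification unmodeled. None of the function names come from Zwarts (2003).

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]
PathFn = Callable[[float], Point]  # a path: a function from [0, 1] to points


def straight_path(start: Point, end: Point) -> PathFn:
    """Build a straight-line path, which is continuous by construction."""
    def path(t: float) -> Point:
        return (start[0] + t * (end[0] - start[0]),
                start[1] + t * (end[1] - start[1]))
    return path


def inside_room(point: Point) -> bool:
    """Hypothetical room: the open rectangle (0, 4) x (0, 3)."""
    x, y = point
    return 0.0 < x < 4.0 and 0.0 < y < 3.0


def in_into_denotation(path: PathFn, inside: Callable[[Point], bool]) -> bool:
    """A path is in the PP denotation if its end-point p(1) is inside."""
    return inside(path(1.0))


@dataclass(frozen=True)
class Event:
    label: str
    path: PathFn


def trace(event: Event) -> PathFn:
    """The function that takes an event and returns its path."""
    return event.path


def in_enter_denotation(event: Event, inside: Callable[[Point], bool]) -> bool:
    """Simplified VP test: end-point only; the '(only)' clause is not checked."""
    return in_into_denotation(trace(event), inside)


walk_in = Event("walk_in", straight_path((-2.0, 1.0), (2.0, 1.0)))
walk_past = Event("walk_past", straight_path((-2.0, 1.0), (6.0, 1.0)))
```

With these definitions, walk_in falls in the sketched denotation and walk_past does not, since the latter's path ends beyond the room.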

In support of this theory, relations like into, inside etc. are based on an

underlying model of vectors14 (Zwarts and Winter 2000). Here, the preposition

inside is treated as a function which maps a set of points representing the ground

object A to a set of vectors whose start-points are on the boundary of A and whose

end-points are internal to A. Since there may be multiple vectors from different

points on the boundary to the particular end-point, only the shortest vector is

considered. The set of points representing an object is treated as convex,15 in keeping

with our use of prepositions like inside to conceptualize even non-convex ground

objects as being convex. As Zwarts and Winter point out, the ball is inside the bowl

is compatible with a situation where the ball is sitting on the bottom of an open bowl,

where the ball actually occupies a space that is disjoint from that of the bowl.

The preposition outside is similar, except that the externally closest vectors are

involved, i.e. the shortest vectors that start at the boundary of A and end at points

13 However, a truth-conditional semantics for Conceptual Structure has been demonstrated by Zwarts and Verkuyl (1994), who recast it as a many-sorted first-order logic.

14 Other researchers have also explored vectors, including Talmy (2000), Bohnemeyer (2003), O'Keefe (2003), and Carlson et al. (2003). However, they have not concerned themselves with building up a compositional semantics for spatial language based on vectors.

15 A set of points is convex if the line segment joining any pair of points in the set lies entirely in the set.


not belonging to A. As for the preposition on, its meaning is a set of vectors each of

whose end-points is outside the set of points corresponding to the ground object, but

whose length is less than some small number, so that distance between figure and

ground is near zero.

Although the theory of Zwarts and Winter (2000) does provide an elegant

compositional semantics for PPs, including those modified by measure phrases, it

can be faulted on several grounds. For one thing, though there are vectors and point

sets, there is no explicit mereotopology. The invocation of metric notions of distance

to represent topological relations is somewhat counter-intuitive. A related failing is

that the theory does not distinguish between in and inside, or between at and

on, and the case of (5c) mentioned earlier, where there is a part of the figure that is

outside the ground object, is ignored. Finally, carrying out formal reasoning using

these vector models is still an open question. In short, the theory does not provide an

adequate grounding in a spatial semantics that can be used for reasoning.
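To make the vector treatment of inside discussed above more tangible, here is a minimal Python sketch. It assumes a two-dimensional space and a ground object A that is an axis-aligned rectangle (hence convex); the function names and the rectangle encoding are inventions of this example rather than anything in Zwarts and Winter (2000).

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]
Vector = Tuple[Point, Point]               # (start-point, end-point)
Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def inside_vector(ground: Rect, p: Point) -> Optional[Vector]:
    """Shortest vector from the boundary of the ground to interior point p.

    Returns None when p is not strictly inside the ground.
    """
    x_min, y_min, x_max, y_max = ground
    x, y = p
    if not (x_min < x < x_max and y_min < y < y_max):
        return None
    # For a rectangle, the closest boundary point is the perpendicular foot
    # on one of the four sides.
    candidates = [(x_min, y), (x_max, y), (x, y_min), (x, y_max)]
    start = min(candidates, key=lambda b: math.dist(b, p))
    return (start, p)


def vector_length(v: Vector) -> float:
    """Euclidean length of a vector given as (start-point, end-point)."""
    return math.dist(v[0], v[1])


bowl = (0.0, 0.0, 4.0, 2.0)             # hypothetical convex ground object
ball_vector = inside_vector(bowl, (1.0, 1.5))
```

The sketch covers only inside; the reliance on metric distance to recover what is intuitively a topological relation is visible in the use of math.dist.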

1.4.1.5 Assessment In summary, then, the prior theoretical research, while

providing insightful discussions of the semantics of spatial prepositions, has made

assumptions (such as those of cognitive linguistics) that are untenable in a computa-

tional approach, and has also largely ignored evidence from corpus-based annotation

efforts at distinguishing senses in context. While compositional treatments of prepo-

sitional meaning have flourished, the question of what underlying spatial primitives

to rely on has not thus far been tied to those available in qualitative reasoning

systems. In Chapter 3, we explore topological and geometric representations that

can be used for expressing prepositional meaning in qualitative reasoning systems.

1.4.2 Motion verbs

1.4.2.1 Langacker As with spatial prepositions, there has been a fair amount of

research on the semantics of motion verbs. We had earlier discussed the influential

work of Talmy and Jackendoff. Another key cognitive linguist who has tackled

motion is Langacker (1987). It is not possible to do justice to his overall cognitivist philosophy here; instead, let us get down to brass tacks and examine his analyses of

motion verbs. Consider the verb enter. Langacker (1987) characterizes it as a

dynamic process, whose conceptual semantics involves, in effect, a temporally indexed

sequence of relations between the trajector (i.e. moving figure object) and the

landmark (i.e. ground object, which may or may not move). The trajector changes

from a state of being spatially OUT with respect to the landmark to a state of being

IN with respect to the landmark. From his diagrams of image-schema16 (ibid.

16 An image schema is a mental pattern that recurrently provides structured understanding of various experiences, and is available for use in metaphor as a source domain to provide an understanding of yet other experiences (Johnson 1987, pp. 24).




p. 245, figures 7.1 and 7.2), it appears that this change of state occurs over a conceived

time interval, where the process involves a sequence of an indefinite number of

component states (ibid. p. 244). As for the relations IN and OUT, they are explained

informally as follows:

The relation [A IN B], based on immanence, specifies that the cognitive events constituting the conception of A (in a given domain) are included

among those comprised by B. The relation of separation, which I will give as [A OUT

B], is based on the absence of such inclusion. (ibid. p. 228).

In contrast, the verb arrive, according to Langacker (1987), presupposes an

extended path of motion on the part of its trajectory, but only the final portions of

this trajectory, those where the trajector enters the vicinity of its destination and

then reaches it, are specifically designated by this verb. (ibid. p. 246).

Langacker's account does clearly capture some of our topological intuitions about

enter. However, his presentation relies on diagrams representing image-schema,

and there is no formal description of the process of entering. While one can accept

the idea of a primitive spatial relation IN standing for inclusion, characterizing it in

terms of relationships between cognitive events is somewhat vague. Further, there is

no clear distinction between enter and arrive, except by way of various diagrams

and the informal definitions above. More specifically, there is no statement that

arrive involves the trajector, at the end of the process, being merely AT the

landmark, as opposed to being IN the landmark as in the case of enter. This

problem is further borne out by his analysis of the verb leave: Langacker (1988, p. 96) indicates that the trajector is at first IN with respect to the landmark, and then

overlaps with its boundary (i.e. trajector is AT the landmark), before being OUT with

respect to the landmark. Here too, there is no difference from exit.

Having critiqued his account, it is worth pointing out that Langacker's intuitions

reflect a topological view of motion verbs. In Chapter 3, we will formalize notions

such as IN in terms of mereotopology, and in Chapter 4, we will provide a formal

semantics for verbs like enter and arrive that gives a specific computational

interpretation to notions similar to Langacker's.
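As a rough illustration of the distinctions at stake in this subsection, the following Python sketch treats a motion event as a temporally indexed sequence of relation labels between trajector and landmark. The labels OUT, AT, and IN follow the discussion above, but the encoding, the collapse helper, and the is_arrive pattern are assumptions of this sketch; they are neither Langacker's analysis nor the formal semantics given in Chapter 4.

```python
from typing import List, Sequence


def collapse(states: Sequence[str]) -> List[str]:
    """Drop consecutive repeats, keeping only the changes of state.

    An indefinite number of component states thus reduces to a short pattern.
    """
    pattern: List[str] = []
    for state in states:
        if not pattern or pattern[-1] != state:
            pattern.append(state)
    return pattern


def is_enter(states: Sequence[str]) -> bool:
    """The trajector goes from OUT to IN with respect to the landmark."""
    return collapse(states) == ["OUT", "IN"]


def is_leave(states: Sequence[str]) -> bool:
    """IN, then overlapping the boundary (AT), then OUT."""
    return collapse(states) == ["IN", "AT", "OUT"]


def is_arrive(states: Sequence[str]) -> bool:
    """Assumed pattern: the trajector ends up merely AT the landmark."""
    return collapse(states) == ["OUT", "AT"]
```

Once the final state is made explicit in this way, enter and arrive come apart: a sequence ending in AT satisfies is_arrive but not is_enter.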

1.4.2.2 Jackendoff Let us turn now to the interpretation of motion in Jackendoff's

LCS (Jackendoff 1983, 1990). In LCS, verbs of spatial motion, such as bike, are given a

common semantic template, which determines their syntactic behavior, shown in (15).

(15) [Event GO+LOC ([Thing]x, [Path]y)]

GO is a semantic primitive of motion, which is a function that takes as inputs a Thing

and a Path and returns as output an Event. GO+LOC involves movement specialized

to a locative semantic field17. When the above verb template is combined with a path

PP, we get examples like (16).

17 Analogously, verbs of temporal motion, such as delay, use GO+TEMP.


(16a) John biked to the store.

(16b) [Event GO ([Thing John], [Path TO ([Place AT ([Thing store])])])]

A verb like enter is treated as equivalent to go into, and has the more

instantiated semantics shown in (17).

(17) [Event GO ([Thing]x, [Path TO ([Place IN ([Thing]y)])])]

Note that LCS, in addition to bearing the disadvantages described in the previous

section, also blurs important differences, since all motion verbs are represented by

just one of GO(Thing, Path), STAY(Thing, Place), as in cling, ORIENT(Thing,

Path), as in point, BE(Thing, Place), as in lie, or GO_Ext(Thing, Path), as in

reach, along with their specialization to different semantic fields. The inability to

distinguish among verb meanings is a serious problem with such highly abstract representations of meaning.
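The loss of distinctions can be seen in a toy Python sketch. The mapping below pairs each verb with one of the primitives named above; the entries for bike, cling, point, lie, and reach follow the text, while the entry for run, the string encoding of paths, and all names in the code are assumptions made for illustration.

```python
from typing import NamedTuple


class LCS(NamedTuple):
    primitive: str
    thing: str
    path_or_place: str


# Hypothetical lexicon: each verb is reduced to its LCS primitive.
# The entry for "run" is an assumption; the others follow the text.
PRIMITIVE_OF = {
    "bike": "GO",
    "run": "GO",
    "cling": "STAY",
    "point": "ORIENT",
    "lie": "BE",
    "reach": "GO_Ext",
}


def analyze(verb: str, thing: str, path_or_place: str) -> LCS:
    """Build the LCS structure for a verb; the verb itself is discarded."""
    return LCS(PRIMITIVE_OF[verb], thing, path_or_place)


biked = analyze("bike", "John", "TO(AT(store))")
ran = analyze("run", "John", "TO(AT(store))")
```

Under these assumptions biked and ran are equal structures, so any difference in manner between the two verbs is no longer recoverable from the representation.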

1.4.2.3 WordNet Given the theories of verb semantics, one would expect that lexical

resources would exist that provide a rich semantics for motion verbs. Unfortunately,

this is not the case. We mentioned WordNet (Fellbaum 1998) earlier, and its

differentiation and ranking of word senses based on corpora. In WordNet, verbs

are grouped into a hierarchy, with related verbs differentiated by manner into

troponyms. For example, the troponyms of arrive are: land, reach, flood/drive/

come in, light, perch, force-land, beach, disembark, debark, set down, touch down, and crash land. However, while WordNet is widely used for its coverage of relations such

as synonymy and hypernymy, which is what it was designed for, it is impoverished

not only in terms of the syntactic representations for the verbs, but also in terms of

the absence of any semantic representation for lexical items. Consequently, research-

ers have integrated WordNet with other resources that provide the missing

information.

1.4.2.4 VerbNet VerbNet (Kipper et al. 2006) is one such key lexical resource that

provides syntactic and semantic information about verbs which are grouped into classes based on extensions of the well-known classification of Levin (1993). We first

discuss the latter's classification, where verbs are grouped into semantic classes based

on participating in common meaning-preserving syntactic constructions involving

syntactic arguments, called diathesis alternations.

For example, consider the verbs break and cut. As seen in (18) (examples from

Kipper-Schuler (2005)), break participates in the transitive (18a), the simple intransitive (18b),

and the middle construction (18c), but not the conative alternation (18d).

(18a) John broke the jar.

(18b) The jar broke.

(18c) Jars break easily.

(18d) *John broke at the loaf.




In comparison, cut participates in the transitive, middle, and conative alternations.

(19a) John cut the bread.

(19b) *The bread cut.

(19c) Bread cuts easily.

(19d) John valiantly cut at the frozen loaf, but his knife was too dull to make a dent

in it.

These differences are grounds, in Levin's account, for splitting break verbs (along

with similar-behaving verbs such as chip, crack, crash, crush, fracture, rip, shatter,

smash, snap, splinter, tear) into a separate class from cut verbs (with fellow-members

chip, clip, cut, hack, hew, saw, scrape, scratch, slash, snip). In particular, the motion

verbs (Levin class 51) are grouped into 9 subclasses.
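The classification procedure behind (18) and (19) can be sketched as grouping verbs by the set of alternations they accept. In the Python sketch below, the judgments for break and cut are taken from those examples; assigning shatter and slash the same profile as their class heads, and the function name itself, are simplifying assumptions of the illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Alternation judgments read off examples (18) and (19).
ALTERNATIONS: Dict[str, FrozenSet[str]] = {
    "break": frozenset({"transitive", "intransitive", "middle"}),
    "cut": frozenset({"transitive", "middle", "conative"}),
}
# Assumed for illustration: class members pattern exactly like the head verb.
ALTERNATIONS["shatter"] = ALTERNATIONS["break"]
ALTERNATIONS["slash"] = ALTERNATIONS["cut"]


def group_by_alternations(
    table: Dict[str, FrozenSet[str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Put verbs with identical alternation profiles into the same class."""
    classes: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for verb, profile in table.items():
        classes[profile].append(verb)
    return {profile: sorted(verbs) for profile, verbs in classes.items()}


verb_classes = group_by_alternations(ALTERNATIONS)
```

The result contains two classes, one headed by break and one by cut, which differ exactly in the intransitive and conative alternations.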

As Kipper-Schuler (ibid.) points out, this method also produces classes whose

members are far from synonymous, e.g. the braid class, which counts among its

members bob, braid, brush, clip, comb, condition, crimp, crop, curl, etc. Further, the

classes are not disjoint, and some verbs are members of multiple classes with

conflicting sets of alternations. VerbNet attempts to fix these and other problems

by refining the classes (e.g. as in Dang et al. (1998), grouping together classes which

share at least three members), adding new classes, integrating the classes with

WordNet, and most importantly, providing semantic templates for each of the

classes.

For example, consider the semantics for the path verb arrive in VerbNet (version

3.1), as in arrived in the US. The entry specifies that the entity that fills the semantic

role of Theme (the subject noun phrase (NP)) moves during the arrival event, and

that at the end of the arriving event, the location of the moving object is in the US,

i.e. the entity that fills the semantic role of the Oblique object (the PP). Thus, the

semantic information for arrive is expressed as:

(20) motion(during(E), Theme) location(end(E), Theme, Oblique)

As we shall see in Chapter 2, arrive is a verb whose meaning involves the figure object traversing a path that goes from its not being located at the ground object to its

being at the ground object. Although (20) does not make reference to paths and to

start(E), VerbNet appears to at least capture part of the meaning.
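For readers who find it easier to see (20) as data, the following Python sketch writes the template as a list of predicates and fills its role slots. The tuple encoding, the helper names, and the filler "the delegation" are hypothetical; this is not VerbNet's file format or API.

```python
from typing import Dict, List, Tuple

Predicate = Tuple[str, Tuple[str, ...]]

# Template (20) written as data. The encoding is an assumption of this sketch.
ARRIVE_TEMPLATE: List[Predicate] = [
    ("motion", ("during(E)", "Theme")),
    ("location", ("end(E)", "Theme", "Oblique")),
]


def instantiate(template: List[Predicate], bindings: Dict[str, str]) -> List[str]:
    """Replace role names with their fillers; other arguments are kept as-is."""
    lines = []
    for name, args in template:
        filled = [bindings.get(arg, arg) for arg in args]
        lines.append(f"{name}({', '.join(filled)})")
    return lines


def refers_to(template: List[Predicate], term: str) -> bool:
    """Check whether any predicate in the template mentions the given term."""
    return any(term in args for _, args in template)


# Hypothetical fillers for a sentence such as "The delegation arrived in the US".
arrived_in_us = instantiate(
    ARRIVE_TEMPLATE, {"Theme": "the delegation", "Oblique": "the US"}
)
```

Calling refers_to(ARRIVE_TEMPLATE, "start(E)") returns False, which reflects the observation that (20) says nothing about where the Theme is at the start of the event.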

However, as Zaenen et al. (2008) reveal, while some of the motion verbs in

VerbNet (such as carry) have start and/or end point information, others don't,

leaving a great deal of incompleteness. They argue that although they were able to get

around some of these glitches and extract change of location information from

VerbNet by a variety of post-processing rules, there is a more fundamental problem with the VerbNet approach: the classification is driven by syntactic considerations

separating arguments from adjuncts. As is well-known, there is no one-to-one

mapping between syntactic predications and semantic ones. The latter often include


as arguments constituents that are syntactically adjuncts. For lexical resources to be

helpful in normalizing textual information, they have to encode the distinction

between syntactic and semantic predication and be systematic about the correspondence

between the two (ibid., p. 390). Their investigation reveals, unfortunately, that VerbNet lacks such a systematic mapping.18

1.4.2.5 FrameNet Another well-known lexical resource is FrameNet (Baker et al.

2003), which has been developed based on the underlying theory of Frame Semantics,

e.g. Fillmore (1976). It involves specifying each lexical item's syntactic properties in

the context of a hierarchy of semantic structures called frames, which represent the

experiential knowledge evoked by lexical items. The semantic roles of verbs (called

frame elements) are annotated in terms of corpus examples.

For example, consider the path verb arrive, for which a FrameNet III example is

shown in (21).

(21) [The Princess of Wales THEME] arrived TARGET [smiling and laughing DEPICTIVE] [at a Christmas concert GOAL] [last night TIME].
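For readers who want to manipulate such annotations, the bracketed notation of (21) can be read into role/filler pairs with a few lines of code. This is a sketch under stated assumptions: the bracket-plus-uppercase-label format is simply the notation used in this chapter, not FrameNet's own file format, and `parse_annotation` is a hypothetical helper. It does not handle nested brackets.

```python
import re
from typing import List, Optional, Tuple

# "[some text ROLE]": the role label is the final all-uppercase token in the bracket.
ELEMENT = re.compile(r"\[([^\[\]]+)\s([A-Z_]+)\]")
# The frame-evoking word is the token immediately before the label TARGET.
TARGET = re.compile(r"(\w+)\s+TARGET\b")


def parse_annotation(sentence: str) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (target word, [(role, filler), ...]) for a flat bracketed annotation."""
    elements = [(role, text.strip()) for text, role in ELEMENT.findall(sentence)]
    match = TARGET.search(sentence)
    return (match.group(1) if match else None), elements


EXAMPLE_21 = (
    "[The Princess of Wales THEME] arrived TARGET "
    "[smiling and laughing DEPICTIVE] [at a Christmas concert GOAL] "
    "[last night TIME]."
)

target, elements = parse_annotation(EXAMPLE_21)
print(target)
for role, filler in elements:
    print(role, "=", filler)
```

The result pairs each frame element with its surface filler, but nothing in it says where the Theme is located relative to the Goal once the event is over.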

In FrameNet's view, the lexical entry arrive evokes the frame of arriving, which

is a subframe of (i.e. is part of) the traversal frame, which in turn is a subclass of the

motion frame and involves the Theme changing location with respect to a Path.

In the motion frame, a Theme starting out at a location expressed by the Source

role ends up at a Goal location, covering space between the two, expressed by the

Path role; or else, the Theme moves in a particular Area of Direction, or its Distance

may be expressed.19 Arriving involves a moving object (filling the semantic role of

Theme) moving in the direction of a location filling the semantic role of Goal.

According to the comments for the arrive lexical entry, the Goal is always

implied by the verb, but may or may not be explicit in the text; it indicates where

the Theme ends up, or would end up, as a result of the motion. Note that this

FrameNet representation is weaker than the one we have been advocating, in that it

doesn't commit to the figure object of the Princess of Wales in (21) being located, at the point of arrival, at the ground object (the site of the Christmas concert). In turn, FrameNet's representation for the preposition at, while it is associated with a Locative_relation frame (a subclass of the Trajector-Landmark frame that is derived from Langacker's account), does not convey any specific semantics for at.

18 In more recent work, Palmer et al. (2009) have tried to address some of these issues.

19 The motion frame is defined as "Some entity (Theme) starts out in one place (Source) and ends up in some other place (Goal), having covered some space between the two (Path)." Additional frames that inherit the motion frame elaborate on this definition. Goal-profiling frames account for verbs such as reach. Source-profiling frames capture verbs from the Leave class. Path-profiling frames are for verbs such as traverse or cross, and, finally, the manner of motion can be elaborated on in additional frames for verbs like run and fly.




Likewise, the verb enter, which is also associated with the arriving frame and

illustrated in (22), does not indicate that at the end of the event, the figure we is

inside the ground object the upper room, thus failing to distinguish enter from arrive (in the latter, the figure is merely at the ground).

(22) We THEME entered TARGET [the upper room GOAL] [by a flight of stairs leading

from the north side of the yard PATH].

While FrameNet seems to do well with change of location motions, the hierarchy

can be confusing. Sometimes the motion frame is directly inherited as in the case of

the traversal frame. Conversely, the departing frame uses the motion frame (i.e. it

does not necessarily inherit or specialize the semantic roles of the motion frame) and

is a subclass of the traversal frame.
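The source of the confusion is easier to see when the frame-to-frame relations mentioned so far are written down as labeled edges. The sketch below records only the relations stated in this section; the triple encoding, the relation labels (`subframe_of`, `inherits`, `uses`), and the helper names are assumptions made for illustration, not FrameNet's data model.

```python
from typing import Set, Tuple

# (source frame, relation, target frame), as described in the text above.
RELATIONS: Set[Tuple[str, str, str]] = {
    ("arriving", "subframe_of", "traversal"),
    ("traversal", "inherits", "motion"),
    ("departing", "inherits", "traversal"),
    ("departing", "uses", "motion"),
}


def related(frame: str, relation: str) -> Set[str]:
    """Frames reached from `frame` by exactly one edge labeled `relation`."""
    return {t for (s, r, t) in RELATIONS if s == frame and r == relation}


def inherited_ancestors(frame: str) -> Set[str]:
    """All frames reachable from `frame` by following only `inherits` edges."""
    seen: Set[str] = set()
    stack = [frame]
    while stack:
        for parent in related(stack.pop(), "inherits"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(inherited_ancestors("departing")))  # motion reached via traversal
print(sorted(related("departing", "uses")))      # motion also reached directly
print(sorted(inherited_ancestors("arriving")))   # a subframe link is not inheritance
```

On this encoding the departing frame reaches the motion frame twice, once through inheritance and once through the weaker uses relation, while the arriving frame reaches it through neither, which is why the roles available to a given verb are not always obvious from the hierarchy.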

As another example, the manner verb drive is associated with the frame of

operate_vehicle, which has semantic roles that include those illustrated in (23),

from FrameNet III.20

(23a) [Jamie Shepherd DRIVER] drove TARGET [the bucketing old vehicle VEHICLE] [out of the estate SOURCE] [towards the main road PATH].

(23b) [The riders DRIVER] drove TARGET [all over the place AREA].

(23c) Dhamma is [the charioteer DRIVER] [that DRIVER] drives TARGET [the chariot VEHICLE] [along the road [to Nirvana GOAL] PATH].

The frame operate_vehicle is a subclass of the Operating_a_system frame,

inheriting or specializing all its semantic roles; it also uses the motion frame.

However, the combined information does not explicitly indicate that driving a

vehicle involves an iterated change of location. In Chapter 2, we will provide such

a semantics for manner verbs like drive.

All in all, while FrameNet's rich subclassification of motion verbs and its integration of semantics, syntax and corpus data are both impressive and commendable, FrameNet does not address or explicitly represent the sorts of spatial relationships involved in motion that we have been emphasizing. Further, although it has been used for inferential tasks such as question-answering (Narayanan and Harabagiu 2004), FrameNet's representation, even when mapped to knowledge representation

languages such as OWL, is not directly amenable to spatial reasoning. And although

FrameNet, VerbNet and WordNet have been mapped to each other, e.g. (Shi and

Mihalcea 2005), such an integrated resource, given the discussion above, also does

not address our desiderata.

20 As the FrameNet III website indicates, the semantic role AREA is used for expressions which describe a general area in which motion takes place when the motion is understood to be irregular and not to consist of a single linear path. Locative setting adjuncts of motion expressions may also be assigned this frame element.


1.4.2.6 Verb classifications based on qualitative reasoning Let us now turn to other

verb classifications, inspired by work in qualitative spatial reasoning (QSR). One of

the most successful models in QSR, which has been used for static spatial relations, is

the Region Connection Calculus 8 (RCC-8), (Randell et al. 1992), a calculus grounded

in mereotopology (to be discussed in Chapter 2). It identifies the following eight

jointly exhaustive and pairwise disjoint relations between two regions A and B:

(24) a. Disconnected (DC): A and B do not touch each other.
     b. Externally Connected (EC): A and B touch each other at their boundaries.
     c. Partial Overlap (PO): A and B overlap each other in Euclidean space.
     d. Equal (EQ): A and B occupy the exact same Euclidean space.
     e. Tangential Proper Part (TPP): A is inside B and touches the boundary of B.
     f. Non-tangential Proper Part (NTPP): A is inside B and does not touch the boundary of B.
     g. Tangential Proper Part Inverse (TPPi): B is inside A and touches the boundary of A.
     h. Non-tangential Proper Part Inverse (NTPPi): B is inside A and does not touch the boundary of A.
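The eight relations in (24) can be illustrated on the simplest possible regions. The sketch below assumes that a region is a closed, non-degenerate interval on a line, which is a drastic simplification of the regions RCC-8 is defined over; the `rcc8` function is a hypothetical helper written for this illustration, not part of any RCC-8 library.

```python
from typing import Tuple

Interval = Tuple[float, float]  # closed interval (low, high) with low < high


def rcc8(a: Interval, b: Interval) -> str:
    """Return the RCC-8 relation of region A to region B for 1-D closed intervals."""
    a1, a2 = a
    b1, b2 = b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("intervals must satisfy low < high")
    if a2 < b1 or b2 < a1:
        return "DC"        # no shared point at all
    if a2 == b1 or b2 == a1:
        return "EC"        # share exactly one boundary point
    if (a1, a2) == (b1, b2):
        return "EQ"
    if b1 <= a1 and a2 <= b2:   # A lies within B
        return "NTPP" if (b1 < a1 and a2 < b2) else "TPP"
    if a1 <= b1 and b2 <= a2:   # B lies within A
        return "NTPPi" if (a1 < b1 and b2 < a2) else "TPPi"
    return "PO"            # interiors overlap, neither contains the other


print(rcc8((0, 1), (2, 3)))  # DC
print(rcc8((0, 1), (1, 2)))  # EC
print(rcc8((0, 2), (1, 3)))  # PO
print(rcc8((1, 2), (0, 3)))  # NTPP
print(rcc8((0, 1), (0, 3)))  # TPP
```

Because the tests are tried in a fixed order and each returns immediately, every pair of valid intervals receives exactly one label, mirroring the jointly exhaustive and pairwise disjoint character of the relation set.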

As we shall see in Chapters 2 and 3, RCC-8 and other systems like it do an adequate

job of representing static information about space. However, they cannot help us deal

with motion, since that task requires a temporal component. Muller (1998) proposes

just such a system, one which merges spatial and temporal phenomena with a

qualitative theory of motion based on spatiotemporal primitives. This system has

at its base a topological system borrowed from Asher and Vieu (1995) that is similar

to RCC-8 but adds the concept of open and closed regions, as well as a set of temporal

relations that include a relation of temporal connection, along with the standard

ordering relations. The result of Muller's system is a set of six motion classes: leave,

hit, reach, external, internal, and cross.
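The general idea of adding time to a static calculus can be conveyed by treating a motion as a sequence of RCC-8 relations between figure and ground and labeling the sequence as a whole. The sketch below is a loose illustration only: the conditions attached to the six class names are our own simplified assumptions, not Muller's formal definitions, and the `classify` function and its coarse state names are hypothetical.

```python
from typing import Sequence

# Coarse position of the figure relative to the ground (assumed grouping).
COARSE = {
    "DC": "out", "EC": "touch", "PO": "overlap",
    "TPP": "in", "NTPP": "in", "EQ": "in",
}


def classify(history: Sequence[str]) -> str:
    """Label a time-ordered sequence of RCC-8 relations with a motion class name."""
    if not history:
        raise ValueError("history must contain at least one relation")
    try:
        states = [COARSE[rel] for rel in history]
    except KeyError as err:
        raise ValueError(f"unsupported relation: {err.args[0]}") from None
    first, last = states[0], states[-1]
    if all(s == "in" for s in states):
        return "internal"      # figure stays inside the ground throughout
    if all(s == "out" for s in states):
        return "external"      # figure stays disconnected throughout
    if first == "out" and last == "out" and "in" in states:
        return "cross"         # goes in and comes back out
    if first == "out" and last == "touch":
        return "hit"           # ends in contact with the ground
    if first != "in" and last == "in":
        return "reach"         # ends inside the ground
    if first == "in" and last != "in":
        return "leave"         # starts inside, ends elsewhere
    return "unclassified"


print(classify(["DC", "EC", "PO", "NTPP"]))        # reach
print(classify(["NTPP", "TPP", "PO", "EC", "DC"])) # leave
print(classify(["DC", "EC"]))                      # hit
print(classify(["DC", "PO", "NTPP", "PO", "DC"]))  # cross
```

Even this toy version shows why a purely topological classification cannot separate a verb like approach from other external motions: no relation changes until contact is made.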

Asher and Sablayrolles (1995) offer a related account of motion verbs and spatial prepositional phrases in French. They propose ten groups of motion verbs as follows: s'approcher (to approach), arriver (to arrive), entrer (to enter), se poser (to alight), s'éloigner (to distance oneself from), partir (to leave), sortir (to go out), décoller (to take off), passer (par) (to go through), and dévier (to deviate). This verb classification is more fine-grained than Muller's. Asher and Sablayrolles, however, do not have any groups that match well with Muller's internal and external. In addition, Muller does not include a class for the inverse of hit. The most striking difference between the accounts is that Asher and Sablayrolles include a notion of metric distance that Muller does not. This allows the separation of verbs such as approach and reach. For Muller, approach would have to be a simple external motion, which does not adequately capture the meaning of this verb.




How do the semantic classifications of Muller, Asher, Sablayrolles, and Vieu

among others relate to those in VerbNet and FrameNet? To answer this, Pustejovsky

and Moszkowicz (2008) mapped Asher and Sablayrolles's verbs to VerbNet classes. The mapping revealed that while many of the motion predicates we care about have

specific classes in VerbNet, it is not always clear what these classes have in common

unless we look to FrameNet to find a higher level representation. Pustejovsky and

Moszkowicz (ibid.) therefore considered a mapping to FrameNet, arriving at a more

expressive verb classification. The resulting ten classes are based largely on Muller's

classifications with some very slight modifications detailed in Table 1.1, along with

some revisions we have made. Here X means there is no mapping.
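A classification of this kind is straightforward to use as a lookup resource. The sketch below transcribes only the Class, Examples, and FrameNet columns of Table 1.1, with `None` standing for X (no mapping); the dictionary layout and the `classes_for` helper are hypothetical conveniences, not an existing resource.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class MotionClass(NamedTuple):
    examples: Tuple[str, ...]   # example predicates listed in Table 1.1
    framenet: Optional[str]     # FrameNet frame, or None where the table has X


TABLE_1_1: Dict[str, MotionClass] = {
    "MOVE": MotionClass(("drive", "fly", "run"), "Motion or Self-motion"),
    "MOVE_EXTERNAL": MotionClass(("drive around", "pass"), "Traversing"),
    "MOVE_INTERNAL": MotionClass(("walk around the room",), "Motion"),
    "LEAVE": MotionClass(("desert", "leave"), "Departing"),
    "REACH": MotionClass(("arrive", "enter", "reach"), "Arriving"),
    "ATTACH": MotionClass(("approach",), "Attaching"),
    "DETACH": MotionClass(("disconnect", "pull away", "take off"), None),
    "HIT": MotionClass(("hit", "land"), "Impact"),
    "FOLLOW": MotionClass(("chase", "follow"), "Co-Theme"),
    "DEVIATE": MotionClass(("flee", "run from"), "Fleeing"),
    "STAY": MotionClass(("remain", "stay"), "State continue"),
}


def classes_for(predicate: str) -> List[str]:
    """Return the names of all classes that list `predicate` as an example."""
    return [name for name, mc in TABLE_1_1.items() if predicate in mc.examples]


print(classes_for("arrive"), TABLE_1_1["REACH"].framenet)
print(classes_for("enter"))
print(classes_for("drive"))
```

Note that arrive and enter fall into the same class here; the lookup identifies the class of a predicate but, like the resources it draws on, says nothing yet about the spatial relation that holds at the end of the event.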

1.4.2.7 Compositional semantics, revisited So far, we have discussed motion verbs as well as spatial prepositions separately, but of course when they combine together in

sentences there is the question of specifying and composing together the meanings of

each constituent. Our approach, discussed in Chapter 4, leverages a richer semantics

for nouns, prepositions, and motion verbs that allows one to parcel the meaning

contributions of the various constituents appropriately, without promiscuously proliferating preposition senses.

TABLE 1.1. A revised classification of motion verbs

Class            Examples                         FrameNet                 Muller     Asher and Sablayrolles
MOVE             drive, fly, run                  Motion or Self-motion    X          X
MOVE_EXTERNAL    drive around, pass               Traversing               External   X
MOVE_INTERNAL    walk around the room             Motion                   Internal   X
LEAVE            desert, leave                    Departing                Internal   partir, sortir
REACH            arrive, enter, reach             Arriving                 Reach      arriver/entrer
ATTACH           approach                         Attaching                X          X
DETACH           disconnect, pull away, take off  X                        X          décoller
HIT              hit, land                        Impact                   Hit        se poser
FOLLOW           chase, follow                    Co-Theme                 X          X
DEVIATE          flee, run from                   Fleeing                  X          dévier
STAY             remain, stay                     State continue           X          X

For example, in (5b) discussed earlier (the coffee in the cup), cup has a noun sense as an open container made of solid material used for drinking; this comes out of its lexical entry, based on the Generative Lexicon (GL) account of Pustejovsky (1995, 2001). The preposition in has a meaning that involves an underspecified notion of containment, specifically inside a container. Thus, in the cup involves containment inside a drinking instrument. Coffee has a noun sense of being constituted of liquid material. To glue the two together, to get coffee in the cup, the liquid has to be contained in the container, and for that its convex hull21 is required to be inside the container. This is achieved within a compositional semantics using GL (based on notions of coercion and co-composition), via an axiom of world knowledge. In (5c), spoon is an eating instrument with a handle, and constituted of solid material, and to be contained in a container, it is sufficient for a part of it to be inside the container. The details of how this integration is performed compositionally are explored in Pustejovsky (forthcoming).
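The contrast between the two containment conditions can be made concrete geometrically. The following sketch is an illustration under strong simplifying assumptions: objects are finite sets of 2-D points, the container is an axis-aligned box, and the function names are hypothetical. It is not the GL-based compositional mechanism itself, only the geometric test that the world-knowledge axiom would appeal to.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Vertices of the convex hull (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(p: Point, box: Box) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def hull_contained(points: List[Point], box: Box) -> bool:
    """Liquid-style containment: the whole convex hull lies inside the box."""
    return all(inside(p, box) for p in convex_hull(points))


def part_contained(points: List[Point], box: Box) -> bool:
    """Solid-instrument containment: at least some part lies inside the box."""
    return any(inside(p, box) for p in points)


cup: Box = (0.0, 0.0, 4.0, 5.0)
coffee = [(1.0, 0.5), (3.0, 0.5), (1.0, 3.0), (3.0, 3.0), (2.0, 2.0)]
spoon = [(2.0, 1.0), (2.2, 1.0), (3.0, 7.0), (3.2, 7.0)]  # handle sticks out

print(hull_contained(coffee, cup))  # coffee: hull entirely inside the cup
print(hull_contained(spoon, cup))   # spoon fails the stricter test
print(part_contained(spoon, cup))   # but passes the weaker one
```

The same preposition is thus paired with two different geometric tests, and the choice between them is driven by what the nouns denote (liquid versus solid instrument) rather than by positing two senses of in.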

Likewise, consider the preposition around. In (25a), the walking is outside the

pool, whereas in (25b), the swimming is inside the pool.

(25a) He walked around the pool.

(25b) He swam around the pool.

Clearly, it is the verb which differentiates the spatial relationship between figure

and ground in each case, rather than the preposition. Here, around creates a region

that is displaced relative to the ground region, without committing to the direction of

displacement. It is the medium of the motion (a parameter of verb meaning) that has

a contrasting value in this case: swimming involves water as the medium, whereas

walking involves a solid surface, setting aside some notable (e.g. mythological)

exceptions.

This overview of approaches and resources for analysis of motion in language

establishes that while there have been a variety of linguistic theories and resources

that provide a classification of motion verbs, a substantial gap exists in terms of

actually representing the spatial semantics of motion in a manner consistent with our

desiderata. The fact that even basic sense differences such as the distinction between

the motion verbs enter and arrive are not adequately explicated by these theories

shows that they are not expressive enough for natural language. We have suggested

that our account has an improved modularity that allows verbs, nouns, and prepositions to contribute spatial meaning in such a way that these meanings can be composed together (within a particular GL-derived compositional account) so as to provide fine-grained meaning differences, without proliferating prepositional senses. Finally, we

have arrived at a verb classification that builds on and extends earlier ones.

1.5 Caveats

An interdisciplinary book like this one is necessarily restricted in scope, and as a

result there are several deliberate lacunae. First and foremost, the theory being

21 The convex hull of a region, treated as a set of points S, is the boundary formed by the minimal convex set containing S.




developed here is essentially a semantic one. As such, questions of pragmatics, which

of course are key to the understanding of language in context, are not addressed. We

have already observed that the meaning of spatial prepositions, even when putting

aside metaphorical uses, can involve functional notions such as support and

affordances, i.e. the nature of interactions with the ground object. An especially

compelling argument implicating functional notions is found in the experiments of

Coventry et al. (2001). They showed subjects pictures of the kind displayed in

Figure 1.1, and asked them to rate the acceptability of sentences of the form the Figure is preposition to the Ground, where the prepositions used were over,

above, under, and below. For example, a given sentence could be the umbrella

is over the man. Not only were the ratings related to the degree of rotation of

the figure from the vertical plane, but ratings for functional scenes (the middle row)

were higher than those for controls (top row), which were in turn higher than for

non-functional scenes (bottom row).

FIGURE 1.1 Acceptability ratings, rotation, and functional information, from Coventry (2003, p. 60)

In addition to Coventry et al. (2001), there have been a substantial number of other psycholinguistic investigations into the acceptability of different spatial terms given geometric and functional relations between figure and ground, e.g. (Logan and Sadler, 1996; Garrod et al., 1999; Carlson et al., 2003; Coventry, 2003), with the latter two developing a psychologically-grounded computational model that integrates these two types of relations. We will not survey these here; suffice it to say that in our

framework, as discussed in Chapters 3 and 4, we do not as yet address such functional

information or different degrees of centrality in word meaning.

Other topics that we leave out include perceptual accessibility (e.g. visibility and occlusion) of the objects to the viewer. Nor do we consider the pragmatic conditions under which particular spatial references take place and succeed (e.g. the speaker's

choice of a reference frame and point-of-view, the details of a spatial description in

the presence of particular distractors in the environment, etc.). A good discussion of

these and other factors is found in the work of Tenbrink (2007). Finally, a book of this

limited length cannot claim to offer a thorough survey of the field; in the course of

our exposition, the best we can do is to cite other papers that introduce the reader to

the relevant literature.

1.6 Conclusion

Let us first summarize the argument so far. We launched this book with a discussion

of the substantial challenges faced by today's text-to-sketch technology in terms of

comprehending natural language. We based our approach on two key insights from

the previous literature: research on the types of spatial abstractions underlying

language use, and the distinction between satellite-framing patterns (used with manner-of-motion verbs like bike, drive, fly, etc.) and verb-framing patterns (used in path-verbs such as arrive, depart, etc.). The former provides inspiration

for our account of qualitative spatial relations based on a theory of mereotopology, to

be explicated in Chapter 3. The latter distinction motivated our differentiating, in our

semantic theory, between action-based and path-based predicates, leading to a first-

order dynamic logic (discussed in Chapters 2 and 4) where events are modeled as

dynamic processes or static situations.

For the approach to be of practical use in computational approaches, five specific

requirements have to be met. When considered in the light of these requirements, the

prior theories of spatial prepositions turned out to be rich in fundamental insights, but made assumptions untenable for a computational approach, while also ignoring evidence from corpus-based word-sense disambiguation. While compositional treatments of the semantics of spatial prepositions were available, the question of what

underlying spatial primitives to rely on was not tied to those available in qualitative

reasoning systems. As for motion verbs, we found a gap in terms of a lack of

expressiveness and some specific shortcomings with respect to our desiderata. We

indicated how the compositional integration of prepositional, verb, and noun meanings will be handled in our framework. We also proposed what we believe to be a more expressive verb classification than has been hitherto considered. Finally, we

listed some of the obvious lacunae in our approach.




In Chapter 2, we will delve more deeply into how motion is expressed in natural

languages, introducing a framework that analyzes different parameters of spatial

meaning in natural language in terms of successively more expressive representation

languages. Following that, in Chapter 3, we will examine spatial and temporal representations and inference methods that have been developed based on qualitative

reasoning, applying them to spatial phenomena in language involving topological

and orientation relations. Chapter 4 applies the methods discussed in Chapters 2 and 3

to motion, providing a grounding for the semantics of motion expressions in

language within a cognitively inspired spatiotemporal model of change. We demonstrate how the two linguistic strategies for encoding motion (that of path constructions and manner-of-motion constructions) can be modeled within an operational

(dynamic) interval temporal logic. We also show how prepositional, noun, and verb

meanings are integrated together