Motion - Preps Verbs Pustejovsky


  • 8/2/2019 Motion - Preps Verbs Pustejovsky

    1 Introduction

    1.1 Overview

    1.1.1 Motivation

    Natural language abounds with descriptions of motion. This is hardly surprising,

    since our environment teems with slithering, swimming, flying, and cruising

    creatures that navigate in a world with natural elements that can spin, flow, slide, whirl,

    etc. Our experience of our own motion, and our perception of motion in the world,

    together have given human languages substantial means to verbally express many

    different aspects of movement, including its temporal circumstances and its spatial

    trajectory and its manner. In every language on earth, verbalizations of motion can

    specify changes in the spatial position of an object over time. Beyond when and

    where the motion takes place, languages also characterize how the motion

    takes place: its path, its manner, how it was caused, etc. The path of motion, in

    particular, involves conceptualizations of the various spatial relationships that an

    object can have to other objects in the space it moves in.

    Physicists and philosophers have long theorized about the nature of space and

    spatial relationships. Newton (1995) believed that space has an existence independent

    of physical objects, an absolute space that will remain always similar and

    immovable (Newton 1995, Scholium 3). Objects, in his account, occupy places that are part of

    absolute space, which affords a universal coordinate system with objects and their

    relationships being characterizable in terms of Euclidean geometry. This sort of model

    of space underlies most of the classical, pre-relativistic analyses of motion in physics.

    The conception of space found in natural languages is quite different. As we shall

    see, it allows for positioning objects in terms of coordinate systems, but does not have

    built-in a universal, absolute coordinate system that allows for precise specification of

    object positions. (Of course, languages can in many cases specify relatively precise

    positions by importing absolute coordinate systems.) Typically, a figure object is

    expressed as being in a particular orientation (left, east, under, etc.) with respect

    to another reference or ground object and possibly a third object, the viewer

    (Levinson 2003). A figure object can also be positioned in terms of topological

    relations (inside, separate from, etc.) along with distance from a ground object.


    When objects are positioned without a reference object, the descriptions can indicate

    paths in a coordinate system (to the east or seaward). Space in language, at least

    in terms of the way it is revealed by the use of closed-class terms for topology and

    orientation, seems to be parasitic on objects and the relations between them, and can

    be broadly described as incorporating a relational view of space.1

    This book articulates a new computational linguistics approach to understanding

    natural language descriptions of motion. Our goals are theoretical as well as

    pragmatic. From a theoretical standpoint, we aim to provide a semantic theory of motion

    expressions that can be used for computation. This sort of theory involves mapping

    motion descriptions in natural language to formal representations that computers

    can automatically reason with. As we shall see, such reasoning uses qualitative

    models of space and time, making inferences about changes in the positions of

    objects over time. From an empirical standpoint, we want our theory to mesh well

    with natural language data, and so we allow our computational methods to draw on

    information found in text corpora.

    The ability to create computer programs that can automatically process large

    corpora containing descriptions of motion has an important practical consequence:

    it allows us to map from texts to data representations that can be of immense value in

    everyday life. For example, a system could take a set of verbal directions for getting to

    a particular place, and automatically transform it into a map with trajectories marked

    on it. Narratives of journeys taken today and long ago could be parsed into logs that

    record where, when, and how the various segments of the journey were carried out.

    Documents involving media such as pictures and videos that have associated

    linguistic annotations can be analyzed so as to retrieve spatial, temporal, and motion-related

    information from collections of such media on the Web.

    In this chapter, we will first discuss the challenges in linguistic analysis and

    inference that are faced by such systems. After outlining our technical approach,

    we highlight two key insights that inform our work. The challenges and our approach

    give rise to a set of requirements that have to be met, in our view, in order to achieve

    success; this constitutes a short list of desiderata. Last but not least, all research builds

    on the labor of others; to situate our work and convince the reader that we have

    something interesting and plausible to say, we compare and contrast our work with

    previous research in linguistics on spatial prepositions and motion verbs.

    1 The natural language-derived relational view of space that we have sketched is often viewed as being in conformity with Leibniz's philosophy of space. Leibniz denies the reality of an absolute space out there, arguing that space is a mental construct arising from an ordering of physical objects (like time, which he views as a mental construct arising from an ordering of events). Specifically, an object's physical location is determined by its relation to that of fixed (what we might call ground) objects: "Particularly, that place is that, which is the same in different moments to different existent things, when their relations of co-existence with certain other existents, which are supposed to continue fixed from one of those moments to the other, agree entirely together. [ . . . ] Lastly, space is that which results from places taken together." (Clarke 1717, p. 199; my elisions indicated by [ . . . ]). Leibniz's places are thus defined in terms of relations between objects, similar to the situation revealed in natural language usage. However, natural language and its analysis has nothing to say about the metaphysical question as to whether space exists or is a mental construct.

    1.1.2 Challenges

    In order to interpret motion expressions in natural language, each sentence has to be

    first parsed along with morphological analysis, and once a syntactic structure is

    arrived at and disambiguated from among alternative parses, the predicates and

    their semantic arguments have to be identified, with the latter classified in terms of

    their semantic roles (the agent of the event, the theme, the manner and path of

    motion, etc.). To carry this out, the system must have knowledge of the morphology

    and syntax of the language, as well as the mapping between the semantic arguments

    of different lexical predicates on one hand, and on the other, the syntactic

    constituents (arguments) these predicates can combine with (i.e., subcategorize for) as well as

    additional phrases (adjuncts) that co-occur with them in the sentence. This sort of

    information is usually represented in a lexicon for the particular language. In

    addition, the events must be anchored to the times they are purported to occur in.

    For example, in the sentence "The Princess of Wales arrived at a Christmas concert

    last night", the syntactic subject "The Princess of Wales" has to be identified as the

    Theme of the predicate "arrive", "at a Christmas concert" as its Goal, and "last night"

    as its Time. In addition, "last night" must be pegged to a time that is on the previous

    night with respect to the speech time (which could of course be on the same day as

    the speech time).
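    The role assignment and temporal anchoring just described can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the system developed in the book: the Frame class, the resolve_last_night helper, the 18:00 to 06:00 definition of "night", and the sample speech time are all hypothetical. The helper is also crude, since it always picks the night that ended on the morning of the speech date.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta

# Hypothetical boundaries for "night"; the source text does not define them.
NIGHT_START = time(18, 0)
NIGHT_END = time(6, 0)


@dataclass
class Frame:
    """A predicate together with its semantic roles (role name -> surface phrase)."""
    predicate: str
    roles: dict = field(default_factory=dict)


def resolve_last_night(speech_time: datetime):
    """Return (start, end) of the night that ended on the morning of the speech date."""
    end = datetime.combine(speech_time.date(), NIGHT_END)
    start = datetime.combine(speech_time.date() - timedelta(days=1), NIGHT_START)
    return start, end


# Role assignment for the example sentence in the text.
frame = Frame(
    predicate="arrive",
    roles={
        "Theme": "The Princess of Wales",
        "Goal": "at a Christmas concert",
        "Time": "last night",
    },
)

# Arbitrary speech time chosen for illustration only.
speech_time = datetime(2000, 1, 2, 10, 0)
anchor_start, anchor_end = resolve_last_night(speech_time)
```

    With this speech time the resolved interval ends on the same calendar day as the utterance, which is the situation the parenthetical remark above points to.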

    Here tense has to be recognized. Some languages (like the Bantu language

    ChiBemba) have several past and future tenses; some, like Mandarin Chinese, do not

    have grammatical tense; and still others like Burmese distinguish only between

    ongoing or past events and others. These apparent linguistic peculiarities (which

    are in fact entirely normal for the speakers of those languages) have to be taken into

    account, along with context, to situate the event with respect to the speech time.

    Events also have to be ordered with respect to each other, which can be non-trivial

    when events are narrated in an order different from that of their occurrence. The

    results of these inferences have to be represented in terms of an inventory of temporal

    relations that is drawn from some calculus that deals with orderings in time. Time

    expressions must also be resolved, to calendar times where possible.
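    To make the idea of an inventory of temporal relations concrete, the sketch below classifies two intervals using the thirteen interval relations of Allen's interval algebra. Whether this is the inventory the book finally adopts is not stated in this passage, so it should be read as one plausible candidate. The function name allen_relation and the numeric encoding of intervals as (start, end) pairs with start < end are assumptions made for the example.

```python
def allen_relation(a, b):
    """Classify interval a against interval b using Allen's 13 relations.

    Each interval is a (start, end) pair with start < end.
    """
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper overlap remains at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


# Events narrated out of order can still be ordered once they are anchored:
leaving = (1, 2)    # hypothetical interval for "she left the palace"
arriving = (3, 4)   # hypothetical interval for "she arrived at the concert"
order = allen_relation(leaving, arriving)
```

    A qualitative label such as "before" is all that is needed for many ordering inferences, even when no calendar time is available for either event.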

    These inferential tasks can be fairly challenging for computational approaches,

    because most narratives will not explicitly date each event, and when time and

    date expressions are used, they may be anaphoric, i.e., relative to times introduced

    earlier in the discourse (as in "arrived on Tuesday"). Further, the inventory of

    temporal relations in the calculus used must be expressive enough to capture the

    distinctions between temporal relations found in any natural language; and it is also


    desirable to be able to carry out efficient computations using the calculus. This

    reflects an important desideratum: the semantic representations need to be expressive

    enough for natural languages, but also must be amenable to inference methods that

    can be used in practical systems.

    Turning to spatial information, spatial references in the form of place names

    (toponyms) mentioned in text must be identified and, when geographic in nature,

    resolved to particular entities such as countries, mountain ranges, cities, etc., and when

    construed as points, resolved to geo-coordinates where possible. This resolution

    process can involve considerable disambiguation, as humans naturally tend to reuse

    names when naming places as well as other entities. Spatial relationships involving

    topological, orientation, and distance relations between places must be recognized.

    This too can be challenging, due in part to the ambiguity of prepositions and

    adverbials. The unraveling of directions, in particular, can be notoriously difficult, as any

    driver navigating from others' helpful verbal directions can attest. In addition, some

    languages have fairly elaborate inventories of closed-class terms for representing spatial

    relations. For example, Talmy (2000) cites the (now extinct) Californian language

    Atsugewi which has a set of suffixes appearing on the verb that mark some 50

    distinctions of Ground geometries and the paths that relate to them. Some dozen of

    these suffixes represent distinctions covered by the English preposition into, which

    does not itself reflect such finer subdivisions. (ibid., p. 192). As with time, these spatial

    relations must be represented in terms of some calculus that characterizes orderings in

    space. Such a calculus must, of course, also satisfy the desideratum above.

    The above inferences are just prerequisites for interpreting motion expressions.

    Once the events are anchored to times, and the objects participating in the events are

    located with respect to other objects in terms of spatial relations, motion events have

    to be analyzed. In particular, information from the lexicon such as the class of the

    motion verb must be brought to bear on the analysis; for example, run is a

    manner-of-motion verb, while arrive is a path verb. This will allow the system to

    characterize motion events in terms of the event or situation involved in the change of

    location, the object that is undergoing movement (the figure), the region (or path)

    traversed through the motion, a distinguished point or region of the path (the

    ground), the manner in which the change of location is carried out, and the medium

    through which the motion takes place. Once the motion is grounded in this way by

    linguistic analysis, qualitative reasoning tools must operate on the underlying

    representation, allowing inferences to be made. Maps and other visualizations that track

    the movements of entities may also be generated from the representation.
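    The components of a motion event listed above can be gathered into a simple record, with the verb class from the lexicon deciding which slot the verb fills. The sketch below is illustrative only: the MotionEvent class, the build_event helper, and the tiny VERB_CLASS table are hypothetical, although the verb classes themselves follow the examples given in this chapter.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lexicon: verb -> class. The classes follow examples in the text;
# the table itself is a hypothetical stand-in for a real lexicon.
VERB_CLASS = {
    "run": "manner",
    "bike": "manner",
    "drive": "manner",
    "fly": "manner",
    "arrive": "path",
    "depart": "path",
    "enter": "path",
}


@dataclass
class MotionEvent:
    verb: str
    figure: str                    # the object undergoing movement
    ground: Optional[str] = None   # distinguished point or region of the path
    path: Optional[str] = None     # path information contributed by the verb
    manner: Optional[str] = None   # manner information contributed by the verb
    medium: Optional[str] = None   # medium through which the motion takes place


def build_event(verb, figure, ground=None, medium=None):
    """Fill the manner or path slot depending on the verb's lexical class."""
    verb_class = VERB_CLASS.get(verb)
    event = MotionEvent(verb=verb, figure=figure, ground=ground, medium=medium)
    if verb_class == "manner":
        event.manner = verb
    elif verb_class == "path":
        event.path = verb
    return event


arrival = build_event("arrive", "The Princess of Wales", ground="a Christmas concert")
flight = build_event("fly", "the plane", medium="air")
```

    A record of this kind is only the input to reasoning; the qualitative inferences themselves operate on the spatial and temporal relations attached to the figure and ground.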

    1.1.3 Approach

    These requirements present a set of formidable problems for automatic

    interpretation of motion expressions in language. However, writing in the second decade of the

    21st century, we believe computational approaches have started to address these

    challenges. The goal of our book is to flesh out a computational approach, addressing

    for the first time in a systematic manner the integration of the language of motion

    with qualitative reasoning. This integration is evaluated in terms of the desideratum

    above, discussed in Chapters 3 and 4, highlighting gaps and outstanding problems.

    We also indicate along the way, in Chapter 5, the performance accuracies of practical

    systems.

    Our approach integrates the linguistic conceptualizations with the formal

    methods, mapping one to the other in the context of natural language processing.

    Our approach is empirical, driven by instances of language use found in text

    collections (or corpora), especially the newsletters, travel blogs, route directions, etc., found

    on the Web. In terms of methodology, these corpora are first annotated by humans

    with features reflecting the kinds of linguistic distinctions and analyses mentioned

    above. Computers then mine the annotated corpora to learn automatically how to

    reproduce the annotations, using a variety of machine-learning tools. These

    annotations are then mapped to the representations used by the formal models, allowing

    reasoning to be carried out over motion information captured from natural language.

    Throughout, the goal of satisfying the above desideratum is addressed to the extent

    possible. The details of this methodology are described in Chapter 5.

    The automatic systems that result from training on the annotated data offer both a

    working embodiment of the theory and the modularity that it defines, as well as

    practical tools that can interpret motion expressions in language and generate

    visualizations including maps and sketches. From a theoretical standpoint, this

    methodology allows linguistic theories to be tested empirically, both in terms of the

    breadth of their applicability when faced with actual language use, as well as the

    precise linguistic representation that should result for each example. This test also

    involves measuring the reliability of humans in terms of the annotations that they

    produce. In practical terms, the approach results in systems with a text-to-sketch

    capability that can display tracks on a map of where a moving object has been at

    particular times. For example, given a biker's travel blog as input, a map with tracks

    could be generated as output. The resulting systems can be evaluated and compared

    with each other, stimulating in turn the development of new and better methods.

    In a nutshell, we offer an integrated perspective on how language structures

    concepts of motion, and how the world shapes the way in which motion is

    linguistically expressed. The book's approach is two-pronged: analysis of the details of

    language use in different contexts (based on the exploitation of linguistic corpora),

    along with theoretical modeling and formal reasoning (based on qualitative

    representations).

    While there has been a great deal of linguistics research on the semantics of motion

    verbs as well as locative constructions, and considerable research on qualitative spatial

    reasoning, there has been little interdisciplinary effort on trying to connect these two


    fields in a systematic way. This is the first book, we believe, to analyze concepts of

    motion in language while integrating these two fundamental points-of-view.

    In the rest of this chapter, we outline two key insights that inform our approach.

    After discussing our desiderata, to further situate our approach, we differentiate our

    framework from other work in linguistics, as well as compare our classifications and

    semantics for motion with other relevant approaches.

    1.2 Key insights

    1.2.1 Spatial abstractions

    One of the key insights from prior research has to do with the types of conceptualization

    needed to understand spatial language, e.g., Miller and Johnson-Laird (1976), Herskovits

    (1986), Talmy (1983, 2000), among others. For example, research by Talmy (1983, 2000)

    has characterized various primitive templates or schemas for representing motion. In a

    description like (1), a complex spatial scene is abstracted as a geometric point (the

    figure) moving towards another point (the ground) for a bounded temporal extent.

    Likewise, a moving object may be described as a point moving along a path that is a line

    (2), or as a line moving coaxially along the linear path (3).

    (1) The ball rolled toward the lamp for 10 seconds.

    (2) The ball rolled across the railway bed.

    (3) The trickle flowed along the ledge.

    The idealization is such that the speaker is able to abstract away from irrelevant

    details such as the length or orientation of the path, representing each spatial scene

    using a schema, and the hearer in turn is able to recreate the scenes from the schema.2

    Talmy points out that these representations do not rely on Euclidean geometry and

    the properties of metric spaces, emphasizing instead topological relations that remain

    invariant irrespective of changes in sizes, distances, and shapes of the objects. He also

    points out that while the expressions for the geometries of figure objects tend to be

    limited in variety, the geometries of ground objects, by contrast, are less constrained

    and vary considerably with the language, including bounded planes (e.g., the bike

    sped across the field/around the track), cylindrical forms (the bike sped through

    the tunnel), a wide variety of different types of enclosures (I crawled out the

    window, I ran in the house), etc.

    2 The use of such intuitive geometries raises the question as to whether the points being idealized are in fact mathematical points. After all, natural language does not typically construe points in space or time as being dimensionless; instead, they are all conceived as having extent.

    A related set of findings has to do with the differences across languages in the way

    one can specify a figure object as being in a particular orientation (left, east,

    under, etc.) with respect to another reference or ground object and possibly a third

    object, the viewer. Studies of speakers across a wide variety of languages have

    revealed a basic inventory of three types of geometric coordinate systems (frames

    of reference) whose types are unevenly distributed, along with a variety of

    idiosyncratic instantiations, across languages (Levinson 2003). The human ability to refer to

    and pick out objects in space relies on these particular frames of reference. These are

    discussed in more detail in Chapter 3.

    While understanding spatial descriptions appears to rely on interpreting such

    topological and geometrical relationships, it is important to note that it does not

    require precise geometries. Humans, after all, communicate successfully by and large

    without specifying the relatively exact (e.g. GPS) positions of objects and their shapes.

    We are able to describe and understand fairly elaborate motions, without needing to

    drill down into equations that characterize the physical motions signaled by these

    verbs. The use of imprecise and often incomplete qualitative geometric descriptions

    (instead of quantitative ones such as specifying the coordinates and shapes of every

    object) allows human communication to be highly efficient. Our communication

    relies on a rich commonsense model of the world that has proved sufficient for

    humans to survive and evolve until now.

    In turn, this fact has hardly gone unnoticed in artificial intelligence research.

    Having an artificial agent reason qualitatively allows for reasoning to be more

    efficient in some situations, since abstracting away from numerical details allows

    the agent to focus on more compact representations that isolate just the relevant

    information needed to solve a particular problem. AI approaches to qualitative

    reasoning have developed a rich set of geometric primitives for representing time,

    space (including distance, orientation, and topological relations involving notions

    such as contact and containment), and together with those, motion. The results of

    such research have yielded a wide variety of spatial and temporal reasoning logics

    and tools. Qualitative Spatial Reasoning has been successfully applied to military

    sketch maps (Forbus et al. 2003), meteorology (Bailey-Kellogg and Zhao 2004), robot

    navigation (Moratz and Wallgrün 2003), integration of sensor information for

    environmental monitoring (Jung and Nittel 2008), etc.
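    A small example may help show what reasoning with qualitative rather than metric primitives looks like. The sketch below reduces the exact coordinates of two axis-aligned rectangles to a single coarse topological label covering contact and containment. It is loosely in the spirit of region-based calculi but is not any particular published calculus; the function name, the label set, and the rectangle encoding (xmin, ymin, xmax, ymax) are assumptions made for this illustration.

```python
def topological_relation(a, b):
    """Coarse topological relation of rectangle a to rectangle b.

    Rectangles are (xmin, ymin, xmax, ymax) tuples with xmin < xmax and ymin < ymax.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    if a == b:
        return "equal"
    # No shared point at all, not even on the boundary.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disconnected"
    # Boundaries touch but the interiors do not overlap.
    if ax2 == bx1 or bx2 == ax1 or ay2 == by1 or by2 == ay1:
        return "externally connected"
    if bx1 <= ax1 and ax2 <= bx2 and by1 <= ay1 and ay2 <= by2:
        return "inside"
    if ax1 <= bx1 and bx2 <= ax2 and ay1 <= by1 and by2 <= ay2:
        return "contains"
    return "partially overlapping"


# Hypothetical scene: a ball in a room, next to a corridor.
room = (0, 0, 10, 10)
ball = (2, 2, 3, 3)
corridor = (10, 0, 14, 10)
```

    Once the scene is reduced to labels such as "inside" or "externally connected", an agent can reason about containment and contact without carrying the coordinates along.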

    In contrast, the primitives specified in the linguistic approaches above are not

    expressive enough for formal computational reasoning. To address this gap, in

    Chapter 3, we map the geometric and topological primitives and calculi used in

    qualitative reasoning in a systematic manner to natural language. Our work thus

    allows for more formal and expressive models to be constructed for linguistic

    representations. Our innovations are similar in spirit to Miller and Johnson-Laird

    (1976) and Johnson-Laird (1977), who argued that understanding of language

    involves translating a sentence into an executable program. We are thus committed to

    providing computationally expressive ways of representing motion expressed in

    natural language, in particular subscribing to the idea that understanding motion


    in language involves assembling and executing programs. However, the

    programming framework we use, discussed in Chapter 4, involves precise formal logics

    developed in computer science, rather than Miller and Johnson-Laird's early and

    somewhat ad hoc procedural semantics.3

    In section 1.3.3, we compare our approach to the semantics of motion with several

    other approaches.

    1.2.2 Motion semantics: action- versus location-based predicates

    Motion verbs, according to Talmy (1985, 1991, 2000), occur in syntactic constructions

    that express several semantic components: (i) a Figure object that moves with respect

    to (ii) a Ground object, along a spatial region, called (iii) the Path. There are also two

    additional components (called co-events, in keeping with his view that they are

    construable as distinct events): (iv) the Manner of the movement and (v) the Cause

    that is responsible for the motion.

    A further distinction that Talmy makes (one that is largely borne out by cross-

    linguistic research) is that languages have two distinct strategies for expressing

    concepts of motion. In satellite-framing, commonly used in English and other

    Germanic languages, as well as Slavic languages, also called manner-type languages,

    the main verb conflates (i.e., contains a morpheme that encodes) the manner or

    cause of motion, while path information is expressed in satellites.4 Here a satellite is

    any constituent other than a noun-phrase or prepositional-phrase complement that

    is in a sister relation to the verb root (Talmy 2000, p. 102), and includes particles,

    affixes, etc.5 Thus, in (4a), the language represents the motion as an action of

    bouncing, with slid/rolled/bounced expressing the manner of the motion,

    and the path being expressed by the satellite down.6 In contrast, in verb-framing,

    found in Turkish, Romance, Semitic, and other languages, also called path-type

    languages, the verb conflates the path, whereas the manner is optionally expressed

    by adjuncts, as in the Spanish (4b).

    3 The procedural semantics of Miller and Johnson-Laird (1976) is based on primitive routines such as finding in a search domain an entity referred to by a natural language description, testing if the particular properties predicated by the description hold of it, and acting so as to make the description be true of the entity.

    4 Such manner-of-motion verbs are extremely common in English, as attested by the long list of such verbs in the verb classification of Levin (1993).

    5 Talmy (1991) characterized satellites in more detail: "The satellite, which can be either a bound affix or a free word, is thus intended to encompass all of the following grammatical forms, which traditionally have been largely treated independently of each other: English verb particles, German separable and inseparable verb prefixes, Latin or Russian verb prefixes, Chinese verb complements, Caddo incorporated nouns and Atsugewi polysynthetic affixes around the verb root." (Talmy 1991, p. 486).

    6 Likewise, in "the napkin blew off the table", the verb conflates the Cause of the motion, with the path being expressed by the satellite "off". In addition to Manner/Cause and Path conflation, Talmy (1985) points out that verbs can also conflate Figure information, as in the Atsugewi verb root -caq-, which means "for a slimy lumpish object (e.g., a toad, a cow-dropping) to move/be located".


    (4a) The rock slid/rolled/bounced down the hill.

    (4b) La botella entró a la cueva (flotando)

    the bottle moved-in to the cave (floating)

    The bottle floated into the cave.

    Here the language represents the motion as a change of location. Note that there

    are exceptions; English has Romance-derived verbs like enter, arrive, ascend

    etc. that encode path. As Talmy (1985) points out, the (small number of) verbs in

    English that conflate Path are mostly Romance borrowings.

    Now, various scholars including Talmy have recognized that this classification is

    not quite disjoint. For example, in languages involving serial verb compounds, like

    Lahu, Thai, and Mandarin Chinese (Slobin 2004), it is unclear which one is the main

    verb; and in Native American language families such as Hokan and Penutian, path

    and manner morphemes together form part of a verb complex, with neither one

    being classifiable as a main verb or satellite (Delancey 1989). Also, in the Australian

    language Jaminjung, motion is expressed by one of five core verbs combined with

    preverbs that encode both path and manner with neither one being of subordinate

    status (Schultze-Berndt 2000). All such languages have been designated by Slobin

    (2004) as belonging to a third category instantiating equipollent-framing, where

    both manner and path are equally salient. In response, Talmy (2009) has accepted

    that cases of equipollent framing definitely exist. For example, based on a set of

    linguistic criteria for what constitutes a main verb, he points out that in the case of

    Mandarin serial verbs, the verb in the first position is clearly the main verb, while the

    verb in second position is sometimes viewed as subordinate, and sometimes as a main

    verb, in the latter case demonstrating equipollent framing. However, such

    instances, he shows, are relatively rare.

    Given this qualified but fundamental linguistic distinction,7 the semantic

    representations for verbs can involve two classes of logical predicates: action-based

    predicates (e.g., manner-of-motion verbs found in satellite-framing patterns, like

    bike, drive, fly, etc.) and location-based predicates (e.g. for path verbs found in

    verb-framing patterns, such as arrive, depart, etc.). Action-based predicates do

    not make reference to distinguished locations, but rather to the assignment and

    reassignment of locations of the object, through the action. Since the location-based

    predicates focus on points on a path, we view them as making reference to a

    distinguished location, and the location of the moving object is tested to check

    its relation to this distinguished value.

    7 For equipollent languages, our semantic representation will thus have to make use of a combination of action- and location-based predicates.

    The predicate semantics makes use of Dynamic Interval Temporal Logic (DITL)

    from Pustejovsky and Moszkowicz (2011), which in turn blends dynamic logic (Harel

    1984) with a first-order linear temporal logic (Allen, 1984; Moszkowski, 1986; Manna

    and Pnueli, 1995; Kröger and Merz, 2008). DITL is a hybrid, first-order dynamic logic

    where events are modeled as either dynamic processes or static situations. Here event

    expressions refer to simple or complex programs, and states refer to preconditions or

    post-conditions of these programs. Assignment-of-location is modeled as an atomic

    program, and change-of-location is modeled as a compound program, whose

    relation is determined compositionally by the relations denoted by its atomic parts.

    This approach to modeling the semantics of motion is discussed in more depth in

    Chapter 4.
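    The following sketch conveys the general flavor of treating events as programs and states as pre- and post-conditions. It is not DITL itself, only the basic dynamic-logic idea that a program is a function from states to states. All names here (assign_loc, seq, at, the state dictionary, and the example places) are hypothetical. An atomic program assigns a location to an object, a compound program sequences atomic ones, and a location-based predicate tests the object's location against a distinguished value.

```python
from typing import Callable, Dict

State = Dict[str, str]                 # object name -> current location
Program = Callable[[State], State]     # a program maps a state to a new state


def assign_loc(obj: str, place: str) -> Program:
    """Atomic program: assign a location to an object."""
    def run(state: State) -> State:
        new_state = dict(state)        # leave the input state untouched
        new_state[obj] = place
        return new_state
    return run


def seq(*programs: Program) -> Program:
    """Compound program: run the given programs one after another."""
    def run(state: State) -> State:
        for program in programs:
            state = program(state)
        return state
    return run


def at(obj: str, place: str) -> Callable[[State], bool]:
    """Location-based test: is the object at the distinguished location?"""
    return lambda state: state.get(obj) == place


# Change of location as a compound of location assignments.
start = {"ball": "ledge"}
roll = seq(assign_loc("ball", "slope"), assign_loc("ball", "floor"))
end = roll(start)

arrived_at_floor = at("ball", "floor")
```

    In this toy setting, arrived_at_floor fails as a precondition on the start state and holds as a post-condition on the end state, which mirrors the way a path predicate checks the moving object's location against a distinguished value.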

    There are obvious subtypes of action-based predicates, due, for example, to the

    type of vehicle involved in the motion (bike, drive, etc.). Just as important are

    aspects of manner defined in terms of topological constraints between the objects

    throughout the motion. Consider a figure object that is moving with respect to a

    ground object. Here we can consider four subclasses, based on the orientation of the

    figure with respect to the ground, whether the topological relation is constant

    throughout the process of motion, whether it involves all of the figure or only a

    part thereof, and characteristics of the medium in which the figure moves.

    Similarly, location-based predicates can be differentiated according to how many

    formal qualitative dimensions are involved in their definitions. For example, the

    simplest path is merely an implicit line associated with a distinguished end or

    start point, as in the case of the topological path verbs arrive, exit, take off,

    etc. This can be further refined to make reference to orientation or direction, as

    in the orientation path verbs climb and descend, metric information, as in

    the topometric verbs approach, near, etc., or a combination of both, as in the

    topometric orientation expressions just below or just above.

    In this book, we will examine how these categories and subcategories of motion

    predicates are expressed through qualitative spatial and temporal models. In the next

    section, we critically assess, in the light of our approach, prior work on the semantics

    of spatial prepositions, verb classification, and motion verb semantics.

    1.3 Desiderata

    The challenges we identified earlier can only be met if we constrain our approach to

    meet some strict requirements. These have to be borne in mind when we assess any

    technical approach, both ours as well as that of other research. We list these now,

    while delving into them further throughout this chapter and book.

    1. As mentioned earlier: the semantic representations need to be expressive

    enough for natural languages, but also must be amenable to inference methods

    that can be used in practical systems.

    2. The semantic theory must be denotational, i.e. provide a mapping in terms of a

    model of things in the world.

    3. The semantic analysis must be compositional, i.e., the meaning of sentences

    must be built up systematically from the meanings of the constituent phrases

    and in turn the lexical elements in them, in tandem with the syntactic operations

    that assemble them.

    4. The representations used have to support qualitative reasoning.

    5. The systems built must be evaluated to be accurate and efficient enough to

    support practical applications.

    1.4 Theoretical background

    1.4.1 Spatial prepositions

    1.4.1.1 Classic studies There has been considerable prior research on motion verbs

    (e.g. run), spatial prepositions (across), adjectives (narrow), adverbs (far),

    nouns (lake), proper names (San Francisco), and other locative constructions.

    We focus here on spatial prepositions and adpositions. Two key issues emerge from

    the prior research. The first issue is the nature of the spatial representations involved,

    and the second issue is what exactly differentiates the different senses to produce

    polysemy. Underlying them both is a third issue, the characteristics and properties of

    a theory of meaning.

    Prepositions are traditionally classified as either directional or locative (Miller and

    Johnson-Laird 1976; Herskovits 1986; Zwarts and Winter 2000). Directional ones involve

    a path and/or movement, and include across, around, from, into, onto, and

    to. Locative prepositions are sub-classified into projective ones, which involve a point-

    of-view (e.g. above, behind, below, beside, in front of, over, under) and

    non-projective ones (e.g. at, between, in, inside, on, outside, near).

    The work of Miller and Johnson-Laird (1976) represents a significant advance in the

    modeling of the semantics of spatial prepositions. Consider their analysis of in as in (5):

    (5a) a city in Sweden

    (5b) the coffee in the cup

    (5c) the spoon in the cup

    (5d) the scratch in the surface

    (5e) the bone in the leg

    In (5a,b), the figure is entirely enclosed within the ground object, whereas in (5c)

    part of the figure need not be enclosed in the ground. In (5b,c), the ground object is

    conceptualized as some form of container. In (5d,e), the figure is entirely enclosed in

    the ground object, with (5d) dealing with two-dimensional (2D) objects and (5e)

    dealing with three-dimensional (3D) objects. To handle these cases, Miller and

    Johnson-Laird develop a semantic theory of parthood and topological relations, i.e.

    mereotopology. In their account, in has a common meaning in the above uses: the

    figure has a part that is totally inside the ground object.8 Providing a theory of

    mereotopology, built, say, on primitive notions of connection and parthood, is

    essential, we believe, to characterizing spatial relations. Such a theory will be

    discussed more in Chapter 2 and formalized in Chapter 3.
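
    A toy model makes this common meaning of in concrete. The sketch below is a deliberate simplification and rests on assumptions of our own: regions are finite sets of grid cells, a part is any non-empty subset of a region, and the inside of the ground object is given directly as a set of cells rather than derived from primitive notions of connection and parthood.

```python
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]
Region = FrozenSet[Cell]


def has_part_inside(figure: Region, ground_interior: Region) -> bool:
    """True if some non-empty part of the figure lies inside the ground."""
    return len(figure & ground_interior) > 0


def entirely_enclosed(figure: Region, ground_interior: Region) -> bool:
    """True if the whole (non-empty) figure lies inside the ground."""
    return len(figure) > 0 and figure <= ground_interior


# Hypothetical cell layouts, chosen only to mirror examples (5b) and (5c).
cup_interior: Region = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
coffee: Region = frozenset({(0, 0), (1, 0)})         # fully inside, as in (5b)
spoon: Region = frozenset({(1, 1), (1, 2), (1, 3)})  # handle sticks out, as in (5c)
saucer: Region = frozenset({(0, -1), (1, -1)})       # no part inside the cup
```

    On this simplification, both the coffee and the spoon satisfy has_part_inside, while only the coffee satisfies entirely_enclosed; the saucer satisfies neither.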

    Likewise, consider the uses of on in (6).

    (6a) the scratch on the surface

    (6b) the picture on the wall

    (6c) the lamp on the table

    (6d) the house on the river

    (6e) the boat on the river

    Miller and Johnson-Laird point out that in (6a-c), the relation is between surfaces.

    In (6b), part of the figure is over a part of the ground (such as a hook), and the latter

    part supports the rest of the figure. In (6c), if the table is on a rug, which is on the

    floor, it is fine to say the table is on the floor, because the region of interaction with

    the floor includes the table legs. But the transitivity is limited: we cannot say in (6c)

    that the lamp is on the floor. Searching the region of interaction with the floor will

    not reveal the lamp.

    Functional notions such as support and regions of interaction (or affordances

    of objects (Gibson 1977)) are part and parcel of a theory of spatial relations; in this

    book, though we will take note of their presence, we will not be formally representing

    functional notions, as they presuppose a great deal of commonsense knowledge that

    is difficult to acquire and represent in a general way for use in practical systems. Of

    course, in specific domains, it is possible to enumerate object-specific functional

    properties (including shape). For example, in their natural language-driven scene

    rendering system, Coyne and Sproat (2001) associate 3D regions called spatial tags

    with objects, so that the object representing daisy has a stem spatial tag and

    likewise test-tube a cup spatial tag. Given the input expression the daisy is in the

    test tube, the graphical output has the daisy's stem inserted into the test tube's

    cupped opening. A similar approach could be used to represent the meaning of (5c).

    However, his daisy is in the scrapbook would presumably require an entirely

    different spatial tag for daisy, begging the question of the enumeration of

    domain-independent functional properties for each object.

    Regarding (6d), it involves a path that is potentially ambiguous between being on

    the edge of the ground object (the river) and being on the surface of the ground object

    (where the surface is that part of the object that will reflect light to the eye or that can

    8 In their semantic framework, the relations are between percepts of figure and ground, rather than

    between things in the world.

    be explored by touch), with a strong preference for the former (in contrast to (6e)).

    Based on this and other evidence, Miller and Johnson-Laird argue that on has two

    spatial meanings: either the figure is part of the region of interaction with the surface

    of the ground object, with the ground supporting the figure, or else the figure object is

    construed as being in a path relation with the ground object.

    In subsequent research, Herskovits (1986) proposed underlying geometric mean-

    ings for spatial prepositions in English involving geometric relations between figure

    and ground objects; these relations are between objects construed as points, lines,

    surfaces, volumes, and vectors. The preposition on in (7a), for example, involves

    concepts of contiguity (the figure is next to and touches the ground object) and (as

    we have seen) support (the ground object supports the figure). However, in (7b),

    contrary to Miller and Johnson-Laird, she argues that support is not involved.

    (7a) The book on the table.

    (7b) The wrinkles on his forehead.

    In addition, the objects related by a preposition must be modeled in terms of their

    geometric properties, expressed as geometric functions that define characteristics of

    the space occupied by the object. For example, a table is geometrically constrained to

    be bounded and definite in shape, whereas water is not. Other geometric functions

    include idealizations (approximations to a point, line, surface, or plane), parts (e.g.

    edges, bases, surfaces, etc.), axes, volumes, projections, and what she calls good

    form. For example, in (8a), good form provides the Gestalt closure on the tree such

    that a bird can be contained in the space occupied by that form, shown in (8b), from

    Pustejovsky (1989).

    (8a) The bird in the tree.

    (8b) Included-in (Part (Place (Bird)), Interior (Outline (VisiblePart (Place (Tree))))).

    Turning to the issue of polysemy, Herskovits argues that (7a) above expresses an

    ideal meaning of on, whose sense is shifted in (7b). Senses can also shift due to a

    pragmatic degree of tolerance, i.e. to handle fuzzy cases of (7a) where the book is on a

    table cloth which is in turn on the table. As a result, while an ideal meaning is semantic,

    the actual senses in use are produced as pragmatic alterations to the ideal meaning.

    From the standpoint of a theory of meaning, Herskovits' account rejects the notion

    of a compositional theory. Further, although there is a sketch of a mereotopology,

    there is no precise theory of how exactly the pragmatic alterations occur, resulting in

    a lack of applicability to computational processes.

    1.4.1.2 Cognitive linguistics Along with Herskovits' work, there has been a great

    deal of activity in cognitive linguistics on the semantics of spatial prepositions. Here

    we will consider some of the core work from this area, while deferring a discussion of

    Jackendoff's contributions to the next section.

    One of the fundamental tenets of this rather diverse field is that human concepts

    are embodied, i.e., the concepts we have access to and the nature of the reality we

    think and talk about are a function of our embodiment (Evans et al. 2007, p. 7).

    Following (Johnson 1987; Lakoff and Johnson 1980; Brugman 1981; Mandler 2004;

    Evans, op. cit.), basic topological concepts like contact and inclusion (in the spatial

    sense of enclosure) are formed through the infant's interaction with objects. In this

    account, it is the schema of the container which underlies both the enclosure or

    inclusion sense of in in (9a) and its metaphorical extension in (9b).

    (9a) The cat is in the house.

    (9b) The cat is in trouble.

    The nature of polysemy is a contentious issue in cognitive linguistics. Consider the

    preposition over, which has been the subject of considerable discussion. The classic

    account of Lakoff (1987) makes fine-grained sense distinctions for the preposition

    based on characteristics of the figure and ground object. In (10a), the landmark (i.e.,

    ground object) is an extended object, but not so in (10b) (examples from Tyler and

    Evans 2001):

    (10a) The helicopter hovered over the ocean.

    (10b) The hummingbird hovered over the flower.

    Likewise, in (11a) there is contact with the wall, whereas there is not in (11b); in

    (11c), there is covering and occlusion of the ground. These differences would

    warrant, in the classic account, different senses for over.9

    (11a) The boy climbed over the wall.

    (11b) The tennis ball flew over the wall.

    (11c) Joan nailed a board over the hole in the ceiling.

    (11d) The heavy rains caused the river to flow over its banks.

    In general, this sort of argument by appeal to arbitrary spatial distinctions proliferates

    senses in a somewhat unprincipled manner. There is no underlying mereotopological

    theory, and hence no way of building up spatial concepts from more primitive ones.

    Researchers have struggled to constrain the number of senses, using (quite sensi-

    bly) dictionaries, lexical resources, and various theoretical criteria. For example,

    Tyler and Evans (2001) take their cue from Herskovits and propose a proto-sense

    (or primary sense) of every preposition that they argue is the diachronically earliest

    sense;10 the proto-sense of over means above except that unlike above, there is

    potential contact with the ground. Notably, this sense does not contain path

    9 Examples in (11) from Tyler and Evans (2001, pp. 728, 732, 757).

    10 Postulating the diachronically earliest sense as more basic in every case does not seem at all correct

    given modern usage.

    information. The above and across interpretation in (11a) and (11b), which does

    include the path, is not a different sense of over, but arises in conjunction with the

    meaning of the verb and the figure and ground objects. In (11c), however, a non-primary

    sense of over is differentiated, as it involves the distinct spatial notion of

    covering. In (11d), the sense is distinguished based on a supposedly distinct spatial

    notion of excess given by a cognitive scenario of a container overflowing, with the

    figure rising higher than the top of the ground object.

    The Tyler and Evans proposal suffers from the same problems we observed with

    Herskovits' account. Appealing to potential contact between figure and ground only

    serves as a way of grouping together disjunctions. Further, (11d) does not seem to

    warrant a different sense, given the contribution of the verb flow. In addition, as

    Cuyckens (2007) points out, consider (12a) and (12b).

    (12a) The cat jumped over the wall.

    (12b) The cat jumped up on the wall.

    The only syntactic difference is the preposition, but (12a) results in a different path

    than (12b): the cat ends up on the wall in the latter, but on the other side of the wall

    in the former. Thus over must involve a path meaning. Having said that, the

    question arises as to the set of spatial properties that should be considered when

    distinguishing spatial senses of a preposition. Unless these properties are drawn from

    a structured domain, in particular geometric or topological domains that can be

    made mathematically precise, pretty much any set of spatial properties that sound

    relevant might be used, since the theory has no way of evaluating them except by

    arguments based on linguistic tests.

    In general, the inability to find reliable criteria to differentiate word senses is also a

    reflection of the lack of empirical, corpus-based methodology in the cognitive

    linguistics approach. Corpus-level annotation of word senses is a well-established

    task in computational linguistics, e.g. SENSEVAL-1 (Kilgarriff and Palmer 2000). In

    these annotation efforts, fine-grained lexical resources such as WordNet (Fellbaum

    1998), where different senses of words are grouped into synonym classes called

    synsets (with the classes being linked by conceptual relations such as hypernymy

    and part-whole relations), have been used as sense inventories for annotating open-

    class terms in large corpora. Certain senses will of course be more frequent than

    others, and the more frequent ones may coincide with notions of central or more

    salient meanings for a given word. (As it happens, WordNet provides a ranking of

    different senses based on frequencies in the British National Corpus.) This sort of

    project also has the practical benefit of dividing the problem of polysemy into those

    word senses that are easy to agree on and those that aren't, focusing attention on the

    ones that pose challenges, and perhaps suggesting revisions or limitations to the

    sense inventory. In SENSEVAL-3 (Mihalcea and Edmonds 2004), annotators agreed

    with each other almost two-thirds of the time.

    Turning to the theory of meaning, cognitive linguistics is an inherently mentalistic

    theory of meaning.11 In contrast, denotational theories12 are important for several

    reasons: (i) Truth and reference are important for successful communication, as

    work in discourse modeling, e.g. Kamp and Reyle (1993), indicates. (ii) Mentalistic

    theories tend not to tell us what role in understanding the things communicated

    about play. As Putnam (1975) points out, a person may not have the conceptual

    knowledge to tell the difference between a beech and an elm, even though the two

    terms clearly refer to different things in the world. (iii) Using a logical representation

    allows for logical inferences to be made, for formal properties of computation to be

    studied systematically, etc. The latter property is of course of considerable interest to

    computational approaches.

    1.4.1.3 Jackendoff In our earlier linguistic analyses, we mentioned paths. In addition

    to Talmy, another cognitive linguist who provides a rich representation for paths is

    Jackendoff (1983, 1990). In his theory of Lexical Conceptual Structure (LCS), the verbs

    of location and motion are viewed as fundamentally spatial, with non-spatial senses

    being an extension of the spatial senses. Jackendoff gives distinguished status to

    places and paths in LCS.

    Paths can be bounded, where the ground is the start- or end-point of the path.

    Another type of path is a direction, as in (13a), where the ground object does not fall

    on the path, but would if the path were extended some unspecified distance (ibid.,

    p. 165). A third kind is a route, where the ground object is related to some point in the

    interior of the path, as in (14a). Unlike Herskovits' account, Jackendoff's semantics

    has an implicit mereotopology and is compositional. He relies on functions to

    assemble meanings of words together to form meanings of phrases. A place-function

    (e.g. IN, ON, INSIDE, UNDER, etc.) takes a Thing and returns a Place, while a path-

    function (FROM, TO, TOWARD, AWAY-FROM, and VIA) takes either a Thing or

    a Place and returns a Path. Examples of place- and path-functions are shown in the

    prepositional phrase meanings in (13b) and (14b).

    (13a) [John ran] toward the house.

    (13b) [Path TOWARD ([Thing house])]

    (14a) [The car passed] through the tunnel.

    (14b) [Path VIA ([Place INSIDE ([Thing tunnel])])]
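
    The function-argument structure of these representations can be sketched directly as typed constructors. The Python below is only an illustration of the typing described above (a place-function takes a Thing and returns a Place; a path-function takes a Thing or a Place and returns a Path); the class names and the bracket-printing convention are our own assumptions and not part of LCS.

```python
from dataclasses import dataclass
from typing import Union

# The five path-functions listed in the text.
PATH_FUNCTIONS = {"FROM", "TO", "TOWARD", "AWAY-FROM", "VIA"}


@dataclass(frozen=True)
class Thing:
    name: str

    def __str__(self) -> str:
        return f"[Thing {self.name}]"


@dataclass(frozen=True)
class Place:
    function: str  # a place-function such as IN, ON, INSIDE, UNDER
    arg: Thing

    def __str__(self) -> str:
        return f"[Place {self.function} ({self.arg})]"


@dataclass(frozen=True)
class Path:
    function: str  # one of PATH_FUNCTIONS
    arg: Union[Thing, Place]

    def __post_init__(self) -> None:
        if self.function not in PATH_FUNCTIONS:
            raise ValueError(f"unknown path-function: {self.function}")

    def __str__(self) -> str:
        return f"[Path {self.function} ({self.arg})]"


toward_house = Path("TOWARD", Thing("house"))                   # cf. (13b)
through_tunnel = Path("VIA", Place("INSIDE", Thing("tunnel")))  # cf. (14b)
```

    Printing toward_house and through_tunnel yields bracketed strings in the style of (13b) and (14b). The constructors only assemble structure; they assign no truth conditions to IN, VIA, and the rest.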

    11 Mentalistic, or representational, theories of meaning are concerned mainly with understanding the

    relation between linguistic expressions and things in the speaker's mind, namely, explaining what goes on

    in people's minds when they use language.

    12 Denotational theories of meaning (i.e. as found in model-theoretic semantics) are concerned mainly

    with the correspondence between expressions and things in the environment, and thus this enterprise aims

    at a theory of truth and reference. Such theories represent the environment in terms of a formal model for

    the denotation of expressions.

    While the semantics of LCS is obviously compositional, it is not intended to be

    truth-conditional, and is thus in keeping with cognitive semantics' precepts. Since it

    has no basis in logic, Conceptual Structure cannot be used to make logical inferences,

    and as such cannot account for entailments between sentences.13

    Another drawback is that the primitives corresponding to prepositions, such as IN, ON, TOWARD,

    INSIDE, etc. are not further elaborated to support reasoning; they are functors in a

    compositional syntax, but are not differentiated from each other in terms of semantics.

    Finally, unlike the work of, say, Talmy (2000), the geometry used is far too

    abstract to be relevant to computational modeling of spatial reference and motion.

    1.4.1.4 Vector representations It must be acknowledged that Jackendoff's ontology

    of paths and places and the differentiation between place- and path-functions

    constitute one of the more expressive accounts of the semantics of spatial prepositions

    offered within an entirely compositional semantics. His basic notions of paths

    have been further elaborated by others, most notably within a denotational semantics

    by Zwarts (2003). In the latter's work, a spatial preposition denotes a set of paths,

    where a path is defined as a continuous function from the real interval [0, 1] to points

    (or regions) in space. The denotation of a prepositional phrase (PP) of the form into

    the room is a set of paths whose end-point is inside the room. Zwarts associates

    events with paths via a function that takes an event and returns its path. Accordingly,

    the denotation of a verb phrase (VP) of the form enter the room is a set of events

    such that (only) the end-point of the event's path is inside the room.
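
    A minimal sketch of this path-based denotation follows, under several assumptions of our own: space is the plane, the room is an axis-aligned rectangle, and the "(only)" condition on the verb is read loosely as requiring that the start-point of the path not be inside the room. The names PathFn, Event, is_into_path, and enters are illustrative and do not come from Zwarts.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]
PathFn = Callable[[float], Point]        # defined on the interval [0, 1]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(point: Point, room: Box) -> bool:
    """True if the point lies strictly within the rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = room
    return xmin < x < xmax and ymin < y < ymax


def is_into_path(path: PathFn, room: Box) -> bool:
    """Membership test for the PP: the end-point of the path is inside."""
    return inside(path(1.0), room)


@dataclass(frozen=True)
class Event:
    label: str
    path: PathFn  # stands in for the function from an event to its path


def enters(event: Event, room: Box) -> bool:
    """Membership test for the VP: ends inside and (our reading) starts outside."""
    return inside(event.path(1.0), room) and not inside(event.path(0.0), room)


def straight_line(start: Point, end: Point) -> PathFn:
    """A continuous path moving linearly from start to end."""
    def path(t: float) -> Point:
        return (start[0] + t * (end[0] - start[0]),
                start[1] + t * (end[1] - start[1]))
    return path


room: Box = (0.0, 0.0, 4.0, 4.0)
walk_in = straight_line((-3.0, 2.0), (2.0, 2.0))    # starts outside, ends inside
stay_out = straight_line((-3.0, 2.0), (-1.0, 2.0))  # never reaches the room
pace_inside = straight_line((1.0, 1.0), (3.0, 3.0)) # inside throughout
```

    On this reading, pace_inside belongs to the set denoted by the PP (its end-point is inside the room) but an event with that path does not count as entering.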

    In support of this theory, relations like into, inside etc. are based on an

    underlying model of vectors14 (Zwarts and Winter 2000). Here, the preposition

    inside is treated as a function which maps a set of points representing the ground

    object A to a set of vectors whose start-points are on the boundary of A and whose

    end-points are internal to A. Since there may be multiple vectors from different

    points on the boundary to the particular end-point, only the shortest vector is

    considered. The set of points representing an object is treated as convex,15 in keeping

    with our use of prepositions like inside to conceptualize even non-convex ground

    objects as being convex. As Zwarts and Winter point out, the ball is inside the bowl

    is compatible with a situation where the ball is sitting on the bottom of an open bowl,

    where the ball actually occupies a space that is disjoint from that of the bowl.
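
    The construction for inside can be sketched for a very simple case. In the Python below, the ground object is assumed to be an axis-aligned rectangle (standing in for the convex region associated with the ground), and the function returns the shortest vector from the boundary to a given internal point. The function names and the rectangle encoding are assumptions made for illustration only.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Vector = Tuple[Point, Point]             # (start-point, end-point)
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def length(v: Vector) -> float:
    """Euclidean length of a vector given by its two end-points."""
    (x0, y0), (x1, y1) = v
    return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5


def inside_vector(ground: Box, p: Point) -> Optional[Vector]:
    """Shortest vector from the boundary of ground to internal point p.

    Returns None when p is not internal to the ground.
    """
    xmin, ymin, xmax, ymax = ground
    x, y = p
    if not (xmin < x < xmax and ymin < y < ymax):
        return None
    # For a rectangle, the nearest boundary point is the perpendicular
    # projection of p onto one of the four edges.
    candidates = [(xmin, y), (xmax, y), (x, ymin), (x, ymax)]
    start = min(candidates, key=lambda s: length((s, p)))
    return (start, p)


bowl_region: Box = (0.0, 0.0, 10.0, 4.0)
ball_vector = inside_vector(bowl_region, (3.0, 1.0))
no_vector = inside_vector(bowl_region, (12.0, 1.0))
```

    The denotation of inside applied to the ground would then be the set of all such vectors, one per internal point; the sketch computes a single member of that set at a time.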

    The preposition outside is similar, except that the externally closest vectors are

    involved, i.e. the shortest vectors that start at the boundary of A and end at points

    13 However, a truth-conditional semantics for Conceptual Structure has been demonstrated by Zwarts

    and Verkuyl (1994), who recast it as a many-sorted first-order logic.

    14 Other researchers have also explored vectors, including Talmy (2000), Bohnemeyer (2003), O'Keefe

    (2003), and Carlson et al. (2003). However, they have not concerned themselves with building up a

    compositional semantics for spatial language based on vectors.

    15 A set of points is convex if the line segment joining any pair of points in the set lies entirely in the set.

    not belonging to A. As for the preposition on, its meaning is a set of vectors each of

    whose end-points is outside the set of points corresponding to the figure object, but

    whose length is less than some small number, so that distance between figure and

    ground is near zero.

    Although the theory of Zwarts and Winter (2000) does provide an elegant

    compositional semantics for PPs, including those modified by measure phrases, it

    can be faulted on several grounds. For one thing, though there are vectors and point

    sets, there is no explicit mereotopology. The invocation of metric notions of distance

    to represent topological relations is somewhat counter-intuitive. A related failing is

    that the theory does not distinguish between in and inside, or between at and

    on, and the case of (5c) mentioned earlier, where there is a part of the figure that is

    outside the ground object, is ignored. Finally, carrying out formal reasoning using

    these vector models is still an open question. In short, the theory does not provide an

    adequate grounding in a spatial semantics that can be used for reasoning.

    1.4.1.5 Assessment In summary, then, the prior theoretical research, while

    providing insightful discussions of the semantics of spatial prepositions, has made

    assumptions (such as those of cognitive linguistics) that are untenable in a computa-

    tional approach, and has also largely ignored evidence from corpus-based annotation

    efforts at distinguishing senses in context. While compositional treatments of prepo-

    sitional meaning have flourished, the question of what underlying spatial primitives

    to rely on has not thus far been tied to those available in qualitative reasoning

    systems. In Chapter 3, we explore topological and geometric representations that

    can be used for expressing prepositional meaning in qualitative reasoning systems.

    1.4.2 Motion verbs

    1.4.2.1 Langacker As with spatial prepositions, there has been a fair amount of

    research on the semantics of motion verbs. We had earlier discussed the influential

    work of Talmy and Jackendoff. Another key cognitive linguist who has tackled

    motion is Langacker (1987). It is not possible to do justice to his overall cognitivist

    philosophy here; instead, let us get down to brass tacks and examine his analyses of

    motion verbs. Consider the verb enter. Langacker (1987) characterizes it as a

    dynamic process, whose conceptual semantics involves, in effect, a temporally indexed

    sequence of relations between the trajector (i.e. moving figure object) and the

    landmark (i.e. ground object, which may or may not move). The trajector changes

    from a state of being spatially OUT with respect to the landmark to a state of being

    IN with respect to the landmark. From his diagrams of image-schema16 (ibid.

    16 An image schema is a mental pattern that recurrently provides structured understanding of various

    experiences, and is available for use in metaphor as a source domain to provide an understanding of yet

    other experiences (Johnson 1987, pp. 24).

    p. 245, figures 7.1 and 7.2), it appears that this change of state occurs over a conceived

    time interval, where the process involves a sequence of an indefinite number of

    component states (ibid. p. 244). As for the relations IN and OUT, they are explained

    informally as follows:

    The relation [A IN B], based on immanence, specifies that the

    cognitive events constituting the conception of A (in a given domain) are included

    among those comprised by B. The relation of separation, which I will give as [A OUT

    B], is based on the absence of such inclusion. (ibid. p. 228).

    In contrast, the verb arrive, according to Langacker (1987), presupposes an

    extended path of motion on the part of its trajectory, but only the final portions of

    this trajectory, those where the trajector enters the vicinity of its destination and

    then reaches it, are specifically designated by this verb. (ibid. p. 246).

    Langacker's account does clearly capture some of our topological intuitions about

    enter. However, his presentation relies on diagrams representing image-schema,

    and there is no formal description of the process of entering. While one can accept

    the idea of a primitive spatial relation IN standing for inclusion, characterizing it in

    terms of relationships between cognitive events is somewhat vague. Further, there is

    no clear distinction between enter and arrive, except by way of various diagrams

    and the informal definitions above. More specifically, there is no statement that

    arrive involves the trajector, at the end of the process, being merely AT the

    landmark, as opposed to being IN the landmark as in the case of enter. This

    problem is further borne out by his analysis of the verb leave: Langacker (1988,

    p. 96) indicates that the trajector is at first IN with respect to the landmark, and then

    overlaps with its boundary (i.e. trajector is AT the landmark), before being OUT with

    respect to the landmark. Here too, there is no difference from exit.
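
    The distinction that we find missing can be stated in a few lines once the sequence of relations is made explicit. The sketch below is our own illustration and not Langacker's analysis: a process is represented as a list of trajector-landmark relations drawn from OUT, AT, and IN, and a verb label is chosen from the first and last relations alone. The labels and the classify function are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def collapse(states: Sequence[str]) -> List[str]:
    """Drop consecutive repeats, e.g. OUT OUT AT IN IN -> OUT AT IN."""
    return [state for state, _ in groupby(states)]


def classify(states: Sequence[str]) -> str:
    """Coarse label based only on the initial and final relation."""
    seq = collapse(states)
    if len(seq) < 2:
        return "no change"
    first, last = seq[0], seq[-1]
    if first == "OUT" and last == "IN":
        return "enter"
    if first == "OUT" and last == "AT":
        return "arrive"   # trajector ends merely AT the landmark
    if first == "IN" and last == "OUT":
        return "leave"    # not distinguished here from exit
    return "other"
```

    For instance, the sequence OUT, AT, IN is labeled enter, while OUT, AT is labeled arrive. The sketch deliberately leaves leave and exit undistinguished, mirroring the gap noted above.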

    Having critiqued his account, it is worth pointing out that Langacker's intuitions

    reflect a topological view of motion verbs. In Chapter 3, we will formalize notions

    such as IN in terms of mereotopology, and in Chapter 4, we will provide a formal

    semantics for verbs like enter and arrive that gives a specific computational

    interpretation to notions similar to Langacker's.

    1.4.2.2 Jackendoff Let us turn now to the interpretation of motion in Jackendoff's

    LCS (Jackendoff 1983, 1990). In LCS, verbs of spatial motion, such as bike, are given a

    common semantic template, which determines their syntactic behavior, shown in (15).

    (15) [Event GO+LOC ([Thing]x, [Path]y)]

    GO is a semantic primitive of motion, which is a function that takes as inputs a Thing

    and a Path and returns as output an Event. GO+LOC involves movement specialized

    to a locative semantic field17. When the above verb template is combined with a path

    PP, we get examples like (16).

    17 Analogously, verbs of temporal motion, such as delay, use GO+TEMP.

    (16a) John biked to the store.

    (16b) [Event GO ([Thing John], [Path TO ([Place AT ([Thing store])])])]

    A verb like enter is treated as equivalent to go into, and has the more

    instantiated semantics shown in (17).

    (17) [Event GO ([Thing]x, [Path TO ([Place IN ([Thing]y)])])]

    Note that LCS, in addition to bearing the disadvantages described in the previous

    section, also blurs important differences, since all motion verbs are represented just

    by either GO(Thing, Path), STAY(Thing, Place), as in cling, ORIENT(Thing,

    Path), as in point, BE(Thing, Place) as in lie, and GO_Ext(Thing, Path), as in

    reach, along with their specialization to different semantic fields. The inability to

    distinguish among verb meanings is a serious problem with such highly abstract

    representations of meaning.

    1.4.2.3 WordNet Given the theories of verb semantics, one would expect that lexical

    resources would exist that provide a rich semantics for motion verbs. Unfortunately,

    this is not the case. We mentioned WordNet (Fellbaum 1998) earlier, and its

    differentiation and ranking of word senses based on corpora. In WordNet, verbs

    are grouped into a hierarchy, with related verbs differentiated by manner into

    troponyms. For example, the troponyms of arrive are: land, reach, flood/drive/

    come in, light, perch, force-land, beach, disembark, debark, set down, touch down, and

    crash land. However, while WordNet is widely used for its coverage of relations such

    as synonymy and hypernymy, which is what it was designed for, it is impoverished

    not only in terms of the syntactic representations for the verbs, but also in terms of

    the absence of any semantic representation for lexical items. Consequently, research-

    ers have integrated WordNet with other resources that provide the missing

    information.

    1.4.2.4 VerbNet VerbNet (Kipper et al. 2006) is one such key lexical resource that

    provides syntactic and semantic information about verbs which are grouped into

    classes based on extensions of the well-known classification of Levin (1993). We first

    discuss the latter's classification, where verbs are grouped into semantic classes based

    on participating in common meaning-preserving syntactic constructions involving

    syntactic arguments, called diathesis alternations.

    For example, consider the verbs break and cut. As seen in (18) (examples from

    Kipper-Schuler (2005)), break participates in the transitive (18a), the simple intransitive (18b), and the middle construction (18c), but not the conative alternation (18d).

    (18a) John broke the jar.

    (18b) The jar broke.

    (18c) Jars break easily.

    (18d) *John broke at the loaf.

    In comparison, cut participates in the transitive, middle, and conative alternations.

    (19a) John cut the bread.

    (19b) *The bread cut.

    (19c) Bread cuts easily.

    (19d) John valiantly cut at the frozen loaf, but his knife was too dull to make a dent

    in it.

    These differences are grounds, in Levin's account, for splitting break verbs (along

    with similar-behaving verbs such as chip, crack, crash, crush, fracture, rip, shatter,

    smash, snap, splinter, tear) into a separate class from cut verbs (with fellow-members

    chip, clip, cut, hack, hew, saw, scrape, scratch, slash, snip). In particular, the motion

    verbs (Levin class 51) are grouped into 9 subclasses.
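    The grouping criterion can be made concrete with a small sketch. The Python below is illustrative only: the alternation labels and the ALTERNATIONS table are hypothetical names, populated solely from the judgments in (18) and (19), and verbs are grouped by the exact set of alternations they accept.

```python
from collections import defaultdict

# Alternation judgments taken from examples (18) and (19); the label
# strings and this table are illustrative, not part of Levin's notation.
ALTERNATIONS = {
    "break": {"transitive", "intransitive", "middle"},
    "cut": {"transitive", "middle", "conative"},
}


def group_by_alternations(table):
    """Group verbs that accept exactly the same set of alternations."""
    classes = defaultdict(list)
    for verb, alternations in table.items():
        classes[frozenset(alternations)].append(verb)
    return {key: sorted(verbs) for key, verbs in classes.items()}


classes = group_by_alternations(ALTERNATIONS)
for alternations, verbs in classes.items():
    print(sorted(alternations), verbs)
```

    Because break and cut differ on the intransitive and conative alternations, they end up in different groups. A fuller table would presumably place verbs such as shatter alongside break, but those judgments are not given here and are therefore left out.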

    As Kipper-Schuler (ibid.) points out, this method also produces classes whose

    members are far from synonymous, e.g. the braid class, which counts among its

    members bob, braid, brush, clip, comb, condition, crimp, crop, curl, etc. Further, the

    classes are not disjoint, and some verbs are members of multiple classes with

    conflicting sets of alternations. VerbNet attempts to fix these and other problems

    by refining the classes (e.g. as in Dang et al. (1998), grouping together classes which

    share at least three members), adding new classes, integrating the classes with

    WordNet, and most importantly, providing semantic templates for each of the

    classes.

    For example, consider the semantics for the path verb arrive in VerbNet (version

    3.1), as in arrived in the US. The entry specifies that the entity that fills the semantic

    role of Theme (the subject noun phrase (NP)) moves during the arrival event, and

    that at the end of the arriving event, the location of the moving object is in the US,

    i.e. the entity that fills the semantic role of the Oblique object (the PP). Thus, the

    semantic information for arrive is expressed as:

    (20) motion(during(E), Theme) location(end(E), Theme, Oblique)

    As we shall see in Chapter 2, arrive is a verb whose meaning involves the figure object traversing a path that goes from its not being located at the ground object to its

    being at the ground object. Although (20) does not make reference to paths and to

    start(E), VerbNet appears to at least capture part of the meaning.
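    To make the shape of (20) explicit, the following sketch encodes the two predicates as data and fills in the roles. It is a hypothetical encoding, not VerbNet's own file format: the Predicate class, the helper functions, and the filler "John" for the Theme are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str    # e.g. "motion" or "location"
    time: str    # event phase: "during(E)" or "end(E)"
    args: tuple  # semantic role names, e.g. ("Theme", "Oblique")


# The two predicates of (20); the Predicate class is a hypothetical
# encoding, not VerbNet's own representation.
ARRIVE = (
    Predicate("motion", "during(E)", ("Theme",)),
    Predicate("location", "end(E)", ("Theme", "Oblique")),
)


def instantiate(template, bindings):
    """Replace role names with the phrases that fill them."""
    return [
        f"{p.name}({p.time}, {', '.join(bindings[a] for a in p.args)})"
        for p in template
    ]


def phases(template):
    """Return the event phases the template says anything about."""
    return {p.time for p in template}


# "John" is an invented subject for the fragment "arrived in the US".
facts = instantiate(ARRIVE, {"Theme": "John", "Oblique": "the US"})
print(facts)
print("start(E)" in phases(ARRIVE))
```

    The second check makes the gap noted above visible: nothing in the template constrains where the Theme is at start(E).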

    However, as Zaenen et al. (2008) reveal, while some of the motion verbs in

    VerbNet (such as carry) have start and/or end point information, others don't,

    leaving a great deal of incompleteness. They argue that although they were able to get

    around some of these glitches and extract change of location information from

    VerbNet by a variety of post-processing rules, there is a more fundamental problem with the VerbNet approach: the classification is driven by syntactic considerations

    separating arguments from adjuncts. As is well-known, there is no one-to-one

    mapping between syntactic predications and semantic ones. The latter often include

    as arguments constituents that are syntactically adjuncts. For lexical resources to be

    helpful in normalizing textual information, they have to encode the distinction

    between syntactic and semantic predication and be systematic about the correspondence

    between the two (ibid., p. 390). Their investigation reveals, unfortunately, that VerbNet lacks such a systematic mapping.18

    1.4.2.5 FrameNet Another well-known lexical resource is FrameNet (Baker et al.

    2003), which has been developed based on the underlying theory of Frame Semantics,

    e.g. Fillmore (1976). It involves specifying each lexical item's syntactic properties in

    the context of a hierarchy of semantic structures called frames, which represent the

    experiential knowledge evoked by lexical items. The semantic roles of verbs (called

    frame elements) are annotated in terms of corpus examples.

    For example, consider the path verb arrive, for which a FrameNet III example is

    shown in (21).

    (21) [The Princess of Wales THEME] arrived TARGET [smiling and laughing DEPICTIVE]

    [at a Christmas concert GOAL] [last night TIME].

    In FrameNet's view, the lexical entry arrive evokes the frame of arriving, which

    is a subframe of (i.e. is part of) the traversal frame, which in turn is a subclass of the

    motion frame and involves the Theme changing location with respect to a Path.

    In the motion frame, a Theme starting out at a location expressed by the Source

    role ends up at a Goal location, covering space between the two, expressed by the

    Path role; or else, the Theme moves in a particular Area of Direction, or its Distance

    may be expressed.19 Arriving involves a moving object (filling the semantic role of

    Theme) moving in the direction of a location filling the semantic role of Goal.

    According to the comments for the arrive lexical entry, the Goal is always

    implied by the verb, but may or may not be explicit in the text; it indicates where

    the Theme ends up, or would end up, as a result of the motion. Note that this

    FrameNet representation is weaker than the one we have been advocating, in that it

    doesn't commit to the figure object of the Princess of Wales in (21) being located, at

    the point of arrival, at the ground object (the site of the Christmas concert). In turn,

    FrameNet's representation for the preposition at, while it is associated with a

    Locative_relation frame (a subclass of the Trajector-Landmark frame that is derived

    from Langacker's account), does not convey any specific semantics for at.

    18 In more recent work, Palmer et al. (2009) have tried to address some of these issues.

    19 The motion frame is defined as "Some entity (Theme) starts out in one place (Source) and ends up in some other place (Goal), having covered some space between the two (Path)." Additional frames that inherit the motion frame elaborate on this definition. Goal-profiling frames account for verbs such as reach. Source-profiling frames capture verbs from the Leave class. Path-profiling frames are for verbs such as traverse or cross, and, finally, the manner of motion can be elaborated on in additional frames for verbs like run and fly.

    Likewise, the verb enter, which is also associated with the arriving frame and

    illustrated in (22), does not indicate that at the end of the event, the figure we is

    inside the ground object the upper room, thus failing to distinguish enter from arrive

    (in the latter, the figure is merely at the ground).

    (22) [We THEME] entered TARGET [the upper room GOAL] [by a flight of stairs leading

    from the north side of the yard PATH].
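    The point can be illustrated by writing the annotations in (21) and (22) down as data. The sketch below is a hypothetical encoding (the dictionary layout and the spatial_signature helper are assumptions, not FrameNet's own format): once the frame and the presence of a Goal are all we have, arrive and enter look identical.

```python
# Frame-element annotations for (21) and (22); the dictionary layout is a
# hypothetical encoding, not FrameNet's own data format.
ANNOTATIONS = {
    "arrive": {
        "frame": "Arriving",
        "elements": {
            "Theme": "The Princess of Wales",
            "Depictive": "smiling and laughing",
            "Goal": "at a Christmas concert",
            "Time": "last night",
        },
    },
    "enter": {
        "frame": "Arriving",
        "elements": {
            "Theme": "We",
            "Goal": "the upper room",
            "Path": "by a flight of stairs leading from the north side of the yard",
        },
    },
}


def spatial_signature(verb):
    """What the annotation tells us about where the Theme ends up."""
    entry = ANNOTATIONS[verb]
    return entry["frame"], "Goal" in entry["elements"]


print(spatial_signature("arrive") == spatial_signature("enter"))
```

    Neither entry records whether the Theme ends up at the Goal or inside it, which is exactly the distinction between the two verbs.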

    While FrameNet seems to do well with change of location motions, the hierarchy

    can be confusing. Sometimes the motion frame is directly inherited as in the case of

    the traversal frame. By contrast, the departing frame uses the motion frame (i.e. it

    does not necessarily inherit or specialize the semantic roles of the motion frame) and

    is a subclass of the traversal frame.

    As another example, the manner verb drive is associated with the frame of

    operate_vehicle, which has semantic roles that include those illustrated in (23),

    from FrameNet III.20

    (23a) [Jamie Shepherd DRIVER] drove TARGET [the bucketing old vehicle VEHICLE]

    [out of the estate SOURCE] [towards the main road PATH].

    (23b) [The riders DRIVER] drove TARGET [all over the place AREA].

    (23c) Dhamma is [the charioteer DRIVER] [that DRIVER] drives TARGET [the chariot

    VEHICLE] [along the road [to Nirvana GOAL] PATH].

    The frame operate_vehicle is a subclass of the Operating_a_system frame,

    inheriting or specializing all its semantic roles; it also uses the motion frame.

    However, the combined information does not explicitly indicate that driving a

    vehicle involves an iterated change of location. In Chapter 2, we will provide such

    a semantics for manner verbs like drive.
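    The frame-to-frame relations mentioned in this subsection can be laid out as a small graph. The sketch below is only an illustration under stated assumptions: the tuple encoding and relation labels are invented here, "subclass of" is treated as inheritance, and inheritance is assumed to be transitive. It follows inheritance links only, ignoring "uses" and "subframe of".

```python
# Frame-to-frame relations as described in the text; the tuple encoding
# and relation labels are illustrative, not FrameNet's own schema.
RELATIONS = [
    ("arriving", "subframe_of", "traversal"),
    ("traversal", "inherits", "motion"),
    ("departing", "uses", "motion"),
    ("departing", "inherits", "traversal"),
    ("operate_vehicle", "inherits", "Operating_a_system"),
    ("operate_vehicle", "uses", "motion"),
]


def inherited_ancestors(frame):
    """Follow only 'inherits' links, transitively."""
    found = set()
    frontier = [frame]
    while frontier:
        current = frontier.pop()
        for child, relation, parent in RELATIONS:
            if child == current and relation == "inherits" and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(inherited_ancestors("departing")))
print(sorted(inherited_ancestors("operate_vehicle")))
```

    Under these assumptions, departing reaches motion through traversal even though its direct link to motion is only a "uses" relation, while operate_vehicle never inherits motion at all. This is one way of seeing why the hierarchy can be hard to interpret.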

    All in all, while FrameNet's rich subclassification of motion verbs and its integration

    of semantics, syntax and corpus data are both impressive and commendable,

    FrameNet does not address or explicitly represent the sorts of spatial relationships involved in motion that we have been emphasizing. Further, although it has been

    used for inferential tasks such as question-answering (Narayanan and Harabagiu

    2004), FrameNet's representation, even when mapped to knowledge representation

    languages such as OWL, is not directly amenable to spatial reasoning. And although

    FrameNet, VerbNet and WordNet have been mapped to each other, e.g. (Shi and

    Mihalcea 2005), such an integrated resource, given the discussion above, also does

    not address our desiderata.

    20 As the FrameNet III website indicates, the semantic role AREA is used for expressions which describe a general area in which motion takes place when the motion is understood to be irregular and not to consist of a single linear path. Locative setting adjuncts of motion expressions may also be assigned this frame element.

    1.4.2.6 Verb classifications based on qualitative reasoning Let us now turn to other

    verb classifications, inspired by work in qualitative spatial reasoning (QSR). One of

    the most successful models in QSR, which has been used for static spatial relations, is

    the Region Connection Calculus 8 (RCC-8) (Randell et al. 1992), a calculus grounded

    in mereotopology (to be discussed in Chapter 2). It identifies the following eight

    jointly exhaustive and pairwise disjoint relations between two regions A and B:

    (24) a. Disconnected (DC): A and B do not touch each other.

    b. Externally Connected (EC): A and B touch each other at their boundaries.

    c. Partial Overlap (PO): A and B overlap each other in Euclidean space.

    d. Equal (EQ): A and B occupy the exact same Euclidean space.

    e. Tangential Proper Part (TPP): A is inside B and touches the boundary of B.

    f. Non-tangential Proper Part (NTPP): A is inside B and does not touch the

    boundary of B.

    g. Tangential Proper Part Inverse (TPPi): B is inside A and touches the boundary

    of A.

    h. Non-tangential Proper Part Inverse (NTPPi): B is inside A and does not

    touch the boundary of A.
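    The eight relations in (24) can be computed directly once regions are given a concrete form. The sketch below makes a strong simplifying assumption that is not part of RCC-8 itself: each region is a closed interval on a line, so that "boundary" just means the two endpoints. The Interval class and the rcc8 function are names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] with lo < hi, standing in for a region."""
    lo: float
    hi: float


def rcc8(a, b):
    """Return the RCC-8 relation that holds between intervals a and b."""
    if a == b:
        return "EQ"
    if a.hi < b.lo or b.hi < a.lo:
        return "DC"    # no point in common
    if a.hi == b.lo or b.hi == a.lo:
        return "EC"    # only a boundary point in common
    if b.lo <= a.lo and a.hi <= b.hi:
        # a is a proper part of b; strict on both sides means no boundary contact
        return "NTPP" if b.lo < a.lo and a.hi < b.hi else "TPP"
    if a.lo <= b.lo and b.hi <= a.hi:
        return "NTPPi" if a.lo < b.lo and b.hi < a.hi else "TPPi"
    return "PO"


print(rcc8(Interval(0, 1), Interval(1, 2)))
print(rcc8(Interval(1, 2), Interval(0, 3)))
```

    Since the checks are ordered and exhaustive, every pair of intervals receives exactly one label, mirroring the jointly exhaustive and pairwise disjoint character of the relations.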

    As we shall see in Chapters 2 and 3, RCC-8 and other systems like it do an adequate

    job of representing static information about space. However, they cannot help us deal

    with motion, since that task requires a temporal component. Muller (1998) proposes

    just such a system, one which merges spatial and temporal phenomena with a

    qualitative theory of motion based on spatiotemporal primitives. This system has

    at its base a topological system borrowed from Asher and Vieu (1995) that is similar

    to RCC-8 but adds the concept of open and closed regions, as well as a set of temporal

    relations that include a relation of temporal connection, along with the standard

    ordering relations. The result of Muller's system is a set of six motion classes: leave,

    hit, reach, external, internal, and cross.

    Asher and Sablayrolles (1995) offer a related account of motion verbs and spatial prepositional phrases in French. They propose ten groups of motion verbs as follows:

    s'approcher (to approach), arriver (to arrive), entrer (to enter), se poser (to alight),

    s'éloigner (to distance oneself from), partir (to leave), sortir (to go out), décoller (to

    take off), passer (par) (to go through), and dévier (to deviate). This verb classification

    is more fine-grained than Muller's. Asher and Sablayrolles, however, do not have any

    groups that match well with Muller's internal and external. In addition, Muller does

    not include a class for the inverse of hit. The most striking difference between the

    accounts is that Asher and Sablayrolles include a notion of metric distance that

    Muller does not. This allows the separation of verbs such as approach and reach. For Muller, approach would have to be a simple external motion, which does not

    adequately capture the meaning of this verb.

    How do the semantic classifications of Muller, Asher, Sablayrolles, and Vieu

    among others relate to those in VerbNet and FrameNet? To answer this, Pustejovsky

    and Moszkowicz (2008) mapped Asher and Sablayrolles' verbs to VerbNet classes. The mapping revealed that while many of the motion predicates we care about have

    specific classes in VerbNet, it is not always clear what these classes have in common

    unless we look to FrameNet to find a higher level representation. Pustejovsky and

    Moszkowicz (ibid.) therefore considered a mapping to FrameNet, arriving at a more

    expressive verb classification. The resulting ten classes are based largely on Muller's

    classifications with some very slight modifications detailed in Table 1.1, along with

    some revisions we have made. Here X means there is no mapping.

    1.4.2.7 Compositional semantics, revisited So far, we have discussed motion verbs as well as spatial prepositions separately, but of course when they combine together in

    sentences there is the question of specifying and composing together the meanings of

    each constituent. Our approach, discussed in Chapter 4, leverages a richer semantics

    for nouns, prepositions, and motion verbs that allows one to parcel the meaning

    contributions of the various constituents appropriately, without promiscuously proliferating

    preposition senses.

    TABLE 1.1. A revised classification of motion verbs

    Class           Examples                         FrameNet               Muller    Asher and Sablayrolles
    MOVE            drive, fly, run                  Motion or Self-motion  X         X
    MOVE_EXTERNAL   drive around, pass               Traversing             External  X
    MOVE_INTERNAL   walk around the room             Motion                 Internal  X
    LEAVE           desert, leave                    Departing              Internal  partir, sortir
    REACH           arrive, enter, reach             Arriving               Reach     arriver/entrer
    ATTACH          approach                         Attaching              X         X
    DETACH          disconnect, pull away, take off  X                      X         décoller
    HIT             hit, land                        Impact                 Hit       se poser
    FOLLOW          chase, follow                    Co-Theme               X         X
    DEVIATE         flee, run from                   Fleeing                X         dévier
    STAY            remain, stay                     State continue         X         X

    For example, in (5b) discussed earlier (the coffee in the cup), cup has a noun

    sense as an open container made of solid material used for drinking; this comes out of its lexical entry, based on the Generative Lexicon (GL) account of Pustejovsky (1995,

    2001). The preposition in has a meaning that involves an underspecified notion of

    containment, specifically inside a container. Thus, in the cup involves containment

    inside a drinking instrument. Coffee has a noun sense of being constituted of liquid

    material. To glue the two together, to get coffee in the cup, the liquid has to be

    contained in the container, and for that its convex hull21 is required to be inside the

    container. This is achieved within a compositional semantics using GL (based on notions of coercion and co-composition), via an axiom of world knowledge. In (5c),

    spoon is an eating instrument with a handle, and constituted of solid material, and

    to be contained in a container, it is sufficient for a part of it to be inside the container.

    The details of how this integration is performed compositionally are explored in

    Pustejovsky (forthcoming).
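    The contrast between the liquid and the solid case can be sketched computationally. The code below is an illustration under heavy simplifications, not the GL mechanism itself: the container's interior is modeled as a convex, axis-aligned box, figures are finite sets of sampled points, and the names Box and is_in are introduced here. Because the box is convex, the convex hull of the points lies inside it exactly when every point does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a container's interior."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def is_in(figure_points, material, container):
    """Apply the containment condition appropriate to the figure's material.

    Liquids need their whole convex hull inside the container. Because the
    box is convex, that holds exactly when every sampled point is inside.
    For solids it is enough that some part is inside; here a "part" is
    approximated by a single sampled point.
    """
    inside = [container.contains(p) for p in figure_points]
    return all(inside) if material == "liquid" else any(inside)


cup = Box(0.0, 0.0, 4.0, 6.0)
coffee = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
spoon = [(2.0, 1.0), (2.5, 4.0), (3.5, 9.0)]  # handle sticks out of the cup

print(is_in(coffee, "liquid", cup))
print(is_in(spoon, "solid", cup))
print(is_in(spoon, "liquid", cup))
```

    The same preposition thus yields different geometric conditions depending on what the noun contributes, without positing separate senses of in.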

    Likewise, consider the preposition around. In (25a), the walking is outside the

    pool, whereas in (25b), the swimming is inside the pool.

    (25a) He walked around the pool.

    (25b) He swam around the pool.

    Clearly, it is the verb which differentiates the spatial relationship between figure

    and ground in each case, rather than the preposition. Here, around creates a region

    that is displaced relative to the ground region, without committing to the direction of

    displacement. It is the medium of the motion (a parameter of verb meaning) that has

    a contrasting value in this case: swimming involves water as the medium, whereas

    walking involves a solid surface, setting aside some notable (e.g. mythological)

    exceptions.

    This overview of approaches and resources for analysis of motion in language

    establishes that while there have been a variety of linguistic theories and resources

    that provide a classification of motion verbs, a substantial gap exists in terms of

    actually representing the spatial semantics of motion in a manner consistent with our

    desiderata. The fact that even basic sense differences such as the distinction between

    the motion verbs enter and arrive are not adequately explicated by these theories

    shows that they are not expressive enough for natural language. We have suggested

    that our account has an improved modularity that allows verbs, nouns, and prepositions

    to contribute spatial meaning in such a way that these meanings can be composed together (within a particular GL-derived compositional account) so as to provide fine-grained

    meaning differences, without proliferating prepositional senses. Finally, we

    have arrived at a verb classification that builds on and extends earlier ones.

    21 The convex hull of a region, treated as a set of points S, is the boundary formed by the minimal convex set containing S.

    1.5 Caveats

    An interdisciplinary book like this one is necessarily restricted in scope, and as a

    result there are several deliberate lacunae. First and foremost, the theory being

    developed here is essentially a semantic one. As such, questions of pragmatics, which

    of course are key to the understanding of language in context, are not addressed. We

    have already observed that the meaning of spatial prepositions, even when putting

    aside metaphorical uses, can involve functional notions such as support and

    affordances, i.e. the nature of interactions with the ground object. An especially

    compelling argument implicating functional notions is found in the experiments of

    Coventry et al. (2001). They showed subjects pictures of the kind displayed in

    Figure 1.1, and asked them to rate the acceptability of sentences of the form the Figure is preposition to the Ground, where the prepositions used were over,

    above, under, and below. For example, a given sentence could be the umbrella

    is over the man. Not only were the ratings related to the degree of rotation of

    the figure from the vertical plane, but ratings for functional scenes (the middle row)

    were higher than those for controls (top row), which were in turn higher than for

    non-functional scenes (bottom row).

    In addition to Coventry et al. (2001), there have been a substantial number of other

    psycholinguistic investigations into the acceptability of different spatial terms given

    geometric and functional relations between figure and ground, e.g. (Logan and

    Sadler, 1996; Garrod et al., 1999; Carlson et al., 2003; Coventry, 2003), with the latter

    two developing a psychologically-grounded computational model that integrates

    these two types of relations. We will not survey these here; suffice it to say that in our

    framework, as discussed in Chapters 3 and 4, we do not as yet address such functional

    information or different degrees of centrality in word meaning.

    FIGURE 1.1 Acceptability ratings, rotation, and functional information, from Coventry (2003, p. 60)

    Other topics that we leave out include perceptual accessibility (e.g. visibility and occlusion) of the objects to the viewer. Nor do we consider the pragmatic conditions

    under which particular spatial references take place and succeed (e.g. the speaker's

    choice of a reference frame and point-of-view, the details of a spatial description in

    the presence of particular distractors in the environment, etc.). A good discussion of

    these and other factors is found in the work of Tenbrink (2007). Finally, a book of this

    limited length cannot claim to offer a thorough survey of the field; in the course of

    our exposition, the best we can do is to cite other papers that introduce the reader to

    the relevant literature.

    1.6 Conclusion

    Let us first summarize the argument so far. We launched this book with a discussion

    of the substantial challenges faced by today's text-to-sketch technology in terms of

    comprehending natural language. We based our approach on two key insights from

    the previous literature: research on the types of spatial abstractions underlying

    language use, and the distinction between satellite-framing patterns (used with

    manner-of-motion verbs like bike, drive, fly etc.) and verb-framing patterns (used in path-verbs such as arrive, depart etc.). The former provides inspiration

    for our account of qualitative spatial relations based on a theory of mereotopology, to

    be explicated in Chapter 3. The latter distinction motivated our differentiating, in our

    semantic theory, between action-based and path-based predicates, leading to a first-

    order dynamic logic (discussed in Chapters 2 and 4) where events are modeled as

    dynamic processes or static situations.

    For the approach to be of practical use in computational approaches, five specific

    requirements have to be met. When considered in the light of these requirements, the

    prior theories of spatial prepositions turned out to be rich in fundamental insights, but made assumptions untenable for a computational approach, while also ignoring

    evidence from corpus-based word-sense disambiguation. While compositional treatments

    of the semantics of spatial prepositions were available, the question of what

    underlying spatial primitives to rely on was not tied to those available in qualitative

    reasoning systems. As for motion verbs, we found a gap in terms of a lack of

    expressiveness and some specific shortcomings with respect to our desiderata. We

    indicated how the compositional integration of prepositional, verb, and noun meanings

    will be handled in our framework. We also proposed what we believe to be a more expressive verb classification than has been hitherto considered. Finally, we

    listed some of the obvious lacunae in our approach.

    In Chapter 2, we will delve more deeply into how motion is expressed in natural

    languages, introducing a framework that analyzes different parameters of spatial

    meaning in natural language in terms of successively more expressive representation

    languages. Following that, in Chapter 3, we will examine spatial and temporal representations and inference methods that have been developed based on qualitative

    reasoning, applying them to spatial phenomena in language involving topological

    and orientation relations. Chapter 4 applies the methods discussed in Chapters 2 and 3

    to motion, providing a grounding for the semantics of motion expressions in

    language within a cognitively inspired spatiotemporal model of change. We demonstrate

    how the two linguistic strategies for encoding motion (that of path constructions

    and manner-of-motion constructions) can be modeled within an operational

    (dynamic) interval temporal logic. We also show how prepositional, noun, and verb

    meanings are integrated together