Lexical Semantics in American Corpus Annotation Projects Lori Levin September 10, 2004 Tutorial at Clairvoyance Corporation

Lexical Semantics in American Corpus Annotation Projects

Lori Levin

September 10, 2004

Tutorial at Clairvoyance Corporation

What is Lexical Semantics?

Lexical semantics is about the meanings of words.

This tutorial is about the meanings of verbs and their arguments: Sam opened the door with a key. The key opened the door. The door was opened by Sam with a key. The door opened (with a key).

Sam bought a book from Sue. Sue sold a book to Sam.

Types of semantics not covered in this tutorial

Sentence-level meaning: truth conditions of sentences

This is a picture of a cell phone. (true) This is a picture of a book. (false)

Compositional semantics: how the meanings of a noun phrase and a verb phrase are combined into the meaning of a sentence.

Quantifier scope: Everyone here speaks two languages. (The same two languages for everyone, or possibly different ones?)

Aspects of lexical semantics not covered in this tutorial

Nouns, adjectives, adverbs, and prepositions

Selectional restrictions:

Colorless green ideas sleep furiously. (Chomsky, 1957, Syntactic Structures)

Count and mass nouns: There was water all over the driveway. (mass) There was dog all over the driveway. (count noun used as mass)

Synonymy, hyponymy, antonymy, etc.: car-automobile, car-vehicle, hot-cold

Outline

Background: predicates and arguments; valency and subcategories of verbs; optional arguments and adjuncts; semantic roles

Three approaches to lexical semantics:

A linguistic theory: Lexical Conceptual Structure

A lexicon project: Frame Semantics

A corpus annotation project (also building a lexicon): PropBank

A multi-lingual semantic corpus annotation project

Predicates and Arguments

Verbs (and sometimes nouns and adjectives) describe events, states, and relations that have a certain number of participants.

The children devoured the spaghetti. (two participants)

The teacher handed the book to the student. (three participants)

Problems exist. (one participant)

The participants are referred to as arguments of the verb (like the arguments of a function).

Valency and Subcategorization (Fillmore and Kay, Lecture Notes, Chapter 4)

The children devoured the spaghetti. *The children devoured. *The children devoured the spaghetti the cheese.

She handed the baby a toy. *She handed the baby. *She handed the toy.

Problems exist. *Problems exist more problems.

Grammaticality

An asterisk (*) indicates that a sentence is ungrammatical.

A large percentage of linguists make these assumptions:

Human languages are like formal languages.

Some sentences are in the set of legal sentences and some are not.

A human can act like a machine that accepts legal sentences and rejects illegal sentences.

Valency

The number of participants is called the verb’s valence or valency. Devour has a valency of two. Hand has a valency of three. Exist has a valency of one.

Linguists took this term from chemistry, where valence reflects how many electrons are missing from an atom's outer shell. The first linguist to use the term was Charles Hockett in the 1950s.

Subcategorization

Verbs are divided into subcategories that have different valencies. Here is how the terminology works:

Exist, devour, and hand have different subcategorizations, i.e., they are in different subcategories.

Devour subcategorizes for a subject and a direct object.

Devour is subcategorized for a subject and a direct object.

Devour takes two arguments, a subject and a direct object (or an agent and a patient).
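
The Grammaticality slide's picture of a speaker as a machine that accepts or rejects sentences, combined with subcategorization frames, can be sketched in a few lines of Python. This is a toy under stated assumptions: the frame labels (Subj, Obj, Obj2, PP-to) and the idea of feeding in pre-labeled arguments instead of raw text are hypothetical simplifications, not part of any real parser or lexicon.

```python
# Hypothetical subcategorization lexicon: each verb lists the argument
# frames it allows. The number of slots in a frame is the verb's valency.
SUBCAT = {
    "exist": [("Subj",)],
    "devour": [("Subj", "Obj")],
    "hand": [("Subj", "Obj", "Obj2"), ("Subj", "Obj", "PP-to")],
}


def valency(verb: str) -> int:
    """Number of arguments in the verb's frames (assumes all frames agree)."""
    return len(SUBCAT[verb][0])


def accepts(verb: str, arguments: tuple) -> bool:
    """Accept a clause only if its argument labels match one of the verb's frames."""
    return tuple(arguments) in SUBCAT.get(verb, [])


# The children devoured the spaghetti.  /  *The children devoured.
print(accepts("devour", ("Subj", "Obj")))        # True
print(accepts("devour", ("Subj",)))              # False
# She handed the baby a toy.  /  *She handed the baby.
print(accepts("hand", ("Subj", "Obj", "Obj2")))  # True
print(accepts("hand", ("Subj", "Obj")))          # False
```

An optional argument, as with eat, would simply be a verb listed with two frames, one with and one without Obj.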

Arguments are not always Noun Phrases

The phrases named in parentheses are also arguments:

He looked very pale. (very pale: Adjective Phrase)

The solution turned red. (red: Adjective Phrase)

I want to go. (to go: Verb Phrase)

He started singing a song. (singing a song: Verb Phrase)

We drove to New York. (to New York: Prepositional Phrase)

Optional and Obligatory Arguments

The direct object of eat is optional: The children ate. The children ate cake.

The direct object of devour is not optional: *The children devoured. The children devoured the cake.

Optional Arguments

The dog ran. The dog ran from the house to the creek through the garden along the path.

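Optional vs. Invisible Arguments

a. What happened to the cake?

b. The children ate.

b’. The children ate it.

In English, sentences b and b’ do not mean the same thing in this context: b does not say that the children ate the cake, because the omitted object of eat is not an invisible "it".

Compare to Japanese and Chinese, where an unexpressed object can be understood from context.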

Adjuncts

Locations, times, manners, and other things that can go with almost any sentence are called adjuncts. The children ate the cake quickly at 2:00 in the kitchen.

How to tell arguments from adjuncts

There are some general guidelines, but they are not always conclusive.

Optionality: adjuncts are always optional, but some arguments are optional too.

Repeatability: adjuncts can be repeated, but arguments cannot.

The children devoured the cake at 2:00 on Monday. (two temporal adjuncts)

The children devoured the cake in Pittsburgh in a restaurant. (two locative adjuncts)

*The children devoured the cake the dessert. (arguments are not repeatable)
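
The repeatability guideline can be turned into a small check over role labels. A minimal sketch, assuming a hypothetical set of adjunct role names and that each phrase in the clause has already been given a role label; like the guideline itself, the check is a heuristic, not a proof of ungrammaticality.

```python
from collections import Counter
from typing import List

# Hypothetical split of role labels: these count as adjuncts, all others as arguments.
ADJUNCT_ROLES = {"Time", "Location", "Manner"}


def repeated_arguments(roles: List[str]) -> List[str]:
    """Return argument roles that occur more than once in one clause.

    Adjunct roles may repeat; argument roles may not. A non-empty result
    suggests the clause is ill-formed.
    """
    counts = Counter(roles)
    return sorted(r for r, n in counts.items() if n > 1 and r not in ADJUNCT_ROLES)


# The children devoured the cake at 2:00 on Monday.
print(repeated_arguments(["Agent", "Patient", "Time", "Time"]))          # []
# The children devoured the cake in Pittsburgh in a restaurant.
print(repeated_arguments(["Agent", "Patient", "Location", "Location"]))  # []
# *The children devoured the cake the dessert.
print(repeated_arguments(["Agent", "Patient", "Patient"]))               # ['Patient']
```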

Semantic Roles: Motivation

The verb open appears in different subcategorization patterns: Sam opened the door with a key. The key opened the door. The door was opened by Sam with a key. Sam’s opening of the door with a key

How can we represent the meanings of these sentences in a way that shows that they are related?

Semantic Roles: Motivation

These sentences do not have the same meaning even though they have the same verb: Sam interviewed Sue. Sue interviewed Sam.

Semantic Roles: Motivation

These sentences mean roughly the same thing even though they use different verbs:

Sam bought a toy from Sue. Sue sold a toy to Sam.

Semantic Roles: Motivation

The way to express riding a vehicle to a location is different in different languages:

Sam took a bus to school.

Sam ascended to the bus and went to school. (Hebrew)

Sam riding on the bus, went to school. (Japanese)

Sam sat on the bus, went to school. (Chinese)

Sam went to school by bus.

Sam went to school by taking a bus.

Semantic role names in a meaning representation

Sam opened the door with a key. The key opened the door. The door was opened by Sam with a key. Sam’s opening of the door with a key

Open (Agent: Sam, Patient: door, Instrument: key)

Semantic Role Names in a Meaning Representation

These sentences do not have the same meaning: Sam interviewed Sue. Sue interviewed Sam.

Interview (Agent: Sam, Patient: Sue)

Interview (Agent: Sue, Patient: Sam)
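
A meaning representation with named roles can be sketched as a record with one slot per role. This assumes a hypothetical Frame type, and the analyses below are written by hand rather than produced by a parser. Across the open sentences, door stays the patient and key stays the instrument even though they surface as different grammatical relations, while the two interview sentences yield different frames.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """A predicate with its semantic roles; an unexpressed role is None."""
    predicate: str
    agent: Optional[str] = None
    patient: Optional[str] = None
    instrument: Optional[str] = None


# Hand-written analyses of the "open" sentences (not parser output).
OPEN_ANALYSES = {
    "Sam opened the door with a key.": Frame("open", "Sam", "door", "key"),
    "The key opened the door.": Frame("open", None, "door", "key"),
    "The door was opened by Sam with a key.": Frame("open", "Sam", "door", "key"),
    "Sam's opening of the door with a key": Frame("open", "Sam", "door", "key"),
}

# Same verb and same words, but different role assignments.
sam_interviews_sue = Frame("interview", agent="Sam", patient="Sue")
sue_interviews_sam = Frame("interview", agent="Sue", patient="Sam")

print({f.patient for f in OPEN_ANALYSES.values()})     # {'door'}
print({f.instrument for f in OPEN_ANALYSES.values()})  # {'key'}
print(sam_interviews_sue == sue_interviews_sam)        # False
```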

Examples of Semantic Roles

Agent: an agent acts volitionally or intentionally. The students worked. Sue baked a cake.

Examples of Semantic Roles

Experiencer and Perceived: An experiencer is an animate being that perceives something, cognizes about something, or experiences an emotion.

The perceived is the thing that the experiencer perceives or the thing that causes the emotional response.

The students like linguistics. (emoter and perceived)

The students saw a linguist. (perceiver and perceived)

Linguistics frightens the students. (perceived and emoter)

The students thought about linguistics. (cognizer and perceived)

Examples of Semantic Roles

Patient: A patient is affected by an action.

Sam kicked the ball. Sue cut the cake.

Beneficiary: A beneficiary benefits from an event. Sue baked a cake for Sam. Sue baked Sam a cake.

Malefactive: Someone is affected adversely by an event. My dog died on me.

Instrument: The boy opened the door with a key. The key opened the door.

Location: The clock stands on the shelf. I put the book on the shelf.

Three approaches to semantic roles in meaning representations

Ray Jackendoff (1972, 1990): linguistic theory. Lexical Conceptual Structure; the Motion/Location Metaphor; semantic roles

Charles Fillmore, FrameNet Project: lexicon. Frame semantics

Martha Palmer, PropBank Project: corpus annotation. Predicate-specific role names; proto-grammatical relations

Ray Jackendoff

Semantic Interpretation in Generative Grammar, MIT Press, 1972.

Semantic Structures, MIT Press, 1990.

A theory of human cognition, used by many computational linguists.

Lexical Conceptual Structure

Primitives: GO, BE, STAY, CAUSE, and several more; TO, FROM, AWAY, TOWARD, VIA, and several more

Types of entities: Event, State, Thing, Place, Path

Other tiers of representation are added in order to capture nuances of meaning and grammar: cause and affectedness; manner; actor and undergoer (see discussion of PropBank).

Example of Lexical Conceptual Structure

Sam threw the ball across the room.

[event CAUSE [thing SAM] [event GO [thing BALL] [path TO [place AT [thing other-side-of-room]]]]]

Lexical Conceptual Structure and Semantic Role Names

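Sam threw the ball across the room.

[event CAUSE [thing SAM] [event GO [thing BALL] [path TO [place AT [thing other-side-of-room]]]]]

Agent: [thing SAM], the first argument of CAUSE. Theme: [thing BALL], the first argument of GO. Goal: [place AT [thing other-side-of-room]], the argument of TO.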
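
The bracketed LCS can be held in a small tree, and the role names can be read off by position. A sketch, assuming a hypothetical LCS class and the Jackendoff-style rules just stated (agent = first argument of CAUSE, theme = first argument of GO, goal = argument of TO); real LCS lexicons such as Dorr's use their own formats.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class LCS:
    """One bracketed constituent, e.g. [event GO [thing BALL] ...]."""
    kind: str                # event, state, thing, place, path
    head: str                # a primitive (CAUSE, GO, TO, AT) or a constant (SAM)
    args: List["LCS"] = field(default_factory=list)

    def walk(self) -> Iterator["LCS"]:
        """Visit this node and all nested constituents, depth first."""
        yield self
        for arg in self.args:
            yield from arg.walk()

    def __str__(self) -> str:
        parts = [self.kind, self.head] + [str(a) for a in self.args]
        return "[" + " ".join(parts) + "]"


def first_arg_of(lcs: LCS, head: str) -> Optional[LCS]:
    """Return the first argument of the first constituent headed by `head`."""
    for node in lcs.walk():
        if node.head == head and node.args:
            return node.args[0]
    return None


def role_names(lcs: LCS) -> dict:
    return {
        "agent": first_arg_of(lcs, "CAUSE"),
        "theme": first_arg_of(lcs, "GO"),
        "goal": first_arg_of(lcs, "TO"),
    }


# Sam threw the ball across the room.
threw = LCS("event", "CAUSE", [
    LCS("thing", "SAM"),
    LCS("event", "GO", [
        LCS("thing", "BALL"),
        LCS("path", "TO", [LCS("place", "AT", [LCS("thing", "other-side-of-room")])]),
    ]),
])

print(threw)
# [event CAUSE [thing SAM] [event GO [thing BALL] [path TO [place AT [thing other-side-of-room]]]]]
for role, filler in role_names(threw).items():
    print(role, filler)
# agent [thing SAM]
# theme [thing BALL]
# goal [place AT [thing other-side-of-room]]
```

Because the roles are defined structurally, any verb whose entry is built from CAUSE and GO gets agent, theme, and goal for free, which is the appeal of the approach.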

The Motion/Location Metaphor

J. S. Gruber, Studies in Lexical Relations, MIT Dissertation, 1965.

Agent: causes, manipulates, affects

Theme: changes location, is located somewhere, or exists

Source: the starting point of the motion

Goal: the ending point of the motion

Path: the path of the motion

Examples of Location and Directed Motion

Many problems still exist. The clock sits on the shelf.

The ball rolled from the door to the window along the wall.

Sam walked from his house to town along the river.

Sue rolled across the room. The car turned into the driveway.

Being in a State or Changing State

The car is red. The ice cream melted. The glass broke. Sam broke the glass. The paper turned from red to green. The fairy godmother turned the pumpkin into a coach.

Having or Changing Possession

The teacher gave books to the students. The teacher gave the students books. The students have books.

Exchange of Information

The teacher told a story to the students. The teacher told the students a story.

Extent

The road extends/runs along the river from the school to the mall. The string reaches the wall. The string reaches across the room to the wall.

Strong points of LCS and the Motion/Location Metaphor

Sam manipulates a key, having an effect on the door, causing it to go from the state of being closed to the state of being open.

Sam opened the door with a key. The key opened the door. The door was opened by Sam with a key. Sam’s opening of the door with a key

Strong points of LCS and the Motion/Location Metaphor

A toy goes from Sue to Sam. Some money goes from Sam to Sue.

The two sentences differ only in the causation tier: Sam is the agent of buy, and Sue is the agent of sell.

Sam bought a toy from Sue. Sue sold a toy to Sam.

Strong points of LCS and the Motion/Location Metaphor

Supports some inferences:

If X goes from A to B, then X is no longer at A.

If X is created (begins to BE) during event Y, then X doesn’t exist until Y is finished.
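
The buy/sell comparison and the first inference can be combined in one small sketch. It assumes a hypothetical Event record that keeps a causation tier (who acts) separate from a motion tier (a tuple of GO events); it is not LCS notation itself. Buy and sell build the same motion tier and differ only in the causer, and applying the GO events moves each theme to its goal, so it is no longer at its source.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Go:
    """Motion tier: `theme` goes from `source` to `goal`."""
    theme: str
    source: str
    goal: str


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a causation tier plus a motion tier."""
    causer: str
    motions: Tuple[Go, ...]


def buy(buyer: str, seller: str, goods: str) -> Event:
    return Event(buyer, (Go(goods, seller, buyer), Go("money", buyer, seller)))


def sell(seller: str, buyer: str, goods: str) -> Event:
    return Event(seller, (Go(goods, seller, buyer), Go("money", buyer, seller)))


def after(event: Event, location: Dict[str, str]) -> Dict[str, str]:
    """Inference: if X goes from A to B, X is now at B and no longer at A."""
    updated = dict(location)
    for go in event.motions:
        updated[go.theme] = go.goal
    return updated


bought = buy("Sam", "Sue", "toy")  # Sam bought a toy from Sue.
sold = sell("Sue", "Sam", "toy")   # Sue sold a toy to Sam.

print(bought.motions == sold.motions)                 # True: same motion tier
print(bought.causer, sold.causer)                     # Sam Sue: different causation tier
print(after(bought, {"toy": "Sue", "money": "Sam"}))  # {'toy': 'Sam', 'money': 'Sue'}
```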

Strong or weak point?

LCS wasn’t designed with this kind of thing in mind, but it could be made to work.

Sam took a bus to school.

Sam ascended to the bus and went to school. (Hebrew)

Sam riding on the bus, went to school. (Japanese)

Sam sat on the bus, went to school. (Chinese)

Sam went to school by bus.

Sam went to school by taking a bus.

Problem with Thematic Roles and the Motion/Location Metaphor

It is not clear how to apply the metaphor to many verbs (Fillmore and Kay, Lecture Notes, pages 4-22):

He risked death. We resisted the enemy. She resembles her mother.

LCS Resources

Bonnie Dorr, University of Maryland: http://www.umiacs.umd.edu/~bonnie/LCS_Database_Documentation.html

LCS Lexicon for English: English word senses are mapped to WordNet. Handcrafted lexical entries for around 4000 verbs. Automatically produced entries may be available for a full-sized lexicon.

LCS dictionaries for other languages may be available. They may be handcrafted or produced partially automatically.

FrameNet Project

Charles Fillmore, Collin Baker, and others

http://www.icsi.berkeley.edu/~framenet/

Frame semantics

Frames are networked using several relations.

Based on corpus analysis

Lexical entries for around 7500 English verbs

Other FrameNet projects: Spanish, Japanese

Advantage of Frame Semantics

FrameNet was designed to capture the similarities in sentences like these.

Ride-vehicle frame:

Sam took a bus to school.

Sam ascended onto the bus and went to school. (Hebrew)

Sam riding on the bus, went to school. (Japanese)

Sam sat on the bus, went to school. (Chinese)

Sam went to school by bus.

Sam went to school by taking a bus.

Frame Semantics compared to the Motion/Location Metaphor

Frame Semantics has many primitives and many semantic roles.

FrameNet strong and weak points

FrameNet is still under development and may change frequently. Versions are clearly identified.

Lexical entries are very carefully handcrafted.

The PropBank Project

Martha Palmer and others

http://www.cis.upenn.edu/~ace/

Annotate the Penn Treebank with predicate-argument information.

The corpus can be used for automatic learning of the surface realization of each argument.

PropBank and FrameNet: Close ties

PropBank lexical entries are linked to FrameNet entries.

There are more PropBank entries than FrameNet entries.

This paper contains some comparisons of PropBank and FrameNet: http://www.cis.upenn.edu/~dgildea/gildea-acl02.pdf

See also VerbNet: http://www.cis.upenn.edu/group/verbnet/

Proto-roles and verb-specific roles

http://www.cis.upenn.edu/~dgildea/Verbs/

Abandon

Arg0: abandoner

Arg1: thing abandoned, left behind

Arg2: attribute of arg1

PropBank: multiple surface realizations of arguments

Sam opened the door with a key. The key opened the door. The door was opened by Sam with a key. Sam’s opening of the door with a key

Arg0 (opener): Sam

Arg1 (thing opening): door

Arg2 (instrument): key

Arg3 (benefactive)

PropBank: How are lexical entries used by annotators?

Intercoder agreement is a high priority for PropBank.

Role names like agent and theme can be confusing. Verb-specific role names are clearer.

Annotation Procedure

Identify the verb in a sentence.

Look it up in the PropBank lexicon.

Assign arg0…arg-n appropriately by looking at the verb-specific roles.

Always use the same arg-n for the same verb-specific role.
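
The lookup-and-assign steps of this procedure amount to a table lookup. A minimal sketch, assuming a hypothetical in-memory dictionary of the verb-specific role descriptions shown on the earlier slides (the actual PropBank frame files use their own, richer format). Because each verb-specific role maps to exactly one arg-n, the rule "always use the same arg-n for the same verb-specific role" holds by construction, and different surface realizations of open receive the same labels.

```python
from typing import Dict

# Hypothetical excerpt of a PropBank-style lexicon: verb-specific role
# descriptions (from the slides) mapped to numbered arguments.
FRAMES: Dict[str, Dict[str, str]] = {
    "open": {"opener": "Arg0", "thing opening": "Arg1",
             "instrument": "Arg2", "benefactive": "Arg3"},
    "abandon": {"abandoner": "Arg0", "thing abandoned, left behind": "Arg1",
                "attribute of arg1": "Arg2"},
}


def annotate(verb: str, fillers: Dict[str, str]) -> Dict[str, str]:
    """Look the verb up and relabel each verb-specific role with its arg-n.

    `fillers` maps a role description to the text span the annotator chose;
    identifying the verb and the spans is still the annotator's job.
    """
    roleset = FRAMES[verb]  # raises KeyError if the verb has no entry
    return {roleset[role]: span for role, span in fillers.items()}


# Sam opened the door with a key.
print(annotate("open", {"opener": "Sam", "thing opening": "the door", "instrument": "a key"}))
# {'Arg0': 'Sam', 'Arg1': 'the door', 'Arg2': 'a key'}

# The key opened the door.
print(annotate("open", {"instrument": "The key", "thing opening": "the door"}))
# {'Arg2': 'The key', 'Arg1': 'the door'}
```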

What are the arg-n’s?

The arg-n labels are arbitrary labels. However, PropBank tries to use them consistently across verbs.

Arg0 tends to be an agent, or the argument most likely to be the subject in active voice.

Arg1 tends to be a theme or patient, or the thing most likely to be: the direct object of a transitive verb in active voice; the subject of a verb in passive voice; the subject of an intransitive verb.
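
These tendencies can be written down as a fallback guess. A sketch, assuming a hypothetical function and relation names ("subject", "object"); PropBank itself assigns labels from each verb's own entry, so this is at most a default for a verb with no entry yet.

```python
from typing import Optional


def likely_arg(relation: str, voice: str = "active", transitive: bool = True) -> Optional[str]:
    """Guess Arg0 or Arg1 from a grammatical relation, following the tendencies above."""
    if relation == "subject":
        if voice == "passive":
            return "Arg1"  # The door was opened by Sam.
        # Transitive subjects tend to be agents (Arg0). Intransitive subjects
        # tend to be Arg1 (The door opened.), although agentive intransitives
        # such as "The students worked." are a counterexample this misses.
        return "Arg0" if transitive else "Arg1"
    if relation == "object" and voice == "active":
        return "Arg1"  # Sam opened the door.
    return None  # no default for other relations


print(likely_arg("subject"))                    # Arg0
print(likely_arg("subject", voice="passive"))   # Arg1
print(likely_arg("subject", transitive=False))  # Arg1
```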

PropBank was not designed for this

Sam took a bus to school.

Sam ascended onto the bus and went to school. (Hebrew)

Sam riding on the bus, went to school. (Japanese)

Sam sat on the bus, went to school. (Chinese)

Sam went to school by bus. Sam went to school by taking a bus.

But it is linked to FrameNet.

IAMTC (Interlingua Annotation of Multilingual Text Corpora) Project

http://aitc.aitcnet.org/nsf/iamtc/

Collaboration: New Mexico State University, University of Maryland, Columbia University, MITRE, Carnegie Mellon University, ISI (University of Southern California)

Goals of IAMTC

Interlingua design: three levels of depth

Annotation methodology: manuals, tools, evaluations

Annotated multi-parallel texts: a foreign-language original and multiple English translations

Foreign languages: Arabic, French, Hindi, Japanese, Korean, Spanish

Motivation for Corpus and Data

Examine the surface realization of many phenomena.

In one language: many surface realizations of the same phenomenon. I think it is raining. It is probably raining.

Across languages: different syntactic constructions are used to express the same ideas.

IL Development: Staged, deepening

IL0: a simple dependency tree gives structure

IL1: semantic annotations for nouns, verbs, adjectives, adverbs, and theta roles

Not yet ‘semantic’: “buy” ≠ “sell”; many simplifications remain

Concept ‘senses’ from ISI’s Omega ontology

Theta roles from Dorr’s LCS work

Elaborate annotation manuals

Tiamat annotation interface

Post-annotation reconciliation process and interface

Evaluation scores: annotator agreement

IL2: that comes next…

Details of English IL0

Deep syntactic dependency representation:

Removes auxiliary verbs, determiners, and some function words

Normalizes passives, clefts, etc.

Removes strongly governed prepositions

Includes syntactic roles (Subj, Obj)

Construction: dependency parsed using Connexor (English; Tapanainen and Järvinen, 1997), then hand-corrected

Extensive manual and instructions on the IAMTC Wiki website

IL0 coding manuals for other languages: Japanese, Spanish, Korean (in progress), Hindi (in progress), French (in progress)

Example of IL0

TrEd, Pajas, 1998

Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

Example of IL0

Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

Nodes (word, part of speech, relation):

announced V Root

Mohamed PN Subj

Sheikh PN Mod

Defense_Minister PN Mod

who Pron Subj

also Adv Mod

of P Mod

UAE PN Obj

at P Mod

ceremony N Obj

inauguration N Mod

Dependency parser and Omega ontology

Omega (ISI): 110,000 concepts (WordNet, Mikrokosmos, etc.), 1.1 million instances

URL: http://omega.isi.edu

Dependency parser (Prague)

Details of IL1

Intermediate semantic representation:

Annotations performed manually by each person alone

Associate open-class lexical items with Omega Ontology items

Replace syntactic relations by one of approx. 20 semantic (theta) roles (from Dorr), e.g., AGENT, THEME, GOAL, INSTR…

No treatment of prepositions, quantification, negation, time, modality, idioms, proper names, NP-internal structure…

Nodes may receive more than one concept (average: about 1.2)

Manual under development; annotation tool built

Example of IL1

Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

Example of IL1: internal representation

The study led them to ask the Czech government to recapitalize CSA at this level.

[3, lead, V, lead, Root, LEAD<GET, GUIDE]

[2, study, N, study, AGENT, SURVEY<WORK, REPORT]

[4, they, N, they, THEME, ---, ---]

[6, ask, V, ask, PROPOSITION, ---, ---]

[9, government, N, government, GOAL, AUTHORITIES, GOVERNMENTAL-ORGANIZATION]

[8, Czech, Adj, Czech, MOD, CZECH~CZECHOSLOVAKIA, ---]

[11, recapitalize, V, recapitalize, PROP, CAPITALIZE<SUPPLY, INVEST]

[12, csa, N, csa, THEME, AIRLINE<LINE, ---]

[16, at, P, value_at, GOAL, ---, ---]

[15, level, N, level, ---, DEGREE, MEASURE]

[14, this, Det, this, ---, ---, ---]

The fifth field of each record holds the semantic role; the last two fields hold concepts from the Omega Ontology (--- marks an empty field).
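
Records in this internal format are easy to load for analysis. A sketch, assuming the field order node number, word, part of speech, lemma, semantic role, concepts (inferred from the example above, not from an IAMTC specification), a hypothetical IL1Node type, and that no field contains a comma, which holds for the records shown.

```python
from dataclasses import dataclass
from typing import List, Optional

EMPTY = "---"  # marks an empty field in the records


@dataclass
class IL1Node:
    node: int
    word: str
    pos: str
    lemma: str
    role: Optional[str]   # semantic (theta) role, or None
    concepts: List[str]   # Omega concepts assigned to the node


def parse_record(record: str) -> IL1Node:
    """Parse one record such as "[2, study, N, study, AGENT, SURVEY<WORK, REPORT]"."""
    fields = [f.strip() for f in record.strip().strip("[]").split(",")]
    node, word, pos, lemma, role, *concepts = fields
    return IL1Node(int(node), word, pos, lemma,
                   None if role == EMPTY else role,
                   [c for c in concepts if c != EMPTY])


RECORDS = [
    "[3, lead, V, lead, Root, LEAD<GET, GUIDE]",
    "[2, study, N, study, AGENT, SURVEY<WORK, REPORT]",
    "[4, they, N, they, THEME, ---, ---]",
    "[9, government, N, government, GOAL, AUTHORITIES, GOVERNMENTAL-ORGANIZATION]",
    "[12, csa, N, csa, THEME, AIRLINE<LINE, ---]",
]

for n in (parse_record(r) for r in RECORDS):
    print(n.word, n.role, n.concepts)
# lead Root ['LEAD<GET', 'GUIDE']
# study AGENT ['SURVEY<WORK', 'REPORT']
# they THEME []
# government GOAL ['AUTHORITIES', 'GOVERNMENTAL-ORGANIZATION']
# csa THEME ['AIRLINE<LINE']
```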

Tiamat: annotation interface

For each new sentence:

Step 1: find Omega concepts for objects and events (candidate concepts)

Step 2: select event frame (theta roles)

Omega ontology

Single set of all semantic terms, taxonomized and interconnected (http://omega.isi.edu)

Merger of existing ontologies and other resources:

Manually built top structure from ISI

WordNet (110,000 nodes) from Princeton

Mikrokosmos (6000 nodes) from NMSU

Penman Upper Model (300 nodes) from ISI

1-million+ instances (people, locations) from ISI

TAP domain relations from Stanford…

Undergoing constant reconciliation and pruning

Used in several past projects (metadata formation for database integration; MT; QA; summarization)

So far…

Annotations of 12 English texts: 6 pairs of translations of 1 text from each source language

10 to 12 annotators for each text; approximately 144 annotated texts total

Annotation manuals for IL0 and IL1

Annotation tools

Work on evaluation for interannotator agreement

Now, we’re working on IL2 specification and annotation.
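
The slides do not say which agreement statistic IAMTC used. As one common choice, Cohen's kappa for a pair of annotators can be sketched as follows; with 10 to 12 annotators per text it would be computed for each pair, or replaced by a multi-annotator measure such as Fleiss' kappa. The labels in the example are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators' labels for the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance if each annotator labelled independently
    # with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout: trivially perfect
    return (observed - expected) / (1 - expected)


# Hypothetical theta-role labels from two annotators for the same six nodes.
annotator_1 = ["AGENT", "THEME", "GOAL", "THEME", "AGENT", "GOAL"]
annotator_2 = ["AGENT", "THEME", "GOAL", "GOAL", "AGENT", "GOAL"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.75
```

Here the annotators agree on 5 of 6 nodes (0.833), chance agreement is 1/3, and kappa is (5/6 - 1/3) / (1 - 1/3) = 0.75.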

Getting at Meaning (Two translations of a Korean original text)

Translation 1:

Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality.

Translation 2:

Starting January 1st of next year customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced.

Color Key

Black: same meaning and same expression

Green: small syntactic difference

Khaki: lexical difference

Red: not contained in the other text

Purple: larger difference

Need to use some inference to know that the meaning is the same.

Getting at Meaning (Two translations of a Japanese original text)

Translation 1:

This year, too, in addition to the birth of Mitsubishi Chemical, which has already been announced, other rather large-scale mergers may continue, and be recorded as a "year of mergers."

Translation 2:

This year, which has already seen the announcement of the birth of Mitsubishi Chemical Corporation as well as the continuous numbers of big mergers, may too be recorded as the "year of the merger" for all we know.

More lexical similarity.

More differences in dependency relations.

Additional Topics in Lexical Semantics

English Transitivity Alternations

Beth Levin, 1993

Identified around 100 transitivity alternations in English.

Transitivity Alternations and Semantic Classes: Examples

Causative-Inchoative: change of state verbs Sam broke the glass. (causative) The glass broke. (inchoative) Sam opened the door. The door opened. Sam kicked the ball. *The ball kicked.

In other languages:
Inchoative verbs may be reflexive (e.g., Romance languages).
There may be a causative marker on the transitive verb.

Inchoative means beginning. Beginning a change of state?

Page 77

Transitivity Alternations and Semantic Classes: Examples
Dative Shift: giving and telling

I gave Sam the book. I gave the book to Sam.
I told the story to the children. I told the children the story.
I drove the car to New York. *I drove New York the car.

In other languages:
The goal may not be able to become a direct object (Romance languages).
The goal may become a direct object in the presence of an applicative morpheme (Bantu languages).

Page 78

Transitivity Alternations and Semantic Classes: Examples
Spray-Load Alternation: filling and covering
Sam sprayed the wall with paint. Sam sprayed paint on the wall.
Sam loaded the truck with hay. Sam loaded hay onto the truck.

Page 79

Transitivity Alternations and Semantic Classes: Examples
There-Insertion: stative verbs, verbs of appearing
Problems exist. There exist problems.
A ghost appeared. There appeared a ghost.
The students worked. *There worked some students.
The students disappeared. *There disappeared some students.

Page 80

Transitivity Alternations and Semantic Classes: Examples
Locative subjects:
Bees swarmed in the garden. The garden swarmed with bees.

Temporal subjects:
1990 saw the fall of the government.

Page 81

Transitivity Alternations and Semantic Classes: Examples
Middle: telic verbs? (see below)
You can cut this bread. This bread cuts easily.
You can sell these books easily. These books sell well.
People like these books. *These books like well.

Page 82

Transitivity Alternations and Semantic Classes: Examples
Resultative Secondary Predication: theme version
Sam hammered the nail. Sam hammered the nail flat.
The lake froze. The lake froze solid.

Page 83

Transitivity Alternations and Semantic Classes: Examples
Resultative Secondary Predication: agent version
He screamed himself hoarse. He cried himself to sleep.

Page 84

Class Shifts
Manner of motion to change of location:
The bottle floated. The bottle floated into the cave.
The ball bounced. The ball bounced across the room.

Sound to change of location:
The car rumbled. The car rumbled down the street.
The dress rustled. She rustled across the room.

Page 85

How Universal?
How universal is argument structure?

If an English word has an agent and a patient, will the translation-equivalent in another language have an agent and patient?

If an English word has a subject and object, will the translation-equivalent in another language have a subject and object?

Less likely: I met him. I met with him.

Page 86

How Universal?
How universal are alternations and semantic classes?
If an English word undergoes a transitivity alternation, will the translation equivalent in another language undergo the same transitivity alternation?

Even less likely. (Mitamura, 1989)

Page 87

Importance of Transitivity Alternations in Language Technologies
For any task that requires understanding (question answering, information extraction, machine translation), you need to know the semantic roles of the NPs.
The glass broke. (subject is patient)
The kids ate. (subject is agent)
I gave them some books. (object is recipient)

Page 88

Importance of Transitivity Alternations in Language Technologies
So you need multiple lexical mappings for each verb:
break  < agent patient >          subj obj
break  < patient >                subj
give   < agent theme recipient >  subj obj obl
give   < agent theme recipient >  subj obj2 obj
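The four mappings above can be sketched as a small lookup table. This is a minimal illustration, not a description of any existing lexicon format: LexicalMapping, LEXICON, and candidate_roles are hypothetical names, and the input is assumed to be the set of grammatical functions a parser has already found for the verb. Matching on that set is enough to separate causative "break" from inchoative "break", and the prepositional "give" frame from the double-object one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalMapping:
    """One way of linking a verb's semantic roles to grammatical functions."""
    verb: str
    roles: tuple[str, ...]      # e.g. ("agent", "patient")
    functions: tuple[str, ...]  # aligned position by position with roles


# Hypothetical lexicon: the four mappings from the slide, transcribed directly.
LEXICON = [
    LexicalMapping("break", ("agent", "patient"), ("subj", "obj")),
    LexicalMapping("break", ("patient",), ("subj",)),
    LexicalMapping("give", ("agent", "theme", "recipient"), ("subj", "obj", "obl")),
    LexicalMapping("give", ("agent", "theme", "recipient"), ("subj", "obj2", "obj")),
]


def candidate_roles(verb: str, observed_functions: set[str]) -> list[dict[str, str]]:
    """Return a function -> role assignment for every mapping of `verb`
    that uses exactly the grammatical functions observed in the sentence."""
    return [
        dict(zip(m.functions, m.roles))
        for m in LEXICON
        if m.verb == verb and set(m.functions) == observed_functions
    ]


# The glass broke.
print(candidate_roles("break", {"subj"}))
# I gave them some books.
print(candidate_roles("give", {"subj", "obj", "obj2"}))
```

The first call returns the subject as patient; the second returns the subject as agent, obj2 as theme, and obj as recipient, which is exactly the information a question-answering or translation system would need from these sentences.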

Page 89

Importance of Transitivity Alternations in Language Technologies
Lexicon acquisition is faster if you assign a verb to a semantic class and automatically generate its alternations, instead of listing all of its lexical mappings by hand.
I gave books to the students.
I gave the students books.
Books were given to the students.
The students were given books.
There were books given to the students.
There were students given books.
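The class-based approach can be sketched as rules that rewrite a base frame. This is a toy under stated assumptions, not the procedure of any particular system: the rule functions dative_shift and passive, the class name "giving", and the tables CLASS_ALTERNATIONS and VERB_CLASS are all hypothetical, and there-insertion is not modeled. Each frame maps a semantic role to a grammatical function; the verb's class decides which rules apply, and the rules are reapplied until no new frame appears.

```python
from typing import Callable, Optional

Frame = dict[str, str]  # semantic role -> grammatical function


def dative_shift(frame: Frame) -> Optional[Frame]:
    """Prepositional dative -> double object: recipient becomes obj, theme becomes obj2."""
    if frame.get("theme") != "obj" or frame.get("recipient") != "obl":
        return None
    return {**frame, "theme": "obj2", "recipient": "obj"}


def passive(frame: Frame) -> Optional[Frame]:
    """Promote the direct object to subject and drop the old subject
    (the optional by-phrase is not modeled)."""
    promoted = [role for role, fn in frame.items() if fn == "obj"]
    if not promoted:
        return None
    new = {role: fn for role, fn in frame.items() if fn != "subj"}
    new[promoted[0]] = "subj"
    return new


# Hypothetical class inventory: each class lists the alternations its members undergo.
CLASS_ALTERNATIONS: dict[str, list[Callable[[Frame], Optional[Frame]]]] = {
    "giving": [dative_shift, passive],
}
VERB_CLASS = {"give": "giving", "hand": "giving"}  # adding a verb is one entry


def generate_frames(verb: str, base: Frame) -> list[Frame]:
    """Apply the verb's class alternations repeatedly until no new frame appears."""
    rules = CLASS_ALTERNATIONS[VERB_CLASS[verb]]
    frames = [base]
    i = 0
    while i < len(frames):
        for rule in rules:
            new = rule(frames[i])
            if new is not None and new not in frames:
                frames.append(new)
        i += 1
    return frames


base = {"agent": "subj", "theme": "obj", "recipient": "obl"}  # I gave books to the students.
for frame in generate_frames("give", base):
    print(frame)
```

The loop prints four frames, corresponding to "I gave books to the students," "I gave the students books," "Books were given to the students," and "The students were given books." Another member of the class, such as "hand," needs only one new VERB_CLASS entry to get all four mappings.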

Page 90

Lexical Aspect
State: The clock sat on the shelf.
Activity: The children painted.
Accomplishment: The children walked to school.
Achievement: The ambassador arrived in Moscow.

Page 91

Lexical Aspect
Examples taken from this web page:
http://www.sfu.ca/person/dearmond/322/322.event.class.htm

Vendler, Linguistics in Philosophy, 1967
Dowty, Word Meaning and Montague Grammar, 1979
Tenny, Aspectual Roles and the Syntax-Semantics Interface, 1994

Page 92

Activities and Accomplishments

Activity:
The children painted for an hour.
?The children painted in an hour.
The children will paint in an hour. (They will start in an hour.)
The children almost painted. (Almost started painting.)
Test for telicity: if you start to paint and stop, you have painted. Fails the test for telicity.

Accomplishment:
?The children walked to school for an hour.
The children walked to school in an hour.
The children will walk to school in an hour. (They will start in an hour, or it will take an hour.)
The children almost walked to school. (Almost started walking, or almost reached school.)
Test for telicity: if you start to walk to school and stop, you may not have walked to school. Passes the test for telicity.

Page 93

Telicity
Telic: has a goal or endpoint (accomplishment)
Atelic: does not have a goal or endpoint (activity)

Telicity can change depending on the sentence:
He built houses for a year / *in a year.
He built a house in a year / ?for a year.
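Because telicity belongs to the whole verb phrase, a program cannot read it off the verb alone. The sketch below is a toy heuristic under loudly stated assumptions: the word lists, the determiner-based is_quantized test, and the function names are invented for illustration. It treats a verb whose object measures out the event (build, paint) as telic only with a quantized object, treats a motion verb as telic only with a goal phrase, and then predicts which durative adverbial fits.

```python
from typing import Optional

# Toy word lists, assumed for illustration only.
QUANTIZING_DETERMINERS = {"a", "an", "the", "one", "two", "three"}
INCREMENTAL_THEME_VERBS = {"build", "eat", "paint", "write"}  # the object measures out the event
MOTION_VERBS = {"walk", "run", "swim", "drive"}               # a goal phrase supplies the endpoint


def is_quantized(noun_phrase: str) -> bool:
    """Crude check: "a house" is quantized; bare plurals like "houses" are not."""
    return noun_phrase.split()[0].lower() in QUANTIZING_DETERMINERS


def is_telic(verb: str, obj: Optional[str] = None, goal: Optional[str] = None) -> bool:
    """Telicity of the verb phrase as a whole, not of the verb alone."""
    if verb in INCREMENTAL_THEME_VERBS and obj is not None:
        return is_quantized(obj)
    if verb in MOTION_VERBS and goal is not None:
        return True
    return False


def accepts(adverbial: str, telic: bool) -> bool:
    """'in an hour' needs an endpoint; 'for an hour' needs an open-ended event."""
    return telic if adverbial.startswith("in ") else not telic


print(is_telic("build", obj="houses"))     # He built houses for a year.
print(is_telic("build", obj="a house"))    # He built a house in a year.
print(is_telic("walk", goal="to school"))  # The children walked to school in an hour.
```

A real system would need a proper noun-phrase analysis instead of checking the first word, but the shape of the computation (verb class plus object plus goal phrase) is the point the slide makes.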

Page 94

Achievements
The ambassador almost arrived in Moscow.
Only means “almost finished,” not “almost started.”

Page 95

States (English)
Stative: The simple present tense means present time. The present progressive does not sound good.
He knows the answer. *He is knowing the answer.

Non-stative: The simple present tense means habitual or generic. The present progressive means present time.
He paints. He is painting.

Page 96

Consequences of Lexical Aspect for Language Technologies
English:

You have to know the lexical aspect of the verb in order to know what the tense morphemes mean.

The simple present tense means “habitual” with a non-stative verb, but means present time with a stative verb.

You have to know the lexical aspect of the verb in order to know what the adverbials mean.

Almost can mean “almost started,” “almost finished,” or both, depending on the verb’s lexical aspect.
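The two consequences above can be sketched as lookups keyed on aspectual class. This is a minimal illustration, not a claim about any real tagger: the Aspect enum, the small LEXICAL_ASPECT table, and both functions are hypothetical, and the readings of "almost" are taken from the activity, accomplishment, and achievement slides. States are left without an "almost" reading because the slides do not discuss them.

```python
from enum import Enum


class Aspect(Enum):
    STATE = "state"
    ACTIVITY = "activity"
    ACCOMPLISHMENT = "accomplishment"
    ACHIEVEMENT = "achievement"


# Hypothetical verb-phrase lexicon built from the examples in this tutorial.
LEXICAL_ASPECT = {
    "know the answer": Aspect.STATE,
    "paint": Aspect.ACTIVITY,
    "walk to school": Aspect.ACCOMPLISHMENT,
    "arrive in Moscow": Aspect.ACHIEVEMENT,
}


def simple_present_meaning(aspect: Aspect) -> str:
    """'He knows the answer' (present time) vs. 'He paints' (habitual/generic)."""
    return "present time" if aspect is Aspect.STATE else "habitual or generic"


def almost_readings(aspect: Aspect) -> set[str]:
    """Readings of 'almost' by aspectual class; states are not covered here."""
    return {
        Aspect.ACTIVITY: {"almost started"},
        Aspect.ACCOMPLISHMENT: {"almost started", "almost finished"},
        Aspect.ACHIEVEMENT: {"almost finished"},
    }.get(aspect, set())


for phrase, aspect in LEXICAL_ASPECT.items():
    print(phrase, "|", simple_present_meaning(aspect), "|", sorted(almost_readings(aspect)))
```

Keeping the interpretation rules separate from the lexicon means the same two functions serve any verb once its class has been recorded.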

Page 97

Consequences of Telicity: Japanese

Telic verbs with -te iru have a resultative meaning:

Aite iru: is open or has been opened, not is opening

Otite iru: has fallen (is on the floor), not is falling (unless it takes a very long time to fall, like a leaf falling from a skyscraper)

Atelic verbs with -te iru have a progressive meaning:

Tabete iru: is eating, not has eaten

Page 98

Consequences of Telicity: Japanese
-te aru (with a passive-like meaning) applies only to telic verbs, because it focuses on a resulting state (e.g., wash (arau), but not praise (homeru)).

Sara ga aratte aru.
Plate subj wash

???Taroo ga homete aru.

Page 99

Consequences of Telicity: Finnish
Angelika Kratzer, Telicity and the Meaning of Objective Case, International Round Table ‘The Syntax and Semantics of Aspect’, Université de Paris, Nov. 2000.

Telic: the direct object can have partitive or accusative case (with a slight difference in meaning):

Ammu-i-n karhu-a
Shoot-past-1sg bear-part
I shot at a/the bear.

Ammu-i-n karhu-n
Shoot-past-1sg bear-acc
I shot the bear.

Atelic: the direct object can only have partitive case: despise, admire, envy, love, study, play, listen, pull

Page 100

Consequences of Telicity: Chinese
Lisa Lai-Shen Cheng, Aspects of the Ba-Construction, Lexicon Project Working Papers 24, Carol Tenny (ed.), MIT, 1988.

Ta ba shu mai le.
He BA book sell ASP
He sold the book.

Factors determining the grammaticality of the ba-construction:
Aspect markers: occurs with le and zhe, but not with zai and guo.
Definiteness: the direct object has to be interpretable as definite.
Telicity of the verb: tui le (pushed) vs. tui dao le (pushed down; push-fall); la le (pulled) vs. la dao le (pulled down; pull-fall); dai le (bring/carry) vs. dai lai le (bring here; carry-come).
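The three factors can be read as a checklist. The sketch below assumes that the telicity of each verb or compound is simply listed in a hand-built table (TELIC_VERBS, hypothetical, containing only the examples from these slides) and that definiteness is decided by some other component; ba_violations is likewise an illustrative name. It returns the factors a ba-sentence violates, so an empty list means none of the three rules it out, not that the sentence is guaranteed to be grammatical.

```python
# Hypothetical lexicon of telic verbs/compounds, limited to the examples above.
TELIC_VERBS = {"mai", "tui-dao", "la-dao", "dai-lai"}  # sell, push-fall, pull-fall, carry-come
BA_ASPECT_MARKERS = {"le", "zhe"}                      # zai and guo are excluded


def ba_violations(verb: str, aspect: str, object_definite: bool) -> list[str]:
    """Return the factors (aspect, definiteness, telicity) that a
    ba-construction with this verb, aspect marker, and object violates."""
    violations = []
    if aspect not in BA_ASPECT_MARKERS:
        violations.append("aspect")
    if not object_definite:
        violations.append("definiteness")
    if verb not in TELIC_VERBS:
        violations.append("telicity")
    return violations


print(ba_violations("mai", "le", True))       # Ta ba shu mai le.
print(ba_violations("tui", "le", True))       # tui le: verb is not telic
print(ba_violations("tui-dao", "guo", True))  # guo is not allowed with ba
```

Returning the list of violated factors, rather than a single yes/no, makes it easier to see which of the three conditions a generated or translated sentence fails.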

Page 101

“Ba” and Telicity

*Wǒ bǎ Lǐsì tuī-le.
I BA Lisi push-ASP
“I pushed Lisi.”

Wǒ bǎ Lǐsì tuī-dǎo-le.
I BA Lisi push-fall-ASP
“I pushed Lisi and he fell.”

Page 102

“Ba” and Telicity

*Tā bǎ Zhāngsān lā-le.

He BA Zhangsan pull-ASP

“He pulled Zhangsan.”

Tā bǎ Zhāngsān lā-dǎo-le.
He BA Zhangsan pull-fall-ASP

“He pulled Zhangsan and Zhangsan fell.”

Page 103

“Ba” and Telicity

*Tā bǎ diàn-nǎo dài-le.
He BA computer bring-ASP
“He brought the computer.” (Does this really mean “He carried the computer”?)

Tā bǎ diàn-nǎo dài-lái-le.
He BA computer bring-come-ASP
“He brought the computer here.”

Page 104

“Ba” and Telicity

*Tā bǎ fángjiān dǎ-sǎo-le.
He BA room hit-sweep-ASP
“He cleaned the room.”

Tā bǎ fángjiān dǎ-sǎo de hěn gānjìng.
He BA room hit-sweep DE very clean
“He cleaned the room and the result is that the room is very clean.”

Page 105

Two kinds of intransitive verbs: the subject is agentive or not
Sam worked. (agentive)
Sam fell (by accident). (non-agentive)

Unaccusative: an intransitive verb whose subject is not agentive. (Because the noun phrase would have been accusative if the verb were transitive?)
Unergative: an intransitive verb whose subject is agentive. (Because the noun phrase would have been ergative if the verb were transitive?)

Confusing terminology by David Perlmutter and Paul Postal.
Highly influential and insightful contribution to linguistic theory, also by David Perlmutter and Paul Postal.

Page 106

Consequences of Unaccusativity or Agentivity
English: resultative secondary predication
*He screamed hoarse.
?He worked to exhaustion.
He worked himself to exhaustion.
It broke to pieces.
It froze solid.

Page 107

Consequences of Unaccusativity or Agentivity: German Impersonal Passive
http://www.wm.edu/CAS/modlang/gasmit/grammar/passive/impspass.htm

Hier wird nicht geparkt.

No parking here.

Im Gang wird nicht geraucht.

No smoking in the corridor.

Es wurde viel getanzt und gesungen.

There was lots of dancing and singing.

Works with agentive verbs only.

Not with break, fall, etc.

Page 108

Consequences of Unaccusativity: Italian partitive clitics
http://www.sfu.ca/person/dearmond/405/405.ergative.unaccusative.htm

Sono passate tre settimane.
Are passed three weeks
Three weeks have passed.

Ne sono passate tre.
Of-them are passed three
Three of them have passed.

Ne sono arrivati(?) tre.
Of-them are arrived three
Three of them have arrived.

*Ne hanno telefonato(?) tre.
Of-them have phoned three
Three of them have phoned.
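The German and Italian facts are two tests that consult the same lexical property with opposite results, which can be sketched as one boolean feature read by two language-specific checks. The IntransitiveVerb class, the two functions, and the verb entries are hypothetical illustrations: the German impersonal passive is allowed only when the subject is agentive, and Italian partitive ne only when it is not. The entry for German fallen stands in for "fall," mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntransitiveVerb:
    lemma: str
    language: str           # "de" (German) or "it" (Italian)
    agentive_subject: bool  # True: unergative; False: unaccusative


def impersonal_passive_ok(verb: IntransitiveVerb) -> bool:
    """German impersonal passive (Es wurde viel getanzt): agentive verbs only."""
    return verb.language == "de" and verb.agentive_subject


def ne_cliticization_ok(verb: IntransitiveVerb) -> bool:
    """Italian partitive ne (Ne sono arrivati tre): non-agentive verbs only."""
    return verb.language == "it" and not verb.agentive_subject


# Hypothetical lexicon entries.
VERBS = {
    v.lemma: v
    for v in [
        IntransitiveVerb("tanzen", "de", True),      # dance
        IntransitiveVerb("fallen", "de", False),     # fall
        IntransitiveVerb("arrivare", "it", False),   # arrive
        IntransitiveVerb("telefonare", "it", True),  # phone
    ]
}

for lemma, verb in VERBS.items():
    print(lemma, impersonal_passive_ok(verb), ne_cliticization_ok(verb))
```

Storing agentivity once per verb, rather than a separate flag per construction, reflects the idea that both constructions are sensitive to the same distinction.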

Page 109

Importance of Unaccusativity
Resultative predicates can describe non-agentive subjects, direct objects, and subjects of passives:
The water froze solid. He hammered the nail flat. The nail was hammered flat.

They cannot describe agentive subjects or subjects of active transitive verbs:
He hammered the nail exhausted.
This doesn’t mean that he became exhausted as a result of hammering the nail.
*He screamed hoarse.
This doesn’t mean that he became hoarse as a result of screaming.
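The generalization about which noun phrase a resultative can describe can be sketched as a small decision function. The name resultative_host and its boolean inputs are assumptions for illustration, and the sketch ignores readings like "He hammered the nail exhausted," where the adjective describes his state during the hammering rather than a result. A fake reflexive object, as in "He screamed himself hoarse," counts as a direct object here.

```python
from typing import Optional


def resultative_host(subject_agentive: bool, has_object: bool, passive: bool = False) -> Optional[str]:
    """Return which noun phrase a resultative predicate can describe, or None.

    Direct objects, subjects of passives, and non-agentive subjects can host a
    resultative; agentive subjects and subjects of active transitives cannot.
    """
    if passive:
        return "subject"   # The nail was hammered flat.
    if has_object:
        return "object"    # He hammered the nail flat. / He screamed himself hoarse.
    if not subject_agentive:
        return "subject"   # The water froze solid.
    return None            # *He screamed hoarse.


print(resultative_host(subject_agentive=False, has_object=False))  # The water froze solid.
print(resultative_host(subject_agentive=True, has_object=True))    # He hammered the nail flat.
print(resultative_host(subject_agentive=True, has_object=False))   # *He screamed hoarse.
```

Note that the non-agentive subject and the direct object end up in the same "can host a resultative" group, which is the observation that motivates treating unaccusative subjects like objects.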

Page 110

Importance of Unaccusativity
Non-agentive subjects behave like direct objects.
Passive subjects correspond to direct objects of active sentences.
The Unaccusative Hypothesis (Perlmutter and Postal): maybe non-agentive subjects are direct objects at some level of representation.

Page 111

Example of insight from the unaccusative hypothesis
Why can’t German unaccusative verbs become impersonal passives? They are already passive! The non-agentive subject was at some point an object that got promoted.