Multiword Expressions: An Extremist Approach Charles J. Fillmore ICSI and UCB

Background: or, Why Do I Care?

FrameNet Project

How to evaluate progress

"Words" versus LUs: complain, take off, depend on

Search problems and word frequency

General questions of polysemy

Some corpus linguistics traditions

Certain technical problems of representation: parcelling out meanings

MWEs and the rest of the grammar

Estimation of vocabulary size

Questions of acquisition, typology, etc.

What is a MWE?

Any linguistic expression, involving more than one word, that requires an interpreter – human or machine – to have more than the abilities of an "Innocent Speaker-Hearer".

The concept is not limited to lexicalized (listable) expressions.

Innocent Speaker-Hearer

The ISH knows
– individual simple lexical units,
– the basic head-to-dependent grammatical relations,
– the basic head-to-dependent semantic relations as determined by the frame of the governing lexical unit,
– regular and specific rules for realizing these,
– strategies for building a semantic structure out of all this.

That's all it knows.

Dependency Representation

Since the ISH's knowledge is about
– unitary words and
– word-to-word relations,
that can be represented in dependency diagrams in which
– each node is a word and
– each word-to-word link, i.e., each branch,
• stands for one of the basic grammatical relations and
• is capable of bearing a frame-based semantic relation to the governor.

Here's a simple case:

His parents gave me a copy of that fascinating book about frogs.

gave
  parents
    his
  me
  copy
    a
    of
      book
        that
        fascinating
        about
          frogs
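The diagram above can be written down as a small data structure. This is a minimal sketch, assuming a plain governor-to-dependents mapping; the names `DEPENDENTS`, `render` and `governor_of` are illustrative and do not come from the talk or from any FrameNet tool.

```python
# Governor -> dependents, following the diagram for
# "His parents gave me a copy of that fascinating book about frogs."
DEPENDENTS = {
    "gave": ["parents", "me", "copy"],
    "parents": ["his"],
    "copy": ["a", "of"],
    "of": ["book"],
    "book": ["that", "fascinating", "about"],
    "about": ["frogs"],
}


def render(word, depth=0):
    """Return the subtree rooted at `word` as a list of indented lines."""
    lines = ["  " * depth + word]
    for dependent in DEPENDENTS.get(word, []):
        lines.extend(render(dependent, depth + 1))
    return lines


def governor_of(word):
    """Return the governor of `word`, or None if `word` is the root."""
    for governor, dependents in DEPENDENTS.items():
        if word in dependents:
            return governor
    return None


print("\n".join(render("gave")))
```

Every node is a single word and every branch links exactly two words, which is all the Innocent Speaker-Hearer is assumed to handle.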

Basic syntactic relations

Complementation

Specification

Modification

(there are others)

Complementation

His parents gave me a copy of that fascinating book about frogs.

[Figure: the dependency diagram of this sentence, repeated from the Dependency Representation slide.]

Actually, copy of should be treated as a MWE.

Specification

His parents gave me a copy of that fascinating book about frogs.

[Figure: the dependency diagram of this sentence, repeated from the Dependency Representation slide.]

Actually, his can also be thought of as satisfying a frame requirement of the relational noun parents.

Modification

His parents gave me a copy of that fascinating book about frogs.

[Figure: the dependency diagram of this sentence, repeated from the Dependency Representation slide.]

So ...

The study of MWEs proceeds by examining meaning units of the language that do not lend themselves to such a simple treatment.

(Consider a parser.)

Where the ISH idealization fails

1. Some apparent MWEs are best analyzed as single words, occupying one node.

2. Some MWEs are the product of "non-core" constructions and semi-independent mini-grammars.

3. Some MWEs are the products of "regular" processes but have institutionally stipulated meanings.

4. Some MWEs can be represented as dependency subgraphs (not "just" word strings, or collocate sets).

1. "Runs"

"Runs"

There are things that look like MWEs (that are written as sequences of words), but they have no internal variation and may just as well be thought of as long words with spaces in them.

Examples
– used to, let alone, of course, all of a sudden, first off

Many are easily mislearned
– used to > used of
– by and large > by in large
– to all intents and purposes > to all intensive purposes
– an arm and a leg > a nominal egg

2. Special Constructions

Special Constructions

Some common grammatical constructions require structures that go beyond the "core" provisions of a grammar. Consider the structure of:
– the faster we drive the sooner we'll get there
– what's this scratch doing on my violin?
– she's older than any of us realized
– she wouldn't give her mother a nickel let alone a dollar

Minigrammars

Some MWEs are generated by simple generative structures, usually finite state automata, for which dependency – or constituency – representations are not always relevant.
– Names
– Numbers
– Locations (addresses, coordinates)
– Time Expressions
– Kinterms

Personal Names

Reverend Dr T. Allen Hampton-Smith III

Components: titles, honorifics, given names, patronymics, family names, extensions, ...
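The finite-state idea can be sketched for a small fragment of personal names. The word lists, state names and transition table below are assumptions made for illustration; they cover only titles, a run of names (given names, initials, family name) and one extension, and ignore honorifics and patronymics.

```python
# Hypothetical, deliberately tiny word lists.
TITLES = {"Reverend", "Dr", "Mr", "Ms", "Professor"}
EXTENSIONS = {"Jr", "Sr", "II", "III"}

# (state, token class) -> next state. Missing pairs mean "reject".
TRANSITIONS = {
    ("titles", "title"): "titles",    # any number of leading titles
    ("titles", "name"): "names",      # first given name or initial
    ("names", "name"): "names",       # further names; the last is the family name
    ("names", "extension"): "done",   # at most one trailing extension
}
ACCEPTING = {"names", "done"}


def classify(token):
    """Assign a token to one of the three classes used by the automaton."""
    bare = token.rstrip(".")
    if bare in TITLES:
        return "title"
    if bare in EXTENSIONS:
        return "extension"
    return "name"


def accepts(name):
    """Run the automaton over a whitespace-separated name string."""
    state = "titles"
    for token in name.split():
        state = TRANSITIONS.get((state, classify(token)))
        if state is None:
            return False
    return state in ACCEPTING


print(accepts("Reverend Dr T. Allen Hampton-Smith III"))
```

Nothing in the automaton refers to heads or dependents: the order of the components is all it encodes.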

English Kinterms

grandfather, great grandfather, great great grandfather, etc.

first cousin, second cousin, third cousin

first cousin once removed, second cousin three times removed, etc.

father-in-law, son-in-law, sister-in-law, etc.
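The ancestor and cousin terms above follow simple arithmetic on generation counts. A minimal sketch, assuming each person is described by the number of generations separating them from the nearest common ancestor; the function names and the small lookup tables are illustrative, and in-law terms are not covered.

```python
# Lookup tables cover small values only; larger values raise KeyError.
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}
TIMES = {1: "once", 2: "twice", 3: "three times", 4: "four times"}


def ancestor_term(generations, base="father"):
    """1 -> father, 2 -> grandfather, 3 -> great grandfather, and so on."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    if generations == 1:
        return base
    return "great " * (generations - 2) + "grand" + base


def cousin_term(up_a, up_b):
    """Name the relation between two people who are `up_a` and `up_b`
    generations below their nearest common ancestor."""
    near, far = sorted((up_a, up_b))
    if near < 1:
        raise ValueError("each person must be below the common ancestor")
    if near == 1:
        if far == 1:
            return "sibling"
        raise ValueError("aunt/uncle and niece/nephew terms are not covered")
    term = f"{ORDINALS[near - 1]} cousin"
    if far > near:
        term += f" {TIMES[far - near]} removed"
    return term


print(ancestor_term(4))
print(cousin_term(3, 6))
```

The degree (first, second, third) comes from the shorter line of descent, and the removal count from the difference between the two lines.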

[Figure: family tree descending from X in two lines (A, C, E, G and B, D, F, H), shown once for each relation: siblings, cousins, second cousins, first cousins once removed, first cousins twice removed.]

Digression

Ordinary techniques of computational linguistics/corpus linguistics won't be able to recognize the constructional nature of some expressions.

Test case

another $600

Indefinite article   Qualifier     Quantifier   Plural Noun
a                    whopping      600          dollars
an                   additional    10           pages
a                    paltry        20           euros
a                    respectable   6,000        francs
*a                   mere          -            pages
*a                   -             12           pages

But how do we analyze "another $600"?

*an                  -other        600          $
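The four-slot pattern in the table can be approximated with a surface pattern over the written string. This is a naive sketch under stated assumptions (lower-cased input, one-word qualifier, digits for the quantifier, a plural noun ending in -s); the name `parse` and the regular expression are illustrative only.

```python
import re

# article + qualifier + quantifier + plural noun, all four slots required
PATTERN = re.compile(
    r"^(?P<article>an?)\s+"
    r"(?P<qualifier>[a-z]+)\s+"
    r"(?P<quantifier>\d[\d,]*)\s+"
    r"(?P<noun>[a-z]+s)$"
)


def parse(phrase):
    """Return the four slots as a dict, or None if the phrase does not fit."""
    match = PATTERN.match(phrase.lower())
    return match.groupdict() if match else None


print(parse("a whopping 600 dollars"))
print(parse("another $600"))
```

The starred rows fail because a slot is empty, while "another $600" fails for a different reason: the article and qualifier are fused into one written word and the noun is a symbol placed before the number, so a string-level pattern never sees the four slots.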

Relations to the rest of the grammar

It would be most convenient if the products of minigrammars could be "sealed" and not interfere with the rest of the sentence. But:
– Croatian names
– Finnish numbers
– Internal grammar

3. Stipulated Designations

Translucent Idioms: regular productions with stipulated designations

From one point of view these are just "long words" with special meanings, but they are semantically penetrable; e.g.,
– names of organizations: The American Society for the Prevention of Cruelty to Animals (ASPCA)
– names of titles: Deputy Undersecretary of Defense for Intelligence
– names of officially designated crimes: assaulting a federal officer with a deadly or lethal weapon

4. Dependency Subgraphs

Dependency Subgraphs

Here we refer to lexical units that are continuous parts of dependency structures.

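[Figure: three subgraph shapes: x governing y; x governing both y and z; a chain with x governing y and y governing z.]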

Dependency Subgraphs

A given lexical unit of this kind can have its own subcategorization requirements.

[Figure: the same three subgraph shapes, each annotated with an A.]

(Motivating digression)

word strings - "wrist watch" - how to find - statistical significance ("of the")

discontinuous - "collocates" - within spans - within sentences

some kind of grammatical relation between them?

Subcategorization Details

Particle Verbs - Intransitive

Verb > particle is the lexical unit.

Exx: wake up, go away, sit down, shut up

Interruptible: Shut the hell up!

[Figure: dependency subgraphs: schema V governing X and part; shut governing X and up; shut governing the_hell and up.]

Particle Verbs - Transitive

Verb > particle is the lexical unit.

Exx: take off ('remove'), take out ('date')

Interruptible: Take your shoes off. I took her out once.

[Figure: dependency subgraphs: schema V governing X, Y and part; take governing your shoes and off; take governing X, Y and off.]

In the Old Days ...

About half a century ago it was generally believed that in Deep Structure, phrases like pick up, take off, etc., started out as single constituents, and a Particle Movement Transformation allowed the extraction of the particle so that it could follow the direct object.

[take off] [your shoes] >> [take] [your shoes] [off]

A dependency subgraph can recognize the unity of the two-word block without worrying about phrasal constituency.
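A minimal sketch of that idea, assuming each token records its lemma and the index of its governor. The `Token` type, the function name and the two hand-built parses are illustrative; they are not the output of a real parser.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    lemma: str
    head: Optional[int]  # index of the governor; None for the root


def find_lexical_unit(tokens, verb, particle):
    """Return (verb_index, particle_index) if `particle` depends on `verb`."""
    for index, token in enumerate(tokens):
        if token.lemma != particle or token.head is None:
            continue
        if tokens[token.head].lemma == verb:
            return token.head, index
    return None


# "Take off your shoes."
adjacent = [Token("take", None), Token("off", 0), Token("your", 3), Token("shoe", 0)]
# "Take your shoes off."
split = [Token("take", None), Token("your", 2), Token("shoe", 0), Token("off", 0)]

print(find_lexical_unit(adjacent, "take", "off"))
print(find_lexical_unit(split, "take", "off"))
```

Both word orders contain the same verb-to-particle link, so the lexical unit is found without a movement rule and without requiring the two words to be adjacent.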

Prepositional Verbs - Intransitive

Verb > preposition is the lexical unit.
Exx: look for ('seek'), object to ('oppose'), look into ('investigate')
Interruptible: I looked long and hard for the perfect wife. We objected strenuously to her proposal.
Comment: Some PPs are omissible, some aren't: look (for), look into

[Figure: dependency subgraphs: schema V governing X and prep, with prep governing Y; look governing X and for, with for governing Y.]

PP Omissibility

Omissible (under conditions of zero anaphora)

Look at it! - I'm looking.

Look for it. - I'm looking.

Non-omissible

Could you look into this problem for me? - *I've already started looking.

Prepositional Verbs - Transitive

Verb > preposition is the lexical unit.
Exx: talk into ('persuade'), rid of
Comment: PP is sometimes omissible:
– The judge cleared me (of all charges).
– They tried to talk me *(into quitting my job).
– Who will rid me *(of this meddlesome priest)?

[Figure: dependency subgraphs: schema V governing X, Y and prep, with prep governing Z; clear governing X, Y and of, with of governing Z.]

Particle-&-Preposition Verbs

Verb > {part,prep} is the lexical unit.

Exx: put up with ('tolerate'), look up to ('respect'), break in on ('interrupt')

Not generally interruptible, I think (haven't checked corpus data).

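[Figure: dependency subgraphs: schema V governing X, part and prep, with prep governing Y; put governing X, up and with, with the preposition with governing Y.]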

V+N+P Verbs

Verb > /N,prep/ is the lexical unit.
Exx: take advantage of ('exploit'), take part in ('participate in'), take charge of
Comments: N can be modified; N can be passive subject: Considerable advantage was taken of this opportunity.
Pseudo-passive: They were cruelly taken advantage of.
N does not take a determiner.

[Figure: dependency subgraphs: schema V governing X, N and prep, with prep governing Y; take governing X, part and in, with in governing Y.]

Other Parts of Speech

Adjectives can have prepositional and clausal complements:
– fond of cats; interested in math; similar to mud

Nouns can have prepositional and clausal complements:
– top of the tower; friend to the poor; journey into the jungle; copy of the book

VP Idioms

Obvious ones
– pull someone's leg, blow one's nose
– kick the bucket

Less obvious ones
– answer the door (Would you answer the door?)
– mention someone's name (Did anybody mention my name at the party?)

Support Constructions

Support Verbs with Subject N

Verb > N is the lexical unit, N is semantic head, V is support verb.
Exx: The wind is blowing, the fire is burning, the rain is falling, a riot occurred, an accident happened
Comment: The frame is evoked by the noun. The support verb is selected by the noun. Compare "the fire is burning" with "the house is burning".

[Figure: dependency subgraphs: schema V governing N; blow governing wind.]

Note linearization: Since these are intransitive, the N is (or heads) the subject NP and the verb is the predicate.

Support Verbs with Object N

Verb > N is the lexical unit, N is semantic head, V is support verb. N has its own valence.
Exx: We had an argument with the kids. ('we argued with the kids') I made the decision to leave. ('I decided to leave')
Comment: The frame is evoked by the noun. The SV is selected by the noun, which also brings in its own complement structure.
Comment: The N doesn't have to be deverbal: wage war, commit a crime

[Figure: dependency subgraphs: schema V governing X and N; have governing X and argument, with argument taking a with-phrase containing Y.]

Ditransitive Support Verbs

Verb > N is the lexical unit, N is semantic head, V is support verb. X and Y are each participants in N's frame.
Exx: She gave me a kiss. ('she kissed me') I paid him a bribe. ('I bribed him') They gave me good advice. ('they advised me well')

[Figure: dependency subgraphs: schema V governing X, Y and N; give governing X, Y and kiss.]

SVs can resolve polysemy.

Polysemous event nouns can take different support verbs:
– ('quarrel') have an argument
– ('reason') make an argument
– ('rest') take a break
– ('flight') make a break
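The four pairs above amount to a lookup keyed on the support verb and the noun together. A minimal sketch; the table holds only the examples from this slide, and the names `SENSES` and `sense_of` are illustrative.

```python
# (support verb, event noun) -> sense gloss, taken from the examples above
SENSES = {
    ("have", "argument"): "quarrel",
    ("make", "argument"): "reason",
    ("take", "break"): "rest",
    ("make", "break"): "flight",
}


def sense_of(support_verb, noun):
    """Return the sense selected by the verb-noun pair, or None if unlisted."""
    return SENSES.get((support_verb, noun))


print(sense_of("have", "argument"))
print(sense_of("make", "argument"))
```

Neither word alone settles the reading: make selects 'reason' with argument but 'flight' with break, so the key has to be the pair.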

A common test of SVs:

One frequently proposed characteristic of support verbs is that their nominal object can’t really be interrogated, meaning that the verb in question isn’t functioning as a self-standing verb. The following are not natural conversations:
– What did you heave? - A sigh.
– What have you made? - A decision to go home.
– What did you have? - A fight with my brother.
– What did you wreak? - Vengeance on my enemies.
– What did you lodge? - A complaint.

Interchangeable with Verbs

She heaved a sigh. (She sighed.)

We made the decision to give up. (We decided to give up.)

I took a bath. (I bathed.)

He suffered a relapse. (He relapsed.)

Let’s say a prayer. (Let’s pray.)

Profiling Different Participants

Agent of event
– perform an operation
– inflict injury
– exact/wreak vengeance
– launch an attack
– give instructions
– submit an application
– ask a question

Undergoer of event
– undergo an operation
– sustain injury
– have a setback
– suffer a defeat
– receive a rebuke
– get advice

Beyond "light verbs"

Simple cases: the verb has essentially no meaning except to reveal that its subject is necessarily a participant in the event named by the noun.
– a. active role
– b. passive role

More nuanced cases: the verb contributes information about register, attitude, aktionsart, or the like.

More extended cases: the verb identifies its subject as a participant in the larger scenario associated with the event named by the noun.

Examples

Simple, active: – he made a complaint

Nuanced: – he registered a complaint

Examples

Simple, active: – she gave an exam

Simple, passive: – he took/sat the exam

Extended: – he passed/failed the exam

Examples

Simple, active: – she made a promise

Extended: – she kept/broke her promise

For the full story, and then some, see ...

Mel'cuk, Igor' (1995), Phrasemes in language and phraseology in linguistics. In M. Everaert et al., Idioms: Structural and Psychological Perspectives. Lawrence Erlbaum Associates.

Mel'cuk, Igor' (1996), Lexical functions: a tool for the description of lexical relations in a lexicon. In Leo Wanner, ed., Lexical Functions in Lexicography and Natural Language Processing. John Benjamins.

Mel'cuk, Igor' (1998), Collocations and lexical functions. In Cowie 1998

Mel'cuk, Igor' (1995), The future of the lexicon in linguistic description and the explanatory combinatorial dictionary. Linguistics in the Morning Calm 3. 181-270. Hanshin: Seoul

Support Verbs with Adjective

Verb > A is the lexical unit, A is semantic head, V is support verb, A may have its own complements (e.g., rid of).

Exx: be + any predicate adjective; go crazy, turn red, get naked

Comment: The unit rid of seems to occur only with a SV.

[Figure: dependency subgraphs: schema V governing X and A; get governing X and naked.]

Support Prepositions

Prep > N is the lexical unit, N is semantic head, P is support preposition. N has its own valence.
Exx: at risk, in danger, on fire, under scrutiny, under arrest
Some are modifiable: at considerable risk, in grave danger, under careful scrutiny
Comment: The P>N structure may function adjectivally or adverbially; the N can have its own complements: (he participated in the race) at considerable risk to his health, (the building is) in danger of collapse

[Figure: dependency subgraphs: schema P governing N; at governing risk.]

More Complex Cases

Verb > P > N is the lexical unit, N is semantic head, V is support verb, N is generally not expandable.

Exx: take into account, take under consideration, have in (one's) possession

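[Figure: dependency subgraphs: schema V governing X, Y and P, with P governing N; take governing X, Y and under, with under governing consideration.]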

Support Verbs with PP

Verb > P > N is the lexical unit, N is semantic head, V is support verb. With possession there are two alignments of the arguments:

Possessor - Possessed: I came into possession of these documents.

Possessed - Possessor: These documents came into my possession.

[Figure: dependency subgraphs: schema V governing X and prep, with prep governing N; come governing X and into, with into governing possession.]

Transparent Nouns

N of N

N > of is the lexical unit. The second N is semantic head for purposes of external selection.
Comment: sometimes the N > of is "transparent" to the pieces of an MWE; and sometimes the N > of > N is itself an MWE, especially in the case of aggregates and unitizers:
– a case of the flu
– a round of golf
– a herd of cattle
– a flock of geese
– a school of fish
– a pinch of salt
– a pod of whales

[Figure: dependency subgraphs: schema N governing of, with of governing N; type governing of, with of governing fish; bout governing of, with of governing flu.]

Types of transparent nouns

1. Aggregates: bunch, group, collection, herd, school, flock

2. Quantities: flood, number, scores, storm

3. Types: breed, class, ilk, kind, type, sort

4. Portions and Parts: half, segment, top, bottom, part

5. Unitizers: glass, bottle, box, serving

6. Evaluations: gem, idiot, prince

"Transparent" to what?

Relation between locative preposition and object:
– on the shelf; on this part of the shelf
– in the room; in this part of the room

Relation between verb and typical collocating object:
– play golf; play a round of golf
– eat fish; eat this type of fish

Relation between possessor and kin-term:
– my wife; my gem of a wife
– her husband; her jerk of a husband
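Transparency can be sketched as a step that skips over "N of" when N is a transparent noun, so that the outside selector sees the second noun. The sketch assumes a noun phrase made only of determiners, nouns and of; the word sets draw on the examples in this section, and the function name is illustrative.

```python
# Transparent nouns drawn from the lists and examples in this section.
TRANSPARENT = {
    "bunch", "group", "collection", "herd", "school", "flock",
    "flood", "number", "scores", "storm",
    "breed", "class", "ilk", "kind", "type", "sort",
    "half", "segment", "top", "bottom", "part",
    "glass", "bottle", "box", "serving",
    "gem", "idiot", "prince", "jerk", "round",
}
DETERMINERS = {"a", "an", "the", "this", "that", "my", "her", "his"}


def semantic_head(tokens):
    """Return the noun that an outside governor selects for."""
    tokens = [token.lower() for token in tokens]
    while True:
        # Drop leading determiners.
        while tokens and tokens[0] in DETERMINERS:
            tokens = tokens[1:]
        # Skip "N of" when N is transparent; otherwise stop.
        if len(tokens) >= 3 and tokens[0] in TRANSPARENT and tokens[1] == "of":
            tokens = tokens[2:]
        else:
            break
    return tokens[0] if tokens else None


print(semantic_head("a round of golf".split()))
print(semantic_head("my gem of a wife".split()))
```

With this step, play still finds golf in "play a round of golf" and my still finds wife in "my gem of a wife", while a noun outside the set, such as copy in "a copy of the book", remains the head.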

Compounds

N > N Compounds

N > N is the lexical unit; listed compounds have the dependent in red; the syntactic head is the frame evoker, the dependent is either a frame element or a "quale". The order is Modifier + Head.

[Figure: two N > N subgraphs: risk governing health (health risk); knife governing fish (fish knife).]

N+N Compounds

Some are just listed, their internal structure of etymological relevance only. (What's the head of light year? Often misused: "that was light years ago".)
– light year, puppy love

Some are listed, with N2 as the head and N1 as satisfier of some requirement of N2; they name a pre-existing category.
– bread knife, wine bottle, cork screw

Some are interpretable with reference to completion needs of N2.
– fire risk, health risk, travel risks

A-N Compounds

N > A is the lexical unit; listed compounds have the dependent in red; the syntactic head is the frame evoker, the dependent is either a frame element or a "quale".

Ready-made A+N compounds: hot news, friendly fire, blind alley, dead end

[Figure: two N > A subgraphs: police governing federal (federal police); news governing hot (hot news).]

"Pertinative" adjectives

Pertinatives are adjectives whose senses are defined in (some) dictionaries with the phrase "of or pertaining to". Traditional term: relational adjectives. WordNet term: pertainyms.

They are not used predicatively in the same meaning.

They aren't scalar, e.g., they don't get modified with very.

Pertinatives vs. Descriptives

Pertinatives (these are MWEs)
– judicial appointment
– economic policy
– educational practice
– criminal law
– linguistic society
– Canadian government
– national interest

Descriptives (these aren't)
– judicious appointment
– economical housewife
– educational experience
– criminal behavior
– ugly cat
– amazing disclosure
– bored child

Continuity Hypothesis

I assume the continuity of the lexicon and the constructicon.

Reference: Paul Kay & Charles J. Fillmore (1999), "Grammatical constructions and linguistic generalizations: the What's X Doing Y? construction", Language 75 1-33.

Claim: many lexically-headed constructions can be analyzed as dependency subtrees.

[Figure: dependency subtree: be governing X and doing; doing governing what and Y.]

be is finite (not quite true)

Y is a secondary predicate, i.e., AP; with absolute; participial; locative phrase

Different linearizations and interruptions:

What are you doing here? (be before X)
I wonder what she's doing wearing her mother's dress. (X before be)
What the hell are you still doing standing out there in the rain? (various interruptions)
What are you doing without any shoes on?

Meaning: X is Y, and that is anomalous.

Long line for pre-recall appointments to bench
Phillip Matier, Andrew Ross
Monday, August 25, 2003

1. As the recall clock ticks down, it's interesting to note how many Gray Davis loyalists are putting their names in for some highly coveted judicial appointments.

2. Among the more notable bench seekers:

– The governor's own legal affairs secretary, Barry Goode, who is being vetted by the State Bar for an appointment to the First District Court of Appeal in Sacramento.

– Davis' legal appointments secretary, Burt Pines, who over the past 4 1/2 years has helped his boss fill 304 judgeships around the state. Pine is now under consideration himself for a seat on the L.A. Superior Court.

– And Jeremiah Hallisey, one of the governor's top San Francisco fund-raisers who put together last Thursday's big $1,000-a-head cocktail party for Davis at the Fairmont. Hallisey, who sits on the California Transportation Commission, has filed papers for one of the Superior Court openings in either San Francisco or Contra Costa County.

3. And speaking of the Fairmont fund-raiser (which netted a respectable $600,000), attendees told us the crowd looked like a casting call for wannabe judges and people seeking recall-proof commission appointments.

Personal names, long and short: Gray Davis, Davis; Jeremiah Hallisey, Hallisey

Places: Los Angeles, San Francisco

Organizations, Institutions: First District Court of Appeal; L.A. Superior Court; California Transportation Commission

Noun+Noun Compounds: recall clock, Davis loyalists, casting call, commission appointments

Adjective + Noun Compounds: legal affairs, judicial appointment, medical leave, judicial vacancy

Complex cases: legal affairs secretary, legal appointments secretary

Support Verbs: make ... appointments, submit to ... review

Transparent Nouns: a stack of appointments, a host of 11th hour appointments

Verb-headed phrases: put one's name in for (an appointment), file for (an opening), get the thumbs down from, get one's name cleared, sign off on, get caught flat-footed

Miscellaneous: as the clock ticks down, over the past four and a half years, it is interesting to note, and speaking of ..., a respectable 600 thousand dollars, on the way out the door, on the chance there may be ..., much less, in fairness

Bottom Line

Lexical units can be represented as dependency subgraphs, specifying a semantic head, a syntactic head, required/preferred dependents.

Constraints on dependents can be specified lexically, sortally, morphosyntactically, and in terms of frame roles.

Dependents can be marked as "closed" (not open to modification) and/or "local" (not subject to extraction) and/or "omissible".

The lexical head of the construction bears information about contextual constraints: finiteness, inflection, polarity, etc.
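One way to picture such an entry is as a small record type. This is a sketch under stated assumptions, not the FrameNet data model: the class names, field names and the three sample entries are illustrative, with the entries built from examples used earlier in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dependent:
    slot: str                         # e.g. "prep", "part", "N"
    lexical: Optional[str] = None     # fixed word, if lexically specified
    frame_role: Optional[str] = None  # frame role, if specified that way
    closed: bool = False              # not open to modification
    local: bool = False               # not subject to extraction
    omissible: bool = False


@dataclass
class LexicalUnit:
    name: str
    syntactic_head: str
    semantic_head: str
    dependents: List[Dependent] = field(default_factory=list)

    def required_slots(self):
        """Slots that must be realized, i.e. those not marked omissible."""
        return [d.slot for d in self.dependents if not d.omissible]


# look (for): the PP is omissible under zero anaphora; look into: it is not.
look_for = LexicalUnit("look for", "look", "look",
                       [Dependent("prep", lexical="for", omissible=True)])
look_into = LexicalUnit("look into", "look", "look",
                        [Dependent("prep", lexical="into")])
# Support verb construction: the noun, not the verb, is the semantic head.
have_argument = LexicalUnit("have an argument", "have", "argument",
                            [Dependent("N", lexical="argument")])

print(look_for.required_slots(), look_into.required_slots())
```

Keeping the syntactic head and the semantic head as separate fields is what lets the same record shape cover both prepositional verbs and support constructions.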
