English Syntax Read J & M Chapter 9.

Two Kinds of Issues

•Linguistic – what are the facts about language?

The rules of syntax (grammar)

•Algorithmic – what are effective computational procedures for dealing with those facts?

Building parsers

What is Syntax?

Try 1: the rules for stringing words together to form sentences.

The boys hit the ball. vs. Ball boys hit the the.

I gave Sue a ride to the store vs. I gave Sue ride to store.

I saw the book that Mary had written. vs. I saw the book what Mary had written.

But if that were all there was to it, syntax would contribute little to understanding, assuming legal input.

What is Syntax?

Try 2: The rules for forming constituents that correspond to meaningful entities.

Example: The cat with the furry tail purred.

Why Do We Care about Syntax?

Morphology

POS Tagging

Syntax

Semantics

Discourse Integration

Generation goes backwards. For this reason, we generally want declarative representations of the facts.

Sometimes We Need it Even if We Don’t Go All the Way

Question answering:

Lawyers whose clients committed fraud

vs

Lawyers who committed fraud

vs

Clients whose lawyers committed fraud

Finding Constituents in Sentences

A constituent is a word or group of words that functions as a unit.

How can we discern constituents?

•Semantically:

The cat with the furry tail purred.

•What can be chopped out and replaced by a single word?

Agnes purred.

* Agnes tail purred.

Finding Constituents in Sentences, cont.

•Preposed and postposed constructions:

Early next year I’d like to go to Paris.

I’d like to go to Paris early next year.

I’d like early next year to go to Paris.

* Early I’d like to go to Paris next year.

* I’d like early to go to Paris next year.

* The early next year old man would like to go to Paris.

How Many Kinds of Constituents are There?

Although there may be an infinite number of possible constituent tokens, there’s quite a small number of constituent types, e.g., NP, PP, VP.

On what basis can we group tokens into types? Occurrence in similar contexts.

How Many Kinds of Constituents are There, cont.

The cat with the furry tail purred.

Every dog wore a collar.

Most of the children in the room brought a dog with a furry tail and a collar.

The furry tail brought a room.

Every room purred.

A dog with a furry tail and a collar purred.

Mary saw most of the children in the room.

NPs occur as subjects, objects of verbs, and objects of prepositions.

Single Word Constituents

Single word constituents are exactly the parts of speech that we have already considered.

How many of these single word constituent types are there? Look at sizes of tagsets.

Lots of design decisions:

Sue bought the big white house.

* Sue bought the white big house.

Are big and white the same POS?

Simple Constituent Types Don’t Capture Everything

* The cat with a furry tail purred a collar.

Mary imagined a cat with a furry tail.

Mary decided to go.

* Mary decided a cat with a furry tail.

Mary decided a cat with a furry tail would be her next pet.

Mary gave Lucy the food.

* Mary decided Lucy the food.

Subcategorization

Frame          Verb                        Example
Ø              eat, sleep, …               I want to eat
NP             prefer, find, leave, …      Find [NP the flight from Pittsburgh to Boston]
NP NP          show, give, …               Show [NP me] [NP airlines with flights from Pittsburgh]
PPfrom PPto    fly, travel, …              I would like to fly [PP from Boston] [PP to Philadelphia]
NP PPwith      help, load, …               Can you help [NP me] [PP with a flight]
VPto           prefer, want, need, …       I would prefer [VPto to go by United airlines]
VPbrst         can, would, might, …        I can [VPbrst go from Boston]
S              mean                        Does this mean [S AA has a hub in Boston]?

The Role of the Lexicon in Parsing

•Serves as the starting point for POS tagging.

•Provides additional information such as subcategorization:

•For verbs

•For adjectives:

I’m angry with Mary. I’m angry at Mary.

I’m mad at Mary. * I’m mad with Mary.

•For nouns:

Jane has a passion for old movies.

Jane has an interest in old movies.
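The adjective and noun examples above can be read as lexicon entries that record which preposition a word selects. Here is a small sketch under that reading; the table `PREP_COMPLEMENTS`, the POS labels "ADJ" and "N", and the function name are illustrative assumptions, and the entries cover only the four words shown on the slide.

```python
# Hypothetical lexicon fragment: (word, POS) -> prepositions it accepts as complement.
PREP_COMPLEMENTS = {
    ("angry", "ADJ"): {"with", "at"},
    ("mad", "ADJ"): {"at"},
    ("passion", "N"): {"for"},
    ("interest", "N"): {"in"},
}


def allows_preposition(word, pos, prep):
    """Return True if the lexicon lists prep as a possible complement of word."""
    return prep in PREP_COMPLEMENTS.get((word, pos), set())


print(allows_preposition("angry", "ADJ", "with"))  # I'm angry with Mary.
print(allows_preposition("mad", "ADJ", "with"))    # * I'm mad with Mary.
```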

One Other Barrier to a Small Number of Kinds of Constituents: Agreement

Number agreement:

The boys want to go to the game(s).

* The boy want to go to the game(s).

Case agreement:

I want to give it to him.

* Me want to give it to he.

In English, only pronouns are marked for case, but many other languages mark case much more widely.
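The two agreement checks can be sketched as feature comparisons. This is a deliberately simplified illustration: the tables `NOUN_NUMBER`, `VERB_NUMBER` and `PRONOUN_CASE` are hypothetical, "want" is treated as plural only (ignoring its first- and second-person singular uses), and the position labels "subject" and "object" are assumptions.

```python
# Hypothetical feature tables covering only the words in the examples.
NOUN_NUMBER = {"boy": "sg", "boys": "pl"}
VERB_NUMBER = {"wants": "sg", "want": "pl"}   # simplification: third person only
PRONOUN_CASE = {
    "I": {"nom"},
    "me": {"acc"},
    "he": {"nom"},
    "him": {"acc"},
    "it": {"nom", "acc"},
}


def number_agrees(noun, verb):
    """Subject noun and verb must carry the same number feature."""
    return NOUN_NUMBER[noun] == VERB_NUMBER[verb]


def case_ok(pronoun, position):
    """Subjects need nominative case; objects need accusative case."""
    required = "nom" if position == "subject" else "acc"
    return required in PRONOUN_CASE[pronoun]


print(number_agrees("boys", "want"))   # The boys want ...
print(number_agrees("boy", "want"))    # * The boy want ...
print(case_ok("me", "subject"))        # * Me want ...
print(case_ok("him", "object"))        # ... to him.
```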

The Solution – Augmenting the Constituent Types

To solve these and other problems, one strategy is to augment constituent types with other sorts of information:

V +pl +[NP NP]

VP/NP/NP +pl Show

VP/NP +pl Show me

VP +pl Show me the book.
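The progression from VP/NP/NP to VP can be read as a category that still needs two NPs and loses one slash each time an NP is supplied, while the +pl feature is carried along. The sketch below illustrates that reading only; the class `Category` and its method `consume` are hypothetical names, not part of any particular formalism's standard API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Category:
    """A constituent type plus the arguments it still needs and its features."""
    base: str
    needs: Tuple[str, ...] = ()
    features: frozenset = frozenset()

    def consume(self, arg):
        """Supply the next needed argument and return the reduced category."""
        if not self.needs or self.needs[0] != arg:
            raise ValueError(f"{self} cannot combine with {arg}")
        return Category(self.base, self.needs[1:], self.features)

    def __str__(self):
        slashes = "".join("/" + n for n in self.needs)
        feats = "".join(" +" + f for f in sorted(self.features))
        return self.base + slashes + feats


show = Category("VP", ("NP", "NP"), frozenset({"pl"}))
show_me = show.consume("NP")
show_me_the_book = show_me.consume("NP")
print(show)              # VP/NP/NP +pl
print(show_me)           # VP/NP +pl
print(show_me_the_book)  # VP +pl
```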

Specifying a Language

•The set of sentences in English is large (maybe even infinite).

•We want a concise (i. e., much shorter than a list of sentences) definition of it.

•We have a finite (in fact quite small) set of constituent types (NP, VP, etc.) from which to build our description.

So we appeal to recursion and write grammar rules such as:

S → NP VP
VP → V NP
NP → NP PP
NP → NP S (The boy who went to the store won the game.)
PP → prep NP
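To see how a handful of recursive rules yields an unbounded set of strings, one can enumerate the POS strings derivable within a bounded tree depth and watch the set grow. This is a sketch under stated assumptions: the rule NP → Det N is added here as a base case (the rules on the slide give NP no non-recursive expansion), and the names `RULES` and `yields` are illustrative.

```python
from itertools import product

# The slide's rules, plus an assumed base case NP -> Det N so derivations can end.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("NP", "S")],
    "PP": [("prep", "NP")],
}


def yields(symbol, depth):
    """All terminal strings derivable from symbol using trees of at most this depth."""
    if symbol not in RULES:          # a POS tag: already terminal
        return {(symbol,)}
    if depth == 0:                   # no expansions left for a non-terminal
        return set()
    out = set()
    for rhs in RULES[symbol]:
        parts = [yields(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            out.add(tuple(tok for part in combo for tok in part))
    return out


for d in range(3, 7):
    print(d, len(yields("S", d)))
```

The count keeps increasing with the depth bound, which is the sense in which a finite rule set describes a language too large to list.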

A Context-Free Grammar for English

If we ignore:

•subcategorization

•agreement

•gapping

Then we can build a context-free grammar for English that does a pretty good job of:

•generating all and only the acceptable sentences, and of

•building reasonable parse trees for those sentences.

We’ll look at whether English is formally context free later.

Context-Free Grammars

A context-free grammar (CFG) is a 4-tuple:

1. A set of non-terminal symbols N

2. A set of terminals Σ (disjoint from N)

3. A set of productions P, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*

4. A designated start symbol S

In our grammar of English:

•Σ is the set of POS, and

•N is the set of remaining constituent types, e.g., NP, VP, PP
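The 4-tuple definition maps naturally onto a small data structure that checks the stated conditions when it is built. The sketch below is one possible encoding, with the terminal set written Σ in the comments; the class name `CFG`, its field names, and the sample grammar `ENGLISH` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Production = Tuple[str, Tuple[str, ...]]   # (A, beta) for a rule A -> beta


@dataclass(frozen=True)
class CFG:
    nonterminals: FrozenSet[str]            # N
    terminals: FrozenSet[str]               # Sigma, here the POS tags
    productions: Tuple[Production, ...]     # P
    start: str                              # S

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("terminals must be disjoint from non-terminals")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol in right-hand side of {lhs!r}")


# A toy fragment: POS tags as terminals, phrase types as non-terminals.
ENGLISH = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "PP"}),
    terminals=frozenset({"Det", "N", "V", "Name", "prep"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Name",)),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("PP", ("prep", "NP")),
    ),
    start="S",
)
print(len(ENGLISH.productions))
```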

Derivations Using CFGs

The standard formal definition:

The language LG generated by grammar G is the set of strings composed of terminal symbols which can be derived from the designated start symbol S.

LG = {w | w is in Σ* and S ⇒* w}

But we won’t generally want our grammar to have to go all the way to words. We want to let the lexicon do that. That’s why we let Σ be the set of POS. So the grammar may generate strings such as:

N V Det N
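A derivation of a POS string such as N V Det N can be traced step by step by always rewriting the leftmost non-terminal. The following sketch assumes a tiny rule set, including a rule NP → N that is introduced here only so that the string above is derivable; `RULES`, `derive` and `CHOICES` are illustrative names.

```python
# Assumed toy grammar; NP -> N is added so that "N V Det N" can be derived.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("Det", "N")],
    "VP": [("V", "NP")],
}


def derive(start, choices):
    """Apply each chosen production to the leftmost non-terminal; return all steps."""
    form = [start]
    steps = [tuple(form)]
    for lhs, rhs in choices:
        idx = next((k for k, sym in enumerate(form) if sym in RULES), None)
        if idx is None or form[idx] != lhs or rhs not in RULES[lhs]:
            raise ValueError(f"cannot apply {lhs} -> {' '.join(rhs)}")
        form[idx:idx + 1] = rhs          # replace the non-terminal by the rule body
        steps.append(tuple(form))
    return steps


CHOICES = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
]
for step in derive("S", CHOICES):
    print(" ".join(step))
```

Each printed line is one sentential form; the last one consists only of POS tags, so it belongs to the language in the sense of the definition above.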

Derivations Using CFGs

So we will use the following definition:

LG = {s | w is in Σ* and S ⇒* w and s can be derived from w by substituting words for POS as licensed by the lexicon}

Note that this doesn’t change the formal picture. We could instead augment our grammar with tens of thousands of rules of the form: N → phlogiston

This is a system design decision.
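The substitution step in this definition can be sketched as a lookup that replaces each POS tag with every word the lexicon lists for it. The lexicon contents and the function name `realize` below are made up for illustration.

```python
from itertools import product

# Hypothetical lexicon: POS tag -> words carrying that tag.
LEXICON = {
    "N": ["boys", "ball"],
    "V": ["hit"],
    "Det": ["the"],
}


def realize(pos_string):
    """Return every word string obtained by substituting words for the POS tags."""
    choices = [LEXICON[pos] for pos in pos_string]
    return [" ".join(words) for words in product(*choices)]


for sentence in realize(["N", "V", "Det", "N"]):
    print(sentence)
```

Keeping the words in a separate table like this, rather than as grammar rules of the form N → word, leaves the grammar small while the vocabulary grows, which is the design decision the slide refers to.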

Context-Free Grammars and Parse Trees

            S
          /   \
        NP     VP
        |     /  \
      Name   V    NP
        |    |   /  \
      John  ate Det  N
                 |   |
                the pizza

S → NP VP
NP → Name
NP → Det N
VP → V NP

(S (NP (NAME John))
   (VP (V ate)
       (NP (ART the)
           (N pizza))))
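The bracketed notation is easy to read back into a tree structure, from which both the words and the rules used can be recovered. Below is a minimal sketch; the function names `parse_brackets`, `leaves` and `rules_used`, and the choice of nested `(label, children)` tuples, are assumptions for illustration.

```python
def parse_brackets(text):
    """Read a bracketed tree into nested (label, children) tuples; words stay strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok                      # a word at a leaf
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1                            # skip the closing bracket
        return (label, children)

    return read()


def leaves(tree):
    """Return the words of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def rules_used(tree):
    """List the phrase-level rules (label -> child labels) in top-down order."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    found = []
    if subtrees and len(subtrees) == len(children):
        found.append((label, tuple(c[0] for c in subtrees)))
    for sub in subtrees:
        found.extend(rules_used(sub))
    return found


TREE = parse_brackets("(S (NP (NAME John)) (VP (V ate) (NP (ART the) (N pizza))))")
print(leaves(TREE))
print(rules_used(TREE))
```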

Long Distance Dependencies

Who did she say she saw ____ coming down the hill?

She did say she saw who coming down the hill.

The boy she saw coming down the road was crying.

The boy she saw _____ coming down the road was crying.

Long Distance Dependencies – A Linguistic Solution

Transformational Grammar (Chomsky, 1965):

•A context free grammar generates base forms

•A transformational component moves constituents around and may delete them from the surface form.

But how can we run these rules backwards?

This approach went out of fashion at least 20 years ago.

Long Distance Dependencies – Computational Solutions

•Augmented Transition Networks: Allow arbitrary actions on the arcs. These permit insertions and movements of constituents.

But any procedural solution won’t be reversible for generation.

•Unification systems: Declarative patterns for assigning constituents to fill subcategorization slots.

Spoken Language Syntax

Speech is collected in utterances rather than in text.

Spoken language is looser than written language, with more pauses, ‘nonverbal events’, and disfluencies such as er, uh, um.

Sample spoken language utterances from users interacting with ATIS
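A common first step before parsing a transcribed utterance is to drop filled pauses. The sketch below shows only that step, using the three fillers named above; the utterance in the usage line is invented for illustration and is not taken from ATIS, and `strip_filled_pauses` is a hypothetical helper name.

```python
FILLED_PAUSES = {"er", "uh", "um"}


def strip_filled_pauses(utterance):
    """Lowercase, split on whitespace, and drop tokens that are filled pauses."""
    tokens = utterance.lower().split()
    return [tok for tok in tokens if tok.strip(",.") not in FILLED_PAUSES]


# Invented example utterance, not an actual ATIS transcript.
print(strip_filled_pauses("uh I want um a flight to Boston"))
```

Repairs and restarts need more than token filtering, since they involve structure on both sides of the interruption point.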

Spoken Language Syntax

The repair often has the same structure as the constituent immediately before the interruption point.