13 Jan 2006

An Introduction to Natural Language Syntax

Rajat Mohanty
rkm@cse.iitb.ac.in

CS-460/IT-632
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay

Outline

Grammatical Analysis
Finite State Grammar
Phrase Structure Grammar
Transformational Grammar
Natural Language Phenomena

A Ubiquitous Task for NLP

The sequence labeling task can be carried out at different levels. In written text:

Words
Phrases
Sentences
Paragraphs

Names for Labeling Tasks

Words: Part of Speech tagging

Phrases: Chunking

Sentences: Parsing

Paragraphs: Co-reference annotation

Example (Words: POS Tagging)

<s> The dispute shows clearly the global power of Japan's financial titans. </s>

<s> [ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>
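As a rough illustration of the word/TAG notation in the tagged example, the sketch below splits each token into a (word, tag) pair. The function name `parse_tagged` is hypothetical, and the code assumes tokens are separated by whitespace and that chunk brackets and sentence markers carry no tag.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs.

    Chunk brackets and sentence markers are skipped (an assumption about
    the notation, not part of any tagger's API).
    """
    pairs = []
    for token in text.split():
        if token in ("[", "]", "<s>", "</s>"):
            continue
        # Split on the last slash so that './.' becomes ('.', '.').
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = ("[ The/DT dispute/NN ] shows/VBZ clearly/RB "
          "[ the/DT global/JJ power/NN ] of/IN "
          "[ Japan/NNP 's/POS financial/JJ titans/NNS ] ./.")
pairs = parse_tagged(tagged)
```

Each pair keeps the word and its Penn Treebank style tag together, which is the labeling that POS tagging assigns at the word level.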

Example (Phrases: Chunking)

[The dispute] [shows clearly] [the global power] [of] [Japan's financial titans]

Example (Sentences: Parsing)

( (S (NP-SBJ The dispute)
     (VP shows
         (ADVP-MNR clearly)
         (NP (NP the global power)
             (PP of
                 (NP (NP Japan 's)
                     financial titans))))
     .))
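The bracketed parse can be read mechanically. The following is a minimal sketch, assuming every node has the form (LABEL child ...) and that the unlabeled outer wrapper has been dropped; `parse_brackets` and `leaves` are hypothetical helper names.

```python
def parse_brackets(text):
    """Parse a bracketed string into nested lists of the form [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return items

    return node()


def leaves(tree):
    """Collect the words of a parsed tree from left to right, skipping labels."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


text = ("(S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) "
        "(NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .)")
tree = parse_brackets(text)
words = leaves(tree)
```

Reading the leaves back in order recovers the original sentence, which is a quick sanity check that the hierarchy was parsed without losing or reordering words.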

Parse Tree

[S [NP [Det The] [N dispute]]
   [VP [V shows]
       [NP [NP [Det the] [JJ global] [N power]]
           [PP of Japan's financial titans]]]]

Example (Sentences: Co-referencing)

( (S (NP-SBJ-1 The banks)
     (VP (ADVP-MNR badly)
         want
         (S (NP-SBJ *-1)
            (VP to
                (VP break
                    (PP into
                        (NP (NP all aspects)
                            (PP of
                                (NP the securities business))))))))))

What is Grammar?

A theory of language
A theory of competence of a native speaker (in the context of a natural language)
A finite set of rules
  that generates all and only the sentences of a language
  that assigns an appropriate structural description to each one
An explicit model of competence

What are the requirements?

An explicit model of competence
Should be able to generate an infinite set of grammatical sentences of the language
Should not generate any ungrammatical ones
Should be able to account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give two different structural descriptions)
If two sentences are understood to have the same meaning, the grammar should give the same structure for both at some level
If two sentences are understood to have different internal relationships, the grammar should assign different structural descriptions

What is Syntax?

Syntax is the study of the combination of words into phrases, clauses and sentences

Syntax describes how sentences and their constituents are structured

Grammatical Analysis Techniques

Two main devices:

Breaking up a String
  Sequential
  Hierarchical
  Transformational

Labeling the Constituents
  Morphological
  Categorial
  Functional

A grammar may combine any of these devices for grammatical analysis.

Breaking up and Labeling

Sequential Breaking up
  Sequential Breaking up and Morphological Labeling
  Sequential Breaking up and Categorial Labeling
  Sequential Breaking up and Functional Labeling

Hierarchical Breaking up
  Hierarchical Breaking up and Categorial Labeling
  Hierarchical Breaking up and Functional Labeling

Sequential Breaking up

That student solved the problems.

that + student + solve + ed + the + problem + s

Sequential Breaking up and Morphological Labeling

That student solved the problems.

that   student   solve   ed      the    problem   s
word   word      stem    affix   word   stem      affix

Sequential Breaking up and Categorial Labeling

This boy can solve the problem.

this   boy   can   solve   the   problem
Det    N     Aux   V       Det   N

They called her a taxi.

They   call   ed      her    a     taxi
Pron   V      Affix   Pron   Det   N

Sequential Breaking up and Functional Labeling

They called her a taxi.

(Figure: two functional labelings of this sentence. Both label "They" as Subject and "called" as Verbal; they differ in which of "her" and "a taxi" is labeled Direct Object and which is labeled Indirect Object.)

Hierarchical Breaking up

Old men and women

[Old [men and women]]
[[Old men] and women]

Hierarchical Breaking up and Categorial Labeling

Poor John ran away.

[S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]

Hierarchical Breaking up and Functional Labeling

Immediate Constituent (IC) Analysis
Construction types in terms of the function of the constituents:

Predication (subject + predicate)
Modification (modifier + head)
Complementation (verbal + complement)
Subordination (subordinator + dependent unit)
Coordination (independent unit + coordinator)

Predication

[Birds]subject [fly]predicate

[S [Subject Birds] [Predicate fly]]

Modification

[A]modifier [flower]head

John [slept]head [in the room]modifier

[S [Subject John] [Predicate [Head slept] [Modifier in the room]]]

Complementation

He [saw]verbal [a lake]complement

[S [Subject He] [Predicate [Verbal saw] [Complement a lake]]]

Subordination

John slept [in]subordinator [the room]dependent unit

[S [Subject John] [Predicate [Head slept] [Modifier [Subordinator in] [Dependent Unit the room]]]]

Coordination

[John came in time] independent unit [but]coordinator[Mary was not ready] independent unit

[S [Independent Unit John came in time] [Coordinator but] [Independent Unit Mary was not ready]]

An Example

In the morning, the sky looked much brighter.

[S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]]
   [Head [Subject [Modifier the] [Head sky]]
         [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]

Hierarchical Breaking up and Categorial / Functional Labeling

Hierarchical Breaking up coupled with Categorial /Functional Labeling is a very powerful device.

But there are ambiguities which demand something more powerful.

E.g., Love of God
  Someone loves God
  God loves someone

Hierarchical Breaking up

Love of God

Categorial Labeling:
[Noun Phrase love [Prepositional Phrase of God]]

Functional Labeling:
[[Head love] [Modifier [Sub of] [DU God]]]

Types of Generative Grammar

Finite State Model (sequential)

Phrase Structure Model (sequential + hierarchical) + (categorial)

Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)

Finite State Model

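(Figure: two state diagrams. The first generates "THE MAN COMES" and "THE MEN COME". The second adds a loop labeled OLD after THE, so that it also generates "THE OLD MAN COMES", "THE OLD OLD MEN COME", and so on.)

The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence).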
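The behaviour of such a machine can be sketched in a few lines of Python. The state names (q0, q1, q2, q3, qf) and the transition table are assumptions made for illustration; only the words come from the diagram. Because the OLD loop makes the language infinite, the sketch enumerates sentences up to a word limit.

```python
# Transition table: state -> list of (word produced, next state).
# State names are hypothetical; "qf" is the final state.
TRANSITIONS = {
    "q0": [("the", "q1")],
    "q1": [("old", "q1"), ("man", "q2"), ("men", "q3")],
    "q2": [("comes", "qf")],
    "q3": [("come", "qf")],
}


def generate(state="q0", max_words=5):
    """Yield every word sequence of at most max_words words that reaches qf."""
    if state == "qf":
        yield []
        return
    if max_words == 0:
        return
    for word, next_state in TRANSITIONS[state]:
        for rest in generate(next_state, max_words - 1):
            yield [word] + rest


sentences = [" ".join(words) for words in generate(max_words=4)]
```

Number agreement between MAN/COMES and MEN/COME is enforced only by keeping separate states for the singular and plural paths, which is the usual way a purely sequential device remembers earlier choices.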

Phrase Structure Model

Phrase Structure Grammar (PSG)

A phrase-structure grammar G is a four-tuple (V, T, S, P), where

V is a finite vocabulary (alphabet) of symbols
E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.

T is a finite set of terminal symbols: T ⊂ V
E.g., student, sing, etc.

S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V

P is a set of production rules

Noun Phrases

John
[NP [N John]]

the student
[NP [Det the] [N student]]

the intelligent student
[NP [Det the] [AdjP intelligent] [N student]]

Noun Phrase

his first five PhD students
[NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]

Noun Phrase

the five best students of my class
[NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]

Verb Phrases

can sing
[VP [Aux can] [V sing]]

can hit the ball
[VP [Aux can] [V hit] [NP the ball]]

Verb Phrase

can give a flower to Mary
[VP [Aux can] [V give] [NP a flower] [PP to Mary]]

Verb Phrase

may make John the chairman
[VP [Aux may] [V make] [NP John] [NP the chairman]]

Verb Phrase

may find the book very interesting
[VP [Aux may] [V find] [NP the book] [AP very interesting]]

Prepositional Phrases

in the classroom
[PP [P in] [NP the classroom]]

near the river
[PP [P near] [NP the river]]

Adjective Phrases

intelligent
[AP [A intelligent]]

very honest
[AP [Degree very] [A honest]]

fond of sweets
[AP [A fond] [PP of sweets]]

Adjective Phrase

very worried that she might have done badly in the assignment
[AP [Degree very] [A worried] [S' that she might have done badly in the assignment]]

Phrase Structure Rules

Rewrite Rules:
1. S → NP VP
2. NP → Det N
3. VP → V NP
4. Det → the
5. N → boy, ball
6. V → hit

We interpret each rule X → Y as the instruction "rewrite X as Y".

The boy hit the ball.

Derivation

Sentence
NP + VP                         (1) S → NP VP
Det + N + VP                    (2) NP → Det N
Det + N + V + NP                (3) VP → V NP
The + N + V + NP                (4) Det → the
The + boy + V + NP              (5) N → boy
The + boy + hit + NP            (6) V → hit
The + boy + hit + Det + N       (2) NP → Det N
The + boy + hit + the + N       (4) Det → the
The + boy + hit + the + ball    (5) N → ball

The boy hit the ball.
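The six rewrite rules are small enough to run directly. The sketch below is one possible encoding, not the lecture's own: the dictionary `RULES` and the function `expand` are hypothetical names, and rule 5 is read as two alternatives (N → boy, N → ball). Exhaustive expansion terminates here only because this particular grammar has no recursive rule.

```python
from itertools import product

# Left-hand side -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol not in RULES:  # terminal symbol: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
```

The result contains "the boy hit the ball" but also "the ball hit the boy" and "the ball hit the ball": the grammar generates every string its rules allow, whether or not the sentence is sensible.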

PSG Parse Tree

The boy hit the ball.

[S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]

PSG Parse Tree

John wrote those words in the Book of Proverbs.

[S [NP [PropN John]]
   [VP [V wrote]
       [NP those words]
       [PP [P in] [NP [NP the Book] [PP of Proverbs]]]]]

Transformational Model

Transformational Grammar

A generative grammar that makes use of all three kinds of breaking up (sequential, hierarchical, transformational) and both kinds of labeling (categorial, functional) is called a Transformational Grammar (Universal Grammar).

Other Grammar Formalisms

Lexical Functional Grammar (LFG)
Generalised Phrase Structure Grammar (GPSG)
Tree Adjoining Grammar (TAG)
Categorial Grammar (CG)
Head-driven Phrase Structure Grammar (HPSG)
Systemic Functional Grammar (SFG)

Levels of Representation in Universal Grammar (UG)

Lexicon
  ↓
D(eep)-Structure
  ↓ Move-alpha
S(urface)-Structure
  ↓
PF (phonetic form) and LF (logical form)

Interacting subsystems

UG consists of interacting subsystems:

Various subcomponents of the rule system of grammar
Subsystems of Principles

Subcomponents

Subcomponents of the rule system:

Lexicon
Syntax
  Categorial component
  Transformational component
PF-component
LF-component

Principles

Subsystem of Principles:

X-bar Theory
Theta-theory
Government
Binding Principles
Case Theory
Control Theory

Issues in Phrase Structure Grammar

Limitation
  Overgeneration

Solutions
  Subcategorization Restrictions
  Selectional Restrictions

Overgeneration

Ungrammaticality
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.

Grammatically sound but semantically odd
  *The boy frightens sincerity.
  *Sincerity kicked the boy.

Ungrammaticality

Given sentences:
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.

PS Rules:
  VP → V (NP) (PP)
  NP → Det N
  V → rely
  Det → the
  N → boy | girl

Subcategorization Frame

Specify the categorial class of the lexical item.
Specify the environment.
Examples:

kick: [V; _ NP]
cry: [V; _ ]
rely: [V; _ PP]
put: [V; _ NP PP]
think: [V; _ S']
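A subcategorization frame can be treated as a lookup: each verb lists the complement categories it requires. The sketch below copies the five example frames; the names `FRAMES` and `licensed` are hypothetical, and the check assumes complements must match the frame exactly and in order.

```python
# Verb -> complement categories required after it, in order.
FRAMES = {
    "kick": ["NP"],
    "cry": [],
    "rely": ["PP"],
    "put": ["NP", "PP"],
    "think": ["S'"],
}


def licensed(verb, complements):
    """True if the complement categories match the verb's frame exactly."""
    # Unknown verbs have no frame, so nothing is licensed for them.
    return FRAMES.get(verb) == list(complements)


relied_on_the_girl = licensed("rely", ["PP"])   # The boy relied on the girl.
relied_the_girl = licensed("rely", ["NP"])      # *The boy relied the girl.
relied = licensed("rely", [])                   # *The boy relied.
```

With this check, "rely" followed by a PP is accepted, while "rely" followed by a bare NP or by nothing is rejected.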

Subcategorization Frame

forward: V __ NP PP
e.g., We will be forwarding our new catalogue to you

invitation: N __ PP
e.g., An invitation to the party

accessible: A __ PP
e.g., A program making science more accessible to young people

Subcategorization Rules

Subcategorization Rule:

V → y / { _ NP ], _ ], _ PP ], _ NP PP ], _ S' ] }

Applying Subcategorization Rules

1. S → NP VP
2. VP → V (NP) (PP) (S') …
3. NP → Det N
4. V → rely / _ PP]
5. P → on / _ NP]
6. Det → the
7. N → boy, girl

*The boy relied the girl.
*The boy relied.
The boy relied on the girl.

Semantically Odd Constructions

Can we exclude these two ill-formed structures?

*The boy frightened sincerity.
*Sincerity kicked the boy.

Necessity of a mechanism

Selectional Restrictions

Inherent Properties of Nouns: [+/-ABSTRACT], [+/-ANIMATE]

E.g.,
Sincerity [+ABSTRACT]
Boy [+ANIMATE]

Lexical information of this type can be used to set up a context-sensitive 'rewrite rule'.

Selectional Rules

A selectional rule specifies certain selectional restrictions associated with a verb.

V → y / [+/-ABSTRACT] ____ [+/-ANIMATE]

V → frighten / [+/-ABSTRACT] ____ [+ANIMATE]

*The boy frightened sincerity.
*Sincerity kicked the boy.
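Selectional restrictions can be sketched as feature checks on the subject and object of a verb. Several entries below are assumptions added for illustration and are marked as such: the lecture gives only Sincerity [+ABSTRACT], Boy [+ANIMATE], and an animate object for "frighten". All names (`NOUN_FEATURES`, `VERB_RESTRICTIONS`, `satisfies`, `well_formed`) are hypothetical.

```python
# Inherent noun features. [+ABSTRACT] for sincerity and [+ANIMATE] for boy
# come from the examples; the remaining values are assumptions.
NOUN_FEATURES = {
    "sincerity": {"ABSTRACT": True, "ANIMATE": False},
    "boy": {"ABSTRACT": False, "ANIMATE": True},
}

# Features a verb requires of its subject and object. An empty dict means
# "no restriction". The entry for "kick" is an assumption.
VERB_RESTRICTIONS = {
    "frighten": {"subject": {}, "object": {"ANIMATE": True}},
    "kick": {"subject": {"ANIMATE": True}, "object": {}},
}


def satisfies(noun, required):
    """True if the noun carries every required feature value."""
    features = NOUN_FEATURES[noun]
    return all(features.get(name) == value for name, value in required.items())


def well_formed(subject, verb, obj):
    """True if subject and object meet the verb's selectional restrictions."""
    restrictions = VERB_RESTRICTIONS[verb]
    return (satisfies(subject, restrictions["subject"])
            and satisfies(obj, restrictions["object"]))


boy_frightened_sincerity = well_formed("boy", "frighten", "sincerity")
sincerity_kicked_boy = well_formed("sincerity", "kick", "boy")
sincerity_frightened_boy = well_formed("sincerity", "frighten", "boy")
```

Under these assumed features, both starred sentences are rejected, while "Sincerity frightened the boy" passes because "frighten" places no restriction on its subject.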

Nature of Transformation

Topicalization
  Topicalized NP
  Topicalized PP

Movement
  Wh-movement
  Relative Pronoun movement

Topicalization

I can solve this problem.
This problem, I can solve.
I can solve *(this problem).

[S [NP [Pron I]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]

Topicalization

This problem, I can solve.

[S [NP_i [Det this] [N problem]] [NP [Pron I]] [VP [Aux can] [V solve] [NP t(race)_i]]]

Topicalization

To John, Mary gave the book.

[S [PP_i [P to] [NP [N John]]] [NP [N Mary]] [VP [V gave] [NP [Det the] [N book]] [PP t(race)_i]]]

Wh-movement

John can solve this problem.
Which problem can John solve?

[S [NP [N John]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]

Wh-movement

[Which problem_i can John solve t_i?]

[S' [Comp [NP_i [Wh-Det which] [N problem]]]
    [S [NP [N John]] [VP [Aux can] [V solve] [NP t(race)_i]]]]

Relative Pronoun Movement

John heard the claim which Bill made.

[S [NP [N John]] [VP [V heard] [NP [Det the] [N claim_i] S']]]

Relative Pronoun Movement

[the claim which_i Bill made t_i]

[NP [Det the] [N claim_i]
    [S' [Comp [NP [Rel-Pron which_i]]]
        [S [NP [N Bill]] [VP [V made] [NP t(race)_i]]]]]

Relative Pronoun Movement

[The problem_i that_i he solved t_i was easy].

[S [NP [Det the] [N problem_i]
       [S' [Comp [NP [Rel-Pron that_i]]]
           [S [NP [Pron he]] [VP [V solved] [NP t(race)_i]]]]]
   [VP [V was] [AP [A easy]]]]

Parser Output

The problem that he solved was easy.

[S [NP [DT the] [NN problem]
       [SBAR [IN that]
             [S [NP [PRP he]] [VP [VBD solved]]]]]
   [VP [AUX was] [ADJP [JJ easy]]]]

X-bar Theory

It tells us how words are combined to make phrases and sentences.

It captures the commonality between different types of phrases, which PS-rules cannot.

X-bar Projection

XP (Maximal projection)
  YP
  X' (Intermediate projection)
    X (Zero projection)
    ZP

X-bar Projection

XP (X-phrase)
  YP (Specifier)
  X'
    X (Head)
    ZP (Complement)

X-bar Projection

XP
  YP (Specifier)
  X'
    X'
      X (Head)
      ZP (Complement)
    ZP (Adjunct)

X-bar Projection

[NP [NP John's] [N' [N solution] [PP to the problem]]]

X-bar Projection

[NP [Det the]
    [N' [N' [N discussion] [PP of the cricket match]]
        [PP in the cabinet meeting]]]

X-bar Theory

[Specifier-Head-Complement]SHC

[Specifier-Complement-Head]SCH

[Head-Complement-Specifier]HCS

Every phrase is endocentric. There is a specific relation between the specifier and the head, i.e., Spec-Head configuration.

C(onstituent)-command

C-command is a structural relation among the terminal and non-terminal nodes in a syntactic tree.

α c-commands β iff:
  the first branching node dominating α also dominates β
  α does not dominate β

(Figure: example tree over the nodes A, B, C, D, E, F, G.)
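The two clauses of the c-command definition translate directly into code. The tree shape used below (A → B E, B → C D, E → F G) is an assumption chosen for illustration and may differ from the figure; the helper names are hypothetical. The sketch also treats a node as not c-commanding itself, which the definition does not state explicitly.

```python
# Hypothetical example tree: A -> B E, B -> C D, E -> F G.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "G"]}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    node = PARENT.get(b)
    while node is not None:
        if node == a:
            return True
        node = PARENT.get(node)
    return False


def first_branching_ancestor(a):
    """Lowest node above a with more than one child, or None at the root."""
    node = PARENT.get(a)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)
    return node


def c_commands(a, b):
    """a c-commands b: a does not dominate b, and the first branching
    node dominating a also dominates b."""
    if a == b or dominates(a, b):
        return False
    anchor = first_branching_ancestor(a)
    return anchor is not None and dominates(anchor, b)
```

In this assumed tree, B c-commands E, F, and G (the first branching node above B is A), C c-commands only its sister D, and A c-commands nothing because it dominates every other node.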

C-command

[NP [Det the]
    [N' [N' [N discussion] [PP [P of] [NP the cricket match]]]
        [PP [P of] [NP [Det the] [N' [N meeting]]]]]]

Government

α governs β iff
  α is a lexical head (or tensed I)
  α c-commands β
  no barrier (VP, NP, PP, AP, or tensed IP) intervenes between α and β

Theta-Theory

Hit: <1,2> (argument structure), <Agent, Patient> (thematic structure)

Smile: <1> (argument structure), <Agent> (thematic structure)

Forward: <1,2,3> (argument structure), <Agent, Theme, Goal> (thematic structure)

Theta-Criterion
  Each argument must be assigned a theta-role
  Each theta-role must be assigned to an argument

Thematic Roles

The man forwarded the mail to the minister.

forward: V __ NP PP

Event FORWARD ([Agent THE MAN], [Theme THE MAIL], [Goal TO THE MINISTER])

Binding Principles

A relation, called Binding

α binds β iff
  α c-commands β
  α and β are co-indexed

Rajiv_i likes himself_i.

Binding

[IP [NP [N' [N Rajiv]]]
    [I' [I Tense AGR]
        [VP [NP t]
            [V' [V like] [NP [N' [N himself_i]]]]]]]

Binding

[IP [NP Rajiv's brother]
    [I' [I Tense AGR]
        [VP [NP t]
            [V' [V like] [NP [N' [N himself_i]]]]]]]

Binding

Rajiv_i's brother_j likes himself_*i/j.

[Rajiv's brother] is the antecedent of [himself]. [Rajiv] cannot be the antecedent of [himself]. That is, the sentence cannot mean "Rajiv_i's brother likes Rajiv_i".

A particular kind of structural relation is maintained between [Rajiv's brother] and [himself], but not between [Rajiv] and [himself]. This structural relation is called C(onstituent)-command.

Binding

For the purpose of interpretation, noun phrases have been conveniently divided into three groups:

Anaphors (Reflexives and Reciprocals)
e.g., myself, yourself, each other, one another, etc.

Pronouns
e.g., he, she, it, we, etc.

R-Expressions
e.g., John, Mumbai

Binding Principles

Principle A: An anaphor is bound in its governing category
  Rajiv_i likes himself_i

Principle B: A pronominal is free in its governing category
  Rajiv_i likes him_*i/j

Principle C: An R-expression is always free
  John likes Mary

Examples
  We think that nobody likes us.
  *We think that nobody likes ourselves.

Natural Language Phenomena

Agreement
  Subject-verb agreement
  Agreement in Relative Pronouns (English):
    The man who/*which I saw
    The book which/*who I saw

Ambiguity
  The mayor asked the police to stop drinking after midnight.
  Yesterday I saw a crane in the campus.

Negation Scope
  John did not deliberately break the glass.
  John deliberately did not break the glass.

Quantifier Scope
  Every student likes a teacher in the class.

Gapping
  John bought a story book and Mary a pen.
  Meena was crying because her mother was.

Natural Language Phenomena

Scrambling effect

Slifting
  John has robbed the bank, I believe.

Sluicing
  John bought something but I don't know what [John bought t].

Question
  Auxiliary Inversion
  Wh-fronting
  Intonation
  Wh-in situ

Control Structures
  I compelled John to read this article.
  I promised John to read this article.

Suggested Readings

Chomsky, N. 1957. Syntactic Structures. Mouton, The Hague.
Chomsky, N. 1981. Lectures on Government and Binding. MIT, Mass.
Radford, A. 1988. Transformational Grammar. CUP.
Jurafsky, D. and J. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, New Jersey.
Allen, James. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc. UK.

Thank You
