The origin of syntactic bootstrapping: A computational model

Michael Connor, Cynthia Fisher, & Dan Roth

University of Illinois at Urbana-Champaign

How do children begin to understand sentences?

Topid rivvo den marplox.

Two problems of ambiguity

"the language"


"the world"

eating (sheep, food)
feed (Johanna, sheep)
help (Daddy, Johanna)
scared (Johanna, sheep)
are-nice (sheep)
want (sheep, food)
getting away (brother)

"the sentence" "the world"

Typical view: knowledge of meaning drives knowledge of syntax


feed (Johanna, sheep)

"the language" "the world"

Typical view: knowledge of meaning drives knowledge of grammar


feed (Johanna, sheep)

With or without assumed innate constraints on syntax-semantics linking (e.g. Pinker, 1984, 1989; Tomasello, 2003)

Sentence-meaning pairs

The problem of sentence and verb meanings

• Predicate terms don't label events, but adopt different perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette, Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985)

"I'll put this here." versus "This goes here."

"the sentence" "the world"

If we take seriously the ambiguity of scenes ...


eating (sheep, food)
feed (Johanna, sheep)
help (Daddy, Johanna)
scared (Johanna, sheep)
are-nice (sheep)
want (sheep, food)
getting away (brother)

Syntactic bootstrapping (Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985; Naigles, 1990)

"I'll put this here." versus "This goes here."

[Diagram labels: sentence structure; verbs' semantic predicate-argument structure]

How could this be?

How could aspects of syntactic structure guide early sentence interpretation ... before the child has learned much about the syntax and morphology of this language?

Three main claims:

(1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher & Snedeker, in press)

(2) Early abstraction: Children are biased toward abstract representations of language experience. (Gertner, Fisher & Eisengart, 2006; Thothathiri & Snedeker, 2008)

(3) Independent encoding of sentence structure: Children gather distributional facts about verbs from listening experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009; Yuan & Fisher, 2009)

A fourth prediction:

(4) 'Real' syntactic bootstrapping: Children can learn which words are verbs by tracking their syntactic argument-taking behavior in sentences.

A computational model of syntactic bootstrapping

• Computational experiments simulate a learner whose knowledge of sentence structure is under our control

• We equip the model with an unlearned bias to map nouns onto abstract semantic roles, and ask:

– Are partial representations based on sets of nouns useful for learning more about the native language?

• First case study: learning English word order

– Could a learner identify verbs as verbs by noting their tendency to occur with particular numbers of arguments?

Computational Model of Semantic Role Labeling (SRL)

• A semantic analysis of sentences at the level of who does what to whom

• For each verb in the sentence:

– SRL system tries to identify all constituents that fill a semantic role

– and to assign them roles (agent, patient, goal, etc.)

Semantic Role Labeling (SRL): The basic idea

[Figure: SRL pipeline applied to "I like dark bread ."
1. Parsing
2. Argument Identification
3. Predicate Identification
4. Feature Extraction
5. Argument Classification, trained with External Labeled Feedback]

Features for "I": arg=I, NP < S, verb=like; classes A0, A1, ...; feedback: I is an agent (A0)

Features for "bread": arg=bread, NP < VP, verb=like; classes A0, A1, ...; feedback: bread is a patient (A1)

The nature of the semantic roles

• A key assumption of the SRL is that the semantic roles are abstract:

– This is a property of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (Palmer, Gildea, & Kingsbury, 2005; cf. Dowty, 1991)

Give: Arg0: Giver; Arg1: Thing given; Arg2: Recipient

Like: Arg0: Liker; Arg1: Object of affection

Have: Arg0: Owner; Arg1: Possession


[Figure labels over the same role lists: Arg0 roles are grouped as Agent; Arg1 roles are grouped as Patient/Theme.]

• So, like children, an SRL learns to identify agents and patients in sentences, not givers and things-given.



This SRL knows the grammar ... & reads minds

[Figure: the pipeline (1. Parsing through 5.) applied to "I like dark bread ." with full parse features (arg=I, NP < S, verb=like; arg=bread, NP < VP, verb=like) and labeled feedback (I is an agent, A0; bread is a patient, A1).]

[Figure: the same pipeline with every parse feature and label replaced by question marks (arg=?, ?? < ?, verb=???).]

What are they talking about?



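Baby SRL

[Figure: Baby SRL pipeline (steps 1 to 5) applied to "I like dark bread ." An HMM trained on CDS assigns each word a state: I (58), like (78), dark (50), bread (48), "." (1). Seed nouns mark states 58 and 48 as N; state 78 is marked V; feature extraction follows.]

Example words in each state:
58: I, you, ya, Kent
78: like, got, did, had
50: many, nice, good
48: more, juice, milk

Argument-count statistics:
78: 13% 1-N, 54% 2-N
50: 41% 1-N, 31% 2-N

Features for "I": arg=I, 1st of two, before verb, verb=like (classes A0, A1, ...)

Features for "bread": arg=bread, 2nd of two, after verb, verb=like (classes A0, A1, ...)

Feedback: I is an agent (A0)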



Baby SRL

[Figure: Baby SRL pipeline for "I like dark bread ." with Internal Animacy Feedback. HMM states: I (58, N), like (78, V), dark (50), bread (48, N), "." (1). Argument-count statistics: state 78: 13% 1-N, 54% 2-N; state 50: 41% 1-N, 31% 2-N. Features: arg=I, 1st of two, before verb, verb=like; arg=bread, 2nd of two, after verb, verb=like.]

Internal animacy feedback: I is animate, so A0; bread is unknown and the agent is I, so not A0.

Unsupervised Part-of-Speech Clustering using a Hidden Markov Model (HMM)

• Yields context-sensitive clustering of words based on sequential dependencies (cf. Bannard et al., 2004; Chang et al., 2006; Mintz, 2003; Solan et al., 2005)

• A priori split between content and function words based on phonological differences (e.g., Shi et al., 1998; Connor et al., 2010)

– 80 hidden states, 30 pre-allocated to function words

• Trained on ~ 1 million words of unlabeled text from CHILDES

[Figure: step 1. The HMM trained on CDS tags "I like dark bread ." with states 58, 78, 50, 48, 1.]


Linking rules

• But these clusters are unlabeled

• How do we link unlabeled clusters with particular grammatical categories, and thus with particular roles in sentence interpretation (SRL)?

– nouns, which are candidate arguments

– and verbs, which take arguments



Linking rules: Nouns first

• Syntactic bootstrapping is grounded in noun learning (e.g., Gillette et al., 1999)

– Children learn the meanings of some nouns without syntactic knowledge

• Each noun is assumed to be a candidate argument

– A simple form of semantic bootstrapping (Pinker, 1984)

• Seed nouns (10 to 75) sampled from the M-CDI
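The noun linking rule can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy tagged corpus, and the `min_seed_types` threshold are assumptions, since the slides do not say how many seed nouns a state must contain before it counts as a noun state.

```python
from collections import defaultdict

def noun_states(tagged_corpus, seed_nouns, min_seed_types=1):
    """Return the HMM states that contain at least `min_seed_types` distinct seed nouns.

    tagged_corpus: iterable of sentences, each a list of (word, state) pairs.
    seed_nouns: set of lowercase nouns the learner is assumed to know already.
    min_seed_types: hypothetical threshold, not taken from the slides.
    """
    seeds_in_state = defaultdict(set)
    for sentence in tagged_corpus:
        for word, state in sentence:
            if word.lower() in seed_nouns:
                seeds_in_state[state].add(word.lower())
    return {s for s, seeds in seeds_in_state.items() if len(seeds) >= min_seed_types}

def candidate_arguments(sentence, states):
    """Every word tagged with a noun state is treated as a candidate argument."""
    return [word for word, state in sentence if state in states]

# Toy data: state numbers follow the slide example; the corpus itself is invented.
corpus = [
    [("I", 58), ("like", 78), ("dark", 50), ("bread", 48), (".", 1)],
    [("you", 58), ("got", 78), ("more", 48), ("juice", 48), (".", 1)],
]
seeds = {"bread", "juice", "you"}
n_states = noun_states(corpus, seeds)
print(sorted(n_states))
print(candidate_arguments(corpus[0], n_states))
```

Because the rule works at the level of states, words that share a state with seed nouns (such as "more" in the toy corpus) are also treated as candidate arguments, which is one source of noise in the resulting representation.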

[Figure: steps 1 and 2. The HMM trained on CDS tags "I like dark bread ." with states 58, 78, 50, 48, 1; seed nouns mark I (58) and bread (48) as N.]


Linking rules: What about verbs?

• Verbs take noun arguments

• Via syntactic bootstrapping, children identify a word as a verb by tracking its argument-taking behavior in sentences


[Figure: steps 1 to 3. "I like dark bread ." tagged with states 58, 78, 50, 48, 1; I (58) and bread (48) are marked N; no word is yet marked as the verb.]


Linking rules: What about verbs?

• Collect statistics over HMM training corpus: how often each state occurs with a particular number of arguments

• For each SRL input sentence, choose the word whose state is most likely to appear with the number of arguments found in the current sentence.
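The two steps above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the function names and the `function_states` parameter are hypothetical, and the percentages from the slides are read here as the share of a state's occurrences that fall in sentences with one or two nouns.

```python
from collections import Counter, defaultdict

def argument_count_stats(tagged_corpus, noun_states, function_states=frozenset()):
    """For each candidate state, estimate how often it occurs in a sentence
    containing exactly k nouns."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        k = sum(state in noun_states for _, state in sentence)
        for _, state in sentence:
            if state not in noun_states and state not in function_states:
                counts[state][k] += 1
    return {
        state: {k: c / sum(ctr.values()) for k, c in ctr.items()}
        for state, ctr in counts.items()
    }

def pick_verb(sentence, noun_states, stats):
    """Choose the word whose state is most likely to occur with the number of
    nouns found in this sentence. Returns None if no word has statistics."""
    k = sum(state in noun_states for _, state in sentence)
    scored = [(stats[state].get(k, 0.0), word)
              for word, state in sentence if state in stats]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None

# Statistics for states 78 and 50 as given on the slides.
slide_stats = {78: {1: 0.13, 2: 0.54}, 50: {1: 0.41, 2: 0.31}}
sentence = [("I", 58), ("like", 78), ("dark", 50), ("bread", 48), (".", 1)]
print(pick_verb(sentence, noun_states={58, 48}, stats=slide_stats))
```

With two nouns in the sentence, state 78 (54% 2-N) outscores state 50 (31% 2-N), so "like" is chosen as the verb.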


[Figure: steps 1 to 3. "I like dark bread ." tagged with states 58, 78, 50, 48, 1; I and bread are marked N, and like (78) is marked V. Argument-count statistics: state 78: 13% 1-N, 54% 2-N; state 50: 41% 1-N, 31% 2-N.]


Baby SRL

[Figure: Baby SRL pipeline (steps 1 to 5) for "I like dark bread ." HMM states: I (58, N), like (78, V), dark (50), bread (48, N), "." (1). Argument-count statistics: state 78: 13% 1-N, 54% 2-N; state 50: 41% 1-N, 31% 2-N.]

Features for "I": arg=I, 1st of two, before verb, verb=like (classes A0, A1, ...)

Features for "bread": arg=bread, 2nd of two, after verb, verb=like (classes A0, A1, ...)

This is a partial sentence representation.

Feedback: I is an agent (A0)


Semantic-role feedback???

• Theories of language acquisition assume learning to understand sentences is a partially-supervised task.

– use existing knowledge of words & syntax to assign a meaning to a sentence

– the appropriateness of the meaning in context then provides the feedback

[Johanna rivvo den sheep.]


Baby SRL with Animacy-based Feedback

Bootstrapped animacy feedback (a list of concrete nouns):

1. Animates are agents; inanimates are non-agents

2. Each verb has at most one agent

3. If there are two known animate nouns

– Use current SRL classifier to choose the best agent (and use this decision as feedback to train the SRL)

• Note: Animacy feedback trains a simplified classifier, with only an agent/non-agent role distinction
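One way to read the three rules above is sketched below. It is an interpretation, not the authors' implementation: the function name, the `agent_score` callable (standing in for the current SRL classifier), and the treatment of nouns with unknown animacy are assumptions. Here an unknown noun is labeled a non-agent only when another noun has already been chosen as the agent.

```python
def animacy_feedback(arguments, animate, inanimate, agent_score):
    """Return one label per argument: "A0", "notA0", or None (no feedback).

    arguments: candidate argument words, in sentence order.
    animate / inanimate: sets of concrete nouns with known animacy.
    agent_score: hypothetical callable (arguments, index) -> float standing in
        for the current SRL classifier's confidence that the argument is A0.
    """
    animate_idx = [i for i, word in enumerate(arguments) if word in animate]
    if not animate_idx:
        # Rule 1 only: known inanimates are non-agents; unknowns get no feedback.
        return ["notA0" if word in inanimate else None for word in arguments]
    if len(animate_idx) == 1:
        agent = animate_idx[0]  # Rule 1: the single animate noun is the agent.
    else:
        # Rule 3: several animates, so let the current classifier choose.
        agent = max(animate_idx, key=lambda i: agent_score(arguments, i))
    # Rule 2: at most one agent, so every other argument is a non-agent.
    labels = ["notA0"] * len(arguments)
    labels[agent] = "A0"
    return labels

animate = {"I", "you", "Mommy"}
inanimate = {"juice"}
prefer_first = lambda args, i: -i  # toy stand-in for the classifier
print(animacy_feedback(["I", "bread"], animate, inanimate, prefer_first))
```

For the slide example, "I" is animate and becomes A0, while "bread" has unknown animacy and becomes notA0 because the agent slot is already filled.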

[Figure: Baby SRL pipeline for "I like dark bread ." with Internal Animacy Feedback. HMM states: I (58, N), like (78, V), dark (50), bread (48, N), "." (1). Features: arg=I, 1st of two, before verb, verb=like; arg=bread, 2nd of two, after verb, verb=like. The classifier chooses between A0 and notA0. Feedback: I is animate, so A0.]


Training the Baby SRL

3 corpora of child-directed speech

– Annotated parental utterances using PropBank annotation scheme

– Speech to Adam, Eve, Sarah (Brown, 1973)

Adam 01-23 (2;3 - 3;2)

• Train on 01-20: 3951 propositions, 8107 arguments

Eve 01-20 (1;6 - 2;3)


Sarah 01-90 (2;3 - 4;1)


Testing the Baby SRL

• Constructed test sentences like those used in experiments with children

– Unknown verbs & two animate nouns force the SRL to rely on syntactic knowledge

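"Adam krads Daddy!"

[Figure: test sentence template. One noun from {Adam, Mommy, Daddy, Ursula, ...}, then the novel verb "krad", then one noun from {Adam, Mommy, Daddy, Ursula, ...}.]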
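Test items of this "A krads B" form could be generated along the following lines. This is only a sketch: the function name is hypothetical, the name list is limited to the four names visible on the slide, and excluding sentences where the same name fills both positions is an assumption.

```python
from itertools import permutations

def make_test_sentences(names, novel_verb="krads"):
    """Build (sentence, first_noun, second_noun) triples for every ordered
    pair of distinct names, using a verb the model has never seen."""
    return [(f"{a} {novel_verb} {b}!", a, b) for a, b in permutations(names, 2)]

names = ["Adam", "Mommy", "Daddy", "Ursula"]
items = make_test_sentences(names)
print(len(items))
print(items[0])
```

Since the verb is unknown and both nouns are animate, neither lexical knowledge nor animacy can decide which noun is the agent; only features of sentence structure can.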

Testing the Baby SRL

• Compare systems trained on different syntactic features derived from the partial sentence representation

• Lexical baseline: Verb and argument word features only

• Noun Pattern: Lexical features plus features such as '1st of 2 nouns', '2nd of 2 nouns'

• Verb Position: Lexical features plus 'before verb', 'after verb'

• Combined: All feature types

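Example features for "I" in "I like dark bread":

Lexical baseline: arg=I, verb=like

Combined: arg=I, 1st of two, before verb, verb=like

Noun Pattern: arg=I, 1st of two, verb=like

Verb Position: arg=I, before verb, verb=like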
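The four feature sets can be illustrated with a short sketch. The function name, the `feature_set` argument, and the exact feature strings are assumptions chosen to mirror the slide wording; the input is a sentence whose nouns and verb have already been identified.

```python
ORDINALS = {1: "1st", 2: "2nd", 3: "3rd"}
FEATURE_SETS = ("lexical", "noun_pattern", "verb_position", "combined")

def extract_features(words, noun_idx, verb_idx, feature_set="combined"):
    """Return one feature list per noun argument.

    words: the sentence as a list of tokens.
    noun_idx: positions of the identified nouns, in sentence order.
    verb_idx: position of the identified verb.
    """
    if feature_set not in FEATURE_SETS:
        raise ValueError(f"unknown feature set: {feature_set}")
    total = len(noun_idx)
    all_features = []
    for rank, i in enumerate(noun_idx, start=1):
        # Lexical features are shared by every feature set.
        feats = [f"arg={words[i]}", f"verb={words[verb_idx]}"]
        if feature_set in ("noun_pattern", "combined"):
            ordinal = ORDINALS.get(rank, f"{rank}th")
            feats.append(f"{ordinal} of {total} nouns")
        if feature_set in ("verb_position", "combined"):
            feats.append("before verb" if i < verb_idx else "after verb")
        all_features.append(feats)
    return all_features

words = ["I", "like", "dark", "bread"]
for name in FEATURE_SETS:
    print(name, extract_features(words, noun_idx=[0, 3], verb_idx=1, feature_set=name))
```

Note that the Noun Pattern features need only the set of nouns, whereas the Verb Position features also depend on the verb having been identified correctly.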

Question 1: Are partial sentence representations useful, in principle, for learning new syntactic knowledge?

Learning English word order

Start with "gold standard" part-of-speech tagging and "gold standard" semantic role feedback

Are partial sentence representations useful, as a starting point for learning?

• We assume that representations as simple as an ordered set of nouns can yield useful information about verb argument structure and sentence interpretation.

• Is this true, in ordinary samples of language use?

• Despite all the noise this simple representation will add to the data?

Result 1: Partial sentence representations are useful for further syntax learning (A krads B)

Question 2: Can partial sentence representations be built via unsupervised clustering ... and the proposed linking rules?


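[Figure: steps 1 to 3. "I like dark bread ." tagged by the HMM trained on CDS with states 58, 78, 50, 48, 1; seed nouns mark I and bread as N, and like (78) is marked V. Argument-count statistics: state 78: 13% 1-N, 54% 2-N; state 50: 41% 1-N, 31% 2-N.]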

Result 2: Partial sentence representations can be built via unsupervised clustering

[Figure: results plotted against the number of known seed nouns.]

Question 3: Now for the Baby SRL. Are partial sentence representations built via unsupervised clustering useful for learning new syntactic knowledge? With animacy-based feedback?

Learning English word order

Starting from scratch ...

Result 3: Partial sentence representations based on unsupervised clustering are useful

SRL Summary

• Result 1: Partial sentence representations are useful for further syntax learning (gold arguments & feedback)

• Result 2: Partial sentence representations can be built via unsupervised clustering, and the proposed linking rules:

– Semantic bootstrapping for nouns, 'real' syntactic bootstrapping for verbs

• Result 3: Partial sentence representations based on unsupervised clustering are useful for further syntax learning

– Robust learning based on noisy unsupervised 'parse' of sentences, and noisy animacy-based feedback

• verbs precede objects in English

Three main claims:

(1) Structure-mapping: Syntactic bootstrapping begins with an unlearned bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms.

(2) Early abstraction: Children are biased toward usefully abstract representations of language experience.

(3) Independent encoding of sentence structure: Children gather distributional facts about new verbs from listening experience.

A fourth prediction:

(4) 'Real' syntactic bootstrapping: Children learn which words are the verbs by tracking their syntactic argument-taking behavior in sentences

Hey, she pushed her.
Will you push me on the swing?
John pushed the cat off the sofa

...

Verb = Push[noun1, noun2]

2-participant relation

Acknowledgements

Yael Gertner

Jesse Snedeker

Lila Gleitman

NSF

NICHD

Soondo Baek

Kate Messenger

Sylvia Yuan

Rose Scott
