The origin of syntactic bootstrapping: A computational model
Michael Connor, Cynthia Fisher, & Dan Roth
University of Illinois at Urbana-Champaign
How do children begin to understand sentences?
Topid rivvo den marplox.
Two problems of ambiguity
"the language": Topid rivvo den marplox.
"the world": eating (sheep, food), feed (Johanna, sheep), help (Daddy, Johanna), scared (Johanna, sheep), are-nice (sheep), want (sheep, food), getting away (brother)
Typical view: knowledge of meaning drives knowledge of grammar
Topid rivvo den marplox. → feed (Johanna, sheep)
With or without assumed innate constraints on syntax-semantics linking (e.g., Pinker, 1984, 1989; Tomasello, 2003)
Input: sentence-meaning pairs
The problem of sentence and verb meanings
• Predicate terms don't label events, but adopt different perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette, Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985)
"I'll put this here." versus "This goes here."
If we take seriously the ambiguity of scenes ...
Topid rivvo den marplox.
Syntactic bootstrapping (Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985; Naigles, 1990)
"I'll put this here." versus "This goes here."
sentence structure → verbs' semantic predicate-argument structure
How could this be?
How could aspects of syntactic structure guide early sentence interpretation ... before the child has learned much about the syntax and morphology of this language?
Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher & Snedeker, in press)
(2) Early abstraction: Children are biased toward abstract representations of language experience. (Gertner, Fisher & Eisengart, 2006; Thothathiri & Snedeker, 2008)
(3) Independent encoding of sentence structure: Children gather distributional facts about verbs from listening experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009; Yuan & Fisher, 2009)
A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children can learn which words are verbs by tracking their syntactic argument-taking behavior in sentences.
A computational model of syntactic bootstrapping
• Computational experiments simulate a learner whose knowledge of sentence structure is under our control
• We equip the model with an unlearned bias to map nouns onto abstract semantic roles, and ask:
– Are partial representations based on sets of nouns useful for learning more about the native language?
• First case study: learning English word order
– Could a learner identify verbs as verbs by noting their tendency to occur with particular numbers of arguments?
Computational Model of Semantic Role Labeling (SRL)
• A semantic analysis of sentences at the level of who does what to whom
• For each verb in the sentence:
– SRL system tries to identify all constituents that fill a semantic role
– and to assign them roles (agent, patient, goal, etc.)
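As an illustration of the target representation, the sketch below records one SRL frame for the example sentence used in these slides, "I like dark bread ." The `Frame` class and its `with_role` method are hypothetical names chosen for this sketch; the labels follow the PropBank-style A0/A1 convention the slides use, but this is not the authors' data format.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One predicate with its labeled arguments (hypothetical record)."""

    verb: str
    # head word of each argument -> PropBank-style role label (A0, A1, ...)
    roles: dict = field(default_factory=dict)

    def with_role(self, label):
        """Return the arguments carrying a given role label, in insertion order."""
        return [word for word, role in self.roles.items() if role == label]


# "I like dark bread ." yields one frame, for the verb "like"
frame = Frame(verb="like", roles={"I": "A0", "bread": "A1"})
print(frame.with_role("A0"))  # ['I']
print(frame.with_role("A1"))  # ['bread']
```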
Semantic Role Labeling (SRL): The basic idea
Pipeline: 1. Parsing → 2. Argument Identification → 3. Predicate Identification → 4. Feature Extraction → 5. Argument Classification (trained with External Labeled Feedback)
Example: I like dark bread .
• arg=I, NP < S, verb=like → A0 (I is an agent)
• arg=bread, NP < VP, verb=like → A1 (bread is a patient)
The nature of the semantic roles
• A key assumption of the SRL is that the semantic roles are abstract:
– This is a property of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (Palmer, Gildea, & Kingsbury, 2005; cf. Dowty, 1991)
Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient
Like: Arg0 = Liker, Arg1 = Object of affection
Have: Arg0 = Owner, Arg1 = Possession
• So, like children, an SRL learns to identify agents and patients in sentences, not givers and things-given.
Agent (Arg0): Giver, Liker, Owner
Patient/Theme (Arg1): Thing given, Object of affection, Possession
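The grouping of verb-specific roles into macro-roles can be pictured as a simple lookup. This sketch encodes only the three verbs shown above; `VERB_ROLES`, `MACRO_ROLES`, and `macro_role` are hypothetical names, and collapsing Arg0/Arg1 into Agent and Patient/Theme is a simplification of PropBank, where numbered arguments are defined per verb frame.

```python
# Verb-specific roles from the slide, keyed by (verb, numbered argument)
VERB_ROLES = {
    ("give", "Arg0"): "Giver",
    ("give", "Arg1"): "Thing given",
    ("give", "Arg2"): "Recipient",
    ("like", "Arg0"): "Liker",
    ("like", "Arg1"): "Object of affection",
    ("have", "Arg0"): "Owner",
    ("have", "Arg1"): "Possession",
}

# Hypothetical grouping into the two macro-roles named on the slide
MACRO_ROLES = {"Arg0": "Agent", "Arg1": "Patient/Theme"}


def macro_role(verb, arg):
    """Map a verb-specific numbered role to its macro-role.

    Raises KeyError for a role the verb does not have; numbered arguments
    without a macro-role here (e.g. Arg2) are returned unchanged.
    """
    if (verb, arg) not in VERB_ROLES:
        raise KeyError(f"{verb!r} has no {arg}")
    return MACRO_ROLES.get(arg, arg)


# Giver, Liker and Owner all collapse into one learnable category
print({VERB_ROLES[(v, "Arg0")]: macro_role(v, "Arg0") for v in ("give", "like", "have")})
# {'Giver': 'Agent', 'Liker': 'Agent', 'Owner': 'Agent'}
```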
This SRL knows the grammar (it is given a syntactic parse) ... & reads minds (it is given labeled semantic roles as feedback): I is an agent (A0); bread is a patient (A1).
For a learner without the grammar or the speaker's meaning, every input to the same pipeline is unknown:
• arg=?, ?? < ?, verb=??? → A0? A1? ...
What are they talking about? Topid rivvo den marplox.
Baby SRL
Pipeline: 1. HMM (trained on child-directed speech, CDS) → 2. Argument Identification (seed nouns) → 3. Predicate Identification → 4. Feature Extraction → 5. Argument Classification (trained with External Labeled Feedback)
Example: I/58 (N) like/78 (V) dark/50 bread/48 (N) ./1
HMM states: 58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk
Argument counts: 78: 13% 1-N, 54% 2-N; 50: 41% 1-N, 31% 2-N
• arg=I, 1st of two, before verb, verb=like → A0 (I is an agent)
• arg=bread, 2nd of two, after verb, verb=like → A1 (bread is a patient)
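Feature extraction (step 4) can be sketched as follows, assuming steps 2 and 3 have already tagged the nouns and the verb. The function `partial_features` and its feature strings are hypothetical; they simply mirror the labels above ("1st of two", "before verb") and are not the authors' feature extractor.

```python
ORDINALS = {1: "1st", 2: "2nd", 3: "3rd"}
COUNTS = {1: "one", 2: "two", 3: "three"}


def partial_features(words, tags):
    """Noun-pattern and verb-position features for each noun in a sentence.

    words: the tokens of one sentence.
    tags:  "N" (noun), "V" (verb) or None, one per token; assumes at most one verb.
    Returns (noun, features) pairs; feature strings mimic the slide's labels.
    """
    nouns = [i for i, tag in enumerate(tags) if tag == "N"]
    verbs = [i for i, tag in enumerate(tags) if tag == "V"]
    verb = verbs[0] if verbs else None
    total = COUNTS.get(len(nouns), str(len(nouns)))
    result = []
    for rank, i in enumerate(nouns, start=1):
        ordinal = ORDINALS.get(rank, f"{rank}th")
        feats = [f"arg={words[i]}", f"{ordinal} of {total}"]
        if verb is not None:
            # Position relative to the verb, then the verb itself
            feats.append("before verb" if i < verb else "after verb")
            feats.append(f"verb={words[verb]}")
        result.append((words[i], feats))
    return result


words = ["I", "like", "dark", "bread", "."]
tags = ["N", "V", None, "N", None]
for noun, feats in partial_features(words, tags):
    print(noun, feats)
```

Note that nothing here requires a full parse: only the order of the nouns and their position relative to one word identified as the verb.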
Unsupervised Part-of-Speech Clustering using a Hidden Markov Model (HMM)
• Yields context-sensitive clustering of words based on sequential dependencies (cf. Bannard et al., 2004; Chang et al., 2006; Mintz, 2003; Solan et al., 2005)
• A priori split between content and function words based on phonological differences (e.g., Shi et al., 1998; Connor et al., 2010)
– 80 hidden states, 30 pre-allocated to function words
• Trained on ~ 1 million words of unlabeled text from CHILDES
Example (HMM trained on CDS): I/58 like/78 dark/50 bread/48 ./1
58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk
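The slides do not spell out the decoding procedure, so the following is only a sketch of one standard way to impose the content/function split: Viterbi decoding in which function words may only occupy the reserved function-word states and all other words only the content states. It assumes function words are identified in advance from a hypothetical `function_words` set and that the model parameters are already trained (training is not shown). With the configuration above, `n_states` would be 80 and `n_function` 30.

```python
import math


def viterbi(words, n_states, n_function, start, trans, emit, function_words):
    """Most likely HMM state sequence, with a content/function state split.

    States 0 .. n_function-1 may only emit function words, the remaining
    states only content words. start[s] and trans[s][t] are probabilities;
    emit(s, w) returns P(w | s). Assumes a non-empty sentence.
    """

    def allowed(word):
        # The a priori split, imposed as a hard constraint on decoding
        if word in function_words:
            return range(n_function)
        return range(n_function, n_states)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s]: log-probability of the best path ending in state s
    best = {s: log(start[s]) + log(emit(s, words[0])) for s in allowed(words[0])}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in allowed(word):
            prev = max(best, key=lambda s: best[s] + log(trans[s][t]))
            new_best[t] = best[prev] + log(trans[prev][t]) + log(emit(t, word))
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy model: state 0 is the only function-word state; states 1 and 2 are content states
EMISSIONS = {0: {"the": 1.0}, 1: {"dog": 0.7, "runs": 0.3}, 2: {"dog": 0.2, "runs": 0.8}}


def emit(state, word):
    return EMISSIONS[state].get(word, 1e-6)


start = [0.5, 0.25, 0.25]
trans = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.4, 0.3, 0.3]]
print(viterbi(["the", "dog", "runs"], 3, 1, start, trans, emit, {"the"}))  # [0, 1, 2]
```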
Linking rules
• But these clusters are unlabeled
• How do we link unlabeled clusters with particular grammatical categories, thus particular roles in sentence interpretation (SRL)?
– nouns, which are candidate arguments
– and verbs, which take arguments
Linking rules: Nouns first
• Syntactic bootstrapping is grounded in noun learning (e.g., Gillette et al., 1999)
– Children learn the meanings of some nouns without syntactic knowledge
• Each noun is assumed to be a candidate argument
– A simple form of semantic bootstrapping (Pinker, 1984)
• 10 to 75 seed nouns sampled from the M-CDI
Argument identification (seed nouns): I/58 (N) like/78 dark/50 bread/48 (N) ./1
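Argument identification from seed nouns can be sketched as below. The exact rule is an assumption here: a word counts as a candidate argument if its HMM state is one in which some seed noun was observed. `noun_states`, `identify_arguments`, the toy corpus, and the seed set are all hypothetical.

```python
def noun_states(tagged_corpus, seed_nouns):
    """HMM states in which at least one seed noun was observed."""
    return {state for word, state in tagged_corpus if word in seed_nouns}


def identify_arguments(tagged_sentence, states):
    """Words of an HMM-tagged sentence whose state is a noun state."""
    return [word for word, state in tagged_sentence if state in states]


# Toy HMM-tagged corpus using state numbers from the slide
corpus = [("I", 58), ("you", 58), ("Kent", 58), ("like", 78), ("got", 78),
          ("nice", 50), ("juice", 48), ("milk", 48), (".", 1)]
states = noun_states(corpus, seed_nouns={"Kent", "milk"})  # hypothetical seeds
sentence = [("I", 58), ("like", 78), ("dark", 50), ("bread", 48), (".", 1)]
print(identify_arguments(sentence, states))  # ['I', 'bread']
```

Under this rule a handful of known nouns is enough to flag unknown words such as "bread" as arguments, because the HMM has already grouped them with the seeds.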
Linking rules: What about verbs?
• Verbs take noun arguments
• Via syntactic bootstrapping, children identify a word as a verb by tracking its argument-taking behavior in sentences
Linking rules: What about verbs?
• Collect statistics over HMM training corpus: how often each state occurs with a particular number of arguments
• For each SRL input sentence, choose the word whose state is most likely to appear with the number of arguments found in the current sentence.
Example: I/58 (N) like/78 dark/50 bread/48 (N) ./1
• The sentence contains two nouns (I, bread).
• State 78 (like, got, did, had): 13% 1-N, 54% 2-N; state 50 (many, nice, good): 41% 1-N, 31% 2-N
• So like (state 78) is chosen as the verb (V) rather than dark (state 50).
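The verb-linking rule above can be sketched in two parts: collecting P(number of nouns | state) over a tagged corpus, and picking the non-noun word whose state best predicts the current sentence's noun count. The function names are hypothetical; the example plugs in the percentages from the slide, and states with no statistics score 0.

```python
from collections import Counter, defaultdict


def arg_count_stats(corpus):
    """P(sentence has n nouns | HMM state), from (tagged_sentence, n_nouns) pairs."""
    counts = defaultdict(Counter)
    for tagged, n_nouns in corpus:
        for _, state in tagged:
            counts[state][n_nouns] += 1
    return {state: {n: c / sum(cnt.values()) for n, c in cnt.items()}
            for state, cnt in counts.items()}


def pick_verb(tagged, nouns, stats):
    """Non-noun word whose state is most likely to occur with len(nouns) nouns."""
    n = len(nouns)
    candidates = [(w, s) for w, s in tagged if w not in nouns]
    best_word, _ = max(candidates, key=lambda ws: stats.get(ws[1], {}).get(n, 0.0))
    return best_word


# Percentages from the slide (states without statistics score 0)
stats = {78: {1: 0.13, 2: 0.54}, 50: {1: 0.41, 2: 0.31}}
sentence = [("I", 58), ("like", 78), ("dark", 50), ("bread", 48), (".", 1)]
print(pick_verb(sentence, {"I", "bread"}, stats))  # like
```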
Baby SRL
Feature extraction over the HMM-based analysis (I/58 (N) like/78 (V) dark/50 bread/48 (N) ./1):
• arg=I, 1st of two, before verb, verb=like → A0, A1, ...
• arg=bread, 2nd of two, after verb, verb=like → A0, A1, ...
This is a partial sentence representation.
Semantic-role feedback???
• Theories of language acquisition assume learning to understand sentences is a partially-supervised task.
– use existing knowledge of words & syntax to assign a meaning to a sentence
– the appropriateness of the meaning in context then provides the feedback
[Johanna rivvo den sheep.]
Baby SRL with Animacy-based Feedback
Bootstrapped animacy feedback (based on a list of concrete nouns of known animacy):
1. Animates are agents; inanimates are non-agents
2. Each verb has at most one agent
3. If there are two known animate nouns
– Use current SRL classifier to choose the best agent (and use this decision as feedback to train the SRL)
• Note: Animacy feedback trains a simplified classifier, with only an agent/non-agent role distinction
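The three animacy rules can be sketched as a labeling function that produces A0/notA0 training targets. Assumptions not stated on the slide: arguments of unknown animacy receive no label unless rule 2 applies, rule 3 is generalized to more than two animates, and `agent_score` stands in for the current SRL classifier. All names are hypothetical.

```python
def animacy_feedback(args, animacy, agent_score):
    """Agent / non-agent feedback labels for one verb's arguments.

    args: the verb's candidate arguments, in sentence order.
    animacy: word -> True (animate) or False (inanimate); absent = unknown.
    agent_score: index -> current classifier's agent score (rule 3 only).
    Returns index -> "A0", "notA0", or None (no feedback).
    """
    labels = {i: None for i in range(len(args))}
    # Rule 1: inanimates are non-agents
    for i, word in enumerate(args):
        if animacy.get(word) is False:
            labels[i] = "notA0"
    animates = [i for i, word in enumerate(args) if animacy.get(word) is True]
    if not animates:
        return labels
    # Rule 1 (animates are agents), with rule 3 choosing among several animates
    agent = animates[0] if len(animates) == 1 else max(animates, key=agent_score)
    # Rule 2: each verb has at most one agent
    for i in labels:
        labels[i] = "A0" if i == agent else "notA0"
    return labels


print(animacy_feedback(["I", "bread"], {"I": True}, lambda i: 0.0))
# {0: 'A0', 1: 'notA0'}
scores = [0.2, 0.9]  # hypothetical classifier scores for Adam, Daddy
print(animacy_feedback(["Adam", "Daddy"], {"Adam": True, "Daddy": True}, lambda i: scores[i]))
# {0: 'notA0', 1: 'A0'}
```

The first call reproduces the slide's example: "I" is animate and becomes the agent, so "bread", whose animacy is unknown, is labeled not A0.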
Baby SRL with Internal Animacy Feedback
The pipeline is unchanged (HMM trained on CDS, seed-noun argument identification, predicate identification, feature extraction), but argument classification now makes only an A0/notA0 decision, trained by internal animacy feedback:
• arg=I, 1st of two, before verb, verb=like: I is animate → A0
• arg=bread, 2nd of two, after verb, verb=like: bread is unknown, and the agent is I → not A0
Training the Baby SRL
• 3 corpora of child-directed speech
– Parental utterances annotated using the PropBank annotation scheme
– Speech to Adam, Eve, and Sarah (Brown, 1973)
• Adam 01-23 (2;3-3;2): train on 01-20, 3951 propositions, 8107 arguments
• Eve 01-20 (1;6-2;3): train on 01-18, 4029 propositions, 8499 arguments
• Sarah 01-90 (2;3-4;1): train on 01-83, 8570 propositions, 15599 arguments
Testing the Baby SRL
• Constructed test sentences like those used in experiments with children
– Unknown verbs & two animate nouns force the SRL to rely on syntactic knowledge
Example: "Adam krads Daddy!"
Novel verb: krad; animate nouns: Adam, Mommy, Daddy, Ursula, ...
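Test items of this kind are easy to generate. The sketch below builds one "A krads B" sentence for every ordered pair of distinct nouns, paired with the interpretation English word order supports (first noun = agent). `make_test_items` is a hypothetical helper and the noun list is only the visible part of the slide's list, so this does not reproduce the authors' actual test set.

```python
from itertools import permutations


def make_test_items(verb, nouns):
    """'A krads B' items for every ordered pair of distinct animate nouns.

    Each item pairs a sentence with the interpretation English word order
    supports: the first noun is the agent (A0), the second is not.
    """
    items = []
    for a, b in permutations(nouns, 2):
        items.append((f"{a} {verb}s {b}!", {a: "A0", b: "notA0"}))
    return items


items = make_test_items("krad", ["Adam", "Mommy", "Daddy", "Ursula"])
print(len(items))   # 12
print(items[1][0])  # Adam krads Daddy!
```

Because every noun appears in both positions, animacy cannot distinguish the two arguments, and only word-order knowledge can produce the expected labels.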
Testing the Baby SRL
• Compare systems trained on different syntactic features derived from the partial sentence representation
• Lexical baseline: verb and argument-word features only
• Noun Pattern: lexical features plus features such as '1st of 2 nouns', '2nd of 2 nouns'
• Verb Position: lexical features plus 'before verb', 'after verb'
• Combined: all feature types
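Example features for arg=I in "I like dark bread .":
– Lexical: arg=I, verb=like
– Noun Pattern: arg=I, 1st of two, verb=like
– Verb Position: arg=I, before verb, verb=like
– Combined: arg=I, 1st of two, before verb, verb=like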
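One way to run this comparison is to compute every feature once, tag it with its type, and let each system keep only the types it uses. `FEATURE_SETS`, `select`, and the type tags are hypothetical names for this sketch; the feature values are the ones listed above for arg=I.

```python
# Feature types each system may use (hypothetical encoding of the four systems)
FEATURE_SETS = {
    "Lexical": {"lexical"},
    "Noun Pattern": {"lexical", "noun_pattern"},
    "Verb Position": {"lexical", "verb_position"},
    "Combined": {"lexical", "noun_pattern", "verb_position"},
}


def select(features, system):
    """Keep only the feature values whose type the given system uses."""
    allowed = FEATURE_SETS[system]
    return [value for ftype, value in features if ftype in allowed]


features_for_I = [("lexical", "arg=I"), ("noun_pattern", "1st of two"),
                  ("verb_position", "before verb"), ("lexical", "verb=like")]
for system in FEATURE_SETS:
    print(system, select(features_for_I, system))
```

Since every system sees identical lexical features, any difference in performance on the test sentences can be attributed to the added noun-pattern or verb-position features.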
Question 1: Are partial sentence representations useful, in principle, for learning new syntactic knowledge?
Learning English word order
Start with "gold standard" part-of-speech tagging and "gold standard" semantic role feedback
Are partial sentence representations useful, as a starting point for learning?
• We assume that representations as simple as an ordered set of nouns can yield useful information about verb argument structure and sentence interpretation.
• Is this true, in ordinary samples of language use?
• Despite all the noise this simple representation will add to the data?
Result 1: Partial sentence representations are useful for further syntax learning (test sentences of the form "A krads B")
Question 2: Can partial sentence representations be built via unsupervised clustering ... and the proposed linking rules?
Result 2: Partial sentence representations can be built via unsupervised clustering
[Figure: x-axis, number of known seed nouns]
Question 3: Now for the Baby SRL. Are partial sentence representations built via unsupervised clustering useful for learning new syntactic knowledge, with animacy-based feedback?
Learning English word order
Starting from scratch ...
Result 3: Partial sentence representations based on unsupervised clustering are useful
SRL Summary
• Result 1: Partial sentence representations are useful for further syntax learning (gold arguments & feedback)
• Result 2: Partial sentence representations can be built via unsupervised clustering, and the proposed linking rules:
– Semantic bootstrapping for nouns, 'real' syntactic bootstrapping for verbs
• Result 3: Partial sentence representations based on unsupervised clustering are useful for further syntax learning
– Robust learning based on noisy unsupervised 'parse' of sentences, and noisy animacy-based feedback
– The learner picks up English word order: verbs precede objects
Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with an unlearned bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms.
(2) Early abstraction: Children are biased toward usefully abstract representations of language experience.
(3) Independent encoding of sentence structure: Children gather distributional facts about new verbs from listening experience.
A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children learn which words are the verbs by tracking their syntactic argument-taking behavior in sentences.
"Hey, she pushed her." "Will you push me on the swing?" "John pushed the cat off the sofa." ...
→ Verb = Push[noun1, noun2]: a 2-participant relation
Acknowledgements
Yael Gertner
Jesse Snedeker
Lila Gleitman
NSF
NICHD
Soondo Baek
Kate Messenger
Sylvia Yuan
Rose Scott