Models of Grammar Learning CS 182 Lecture April 26, 2007


Page 1: Models of Grammar Learning CS 182 Lecture April 26, 2007

Models of Grammar Learning

CS 182 Lecture

April 26, 2007

Page 2: Models of Grammar Learning CS 182 Lecture April 26, 2007

2

What constitutes learning a language?

What are the sounds (Phonology)
How to make words (Morphology)
What do words mean (Semantics)
How to put words together (Syntax)
Social use of language (Pragmatics)
Rules of conversations (Pragmatics)

Page 3: Models of Grammar Learning CS 182 Lecture April 26, 2007

3

Language Learning Problem

Prior knowledge
  Initial grammar G (set of ECG constructions)
  Ontology (category relations)
  Language comprehension model (analysis/resolution)
Hypothesis space: new ECG grammar G’
Search = processes for proposing new constructions
  Relational Mapping, Merge, Compose

Page 4: Models of Grammar Learning CS 182 Lecture April 26, 2007

4

Language Learning Problem

Performance measure
  Goal: Comprehension should improve with training
  Criterion: need some objective function to guide learning…

Probability of Model given Data (Bayes’ rule):

P(M|X) = P(X|M) P(M) / P(X) ∝ P(X|M) P(M)

Minimum Description Length:

log P(M|X) ∝ log P(X|M) + log P(M)

Page 5: Models of Grammar Learning CS 182 Lecture April 26, 2007

5

Minimum Description Length

Choose grammar G to minimize cost(G|D):
  cost(G|D) = α • size(G) + β • complexity(D|G)
Approximates Bayesian learning; cost(G|D) ≈ posterior probability P(G|D)
Size of grammar: size(G) ≈ prior P(G)
  favor fewer/smaller constructions/roles; isomorphic mappings
Complexity of data given grammar ≈ likelihood P(D|G)
  favor simpler analyses (fewer, more likely constructions), based on derivation length + score of derivation
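In Python this objective is just a weighted sum; a minimal sketch, assuming size(G) and complexity(D|G) are computed as on the following slides (the numeric values here are made up for illustration):

def mdl_cost(size_of_grammar, complexity_of_data, alpha=1.0, beta=1.0):
    # cost(G|D) = alpha * size(G) + beta * complexity(D|G); lower cost
    # stands in for a higher (approximate) posterior P(G|D).
    return alpha * size_of_grammar + beta * complexity_of_data

# A smaller grammar that analyzes the data slightly worse can still win
# if the size savings outweigh the added analysis cost (illustrative values).
current   = mdl_cost(size_of_grammar=120, complexity_of_data=300)
candidate = mdl_cost(size_of_grammar=100, complexity_of_data=310)
print(candidate < current)   # True: keep the candidate grammar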

Page 6: Models of Grammar Learning CS 182 Lecture April 26, 2007

6

Size Of Grammar

Size of the grammar G is the sum of the size of each construction:

size(G) = Σ_{c ∈ G} size(c)

Size of each construction c is:

size(c) = n_c + m_c + Σ_{e ∈ c} length(e)

where n_c = number of constituents in c,
m_c = number of constraints in c,
length(e) = slot chain length of element reference e
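A small Python sketch of these two sums, assuming a toy representation in which each construction lists its constituents and, per constraint, the dotted element references it mentions; the field names and the counting conventions are illustrative, not the actual ECG implementation:

def slot_chain_length(ref):
    # length(e): number of links in a dotted element reference, e.g. "am.role" -> 2
    return ref.count(".") + 1

def construction_size(cxn):
    # size(c) = n_c + m_c + sum over referenced elements e of length(e)
    n_c = len(cxn["constituents"])
    m_c = len(cxn["constraints"])
    lengths = sum(slot_chain_length(e) for refs in cxn["constraints"] for e in refs)
    return n_c + m_c + lengths

def grammar_size(grammar):
    # size(G) = sum over constructions c in G of size(c)
    return sum(construction_size(c) for c in grammar)

example = {"constituents": ["a: A", "b: B"],
           "constraints": [("af", "bf"),           # form: af before bf
                           ("am.role", "bm")]}     # meaning: am.role <-> bm
print(grammar_size([example]))   # constituents (2) + constraints (2) + chains (1+1+2+1) = 9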

Page 7: Models of Grammar Learning CS 182 Lecture April 26, 2007

7

What do we know about language development?

(focusing mainly on first language acquisition of English-speaking, normal population)

Page 8: Models of Grammar Learning CS 182 Lecture April 26, 2007

8

Children are amazing learners

[Developmental timeline, 0 months to 5 years: cooing; reduplicated babbling; first word; two-word combinations; multi-word utterances; questions, complex sentence structures, conversational principles]

Page 9: Models of Grammar Learning CS 182 Lecture April 26, 2007

9

Phonology: Non-native contrasts

Werker and Tees (1984)
Thompson: velar vs. uvular, /k'i/-/q'i/
Hindi: retroflex vs. dental, /ṭa/-/ta/

[Chart: number of infants discriminating the non-native contrasts (yes vs. no) at 6-8 months, 8-10 months, and 10-12 months; discrimination declines over this period]

Page 10: Models of Grammar Learning CS 182 Lecture April 26, 2007

10

Finding words: Statistical learning

Saffran, Aslin and Newport (1996)

/bidaku/, /padoti/, /golabu/
/bidakupadotigolabubidaku…/
2 minutes of this continuous speech stream
By 8 months infants detect the words (vs. non-words and part-words)

pretty baby
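The statistic at work here can be sketched in a few lines of Python: compute syllable-to-syllable transitional probabilities over a made-up stream of the three words. Transitions inside a word stay at 1.0 while transitions across word boundaries are much lower. This only illustrates the statistic, not the authors' model:

import random
from collections import Counter

random.seed(0)
words = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
stream = []
for _ in range(500):                      # continuous stream of randomly ordered words
    stream += random.choice(words)

def transitional_probs(syllables):
    # P(next syllable | current syllable), estimated from bigram counts
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

tp = transitional_probs(stream)
print(tp[("bi", "da")])   # within a word: 1.0
print(tp[("ku", "pa")])   # across a word boundary: roughly 1/3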

Page 11: Models of Grammar Learning CS 182 Lecture April 26, 2007

11

Word order: agent and patient

Hirsh-Pasek and Golinkoff (1996), 1;4-1;7

mostly still in the one-word stage

Where is CM tickling BB?

Page 12: Models of Grammar Learning CS 182 Lecture April 26, 2007

12

Early syntax

agent + action ‘Daddy sit’
action + object ‘drive car’
agent + object ‘Mommy sock’
action + location ‘sit chair’
entity + location ‘toy floor’
possessor + possessed ‘my teddy’
entity + attribute ‘crayon big’
demonstrative + entity ‘this telephone’

Page 13: Models of Grammar Learning CS 182 Lecture April 26, 2007

13

From Single Words To Complex Utterances (Sachs corpus, CHILDES)

1;11.3
FATHER: Nomi are you climbing up the books?
NAOMI: up.
NAOMI: climbing.
NAOMI: books.

2;0.18
MOTHER: what are you doing?
NAOMI: I climbing up.
MOTHER: you’re climbing up?

4;9.3
FATHER: what’s the boy doing to the dog?
NAOMI: squeezing his neck.
NAOMI: and the dog climbed up the tree.
NAOMI: now they’re both safe.
NAOMI: but he can climb trees.

Page 14: Models of Grammar Learning CS 182 Lecture April 26, 2007

14

How Can Children Be So Good At Learning Language?

Gold’s Theorem:

No superfinite class of languages is identifiable in the limit from positive data only

Principles & Parameters

Babies are born as blank slates but acquire language quickly (with noisy input and little correction)
→ Language must be innate: Universal Grammar + parameter setting

But babies aren’t born as blank slates!

And they do not learn language in a vacuum!

Page 15: Models of Grammar Learning CS 182 Lecture April 26, 2007

15

Modifications of Gold’s Result

(Weakly) Ordered Examples, implicit negatives
Loosened Identification Conditions
Complexity Measures, Best Fit

No Theorems will resolve these issues

Page 16: Models of Grammar Learning CS 182 Lecture April 26, 2007

16

Modeling the acquisition of grammar:

Theoretical assumptions

Page 17: Models of Grammar Learning CS 182 Lecture April 26, 2007

17

Language Acquisition

Opulence of the substrate
  Prelinguistic children already have rich sensorimotor representations and sophisticated social knowledge
    intention inference, reference resolution
    language-specific event conceptualizations
  (Bloom 2000, Tomasello 1995, Bowerman & Choi, Slobin, et al.)
Children are sensitive to statistical information
  Phonological transitional probabilities
  Even dependencies between non-adjacent items
  (Saffran et al. 1996, Gomez 2002)

Page 18: Models of Grammar Learning CS 182 Lecture April 26, 2007

18

Language Acquisition

Basic Scenes
  Simple clause constructions are associated directly with scenes basic to human experience
  (Goldberg 1995, Slobin 1985)
Verb Island Hypothesis
  Children learn their earliest constructions (arguments, syntactic marking) on a verb-specific basis
  (Tomasello 1992)
  get ball, get bottle, get OBJECT…
  throw frisbee, throw ball, throw OBJECT…

(this should be reminiscent of your model merging assignment)

Page 19: Models of Grammar Learning CS 182 Lecture April 26, 2007

19

Comprehension is partial.

(not just for dogs)

Page 20: Models of Grammar Learning CS 182 Lecture April 26, 2007

20

What children pick up from what they hear

Children use rich situational context / cues to fill in the gaps
They also have at their disposal embodied knowledge and statistical correlations (i.e. experience)

what did you throw it into?
they’re throwing this in here.
they’re throwing a ball.
don’t throw it Nomi.
well you really shouldn’t throw things Nomi you know.
remember how we told you you shouldn’t throw things.

Page 21: Models of Grammar Learning CS 182 Lecture April 26, 2007

21

Language Learning Hypothesis

Children learn constructions that bridge the gap between

what they know from language

and

what they know from the rest of cognition

Page 22: Models of Grammar Learning CS 182 Lecture April 26, 2007

22

Modeling the acquisition of (early) grammar:

Comprehension-driven, usage-based

Page 23: Models of Grammar Learning CS 182 Lecture April 26, 2007

Natural Language Processing at Berkeley

Dan Klein

EECS Department

UC Berkeley

Page 24: Models of Grammar Learning CS 182 Lecture April 26, 2007

24

NLP: Motivation

It’d be great if machines could
  Read text and understand it
  Translate languages accurately
  Help us manage, summarize, and aggregate information
  Use speech as a UI
  Talk to us / listen to us
But they can’t
  Language is complex
  Language is ambiguous
  Language is highly structured

Page 25: Models of Grammar Learning CS 182 Lecture April 26, 2007

25

Machine Translation

Syntactic MT
  Learn grammar mappings between languages
  Fully data-driven

Page 26: Models of Grammar Learning CS 182 Lecture April 26, 2007

26

Information Extraction

Unsupervised Coreference Resolution
  Take in lots of text
  Learn what the entities are and how they corefer
  Fully unsupervised, but gets supervised performance!

General research goal: unsupervised learning of meaning

Page 27: Models of Grammar Learning CS 182 Lecture April 26, 2007

27

Syntactic Learning

Grammar Induction
  Raw text in
  Learned grammars out
  Big result: this can be done!
Grammar Refinement
  Coarse grammars in
  Detailed grammars out
  Gives top parsing systems

Page 28: Models of Grammar Learning CS 182 Lecture April 26, 2007

28

Syntactic Inference

Influential members of the House Ways and Means Committee introduced legislation that would restrict how the new S&L bailout agency can raise capital, creating another potential obstacle to the government's sale of sick thrifts.

Natural language is very ambiguous
Grammars are huge
Billions of parses to consider
Milliseconds to do it

Page 29: Models of Grammar Learning CS 182 Lecture April 26, 2007

30

Idea: Learn PCFGs with EM

Classic experiments on learning PCFGs with Expectation-Maximization [Lari and Young, 1990]:
  Full binary grammar over n symbols
  Parse uniformly/randomly at first
  Re-estimate rule expectations off of parses
  Repeat
Their conclusion: it doesn’t really work.

[Diagram: a binary rule Xi → Xj Xk over the symbol set { X1 , X2 … Xn }]
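A brute-force Python sketch of this recipe (not Lari and Young's implementation): a full binary grammar over n symbols with uniform initial rule probabilities, re-estimated from expected rule counts. Exhaustive enumeration of parse trees stands in for the Inside-Outside algorithm, and the two-sentence corpus is made up, so this only illustrates the loop structure:

import itertools
from collections import defaultdict

n = 2                                             # nonterminal symbols X0 .. X(n-1)
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
vocab = sorted({w for sent in corpus for w in sent})

# One distribution per parent symbol over all binary rules Xi -> Xj Xk
# and all lexical rules Xi -> w, initialized uniformly.
rules = {p: [(p, (j, k)) for j in range(n) for k in range(n)]
            + [(p, (w,)) for w in vocab] for p in range(n)}
prob = {r: 1.0 / len(rs) for rs in rules.values() for r in rs}

def parses(words, parent):
    # Yield (probability, rules used) for every parse of words rooted in parent.
    if len(words) == 1:
        r = (parent, (words[0],))
        yield prob.get(r, 0.0), [r]
        return
    for split in range(1, len(words)):
        for j, k in itertools.product(range(n), repeat=2):
            r = (parent, (j, k))
            for p_l, used_l in parses(words[:split], j):
                for p_r, used_r in parses(words[split:], k):
                    yield prob.get(r, 0.0) * p_l * p_r, [r] + used_l + used_r

for iteration in range(3):                        # the EM loop
    counts = defaultdict(float)
    for sent in corpus:
        trees = list(parses(sent, 0))             # take X0 as the start symbol
        total = sum(p for p, _ in trees)
        for p, used in trees:                     # E-step: expected rule counts
            for r in used:
                counts[r] += p / total
    totals = defaultdict(float)                   # M-step: renormalize per parent
    for (parent, rhs), c in counts.items():
        totals[parent] += c
    prob = {r: c / totals[r[0]] for r, c in counts.items()}
    print("iteration", iteration, "->", len(counts), "rules with nonzero expected count")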

Page 30: Models of Grammar Learning CS 182 Lecture April 26, 2007

31

Re-estimation of PCFGs

Basic quantity needed for re-estimation with EM:

P(X spans (i, j) | S) = [ Σ_{T : X spans (i, j), yield(T) = S} P(T) ] / [ Σ_{T : yield(T) = S} P(T) ]

This can be calculated in cubic time with the Inside-Outside algorithm.

Consider an initial grammar where all productions have equal weight:

P(X_a → X_b X_c) = 1 / n²

Then all trees have equal probability initially. Therefore, after one round of EM, the posterior over trees will (in the absence of random perturbation) be approximately uniform over all trees, and symmetric over symbols.
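A tiny numeric illustration of this point (the numbers are arbitrary): when every production of a given parent carries the same weight, any labeled binary parse of an L-word sentence uses exactly L-1 binary rules and L lexical rules, so every tree receives exactly the same probability and the posterior over trees is uniform.

n, V = 8, 1000                 # illustrative symbol count and vocabulary size
p_rule = 1.0 / (n * n + V)     # uniform weight for every production of a given parent
L = 5                          # sentence length
tree_prob = p_rule ** (L - 1) * p_rule ** L   # L-1 binary rules, L lexical rules
print(tree_prob)               # identical for every labeled binary tree over the sentence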

Page 31: Models of Grammar Learning CS 182 Lecture April 26, 2007

32

Problem: “Uniform” Posteriors

[Figure: the posterior compared under a “Tree Uniform” and a “Split Uniform” distribution over bracketings]

Page 32: Models of Grammar Learning CS 182 Lecture April 26, 2007

33

Problem: Model Symmetries

How does this relate to trees?

[Diagram: the sequence NOUN VERB ADJ NOUN with each position labeled X1? or X2?; under the symmetric initial grammar the symbols are interchangeable, so EM has no basis for assigning NOUN, VERB, or ADJ to one symbol rather than another]

Page 33: Models of Grammar Learning CS 182 Lecture April 26, 2007

34

Overview: NLP at UCB

Lots of research and resources:
  Dan Klein: Statistical NLP / ML
  Marti Hearst: Stat NLP / HCI
  Jerry Feldman: Language and Mind
  Michael Jordan: Statistical Methods / ML
  Tom Griffiths: Statistical Learning / Psychology
  ICSI Speech and AI groups (Morgan, Stolcke, Shriberg, Narayanan…)
  Great linguistics and stats departments!

No better place to solve the hard NLP problems!

Page 34: Models of Grammar Learning CS 182 Lecture April 26, 2007

35

Other Approaches

Evaluation: fraction of nodes in gold trees correctly posited in proposed trees (unlabeled recall)

Some recent work in learning constituency (unlabeled recall shown for each):
  [Adriaans, 99] Language grammars aren’t general PCFGs: 16.8
  [Clark, 01] Mutual-information filters detect constituents, then an MDL-guided search assembles them: 34.6
  [van Zaanen, 00] Finds low edit-distance sentence pairs and extracts their differences: 35.6

Page 35: Models of Grammar Learning CS 182 Lecture April 26, 2007

36

Page 36: Models of Grammar Learning CS 182 Lecture April 26, 2007

37

Embodied Construction Grammar (Bergen and Chang 2005)

construction THROWER-THROW-OBJECT
  constructional
    constituents
      t1 : REF-EXPRESSION
      t2 : THROW
      t3 : OBJECT-REF
  form
    t1f before t2f
    t2f before t3f
  meaning
    t2m.thrower ↔ t1m
    t2m.throwee ↔ t3m

(The meaning constraints are role-filler bindings.)

Page 37: Models of Grammar Learning CS 182 Lecture April 26, 2007

38

Analyzing “You Throw The Ball”

Lexical constructions (FORM (sound) ↔ MEANING (stuff)):
  “you” ↔ schema Addressee, subcase of Human
  “throw” ↔ schema Throw, roles: thrower, throwee
  “ball” ↔ schema Ball, subcase of Object
  “block” ↔ schema Block, subcase of Object
  “the”

Thrower-Throw-Object:
  form: t1 before t2, t2 before t3
  meaning: t2.thrower ↔ t1, t2.throwee ↔ t3

[Diagram: the analysis binds the Addressee schema as the thrower and the Ball schema as the throwee of the Throw schema]

Page 38: Models of Grammar Learning CS 182 Lecture April 26, 2007

39

Learning-Analysis Cycle (Chang, 2004)

1. Learner passes input (Utterance + Situation) and current grammar (Constructions) to Analyzer.
2. Analyzer produces a Semantic Specification (SemSpec) and a Constructional Analysis.
3. Learner updates grammar:
   a. Hypothesize new map.
   b. Reorganize grammar (merge or compose).
   c. Reinforce (based on usage).
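A structural sketch of this cycle in Python; the analyzer and the grammar-updating operators are supplied by the caller, and the names are placeholders rather than the actual implementation:

from collections import Counter

def learning_cycle(grammar, corpus, analyze, hypothesize, merge, compose):
    # grammar: list of constructions; corpus: iterable of (utterance, situation) pairs
    usage = Counter()
    for utterance, situation in corpus:
        # 1-2. Analyze the input with the current grammar.
        analysis, semspec = analyze(grammar, utterance, situation)
        # 3a. Hypothesize new constructions bridging what the analysis
        #     missed and what the situation supplies (relational mapping).
        grammar = grammar + hypothesize(analysis, semspec, situation)
        # 3b. Reorganize the grammar (merge similar, compose co-occurring).
        grammar = compose(merge(grammar))
        # 3c. Reinforce constructions according to usage.
        usage.update(analysis["constructions_used"])
    return grammar, usage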

Page 39: Models of Grammar Learning CS 182 Lecture April 26, 2007

40

Hypothesizing a new construction

through

relational mapping

Page 40: Models of Grammar Learning CS 182 Lecture April 26, 2007

41

Initial Single-Word Stage

Lexical constructions (FORM (sound) ↔ MEANING (stuff)):
  “you” ↔ schema Addressee, subcase of Human
  “throw” ↔ schema Throw, roles: thrower, throwee
  “ball” ↔ schema Ball, subcase of Object
  “block” ↔ schema Block, subcase of Object

Page 41: Models of Grammar Learning CS 182 Lecture April 26, 2007

42

New Data: “You Throw The Ball”

FORM ↔ MEANING (lexical constructions):
  “you” ↔ schema Addressee, subcase of Human
  “throw” ↔ schema Throw, roles: thrower, throwee
  “ball” ↔ schema Ball, subcase of Object
  “block” ↔ schema Block, subcase of Object
  “the”

SITUATION: Self, Addressee, Throw (thrower, throwee), Ball

[Diagram: the form relation “throw” before “ball” is paired with the role-filler relation Throw.throwee = Ball in the situation, yielding the candidate construction throw-ball]

Page 42: Models of Grammar Learning CS 182 Lecture April 26, 2007

43

New Construction Hypothesized

construction THROW-BALL
  constructional
    constituents
      t : THROW
      b : BALL
  form
    tf before bf
  meaning
    tm.throwee ↔ bm
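A toy Python sketch of the relational-mapping step: given two co-occurring constituents whose forms are adjacent, look for a role of the first one's meaning that is filled by the second one's meaning (as resolved against the situation), and propose a construction pairing the form relation with that role-filler binding. The dictionaries below are illustrative, not the ECG data structures:

def hypothesize_by_relational_mapping(first, second):
    # first, second: constituents already recognized in the utterance, in form order
    for role, filler in first["resolved_roles"].items():
        if filler == second["meaning"]:                     # e.g. Throw.throwee = Ball
            return {"name": (first["word"] + "-" + second["word"]).upper(),
                    "constituents": [first["word"], second["word"]],
                    "form": [(first["word"], "before", second["word"])],
                    "meaning": [(first["word"] + "m." + role, "<->", second["word"] + "m")]}
    return None                                             # no meaning relation found

throw = {"word": "throw", "meaning": "Throw",
         "resolved_roles": {"thrower": "Addressee", "throwee": "Ball"}}
ball  = {"word": "ball", "meaning": "Ball", "resolved_roles": {}}

print(hypothesize_by_relational_mapping(throw, ball))
# -> THROW-BALL: form "throw before ball", meaning "throwm.throwee <-> ballm"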

Page 43: Models of Grammar Learning CS 182 Lecture April 26, 2007

44

Three kinds of meaning relations

1. When B.m fills a role of A.m
   throw ball: throw.throwee ↔ ball
2. When A.m and B.m are both filled by X
   put ball down: put.mover ↔ ball, down.tr ↔ ball
3. When A.m and B.m both fill roles of X
   Nomi ball: possession.possessor ↔ Nomi, possession.possessed ↔ ball

Page 44: Models of Grammar Learning CS 182 Lecture April 26, 2007

45

Reorganizing the current grammar

through

merge and compose

Page 45: Models of Grammar Learning CS 182 Lecture April 26, 2007

46

Merging Similar Constructions

throw the block:
  throw before block
  Throw.throwee = Block

throw-ing the ball:
  throw before ball
  Throw.throwee = Ball
  throw before -s/ing
  Throw.aspect = ongoing

merge → THROW-OBJECT:
  throw before Objectf
  THROW.throwee = Objectm

Page 46: Models of Grammar Learning CS 182 Lecture April 26, 2007

47

Resulting Construction

construction THROW-OBJECT
  constructional
    constituents
      t : THROW
      o : OBJECT
  form
    tf before of
  meaning
    tm.throwee ↔ om
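A toy Python sketch of the merge step under an assumed mini-ontology: two constructions whose constituent lists differ in one position are collapsed by lifting that constituent to the nearest common supertype (Ball and Block both being subcases of Object). The representation is illustrative only:

ontology = {"Ball": "Object", "Block": "Object", "Frisbee": "Object", "Object": None}

def supertype_chain(t):
    # Walk up the ontology from t to the root, e.g. "Ball" -> ["Ball", "Object"]
    chain = []
    while t is not None:
        chain.append(t)
        t = ontology.get(t)
    return chain

def common_supertype(a, b):
    # Nearest type that both a and b are subcases of
    chain_b = set(supertype_chain(b))
    return next(t for t in supertype_chain(a) if t in chain_b)

def merge(cxn_a, cxn_b):
    # Merge two constructions whose constituent lists differ in one position
    merged = []
    for t_a, t_b in zip(cxn_a, cxn_b):
        merged.append(t_a if t_a == t_b else common_supertype(t_a, t_b))
    return merged

print(merge(["THROW", "Ball"], ["THROW", "Block"]))   # ['THROW', 'Object']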

Page 47: Models of Grammar Learning CS 182 Lecture April 26, 2007

48

Composing Co-occurring Constructions

throw the ball:
  throw before ball
  Throw.throwee = Ball

ball off:
  ball before off
  Motion m; m.mover = Ball; m.path = Off

compose → THROW-BALL-OFF:
  throw before ball; ball before off
  THROW.throwee = Ball
  Motion m; m.mover = Ball; m.path = Off

Page 48: Models of Grammar Learning CS 182 Lecture April 26, 2007

49

Resulting Construction

construction THROW-BALL-OFF
  constructional
    constituents
      t : THROW
      b : BALL
      o : OFF
  form
    tf before bf
    bf before of
  meaning
    evokes MOTION as m
    tm.throwee ↔ bm
    m.mover ↔ bm
    m.path ↔ om
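A toy Python sketch of the compose step: two constructions that co-occur and share a constituent are chained into one construction whose constituents, form constraints, and meaning constraints are pooled (the evokes statement is folded into the second construction's meaning list here; the representation is illustrative only):

def compose(cxn_a, cxn_b):
    # Unify the shared constituent(s), then pool constituents and constraints
    shared = [c for c in cxn_a["constituents"] if c in cxn_b["constituents"]]
    return {"name": cxn_a["name"] + "-" + cxn_b["name"].split("-", 1)[-1],
            "constituents": cxn_a["constituents"]
                            + [c for c in cxn_b["constituents"] if c not in shared],
            "form": cxn_a["form"] + cxn_b["form"],
            "meaning": cxn_a["meaning"] + cxn_b["meaning"]}

throw_ball = {"name": "THROW-BALL",
              "constituents": ["t: THROW", "b: BALL"],
              "form": ["tf before bf"],
              "meaning": ["tm.throwee <-> bm"]}
ball_off   = {"name": "BALL-OFF",
              "constituents": ["b: BALL", "o: OFF"],
              "form": ["bf before of"],
              "meaning": ["evokes MOTION as m", "m.mover <-> bm", "m.path <-> om"]}

print(compose(throw_ball, ball_off))
# -> THROW-BALL-OFF, with both form constraints and all four meaning statements,
#    matching the resulting construction shown above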

Page 49: Models of Grammar Learning CS 182 Lecture April 26, 2007

50

Precisely defining the learning algorithm

Page 50: Models of Grammar Learning CS 182 Lecture April 26, 2007

51

Example: The Throw-Ball Cxn

construction THROW-BALL
  constructional
    constituents
      t : THROW
      b : BALL
  form
    tf before bf
  meaning
    tm.throwee ↔ bm

size(c) = n_c + m_c + Σ_{e ∈ c} length(e)

size(THROW-BALL) = 2 + 2 + (2 + 3) = 9

Page 51: Models of Grammar Learning CS 182 Lecture April 26, 2007

52

Language Learning Problem

Performance measure
  Goal: Comprehension should improve with training
  Criterion: need some objective function to guide learning…

Probability of Model given Data (Bayes’ rule):

P(M|X) = P(X|M) P(M) / P(X) ∝ P(X|M) P(M)

Minimum Description Length:

log P(M|X) ∝ log P(X|M) + log P(M)

Page 52: Models of Grammar Learning CS 182 Lecture April 26, 2007

53

Minimum Description Length

Choose grammar G to minimize cost(G|D):
  cost(G|D) = α • size(G) + β • complexity(D|G)
Approximates Bayesian learning; cost(G|D) ≈ posterior probability P(G|D)
Size of grammar: size(G) ≈ prior P(G)
  favor fewer/smaller constructions/roles; isomorphic mappings
Complexity of data given grammar ≈ likelihood P(D|G)
  favor simpler analyses (fewer, more likely constructions), based on derivation length + score of derivation

Page 53: Models of Grammar Learning CS 182 Lecture April 26, 2007

54

Size Of Grammar

Size of the grammar G is the sum of the size of each construction:

size(G) = Σ_{c ∈ G} size(c)

Size of each construction c is:

size(c) = n_c + m_c + Σ_{e ∈ c} length(e)

where n_c = number of constituents in c,
m_c = number of constraints in c,
length(e) = slot chain length of element reference e

Page 54: Models of Grammar Learning CS 182 Lecture April 26, 2007

55

Complexity of Data Given Grammar

Complexity of the data D given grammar G is the sum of the analysis scores of the input tokens d:

complexity(D|G) = Σ_{d ∈ D} score(d)

Analysis score of each input token d is:

score(d) = Σ_{c ∈ d} weight_c + Σ_r |type_r| + height_d + semfit_d

where c is a construction used in the analysis of d,
weight_c ≈ relative frequency of c,
|type_r| = number of ontology items of type r used,
height_d = height of the derivation graph,
semfit_d = semantic fit provided by the analyzer
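A small Python sketch of these two sums, taking the per-token quantities as already extracted from the analyzer's output; the field names and numbers are illustrative:

def token_score(construction_weights, ontology_type_counts, height, semfit):
    # score(d) = sum of weight_c over constructions used
    #          + sum of |type_r| over ontology types referenced
    #          + height_d + semfit_d
    return sum(construction_weights) + sum(ontology_type_counts) + height + semfit

def data_complexity(analyses):
    # complexity(D|G) = sum over analyzed input tokens d of score(d)
    return sum(token_score(**a) for a in analyses)

# One hypothetical analyzed token: two constructions used, referencing three
# ontology items, with a derivation of height 3 and a semantic-fit score of 1.5.
example = {"construction_weights": [0.6, 0.3],
           "ontology_type_counts": [2, 1],
           "height": 3,
           "semfit": 1.5}
print(data_complexity([example]))   # 0.9 + 3 + 3 + 1.5 = 8.4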

Page 55: Models of Grammar Learning CS 182 Lecture April 26, 2007

56

Preliminary Results

Page 56: Models of Grammar Learning CS 182 Lecture April 26, 2007

57

Experiment: Learning Verb Islands

Subset of the CHILDES database of parent-child interactions (MacWhinney 1991; Slobin et al.)
  coded by developmental psychologists for
    form: particles, deictics, pronouns, locative phrases, etc.
    meaning: temporality, person, pragmatic function, type of motion (self-movement vs. caused movement; animate being vs. inanimate object, etc.)
  crosslinguistic (English, French, Italian, Spanish)
English motion utterances: 829 parent, 690 child
English all utterances: 3160 adult, 5408 child
Age span is 1;2 to 2;6

Page 57: Models of Grammar Learning CS 182 Lecture April 26, 2007

58

Learning Throw-Constructions

1. Don’t throw the bear. throw-bear

2. you throw it you-throw

3. throwing the thing. throw-thing

4. Don’t throw them on the ground. throw-them

5. throwing the frisbee. throw-frisbee

MERGE throw-OBJ

6. Do you throw the frisbee? COMPOSE you-throw-frisbee

7. She’s throwing the frisbee. COMPOSE she-throw-frisbee

Page 58: Models of Grammar Learning CS 182 Lecture April 26, 2007

59

Learning Results

Page 59: Models of Grammar Learning CS 182 Lecture April 26, 2007

60

Summary

Cognitively plausible situated learning processes

What do kids start with?
  perceptual, motor, social, world knowledge
  meanings of single words
What kind of input drives acquisition?
  Social-pragmatic knowledge
  Statistical properties of linguistic input
What is the learning loop?
  Use existing linguistic knowledge to analyze input
  Use social-pragmatic knowledge to understand the situation
  Hypothesize new constructions to bridge the gap
