Hao Wang, Toben Mintz Department of Psychology University of Southern California


Page 1: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Hao Wang, Toben Mintz

Department of Psychology

University of Southern California

Page 2: Hao Wang, Toben Mintz Department of Psychology University of Southern California

The Problem of Learning Syntactical Categories

Grammar includes manipulations of lexical items based on their syntactical categories.

Learning syntactical categories is fundamental to the acquisition of language.

Page 3: Hao Wang, Toben Mintz Department of Psychology University of Southern California

The Problem of Learning Syntactical Categories

Nativist approach

Children are innately endowed with the possible syntactical categories.

How to map a lexical item to its syntactical category or categories?

Empirical approach

Children have to figure out the syntactical categories in their target language, and assign categories to lexical items.

There is little or no help from syntactical constraints.

Page 4: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Approaches Based on Semantic Categories

Grammatical categories correspond to semantic/conceptual categories (Macnamara, 1972; Bowerman, 1973; Bates & MacWhinney, 1979; Pinker, 1984)

object → noun; action → verb

But what about

action, noise, love

to think, to know

(Maratsos & Chalkley, 1980)

Page 5: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Grammatical Categories from Distributional Analyses

Structural Linguistics

Grammatical categories defined by similarities of word patterning (Bloomfield, 1933; Harris, 1951)

Maratsos & Chalkley (1980): Distributional learning theory

lexical co-occurrence patterns (and morphology and semantics)

the cat is on the mat

cat, mat

Page 6: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Grammatical Categories from Distributional Analyses

Patterns across whole utterances (Cartwright & Brent, 1997)

My cat meowed.

Your dog slept.

Det N X/Y.

Bigram co-occurrence patterns (Mintz, Newport, & Bever, 1995, 2002; Redington, Chater & Finch, 1998)

the cat is on the mat

Page 7: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Frequent Frames (Mintz, 2003)

Frames are defined as “two jointly occurring words with one word intervening”.

“would you put the cans back ?”

“you get the nuts .”

“you take the chair back .”

“you read the story to Mommy .”

Frame: you_X_the
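
As a rough illustration of the frame definition above, the following Python sketch collects the words that occur inside each frame. The function name `extract_frames`, the whitespace tokenization, and the decision to drop punctuation tokens are assumptions made for this example; the slides do not specify them.

```python
from collections import defaultdict

# Assumption: punctuation tokens are not treated as words.
PUNCTUATION = {".", "?", "!", ","}


def extract_frames(utterance):
    """Return (frame, intervening_word) pairs for every three-word window."""
    words = [w for w in utterance.split() if w not in PUNCTUATION]
    return [(f"{a}_X_{b}", x) for a, x, b in zip(words, words[1:], words[2:])]


utterances = [
    "would you put the cans back ?",
    "you get the nuts .",
    "you take the chair back .",
    "you read the story to Mommy .",
]

# Group the intervening words by the frame that surrounds them.
words_by_frame = defaultdict(list)
for utterance in utterances:
    for frame, word in extract_frames(utterance):
        words_by_frame[frame].append(word)

print(words_by_frame["you_X_the"])  # ['put', 'get', 'take', 'read']
```

In this toy input, the single frame you_X_the groups four verbs together, which is the intuition behind using frames as a categorization cue.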

Page 8: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Sensitivity to Frame-like Units

Frames lead to categorization in adults (Mintz, 2002)

Fifteen-month-olds are sensitive to frame-like sequences (Gómez & Maye, 2005)

Page 9: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Distributional Analyses Using Frequent Frames (Mintz, 2003)

Six corpora from CHILDES (MacWhinney, 2000).

Analyzed utterances to children under 2;6.

Accuracy results averaged over all corpora.

[Figure: Mean Token Accuracy (axis from 0.2 to 1) by Categorization Type: Actual Categorization vs. Chance Categorization]

Page 10: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Limitations of the Frequent Frame Analyses

Requires two passes through the corpus

Step 1: identify the frequent frames by tallying the frame frequency.

Step 2: categorize words using those frames.

Tracks the frequency of all frames

E.g., approximately 15,000 frame types in one of the corpora in Mintz (2003).
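
The two passes can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names and the toy corpus are hypothetical, and the cutoff `n_frequent` is left as a parameter because these slides do not state the threshold used.

```python
from collections import Counter, defaultdict


def frames(utterance):
    """Yield (frame, intervening_word) pairs for each three-word window."""
    w = utterance.split()
    return [(f"{a}_X_{b}", x) for a, x, b in zip(w, w[1:], w[2:])]


def frequent_frame_categories(corpus, n_frequent):
    # Pass 1: tally every frame type in the corpus.
    counts = Counter(frame for u in corpus for frame, _ in frames(u))
    frequent = {frame for frame, _ in counts.most_common(n_frequent)}

    # Pass 2: group words by the frequent frame they occur in.
    categories = defaultdict(set)
    for u in corpus:
        for frame, word in frames(u):
            if frame in frequent:
                categories[frame].add(word)

    # The tally holds one counter per frame type, which is the memory cost
    # noted on this slide.
    return dict(categories), len(counts)


toy_corpus = [
    "you put the cans back",
    "you get the nuts",
    "you take the chair back",
    "you have a cookie",
]
categories, n_frame_types = frequent_frame_categories(toy_corpus, n_frequent=1)
```

Even in this four-utterance corpus, the tally tracks seven frame types in order to select one frequent frame.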

Page 11: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Goal of current study

Provide a psychologically plausible model of word categorization

Children possess limited memory and cognitive capacity.

Human memory is imperfect.

Children may not be able to track all the frames they have encountered.

Page 12: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Features of current model

It processes input and updates the categorization frames dynamically.

Each frame is associated with, and ranked by, an activation value.

It has a limited memory buffer for frames.

It stores only the 150 most activated frames.

It implements a forgetting function on the memory.

After a new frame is processed, the activation of all frames in memory decreases by 0.0075.

Page 13: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Child Input Corpora

Six corpora from CHILDES (MacWhinney, 2000).

Analyzed utterances to children under 2;6.

Peter (Bloom, Hood, & Lightbown, 1974; Bloom, Lightbown, & Hood, 1975)

Eve (Brown, 1973)

Nina (Suppes, 1974)

Naomi (Sachs, 1983)

Anne (Theakston, Lieven, Pine, & Rowland, 2001)

Aran (Theakston et al., 2001)

Mean utterances per child: ~17,200 (min: 6,950; max: 20,857)

Page 14: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure

The child-directed utterances from each corpus were processed individually.

Utterances were presented to the model in the order of appearance in the corpus.

Each utterance was segmented into frames.

“you read the story to Mommy”

you read the

read the story

the story to

story to Mommy

Page 15: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure continued…

you read the

read the story

the story to

story to Mommy

Memory

Activation  Frame
1.0000      you_X_the
1.0000      read_X_story
1.0000      the_X_to
1.0000      story_X_Mommy

Page 16: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure continued…

The memory buffer stores only the 150 most activated frames.

It becomes full very quickly, after processing several utterances.

Memory

Activation  Frame
1.0000      you_X_the
1.0000      read_X_story
1.0000      the_X_to
1.0000      story_X_Mommy
1.0000      to_X_it
1.0000      the_X_on
…           …

Page 17: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure continued…

“you put the”

Frame: you_X_the

Look up the you_X_the frame in memory

Increase the activation of the you_X_the frame by 1

Re-rank the memory by activation

Memory

Activation  Frame
1.0000      you_X_the
1.0000      read_X_story
1.0000      the_X_to
1.0000      story_X_Mommy
1.0000      to_X_it
1.0000      the_X_on
…           …

Page 18: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure continued…

“you have a”

Frame: you_X_a

Look up the you_X_a frame in memory

Activation of story_X_Mommy < 1

Remove story_X_Mommy

Add you_X_a to memory, set the activation to 1

Re-rank the memory by activation

Memory

Activation  Frame
1.0000      you_X_the
1.0000      read_X_story
1.0000      the_X_to
1.0000      to_X_it
1.0000      the_X_on
0.8175      story_X_Mommy
…           …

Page 19: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Procedure continued…

A new frame not in memory

The activation of all frames in memory is greater than 1

There is no change to the memory.

Memory

Activation  Frame
1.0000      you_X_the
1.0000      read_X_story
1.0000      the_X_to
1.0000      to_X_it
1.0000      the_X_on
0.8175      story_X_Mommy
…           …
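
The procedure on the preceding slides can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the decay is applied to every stored frame after each processed frame (including the one just processed), activations are not clamped at zero, and ties for the weakest frame are broken arbitrarily. Only the buffer size (150), the decay (0.0075), and the update rules come from the slides.

```python
def frames_in(utterance):
    """Segment an utterance into (frame, intervening_word) pairs."""
    w = utterance.split()
    return [(f"{a}_X_{b}", x) for a, x, b in zip(w, w[1:], w[2:])]


class FrameMemory:
    def __init__(self, capacity=150, decay=0.0075):
        self.capacity = capacity   # buffer size from the slides
        self.decay = decay         # forgetting step from the slides
        self.activation = {}       # frame -> activation value
        self.words = {}            # frame -> set of intervening words

    def process(self, frame, word):
        if frame in self.activation:
            # Known frame: raise its activation by 1.
            self.activation[frame] += 1.0
            self.words[frame].add(word)
        elif len(self.activation) < self.capacity:
            # Free slot: store the new frame with activation 1.
            self._store(frame, word)
        else:
            # Full buffer: replace the weakest frame only if it is below 1;
            # otherwise the memory is left unchanged.
            weakest = min(self.activation, key=self.activation.get)
            if self.activation[weakest] < 1.0:
                del self.activation[weakest]
                del self.words[weakest]
                self._store(frame, word)
        # Forgetting (assumed order): every stored frame loses `decay`.
        for f in self.activation:
            self.activation[f] -= self.decay

    def _store(self, frame, word):
        self.activation[frame] = 1.0
        self.words[frame] = {word}

    def ranked(self):
        """Frames sorted from most to least activated."""
        return sorted(self.activation.items(), key=lambda kv: kv[1], reverse=True)


memory = FrameMemory()
for utterance in ["you read the story to Mommy", "you put the cans back"]:
    for frame, word in frames_in(utterance):
        memory.process(frame, word)

print(memory.ranked()[0][0])  # you_X_the
```

Unlike the two-pass analysis, this version reads the corpus once and never holds more than `capacity` frames at a time.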

Page 20: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Evaluating Model Performance

Hit: two words from the same linguistic category grouped together

False Alarm: two words from different linguistic categories grouped together

$$\text{Accuracy} = \frac{\text{hits}}{\text{hits} + \text{false alarms}}$$

Upper bound of 1

Page 21: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Accuracy Example

V V V ADV V V

Hits: 10

False Alarms: 5

Accuracy: $\frac{10}{10 + 5} = .67$
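
The pairwise scoring in this example can be checked with a few lines of Python. The function name `pairwise_accuracy` is hypothetical, and the sketch scores a single frame-based group only; how scores are combined across groups is not specified on these slides.

```python
from itertools import combinations


def pairwise_accuracy(labels):
    """Score one group of words given their linguistic category labels."""
    hits = false_alarms = 0
    for a, b in combinations(labels, 2):
        if a == b:
            hits += 1          # same category grouped together
        else:
            false_alarms += 1  # different categories grouped together
    total = hits + false_alarms
    if total == 0:
        raise ValueError("need at least two words to form a pair")
    return hits, false_alarms, hits / total


hits, false_alarms, accuracy = pairwise_accuracy(["V", "V", "V", "ADV", "V", "V"])
print(hits, false_alarms, round(accuracy, 2))  # 10 5 0.67
```

Six words form 15 pairs: the five verbs contribute 10 same-category pairs, and the adverb contributes 5 mixed pairs.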

Page 22: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Ten Categories for Accuracy

Noun, pronoun

Verb, Aux., Copula

Adjective

Preposition

Adverb

Determiner

Wh-word

Negation -- “not”

Conjunction

Interjection

Page 23: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Averaged accuracy across 6 corpora

Accuracy

Eve 0.782019

Peter 0.803401

Anne 0.872820

Aran 0.860191

Nina 0.828753

Naomi 0.773230

Average 0.820069

Page 24: Hao Wang, Toben Mintz Department of Psychology University of Southern California

The Development of Accuracy

Accuracy is very high and stable throughout the entire process.

Page 25: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Comparison to Frequent Frames

After processing about half of the corpus, 70% of the frequent frames are among the 45 most activated frames in memory.
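
One way to read this comparison is as a set overlap between the frequent frames and the top of the ranked memory. The sketch below assumes that reading; the function name and the toy frame lists are hypothetical and are not data from the study.

```python
def frequent_frame_coverage(ranked_memory, frequent_frames, k):
    """Proportion of frequent frames found among the k most activated frames.

    ranked_memory: frame names sorted from most to least activated.
    frequent_frames: frames selected by the two-pass frequency analysis.
    """
    target = set(frequent_frames)
    if not target:
        raise ValueError("frequent_frames must not be empty")
    top_k = set(ranked_memory[:k])
    return len(top_k & target) / len(target)


# Toy illustration only.
ranked = ["what_X_you", "you_X_to", "you_X_it", "the_X_on"]
frequent = ["what_X_you", "you_X_it", "you_X_a", "to_X_it"]
coverage = frequent_frame_coverage(ranked, frequent, k=3)
```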

Page 26: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Memory at the Final Step of the Eve Corpus

#   w2 type  w2 token  Activation  Frame
0   9        351       326.25225   what_X_you
1   20       230       205.151     you_X_to
2   70       203       178.16525   you_X_it
3   27       115       90.7135     you_X_a
4   44       115       90.379      you_X_the
5   3        110       85.2665     are_X_doing
6   5        110       85.10525    what_X_that
7   15       108       83.2965     you_X_me
8   38       90        65.2905     to_X_it
9   2        89        65.132      would_X_like
10  11       86        61.1075     why_X_you

Page 27: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Stability of Frames in Memory

The frames in memory change substantially in the early stage, but become stable after 10% of the corpus has been processed.

Page 28: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Summary

After processing the entire corpus, the learning algorithm has identified almost all of the frequent frames by highest activation.

Consequently, high accuracy of word categorization is achieved.

After processing fewer than half of the utterances, the 45 most activated frames included approximately 70% of frequent frames.

Page 29: Hao Wang, Toben Mintz Department of Psychology University of Southern California

Summary

Frames are a robust cue for categorizing words.

With limited and imperfect memory, the learning algorithm can identify most frequent frames after processing a relatively small number of utterances. This yields high accuracy of word categorization.