Jessica S. Horst (jessica-horst@uiowa) Bob McMurray Larissa K. Samuelson Dept. of Psychology

Connectionist Time and Dynamic Systems Time in One Architecture?

Modeling Word Learning at Two Timescales

Jessica S. Horst, Bob McMurray, Larissa K. Samuelson

Dept. of Psychology, University of Iowa

Two Time Scales in Neural Networks

Connectionist and dynamical systems accounts:

• stress change over time

• complement each other in timescale

Dynamic Systems: online processes

Connectionist Networks: long-term learning

Many domains of development require both timescales:

Example: language development requires

• sensitivity to the brief and sequential nature of the input

• slower developmental processes.

Two Time Scales in Language Acquisition

Word learning is often attributed to fast mapping

- quick link between a novel name and a novel object (e.g., Carey, 1978).

But recent empirical data suggest that fast mapping and word learning may represent two distinct time scales (Horst & Samuelson, April, 2005).

- Fast Mapping: quick process emerging in the moment.

- Word Learning: gradual process over the course of development

We capture both timescales in a recurrent network….

• Activation feeds from input layers to decision layers.

• Decision units compete via inhibition.

• Activation feeds back to input layers.

• Cycle continues until system settles.
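
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `settle`, the `feedback`, `max_cycles` and `tol` parameters, and the toy weights are all hypothetical, and competition is approximated by squaring and renormalizing rather than by explicit inhibitory connections.

```python
import numpy as np

def settle(aud_in, vis_in, w_aud, w_vis, feedback=0.5, max_cycles=100, tol=1e-6):
    """Cycle activation between input and decision layers until it stops changing.

    w_aud and w_vis have shape (n_inputs, n_decision). All names and default
    values here are illustrative assumptions.
    """
    aud_in = np.asarray(aud_in, dtype=float)
    vis_in = np.asarray(vis_in, dtype=float)
    aud, vis = aud_in.copy(), vis_in.copy()
    dec = np.zeros(w_aud.shape[1])
    for cycle in range(1, max_cycles + 1):
        # 1. Activation feeds from both input layers to the decision layer.
        net = aud @ w_aud + vis @ w_vis
        # 2. Competition: squaring favors strong units and dividing by the
        #    total suppresses weak ones (a stand-in for lateral inhibition).
        squared = net ** 2
        total = squared.sum()
        new_dec = squared / total if total > 0 else squared
        # 3. Activation feeds back and boosts the inputs that support the
        #    currently active decision units; absent inputs stay at zero.
        aud = aud_in * (1.0 + feedback * (w_aud @ new_dec))
        vis = vis_in * (1.0 + feedback * (w_vis @ new_dec))
        # 4. Stop once the decision layer has settled.
        change = np.max(np.abs(new_dec - dec))
        dec = new_dec
        if change < tol:
            break
    return dec, cycle

# Toy weights: input unit i is strongly linked to decision unit i + 1.
w = np.full((3, 4), 0.1)
w[0, 1] = w[1, 2] = w[2, 3] = 0.9

# Name 0 is heard while objects 0 and 1 are in view.
dec, cycles = settle([1, 0, 0], [1, 1, 0], w_aud=w, w_vis=w)
print(dec.round(3), cycles)
```

In this toy run the decision unit supported by both the name and its object ends up most active, while the unit supported only by the visual competitor stays weaker.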

[Figure: network in its Initial State (Before Learning): Auditory Inputs and Visual Inputs connected to a Decision Units (Hidden) Layer]

The Architecture

(McMurray & Spivey, 2000)

• Unsupervised Hebbian learning occurs on every cycle.

• Online decision dynamics reflect auditory and visual competitors.

[Figure: Activation (0 to 1) plotted against Cycles (0 to 18)]

The Model

[Figure panels: Intermediate State (During Learning); End State (Post Learning)]

• 15 Auditory & 15 Visual units

• 90 Decision units

• Names presented singly with a variable number of objects

• Name-Decision & Object-Decision associations strengthened via learning

• After 4000 training trials network forms localist representations

• Learns name-object links and to ignore visual competitors

[Figure: two weight-matrix plots of Auditory Input (ticks 1 to 10) against Decision Units (ticks 10 to 90); a further axis labels decision units 9, 16, 26, 30, 32, 39, 41, 49, 65 and 67; color scale: Connection Strength (0.05 to 0.2)]

Fast: Moment by Moment

• Online information integration and constraint satisfaction (e.g., McClelland & Elman, 1986; Dell, 1986)

• Reaches a stable pattern of activation based on auditory and visual inputs and stored knowledge (weights)

• Model makes correct name-object links based on the latest input

Slow: Over the Long-Term

• Unsupervised Hebbian Learning

• Associates words with visual targets

• Learns to ignore visual competitors

Two Time Scales

The two time scales are not independent

Long-term learning depends critically on the dynamics of the fast time scales

• Competition between decision units ensures pseudo-localist representations—critical for Hebbian learning (e.g. Rumelhart & Zipser, 1986)

• Learning occurs on each cycle

- Influences processing cycle-by-cycle & trial-by-trial

• Accumulated learning across trials leads to learning on long-term time scale (i.e., word learning)
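
The dependence of slow learning on fast dynamics can be illustrated with a small Hebbian update. This is a sketch under assumptions, not the model's actual learning rule: `hebbian_step`, the learning `rate`, and the `(1 - weights)` soft bound are hypothetical choices made for illustration.

```python
import numpy as np

def hebbian_step(weights, pre, post, rate=0.01):
    """Return the weights after one unsupervised Hebbian update.

    Each weight grows in proportion to the product of its input (pre) and
    decision (post) activations. The (1 - weights) factor is an assumed
    soft bound that keeps weights below 1.
    """
    return weights + rate * np.outer(pre, post) * (1.0 - weights)

weights = np.full((2, 3), 0.1)          # 2 input units, 3 decision units
name = np.array([1.0, 0.0])             # only input unit 0 is active
localist = np.array([0.0, 1.0, 0.0])    # competition has left a single winner

for trial in range(5):                  # learning accumulates across trials...
    for cycle in range(10):             # ...and across cycles within a trial
        weights = hebbian_step(weights, name, localist)

print(weights.round(3))
```

Because the decision layer is localist, only the link between the active name and the winning decision unit is strengthened. A distributed decision pattern would spread the same update over many links, which is why competition matters for this kind of learning.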

Dependent Time Scales

Empirical Results

[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name and Novel Name trials, with Chance marked]

• 24-month-old children

• Saw 2 familiar & 1 novel objects

• Asked to get familiar and novel objects (e.g., “get the cow!” or “get the yok!”)

Fast Time Scale

Cow (familiar)

Block (familiar)

Yok (novel)

• Children were excellent at fast mapping (finding the referent of novel and familiar words in the moment).

[Figure markers: *** on each of the two bars]

Slow Time Scale

After a 5-minute delay, children were asked to pick the referent of a newly fast-mapped name (e.g., “get the yok!”).

Yok (target)

Fode (named foil)

Unnamed foil (prev. seen)

• Children unable to retain mappings after a 5-minute delay

[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name, Novel Name and Retention trials, with Chance marked and two *** significance markers]

• Initial findings replicated with simpler tasks:

– effect of number of names or trials?

• Children’s difficulty in retaining newly fast-mapped names is not related to the number of names or trials

Replication

Replication #1 (N = 12)

• 1 Novel Name

• 8 Familiar Names

• 7 Preference Trials

• Fast Mapping: 9/12 **

• Retention: 4/9 n.s.

Replication #2 (N = 12)

• 1 Novel Name

• 2 Familiar Names

• Fast Mapping: 7/12 *

• Retention: 4/7 n.s.

* Binomial, p < .05, ** Binomial, p < .01
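
For readers unfamiliar with the binomial test behind the counts above, the tail probability can be computed directly. This is a sketch under assumptions: the poster does not state the chance level or whether the test was one- or two-tailed, so the chance value of 1/3 (three objects per trial) and the one-tailed form used here are guesses, and the resulting values need not match the reported significance markers.

```python
from math import comb

def binomial_p_at_least(successes, n, chance):
    """Probability of at least `successes` correct choices out of n by chance."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(successes, n + 1)
    )

CHANCE = 1 / 3  # assumption: three objects per trial, so a guess is right 1 time in 3

p_fast_mapping = binomial_p_at_least(9, 12, CHANCE)  # 9 of 12 children
p_retention = binomial_p_at_least(4, 9, CHANCE)      # 4 of 9 children
print(f"fast mapping p = {p_fast_mapping:.4f}, retention p = {p_retention:.4f}")
```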

Simulations

• 20 networks initialized with random weights

• 15 word lexicon (names & objects):

– 5 familiar words

– 5 novel words

– 5 held out

• Trained on 5 familiar items for 5000 epochs

• Items presented in random order

• Run in the Fast Mapping Experiment:

– 10 fast mapping trials (5 familiar, 5 novel)

– 5 retention trials

• Learning was not turned off during experiment.
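
The design above can be laid out as a small trial schedule. This is a sketch only: `build_experiment` and its dictionary keys are hypothetical names, and two details are assumptions not stated on the poster, namely that the 5 retention trials test the 5 novel words and that fast-mapping trials are shuffled.

```python
import random

def build_experiment(n_words=15, n_familiar=5, n_novel=5, seed=0):
    """Split the lexicon and lay out the test trials for one simulated network."""
    rng = random.Random(seed)
    words = list(range(n_words))
    rng.shuffle(words)

    familiar = words[:n_familiar]                    # trained before the experiment
    novel = words[n_familiar:n_familiar + n_novel]   # first seen in the experiment
    held_out = words[n_familiar + n_novel:]          # never presented

    # 10 fast-mapping trials: one per familiar word and one per novel word.
    fast_mapping = [("familiar", w) for w in familiar] + [("novel", w) for w in novel]
    rng.shuffle(fast_mapping)                        # assumed random order

    # 5 retention trials, assumed to test the novel words just fast-mapped.
    retention = [("retention", w) for w in novel]

    return {
        "familiar": familiar,
        "novel": novel,
        "held_out": held_out,
        "fast_mapping": fast_mapping,
        "retention": retention,
    }

# One plan per network; the poster reports 20 networks.
plans = [build_experiment(seed=s) for s in range(20)]
plan = plans[0]
print(plan["fast_mapping"])
```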

How The Model Behaves

Fast Time Scale:

• Model succeeded on both types of fast-mapping trials

• Model behavior patterned with empirical results

[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name and Novel Name trials, with Chance marked and two *** significance markers]

Slow Time Scale:

• The model fails to “retain” the newly learned words after a “delay”

[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name, Novel Name and Retention trials, with Chance marked and two *** significance markers]

How The Model “Thinks”

• Analyses of weight matrices revealed that relatively little learning occurred during the test phase.
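
One way to quantify how much learning occurred is the root-mean-square (RMS) change in the portion of the weight matrix belonging to a set of words. This is a sketch under assumptions: `rms_change`, the row ranges, and the toy matrices are hypothetical and do not reproduce the poster's data or its exact measure.

```python
import numpy as np

def rms_change(w_before, w_after, rows):
    """RMS difference between two weight matrices, restricted to the given rows."""
    diff = w_after[rows] - w_before[rows]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy 15 x 90 matrices (input units by decision units), not real model weights.
w_before = np.zeros((15, 90))
w_after = w_before.copy()
w_after[0:5] += 0.3      # pretend large change for rows 0 to 4
w_after[5:10] += 0.001   # pretend tiny change for rows 5 to 9

groups = {
    "familiar": slice(0, 5),
    "novel": slice(5, 10),
    "control": slice(10, 15),
}
changes = {label: rms_change(w_before, w_after, rows) for label, rows in groups.items()}
print(changes)
```

Comparing the change per word group in this way separates a large shift produced by training from a small shift produced during testing.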

[Figure: Activation (0 to 1) across Cycles (0 to 20), one panel for novel words and one for familiar words, each marked "End"]

[Figure: Change (RMS) in portions of weight matrix. Squared Deviations (0 to 2) for bars labeled Familiar Words, Familiar Words, Novel Words and Control Words, grouped as After Learning and After Test; a second panel shows Squared Deviations (0 to 0.000005) for Familiar Words, Novel Words and Control Words After Test]

Temporal dynamics of processing

[Figure: weight matrices Prior to Experiment and After Experiment; vertical axis ticks 2 to 14, horizontal axis ticks 10 to 90; color scale: Connection Strength (0.05 to 0.2)]

• Two time scales captured in a single architecture:

– Fast, online: fast mapping

– Slow, long-term: word learning

• The model replicated the empirical findings:

– Excellent word learning and fast mapping

– Poor “retention”

• Has sufficient knowledge to select the referent at a given moment in time, given auditory and visual input and stored knowledge (weights).

• But not enough to subsequently “know” the word.

Conclusions

• In-the-moment learning:

– Subtly biases behavior

– Combined with activation dynamics, yields correct response.

– Does not provide robust, context-independent word knowledge (in the short term)

• Continued training on fast-mapped words (i.e., 5000 epochs) makes them familiar words.

• Accumulation of this learning provides robust context-independent word knowledge over development.

Conclusions

Take-Home Messages

1) A fast-mapped word is not a known word…

…but a known word is known, because it has been fast-mapped many, many times.

2) Understanding development requires models that integrate both short-term dynamic processes and long-term learning.

Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan & A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp. 264-293). Cambridge, MA: MIT Press.

Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.

Horst, J.S. & Samuelson, L.K. (2005, April). Slow Down: Understanding the Time Course Behind Fast Mapping. Poster session presented at the 2005 Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.

McClelland, J. & Elman, J. (1986). The TRACE Model of Speech Perception. Cognitive Psychology, 18(1), 1-86.

McMurray, B. & Spivey, M. (2000). The Categorical Perception of Consonants: The Interaction of Learning and Processing. The Proceedings of the Chicago Linguistics Society, 34(2), 205-220.

Rumelhart, D. & Zipser, D. (1986). Feature Discovery By Competitive Learning. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1. Cambridge, MA: MIT Press.

References

Acknowledgements

The authors would like to thank Joseph Toscano for programming assistance and support.

This work was supported by NICHD Grant R01-HD045713 to LKS.
