Connectionist Time and Dynamic Systems Time in One Architecture?
Modeling Word Learning at Two Timescales
Jessica S. Horst ([email protected])
Bob McMurray
Larissa K. Samuelson
Dept. of Psychology
University of Iowa
Two Time Scales in Neural Networks
Connectionist and dynamical systems accounts:
• stress change over time
• complement each other in timescale
Dynamic Systems: online processes
Connectionist Networks: long-term learning
Many domains of development require both timescales:
Example: language development requires
• sensitivity to the brief and sequential nature of the input
• slower developmental processes
Two Time Scales in Language Acquisition
Word learning is often attributed to fast mapping
- quick link between a novel name and a novel object (e.g., Carey, 1978).
But recent empirical data suggest that fast mapping and word learning may represent two distinct time scales (Horst & Samuelson, April, 2005).
- Fast Mapping: quick process emerging in the moment.
- Word Learning: gradual process over the course of development
We capture both timescales in a recurrent network….
• Activation feeds from input layers to decision layers.
• Decision units compete via inhibition.
• Activation feeds back to input layers.
• Cycle continues until system settles.
[Figure: the network in its initial state (before learning): Auditory Inputs and Visual Inputs connected to the Decision Units (Hidden) Layer.]
The Architecture
(McMurray & Spivey, 2000)
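The four-step cycle listed above can be sketched roughly as follows. This is a minimal illustration, not the equations of the McMurray & Spivey (2000) architecture: the function name `settle`, the parameters `rate`, `feedback`, and `tol`, the squaring-and-normalizing form of competition, and the toy weights are all assumptions.

```python
import numpy as np

def settle(aud_in, vis_in, w_aud, w_vis, rate=0.2, feedback=0.1,
           tol=1e-6, max_cycles=200):
    """Cycle activation between input and decision layers until it settles.

    Every update equation here is an illustrative assumption.
    """
    aud, vis = aud_in.copy(), vis_in.copy()
    n_dec = w_aud.shape[1]
    dec = np.full(n_dec, 1.0 / n_dec)  # start with no preferred decision unit
    for cycle in range(1, max_cycles + 1):
        # 1. Activation feeds from the input layers to the decision layer.
        new_dec = dec + rate * (aud @ w_aud + vis @ w_vis)
        # 2. Decision units compete: squaring then normalizing lets strong
        #    units suppress weak ones (a stand-in for lateral inhibition).
        new_dec = new_dec ** 2
        new_dec /= new_dec.sum()
        # 3. Activation feeds back to the input layers.
        aud = np.clip(aud_in + feedback * (w_aud @ new_dec), 0.0, 1.0)
        vis = np.clip(vis_in + feedback * (w_vis @ new_dec), 0.0, 1.0)
        # 4. Stop once the decision layer has stopped changing.
        change = np.abs(new_dec - dec).max()
        dec = new_dec
        if change < tol:
            break
    return dec, cycle

# Toy weights (hypothetical): input i is linked most strongly to decision unit i.
w = np.full((3, 4), 0.1)
w[np.arange(3), np.arange(3)] = 0.9
name = np.array([1.0, 0.0, 0.0])     # name 0 is heard
objects = np.array([1.0, 1.0, 0.0])  # object 0 (target) and object 1 (competitor)
dec, n_cycles = settle(name, objects, w.copy(), w.copy())
winner = int(np.argmax(dec))
```

In this toy setup the decision unit supported by both the name and its object ends up most active, while the unit for the visual competitor keeps a small residual activation.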
• Unsupervised Hebbian learning occurs on every cycle.
• Online decision dynamics reflect auditory and visual competitors.
[Figure: decision-unit Activation (0 to 1) plotted against Cycles (0 to 18).]
The Model
End State (Post Learning)
Intermediate State (During Learning)
• 15 Auditory & 15 Visual units
• 90 Decision units
• Names presented singly with a variable number of objects
• Name-Decision & Object-Decision associations strengthened via learning
• After 4000 training trials network forms localist representations
• Learns name-object links and to ignore visual competitors
[Figure: two weight-matrix panels, Auditory Input (1 to 10) by Decision Units (1 to 90), shaded by Connection Strength (0.05 to 0.2); one panel labels decision units 9, 16, 26, 30, 32, 39, 41, 49, 65, 67.]
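The training regime described for the model can be sketched as below, using the layer sizes given above (15 auditory, 15 visual, 90 decision units). The slides do not give the update equations, so the competition step, the competitive-learning form of the Hebbian update, the learning rate, the cap on competitor objects, the pairing of name i with object i, and the name `train` are all assumptions made for illustration.

```python
import numpy as np

N_AUD, N_VIS, N_DEC = 15, 15, 90  # layer sizes from the slide

def train(n_trials=4000, n_cycles=10, lr=0.1, rate=0.2,
          max_competitors=3, seed=0):
    """Unsupervised training: one name plus a variable number of objects per trial."""
    rng = np.random.default_rng(seed)
    w_aud = rng.uniform(0.0, 0.1, size=(N_AUD, N_DEC))  # random initial weights
    w_vis = rng.uniform(0.0, 0.1, size=(N_VIS, N_DEC))
    for _ in range(n_trials):
        word = rng.integers(N_AUD)  # assumption: name i refers to object i
        aud = np.zeros(N_AUD)
        aud[word] = 1.0
        # The target object appears with 0..max_competitors other objects.
        n_comp = rng.integers(0, max_competitors + 1)
        others = rng.choice(np.delete(np.arange(N_VIS), word),
                            size=n_comp, replace=False)
        vis = np.zeros(N_VIS)
        vis[word] = 1.0
        vis[others] = 1.0
        dec = np.full(N_DEC, 1.0 / N_DEC)
        for _ in range(n_cycles):
            # Fast time scale: input drives decision units, which then compete.
            dec = (dec + rate * (aud @ w_aud + vis @ w_vis)) ** 2
            dec /= dec.sum()
            # Slow time scale: a Hebbian-style update on every cycle. Weights
            # into active decision units move toward the current input, so
            # links from inconsistent competitors decay over trials.
            w_aud += lr * dec * (aud[:, None] - w_aud)
            w_vis += lr * dec * (vis[:, None] - w_vis)
    return w_aud, w_vis

# A short run to keep the demo quick; the slide reports 4000 training trials.
w_aud, w_vis = train(n_trials=200)
```

Because each update is a convex combination of the old weight and the input, the weights stay between 0 and 1. Whether this toy version arrives at localist representations depends on the assumed parameters.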
Fast: Moment by Moment
• Online information integration and constraint satisfaction (e.g., McClelland & Elman, 1986; Dell, 1986)
• Reaches a stable pattern of activation based on auditory and visual inputs and stored knowledge (weights)
• Model makes correct name-object links based on the latest input
Slow: Over the Long-Term
• Unsupervised Hebbian Learning
• Associates words with visual targets
• Learns to ignore visual competitors
Two Time Scales
The two time scales are not independent
Long-term learning depends critically on the dynamics of the fast time scales
• Competition between decision units ensures pseudo-localist representations—critical for Hebbian learning (e.g. Rumelhart & Zipser, 1986)
• Learning occurs on each cycle
- Influences processing cycle-by-cycle & trial-by-trial
• Accumulated learning across trials leads to learning on long-term time scale (i.e., word learning)
Dependent Time Scales
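To illustrate why competition matters for Hebbian learning, the sketch below compares a plain Hebbian weight change computed from a diffuse decision-layer pattern with one computed after a few cycles of competition. The squaring-and-normalizing competition, the toy activation values, and the function names are assumptions for illustration only.

```python
import numpy as np

def compete(act, n_cycles=5):
    """Sharpen a decision-layer pattern by repeated squaring and normalizing."""
    act = np.asarray(act, dtype=float)
    for _ in range(n_cycles):
        act = act ** 2
        act = act / act.sum()
    return act

def hebbian_delta(pre, post, lr=0.01):
    """Plain Hebbian weight change: input activity times decision activity."""
    return lr * np.outer(pre, post)

def winner_share(delta):
    """Fraction of the total weight change that goes to the largest entry."""
    return float(delta.max() / delta.sum())

pre = np.array([1.0, 0.0, 0.0])               # one active input unit
diffuse = np.array([0.30, 0.25, 0.25, 0.20])  # decision layer before competition
peaked = compete(diffuse)                     # decision layer after competition

share_diffuse = winner_share(hebbian_delta(pre, diffuse))
share_peaked = winner_share(hebbian_delta(pre, peaked))
```

Without competition the weight change is spread over all decision units (the leading unit receives 30% of it here). After competition nearly all of the change goes to a single unit, which is what lets repeated trials build pseudo-localist representations.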
Empirical Results
Fast Time Scale
• 24-month-old children
• Saw 2 familiar & 1 novel objects, e.g., Cow (familiar), Block (familiar), Yok (novel)
• Asked to get familiar and novel objects (e.g., “get the cow!” or “get the yok!”)
• Children were excellent at fast mapping (finding the referent of novel and familiar words in the moment).
[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name and Novel Name trials, with the chance level marked and *** significance markers.]
Slow Time Scale
After a 5-minute delay, children were asked to pick a newly fast-mapped name (e.g., “get the yok!”).
Choices: Yok (target), Fode (named foil), unnamed foil (prev. seen)
• Children were unable to retain mappings after a 5-minute delay
[Figure: Proportion of Correct Choices (0 to 1) for Familiar Name, Novel Name, and Retention trials, with the chance level marked and *** significance markers.]
• Initial findings replicated with simpler tasks:
  • effect of number of names or trials?
• Children’s difficulty in retaining newly fast-mapped names is not related to the number of names or trials
Replication
Fast Mapping: 9/12 **    Retention: 4/9 n.s.
Fast Mapping: 7/12 *     Retention: 4/7 n.s.
* Binomial, p < .05, ** Binomial, p < .01
Replication #1 (N = 12):
• 1 Novel Name
• 8 Familiar Names
• 7 Preference Trials
Replication #2 (N = 12):
• 1 Novel Name
• 2 Familiar Names
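The binomial tests reported above can be approximated with a short calculation. This is a sketch under stated assumptions: a one-tailed test and a chance level of 1/3 (three objects to choose from, as in the original task). The slides do not state the chance level or the number of choices in the replication tasks, and the function name is hypothetical.

```python
from math import comb

def binomial_p_at_least(k, n, chance):
    """One-tailed binomial probability of k or more successes in n tries."""
    return sum(comb(n, i) * chance ** i * (1 - chance) ** (n - i)
               for i in range(k, n + 1))

CHANCE = 1 / 3  # assumption: three objects to choose from

# 9 of 12 children correct on the fast-mapping trial
p_fast_mapping = binomial_p_at_least(9, 12, CHANCE)

# 4 of the 9 children who fast-mapped were correct at retention
p_retention = binomial_p_at_least(4, 9, CHANCE)
```

Under these assumptions, 9 of 12 gives p of about .004 and 4 of 9 gives p of about .35. The result depends directly on the assumed chance level.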
Simulations
• 20 networks initialized with random weights
• 15-word lexicon (names & objects):
  • 5 familiar words
  • 5 novel words
  • 5 held out
• Trained on 5 familiar items for 5000 epochs
• Items presented in random order
• Run in the Fast Mapping Experiment:
  • 10 fast mapping trials (5 familiar, 5 novel)
  • 5 retention trials
• Learning was not turned off during experiment.
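The simulation design listed above can be laid out as data before any network is run. The sketch below builds one design per network. The composition of each display (two familiar objects plus one novel object on fast-mapping trials; target, a named foil, and a held-out item standing in for the unnamed foil on retention trials) is an assumption carried over from the child task, and the function and key names are hypothetical.

```python
import random

N_NETWORKS = 20  # networks, each with its own random initialization
N_WORDS = 15     # lexicon size (names & objects)

def build_design(seed=0):
    """Split the lexicon and build the test trials for one network."""
    rng = random.Random(seed)
    words = list(range(N_WORDS))
    rng.shuffle(words)
    familiar, novel, held_out = words[:5], words[5:10], words[10:]

    fast_mapping = []
    for target in familiar:  # 5 familiar-name trials
        other = rng.choice([w for w in familiar if w != target])
        fast_mapping.append(
            {"name": target, "objects": [target, other, rng.choice(novel)]})
    for target in novel:     # 5 novel-name trials
        fast_mapping.append(
            {"name": target, "objects": rng.sample(familiar, 2) + [target]})
    rng.shuffle(fast_mapping)  # trials presented in random order

    retention = []
    for target in novel:     # 5 retention trials
        named_foil = rng.choice([w for w in novel if w != target])
        unnamed_foil = rng.choice(held_out)  # assumption: held-out item as foil
        retention.append(
            {"name": target, "objects": [target, named_foil, unnamed_foil]})
    rng.shuffle(retention)

    return {"familiar": familiar, "novel": novel, "held_out": held_out,
            "fast_mapping": fast_mapping, "retention": retention}

designs = [build_design(seed) for seed in range(N_NETWORKS)]
```

Each network would first be trained on its five familiar items and then run through `fast_mapping` followed by `retention`, with weight updates left on throughout, as the last bullet above states.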
How The Model Behaves
Fast Time Scale:
• Model succeeded on both types of fast-mapping trials
• Model behavior patterned with empirical results
[Figure: model Proportion of Correct Choices (0 to 1) for Familiar Name and Novel Name trials, with the chance level marked and *** significance markers.]
Slow Time Scale:
• The model fails to “retain” the newly learned words after a “delay”
[Figure: model Proportion of Correct Choices (0 to 1) for Familiar Name, Novel Name, and Retention trials, with the chance level marked and *** significance markers.]
How The Model “Thinks”
• Analyses of weight matrices revealed that relatively little learning occurred during the test phase.
[Figure: Activation (0 to 1) over Cycles (0 to 20), in separate panels for novel words and familiar words.]
[Figure: Change (RMS) in portions of weight matrix, plotted as Squared Deviations (0 to 2): Familiar Words after learning; Familiar, Novel, and Control Words after test.]
[Figure: Squared Deviations after test on an expanded scale (0 to 0.000005) for Familiar Words, Novel Words, and Control Words.]
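The weight-change measure in these figures can be computed along the following lines. The sketch assumes that each word corresponds to one row of a 15 by 90 input-to-decision weight matrix and that the word groups occupy contiguous rows. The matrices are made-up toy values, not the simulation results, and `rms_change` is a hypothetical name.

```python
import numpy as np

def rms_change(w_before, w_after, rows):
    """Root-mean-square change in the selected rows of a weight matrix."""
    diff = w_after[rows] - w_before[rows]
    return float(np.sqrt(np.mean(diff ** 2)))

# Assumed row layout: 5 familiar, 5 novel, 5 control (held-out) words.
groups = {
    "familiar": np.arange(0, 5),
    "novel": np.arange(5, 10),
    "control": np.arange(10, 15),
}

# Toy matrices: only the novel-word rows move, and only slightly.
w_before = np.zeros((15, 90))
w_after = w_before.copy()
w_after[groups["novel"]] += 0.002

changes = {name: rms_change(w_before, w_after, rows)
           for name, rows in groups.items()}
```

Comparing a snapshot taken after training with one taken after the test phase, group by group, shows where in the weight matrix any test-phase learning occurred.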
Temporal dynamics of processing
[Figure: weight matrices Prior to Experiment and After Experiment (inputs 2 to 14 by decision units 10 to 90), shaded by Connection Strength (0.05 to 0.2).]
• Two time scales captured in a single architecture:
– Fast, online: fast mapping
– Slow, long-term: word learning
• The model replicated the empirical findings:
– Excellent word learning and fast mapping
– Poor “retention”
• Has sufficient knowledge to select the referent at a given moment in time, given auditory and visual input and stored knowledge (weights).
• But not enough to subsequently “know” the word.
Conclusions
• In-the-moment learning:
– Subtly biases behavior
– Combined with activation dynamics, yields correct response.
– Does not provide robust, context-independent word knowledge (in the short term)
• Continued training on fast-mapped words (i.e., 5000 epochs) makes them familiar words.
• Accumulation of this learning provides robust context-independent word knowledge over development.
Conclusions
Take-Home Messages
1) A fast-mapped word is not a known word…
…but a known word is known, because it has been fast-mapped many, many times.
2) Understanding development requires models that integrate both short-term dynamic processes and long-term learning.
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan & A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp. 264-293). Cambridge, MA: MIT Press.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.
Horst, J. S. & Samuelson, L. K. (2005, April). Slow Down: Understanding the Time Course Behind Fast Mapping. Poster session presented at the 2005 Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.
McClelland, J. & Elman, J. (1986). The TRACE Model of Speech Perception. Cognitive Psychology, 18(1), 1-86.
McMurray, B. & Spivey, M. (2000). The Categorical Perception of Consonants: The Interaction of Learning and Processing. The Proceedings of the Chicago Linguistics Society, 34(2), 205-220.
Rumelhart, D. & Zipser, D. (1986). Feature Discovery By Competitive Learning. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1. Cambridge, MA: MIT Press.
References
Acknowledgements
The authors would like to thank Joseph Toscano for programming assistance and support.
This work was supported by NICHD Grant R01-HD045713 to LKS.