Statistical learning, cross-constraints, and the acquisition of
speech categories: a computational approach
Joseph Toscano & Bob McMurray
Psychology Department
University of Iowa
Acknowledgements
• Dick Aslin
• The MACLab
Learning phonetic categories
• Infants are initially able to discriminate many different phonetic contrasts.
• They must learn which ones are relevant to their native language.
• This is accomplished within the first year of life, and infants quickly adopt the categories present in their language (Werker & Tees, 1984).
Learning phonetic categories
• What is needed for statistical learning?
• A signal and a mechanism
  – Availability of statistics (signal)
  – Sensitivity to statistics (mechanism)
• Continuous sensitivity to VOT
• Ability to track frequencies and build clusters
Statistics in the signal
• What statistical information is available?
• Lisker & Abramson (1964) performed a cross-language analysis of speech
  – Measured voice onset time (VOT) from several speakers in different languages
Statistics in the signal
• The statistics are available in the signal
[Figure: VOT distributions measured in Tamil, Cantonese, and English]
Sensitivity to statistics
• Are infants sensitive to statistics in speech?
  – Maye et al. (2002) asked this question
  – Two groups of infants heard either a bimodal or a unimodal distribution of tokens along a speech continuum
• Infants are sensitive to within-category detail (McMurray & Aslin, 2005)
Learning phonetic categories
• Infants can obtain phoneme categories from exposure to tokens in the speech signal
[Figure: frequency distribution of tokens along VOT, with a +voice cluster near 0 ms and a -voice cluster near 50 ms]
Statistical Learning Model
• Statistical learning in a computational model
• What do we need the model to do?
  – Show learnability: are statistics sufficient?
  – Capture the developmental timecourse
  – Implications for speech in general
  – Can the model explain more than category learning?
Statistical Learning Model
• Clusters of VOTs are Gaussian distributions
[Figure: Gaussian fits to the VOT distributions of Tamil, Cantonese, and English]
Statistical Learning Model
• Gaussians are defined by three parameters:
  – μ – the center of the distribution
  – σ – the spread of the distribution
  – Φ – the height of the distribution, reflected in the probability of a particular value
• Each phoneme category can be represented by these three parameters
[Figure: a Gaussian over VOT annotated with μ, σ, and Φ]
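To make the parameterization concrete, here is a minimal Python sketch (the specific numbers are illustrative, not from the talk): a category is just the triple (μ, σ, Φ), and the curve it draws is Φ times a Gaussian density.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Density of a Gaussian with center mu and spread sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# A phoneme category is fully described by three numbers:
mu, sigma, phi = 0.0, 10.0, 0.5   # center (ms), spread (ms), height/weight

# Height-scaled probability of observing a VOT of 5 ms under this category:
print(phi * gaussian(5.0, mu, sigma))
```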
Statistical Learning Model
• Modeling approach: mixture of Gaussians
[Figure: mixture of Gaussians along a phonetic dimension (e.g., VOT); y-axis shows category mapping strength (posterior); two categories, /b/ and /p/]
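A sketch of how a mixture of Gaussians yields the "category mapping strength (posterior)" on the figure's y-axis: each category's Φ-weighted density at the input is normalized by the sum over all categories. The two-category values below are illustrative.

```python
import numpy as np

def posteriors(x, mus, sigmas, phis):
    """Posterior probability that input x maps to each Gaussian category."""
    dens = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
    weighted = phis * dens
    return weighted / weighted.sum()

# Illustrative two-category English-like setup along VOT (ms):
mus = np.array([0.0, 50.0])       # /b/ and /p/ centers
sigmas = np.array([10.0, 10.0])
phis = np.array([0.5, 0.5])
print(posteriors(20.0, mus, sigmas, phis))   # mapping strength for /b/ vs /p/
```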
Statistical Learning Model
• Gaussian distributions represent the probability of occurrence of a particular feature (e.g. VOT)
• Start with a large number of Gaussians to reflect many different values for the feature.
[Figures: the model begins with many Gaussians spread across the phonetic dimension (e.g., VOT); y-axis shows category mapping strength (posterior)]
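A minimal sketch of this starting state, assuming (hypothetically) 50 Gaussians and ranges like those shown in the figures:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 50                               # many more Gaussians than real categories
mus = rng.uniform(-80.0, 100.0, K)   # centers scattered across the dimension
sigmas = np.full(K, 5.0)             # modest initial spreads
phis = np.full(K, 1.0 / K)           # equal initial heights/weights
```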
Statistical Learning Model
• Learning occurs via gradient descent
  – Take a single data point as input
  – Adjust the location and width of the distribution by an amount defined by a learning rule
  – Move the center of the distribution closer to the data point
  – Make the distribution wider to accommodate the data point
Statistical Learning Model
• Learning rule: the probability of a particular point under category i is the proportion of the space under that Gaussian (Φ_i) times the equation of a Gaussian:

$p_i(x) = \Phi_i \cdot \frac{1}{\sigma_i\sqrt{2\pi}} \, e^{-(x-\mu_i)^2 / (2\sigma_i^2)}$
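The slides do not spell out the update equations, so the following is a hedged sketch of one standard implementation of such a rule: a stochastic gradient step on the log-likelihood of each data point, where each Gaussian learns in proportion to its posterior responsibility (the quantity defined above). It reuses the posteriors helper sketched earlier.

```python
def mog_update(x, mus, sigmas, phis, lr=0.1):
    """One gradient step on the log-likelihood of a single data point x."""
    p = posteriors(x, mus, sigmas, phis)          # responsibility of each Gaussian
    d_mu = p * (x - mus) / sigmas ** 2            # move centers toward the point
    d_sigma = p * ((x - mus) ** 2 - sigmas ** 2) / sigmas ** 3  # adjust widths
    mus += lr * d_mu
    sigmas += lr * d_sigma
    phis += lr * (p - phis)                       # shift height toward responsible Gaussians
    phis /= phis.sum()                            # keep the mixture normalized
    return mus, sigmas, phis
```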
Can the model learn?
• Can the model learn speech categories?
Can the model learn?
• The model in action
• It fails to learn the correct number of categories
  – Too many distributions under each curve
  – Is this a problem? Maybe.
• Solution: introduce competition
• Competition through a winner-take-all strategy
  – Only the closest-matching Gaussian is adjusted (see the sketch below)
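A sketch of the winner-take-all variant under the same assumptions: compute the posteriors as before, but let only the best-matching Gaussian update.

```python
import numpy as np

def wta_update(x, mus, sigmas, phis, lr=0.1):
    """Winner-take-all: only the closest-matching Gaussian is adjusted."""
    p = posteriors(x, mus, sigmas, phis)
    w = int(np.argmax(p))                            # the winning Gaussian
    d_mu = (x - mus[w]) / sigmas[w] ** 2
    d_sigma = ((x - mus[w]) ** 2 - sigmas[w] ** 2) / sigmas[w] ** 3
    mus[w] += lr * d_mu
    sigmas[w] += lr * d_sigma
    phis[w] += lr * (1.0 - phis[w])                  # winner gains height
    phis /= phis.sum()                               # losers shrink on renormalization
    return mus, sigmas, phis
```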
Does learning need to be constrained?
• Can the model learn speech categories? Yes.
• Does learning need to be constrained?
Does learning need to be constrained?
• Unconstrained feature space
  – Starting VOTs distributed from -1000 to +1000 ms
  – The model fails to learn
  – Similar to a situation in which the model has too few starting distributions
Does learning need to be constrained?
• Constrained feature space
  – Starting VOTs distributed from -100 to +100 ms
  – Within the range of actual voice onset times used in language
  – With this constraint the model learns (see the sketch below)
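The manipulation amounts to changing one line of the initialization sketch above (rng and K as defined there):

```python
# Unconstrained: starting centers scattered far outside real VOTs -- learning fails
mus = rng.uniform(-1000.0, 1000.0, K)

# Constrained: starting centers within the range languages actually use
mus = rng.uniform(-100.0, 100.0, K)
```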
Are constraints linguistic?
• Can the model learn speech categories? Yes.
• Does learning need to be constrained? Yes.
• Do constraints need to be linguistic?
Are constraints linguistic?
• Cross-linguistic constraints
  – Combined data from the languages used in Lisker & Abramson (1964) and several other languages
Are constraints linguistic?
• VOTs from: English, Thai, Spanish, Cantonese, Korean, Navajo, Dutch, Hungarian, Tamil, Eastern Armenian, Hindi, Marathi, and French
• Test the model with two different sets of starting states:
  – Cross-linguistic: based on the distribution of VOTs across languages
  – Random normally distributed: centered around 0 ms, range ≈ -100 ms to +100 ms
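A sketch of the two schemes, assuming a hypothetical array cross_ling_vots holding the pooled VOT measurements from the 13 languages (rng and K as above); the scale below is chosen so roughly three standard deviations span -100 to +100 ms.

```python
# Cross-linguistic: draw starting centers from the pooled VOT measurements
mus = rng.choice(cross_ling_vots, size=K)      # cross_ling_vots: hypothetical data array

# Random normal: centered at 0 ms, effectively ranging about -100..+100 ms
mus = rng.normal(loc=0.0, scale=33.0, size=K)  # 3 SDs ~ +/-100 ms (illustrative)
```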
Are linguistic constraints helpful?
• Can the model learn speech categories? Yes.
• Does learning need to be constrained? Yes.
• Do constraints need to be linguistic? No.
• Do cross-language constraints help?
Are linguistic constraints helpful?
• This is the part of the talk that I don’t have any slides for yet.
What do infants do?
• Can the model learn speech categories? Yes.
• Does learning need to be constrained? Yes.
• Do constraints need to be linguistic? No.
• Do cross-language constraints help? Sometimes.
• What do infants do?
What do infants do?
• As infants get older, their ability to discriminate different VOT contrasts decreases
  – Initially able to discriminate many contrasts
  – Eventually discriminate only those of their native language
What do infants do?
• Each model's discrimination over time:
  – Random normal: decreases
  – Cross-linguistic: slight increase
[Figure: discrimination score (0.2-0.7) over 0-12,000 training inputs, with linear trend lines for the cross-linguistic and random-normal models]
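The talk does not define the discrimination measure, so here is one hypothetical way to score it, using the posteriors helper from earlier: compare the category mappings assigned to two tokens; identical mappings mean no discrimination.

```python
import numpy as np

def discrimination(x1, x2, mus, sigmas, phis):
    """Hypothetical discrimination score in [0, 1]: total variation distance
    between the category posteriors assigned to tokens x1 and x2."""
    p1 = posteriors(x1, mus, sigmas, phis)
    p2 = posteriors(x2, mus, sigmas, phis)
    return 0.5 * np.abs(p1 - p2).sum()

# e.g., a 20 ms VOT difference straddling an English-like /b/-/p/ boundary:
# discrimination(10.0, 30.0, mus, sigmas, phis)
```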
What do infants do?
• Cross-linguistic starting states lead to faster category acquisition
• Why wouldn't infants take advantage of this?
  – Too great a risk of over-generalization
  – Better to take more time to do the job right than to do it too quickly