Sound categorization
Dr. Emily Morgan
Sound categorization
• Hear an acoustic signal, recover the sound category
• Example: Distinguish between two stops which differ only in voicing, e.g.
  • /p/ vs. /b/
  • /t/ vs. /d/
Voice Onset Time (VOT) is the primary cue distinguishing voiced from voiceless stops
[Figure: VOT for /b/ vs. /p/ (Chen, 1980)]
Identification task (/ba/ vs. /pa/)
[Figure: percentage of /ba/ vs. /pa/ identifications as a function of voice onset time (msec), 0–70 ms]
• How do listeners categorize these acoustic signals?
What is the generative model for the production of the acoustic signal?
C: category (/b/ or /p/)
S: acoustic signal (in particular, the VOT)

C ~ Binomial(p_/b/)
S|C ~ N(μ_C, σ), where μ_C depends on the value of C
[Figure: probability density over VOT for the /b/ and /p/ categories]

Concrete example:
p_/b/ = 0.5
μ_/b/ = 0
μ_/p/ = 50
σ = 12
Detour: How do we actually calculate the normal distribution?
• It's defined by the equation:

P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
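As a quick check, here is a minimal Python sketch of that density formula, using the concrete example parameters above (μ = 0, σ = 12); numpy and scipy are assumed to be installed, and scipy is used only to confirm the hand-coded version.

```python
# Minimal sketch: the normal density computed directly from the formula,
# checked against scipy.stats.norm.
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """P(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# e.g. density of a VOT of 27 ms under the /b/ category (mu = 0, sigma = 12)
print(normal_pdf(27, mu=0, sigma=12))   # ~0.0026
print(norm.pdf(27, loc=0, scale=12))    # same value
```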
Suppose I hear a sound with VOT of 27ms. Which category does it belong to?
P(C = /b/ | S = 27) = ?

Bayes Rule:

P(C|S) = \frac{P(S|C)\,P(C)}{P(S)} = \frac{P(S|C)\,P(C)}{\sum_C P(S|C)\,P(C)}

P(C = /b/ | S = 27) = \frac{P(27|/b/)\,P(/b/)}{P(27|/b/)\,P(/b/) + P(27|/p/)\,P(/p/)} = \frac{0.0026 \times 0.5}{0.0026 \times 0.5 + 0.0053 \times 0.5} \approx 0.33

where:
C ~ Binomial(p_/b/), with p_/b/ = 0.5, so P(C = /b/) = 0.5
S|C ~ N(μ_C, σ), with μ_/b/ = 0, μ_/p/ = 50, σ = 12

P(S|C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(S - \mu_C)^2}{2\sigma^2}\right)

P(27|/b/) = \frac{1}{\sqrt{2\pi \cdot 12^2}} \exp\left(-\frac{(27 - 0)^2}{2 \cdot 12^2}\right) \approx 0.0026
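A minimal sketch of the same calculation in Python, using the concrete example parameters above (p_/b/ = 0.5, μ_/b/ = 0, μ_/p/ = 50, σ = 12); scipy is assumed to be installed.

```python
# Posterior probability that a VOT of 27 ms came from /b/, via Bayes Rule.
from scipy.stats import norm

p_b, p_p = 0.5, 0.5            # priors P(C = /b/), P(C = /p/)
mu_b, mu_p, sigma = 0.0, 50.0, 12.0
vot = 27.0

like_b = norm.pdf(vot, loc=mu_b, scale=sigma)   # P(27 | /b/) ~ 0.0026
like_p = norm.pdf(vot, loc=mu_p, scale=sigma)   # P(27 | /p/) ~ 0.0053

posterior_b = (like_b * p_b) / (like_b * p_b + like_p * p_p)
print(round(posterior_b, 2))                    # ~0.33
```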
Plotting P(C=/b/|S) for different values of S gives us a categorization function
• The model's predicted categorization function has the same shape as the human categorization data
• Evidence that our model could be representing the way humans do this categorization
Benefits of computational modeling
• Our model makes precise, numeric predictions about how often listeners should categorize an ambiguous stimulus as /b/ vs. /p/
• What further predictions does our model make?
Prediction: Categorization function slope changes with category variance
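A minimal sketch of this prediction: sweep VOT along the continuum and compare the categorization function under a smaller vs. larger category standard deviation. The specific σ values (8 ms and 16 ms) are illustrative assumptions, not the stimulus parameters from any study; equal priors and equal variances for the two categories are also assumed.

```python
# Sketch: categorization function P(/b/ | VOT) for two category variances.
import numpy as np
from scipy.stats import norm

mu_b, mu_p = 0.0, 50.0
vots = np.linspace(-20, 80, 101)

def p_b_given_vot(vot, sigma):
    like_b = norm.pdf(vot, loc=mu_b, scale=sigma)
    like_p = norm.pdf(vot, loc=mu_p, scale=sigma)
    return like_b / (like_b + like_p)        # equal priors assumed

narrow = p_b_given_vot(vots, sigma=8.0)      # smaller variance -> steeper function
wide = p_b_given_vot(vots, sigma=16.0)       # larger variance -> shallower function

# Compare the slopes near the category boundary (VOT = 25 ms)
print(np.gradient(narrow, vots)[np.argmin(abs(vots - 25))])
print(np.gradient(wide, vots)[np.argmin(abs(vots - 25))])
```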
Clayards (2008) tested this prediction
Clayards (2008)
• Participants were familiarized with synthesized audio (a "beach"–"peach" continuum), with VOTs drawn from either a broader or a narrower distribution
• Then tested on categorizing VOTs across the whole continuum
Clayards (2008): Results
[Figures: predicted and observed categorization functions for the broader- and narrower-variance conditions]
So far…
• Modeled phoneme categories as Gaussian/normal distributions over VOTs
• Calculated the exact posterior probability that a sound belongs to a particular category (using Bayes Rule)
• Predicted that the categorization function should become steeper if category variance is smaller
• Confirmed our prediction
Interlude: Marr's (1982) levels of analysis for cognitive models
• Computational level
  • What is the structure of the information processing problem?
  • What are the inputs and outputs?
  • What information is relevant to solving the problem?
• Algorithmic level
  • What representations and algorithms are used?
• Implementational level
  • How are the representations and algorithms implemented neurally?
• These levels are mutually constraining, and all part of fully understanding cognition
Perceptual magnet effect
• Empirical work by Iverson & Kuhl (1995)
• Modeling work by Feldman & Griffiths (2007)
Perceptual magnet effect
[Figure: actual stimuli vs. perceived stimuli for the /i/ and /ε/ categories (Iverson & Kuhl, 1995)]
To account for this, we need a new generative model for speech perception

This is the perceptual magnet effect.
• Why does it occur?
• To answer, we'll need a slightly more complicated generative model
C: category (e.g. /i/)
T: target acoustic signal
S: acoustic signal as heard by listener (S differs from T because of noise in the signal)

Assumption: Listener infers not just P(C|S) but also P(T|S)
C: category
T: target acoustic signal
S: acoustic signal as heard by listener

C ~ Binomial(p_C)
T|C ~ N(μ_C, σ_C)
S|T ~ N(T, σ_S)

Let's start by considering a case where we know the category C (or equivalently, where there's only one category)
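A minimal sketch of this three-variable generative model as a sampler; the category names and parameter values below are placeholders for illustration, not the values used by Feldman & Griffiths.

```python
# Sketch of the generative model C -> T -> S: sample a category, a target
# production, then a noisy speech sound. Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

categories = ["i", "e"]          # hypothetical two-vowel example
p_c = [0.5, 0.5]                 # P(C)
mu_c = {"i": 0.0, "e": 10.0}     # category means (arbitrary units)
sigma_c = {"i": 1.0, "e": 1.0}   # category (target) variability
sigma_s = 1.0                    # speech signal noise

def sample():
    c = rng.choice(categories, p=p_c)      # C ~ Binomial(p_C)
    t = rng.normal(mu_c[c], sigma_c[c])    # T | C ~ N(mu_C, sigma_C)
    s = rng.normal(t, sigma_s)             # S | T ~ N(T, sigma_S)
    return c, t, s

print(sample())
```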
Statistical Model
• Phonetic category 'c': target production T ~ N(μ_c, σ_c²)
• Speech signal noise: speech sound S ~ N(T, σ_S²)
• Need to infer the probability of a target T given the acoustic signal S
T: target acoustic signal
S: acoustic signal as heard by listener

T ~ N(μ_C, σ_C)
S|T ~ N(T, σ_S)

P(T|S) = ?

Bayes Rule:

P(T|S) = \frac{P(S|T)\,P(T)}{P(S)}

or, if S is the data and T is the hypothesis:

P(h|d) = \frac{P(d|h)\,P(h)}{P(d)}

where P(h|d) is the posterior, P(d|h) the likelihood, and P(h) the prior
Statistical Model
• Hypotheses, h: possible targets T
• Prior, p(h): the phonetic category 'c' distribution N(μ_c, σ_c²) over targets
• Data, d: the speech sound S
• Likelihood, p(d|h): the speech signal noise distribution N(T, σ_S²)
Bayes for Speech Perception
[Figure: prior (category distribution) and likelihood (noise distribution around the speech sound S) combine to give the posterior over targets T]

E[T|S,c] = \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}
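This is the standard posterior mean for a Gaussian prior combined with a Gaussian likelihood; a brief sketch of the derivation, in the model's notation:

P(T|S,c) \propto P(S|T)\,P(T|c) \propto \exp\left(-\frac{(S - T)^2}{2\sigma_S^2}\right)\exp\left(-\frac{(T - \mu_c)^2}{2\sigma_c^2}\right)

Completing the square in T shows that this product is itself a Gaussian in T:

T|S,c ~ N\left(\frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2},\ \frac{\sigma_c^2 \sigma_S^2}{\sigma_c^2 + \sigma_S^2}\right) (mean and variance)

So the inferred target is a weighted average of the heard signal S and the category mean μ_c, with the weights set by the two variances.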
Perceptual Warping
[Figure: actual stimulus S vs. inferred best-guess target T]
Multiple categories
• In real speech perception, the listener also has to infer the category C
• Marginalize over categories:

P(T|S) = \sum_C P(T|S,C)\,P(C|S)

where P(T|S,C) is the solution for a single category, and P(C|S) is the probability of category membership (calculated via further applications of Bayes Rule and marginalization)
Multiple Categories
[Figure: two category distributions and the speech sound S]

Within a single category: E[T|S,c] = \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}

Marginalizing over categories: E[T|S] = \sum_c \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}\,p(c|S)
Perceptual Warping
[Figure: perceptual warping under the multi-category model: actual stimuli vs. inferred targets]
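A minimal sketch of this computation in Python, showing how inferred targets are pulled toward category means. The two-category parameter values below are illustrative assumptions, not the fitted values from Feldman & Griffiths; numpy and scipy are assumed to be installed.

```python
# Sketch: perceptual warping under the multi-category model,
# E[T|S] = sum_c E[T|S,c] * p(c|S). Parameter values are illustrative only.
import numpy as np
from scipy.stats import norm

mu_c = np.array([0.0, 10.0])      # category means (hypothetical units)
sigma_c = 1.5                     # category (target) standard deviation
sigma_s = 1.0                     # speech signal noise standard deviation
prior_c = np.array([0.5, 0.5])    # P(c)

def expected_target(s):
    # p(c|S): marginally, S|c ~ N(mu_c, sigma_c^2 + sigma_s^2)
    like = norm.pdf(s, loc=mu_c, scale=np.sqrt(sigma_c**2 + sigma_s**2))
    p_c_given_s = like * prior_c / np.sum(like * prior_c)
    # E[T|S,c] for each category, then average weighted by p(c|S)
    e_t_given_c = (sigma_c**2 * s + sigma_s**2 * mu_c) / (sigma_c**2 + sigma_s**2)
    return np.sum(e_t_given_c * p_c_given_s)

stimuli = np.linspace(0, 10, 11)             # evenly spaced actual stimuli
percepts = np.array([expected_target(s) for s in stimuli])
print(np.round(percepts, 2))                 # percepts cluster toward 0 and 10
print(np.round(np.diff(percepts), 2))        # spacing shrinks near category means
```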
So far…
• The predictions of the model qualitatively match the pattern of the perceptual magnet effect
• But a benefit of these models is to make quantitative predictions
• Do the model's quantitative predictions match the degree of perceptual warping experienced by listeners?
Iverson & Kuhl (1995)
[Figure: /i/ and /ε/ stimuli (Iverson & Kuhl, 1995)]
• Human perceptual distance between stimuli estimated via discrimination & identification tasks
• Model perceptual distance between stimuli calculated with parameter values from previous empirical studies
Human vs. Model results: Modeling the /i/–/e/ data
[Figure: relative perceptual distances between neighboring stimuli (stimulus numbers 1–13), comparing human MDS estimates with model predictions]
• Good quantitative as well as qualitative fit between human data and model results
What have we learned?
• A Bayesian model of speech perception predicts the perceptual magnet effect
  • i.e. the perceptual magnet effect arises because listeners use their prior knowledge about likely target productions (within a given category) to inform what they think they heard
  • This pulls their beliefs closer to the category mean
• This model relies on the (still-controversial) assumption that listeners are trying to infer phonetic detail, not just phonemic categories
  • The success of this model provides support for that assumption
Reading: Feldman & Griffiths (2007)
• Notice a common structure:
  • Identify an empirical psycholinguistic phenomenon (or set of related phenomena)
  • Propose a model that could account for this phenomenon
  • Implement the model and test its predictions against the human data
Review
• Part I: A 2-variable model (category → signal) predicts the categorization function for a voicing distinction
  • Correctly predicts the relationship between category variance and categorization function slope
  • Definition of the normal/Gaussian distribution
  • Practice with Bayes Rule
• Interlude: Marr's levels of analysis
• Part II: Perceptual magnet effect
  • A 3-variable model correctly predicts the perceptual magnet effect, both qualitatively and quantitatively
  • Introduces the expectation of a probability distribution (i.e. the expected average if you took many draws from this distribution)
  • Evidence that listeners infer phonetic detail as well as phonemic category