Sound categorization
Dr. Emily Morgan
Sound categorization
• Hear an acoustic signal, recover the sound category
• Example: Distinguish between two stops which differ only in voicing, e.g.
  • /p/ vs. /b/
  • /t/ vs. /d/
Voice Onset Time (VOT) is the primary cue distinguishing voiced from voiceless stops
[Figure: VOT for /b/ vs. /p/ (Chen, 1980)]
Identification task (/ba/ vs. /pa/)
[Figure: percentage of /ba/ vs. /pa/ identifications as a function of voice onset time (msec), 0–70 ms]
• How do listeners categorize these acoustic signals?
What is the generative model for the production of the acoustic signal?
C: category (/b/ or /p/)
S: acoustic signal (in particular, the VOT)

C ~ Binomial(p_/b/)
S|C ~ N(μ_C, σ), where μ_C depends on the value of C
[Figure: probability density over VOT for the /b/ and /p/ categories]

Concrete example:
p_/b/ = 0.5
μ_/b/ = 0
μ_/p/ = 50
σ = 12
Detour: How do we actually calculate the normal distribution?
• It's defined by the equation:

P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
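As a quick check, here is a minimal Python sketch of that density formula, using the concrete example parameters above (μ = 0, σ = 12); numpy and scipy are assumed to be installed, and scipy is used only to confirm the hand-coded version.

```python
# Minimal sketch: the normal density computed directly from the formula,
# checked against scipy.stats.norm.
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """P(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# e.g. density of a VOT of 27 ms under the /b/ category (mu = 0, sigma = 12)
print(normal_pdf(27, mu=0, sigma=12))   # ~0.0026
print(norm.pdf(27, loc=0, scale=12))    # same value
```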
Suppose I hear a sound with VOT of 27ms. Which category does it belong to?
P(C = /b/ | S = 27) = ?

Bayes Rule:

P(C|S) = \frac{P(S|C)\,P(C)}{P(S)} = \frac{P(S|C)\,P(C)}{\sum_C P(S|C)\,P(C)}

P(C = /b/ | S = 27) = \frac{P(27|/b/)\,P(/b/)}{P(27|/b/)\,P(/b/) + P(27|/p/)\,P(/p/)} = \frac{0.0026 \times 0.5}{0.0026 \times 0.5 + 0.0053 \times 0.5} \approx 0.33

where:
C ~ Binomial(p_/b/), with p_/b/ = 0.5, so P(C = /b/) = 0.5
S|C ~ N(μ_C, σ), with μ_/b/ = 0, μ_/p/ = 50, σ = 12

P(S|C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(S - \mu_C)^2}{2\sigma^2}\right)

P(27|/b/) = \frac{1}{\sqrt{2\pi \cdot 12^2}} \exp\left(-\frac{(27 - 0)^2}{2 \cdot 12^2}\right) \approx 0.0026
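A minimal sketch of the same calculation in Python, using the concrete example parameters above (p_/b/ = 0.5, μ_/b/ = 0, μ_/p/ = 50, σ = 12); scipy is assumed to be installed.

```python
# Posterior probability that a VOT of 27 ms came from /b/, via Bayes Rule.
from scipy.stats import norm

p_b, p_p = 0.5, 0.5            # priors P(C = /b/), P(C = /p/)
mu_b, mu_p, sigma = 0.0, 50.0, 12.0
vot = 27.0

like_b = norm.pdf(vot, loc=mu_b, scale=sigma)   # P(27 | /b/) ~ 0.0026
like_p = norm.pdf(vot, loc=mu_p, scale=sigma)   # P(27 | /p/) ~ 0.0053

posterior_b = (like_b * p_b) / (like_b * p_b + like_p * p_p)
print(round(posterior_b, 2))                    # ~0.33
```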
Plotting P(C=/b/|S) for different values of S gives us a categorization function
• The model's predicted categorization function has the same shape as the human categorization data
• Evidence that our model could be representing the way humans do this categorization
Benefits of computational modeling
• Our model makes precise, numeric predictions about how often listeners should categorize an ambiguous stimulus as /b/ vs. /p/
• What further predictions does our model make?
Prediction: Categorization function slope changes with category variance
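A minimal sketch of this prediction: sweep VOT along the continuum and compare the categorization function under a smaller vs. larger category standard deviation. The specific σ values (8 ms and 16 ms) are illustrative assumptions, not the stimulus parameters from any study; equal priors and equal variances for the two categories are also assumed.

```python
# Sketch: categorization function P(/b/ | VOT) for two category variances.
import numpy as np
from scipy.stats import norm

mu_b, mu_p = 0.0, 50.0
vots = np.linspace(-20, 80, 101)

def p_b_given_vot(vot, sigma):
    like_b = norm.pdf(vot, loc=mu_b, scale=sigma)
    like_p = norm.pdf(vot, loc=mu_p, scale=sigma)
    return like_b / (like_b + like_p)        # equal priors assumed

narrow = p_b_given_vot(vots, sigma=8.0)      # smaller variance -> steeper function
wide = p_b_given_vot(vots, sigma=16.0)       # larger variance -> shallower function

# Compare the slopes near the category boundary (VOT = 25 ms)
print(np.gradient(narrow, vots)[np.argmin(abs(vots - 25))])
print(np.gradient(wide, vots)[np.argmin(abs(vots - 25))])
```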
Clayards (2008) tested this prediction
Clayards (2008)
• Participants were familiarized with synthesized audio (a "beach"–"peach" continuum), with VOTs drawn from either a broader or a narrower distribution
• Then tested on categorizing VOTs across the whole continuum
Clayards (2008): Results
[Figures: predicted and observed categorization functions for the broader- and narrower-variance conditions]
So far…
• Modeled phoneme categories as Gaussian/normal distributions over VOTs
• Calculated the exact posterior probability that a sound belongs to a particular category (using Bayes Rule)
• Predicted that the categorization function should become steeper if category variance is smaller
• Confirmed our prediction
Interlude: Marr's (1982) levels of analysis for cognitive models
• Computational level
  • What is the structure of the information processing problem?
  • What are the inputs and outputs?
  • What information is relevant to solving the problem?
• Algorithmic level
  • What representations and algorithms are used?
• Implementational level
  • How are the representations and algorithms implemented neurally?
• These levels are mutually constraining, and all part of fully understanding cognition
Perceptual magnet effect
• Empirical work by Iverson & Kuhl (1995)
• Modeling work by Feldman & Griffiths (2007)
Perceptual magnet effect
[Figure: actual stimuli vs. perceived stimuli for the /i/ and /ε/ categories (Iverson & Kuhl, 1995)]
To account for this, we need a new generative model for speech perception

This is the perceptual magnet effect.
• Why does it occur?
• To answer, we'll need a slightly more complicated generative model
C: category (e.g. /i/)
T: target acoustic signal
S: acoustic signal as heard by listener (S differs from T because of noise in the signal)

Assumption: Listener infers not just P(C|S) but also P(T|S)
C: category
T: target acoustic signal
S: acoustic signal as heard by listener

C ~ Binomial(p_C)
T|C ~ N(μ_C, σ_C)
S|T ~ N(T, σ_S)

Let's start by considering a case where we know the category C (or equivalently, where there's only one category)
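A minimal sketch of this three-variable generative model as a sampler; the category names and parameter values below are placeholders for illustration, not the values used by Feldman & Griffiths.

```python
# Sketch of the generative model C -> T -> S: sample a category, a target
# production, then a noisy speech sound. Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

categories = ["i", "e"]          # hypothetical two-vowel example
p_c = [0.5, 0.5]                 # P(C)
mu_c = {"i": 0.0, "e": 10.0}     # category means (arbitrary units)
sigma_c = {"i": 1.0, "e": 1.0}   # category (target) variability
sigma_s = 1.0                    # speech signal noise

def sample():
    c = rng.choice(categories, p=p_c)      # C ~ Binomial(p_C)
    t = rng.normal(mu_c[c], sigma_c[c])    # T | C ~ N(mu_C, sigma_C)
    s = rng.normal(t, sigma_s)             # S | T ~ N(T, sigma_S)
    return c, t, s

print(sample())
```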
Statistical Model
• Phonetic category 'c': target production T ~ N(μ_c, σ_c²)
• Speech signal noise: speech sound S ~ N(T, σ_S²)
• Need to infer the probability of a target T given the acoustic signal S
T: target acoustic signal
S: acoustic signal as heard by listener

T ~ N(μ_C, σ_C)
S|T ~ N(T, σ_S)

P(T|S) = ?

Bayes Rule:

P(T|S) = \frac{P(S|T)\,P(T)}{P(S)}

or, if S is the data and T is the hypothesis:

P(h|d) = \frac{P(d|h)\,P(h)}{P(d)}

where P(h|d) is the posterior, P(d|h) the likelihood, and P(h) the prior
Statistical Model
• Hypotheses, h: possible targets T
• Prior, p(h): the phonetic category 'c' distribution N(μ_c, σ_c²) over targets
• Data, d: the speech sound S
• Likelihood, p(d|h): the speech signal noise distribution N(T, σ_S²)
Bayes for Speech Perception
[Figure: prior (category distribution) and likelihood (noise distribution around the speech sound S) combine to give the posterior over targets T]

E[T|S,c] = \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}
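This is the standard posterior mean for a Gaussian prior combined with a Gaussian likelihood; a brief sketch of the derivation, in the model's notation:

P(T|S,c) \propto P(S|T)\,P(T|c) \propto \exp\left(-\frac{(S - T)^2}{2\sigma_S^2}\right)\exp\left(-\frac{(T - \mu_c)^2}{2\sigma_c^2}\right)

Completing the square in T shows that this product is itself a Gaussian in T:

T|S,c ~ N\left(\frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2},\ \frac{\sigma_c^2 \sigma_S^2}{\sigma_c^2 + \sigma_S^2}\right) (mean and variance)

So the inferred target is a weighted average of the heard signal S and the category mean μ_c, with the weights set by the two variances.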
Perceptual Warping
[Figure: actual stimulus S vs. inferred best-guess target T]
Multiple categories
• In real speech perception, the listener also has to infer the category C
• Marginalize over categories:

P(T|S) = \sum_C P(T|S,C)\,P(C|S)

where P(T|S,C) is the solution for a single category, and P(C|S) is the probability of category membership (calculated via further applications of Bayes Rule and marginalization)
Multiple Categories
[Figure: two category distributions and the speech sound S]

Within a single category: E[T|S,c] = \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}

Marginalizing over categories: E[T|S] = \sum_c \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}\,p(c|S)
Perceptual Warping
[Figure: perceptual warping under the multi-category model: actual stimuli vs. inferred targets]
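A minimal sketch of this computation in Python, showing how inferred targets are pulled toward category means. The two-category parameter values below are illustrative assumptions, not the fitted values from Feldman & Griffiths; numpy and scipy are assumed to be installed.

```python
# Sketch: perceptual warping under the multi-category model,
# E[T|S] = sum_c E[T|S,c] * p(c|S). Parameter values are illustrative only.
import numpy as np
from scipy.stats import norm

mu_c = np.array([0.0, 10.0])      # category means (hypothetical units)
sigma_c = 1.5                     # category (target) standard deviation
sigma_s = 1.0                     # speech signal noise standard deviation
prior_c = np.array([0.5, 0.5])    # P(c)

def expected_target(s):
    # p(c|S): marginally, S|c ~ N(mu_c, sigma_c^2 + sigma_s^2)
    like = norm.pdf(s, loc=mu_c, scale=np.sqrt(sigma_c**2 + sigma_s**2))
    p_c_given_s = like * prior_c / np.sum(like * prior_c)
    # E[T|S,c] for each category, then average weighted by p(c|S)
    e_t_given_c = (sigma_c**2 * s + sigma_s**2 * mu_c) / (sigma_c**2 + sigma_s**2)
    return np.sum(e_t_given_c * p_c_given_s)

stimuli = np.linspace(0, 10, 11)             # evenly spaced actual stimuli
percepts = np.array([expected_target(s) for s in stimuli])
print(np.round(percepts, 2))                 # percepts cluster toward 0 and 10
print(np.round(np.diff(percepts), 2))        # spacing shrinks near category means
```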
So far…
• The predictions of the model qualitatively match the pattern of the perceptual magnet effect
• But a benefit of these models is to make quantitative predictions
• Do the model's quantitative predictions match the degree of perceptual warping experienced by listeners?
Iverson & Kuhl (1995)
[Figure: /i/ and /ε/ stimuli (Iverson & Kuhl, 1995)]
• Human perceptual distance between stimuli estimated via discrimination & identification tasks
• Model perceptual distance between stimuli calculated with parameter values from previous empirical studies
Human vs. Model results: Modeling the /i/–/e/ data
[Figure: relative perceptual distances between neighboring stimuli (stimulus numbers 1–13), comparing human MDS estimates with model predictions]
• Good quantitative as well as qualitative fit between human data and model results
What have we learned?
• A Bayesian model of speech perception predicts the perceptual magnet effect
  • i.e. the perceptual magnet effect arises because listeners use their prior knowledge about likely target productions (within a given category) to inform what they think they heard
  • This pulls their beliefs closer to the category mean
• This model relies on the (still-controversial) assumption that listeners are trying to infer phonetic detail, not just phonemic categories
  • The success of this model provides support for that assumption
Reading: Feldman & Griffiths (2007)
• Notice a common structure:
  • Identify an empirical psycholinguistic phenomenon (or set of related phenomena)
  • Propose a model that could account for this phenomenon
  • Implement the model and test its predictions against the human data
Review
• Part I: A 2-variable model (category → signal) predicts the categorization function for a voicing distinction
  • Correctly predicts the relationship between category variance and categorization function slope
  • Definition of the normal/Gaussian distribution
  • Practice with Bayes Rule
• Interlude: Marr's levels of analysis
• Part II: Perceptual magnet effect
  • A 3-variable model correctly predicts the perceptual magnet effect, both qualitatively and quantitatively
  • Introduces the expectation of a probability distribution (i.e. the expected average if you took many draws from this distribution)
  • Evidence that listeners infer phonetic detail as well as phonemic category