Perception & Cognition, One at last in Spoken Word Recognition Temporal Integration at Two Time Scales Bob McMurray University of Iowa Dept. of Psychology 5/9/05 Cochlear Implant Team UIHC

Perception & Cognition, One at last in Spoken Word Recognition



Page 1: Perception & Cognition, One at last in Spoken Word Recognition

Perception & Cognition, One at last in Spoken Word Recognition

Temporal Integration at Two Time Scales

Bob McMurray
University of Iowa, Dept. of Psychology

5/9/05, Cochlear Implant Team, UIHC

Page 2: Perception & Cognition, One at last in Spoken Word Recognition

Collaborators

Richard Aslin, Michael Tanenhaus, David Gow

Joe Toscano, Dana Subik, Julie Markant

Page 3: Perception & Cognition, One at last in Spoken Word Recognition

Perceptual processes
Continuous acoustic detail

critical for

High-level language processes:

• Word Recognition
• Syntax
• Reference

Specifically:

Sensitivity to fine-grained perceptual detail can help integrate information over time.


Page 5: Perception & Cognition, One at last in Spoken Word Recognition

Perceptual processes
Continuous acoustic detail

provide support for

High-level language Processes:

• Word Recognition

Page 6: Perception & Cognition, One at last in Spoken Word Recognition

Ganong (1980): Lexical information biases perception of ambiguous phonemes.

[Figure: % /t/ responses along a d/t continuum for duke/tuke vs. doot/toot.]

Phoneme Restoration (Warren, 1970; Samuel, 1997).

Lexical Feedback: McClelland & Elman (1988); Magnuson, McMurray, Tanenhaus & Aslin (2003)

Page 7: Perception & Cognition, One at last in Spoken Word Recognition

Ganong (1980): Lexical information biases perception of ambiguous phonemes.

Lexical Feedback: McClelland & Elman (1988); Magnuson, McMurray, Tanenhaus & Aslin (2003)

phonemes

words


Page 9: Perception & Cognition, One at last in Spoken Word Recognition

Perceptual processes
Continuous acoustic detail

provide support for

High-level language processes:

• Word Recognition

Invariance, Covariance & Temporal Integration

• Short-term storage.
• Covariance.
• Limit sensitivity to necessary detail.

Page 10: Perception & Cognition, One at last in Spoken Word Recognition

In language, information arrives sequentially.

• Partial syntactic and semantic representations are formed as words arrive.

The Eastside is prettier than the Westside.

• Words are identified over sequential phonemes.

əŋ

Page 11: Perception & Cognition, One at last in Spoken Word Recognition

Spoken Word Recognition is an ideal arena in which to study these issues because:

• Research divides word recognition into perceptual and cognitive mechanisms.

• Perceptual information available for temporal information integration.

• Cognitive architectures may support perception.

Page 12: Perception & Cognition, One at last in Spoken Word Recognition

Scales of temporal integration in word recognition

• A word: ordered series of articulations.
  - Build abstract representations.
  - Form expectations about future events.
  - Fast (online) processing.

• A phonology:
  - Abstract across utterances.
  - Expectations about possible future events.
  - Slow (developmental) processing.

Page 13: Perception & Cognition, One at last in Spoken Word Recognition

Mechanisms of Temporal Integration

Stimuli do not change arbitrarily.

Perceptual cues reveal something about the change itself.

Active integration:
• Anticipating future events.
• Retain partial present representations.
• Resolve prior ambiguity.

Page 14: Perception & Cognition, One at last in Spoken Word Recognition

Representational Medium: Lexical Activation

Lexical activation shows:

• Online processing dynamics.

• Sensitivity to fine-grained detail.

• Integration of asynchronous material.

Page 15: Perception & Cognition, One at last in Spoken Word Recognition

Overview

1) Speech perception and Spoken Word Recognition.

2) Lexical activation is sensitive to fine-grained detail in speech.

3) Fast temporal integration: taking advantage of regularity in the signal for temporal integration.

4) Slow temporal integration: Developmental consequences

Page 16: Perception & Cognition, One at last in Spoken Word Recognition

Online Word Recognition

• Information arrives sequentially.

• At early points in time, signal is temporarily ambiguous.

• Later arriving information disambiguates the word.

[Figure: as "ba…" unfolds into "bakery", the candidates basic, barrier, barricade, bait and baby are crossed out.]

Page 17: Perception & Cognition, One at last in Spoken Word Recognition

Current models of spoken word recognition

• Immediacy: Hypotheses formed from the earliest moments of input.

• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.

• Parallel Processing: Multiple items are active in parallel.

• Competition: Items compete with each other for recognition.
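
The four properties above can be illustrated with a toy sketch. It is not the implementation of any published model: letters stand in for phonemes, the lexicon is borrowed from example words in this talk, and the function name `recognize` and the `inhibition` constant are hypothetical choices made only for illustration.

```python
def recognize(input_segments, lexicon, inhibition=0.1):
    """Toy activation/competition loop over a sequentially arriving input.

    Letters stand in for phonemes (an assumption for illustration).
    Returns one activation snapshot (dict word -> activation) per segment.
    """
    activation = {word: 0.0 for word in lexicon}
    history = []
    for position, segment in enumerate(input_segments):
        # Immediacy + activation: every word matching the current
        # segment at this position gains activation right away.
        for word in lexicon:
            if position < len(word) and word[position] == segment:
                activation[word] += 1.0
        # Parallel processing + competition: each word is inhibited in
        # proportion to the summed activation of all other words.
        total = sum(activation.values())
        activation = {
            word: max(0.0, act - inhibition * (total - act))
            for word, act in activation.items()
        }
        history.append(dict(activation))
    return history


lexicon = ["butter", "bump", "putter", "beach", "dog"]
history = recognize("butter", lexicon)
final = history[-1]
winner = max(final, key=final.get)
```

After the first segment, "butter", "bump" and "beach" are equally active; later segments separate them, and "putter" gains partial activation from its overlapping rhyme.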

Page 18: Perception & Cognition, One at last in Spoken Word Recognition

time

Input: b... u… tt… e… r

beach

bump putter

dog

butter

Page 19: Perception & Cognition, One at last in Spoken Word Recognition

These processes have been well defined for a phonemic representation of the input.

Considerably less ambiguity if we consider subphonemic information.

• Bonus: processing dynamics may solve problems in speech perception.

Example: subphonemic effects of motor processes.

Page 20: Perception & Cognition, One at last in Spoken Word Recognition

Coarticulation

Sensitivity to these perceptual details might yield earlier disambiguation.

Lexical activation could store these perceptual details.

Example: Coarticulation
Articulation (lips, tongue…) reflects current, future and past events.

Subtle subphonemic variation in speech reflects temporal organization.

[Figure: "net" vs. "neck".]

Any action reflects future actions as it unfolds.

Page 21: Perception & Cognition, One at last in Spoken Word Recognition

These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.

Example: Categorical Perception

Page 22: Perception & Cognition, One at last in Spoken Word Recognition

Categorical Perception

Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).

• Sharp identification of tokens on a continuum.

• Discrimination poor within a phonetic category.

[Figure: identification (% /pa/, 0 to 100) and discrimination plotted against VOT, from B to P.]

Page 23: Perception & Cognition, One at last in Spoken Word Recognition

Evidence against the strong form of Categorical Perception from psychophysical-type tasks:

Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)

Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)

Goodness ratings: Miller (1997); Massaro & Cohen (1983)

Page 24: Perception & Cognition, One at last in Spoken Word Recognition

Speech Perception
• Acoustics -> phonemes
• Perceptual processes (e.g. templates)

Word Recognition
• Phonemes -> words
• Cognitive processes (e.g. competition, activation)

[Figure: acoustic input feeds sublexical units (/b/, /p/, /l/, /a/, /la/, /ip/), which feed the lexicon.]

Fundamental independence of fields.

Enabled by CP.

Evidence against CP seen to support paradigm.

Page 25: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 1

Does within-category acoustic detail systematically affect higher level language?

Is there a gradient effect of subphonemic detail on lexical activation?

Page 26: Perception & Cognition, One at last in Spoken Word Recognition

A gradient relationship would yield systematic effects of subphonemic information on lexical activation.

If this gradiency is useful for temporal integration, it must be preserved over time.

Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.

McMurray, Aslin & Tanenhaus (2002)

Page 27: Perception & Cognition, One at last in Spoken Word Recognition

Use a speech continuum: more steps yield a better picture of the acoustic mapping.

KlattWorks: generate synthetic continua from natural speech.

Acoustic Detail

9-step VOT continua (0-40 ms)

6 pairs of words:
beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter

6 fillers:
lamp, leg, lock, ladder, lip, leaf
shark, shell, shoe, ship, sheep, shirt

Page 28: Perception & Cognition, One at last in Spoken Word Recognition
Page 29: Perception & Cognition, One at last in Spoken Word Recognition

How do we tap on-line recognition?
With an on-line task: Eye-movements

Subjects hear spoken language and manipulate objects in a visual world.

Visual world includes set of objects with interesting linguistic properties.

a beach, a peach and some unrelated items.

Eye-movements to each object are monitored throughout the task.

Temporal Dynamics

Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995

Page 30: Perception & Cognition, One at last in Spoken Word Recognition

Why use eye-movements and visual world paradigm?

• Relatively natural task.

• Eye-movements generated very fast (within 200 ms of first bit of information).

• Eye movements time-locked to speech.

• Subjects aren’t aware of eye-movements.

• Fixation probability maps onto lexical activation.
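
As a rough illustration of how fixation probability can be read off eye-movement records, the sketch below computes, for each time bin, the proportion of trials on which each object was being fixated. The data layout (a list of `(start_ms, end_ms, object)` tuples per trial), the bin size and the function name are all assumptions made for this example, not the analysis pipeline used in the talk.

```python
def fixation_proportions(trials, objects, bin_ms=100, end_ms=2000):
    """Proportion of trials fixating each object at the start of each bin.

    trials:  list of trials; each trial is a list of
             (start_ms, end_ms, object) fixations, time-locked to word onset.
    objects: object labels to tabulate (e.g. target, competitor, unrelated).
    """
    n_bins = end_ms // bin_ms
    counts = {obj: [0] * n_bins for obj in objects}
    for fixations in trials:
        for b in range(n_bins):
            t = b * bin_ms
            for start, end, obj in fixations:
                # At most one fixation is in progress at time t.
                if start <= t < end and obj in counts:
                    counts[obj][b] += 1
                    break
    return {obj: [c / len(trials) for c in col] for obj, col in counts.items()}


# Two invented trials: both end on "bear", one passes through "pear".
trials = [
    [(0, 300, "lamp"), (300, 2000, "bear")],
    [(0, 500, "pear"), (500, 2000, "bear")],
]
props = fixation_proportions(trials, ["bear", "pear", "lamp"], end_ms=1000)
```

Plotting each object's list against time gives curves of the kind shown on the results slides, with the competitor curve sitting between the target and the unrelated items.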

Page 31: Perception & Cognition, One at last in Spoken Word Recognition

A moment to view the items

Task

Page 32: Perception & Cognition, One at last in Spoken Word Recognition
Page 33: Perception & Cognition, One at last in Spoken Word Recognition

Task

Bear

Repeat 1080 times

Page 34: Perception & Cognition, One at last in Spoken Word Recognition

Identification Results

High agreement across subjects and items for category boundary.

By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms

[Figure: proportion /p/ responses (0 to 1) as a function of VOT (0 to 40 ms), from B to P.]
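
One simple way to estimate a category boundary from identification data is to find the VOT at which the proportion of /p/ responses crosses 0.5. The sketch below uses linear interpolation between continuum steps; the response proportions are invented for illustration, and the talk does not say which estimation method was actually used.

```python
def category_boundary(vots, prop_p):
    """VOT at which proportion /p/ first crosses 0.5 (linear interpolation).

    Returns None if the identification function never crosses 0.5.
    """
    points = list(zip(vots, prop_p))
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return v0 + (0.5 - p0) * (v1 - v0) / (p1 - p0)
    return None


# Hypothetical identification data for a 9-step, 0-40 ms continuum.
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
prop_p = [0.02, 0.03, 0.05, 0.30, 0.80, 0.95, 0.97, 0.98, 0.99]
boundary = category_boundary(vots, prop_p)  # falls between the 15 and 20 ms steps
```

Running the same function per subject or per item would give the kind of by-subject and by-item boundary estimates reported on this slide.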

Page 35: Perception & Cognition, One at last in Spoken Word Recognition

Task

Target = Bear

Competitor = Pear

Unrelated = Lamp, Ship

[Figure: fixations on trials 1 to 5 laid out over time (200 ms scale) and converted to % fixations.]

Page 36: Perception & Cognition, One at last in Spoken Word Recognition

Task

More looks to competitor than unrelated items.

[Figure: fixation proportion (0 to 0.9) over time (0 to 2000 ms), one panel for VOT = 0 and one for VOT = 40.]

Page 37: Perception & Cognition, One at last in Spoken Word Recognition

Task

Given that
• the subject heard bear
• clicked on “bear”…

How often was the subject looking at the “pear”?

[Figure: schematic fixation proportions over time for target and competitor under two predictions: Categorical Results vs. Gradient Effect.]

Page 38: Perception & Cognition, One at last in Spoken Word Recognition

Results

Long-lasting gradient effect: seen throughout the timecourse of processing.

[Figure: competitor fixations (0 to 0.16) by time since word onset (0 to 2000 ms), one panel per response; VOT steps of 0, 5, 10 and 15 ms in one panel and 20, 25, 30, 35 and 40 ms in the other.]

Page 39: Perception & Cognition, One at last in Spoken Word Recognition

Area under the curve:

Clear effects of VOT: B: p=.017*, P: p<.001***
Linear trend: B: p=.023*, P: p=.002***

[Figure: competitor fixations (0.02 to 0.08) as a function of VOT (0 to 40 ms), split by response at the category boundary.]

Page 40: Perception & Cognition, One at last in Spoken Word Recognition

Unambiguous Stimuli Only

Clear effects of VOT: B: p=.014*, P: p=.001***
Linear trend: B: p=.009**, P: p=.007**

[Figure: competitor fixations (0.02 to 0.08) as a function of VOT (0 to 40 ms), split by response at the category boundary.]

Page 41: Perception & Cognition, One at last in Spoken Word Recognition

Summary

Subphonemic acoustic differences in VOT have gradient effect on lexical activation.

• Gradient effect of VOT on looks to the competitor.

• Seems to be long-lasting.

• Effect holds even for unambiguous stimuli.

Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

Page 42: Perception & Cognition, One at last in Spoken Word Recognition

The Proposed Framework

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

2) Acoustic detail is represented as gradations in activation across the lexicon.

3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.

4) This has fundamental consequences for development: learning phonological organization.

Sensitivity & Use

Page 43: Perception & Cognition, One at last in Spoken Word Recognition

Lexical Sensitivity

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

Voicing Laterality, Manner, Place Natural Speech

X Metalinguistic Tasks

[Figure: display with B, P, L and Sh items; spoken word: Bear.]

Page 44: Perception & Cognition, One at last in Spoken Word Recognition

Lexical Sensitivity

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

[Figure: competitor fixations (0 to 0.1) as a function of VOT (0 to 40 ms), split at the category boundary; curves labeled "Response=B, Looks to B" and "Response=P, Looks to B".]

Voicing Laterality, Manner, Place Natural Speech

X Metalinguistic Tasks


Page 46: Perception & Cognition, One at last in Spoken Word Recognition

Lexical Sensitivity

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

Voicing Laterality, Manner, Place Natural Speech

X Metalinguistic Tasks

? Non-minimal pairs
? Duration of effect (experiment 1)

Page 47: Perception & Cognition, One at last in Spoken Word Recognition

2) Acoustic detail is represented as gradations in activation across the lexicon.

time

Input: b... u… m… p…

bun

bumper

pump

dump

bump

bomb

Page 48: Perception & Cognition, One at last in Spoken Word Recognition

3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.

Regressive ambiguity resolution (exp 2):
• Ambiguity retained until more information arrives.

Progressive expectation building (exp 3):
• Phonetic distinctions are spread over time.
• Anticipate upcoming material.

Temporal Integration

Page 49: Perception & Cognition, One at last in Spoken Word Recognition

4) Consequences for development: learning phonological organization.

Learning a language:
• Integrating input across many utterances to build long-term representation.

Sensitivity to subphonemic detail (exp 4 & 5).
• Allows statistical learning of categories (model).

Development

Page 50: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 2

How long are gradient effects of within-category detail maintained?

Can subphonemic variation play a role in ambiguity resolution?

How is information at multiple levels integrated?

Page 51: Perception & Cognition, One at last in Spoken Word Recognition

Misperception

What if initial portion of a stimulus was misperceived?

• Competitor still active: easy to activate it the rest of the way.

• Competitor completely inactive: system will “garden-path”.

P(misperception) varies with distance from boundary.

Gradient activation allows the system to hedge its bets.

Page 52: Perception & Cognition, One at last in Spoken Word Recognition

time

Input: p/b eI r ə k i t…

parakeet

barricade

Categorical Lexicon

barricade vs. parakeet

parakeet

barricade

Gradient Sensitivity

/ beIrəkeId / vs. / peIrəkit /

Page 53: Perception & Cognition, One at last in Spoken Word Recognition

Methods

10 pairs of voiced/voiceless items (b/p and d/t).

Voiced        Voiceless      Overlap
Bumpercar     Pumpernickel   6
Barricade     Parakeet       5
Bassinet      Passenger      5
Blanket       Plankton       5
Beachball     Peachpit       4
Billboard     Pillbox        4
Drain Pipes   Train Tracks   4
Dreadlocks    Treadmill      4
Delaware      Telephone      4
Delicatessen  Television     4

Page 54: Perception & Cognition, One at last in Spoken Word Recognition

X

Page 55: Perception & Cognition, One at last in Spoken Word Recognition

Eye Movement Results

Barricade -> Parricade

Faster activation of target as VOTs near lexical endpoint.

--Even within the non-word range.

[Figure: fixations to target (0 to 1) over time (300 to 900 ms), one curve per VOT step (0 to 35 ms).]

Page 56: Perception & Cognition, One at last in Spoken Word Recognition

Eye Movement Results

Barricade -> Parricade
Parakeet -> Barakeet

Faster activation of target as VOTs near lexical endpoint.

--Even within the non-word range.

[Figure: fixations to target (0 to 1) over time, one curve per VOT step (0 to 35 ms); Barricade panel spans 300 to 900 ms, Parakeet panel 300 to 1200 ms.]

Page 57: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 2 Conclusions

Gradient effect of within-category variation without minimal-pairs.

Gradient effect long-lasting: mean POD = 240 ms.

Regressive ambiguity resolution:

• Subphonemic gradations maintained until more information arrives.

• Subphonemic gradation can improve (or hinder) recovery from garden path.

Page 58: Perception & Cognition, One at last in Spoken Word Recognition

Progressive Expectation Formation

Can within-category detail be used to predict future acoustic/phonetic events?

Yes: Phonological regularities create systematic within-category variation.

• Predicts future events.

Page 59: Perception & Cognition, One at last in Spoken Word Recognition

time

Input: m… a… rr… oo… ng… g… oo… s…

maroon

goose

goat

duck

Experiment 3: Anticipation

Word-final coronal consonants (n, t, d) assimilate the place of the following segment.

Maroong Goose
Maroon Duck

Place assimilation -> ambiguous segments: anticipate upcoming material.

Page 60: Perception & Cognition, One at last in Spoken Word Recognition

Subject hears
• “select the maroon duck”
• “select the maroon goose”
• “select the maroong goose”
• “select the maroong duck” *

We should see faster eye-movements to “goose” after assimilated consonants.

Page 61: Perception & Cognition, One at last in Spoken Word Recognition

Results

Looks to “goose” as a function of time

Anticipatory effect on looks to non-coronal.

[Figure: fixation proportion (0 to 0.9) over time (0 to 600 ms) for assimilated vs. non-assimilated tokens; marker at onset of “goose” + oculomotor delay.]

Page 62: Perception & Cognition, One at last in Spoken Word Recognition

Looks to “duck” as a function of time

Inhibitory effect on looks to coronal (duck, p=.024)

[Figure: fixation proportion (0 to 0.3) over time (0 to 600 ms) for assimilated vs. non-assimilated tokens; marker at onset of “goose” + oculomotor delay.]

Page 63: Perception & Cognition, One at last in Spoken Word Recognition

Sensitivity to subphonemic detail:
• Increase priors on likely upcoming events.
• Decrease priors on unlikely upcoming events.
• Active Temporal Integration Process.

Occasionally assimilation creates ambiguity
• Resolves prior ambiguity: mudg drinker
• Similar to experiment 2…

• Progressive effect delayed 200 ms by lexical competition.

Page 64: Perception & Cognition, One at last in Spoken Word Recognition

Adult Summary

Lexical activation is exquisitely sensitive to within-category detail.

This sensitivity is useful to integrate material over time.

• Regressive ambiguity resolution.
• Progressive facilitation.

Underpins a potentially lexical role in speech perception.

Page 65: Perception & Cognition, One at last in Spoken Word Recognition

Historically, work in speech perception has been linked to development.

Sensitivity to subphonemic detail must revise our view of development.

Development

Use: Infants face additional temporal integration problems

No lexicon available to clean up noisy input: rely on acoustic regularities.

Extracting a phonology from the series of utterances.

Page 66: Perception & Cognition, One at last in Spoken Word Recognition

Sensitivity to subphonemic detail:

For 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g. Eimas, Siqueland, Jusczyk & Vigorito, 1971).

Exception: Miller & Eimas (1996).
• Only at extreme VOTs.
• Only when habituated to non-prototypical token.

Page 67: Perception & Cognition, One at last in Spoken Word Recognition

Nonetheless, infants possess abilities that would require within-category sensitivity.

• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).

• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).

Use?

Page 68: Perception & Cognition, One at last in Spoken Word Recognition

Statistical Category Learning

Speech production causes clustering along contrastive phonetic dimensions.

E.g. Voicing / Voice Onset Time
B: VOT ~ 0
P: VOT ~ 40

Within a category, VOT forms Gaussian distribution.

Result: Bimodal distribution

[Figure: bimodal distribution along VOT, with modes at 0 ms and 40 ms.]
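
A small simulation makes the bimodal claim concrete. It assumes two equally likely categories with means at 0 and 40 ms (from the slide) and a shared standard deviation of 5 ms (an invented value); the helper names are hypothetical.

```python
import random


def sample_vots(n, rng, means=(0.0, 40.0), sd=5.0):
    """Draw n VOTs from an equal mixture of two Gaussian categories.

    The means follow the slide; the standard deviation is an assumption.
    """
    return [rng.gauss(rng.choice(means), sd) for _ in range(n)]


def histogram(values, lo=-20, hi=60, width=10):
    """Count values in consecutive bins [lo, lo+width), ..., up to hi."""
    bins = [0] * ((hi - lo) // width)
    for v in values:
        if lo <= v < hi:
            bins[int((v - lo) // width)] += 1
    return bins


rng = random.Random(0)  # fixed seed so the example is repeatable
hist = histogram(sample_vots(2000, rng))
# Bins: [-20,-10) [-10,0) [0,10) [10,20) [20,30) [30,40) [40,50) [50,60)
```

The counts peak in the bins around 0 ms and 40 ms and dip sharply in the 10 to 30 ms range, which is the trough a distributional learner could use to separate the two categories.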

Page 69: Perception & Cognition, One at last in Spoken Word Recognition

To statistically learn speech categories, infants must:

• Record frequencies of tokens at each value along a stimulus dimension.

• Extract categories from the distribution.

• This requires ability to track specific VOTs.

[Figure: frequency of tokens along VOT (0 ms to 50 ms), with +voice and -voice modes.]

Page 70: Perception & Cognition, One at last in Spoken Word Recognition

Why no demonstrations of sensitivity?

• Habituation
  - Discrimination not ID.
  - Possible selective adaptation.
  - Possible attenuation of sensitivity.

• Synthetic speech
  - Not ideal for infants.

• Single exemplar/continuum
  - Not necessarily a category representation.

Experiment 4: Reassess issue with improved methods.

Page 71: Perception & Cognition, One at last in Spoken Word Recognition

Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)

Infants exposed to a chunk of language:

• Words in running speech.

• Stream of continuous speech (à la statistical learning paradigm).

• Word list.

Memory for exposed items (or abstractions) assessed:
• Compare listening time between consistent and inconsistent items.

HTPP

Page 72: Perception & Cognition, One at last in Spoken Word Recognition

Test trials start with all lights off.

Page 73: Perception & Cognition, One at last in Spoken Word Recognition

Center Light blinks.

Page 74: Perception & Cognition, One at last in Spoken Word Recognition

Brings infant’s attention to center.

Page 75: Perception & Cognition, One at last in Spoken Word Recognition

One of the side-lights blinks.

Page 76: Perception & Cognition, One at last in Spoken Word Recognition

When infant looks at side-light……he hears a word

Beach… Beach… Beach…

Page 77: Perception & Cognition, One at last in Spoken Word Recognition

…as long as he keeps looking.

Page 78: Perception & Cognition, One at last in Spoken Word Recognition

Methods

7.5 month old infants exposed to either 4 b- or 4 p-words.

80 repetitions total.

Form a category of the exposed class of words.

b-words: Beach, Bail, Bear, Bomb
p-words: Peach, Pail, Pear, Palm

Measure listening time on…

Original words: Bear, Pear
VOT closer to boundary: Bear*, Pear*
Competitors: Pear, Bear

Page 79: Perception & Cognition, One at last in Spoken Word Recognition

Stimuli constructed by cross-splicing naturally produced tokens of each end point.

B: M = 3.6 ms VOT
P: M = 40.7 ms VOT

B*: M = 11.9 ms VOT
P*: M = 30.2 ms VOT

B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners.

B*: 97%
P*: 96%

Page 80: Perception & Cognition, One at last in Spoken Word Recognition

Novelty or Familiarity?

Novelty/Familiarity preference varies across infants and experiments.

Infants were classified as novelty or familiarity preferring by performance on the endpoints.

     Novelty  Familiarity
B    36       16
P    21       12

We’re only interested in the middle stimuli (b*, p*).

Within each group will we see evidence for gradiency?

Page 81: Perception & Cognition, One at last in Spoken Word Recognition

After being exposed to bear… beach… bail… bomb…

Infants who show a novelty effect…
…will look longer for pear than bear.

What about in between?

[Figure: predicted listening time for Bear, Bear* and Pear under Categorical vs. Gradient accounts.]

Page 82: Perception & Cognition, One at last in Spoken Word Recognition

Results

Novelty infants (B: 36, P: 21)

Target vs. Target*: p<.001
Competitor vs. Target*: p=.017

[Figure: listening time (4000 to 10000 ms) for Target, Target* and Competitor, by exposure group (B, P).]

Page 83: Perception & Cognition, One at last in Spoken Word Recognition

Familiarity infants (B: 16, P: 12)

Target vs. Target*: p=.003
Competitor vs. Target*: p=.012

[Figure: listening time (4000 to 10000 ms) for Target, Target* and Competitor, by exposure group (B, P).]

Page 84: Perception & Cognition, One at last in Spoken Word Recognition

Infants exposed to /p/

Novelty (N=21): .024*, .009**
Familiarity (N=12): .018*, .028*

[Figure: listening time (ms) for P, P* and B, one panel per group.]

Page 85: Perception & Cognition, One at last in Spoken Word Recognition

Infants exposed to /b/

Novelty (N=36): <.001**, >.1; <.001**, >.2
Familiarity (N=16): .06, .15

[Figure: listening time (4000 to 10000 ms) for B, B* and P, one panel per group.]

Page 86: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 4 Conclusions

Contrary to all previous work:

7.5 month old infants show gradient sensitivity to subphonemic detail.

• Clear effect for /p/.
• Effect attenuated for /b/.

Page 87: Perception & Cognition, One at last in Spoken Word Recognition

Reduced effect for /b/… But:

[Figure: two possible listening-time patterns across Bear, Bear* and Pear, labeled "Null Effect?" and "Expected Result?".]

Page 88: Perception & Cognition, One at last in Spoken Word Recognition

• Bear* Pear

[Figure: listening time for Bear, Bear* and Pear. Actual result.]

• Category boundary lies between Bear & Bear*
  - Between 3 ms and 11 ms [??]

• Within-category sensitivity in a different range?

Page 89: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 5

Same design as experiment 4.

VOTs shifted away from hypothesized boundary

Train:
Bomb Bear Beach Bale: -9.7 ms

Test:
Bomb Bear Beach Bale: -9.7 ms
Bomb* Bear* Beach* Bale*: 3.6 ms
Palm Pear Peach Pail: 40.7 ms

Page 90: Perception & Cognition, One at last in Spoken Word Recognition

Familiarity infants (34 Infants)

[Figure: listening time (4000 to 9000 ms) for B-, B and P; marked comparisons: p=.05*, p=.01**.]

Page 91: Perception & Cognition, One at last in Spoken Word Recognition

Novelty infants (25 Infants)

[Figure: listening time (4000 to 9000 ms) for B-, B and P; marked comparisons: p=.02*, p=.002**.]

Page 92: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 5 Conclusions

• Within-category sensitivity in /b/ as well as /p/.

• Shifted category boundary in /b/: not consistent with adult boundary (or prior infant work). Why?

Page 93: Perception & Cognition, One at last in Spoken Word Recognition

/b/ results consistent with (at least) two mappings.

1) Shifted boundary

• Inconsistent with prior literature.

• Why would infants have this boundary?

[Figure: category mapping strength along VOT for /b/ and /p/.]

Page 94: Perception & Cognition, One at last in Spoken Word Recognition

2) Sparse Categories

HTPP is a one-alternative task. Asks: B or not-B, not: B or P.

Hypothesis: Sparse categories: by-product of efficient learning.

[Figure: category mapping strength along VOT for /b/ and /p/, with unmapped space around the adult boundary.]

Page 95: Perception & Cognition, One at last in Spoken Word Recognition

Computational Model

Distributional learning model

1) Model distribution of tokens as a mixture of Gaussian distributions over phonetic dimension (e.g. VOT).

2) After receiving an input, the Gaussian with the highest posterior probability is the “category”.

3) Each Gaussian has three parameters:

[Figure: Gaussians over VOT, and category mapping strength for /b/ and /p/ with unmapped space around the adult boundary.]
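
The categorization step (the Gaussian with the highest posterior probability wins) can be sketched as follows. The slide's parameter names did not survive extraction, so this sketch assumes the three parameters are a mixing weight `phi`, a mean `mu` and a standard deviation `sigma`; the example values are invented.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posteriors(x, categories):
    """Posterior probability of each category given input x.

    categories: dict name -> (phi, mu, sigma); these three parameter
    names are assumptions, not taken from the slide.
    """
    weighted = {
        name: phi * gaussian_pdf(x, mu, sigma)
        for name, (phi, mu, sigma) in categories.items()
    }
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


def categorize(x, categories):
    """Name of the category with the highest posterior probability."""
    post = posteriors(x, categories)
    return max(post, key=post.get)


# Hypothetical /b/ and /p/ categories along VOT (ms).
categories = {"b": (0.5, 0.0, 5.0), "p": (0.5, 40.0, 5.0)}
label_near_b = categorize(5.0, categories)
label_near_p = categorize(35.0, categories)
midpoint = posteriors(20.0, categories)
```

With equal weights and variances the posterior is split evenly at the midpoint between the two means, which is where this toy model's category boundary falls.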

Page 96: Perception & Cognition, One at last in Spoken Word Recognition

Statistical Category Learning

1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find best description of the input.

3) Start with more Gaussians than necessary--model doesn’t innately know how many categories.

-> 0 for unneeded categories.
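
The learning steps above can be sketched with an online, EM-style update: after each input, every Gaussian is nudged toward that input in proportion to its responsibility for it. This particular update rule, the learning rate, and the parameter names `mu`, `sigma` and `phi` are assumptions chosen for illustration; the slides do not specify the rule the model used, and this sketch does not by itself guarantee that surplus categories are driven to zero.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def train_mixture(inputs, mus, sigma0, rate=0.05, min_sigma=0.1):
    """Online update of a 1-D Gaussian mixture, one input at a time.

    mus:    starting means, one per Gaussian (pass more than needed to
            mimic starting with surplus categories).
    sigma0: starting standard deviation shared by all Gaussians.
    Returns (mus, sigmas, phis) after seeing every input once.
    """
    k = len(mus)
    mus = list(mus)
    sigmas = [sigma0] * k
    phis = [1.0 / k] * k
    for x in inputs:
        like = [phis[i] * normal_pdf(x, mus[i], sigmas[i]) for i in range(k)]
        total = sum(like)
        if total == 0.0:
            continue  # input too far from every Gaussian to assign credit
        for i in range(k):
            r = like[i] / total  # responsibility of Gaussian i for x
            mus[i] += rate * r * (x - mus[i])
            var = sigmas[i] ** 2 + rate * r * ((x - mus[i]) ** 2 - sigmas[i] ** 2)
            sigmas[i] = max(math.sqrt(var), min_sigma)
            phis[i] += rate * (r - phis[i])
        norm = sum(phis)
        phis = [p / norm for p in phis]
    return mus, sigmas, phis


rng = random.Random(1)
data = [rng.gauss(rng.choice((0.0, 40.0)), 5.0) for _ in range(4000)]
mus, sigmas, phis = train_mixture(data, mus=[10.0, 30.0], sigma0=5.0)
```

In this two-Gaussian run the means drift from their starting values toward the two modes of the input distribution.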

VOT VOT

Page 97: Perception & Cognition, One at last in Spoken Word Recognition
Page 98: Perception & Cognition, One at last in Spoken Word Recognition

Overgeneralization
• large σ
• costly: lose phonetic distinctions…

Page 99: Perception & Cognition, One at last in Spoken Word Recognition

Undergeneralization
• small σ
• not as costly: maintain distinctiveness.

Page 100: Perception & Cognition, One at last in Spoken Word Recognition

To increase likelihood of successful learning:
• err on the side of caution.
• start with small σ.

[Figure: P(Success) (0 to 1) as a function of starting σ (0 to 60) for the 2-category and 3-category models; 39,900 models run.]

Page 101: Perception & Cognition, One at last in Spoken Word Recognition

Sparseness coefficient: % of space not strongly mapped to any category.
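
One way to operationalize this coefficient is to scan a grid of VOT values and count the points where no category responds strongly. In the sketch below, "strongly mapped" means that some category's peak-normalized Gaussian response is at least 0.1; that threshold, the grid range and the function names are assumptions, since the slide does not give the exact definition.

```python
import math


def mapping_strength(x, categories):
    """Strongest peak-normalized Gaussian response at x (1.0 at a mean).

    categories: list of (mu, sigma) pairs.
    """
    return max(math.exp(-0.5 * ((x - mu) / sigma) ** 2) for mu, sigma in categories)


def sparseness(categories, lo=-20, hi=60, step=1, threshold=0.1):
    """Fraction of grid points not strongly mapped to any category."""
    grid = range(lo, hi + 1, step)
    unmapped = sum(1 for x in grid if mapping_strength(x, categories) < threshold)
    return unmapped / len(grid)


narrow = sparseness([(0.0, 1.0), (40.0, 1.0)])    # small sigma: mostly unmapped
broad = sparseness([(0.0, 10.0), (40.0, 10.0)])   # large sigma: fully mapped
```

Narrow categories leave most of the dimension unmapped, while broad ones cover it completely, which is the contrast the following slides track over training.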

[Figure: average sparseness coefficient (0 to 0.4) over training epochs (0 to 12000) for small starting σ (.5-1); inset marks unmapped space along VOT.]

Page 102: Perception & Cognition, One at last in Spoken Word Recognition

Start with large σ

[Figure: average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000) for starting σ of 20-40 and .5-1.]

Page 103: Perception & Cognition, One at last in Spoken Word Recognition

Intermediate starting σ

[Figure: average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000) for starting σ of .5-1, 3-11, 12-17 and 20-40.]

Page 104: Perception & Cognition, One at last in Spoken Word Recognition

Limitations

1) Occasionally model leaves sparse regions at the end of learning.
• Competition/Choice framework: additional competition or selection mechanisms during processing: categorization despite incomplete information.

2) Multi-dimensional categories
1-D: 3 parameters / category
2-D: 6 parameters / category
3-D: 10 parameters / category
4-D: 15 parameters / category
• Cue/model-reliability may reduce dimensionality.
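
The growth in parameters per category can be checked with a short count. The sketch assumes each category is a d-dimensional Gaussian with a full covariance matrix plus one mixing weight, which is an assumption about how the slide counts. Under it, the 3, 6 and 15 figures for 1-D, 2-D and 4-D are reproduced, and a 3-D category comes to 10.

```python
def parameters_per_category(d):
    """Free parameters of one d-dimensional Gaussian mixture component.

    Assumes a mean vector (d), a full symmetric covariance matrix
    (d * (d + 1) / 2) and one mixing weight.
    """
    mean_terms = d
    covariance_terms = d * (d + 1) // 2
    weight_terms = 1
    return mean_terms + covariance_terms + weight_terms


counts = [parameters_per_category(d) for d in (1, 2, 3, 4)]
```

Because the covariance term grows quadratically with the number of dimensions, restricting the number of cues tracked per category limits the growth directly.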

Page 105: Perception & Cognition, One at last in Spoken Word Recognition

Non-parametric approach?

• Competitive Hebbian Learning (Rumelhart & Zipser, 1986).
• Not constrained by a particular equation: can fill space better.
• Similar properties in terms of starting σ and sparseness.

[Figure: categories along VOT.]
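A minimal sketch of winner-take-all competitive learning on a single VOT dimension, in the spirit of the approach named on this slide. It is a simplification and not necessarily the model used in this work: the function name, learning rate, starting prototypes and simulated VOT distributions are all assumptions for illustration.

```python
import random

def competitive_learning(samples, prototypes, rate=0.05):
    """For each input, the closest prototype wins and moves toward the input;
    losing prototypes are left unchanged."""
    protos = list(prototypes)
    for x in samples:
        winner = min(range(len(protos)), key=lambda i: abs(protos[i] - x))
        protos[winner] += rate * (x - protos[winner])
    return protos

# Hypothetical bimodal VOT input (ms): a voiced mode near 0 and a voiceless mode near 40.
rng = random.Random(0)
samples = [rng.gauss(0, 5) if rng.random() < 0.5 else rng.gauss(40, 5)
           for _ in range(2000)]

final = sorted(competitive_learning(samples, prototypes=[15.0, 25.0]))
```

No distributional form is fitted here: each prototype simply drifts toward the inputs it wins, so the two units end up near the two modes of the simulated input.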

Page 106: Perception & Cognition, One at last in Spoken Word Recognition

Model Conclusions

To avoid overgeneralization, it is better to start with small estimates for σ.

Small or even medium starting σ's lead to sparse category structure during infancy: much of phonetic space is unmapped.

Sparse categories: similar temporal integration to Exp 2.
Retain ambiguity (and partial representations) until more input is available.

Page 107: Perception & Cognition, One at last in Spoken Word Recognition

AEM Paradigm

Examination of sparseness/completeness of categories needs a two-alternative task.

Anticipatory Eye Movements (McMurray & Aslin, 2005)

Infants are trained to make anticipatory eye movements in response to an auditory or visual stimulus.

Post-training, generalization can be assessed with respect to both targets.

[QuickTime demo: "bear" and "pail" trials.]

Also useful with:
• Color
• Shape
• Spatial Frequency
• Faces

Page 108: Perception & Cognition, One at last in Spoken Word Recognition

Experiment 6

Anticipatory Eye Movements

Train:
Bear0: Left
Pail35: Right

Test:
Bear0 Pear40
Bear5 Pear35
Bear10 Pear30
Bear15 Pear25

Same naturally-produced tokens from Exps 4 & 5.

[Images: palm, beach]

Page 109: Perception & Cognition, One at last in Spoken Word Recognition

Expected results

[Figure: sparse categories. Performance for Bear and Pail as a function of VOT, with unmapped space around the adult boundary.]

Page 110: Perception & Cognition, One at last in Spoken Word Recognition

Results

Training tokens: % correct: 67%; 9 / 16 better than chance.

[Figure: % correct as a function of VOT (0 to 40) for Beach and Palm.]

Page 111: Perception & Cognition, One at last in Spoken Word Recognition

Infant Summary

Infants show graded sensitivity to subphonemic detail.

/b/-results: regions of unmapped phonetic space.

Statistical approach provides support for sparseness.
• Given current learning theories, sparseness results from optimal starting parameters.

Empirical test will require a two-alternative task.
• AEM: train infants to make eye-movements in response to stimulus identity.

What is the role of the emerging lexicon?

Page 112: Perception & Cognition, One at last in Spoken Word Recognition

Conclusions

Infants and adults are sensitive to subphonemic detail.

Sensitivity is important to adult and developing word recognition systems.
1) Short-term cue integration.
2) Long-term phonology learning.

In both cases…
Partially ambiguous material is retained by lexical activation until more data arrives.

Partially active representations anticipate likelihood of future material (classes of words).

Page 113: Perception & Cognition, One at last in Spoken Word Recognition

Conclusions

Spoken language is defined by change.

But the information to cope with it is in the signal—if the lexicon looks online.

Within-category acoustic variation is signal, not noise.

Page 114: Perception & Cognition, One at last in Spoken Word Recognition

Within-Category Variation is Used in Spoken Word Recognition

Temporal Integration at Two Time Scales

Bob McMurray
University of Iowa
Dept. of Psychology

Page 115: Perception & Cognition, One at last in Spoken Word Recognition

[Diagram: eye-tracking setup. Labels: Head-Tracker Cam, Monitor, IR Head-Tracker Emitters, Head, 2 Eye Cameras, Eyetracker Computer, Subject Computer. Computers connected via Ethernet.]

Page 116: Perception & Cognition, One at last in Spoken Word Recognition

Misperception: Additional Results

Page 117: Perception & Cognition, One at last in Spoken Word Recognition

10 Pairs of b/p items.
• 0 – 35 ms VOT continua.

20 Filler items (lemonade, restaurant, saxophone…)

Option to click “X” (Mispronounced).

26 Subjects

1240 Trials over two days.

Page 118: Perception & Cognition, One at last in Spoken Word Recognition

Identification Results

[Figure, two panels: response rate (Voiced, Voiceless, NW) as a function of VOT (0 to 35 ms). Panel 1: Barricade to Parricade continuum. Panel 2: Barakeet to Parakeet continuum.]

Significant target responses even at extreme.

Graded effects of VOT on correct response rate.

Page 119: Perception & Cognition, One at last in Spoken Word Recognition

Phonetic “Garden-Path”

“Garden-path” effect: difference between looks to each target (b vs. p) at the same VOT.

[Figure, two panels: fixations to target over time (ms) for Barricade and Parakeet, at VOT = 0 (/b/) and at VOT = 35 (/p/).]
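The difference score defined on this slide can be sketched as below. The function name and the fixation proportions are made-up placeholders for illustration only; they are not data from the experiment.

```python
def garden_path_effect(looks_b_target, looks_p_target):
    """Per-VOT difference in target fixation proportion:
    b-initial target (e.g. barricade) minus p-initial target (e.g. parakeet)."""
    shared_vots = sorted(set(looks_b_target) & set(looks_p_target))
    return {vot: looks_b_target[vot] - looks_p_target[vot] for vot in shared_vots}

# Placeholder fixation proportions keyed by VOT in ms (NOT real results).
toy_barricade = {0: 0.50, 35: 0.40}
toy_parakeet = {0: 0.25, 35: 0.50}

effect = garden_path_effect(toy_barricade, toy_parakeet)
```

A positive value at a given VOT means more looks to the b-initial target than to the p-initial target at that VOT; a negative value means the reverse.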

Page 120: Perception & Cognition, One at last in Spoken Word Recognition

[Figure, two panels (Target, Competitor): garden-path effect (Barricade - Parakeet) as a function of VOT (0 to 35 ms).]

GP Effect: gradient effect of VOT.
Target: p<.0001
Competitor: p<.0001

Page 121: Perception & Cognition, One at last in Spoken Word Recognition

Assimilation: Additional Results

Page 122: Perception & Cognition, One at last in Spoken Word Recognition

runm picks

runm takes ***

When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/).

When /t/ is heard, the bilabial feature is likely to be from an underlying /m/.

Page 123: Perception & Cognition, One at last in Spoken Word Recognition

Exp 3 & 4: Conclusions

Within-category detail used in recovering from assimilation: temporal integration.
• Anticipate upcoming material
• Bias activations based on context
- Like Exp 2: within-category detail retained to resolve ambiguity.

Phonological variation is a source of information.

Page 124: Perception & Cognition, One at last in Spoken Word Recognition

Subject hears:
“select the mud drinker”
“select the mudg gear”
“select the mudg drinker”

Critical Pair

Page 125: Perception & Cognition, One at last in Spoken Word Recognition

[Figure: fixation proportion over time (0 to 2000 ms) for the initial coronal (Mud Gear) and the initial non-coronal (Mug Gear); markers at onset of “gear” and avg. offset of “gear” (402 ms).]

Mudg Gear is initially ambiguous with a late bias towards “Mud”.

Page 126: Perception & Cognition, One at last in Spoken Word Recognition

[Figure: fixation proportion over time (0 to 2000 ms) for the initial coronal (Mud Drinker) and the initial non-coronal (Mug Drinker); markers at onset of “drinker” and avg. offset of “drinker” (408 ms).]

Mudg Drinker is also ambiguous with a late bias towards “Mug” (the /g/ has to come from somewhere).

Page 127: Perception & Cognition, One at last in Spoken Word Recognition

[Figure: fixation proportion over time (0 to 600 ms) for Assimilated vs. Non Assimilated items, aligned to the onset of “gear”.]

Looks to non-coronal (gear) following assimilated or non-assimilated consonant.

In the same stimuli/experiment there is also a progressive effect!