Voice source characteristics in speaker segregation

Page 1: Voice source characteristics in speaker segregation
Page 2: Voice source characteristics in speaker segregation

Voice source characteristics in speaker segregation

Patti Adank

Page 3: Voice source characteristics in speaker segregation

• Some speaker-related characteristics have been found to be helpful:

Darwin et al. (2003): differences in F0 (pitch) and vocal tract length (VTL) between concurrent speakers help listeners attend to the target speaker

• Aim of the project:

to establish whether voice source characteristics of speakers can be useful to listeners when attending to a target speaker in a multi-speaker situation

Page 4: Voice source characteristics in speaker segregation

• Speaker-related differences that might aid listeners:

- style of speech

- voice quality: creaky voice, roughness, breathiness

• My experiments:

- establish the possible relevance of one acoustic aspect of creaky voice: jitter

• Speaker-related differences that aid listeners:

- F0 difference (if > 2 semitones)

- Vocal tract length (VTL) difference (if VTL ratio > 1.08)

- Effects of F0 and VTL are superadditive (Darwin et al., 2003)
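The two thresholds above can be made concrete with a small calculation. The following is a minimal sketch, not taken from the slides: it assumes the F0 difference is expressed in equal-tempered semitones (12 per octave) and that the value 1.08 is a ratio of the longer to the shorter vocal tract length. The function names are hypothetical.

```python
import math


def semitone_difference(f0_a_hz: float, f0_b_hz: float) -> float:
    """Signed interval from f0_a to f0_b in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_b_hz / f0_a_hz)


def exceeds_f0_threshold(f0_a_hz: float, f0_b_hz: float,
                         threshold_st: float = 2.0) -> bool:
    """True if the absolute F0 difference is larger than the threshold."""
    return abs(semitone_difference(f0_a_hz, f0_b_hz)) > threshold_st


def exceeds_vtl_threshold(vtl_a: float, vtl_b: float,
                          threshold_ratio: float = 1.08) -> bool:
    """True if longer/shorter VTL ratio is larger than the threshold.

    Assumption: 1.08 is read as a length ratio; units cancel out.
    """
    ratio = max(vtl_a, vtl_b) / min(vtl_a, vtl_b)
    return ratio > threshold_ratio


if __name__ == "__main__":
    # 100 Hz vs 115 Hz is a bit more than 2 semitones apart.
    print(exceeds_f0_threshold(100.0, 115.0))
    # 15 cm vs 17 cm gives a ratio of about 1.13.
    print(exceeds_vtl_threshold(15.0, 17.0))
```

Working on a logarithmic (semitone) scale means the same threshold applies regardless of the absolute F0 of the two voices.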

Page 5: Voice source characteristics in speaker segregation

Pitch: periodicity of the voice source

[Figure: two waveform panels, amplitude against time (s)]

Page 6: Voice source characteristics in speaker segregation

Jitter: aperiodicity of the voice source

[Figure: waveform, amplitude against time (s)]
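Jitter can be quantified from the durations of successive glottal periods. The slides do not say which measure was used, so the sketch below uses one common definition ("local" jitter: the mean absolute difference between consecutive periods, divided by the mean period) purely as an illustration. The function name is hypothetical.

```python
def local_jitter(periods_s):
    """Mean absolute difference between consecutive periods,
    divided by the mean period (a dimensionless fraction).

    Assumption: this 'local jitter' definition is chosen for
    illustration; the slides do not name a specific measure.
    """
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return mean_abs_diff / mean_period


if __name__ == "__main__":
    # A perfectly periodic 100 Hz source has zero jitter.
    print(local_jitter([0.01] * 5))
    # Periods alternating around 10 ms give non-zero jitter.
    print(local_jitter([0.0098, 0.0102, 0.0098, 0.0102]))
```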

Page 7: Voice source characteristics in speaker segregation

• Literature:

- McAdams (1989): natural jitter present in a speaker’s voice may be helpful for listeners

- Ellis (1993): a computational model can segregate simultaneously presented vowels using jitter differences alone

Page 8: Voice source characteristics in speaker segregation

How could jitter help listeners?

• Auditory Scene Analysis

- primitive segregation cues: bottom-up, involuntary listening

- schema-driven segregation cues (Bregman, 1990): top-down, voluntary/effortful listening

Page 9: Voice source characteristics in speaker segregation

• Pitch =

primitive segregation cue (Scheffers, 1983; Assmann & Summerfield, 1990; etc.)

+

schema-driven segregation cue (Darwin et al., 2003)

Page 10: Voice source characteristics in speaker segregation

• Hypotheses:

0. jitter does not aid the auditory system

1. jitter is only a primitive segregation cue

2. jitter is a primitive cue AND a schema-driven cue

3. jitter is only a schema-driven segregation cue

Page 11: Voice source characteristics in speaker segregation

• Experiments:

1. one double-vowel experiment with pitch as the experimental factor, to replicate earlier results for pitch as a primitive cue

2. one double-vowel experiment with jitter as the experimental factor, to establish if jitter is a primitive cue

3. an experiment like that of Darwin et al. (2003), with pitch and jitter as factors, to establish if jitter is a schema-driven cue

Page 12: Voice source characteristics in speaker segregation

• Experiment 1:

- Double-vowel experiment to test pitch effect

- Synthetic vowels (Klatt 1990):

AH, EE, ER, OO, OR, 200 milliseconds

- five versions of each vowel:

100 Hz, +1/4 semitone (st), +1/2 st, +1 st, +2 st
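The five F0 values implied by these steps can be computed directly. This is a minimal sketch assuming equal-tempered semitones (a factor of 2^(1/12) per semitone) relative to the 100 Hz baseline; the constant and function names are hypothetical.

```python
BASE_F0_HZ = 100.0
STEPS_ST = [0.0, 0.25, 0.5, 1.0, 2.0]  # semitone offsets from the baseline


def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Return f0 shifted upward by the given number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    for st in STEPS_ST:
        print(f"+{st:g} st -> {shift_semitones(BASE_F0_HZ, st):.2f} Hz")
```

Under this assumption the largest step (+2 st) corresponds to an F0 of roughly 112 Hz, so all five versions lie within about 12 Hz of each other.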

Page 13: Voice source characteristics in speaker segregation

• Experiment 2:

- Double-vowel experiment to test jitter effect

- Synthetic vowels (Klatt 1990), altered version:

AH, EE, ER, OO, OR, 200 milliseconds

- five versions of each vowel:

100 Hz with jitter of 0, +/-1%, +/-2%, +/-4%, +/-8%
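The slides do not describe how jitter was imposed on the synthetic source. As one possible reading, the sketch below perturbs each glottal period independently and uniformly within +/- the stated percentage of the nominal 10 ms period. Both the uniform distribution and the function name are assumptions made for illustration only.

```python
import random


def jittered_periods(f0_hz, jitter_pct, duration_s, seed=0):
    """Return glottal period durations (s) covering at least duration_s.

    Assumption: each period is drawn uniformly from
    nominal * (1 +/- jitter_pct / 100). The slides do not specify
    the perturbation scheme that was actually used.
    """
    rng = random.Random(seed)  # fixed seed for reproducible stimuli
    nominal = 1.0 / f0_hz
    periods = []
    elapsed = 0.0
    while elapsed < duration_s:
        scale = 1.0 + rng.uniform(-jitter_pct, jitter_pct) / 100.0
        period = nominal * scale
        periods.append(period)
        elapsed += period
    return periods


if __name__ == "__main__":
    for pct in [0, 1, 2, 4, 8]:
        periods = jittered_periods(100.0, pct, 0.2)
        print(pct, min(periods), max(periods))
```

With `jitter_pct=0` every period equals the nominal period, which corresponds to the unjittered 100 Hz version.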

Page 14: Voice source characteristics in speaker segregation

• Procedure (1 & 2):

- 7 listeners (5 British-English, 2 bilingual)

- categorization pre-test (45 stimuli)

- experiment 1 (or 2):

presentation of a double vowel (125 combinations)

select one of 15 options
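The counts on this slide are consistent with a simple enumeration, sketched below. This is an interpretation, not something the slides state: 15 response options would match the number of unordered vowel pairs (identical pairs included) from five vowels, and 125 combinations would match first vowel x second vowel x five versions. The pitch versions of Experiment 1 are used as example labels.

```python
from itertools import combinations_with_replacement, product

VOWELS = ["AH", "EE", "ER", "OO", "OR"]
LEVELS = ["100 Hz", "+1/4 st", "+1/2 st", "+1 st", "+2 st"]


def response_options(vowels):
    """Unordered vowel pairs, including identical pairs (assumed design)."""
    return list(combinations_with_replacement(vowels, 2))


def stimulus_combinations(vowels, levels):
    """First vowel x second vowel x version (assumed design)."""
    return list(product(vowels, vowels, levels))


if __name__ == "__main__":
    options = response_options(VOWELS)
    stimuli = stimulus_combinations(VOWELS, LEVELS)
    print(len(options), len(stimuli))  # prints "15 125"
```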

Page 15: Voice source characteristics in speaker segregation
Page 16: Voice source characteristics in speaker segregation

Results pitch

[Figure: error-bar plot, 95% CI of PERCENT (y-axis, 40 to 70) by Pitch (x-axis: 100 Hz, +1/4, +1/2, +1, +2); N = 26 per condition]

Page 17: Voice source characteristics in speaker segregation

Results jitter

[Figure: error-bar plot, 95% CI of PERCENT (y-axis, 40 to 70) by Jitter (x-axis: 0, +/-1%, +/-2%, +/-4%, +/-8%); N = 30 per condition]

Page 18: Voice source characteristics in speaker segregation

• Hypotheses:

0. jitter does not aid the auditory system

1. jitter is only a primitive segregation cue

2. jitter is a primitive cue AND a schema-driven cue

3. jitter is only a schema-driven segregation cue

4. jitter is a primitive segregation cue if there is also a pitch difference

Page 19: Voice source characteristics in speaker segregation

Results jitter & pitch

[Figure: error-bar plot, 95% CI of PERCENT (y-axis, 40 to 80) by condition (x-axis: baseline, F0 max, Jit max, "v1 = jitmax v2 = f0m", "v2 = jit & f0 max"); N = 10 per condition]

Page 20: Voice source characteristics in speaker segregation

Is there still hope for jitter?

• Next experiment: test if jitter is a schema-driven cue

Setup as in Darwin et al. (2003):

2 sentences from the same speaker presented simultaneously

attend to the target sentence

report on target words

vary jitter and pitch of the sentences

Page 21: Voice source characteristics in speaker segregation