
Talker and Language Variation in the LTASS of English ... gradstudents.wcas.northwestern.edu/~lma777/documents/ltass_lsa2012.pdf


[Figures: two plots of Energy (dB) against Frequency (Hz), 200-5000 Hz. One plot has the legend English / Mandarin. The other has the legend MM / ME / EE, with difference curves MM - ME, ME - EE and MM - EE and significance marked at the p = 0.0003 level.]

Talker and Language Variation in the LTASS of English, Mandarin & Mandarin-accented English

Lauren Ackerman, Lisa Hesterberg & Ann Bradlow

Department of Linguistics, Northwestern University

1. Background and Introduction

We hypothesize three sources of systematic variation in L2 speech:

1) Language-specific interactions
• Dependent on the acoustic-phonetics of the specific L1 and L2 involved
• Basis for predicting ease of acquisition of particular structures and specific miscommunications (e.g. Perceptual Assimilation Model, Best, 1995)

2) Status-specific (native/L1 vs. non-native/L2) characteristics
• Processing difficulty inherent in speaking any L2
• Basis for identification of native- versus foreign-accented speech and determination of L2 proficiency (e.g. Versant Test, 2011)

3) Speaker-specific characteristics
• Indexical characteristics that transcend language- and status-specificity
• Basis for cross-language talker identification (e.g. Winters, Levi, & Pisoni, 2008; Perrachione, Pierrehumbert & Wong, 2009; Perrachione, Del Tufo & Gabrieli, 2011)

The ALLSSTAR project seeks evidence for these three sources of variability in a coherent data set. We focus on global acoustic properties which are independent of the system of linguistic contrasts of any particular language, and therefore observable across each of the three comparisons in the ALLSSTAR “triangle” approach (Fig. 1).

2. Methods

Speech samples taken from the ALLSSTAR Corpus (Archive of L1 & L2 Scripted & Spontaneous Transcriptions & Recordings)

• Both scripted and spontaneous speech in each speaker’s L1 & L2
  • HINT sentences (Soli & Wong, 2008)
  • Extract from The Little Prince (de Saint-Exupéry, 1943)
  • Universal Declaration of Human Rights
  • The North Wind and the Sun passage (IPA Handbook, 1999)
  • Two spontaneous prompts: Q&A (5-minute monologue answering open-ended questions, 2 picture stories)
• To date: 87 talkers representing a total of 20 languages (different n per language)

Present analysis:

Three language/status conditions:
• Mandarin-English bilinguals in L1 and L2 (MM and ME) (n=14)
• English monolinguals in L1 only (EE) (n=20)

One speech material: The North Wind and the Sun (NWS) passage

LTASS analysis:
• Praat script that averages energy in 50 Hz bins (200-249 Hz, 250-299 Hz, etc.)
• Energy below 200 Hz excluded because of the confound with F0; analyzed separately
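The binning step can be sketched in Python. This is not the Praat script used for the poster; it is a minimal NumPy stand-in. The function name `ltass_db`, the 2048-sample Hann frames with 50% overlap, and the averaging of power (rather than dB values) within each bin are all assumptions made for illustration. The 200 Hz lower limit and 50 Hz bin width follow the description above, and the 8000 Hz upper limit matches the highest frequency reported in the Results.

```python
import numpy as np

def ltass_db(signal, sample_rate, bin_width_hz=50.0, f_min=200.0, f_max=8000.0,
             frame_len=2048):
    """Return (bin_low_edges_hz, energy_db) for a mono signal.

    Assumed procedure: average the power spectrum over overlapping Hann-windowed
    frames, then average that power inside fixed-width frequency bins.
    """
    signal = np.asarray(signal, dtype=float)
    hop = frame_len // 2
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")

    window = np.hanning(frame_len)
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames  # long-term average over time

    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    edges = np.arange(f_min, f_max + bin_width_hz, bin_width_hz)
    bin_power = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        in_bin = (freqs >= edges[k]) & (freqs < edges[k + 1])
        if in_bin.any():
            bin_power[k] = power[in_bin].mean()

    # Small offset avoids log10(0) for silent bins; empty bins stay NaN.
    return edges[:-1], 10.0 * np.log10(bin_power + 1e-12)
```

Frequencies below `f_min` are simply never binned, which mirrors the exclusion of the F0 region. The dB values are relative to an arbitrary reference, so only differences between spectra are meaningful.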

Acknowledgements
• Chun Liang Chan for the speech database “toolbox”
• NU International Summer Institute staff & participants
• Grant funding from NIDCD (R01DC005794)

The study focuses on the long-term average speech spectrum (LTASS), the energy-by-frequency function averaged across time.

Language-specificity: Conflicting evidence on whether languages differ in terms of LTASS
• No major differences (Byrne et al., 1994; McCullough et al., 1993)
• Different frequency ranges for optimal speech perception (Yang et al., 2008)

Status-specificity: Unclear whether L1 and L2 speech differ in terms of LTASS, but there is some evidence that talkers can control LTASS.
• Mid-frequency energy (1000-3000 Hz) is higher in English clear speech than in conversational speech (Hazan & Markham, 2004), raising the possibility of systematic L1 vs. L2 differences within a speaker.

Speaker-specificity: LTASS is a reliable cue to talker identity (e.g. Rose, 2002)

References

Byrne, D., Dillon, H., & Khanh, T. (1994). An international comparison of long-term average speech spectra. J. Acoust. Soc. Am., 96(4), 2108-2120.

Handbook of the International Phonetic Alphabet: a guide to the use of the International Phonetic Alphabet. (1999). Cambridge: Cambridge University Press.

Hazan, V., & Markham, D. (2004). Acoustic-phonetic correlates of talker intelligibility in adults and children. J. Acoust. Soc. Am., 116(5), 3108-3118.

Ladefoged, P. (2005). Vowels and Consonants (2nd ed.). Malden, MA: Blackwell. Recordings retrieved from: http://www.phonetics.ucla.edu/course/chapter1/chapter1.html

McCullough, A., Tu, C., & Lew, H. L. (1993). Speech-spectrum analysis of Mandarin: Implications for hearing-aid fitting in a multi-ethnic society. Journal of the American Academy of Audiology, 4, 50-52.

Rose, P. (2002). Forensic Speaker Identification. New York: Taylor & Francis.

Yang, L., Zhang, J., & Yan, Y. (2008). An Improved STI Method for Evaluating Mandarin Speech Intelligibility. Paper presented at the International Conference on Audio, Processing, Shanghai.

Versant Test of Spoken English (2011). Palo Alto: Pearson Education, Inc.

Figure 1: The ALLSSTAR “triangle” approach, with vertices L1-Eng, L2-Eng and L1-Other. The three comparisons are:

1) Cross-language variation
2) English across native vs. non-native speakers
3) L1 and L2 within individual bilinguals

3. Results

Language-specificity (Figure 2): Mandarin L1 (MM) vs. English L1 (EE)

• Significant differences in some parts of the frequency range, mostly >4000 Hz.

Question: Do these inter-talker/language differences stem from the phoneme compositions of the Mandarin and English NWS passages or from inter-talker differences?

Figure 3: NWS model constructed from individual IPA phonemes produced by a single talker (Ladefoged, 2005).

• Phonemes spliced together to reflect counts of phones in NWS.
• Isolates the influence of phoneme composition while controlling talker characteristics.
• Focus on >4000 Hz region (region of significant difference in MM vs. EE)
• No differences between English and Mandarin in the single-talker model.
• MM vs. EE differences likely stem from speaker-specificity (inter-talker differences) more than from language-specificity.
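The splicing step behind the single-talker model can be sketched as follows. This is an illustration only: the poster does not describe the splicing order or any cross-fading, and the function name `splice_by_counts` and the two dictionaries are hypothetical. The sketch repeats each phone recording as many times as that phone occurs in the passage and concatenates the copies.

```python
import numpy as np

def splice_by_counts(phone_audio, phone_counts):
    """Build one long waveform from single-phone recordings.

    phone_audio:  dict mapping a phone label to a 1-D array of samples
                  (hypothetical input; one recording per phone, same sample rate).
    phone_counts: dict mapping a phone label to how often it occurs in the passage.
    """
    pieces = []
    for phone, count in phone_counts.items():
        if phone not in phone_audio:
            raise KeyError(f"no recording for phone {phone!r}")
        samples = np.asarray(phone_audio[phone], dtype=float)
        pieces.extend([samples] * count)  # one copy per occurrence
    if not pieces:
        return np.zeros(0)
    return np.concatenate(pieces)
```

Because the long-term spectrum is an average over time, the order of the spliced pieces should matter little; what the model controls is how much each phone contributes to that average.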

Status-specificity: English L2 (ME) vs. English L1 (EE)

• Few significant differences across the frequency spectrum, all of which coincide with the talker/language differences evident in the MM vs. EE comparison.
• Keeping the language constant mitigates some, but not all, of the observed L1 vs. L1 (MM vs. EE) differences, suggesting contributions from both talker and language to LTASS.

Question: Do the L2 vs. L1 differences stem from differences in L2 proficiency?

We examined correlations between Versant scores (Versant Test, 2011) and mean energy in two frequency ranges (higher energy = more English-like):

• 1800-2100 Hz: r = 0.50, p < .05 (frequency range where MM & ME differ from EE)
• 4250-8000 Hz: r = 0.54, p < .05 (frequency range where only MM differs from EE)

Suggests a possible contribution of language-specificity, not just talker-specificity, to the observed MM vs. EE LTASS differences. More proficient L2 talkers produce L2 English LTASSs closer to L1 English LTASSs.
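The proficiency analysis can be sketched in two steps: reduce each talker's binned LTASS to a mean energy within a band, then correlate those means with the proficiency scores. The function names `band_mean_db` and `pearson_r` are hypothetical, and averaging the dB values of the bins (rather than averaging power first) is an assumption; the poster only says "mean energy".

```python
import numpy as np

def band_mean_db(bin_low_edges, ltass_db, f_low, f_high):
    """Mean of the binned LTASS values whose lower edge lies in [f_low, f_high)."""
    edges = np.asarray(bin_low_edges, dtype=float)
    values = np.asarray(ltass_db, dtype=float)
    in_band = (edges >= f_low) & (edges < f_high)
    if not in_band.any():
        raise ValueError("no bins fall inside the requested band")
    return float(np.nanmean(values[in_band]))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

With one `band_mean_db` value per bilingual talker (for example over 1800-2100 Hz in the ME condition) and one Versant score per talker, `pearson_r(scores, band_means)` gives the kind of coefficient reported above. The significance test is not covered by this sketch.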

Speaker-specificity: English L2 (ME) vs. Mandarin L1 (MM)

No significant differences. Within-speaker correlation for LTASS, r=0.89, p<0.0001. Keeping the talker constant eliminates most LTASS differences.

Figure 2: LTASS for all subjects by language

Figure 3: LTASS for single-talker model

4. Conclusions

•  All comparisons suggest that the overwhelming source of variability in the LTASS is speaker-specificity, with relatively small contributions from language-specificity and status-specificity.

•  This suggests that LTASS is a reliable cue for language-independent talker identification across a bilingual’s two languages.

•  In contrast, since LTASS is relatively stable within an individual across languages and across native/L1 and non-native/L2 speech, it is not a reliable cue for language identification or for L2 proficiency.