

Frequency Effects on Perceptual Compensation for Coarticulation

Alan C. L. Yu (1), Ed King (1), Morgan Sonderegger (2)

(1) Phonology Laboratory, Department of Linguistics, University of Chicago
(2) Department of Computer Science, University of Chicago

[email protected], [email protected], [email protected]

Abstract

Errors in compensating perceptually for effects of coarticulation in speech have been hypothesized as one of the major sources of sound change in language. Little research has elucidated the conditions under which such errors might take place. Using the paradigm of selective adaptation, this paper reports the results of a series of experiments testing for the effect of frequency on the likelihood of perceptual compensation for coarticulation by listeners. The results suggest that perceptual compensation might be ameliorated (which might result in hypocorrection) or exaggerated (i.e., hypercorrection) depending on the relative frequency of the categories that are being perceived in their specific coarticulated contexts.

Index Terms: perceptual compensation, sound change, selective adaptation.

1. Introduction

A fundamental property of speech is its tremendous variability. Much research has shown that human listeners take such variability into account in speech perception [1, 2, 3, 4, 5, 6]. Beddor and colleagues [7], for example, found that adult English and Shona speakers perceptually compensate for the coarticulatory anticipatory raising of /a/ in C/i/ context and the anticipatory lowering of /e/ in C/a/ context. Both English and Shona listeners report hearing more /a/ in the context of a following /i/ than in the context of a following /a/. Many scholars have hypothesized that a primary source of systematic sound changes in language comes from errors in perceiving the intended speech signal [8, 9]. That is, errors in listeners' classification of speakers' intended pronunciations, if propagated, might result in systematic changes in the sound systems of all speaker-listeners within the speech community. Ohala [8], in particular, argues that hypocorrective sound change (e.g., assimilation and vowel harmony) obtains when a contextual effect is misinterpreted as an intrinsic property of the segment (i.e., an increase in false positives in sound categorization). For example, an ambiguous /a/ token might be erroneously categorized as /e/ in the context of a following /i/ if listeners fail to take into account the anticipatory raising effect of /i/. If enough /a/ exemplars are misidentified as /e/, a pattern of vowel harmony might emerge. That is, the language will show a prevalence of mid vowels before /i/ and low vowels before /a/. On the other hand, a hypercorrective sound change (e.g., dissimilation) emerges when the listener erroneously attributes intended phonetic properties to contextual variation (i.e., an increase in false negatives in sound identification). In this case, an ambiguous /e/ might be misclassified as /a/ in the context of a following /i/. If enough /e/ exemplars are misidentified as /a/ when followed by /i/, a dissimilatory pattern of only low vowels before high vowels and non-low vowels before low vowels might emerge.

While much effort has gone into identifying the likely sources of such errors [10, 11, 8, 12, 9], little is known about the source of the regularity in listener misperception that leads to the systematic nature of sound change. That is, why would random and haphazard misperception in an individual's percept lead to systematic reorganization of the sound system within the individual and within the speech community? The present study demonstrates that the likelihood of listeners adjusting their categorization pattern contextually (i.e., perceptual compensation) may be affected by the frequencies of the sound categories occurring in the specific contexts. In particular, the present study expands on Beddor et al.'s work [7] on perceptual compensation for vowel-to-vowel coarticulation in English, showing that the way English listeners compensate perceptually for the effect of regressive coarticulation from a following vowel (either /i/ or /a/) depends on the relative frequency of the coarticulated vowels (i.e., the relative frequency of /a/ and /e/ appearing before /i/ or /a/). The idea that category frequency information affects speech perception is not new. Research on selective adaptation has shown that repeated exposure to a particular speech sound, say /s/, shifts the identification of ambiguous sounds, say sounds that are halfway between /s/ and /ʃ/, away from the repeatedly presented sound and towards the alternative [13, 14, 15]. In perceptual learning studies, repeated exposure to an ambiguous sound, say an /s/-/f/ mixture, in /s/-biased lexical contexts induces retuned perception such that subsequent sounds are heard as /s/ even in lexically neutral contexts [16, 17]. The experiments reported below extend Beddor et al.'s [7] findings by presenting three groups of participants with the same training stimuli but varying the frequency with which they hear each token. The purpose of the present study is to demonstrate that contextually sensitive category frequency information can induce selective adaptation effects in perceptual compensation and to examine the implications of such effects for theories of sound change.

2. Methods

2.1. Stimuli

The training stimuli consisted of CV1CV2 syllables where C is one of /p, t, k/, V1 is either /a/ or /e/, and V2 is either /a/ or /i/. To avoid any vowel-to-vowel coarticulatory effect in the training stimuli, a phonetically trained native English speaker (the second author) produced each syllable of the training stimuli in isolation (/pa/, /pe/, /pi/, /ta/, /te/, /ti/, /ka/, /ke/, /ki/). The disyllabic training stimuli were assembled by splicing together the appropriate syllables and were resynthesized with a consistent intensity and pitch profile to avoid a potential confound of stress. The test stimuli consisted of two series of /pV1pV2/ disyllables where V2 is either /a/ or /i/. The first syllable, pV1, is a 9-step continuum resynthesized in PRAAT by varying F1, F2, and F3 in equidistant steps from the abovementioned speaker's /pa/ and /pe/ syllables. The original /pa/ and /pe/ syllables serve as the endpoints of the 9-step continuum.
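The continuum construction amounts to linear interpolation between the formant values of the two endpoint syllables. The following sketch illustrates the idea; the formant values are illustrative placeholders, not the measured formants of the actual stimuli.

```python
# Sketch of the 9-step /pa/-/pe/ continuum: linear interpolation of
# (F1, F2, F3) between the two natural endpoints. Formant values are
# hypothetical, for illustration only.

def formant_continuum(start, end, n_steps=9):
    """Return n_steps equidistant (F1, F2, F3) targets from start to end."""
    return [
        tuple(s + (e - s) * i / (n_steps - 1) for s, e in zip(start, end))
        for i in range(n_steps)
    ]

pa = (750, 1200, 2500)   # hypothetical /a/ formants (Hz)
pe = (550, 1800, 2600)   # hypothetical /e/ formants (Hz)

steps = formant_continuum(pa, pe)
# Step 1 reproduces the /pa/ endpoint; step 9 reproduces the /pe/ endpoint.
```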

2.2. Participants and procedure

The experiment consists of two parts: exposure and testing. Subjects were assigned randomly to three exposure conditions. One group was exposed to CeCi tokens four times more often than to CeCa tokens and to CaCa tokens four times more often than to CaCi ones (the HYPER condition). The second group was exposed to CeCa tokens four times more often than to CeCi tokens and to CaCi tokens four times more often than to CaCa ones (the HYPO condition). The final group was exposed to an equal number of /e/ and /a/ vowels preceding /i/ and /a/ (the BALANCED condition). See Table 1 for a summary of the frequency distribution of the exposure stimuli. The exposure stimuli were presented over headphones automatically in random order in E-Prime in a sound-proof booth. During the exposure phase, subjects performed a phoneme monitoring task in which they were asked to press a response button when a word contained a medial /t/. Each subject heard 360 exposure tokens three times; a short break followed each block of 360 tokens. A total of forty-eight students at the University of Chicago, all native speakers of American English, participated in the experiment for course credit or a nominal fee. Eleven subjects took part in the HYPO condition; sixteen subjects each participated in the HYPER and BALANCED conditions.

During the testing phase, subjects performed a 2-alternative forced-choice task. Subjects listened to a randomized set of test stimuli and were asked to decide whether the first vowel sounded like /e/ or /a/.

3. Analysis

Subjects' responses (i.e., /a/ response rates) were modeled using a mixed-effects logistic regression. The model contains four fixed variables: TRIAL (1-180), CONTINUUM STEP (1-9), EXPOSURE CONDITION (balanced, hyper, hypo), and VOCALIC CONTEXT (/a/ vs. /i/). The model also includes three two-way interactions: VOCALIC CONTEXT x STEP, STEP x CONDITION, and VOCALIC CONTEXT x CONDITION. In addition, the model includes a by-subject random slope for TRIAL. A likelihood ratio test comparing a model with a VOCALIC CONTEXT x STEP x CONDITION three-way interaction term against one without it shows that the added three-way interaction does not significantly improve model log-likelihood (χ2 = 3.253, df = 2, Pr(> χ2) = 0.1966). Table 2 summarizes the parameter estimate β for each fixed effect in the model, the estimate of its standard error SE(β), the associated Wald z-score, and the significance level. To eliminate collinearity, scalar variables were centered, while categorical variables were sum-coded.
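The reported likelihood ratio test is easy to verify: twice the log-likelihood difference between nested models is compared against a chi-squared distribution whose df equals the number of added parameters, and for df = 2 the chi-squared survival function has a closed form. This is a sketch of that check, not the original analysis code.

```python
import math

# Likelihood-ratio test: 2*(LL_full - LL_reduced) ~ chi-squared, with df
# equal to the number of extra parameters in the full model. For df = 2 the
# chi-squared survival function reduces to exp(-x/2).

def chi2_sf_df2(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

# Reproducing the reported p-value from the reported statistic:
p = chi2_sf_df2(3.253)
print(round(p, 4))  # 0.1966, matching the reported Pr(> chi^2)
```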

Consistent with Beddor et al.’s findings, continuum step and

Table 1: Stimulus presentation frequencies during the exposure phase. C = /p, t, k/.

Type    BALANCED   HYPER   HYPO
CeCi          90     144     36
CeCa          90      36    144
CaCi          90      36    144
CaCa          90     144     36
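The design in Table 1 amounts to weighted sampling of the four disyllable types within each 360-token block. A minimal sketch of how one randomized block could be assembled (the token labels are schematic; C ranges over /p, t, k/ in the actual stimuli):

```python
import random

# Exposure-phase frequencies from Table 1: tokens per 360-token block.
FREQS = {
    "balanced": {"CeCi": 90,  "CeCa": 90,  "CaCi": 90,  "CaCa": 90},
    "hyper":    {"CeCi": 144, "CeCa": 36,  "CaCi": 36,  "CaCa": 144},
    "hypo":     {"CeCi": 36,  "CeCa": 144, "CaCi": 144, "CaCa": 36},
}

def exposure_block(condition, seed=None):
    """Return one randomized 360-token exposure block for a condition."""
    rng = random.Random(seed)
    tokens = [t for t, n in FREQS[condition].items() for _ in range(n)]
    rng.shuffle(tokens)
    return tokens
```

Each subject then hears three such blocks, matching the 3 x 360 presentations described above.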

[Figure not reproduced: probability of an /a/ response (0.0-1.0) by continuum step (2-8), with separate curves for vocalic context V2 = /a/ vs. /i/.]

Figure 1: Interaction between VOCALIC CONTEXT and STEP. The predictor variables were back-transformed to their original scales in the figure.

vocalic context are both significant predictors of /a/ responses. That is, listeners reported hearing less and less /a/ from the /a/-end of the continuum to the /e/-end, and they heard more /a/ when the target vowel was followed by /i/ than when it was followed by /a/. Specifically, the odds of hearing /a/ before /i/ are 1.3 times those before /a/. The significant interaction between STEP and VOCALIC CONTEXT suggests that the vocalic context effect differs depending on where the test stimulus falls along the /a/-/e/ continuum. As illustrated in Figure 1, the effect of vocalic context is largest around steps 4-6, while identification is close to ceiling at the two endpoints of the continuum regardless of vocalic context. Of particular interest here is the significant interaction between exposure condition and vocalic context. Figure 2 illustrates this interaction clearly; the effect of vocalic context on /a/ responses is influenced by the nature of the exposure data. When the exposure data contain more CaCa and CeCi tokens than CaCi and CeCa tokens (i.e., the hyper condition), listeners report hearing more /a/ in the /i/ context than in the /a/ context, compared to the response rate after the balanced condition, where the frequencies of CaCa, CeCi, CaCi, and CeCa tokens are equal. On the other hand, in the hypo condition, where listeners heard more CaCi and CeCa tokens than CaCa and CeCi ones, listeners reported hearing less /a/ in the /i/ context than in the /a/ context, the opposite of what is observed in both the balanced and hyper conditions. The model also shows a significant interaction between CONDITION and STEP. As illustrated in Figure 3, the slope of the identification function is steepest after the hyper condition but shallowest in the hypo condition.
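The 1.3 odds figure is consistent with exponentiating the magnitude of the Table 2 coefficient for VOCALIC CONTEXT; note that treating the single coefficient (rather than twice it, under sum coding) as the relevant log-odds contrast is an assumption here, since the paper does not spell out the back-transformation.

```python
import math

# Back-transforming the logistic-regression coefficient for
# VOCALIC CONTEXT = a (Table 2) into an odds ratio. Assumption: the
# published 1.3 figure comes from exponentiating |beta| directly.

beta_context = -0.2690
odds_ratio = math.exp(abs(beta_context))
print(round(odds_ratio, 2))  # 1.31: odds of an /a/ response before /i/ vs. before /a/
```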

4. Discussion and conclusion

The present study shows that the classification of vowels in different prevocalic contexts is influenced by the relative frequency distribution of the relevant vowels in specific contexts. For example, when /a/ frequently occurs before /a/, listeners are less likely to identify future instances of ambiguous /a/-/e/ vowels as /a/ in the same context; listeners would report hearing more /e/ before /a/ if CaCa exemplars outnumber CeCa exemplars. Likewise, when /a/ occurs frequently before /i/, listeners would reduce their rate of identification of /a/ in the same context; lis-


Table 2: Estimates for all predictors in the analysis of listener responses in the identification task.

Predictor                                    Coef. β   SE(β)       z        p
Intercept                                    -0.0096   0.0674    -0.14    0.8867
TRIAL                                        -0.0008   0.0009    -0.87    0.3825
STEP                                         -0.9260   0.0185   -49.98   < 0.001 ***
VOCALIC CONTEXT = a                          -0.2690   0.0316    -8.51   < 0.001 ***
CONDITION = hyper                            -0.0594   0.0934    -0.64    0.5251
CONDITION = hypo                              0.0126   0.0950     0.13    0.8942
STEP x VOCALIC CONTEXT = a                    0.0372   0.0170     2.19   < 0.05 *
STEP x CONDITION = hyper                      0.0481   0.0247     1.95    0.0514
STEP x CONDITION = hypo                      -0.2631   0.0296    -8.89   < 0.001 ***
VOCALIC CONTEXT = a x CONDITION = hyper       0.0311   0.0432     0.72    0.4717
VOCALIC CONTEXT = a x CONDITION = hypo       -0.4146   0.0477    -8.70   < 0.001 ***

[Figure not reproduced: probability of an /a/ response (0.0-1.0) by condition (Balanced, Hyper, Hypo), with separate points for vocalic context V2 = /a/ vs. /i/.]

Figure 2: Interaction between VOCALIC CONTEXT and CONDITION.

teners would report hearing more /e/ before /i/ if CaCi tokens are more prevalent than CeCi tokens. These results suggest that listeners exhibit selective adaptation when the frequency information of the target sounds varies in a context-specific fashion. That is, repeated exposure to an adaptor (the more frequent variant) results in heightened identification of the alternative. This finding has serious implications for models of sound change that afford a prominent role to listener misperception in accounting for the sources of variation that lead to change.

To begin with, subjects in the hyper exposure condition exhibit what can be interpreted as hypercorrective behavior. That is, speech tokens that were classified as /a/ in the balanced condition were classified as /e/ in the hyper condition when V2 = /a/; likewise, sounds that were classified as /e/ in the balanced condition were treated as /a/ in the hyper condition when V2 = /i/. If this type of hypercorrective behavior persists, the pseudo-lexicon of the made-up language our subjects experienced would gradually develop a prevalence of disyllabic "words" that do not allow two low vowels or two non-low vowels in consecutive syllables. This would represent a state of vocalic height dissimilation, not unlike the pattern found in the Vanuatu languages [18]. On the other hand, listeners in the hypo exposure condition exhibit what could be interpreted as hypocorrective behavior. That is, tokens that were classified as /e/ in the balanced condition were classified as /a/ in the HYPO condition when V2 = /a/; likewise, vowels heard as /a/ in

[Figure not reproduced: probability of an /a/ response (0.0-1.0) by continuum step (2-8), with separate curves for the Balanced, Hyper, and Hypo conditions.]

Figure 3: Interaction between CONDITION and STEP.

the balanced condition were heard as /e/ in the HYPO condition when V2 = /i/. If this type of reclassification persists, listeners in the hypo condition would develop a pseudo-lexicon in which vowels in disyllabic words must agree in lowness, and a state of vocalic height harmony would obtain, similar to many cases found in the Bantu languages of Africa [19].

Another ramification the present findings have for listener-misperception models of sound change concerns the role of the conditioning environment. Such models of sound change often attribute misperception to listeners failing to detect the contextual information properly and thus failing to normalize properly for the effect of context on the realization of the sound in question. Here, our findings establish that systematic "failure" of perceptual compensation takes place despite the presence of the coarticulatory source; perceptual compensation "failure" is interpreted here as any case in which the context-specific identification functions deviate from the canonical identification functions observed in the balanced condition. This finding echoes earlier findings that perceptual compensation may be only partial under certain circumstances. Taken together, these findings suggest that failure to compensate perceptually for coarticulatory influence need not be the result of not detecting the source of coarticulation. Listeners may simply fail to take proper account of the role coarticulatory contexts play in speech production and perception.

It is worth pointing out in closing that selective adaptation effects have generally been attributed to adaptors fatiguing specialized linguistic feature detectors [13], which suggests that the


neural mechanism that subserves speech perception may eventually recuperate from adaptor fatigue, and the selective adaptation might dissipate. There is some evidence that selective adaptation effects are temporary [20]. The transience of selective adaptation raises doubts about its implications for sound change, since sound change necessitates the longevity of the influencing factors. Additional research is underway to ascertain the longitudinal effects of selective adaptation. Such data will provide much-needed information regarding the significance of selective adaptation effects for speech perception and sound change.

5. Acknowledgements

This work is partially supported by National Science Foundation Grant BCS-0949754.

6. References

[1] V. Mann, "Influence of preceding liquid on stop-consonant perception," Perception & Psychophysics, vol. 28, no. 5, pp. 407–412, 1980.
[2] V. A. Mann and B. H. Repp, "Influence of vocalic context on perception of the [ʃ]-[s] distinction," Perception & Psychophysics, vol. 28, pp. 213–228, 1980.
[3] J. S. Pardo and C. A. Fowler, "Perceiving the causes of coarticulatory acoustic variation: consonant voicing and vowel pitch," Perception & Psychophysics, vol. 59, no. 7, pp. 1141–1152, 1997.
[4] A. Lotto and K. Kluender, "General contrast effects in speech perception: effect of preceding liquid on stop consonant identification," Perception & Psychophysics, vol. 60, no. 4, pp. 602–619, 1998.
[5] P. Beddor and R. A. Krakow, "Perception of coarticulatory nasalization by speakers of English and Thai: Evidence for partial compensation," Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2868–2887, 1999.
[6] C. Fowler, "Compensation for coarticulation reflects gesture perception, not spectral contrast," Perception & Psychophysics, vol. 68, no. 2, pp. 161–177, 2006.
[7] P. S. Beddor, J. Harnsberger, and S. Lindemann, "Language-specific patterns of vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates," Journal of Phonetics, vol. 30, pp. 591–627, 2002.
[8] J. Ohala, "The phonetics of sound change," in Historical Linguistics: Problems and Perspectives, C. Jones, Ed. London: Longman Academic, 1993, pp. 237–278.
[9] J. Blevins, Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press, 2004.
[10] J. Ohala, Sound Change Is Drawn from a Pool of Synchronic Variation. Berlin: Mouton de Gruyter, 1989, pp. 173–198.
[11] ——, "The phonetics and phonology of aspects of assimilation," in Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, J. Kingston and M. Beckman, Eds. Cambridge: Cambridge University Press, 1990, vol. 1, pp. 258–275.
[12] ——, "Towards a universal, phonetically-based, theory of vowel harmony," in Proc. ICSLP, Yokohama, vol. 3, pp. 491–494, 1994.
[13] P. Eimas and J. Corbit, "Selective adaptation of linguistic feature detectors," Cognitive Psychology, vol. 4, pp. 99–109, 1973.
[14] P. D. Eimas and J. L. Miller, "Effects of selective adaptation of speech and visual patterns: Evidence for feature detectors," in Perception and Experience, H. L. Pick and R. D. Walk, Eds. N.J.: Plenum, 1978.
[15] A. G. Samuel, "Red herring detectors and speech perception: In defense of selective adaptation," Cognitive Psychology, vol. 18, pp. 452–499, 1986.
[16] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[17] A. G. Samuel and T. Kraljic, "Perceptual learning for speech," Attention, Perception, & Psychophysics, vol. 71, no. 6, pp. 1207–1218, 2009.
[18] J. Lynch, "Low vowel dissimilation in Vanuatu languages," Oceanic Linguistics, vol. 42, no. 2, pp. 359–406, 2003.
[19] F. B. Parkinson, "The representation of vowel height in phonology," Ph.D. dissertation, Ohio State University, 1996.
[20] J. Vroomen, S. van Linden, M. Keetels, B. de Gelder, and P. Bertelson, "Selective adaptation and recalibration of auditory speech by lipread information: dissipation," Speech Communication, vol. 44, pp. 55–61, 2004.