12
Speech masking. II: Simultaneous masking thresholds under "naturalistic" listening conditions Murray Spiegel Bellcore, Room2E-252, 445 South Street, Box 1910,Morristown, New Jersey07960 (Received18February 1988; accepted for publication11August 1988) This article investigates the role of listening conditions in determining thresholds for probe tones masked by natural speech. These thresholds are of interest because they are a sensitive probeof the activity profile,or spectrum, of sounds such as speech in the auditory system. Most human performance tests are carried out under highly artificial listening conditions, which may not reflecthow people listento speech in commonlistening environments. In this study,reference conditions (similar to minimal uncertaintylistening conditions usedin many performance tests) were compared to a "naturalistic"listening condition and to another, intermediate, condition. In the naturalistic listening condition, listeners did not know the frequency or the position of probetones; additionally,they were requiredto attend to the semantic content of sentences. In the reference condition, listeners knew the frequency and position of probe tones masked by single syllables. Average thresholds wereelevated by 4 dB in the naturalistic listening condition with respect to the reference condition, and thresholds tended to be elevated more for higher-frequency probetones. The results providepreviously unknowninformationaboutthe resolution of speech sounds in the auditory system during speech comprehension. PACS numbers: 43.71.Cq, 43.71.Es, 43.66.Dc INTRODUCTION We most often listen to speech under conditions very different from those of the laboratory: In addition to the acousticvariability inherent to natural listening environ- ments, the speech we usually hear consists of sentences that we try to understand but cannotpredict perfectlyand also contains distortions and interference that we try to ignore.In the laboratory, many performance tests reverse the condi- tionsof everyday experience. Listeners often hear repeated samples of thesame utterance andarenot required to extract the meaning of the speech eachtime, they often know the characteristics of the sounds that will be used in a block of trials, and they frequently are askedto attend to the very same distortions or interference that may be ignoredduring everyday speech processing. Of course, laboratoryrestric- tionspermit the studyof subtle effect without confounding. One topicof continuing interest is the masking properties of speech (see Spiegel, 1987a, for a recent review).Masking isa phenomenon in which one sound (the speech)interferes with the perception of another (e.g., a probe tone). The threshold for a tone, as a function of frequency, is usedas a sensitive probe of the activity profilein the auditorysystem created in response to speech; that is, thresholds for probe tones reflect aspects of the internal spectrum of speech. Is it possible, however,that some of the resultsof laboratory studies may not accurately reflecttheir actual values during real-lifespeech perception? Could something asbasic asthe internal spectrum of speech, which one hypothesizes is re- flected in its masking effects, be influenced by listening con- ditions? There is evidence that some of the differences between most laboratory conditionsand everyday experience may indeed affect our estimates of humanperformance. Many of these observations come from studies in which the amount of masking increases when listeners cannotaccurately predict the stimuli. Studies of the effects of uncertainty,including signal-frequency uncertainty (e.g., Tanner et al., 1956; Green, 1961; Greenberg and Larkin, 1968; Spiegel and Green, 1982; Buus et al., 1986), temporal uncertainty (Green and Weber, 1980), and masker uncertainty (e.g., Watson et al., 1976;Watson and Kelly, 1978;Spiegel et al., 1981; Neff and Green, 1987; Watson, 1987) show that ' thresholds can be elevated by 3 dB to over 30 dB, the exact amountdepending on the amountand type of uncertainty. While the effects of uncertaintyare likely to be an insuffi- cient model for the differences between laboratoryand real- life conditions, these results suggest that some measure- ments of auditory speech profiles may not accurately reflect, at least quantitatively, the processing used by a listener en- gaged in speech understanding. Thereis a paucity of dataon the influence of other factors that distinguish the laboratory from everyday listening conditions. The current study assessed the effects of differences in listening conditions on speech masking thresholds. The goal was to measure simultaneous masking thresholds for brief probetones that were "incidentally"detected by subjects while they werelistening to, understanding, and makingde- cisions about themeaning of complete sentences. This meth- od necessitated a compromise between the usual conditions of both laboratory studiesand realistic listening environ- ments. On the one hand, the requirement to measure a thresholdmeant that stimuli had to be repeated, and be re- peated often, a condition unlike everyday experience. On the other hand, the requirement that the task include under- 2033 J. Acoust. Soc. Am.84 (6), December 1988 0001-4966/88/122033-12500.80 @ 1988Acoustical Society of America 2033 Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

  • Upload
    murray

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

Speech masking. II: Simultaneous masking thresholds under "naturalistic" listening conditions

Murray Spiegel Bellcore, Room 2E-252, 445 South Street, Box 1910, Morristown, New Jersey07960

(Received 18 February 1988; accepted for publication 11 August 1988)

This article investigates the role of listening conditions in determining thresholds for probe tones masked by natural speech. These thresholds are of interest because they are a sensitive probe of the activity profile, or spectrum, of sounds such as speech in the auditory system. Most human performance tests are carried out under highly artificial listening conditions, which may not reflect how people listen to speech in common listening environments. In this study, reference conditions (similar to minimal uncertainty listening conditions used in many performance tests) were compared to a "naturalistic" listening condition and to another, intermediate, condition. In the naturalistic listening condition, listeners did not know the frequency or the position of probe tones; additionally, they were required to attend to the semantic content of sentences. In the reference condition, listeners knew the frequency and position of probe tones masked by single syllables. Average thresholds were elevated by 4 dB in the naturalistic listening condition with respect to the reference condition, and thresholds tended to be elevated more for higher-frequency probe tones. The results provide previously unknown information about the resolution of speech sounds in the auditory system during speech comprehension.

PACS numbers: 43.71.Cq, 43.71.Es, 43.66.Dc

INTRODUCTION

We most often listen to speech under conditions very different from those of the laboratory: In addition to the acoustic variability inherent to natural listening environ- ments, the speech we usually hear consists of sentences that we try to understand but cannot predict perfectly and also contains distortions and interference that we try to ignore. In the laboratory, many performance tests reverse the condi- tions of everyday experience. Listeners often hear repeated samples of the same utterance and are not required to extract the meaning of the speech each time, they often know the characteristics of the sounds that will be used in a block of

trials, and they frequently are asked to attend to the very same distortions or interference that may be ignored during everyday speech processing. Of course, laboratory restric- tions permit the study of subtle effect without confounding. One topic of continuing interest is the masking properties of speech (see Spiegel, 1987a, for a recent review). Masking is a phenomenon in which one sound (the speech) interferes with the perception of another (e.g., a probe tone). The threshold for a tone, as a function of frequency, is used as a sensitive probe of the activity profile in the auditory system created in response to speech; that is, thresholds for probe tones reflect aspects of the internal spectrum of speech. Is it possible, however, that some of the results of laboratory studies may not accurately reflect their actual values during real-life speech perception? Could something as basic as the internal spectrum of speech, which one hypothesizes is re- flected in its masking effects, be influenced by listening con- ditions?

There is evidence that some of the differences between

most laboratory conditions and everyday experience may

indeed affect our estimates of human performance. Many of these observations come from studies in which the amount of

masking increases when listeners cannot accurately predict the stimuli. Studies of the effects of uncertainty, including signal-frequency uncertainty (e.g., Tanner et al., 1956; Green, 1961; Greenberg and Larkin, 1968; Spiegel and Green, 1982; Buus et al., 1986), temporal uncertainty (Green and Weber, 1980), and masker uncertainty (e.g., Watson et al., 1976; Watson and Kelly, 1978; Spiegel et al., 1981; Neff and Green, 1987; Watson, 1987) show that ' thresholds can be elevated by 3 dB to over 30 dB, the exact amount depending on the amount and type of uncertainty. While the effects of uncertainty are likely to be an insuffi- cient model for the differences between laboratory and real- life conditions, these results suggest that some measure- ments of auditory speech profiles may not accurately reflect, at least quantitatively, the processing used by a listener en- gaged in speech understanding. There is a paucity of data on the influence of other factors that distinguish the laboratory from everyday listening conditions.

The current study assessed the effects of differences in listening conditions on speech masking thresholds. The goal was to measure simultaneous masking thresholds for brief probe tones that were "incidentally" detected by subjects while they were listening to, understanding, and making de- cisions about the meaning of complete sentences. This meth- od necessitated a compromise between the usual conditions of both laboratory studies and realistic listening environ- ments. On the one hand, the requirement to measure a threshold meant that stimuli had to be repeated, and be re- peated often, a condition unlike everyday experience. On the other hand, the requirement that the task include under-

2033 J. Acoust. Soc. Am. 84 (6), December 1988 0001-4966/88/122033-12500.80 @ 1988 Acoustical Society of America 2033

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 2: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

standing of the speech and not be primarily a detection task is unusual for a psychoacoustic experiment. These goals were achieved by adopting the procedure of split-attention tasks: In addition to the detection response, listeners also made a true/false decision about the sentences they heard, which guaranteed that a modicum of cognitive processing was being applied to each sentence. Highly accurate true/ false responses ensure that the listeners do not ignore the speech stimuli in their search for the probe tone. Because the true/false task requires only a simple binary choice, it prob- ably causes little motor and decision interference with the detection task. Furthermore, a split-attention task is not un- natural for speech perception. We may notice a speaker's emotion, state of health, or other aspects of vocal quality, yet our observation of these features rarely interferes with our process of speech understanding. The task is unnatural mainly in the requirement to respond to an "extralinguistic" factor, i.e., the presence of a probe tone.

These experiments are an extension of a recently pub- lished study that measured masking effects in natural speech (Spiegel, 1987a). In that study, single vowel-consonant- vowel (VCV) syllables were used to explore the relative roles of simultaneous and nonsimultaneous masking. The current s[udy measured primarily the effect of listening con- ditions on simultaneous masking thresholds and did not fo- cus on nonsimultaneous masking effects. (For one masker in the reference condition for experiment 1, several probe-tone positions were tested to demonstrate some of the nonsimul- taneous masking effects of a moving formant peak. )

More importantly, the current study is an extension of the earlier study to more realistic listening conditions. In the previous study, the listeners were knowledgeable about the signal frequency, its temporal position, and the speech mask- er---conditions quite similar to most psychophysical experi- ments and termed reference conditions in this article. In the present study, listeners were presented with laboratory sim- ulations of the listening conditions of everyday experience. Our first attempt at this simulation, described in experiment 1 as an intermediate level of uncertainty, did not obtain ap- preciable elevation of thresholds. Because the first experi- ment offered the listeners many opportunities to learn the test variables, a second experiment was run to study a higher level of uncertainty (and slightly better simulation of natu- ralistic listening). Those conditions obtained an average threshold elevation of 4 dB from the reference condition.

Compared to the actual thresholds operating when people listen to speech in natural environments, those measured in the current study are likely to be lower because of the con- straints required of the current study to obtain reliable per- formance estimates.

TABLE I. The 48 true/false sentences used in both experiments. Sentences (a) and (b) and their italicized words were used in the control experiments. The italicized vowels in the main 48 sentences represent the 110 potential probe-tone locations in experiment 2 and do not correspond to the locations chosen in experiment 1 (see Fig. 1 caption).

(1) (2) (3) (4) (5) (6) (7) (8) (9)

(10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) (28) (29) (30) (31) (32) (33) (34) (35) (36) (37) (38) (39) (40) (41) (42) (43) (44) (45) (46) (47) (48)

Sneaky crooks do rob ordinary homeowners. Sneaky crooks don't rob ordinary homeowners. Sneaky crool(s are robbed by ordinary homeowners. Sneaky crooks aren't robbed by ordinary homeowners. Ordinary homeowners do rob sneaky crooks. Ordinary homeowners don't rob sneaky crooks. Ordinary homeowners are robbed by sneaky crooks. Ordinary homeowners aren't robbed by sneaky crooks. Disobedient kids do devour our greasy candies. Disobedient kids don't devour our greasy candies. Disobedient kids are devoured by our greasy candies. Disobedient kids aren't devoured by our greasy candies. Our greasy candies do devour disobedient kids. Our greasy candies don't devour disobedient kids. Our greasy candies are devoured by disobedient kids. Our greasy candies aren't devoured by disobedient kids. Black cats do usually chase grey dogs. Black cats don't usually chase grey dogs. Black cats are usually chased by grey dogs. Black cats aren't usually chased by grey dogs. Grey dogs do usually chase black cats. Grey dogs don't usually chase black cats. Grey dogs are usually chased by black cats. Grey dogs aren't usually chased by black cats. Rich executives do boss around poor workers. Rich executives don't boss around poor workers. Rich executives are bossed around by poor workers. Rich executives aren't bossed around by poor workers. Poor workers do boss around rich executives.

Poor workers don't boss around rich executives.

Poor workers are bossed around by rich executives. Poor workers aren't bossed around by rich executives. Healthy swimmers do use olympic pools. Healthy swimmers don't use olympic pools. Healthy swimmers are used by olympic pools. Healthy swimmers aren't used by olympic pools. Olympic pools do use healthy swimmers. Olympic pools don't use healthy swimmers. Olympic pools are used by healthy swimmers. Olympic pools aren't used by healthy swimmers. Fruit trees do grow sweet apples. Fruit trees don't grow sweet apples. Fruit trees are grown by sweet apples. Fruit trees aren't grown by sweet apples. Sweet apples do grow fruit trees. Sweet apples don't grow fruit trees. Sweet apples are grown by fruit trees. Sweet apples aren't grown by fruit trees.

Control sentences

(a) Trains do travel rapidly. (b) Trains don't travel rapidly.

I. METHOD

A. Stimuli

Forty-eight true/false sentences were constructed along the guidelines of Levitt (1984). The sentences, shown in Ta- ble I, are based on six topics and differ within topic merely in word order, negation, and verb voice (active versus passive). With this design, listeners cannot simply hear the sentence

topic (or word order) and from that automatically garner its truth value. Subjects had to listen to most of each sentence to respond accurately. Using similar sentences, Levitt found little learning after the first presentation of the entire set of sentences. The listeners were shown the complete list of sen- tences immediately prior to their use in experiment 1. The

2034 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2034

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 3: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

experiment proceeded only after the truth value of all sen- tences was obvious to each listener.

All sentences were recorded by a native male speaker of English during a single recording session. The sentences were read with a conscious effort to keep intonation con- tours similar. This made the sentences monotonous to listen

to, but the goal was to eliminate any extraneous or artifac- tual confounding of intonation and semantics. Since listen- ers in this study heard each sentence at least 500 times, the experimental design would be undermined if the true/false response could be determined by, for instance, an unusual starting pitch.

The sentences were recorded in an acoustically isolated chamber, low-pass filtered at 4 kHz, and sampled with a 16- bit A/D converter at 10 000 samples per second. The probe tones were created digitally. Speech and probe tones were output through separate channels of a 16-bit D/A converter and low-pass filtered at 4 kHz; the two channels were mixed after the probe tone passed through a computer-controlled attenuator. Because speech utterances, probe-tone frequen- cy, and probe-tone position were randomized for some con- ditions in this study, the computer had to merge the separate probe tone and speech files into a stereo signal in real time. When no signal was called for, the computer merged a silent file with the speech utterance to avoid any timing artifacts associated with signal presentation. The sentences were pre- sented at an average level of 65-70 dBA SPL.

Thresholds were determined for a sinusoid with 15-ms

duration between the half-power points. The total probe- tone dura•tion of 20 ms included 5-ms quarter-cycle sine- shaped onset and offset ramps. The frequency of the tone burst varied from 300 to 2800 Hz. The specific frequencies selected for the probe tones depended on the spectra of the particular vowel maskers; the number of test frequencies de- pended on the listening conditions, which are described be- low. The starting phase of the probe tone was randomized between presentations to eliminate phase effects (Spiegel, 1987a). Starting phases varied from 0 to 300 deg in 60-deg increments.

B. General procedure

The three listeners, ranging in age from 17-37 years, all were employed by Bellcore; one was highly experienced in psychoacoustics experiments (the author), while the others had several weeks' training before data collection. All had audiometrically normal hearing below 4 kHz in their test

ears. Individual listeners sat in a sound-isolated booth and

listened to the speech utterances (maskers) plus probe tone (signal) through the left headphone of Yamaha YH- 100 headsets. Each listener sat before a multiple-key response box and entered detection responses using one pair of but- tons and true/false responses I with a separate pair of buttons located beneath the other pair. After each trial, feedback was provided using lights located above each button to indicate the correct responses. Total trial length was determined by the length of the speech utterances, loading and clearing times for the converter buffer, and response and feedback time. The sentences averaged 3.0 s in length, with a mini- mum of 2.2 s and a maximum of 3.9 s. The trial duration was about 9 s when sentences were used and less than 3 s when

single. syllables were used. Rests were offered every 4.5-5 min, and sessions lasted about 1/2 h. Typically two or three sessions were run each day for each listener.

1. Sit•gle-interval procedure

Because of an unusual response bias described in Ap- pendix A, the data for the main experiments were obtained with an adaptive, single-interval procedure. Each trial con- tained only one masker, and a signal was added randomly on only one-half of the thais. The listeners responded by select- ing one of the buttons labeled "yes" or "no" to indicate whether they thought the trial contained a signal. An adap- tive single-interval (yes/no) procedure began by seeking thresholds with an initial step size of 5 dB, which changed to the final step size of 1.5 dB after the first two reversals. Changes in signal level were based on responses to signal trials (probability = 0.5) only: Two correct responses ("yes") in a row dropped signal level for a particular condi- tion, one incorrect (a "no") raised signal level. The signal level attained after the second reversal at the final step size was then used as the middle level for a method of constant

stimuli, using signal levels of d- 6, 0, and -- 6 dB re: this level. These initial signal levels were determined by an adap- tive procedure that ignored the false-alarm rate to seek levels rapidly (i.e., the responses to no-signal trials were discard- ed). At this point, false-alarm rates were monitored to deter- mine if additional adjustments of signal levels were required. After 5 signal trials had been presented at each signal level (approximately 30 trials), a heuristic based on the d' for each signal level adjusted the signal levels, if necessary. The most successful adjustment rules are shown in Table Ii and are dependent on psychometric function slope and the inter-

TABLE II. Signal level adjustment rules. FAR stands for false-alarm rate.

Increase levels when: Decrease levels when:

d•i (1 •2 •4 >4 I >1.6 ....... d•ed ...... •0.5 >2 I > 4a >4 .-- d •ow ß ........ • 1.6 > 1.6 • > 3

by: by: +10 dB +6 dB +3 dB --3 dB --6 dB --10 dB--10*(FAR --0.4)

a This rule was required because nonmonotonic psychometric functions occasionally resulted from too-few trial performance estimates.

2035 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2035

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 4: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

val between signal levels. A test run was terminated when each condition obtained at least 15 signal trials at each of the (adjusted) three signal levels. 2 Thresholds for d'-- 1.09 [ P(c) max = 70.7 % ] were estimated from a linear fit of d' to signal level; automatic heuristics based on the average psy- chometric function slope handled special cases with fewer than three data points within the range 0 • d' • 4.

Typically, the averages for three-four runs across all three listeners will be shown. Unmasked thresholds were at

least 10 dB below the lowest masked thresholds obtained for

each listener in both experiments.

C. Conditions

There were three listening conditions in this study: an intermediate-uncertainty condition (experiment 1 ), a high- er uncertainty condition (experiment 2), and, for each ex- periment, a reference condition. The refere,nce conditions used single syllables extracted digitally from the 48 sen- tences. One of these syllables was used as the masker for a session. In the reference sessions for experiment 2, only one frequency was used for the probe tone. The procedures used in this condition have been termed "minimal uncertainty" (Watson et al., 1976). Thresholds measured under these conditions are representative of the best performance listen- ers can attain and are the least contaminated by cognitive factors. The reference conditions for experiment 1 tested up to six or seven probe-tone frequencies per block, although some thresholds were checked with minimal uncertainty conditions.

In the other listening conditions, a subset of the 48 sen- tences was active for each experimental run. In the interme- diate uncertainty condition (experiment 1), the adaptive procedure tracked thresholds for probe tones in six sen- tences simultaneously within a single run. One vowel in each sentence was selected for threshold testing with one of the probe-tone frequencies (typically, near F 1, F 2, or the valley between the first two formants). Different sets of six sen- tences and/or probe-tone frequencies were used in succes- sive runs. To give the appearance of more variation in maskers, the sentences not in the active set in each run could also be selected during extraneous trials. During an extran- eous trial, the probe-tone frequency and its position within the sentence were sampled randomly with uniform probabil- ity. The level of the probe tone in extraneous trials was set to be clearly audible, using a heuristic based on the levels used for the active sentence set. Extraneous trials occurred 40%

of the time in experiment 1. Responses to extraneous trials were not used by the adaptive procedure to modify signal

level and thus had no effect on the data collection for other

conditions.

The third listening condition was studied in experiment 2, where nearly all of the sentences were in the active set. Each sentence contained up to four vowels tested for mask- ing effects. As described below, all similar vowels were equalized for overall power and were considered to be differ- ent exemplars of the same vowel for the purposes of thresh- old testing. Five or six vowels were tested in this fashion within each experimental run. Extraneous trials, in which the nontested sentences could also be drawn, occurred 10% of the time. The power-normalized versions of the vowels were used in the reference condition for experiment 2.

A control experiment, which preceded the main experi- ment, assessed the effect of the experimental method on probe-tone thresholds. Beginning with single syllables, no randomization of maskers, probe tones of constant frequen- cy, and no true/fals,e responses, the control experiment gradually introduced more variability of stimulus param- eters until the blocks of trials contained full sentences, ran- domization of maskers (between the two control sentences

shown in Table I), multiple (six) probe-tone frequencies, and the requirement of true/false responses. The control ex- periment was run using the 2IFC procedure (see Appendix A), and the results were corroborated with the single-inter- val procedure. The effect of all the experimental manipula- tions was small, about 2 dB. A major concern is the effect of the extra (true/false) responses required in the main experi- ments. In the control experiment, only 1 dB of the threshold elevation was due to this task. Furthermore, there was some indication that listeners had improved from these values with additional training.

Specific details of the procedures will be discussed with- in the sections for each experiment. Table III shows a sum- mary of the experimental conditions.

II. EXPERIMENT I

A. Reference condition

The reference conditions were run after the main experi- ments, to avoid the possibility of teaching the listeners where to focus their attention. However, a clearer picture of the results emerges by first considering the thresholds obtained in the reference condition. One syllable was used for the masker within each block, but probe-tone frequency was chosen randomly from six or seven values. The probe tone was located midvowel or midway within stop consonants.

Figure 1 shows thresholds averaged for the three listen- ers for all probe-tone frequencies and maskers used in the

TABLE III. Summary of conditions.

Masker

type Condition Syllable Sentence One

Probe-tone

location

Multiple

Probe-tone Required frequency answer

One Multiple Detection True/false

Exp 1--reference X X --uncertainty X X

Exp 2mreference X X muncertainty X

(x) x x x x x

x x

x x x

2036 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2036

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 5: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

lil in sentence 38 lil in sentence 43 lil in sentence 47

o

I I I I I I

500 1000 1500 2000 2500 3000

Frequency (Hz)

/•)/in sentence 30

!

o 500

i ! i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/u/in sentence 9

i i i i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/i/in sentence 36

o

o

o I,,.

o

o

o

o

i

0 500

i i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/d/in sentence 9

0 500 1000 1500 2000 2500 3000

Frequency (Hz)

i

o 500

! i i

1000 1500 2000 2500 3000

Frequency (Hz)

/p/in sentence 47

0 500 1000 1500 2000 2500 3000

Frequency (Hz)

i i i i i i

500 1000 1500 2000 2500 3000

Frequency (Hz)

FIG. 1. Average probe-tone thresholds (circles) and smoothed LPC spectra for eight maskers tested in the reference condition. The top six panels depict the masking obtained midvowel (the/i/vowels in the word "healthy" from sentences 36 and 38, and in the word "sweet" from sentences 43and 47;/8/in the word "don't" from sentence 30; and/u/in the word "do" from sentence 9) and the bottom two depict the masking by stops and their neighboring vowels (the VCV/idi/in the word "disobedient" from sentence 9 and the VCV/a•p!/in the word "apples" from sentence 47). The upper spectral curves in the bottom two panels show the spectrum of the previous vowel, approximately 30 ms prior to the probe-tone onset. Average standard errors of the mean are shown for all thresholds by vertical lines through the data points. Most error bars are smaller than the diameter of the circles.

reference condition. For comparison, smoothed spectra ob- ing well to the spectral peak of 1400 shown in the figure. The tained from linear predictive coefficients (LPC) are shown elevation in thresholds appears to be due to a contribution for 16-ms time slices of the speech centered on the probe- from forward masking from earlier F2 peaks, which are tone midpoint. 3 The relative amplitudes of the spectra are above 1600 Hz for the interval between 20 and 60 ms prior to aligned arbitrarily with respect to the threshold data. In probe-tone onset. This pattern of shift in F2 frequency is most respects, the spectral shape is very roughly preserved in ß typical for a/u/following a dental stop. Thus, even though the profile of the probe-tone thresholds. The threshold func r tion is smoothed considerably, presumably by the broad- band filter characteristics of the peripheral auditory system. For instance, out of four/i/maskers, in only one (/i/in sentence 36) is there a suggestion that the upper two for- mants are resolvbd. However, the frequencies and relative amplitudes of most of the threshold peaks correspond well with the physical spectra, within the limitations of conven- tional measurements of critical bandwidths. These data are

similar to the results of previous studies (e.g., Van Tasell et al., 1987) that measured the vowel masking patterns of syn- thetic vowels. For the two maskers shown in the bottom of

the figure and for the/u/, the less-than-exact correspon- dence may have additional causes. The thresholds for the /u/masker show a peak nearer to 1600 Hz, not correspond-

F2 was stable for at least 20 ms around the probe tone, it appears that formant peaks at different frequencies created nonsimultaneous masking effects.

The bottom two panels of Fig. 1 depict data where the probe tones were centered within the stop consonants, re- gions where the energy decreases by as much as 45 dB rela- tive to the levels midvowel. In these cases, masked thresh- olds represent the influence of nonsimultaneous masking from energy in the neighboring, high-energy vowels, as was found by Spiegel (1987a). In the case of the/d/stop taken from the word "disobedient" in sentence 9, the trend of the thresholds in general (particularly for the 2300-Hz probe tone) shows an influence that is perhaps suggestive of F2 from the neighboring/i/vowels (the figure shows the spec- trum for the preceding vowel, 34 ms earlier). In the/p/stop ,

2037 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2037

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 6: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

taken from the word "apples" in sentence 47, the threshold function may also show slightly better correspondence with some features of the formants of the neighboring vowel than with the spectrum of the stop consonant simultaneous with the probe tone. In particular, thresholds do not follow the upper peaks in the simultaneous spectrum, but instead ap- pear to show some degree ofnonsimultaneous masking from the preceding/•e/vowel, possibly due to peaks in the spec- trum 32 ms earlier near 1400 and 2250 Hz.

Because the correspondence between the threshold data and the neighboring spectra was less than perfect, we also investigated the relationship between the dynamics of natu- i'al speech and nonsimultaneous masking by varying probe- tone location throughout one of the maskers previously de- scribed. The results of the experiment showed that spectra change so rapidly over time that it is difficult to find a single spectrum that matches the threshold data. These results are described in more detail in Appendix B.

B. Intermediate uncertainty condition

In experiment 1 all of the maskers used for the reference condition were tested in mixed fashion to create an interme-

diate level of uncertainty. Also, extraneous trials comprised 40% of each block to prevent the listeners from learning all the probe-tone frequencies and positions. Most of the results

for experiment 1 are compared in Fig. 2 with corresponding data from the reference condition.

Overall, there was little difference between the thresh- olds measured in experiment 1 and those in the reference condition. The thresholds differed in the two conditions by an average of -0.2 dB; i.e., the higher uncertainty condi- tion obtained marginally lower average thresholds. How- ever, one threshold pair appears anomalous--the threshold estimates at 2 kHz for sentence 30 are 10 dB lower for experi- ment 1. The low thresholds can be traced to poor estimates for two listeners caused by unusually low false-alarm rates, which dramatically altered the estimates of d '. (Recall that many estimates of d' were based on only 30 trials/level.) If the data for that probe-tone frequency are disregarded, the two listening conditions differ by an average of 0.7 dB. As is evident from the figure, the conditions differ more near the formants ( 1.7 dB) than in the valleys between them (essen- tially no difference).

One reason the two conditions may have differed by such a small amount is that the method of experiment 1 confounded maskers and probe-tone position: that is, only one temporal position in each sentence contained a probe tone. Thus, listeners may have learned to focus their atten- tion at the proper time as they identified each sentence, thus reducing uncertainty. There is evidence (Spiegel and Wat-

/i/in sentence 38 /i/in sentence 43 /i/in sentence 47

o •0

o

o

o

o

500

i i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/•/in sentence 30

i i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/u/in sentence 9

i i i i i i

500 1000 1500 2000 2500 3000

Frequency (Hz)

o

-o o

o •,•

o •0

o

o

o

o

o

i

0 500

i i i i i

1000 1500 2000 2500 3000

Frequency (Hz)

/d/in sentence 9

0 500 1000 1500 2000 2500 3000

Frequency (Hz)

0 500

i i i !

1000 1500 2000 2500 3000

Frequency (Hz)

/p/in sentence 47

0 500 1000 1500 2000 2500 3000

Frequency (Hz)

FIG. 2. Average probe-tone thresholds ( + symbols) for the intermediate-uncertainty condition and the reference condition (open circles) obtained in experiment 1. The lines show the smoothed LPC spectrum at the probe-tone midpoint.

2038 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel' Speech masking 2038

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 7: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

son, 1981 ) that listeners can learn the relation between pat- terns and test positions. When they do so, randomly selected well-learned patterns are heard with a resolution that is as good as that found for the individual patterns under minimal uncertainty conditions. The present study is similar in that only one location per masker was tested. 'So, although the conditions in experiment 1 allowed relatively easy measure- ment of probe-tone thresholds, the level of uncertainty was probably too low to serve as a simulation of natural listening conditions. The next experiment addressed this problem.

III. EXPERIMENT 2

A. Method

In the second experiment, the level of uncertainty was increased by measuring the masked thresholds for several locations in each sentence within a session. The italicized

vowels in the sentences in Table I show the 110 potential probe-tone locations. 4 Many vowels underwent an adjust- ment procedure aimed at equalizing their masking effective- ness. After adjustment, these vowels were considered in ex- periment 2 to be different exemplars of the same masker for the purposes of the threshold testing.

To equalize all maskers for masking effectiveness, all instances of the vowels/i/,/ae/,/u/,/v/, and/:•/near stable formant trajectories in the 48 sentences were ana- lyzed. Productions of each vowel class that differed by less than 2 dB in the amplitude ratio of F l/F2 were clustered. For example, in 15 productions of the vowel/i/, the F l/F2 ratio ranged between 12 and 14 dB; in the 14 productions, the ratio ranged between 15 'and 17 dB; and, in 11 produc- tions, the ratio ranged between 18 and 20 dB. Eleven groups of similar productions (herein termed an ensemble) were identified for the five vowel classes, with an average of about ten instances per ensemble. In each ensemble, the vowel nearest to the average overall power was defined as the pro- duction reference. If the formant trajectory or the overall

power differed greatly from the reference, or if the vowel duration was less than 100 ms, the vowel was removed from the test ensemble. To equalize all productions remaining in each ensemble, the level of the vowels was then adjusted by a six-segment linear multiplication function according to the following criteria: First, the adjustments attempted to pro- duce a vowel of constant power (within 1.5 dB) for at least 30 ms on either side of the probe-tone midpoint (midvowel) to minimize nonsimultaneous masking effects. Second, the head and tail of the adjustment function were as long as possible (encompassing one or two syllables ) to achieve gra- dual changes from the natural production. After adjust- ment, all vowels in each ensemble differed in F 1 amplitude from the references by less than 1 dB, and, in F2 amplitudes, by 2 dB. Figure 3 shows the LPC spectra for the 15 produc- tions of/i/that had an F l/F2 ratio ranging between 12 and 14 dB after adjustment. All spectra for the modified vowels contained nearly identical F 1 amplitudes and a small range of F2 amplitudes, although differing somewhat in formant frequencies. The "tails" of the adjustment function for about 5% of the conditions were modified further on the basis of

pilot testing? Most of the changes were inaudible, involving ampli-

tude adjustments of less than 2-3 dB. However, some of the larger amplitude adjustments (maximum changes were -- 9.9 and + 8 dB) were audible. Note that intensity vari-

ation occurs naturally (head turns, transmission variations, etc.) and is the least linguistically meaningful variation; stress, for instance, is more often conveyed through tempo- ral or fundamental frequency changes (Fry, 1955; Streeter, 1978). The experimental design attempted to minimize one potential problem with the intensity-adjusted sentences; that is, the probe-tone locations could be cued by the intensity variation. 6 To avoid this problem, syllables containing the most noticeable intensity variations were not in the active set of potential probe-tone locations in the first set of runs. This strategy helped disassociate intensity change with probe-

I I I I I •

500 1000 1500 2000 2500 3000

Frequency (Hz)-

FIG. 3. LPC spectra for the vowels (at probe-tone midpoint) comprising one of the ensembles for/i/after normaliza-

tion for overall amplitude differences in Fl.

2039 J Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2039

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 8: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

tone locations for the listeners.

Thresholds were collected for a minimum of three vow-

els from each ensemble. The frequency of the probe tones on data trials was selected to be near either F 1, F 2, or the valley between the formants for each individual vowel in each en-

semble. Extraneous trials comprised 10% of each block. In the reference listening conditions (minimal uncertainty) for experiment 2, thresholds were obtained with speech sylla- bles taken from the set of intensity-adjusted sentences for comparison. Only one probe-tone frequency was tested per block.

B. Naturalistic condition: Results

Average thresholds for each vowel ensemble are shown in Fig. 4, which compares thresholds obtained in the natural- istic listening condition and the reference condition. The spectrum for one of the vowels in each ensemble, the produc- tion reference defined during the adjustment process, is shown for comparison purposes. Note that this is not an average spectrum for the ensemble, since the individual pro- ductions of each vowel in the ensembles contain slightly dif- ferent formant frequencies. Thresholds were elevated in the uncertainty condition by an average of 3.9 dB over the refer- ence condition. Standard error, averaged across listeners,

was 1.2 dB. The elevation in threshold was maintained for all

vowels and probe-tone frequencies, except for the masking at F 1 by the first ensemble set of/i/. There were no significant differences in elevation of masking obtained for different vowels or for the masking by formants versus the valleys between them.

One unexpected and potentially useful trend in the cur- rent data is that thresholds for probe tones were not elevated by listening conditions uniformly for all probe-tone frequen- cies. Figure 5 plots the difference in threshold between the two conditions as a function of probe-tone frequency. There are no significant differences between the different vowel ensembles. The trend line shows that high-frequency probe tones were elevated in the naturalistic listening conditions by a slightly greater amount than the elevations for low-fre- quency probe tones. The result must be viewed as prelimi- nary because of the paucity of data above 1500 Hz and the significant variability of threshold differences found across frequency. The effect appears to be an enhancement of basic masking effects. The explanation for this result is not clear. One explanation may be based on possible difference in the relative variability of the energy in the different frequency bands, although a preliminary analysis finds more variabil- ity of levels for lower frequencies than for higher frequen-

/i/set 1 /i/set 2 /i/set 3 lael set 1

0 500 1500 2500

Frequency (Hz)

/ae/set 2

0 500 1500 2500

Frequency (Hz)

/u/set 1

0 500 1500 2500 0 500 1500 2500

Frequency (Hz) Frequency (Hz)

/3/set 2 /3'/set 1

0 500 1500 2500 0 500 1500 2500

Frequency (Hz) Frequency (Hz)

/u/set 2 /o/set 1

0 500 1500 2500 0 500 1500 2500

Frequency (Hz) Frequency (Hz)

/ 3'/set 2

0 500 1500 2500

Frequency (Hz)

0 500 1500 2500 0 500 1500 2500

Frequency (Hz) Frequency (Hz)

FIG. 4. Average probe-tone thresholds ( + symbols) for the naturalistic listening condition and the reference condition (open circles) obtained in experiment 2. (The ordinate values for the data show the average probe-tone frequencies, although each probe-tone frequency tested was near the individ- ual formants and valleys for each vowel. ) In each panel, the solid line shows the LPC spectra at the probe-tone midpoint of the reference production for each ensemble.

2040 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel' Speech masking 2040

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 9: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

o 500

Threshold difference between conditions

3' ae u u i

3' 3

u 3')

I I

1000 1500

Frequency of probe tone (Hz)

I

2000

i

i

2500

FIG. 5. Relation between threshold ele-

vation found in experiment 2 and aver- age probe-tone frequency for each en- semble. Some data points have been moved horizontally to prevent overlap. Solid line shows function from locally weighted linear regression.

cies. Another unexpected trend, shown in Fig. 6, is that the naturalistic listening conditions obtain thresholds that are elevated more for wider formant separations. The median threshold difference for the widest formant separations is nearly twice that for the smallest formant separation. This is not just an enhancement of basic masking effects, but rather a more complex relationship. Note that larger elevations are obtained for masking measured at the formants as well as the valleys between them. This result may be partially related to the pattern shown in Fig. 5, as wide formant separations necessarily require high F2 frequencies. A fuller investiga-

tion would compare narrow formant separations at various absolute frequencies.

IV. DISCUSSION

The present results provide previously unknown infor- mation about the representation of speech sounds in the au- ditory system during speech comprehension. Under "natu- ralistic" listening conditions, where listeners had to respond to the truth value of 48 sentences, average thresholds were elevated by about 4 dB over their values in the reference,

Threshold difference between conditions

•1 2V

V 2

v

1 1

2OO

I I

400 600

I I I I

800 1000 1200 1400

Difference in formant frequencies, F2-F1 (Hz)

1

I

1600 1800

FIG. 6. Relation between threshold ele-

vations found in experiment 2 and aver- age formant separation for each ensem- ble. Data for probe tones located at F 1 and F2 are shown by 1 and 2, respective- ly, and by V for the valleys. Solid line shows function from locally weighted linear regression.

2041 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2041

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 10: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

minimal uncertainty, condition. This value represents a con- servative measure of the level of tones that may be added to speech imperceptibly. It is likely that listeners hearing a sen- tence a first time (e.g., conditions of masking uncertainty as investigated by Spiegel et al., 1981; Neff and Green, 1987) and being required to comprehend it, would exhibit thresh- olds that are a fair amount higher than the elevations implied by the present results. The listeners in the current study lis- tened to the same sentences hundreds of times. Previous re-

search with well-learned, patterns has shown that familiarity with a pattern mitigates many of the effects of uncertainty (Spiegel and Watson, 1981 ). In addition, the listeners were highly trained and thus had some knowledge of the probable locations of the probe tones.

Do these experiments provide data relevant to the inter-, nal spectrum of speech? The answer hinges somewhat on one's definition of internal spectrum. It has long been ac- knowledged that the best performance is obtained when sig- nal and masker parameters are held constant anti are reason- ably well known by listeners. Under these conditions, masker thresholds probably most closely approximate the pattern of activity in the auditory periphery. The present experiment adds to the growing list of studies by Green, Watson, and many coworkers that show that thresholds may be additionally elevated by task and cognitive factors. These reflect an "effective" internal spectrum, that is, the represen- tation of the activity available when making decisions. The current study measured the effects of the task of monitoring speech for meaning and provides a picture of its internal representation.

ACKNOWLEDGMENTS

Harry Levitt provided remarkably good advice for the adaptive single-inte•rval procedure and Dan Kahn suggested the need for power-normalization vowels for experiment 2.

APPENDIX A: MASKER RESPONSE BIAS

The study began with an adaptive procedure based on two-alternative-forced-choice (2IFC) trials, a method com- mon to most current detection experiments. After a series of control experiments, it became necessary to change the ex- perimental method to a single-interval (yes/no) adaptive procedure. Some conditions of the main experiment re- quired different speech maskers in each interval of the 2IFC trials to prevent listeners from comparing directly the two intervals. Some listeners developed a bias for particular

maskers, which they thought contained a signal. The bias caused these listeners to choose the correct interval in partic- ular trials, even though the signal was inaudible or even not present. This elevated their level of chance responding on some conditions. These bias effects can occur only when dis- tinguishable maskers are paired randomly within trials. ß

Table AI shows an example of experimental conditions sensitive to masker response bias. Two maskers (A and B) are randomly paired within trials. Thresholds are tracked adaptively for both maskers (conditions 1 and 2). If a listen- er exhibits a response bias for masker A, thinking it sounds like it contains a signal even when it does not, then, when the masker is paired with masker B, the listener will always choose the interval in which A occurred. When a signal actu- ally was presented in that interval (trial type 2), the listener chooses the correct interval, even when it is subaudible.

The table shows that a listener with a masker response bias can get an average of 75% correct on inaudible-signal trials in condition 1. Becatlse the adaptive 2IFC procedure seeks a performance level close to this elevated chance level, the signal level can spend significant amounts of time below actual threshold. Thresholds computed from averages of n reversals for some conditions in the beginning of this study were artifactually low for this reason. In extreme cases, lis- teners appeared to be responding perfectly to signals that were, in reality, 50 dB below threshold. Spiegel (1987b) has shown through simulations that adaptive 2IFC procedures are sensitive to masker response bias up to very high perfor- mance criteria•onsiderably above the 70%-85% range usually employed in psychophysical experiments. The masker response bias is possible only when distinguishable maskers are presented randomly within trials; it occurs when one masker sounds to the listener as though it contains the signal. Adaptive forced-choice procedures implicitly as- sume that maskers are drawn from the same underlying (perceptual) distribution. Some experiments violate this as- sumption by using complex maskers that are perceptually different.

APPENDIX B: EFFECT OF A VARYING PROBE-TONE POSITION

A complete picture of nonsimultaneous masking is diffi- cult to obtain from masking of a probe tone at single tempo- ral positions. Although not germane to the purposes of the main experiments, it was thought that a small experiment which varied probe-tone position throughout a syllable that is dynamically changing would be an appropriate contribu-

TABLE AI. Trial types in the 2IFC mixed-condition experiment.

Condition Trial type Signal

Interval

No signal % trials % correct

1 A A

1 2 A B 3 B A

2 4 B B Chance

50 50

50 100

75

2042 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2042

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 11: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

LPC spectra for/idi/in sentence 9

0 500 1000 1500 2000 2500

Frequency (Hz)

Probe-tone thresholds

o _

0 500 1000 1500 2000 2500

Frequency of probe tone

FIG. B 1. LPC spectra (top panel) and probe-tone thresholds (bottom pan- el) for several temporal positions during and after the stop of/idi/from sentence 9. Temporal positions represented by the symbols are: (1) cen- tered within the stop; (2) onset of second vowel, 25 ms from position 1; ( 3 ) 26 ms from onset of second vowel; (4) 132 ms from onset of second vowel; (5) 161 ms from onset of second vowel; (6) 191 ms from onset of second vowel. Some data points in the bottom panel have been moved horizontally to prevent overlap.

tion to the body of literature surveying the masking effects of natural speech. The same speech segment of/idi/from sen- tence 9 investigated in experiment 1 was selected for the masker.

The top of Fig. B 1 shows the spectra at the midpoints of the probe-tone positions. The lowest curve represents the middle of the stop and corresponds to the spectrum shown for the/d/stop in Fig. 2. The Other probe tones were posi- tioned successively later into the second vowel: coincident with vowel onset, 25 ms later, and three more positions that sampled approximately equal movement of the vowel's sec- ond formant. The data, shown in the bottom of the figure, only crudely approximate the movement of the second for- mant. Instead, the thresholds show influence from neighbor- ing time slices. As was mentioned, thresholds for probe tones centered in the stop (shown as l's in the figure) show eleva- tions that correspond with the position of the second for- mant about 20-35 ms away. Thresholds coincident with the onset of the second vowel (2's) correspond better with peaks

about 25 ms later (position 3 ). Thresholds for each position show the effects ofnonsimultaneous masking--at least in the location of the threshold peaks. Thresholds for probe-tone positions labeled 5 and 6 (about 160 and 190 ms into the vowel) appear to peak between the formant peaks occurring at those times, with the data represented by 5's showing stronger masking from the higher frequency formant peaks occurring about 30 ms earlier. A clear picture of nonsimul- taneous masking is obscured by the fact that natural speech stimuli do not "sit still"--it is much easier to investigate nonsimultaneous masking with laboratory-generated, static maskers than with naturally occurring, dynamic sounds. However, the data in the reference conditions can be sum- marized simply: Nonsimultaneous masking appears to influ- ence thresholds over a time span of (roughly) 30 ms--a time course consistent with the previous study of speech masking effects (Spiegel, 1987a).

•A specific performance level was not required in this study, since listeners obtained over 95% correct on the true/false task on a large majority of sessions.

2Threshold estimates were obtained excruciatingly slowly for the high-un- certainty conditions (at least 30 min of listening time per threshold deter- mination). This was due to the long trial duration (9 s), the use of extran- eous trials, the relative inefficiency of the adaptive single-interval procedure, and a stopping rule that required a minimum number of trials for all conditions. Thus, for these conditions, only one threshold determin- ation was obtained per condition from each listener.

3The general shape of the spectral envelope (but not the absolute amplitude of the formants and valleys) is typically well-modeled by LPC. The use of LPC is merely a convenience to compare spectral envelopes; similar con- clusions are reached whether LPC, FFT, or filter bank analyses are made.

4The table shows that certain sentences have four test locations, while oth- ers have only one. However, some sentences were not four times as likely to be chosen as others. The vowels were tested using a sampling procedure that minimized the differences in sampling probabilities between sen- tences.

5The author spot-checked thresholds for most of the vowels. The most trou- blesome were sentence-final vowels; their steadily declining overall levels in the natural production could not be adjusted adequately into constant- energy stimuli by the six-segment function. In this case, the offsets of some of the vowels (the decaying portion of the sentence-final vowel) had to be additionally increased by 2-3 dB to equalize thresholds.

6An alternative procedure for equalizing thresholds would have adjusted probe-tone level instead of the masker level. However, this alternative would not have minimized nonsimultaneous masking effects, which the current procedure accomplished by modifying the energy before and after the probe-tone location.

Buus, S., Schorer, E., Florentine, M., and Zwicker, E. (1986). "Decision rules in detection of simple and complex tones," J. Acoust. Soc. Am. 80, 1646-1657.

Fry, D. B. (1955). "Duration and intensity as physical correlates of linguis- tic stress," J. Acoust. Soc. Am. 27, 765-768.

Green, D. M. (1961). "Detection of auditory sinusoids of uncertain fre- quency," J. Acoust. Soc. Am. 33, 897-903.

Green, D. M., and Weber, D. L. (1980). "Detection of temporally uncer- tain signals," J. Acoust. Soc. Am. 67, 1304-1311.

Greenberg, G. Z., and Larkin, W. D. (1968). "Frequency-response charac- teristics of auditory observers detecting signals of a single frequency in noise: The probe-signal method," J. Acoust. Soc. Am. 44, 1513-1523.

Levitt, H. (1984). "Procedural factors in speech-recognition testing," ASHA Rep. No. 14.

Neff, D. L., and Green, D. M. (1987). "Masking produced by spectral un- certainty with multicomponent maskers," Percept. Psychophys. 41,409- 415.

Spiegel, M. F. (1987a). "Speech masking. I. Simultaneous and nonsimul- taneous masking within stop and flap closures," J. Acoust. Soc. Am. 82, 1492-1502.

2043 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel' Speech masking 2043

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24

Page 12: Speech masking. II: Simultaneous masking thresholds under ‘‘naturalistic’’ listening conditions

Spiegel, M. F. (1987b). "Adaptive forced-choice procedures can yield con- taminated thresholds: Masker response bias," J. Acoust. Soc. Am. Suppl. 1 82, S93.

Spiegel, M. F., and Green, D. M. (1982). "Masker and signal uncertainty with noise maskers of varying duration, bandwidth, and center frequen- cy," J. Acoust. Soc. Am. 71, 1204-1210.

Spiegel, M. F., Picardi, M. C., and Green, D. M. (1981). "Signal and mask- er uncertainty in intensity discrimination," J. Acoust. Soc. Am. 70, 1015- 1019.

Spiegel, M. F., and Watson, C. S. (1981). "Factors in the discrimination of tonal patterns. III. Frequency discrimination with well-learned pat- terns," J. Acoust. Soc. Am. 69, 791-795.

Streeter, L. A. (1978). "Acoustic determinants of phrase boundary percep- tion," J. Acoust. Soc. Am. 64, 1582-1592.

Tanner, W. P., Jr., Swets, J. A., and Green, D. M. (1956). "Some general

properties of the hearing system," University of Michigan: Electronics Defense Group, Tech. Rep. No. 30.

Watson, C. S. (1987). "Uncertainty, informational masking, and the capac- ity of immediate auditory memory," in Auditory Processing of Complex Sounds, edited by W. A. Yost and C. S. Watson (œrlbaum, Hillsdale, NJ), pp. 267-277.

Watson, C. S., and Kelly, W. J. (1978). "Informational masking in audi- tory patterns," J. Acoust. Soc. Am. Suppl. 1 64, S39.

Watson, C. S., Kelly, W. J., and Wroton, H. W. (1976). "Factors in the discrimination of tonal patterns. II. Selective attention and learning un- der various levels of stimulus uncertainty," J. Acoust. Soc. Am. 60, 1176- 1186.

Van Tasell, D. J., Fabry, D. A., and Thibodeau, L. M. (1987). "Vowel identification and vowel masking patterns of hearing-impaired subjects," J. Acoust. Soc. Am. 81, 1586-1597.

2044 J. Acoust. Soc. Am., Vol. 84, No. 6, December 1988 Murray Spiegel: Speech masking 2044

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.111.164.128 On: Sat, 20 Dec 2014 07:39:24