Organizational factors in the effect of irrelevant speech

Memory & Cognition1995,23 (2), 192-200

Organizational factors in the effect of irrelevantspeech: The role of spatial location and timing

DYLAN M. JONES and WILLIAM J. MACKENUniversity ofWales CoUege ofCardiff, Cardiff, Wales

Typically, hearing a repeated syllable produces minimal disruption of serial recall of visual lists,but a sequence of different syllables impairs performance markedly. Two conditions for presentingan identical sequence of three syllables are compared: one, in which, by means of stereophony, eachsyllable is assigned to the left, center, or right auditory locus (three streams not changing in state), andanother, in which the same syllable sequence occurs in one location only (one stream with changingstate). Disruption was significantly less in the stereophonic than in the monophonic condition. Therewas a joint effect of changing state and location, not an effect of the number of locations alone. InExperiment 2, temporal predictability was used to manipulate changing state. The disruptive effectof regular presentation of a repeated syllable was markedly increased when it was presented irregularly. The results are discussed in the context of organizational factors in short-term memory.

Even though subjects are explicitly instructed to ignoreit, irrelevant speech present during a test of short-termmemory for visually presented items results in a markedreduction in efficiency in serial recall (see, e.g., Colle,1980; Colle & Welsh, 1976; Jones, Miles, & Page, 1990;Morris & Jones, 1990a, 1990b; Salame & Baddeley, 1982,1986a, 1986b, 1987, 1989; see Jones & Morris, 1992, fora review). Several features of this effect are now understood. The meaning of the speech is not an important determinant of the effect, because disruption is roughly thesame for English narrative speech as it is for speech in aforeign language (Colle, 1980; Colle & Welsh, 1976;Salame & Baddeley, 1982) or indeed for reversed speech(Jones et al., 1990). Moreover, the effect does not occurat encoding-that is, it is not an effect of interference byspeech as the visual items are being presented-but ratheroccurs in memory as the items are rehearsed (Jones,1994; Jones & Macken, 1993; Miles, Jones, & Madden,1991; Salame & Baddeley, 1986b). Despite these advances, we still do not have a satisfactory account of theattributes of speech that determine the disruption.

Change in the composition of the irrelevant sound, orchanging state, has emerged as an important factor. A repeated sound, such as a single syllable, shows little disruption of serial recall, but if there is some degree ofvariability between the discrete events in the sequence,disruption becomes marked (Jones, 1994; Jones & Macken,1993, in press; Jones, Madden, & Miles, 1992). For ex-

These experiments are part of a program of work funded by theUnited Kingdom's Economic and Social Research Council. Thanks aredue Alison Murray, Andrew Bridges, and Philip Tucker for their comments on work reported here. Edward Buck helped by running subjectsfor Experiment IC. Requests for reprints should be sent to D. Jones,School of Psychology, University of Wales College of Cardiff,P.O Box 901, Cathays, CardiffCFI 3YG, U.K.

ample, a sequence of syllables formed of the regularlyrepeated letter-name "c"-a steady-state sequenceproduces little or no disruption, but a sequence made upof the syllables based on the letter names "c, h,j, u"-achanging state sequence-produces a more substantialdecrement in performance (Jones, 1994; Jones &Macken, 1993; Jones et al., 1992). In the present seriesof experiments, we were concerned with refining ourunderstanding of the process of interference from irrelevant speech by focusing on the effect of organizationalfactors in the auditory stream.

The present experimental series capitalizes on therules oforganization used for auditory perception, knowngenerically as streaming, to manipulate the dimension ofchanging state in the irrelevant speech (see Bregman,1990). Auditory streaming refers to the result ofperceptual processes involved in the organization of sound,particularly by temporal integration of auditory stimuliinto coherent wholes. So, for example, a fast sequence ofsounds coming from the same spatial locus tends to beperceived as a single, temporally extended stream. Similarly, sounds in temporal proximity may be grouped together into related wholes. These two techniques ofstreaming were used to test the idea that, if the organization of speech into streams modifies the degree ofdisruption by irrelevant speech, then the interference cannot be due merely to the similarity ofphonology betweenthe speech and the to-be-remembered material, as Salameand Baddeley (1982) have suggested. The present experiments were designed to supplement those in whichthe relation of the phonology of the heard stream to thephonology of the to-be-remembered items has been manipulated systematically (Jones & Macken, in press). Allthese studies are at variance with the Salame and Baddeley (1982, 1989) hypothesis by showing that the similarity of the heard and remembered material is relativelyunimportant. Rather, the similarity in phonology within

Copyright 1995 Psychonomic Society, Inc. 192

PERCEPTUAL ORGANIZATION OF IRRELEVANT SPEECH 193

the irrelevant stream is the important factor, such that increasing levels of phonological dissimilarity betweenthe discrete units within the auditory material tend to increase disruption (Jones & Macken, in press).

The approach adopted in the present experiments wasto hold the phonological contents of the irrelevantspeech constant and to use different methods of organization to manipulate the degrees ofchanging state. Twoconvergent techniques were employed; in one, a sequence of three different syllables was used. By allocating each syllable to one of three spatial locations, auditory streaming is induced so as to result in the perceptionof a number of steady-state streams and thereby decrease the disruption relative to that found with a singlechanging-state stream comprising the same three syllables. In the other technique, the interstimulus interval(lSI) was used to alter the organization of a sequencethat consisted of a single repeated (steady-state) syllable. Variable lSI was used to produce a changing-statesequence, which should in turn increase the degree ofdisruption relative to fixed lSI. Although grouping bylocation and by temporal proximity is well documented(see Bregman, 1990; Handel, 1989), nearly all observations of it are based on techniques in which the sound isone on which the subject is instructed to focus. In theexperiments reported here, the subjects were instructedspecifically to ignore the sound and to attend only to thememory task. Arguably, therefore, any effects of attention observed in these studies should arise when thematerial is out of the main attentional focus and maytherefore be said to arise preattentively (see also Jones,Macken, & Murray, 1993).

EXPERIMENTIA

In this experiment, streaming by spatial location wasused as a means of modifying the changing-state effect.Three conditions were compared: quiet, stereophonicpresentation, and monophonic presentation. Ifa syllableis heard in one ear, after which a different syllable is heardin the other ear, and if a third syllable is then played toboth ears, the stimuli will appear to come from threeseparate locations. If this cycle of events is repeated,particularly if the rate of presentation is high, then phenomenally, three separate streams corresponding to thethree spatial locations are heard. Moreover, from thepoint of view of the changing-state hypothesis, if a particular syllable is associated with each of the locations,the content of each of the streams (one associated witheach ear and one located in the center of the head) willnot be changing in state. However, only one stream isformed if the same sequence of syllables is presented toboth ears; in this case the stream will appear to comefrom the same point in auditory space (subjectively located in the middle of the head). Since the successivesyllables are different within this single stream, conditions of changing state will now be met. In Experiment lA, we used a fixed sequence of three syllables;Jones et al. (1992) had shown that a fixed repeated se-

quence of a small set of syllables was as disruptive asrandom sequences of syllables from the same set. If organizational factors were at play, the changing sequenceat the single locus should be significantly more disruptive than three unchanging streams.

MethodSubjects. Twenty undergraduate and graduate students (14 fe

males and 6 males) were paid for participating in the experiment.Apparatus and Materials. Lists of to-be-remembered items

were constructed from the consonants F, K, L, M, Q, R, and Y.They were presented one at a time (on for 0.8 sec, off for 0.2 sec)in a random order on the screen of a Macintosh Ilcx computerusing HyperCard.

Two types of auditory material were prepared. In the first ofthese, the monaural presentation, the three utterances "C," "U,"and "0" (i.e., the letter-names "see," "you:' and "oh") were recordedin that order in a female voice digitized to 8-bit resolution at a sampling rate of22 kHz, using a MacRecorder analogue-to-digital converter. These utterances were then edited with SoundEdit digitalediting software in order to produce successive utterances of200msec duration. A monaural recording ofapproximately 20-sec duration was created by repeatedly looping this sequence of three utterances (using digital editing techniques), and this was thenstored for use in a HyperCard environment.

A second type ofauditory material was created for stereophonicpresentation. The same three utterances were used in the same sequence as before, but in this case each utterance was assigned differently in relation to the two audio channels: the syllable "U" wasplayed on the left channel, the syllable "0" was played on theright channel, and the syllable "C" was played on both channels.Again, this sequence was recorded as an apparently seamless loopto provide a recording of approximately 20-sec duration. In thisway, a recording was created, which, when played over headphones, led to the perception of three individual unvarying streamsa stream of repeating "U"s, a stream of repeating "O"s, and astream of repeating "C"s---each stream corresponding to a different location, and each syllable within each stream repeating at a400-msec offset-to-onset interval. Note that the syllables were presented sequentially (as in the monaural condition) but in three different locations. This recording was also stored in digital form tobe used in the HyperCard program. The sequence of syllables andthe timing of events was therefore Identical for both conditions,only their assignment to channels was manipulated. Timing of thestimuli meant that there was no period of silence between successive utterances, but this should make no material difference in theoutcome since prior work has shown that continuously voiced speech(produced by a synthesizer) produces effects almost identical tospeech with syllables separated by silence (Jones et aI., 1993).

Sound was played back at approximately 65 dB(A) as measuredby a sound level meter from an artificial ear. Only air-conditioningnoise was present during the quiet conditions, and after taking intoaccount the attenuation by the ear-cup ofthe headphones, we estimatethat this noise was at a level approaching the threshold of hearing.

Procedure. Subjects used a mouse to initiate each trial by clicking on a HyperCard "button" on the screen of the computer. Afterthe seven consonants had been presented, the word "wait" flashedon the screen for 10 sec, after which the subjects were promptedto recall the list by writing it in Its initial order ofpresentation. Auditory conditions (mono-one varying stream; stereo-three unvarying streams; and quiet) were randomly allocated to each trial.Sound was "on" during presentation and rehearsal and "off" during recall. There were 60 trials in all-20 for each auditory condition. The subjects were instructed to ignore any sound theyheard, and they were reassured that they would not be tested on itscontents and that the sound would not contain any useful information. Before the expenment proper, the subjects were given five

194 JONES AND MACKEN

50

60-,-------------------,

practice trials in quiet. The experimental session lasted 35-40 nunfor each subject.

EXPERIMENTIB

number of sound sources and their contents. These subjects undertook 10 trials with the stereo and the monomaterial. Without exception, they judged the stimuli appropriately.

The results show that a disruptive effect arising fromthe presence of phonological change can be attenuatedby splitting the sound into (perceptually) separate unchanging streams. The relation ofphonological contentsof the speech to the material being rehearsed is alone insufficient to account for all the disruption of serial recall; some account needs to be given ofthe way in whichsound is organized.

Before proceeding to Experiment 2, a number ofcheckswere undertaken on the results of Experiment I A.

Experiment I A suffered from an unfortunate choiceofsyllables, in that it was possible that the subjects heardthe meaningful sequence as "oh-see-you" in the monaural condition. Arguably, stereophonic presentation wouldbreak up this meaningful sequence so that the differences between the two methods ofpresentation could beattributable to changes in meaning rather than to organization in memory. Given what is known about the roleof meaning in the irrelevant speech effect, this seemedhighly unlikely. From its first demonstration by Colleand Welsh (1976), several studies have shown that meaning is not important (Colle, 1980; Salame & Baddeley,1982; Jones et aI., 1990). Nevertheless, on the remotepossibility of an effect of meaning arising in this case,another experiment was conducted in which the syllables chosen were unlikely to make up a meaningful sequence. In addition, the supplementary study was a checkon the possibility that streaming by spatial locationmight have an effect not in memory but at encoding. InExperiment l A, speech was played during both the presentation and the rehearsal phases of the task, whichmeans that any disruption could have arisen from thedisruption ofencoding, not memory. Again, the fact thatdisruption does not occur at encoding but after the itemsare in memory and being rehearsed is well established(Jones & Macken, 1993; Miles et aI., 1991; Salame &Baddeley, 1986b), but on the remote possibility that theconditions of Experiment I were an exception to thisrule, we restricted exposure to irrelevant speech to theperiod following the presentation ofthe visual lists. Ourexpectation was that this procedure would produce effects qualitatively similar to those in Experiment I A.

MethodSubjects. Twenty undergraduate and graduate students (13 fe

males and 7 males) were paid for participating. All reported normal hearing and normal (or corrected-to-normal) vision.

Apparatus and Materials. These were the same for Expertment 1A, but with three exceptions. First, the three utterances "Y,""J," and "X" (that is, the letter sounds "vee," "jay," and "ecks")were used, again recorded in a female voice. Second, the exposureto sound was restricted to the period following presentation of theto-be-remembered lists, instead ofthe whole ofthe trial pnor to re-

__ Quiet

__ Stereo (3 steady state streams)

___ Mono (1 changing state stream)

10

20

40

Results and DiscussionA 3 (auditory condition) X 7 (serial position) analy

sis of variance (ANOVA) was carried out on the data.There were significant main effects of both auditorycondition [F(2,38) = 9.63, MS. = 9.40,p < .01] and serial position [F(6,1 14) = 9.24, MS. = 17.36, P < .01].The interaction of the two factors was also significant[F(l2, 228) = 1.873, MS. = 2.24, p < .05]. The serialposition curves for each condition are presented in Figure I, which shows that the interaction was due to thefact that disruption increased with serial position.Planned comparisons indicated that performance in themono condition was significantly worse than that in thequiet condition [F(I,38) = 4.48, MS. = 9.40, p < .05],and the difference between the mono and stereo condition was also significant [F(I,38) = 5.06, MS. = 9.40,p < .05]. The very considerable difference between thestereophonic and quiet conditions was highly significant [F(l,38) = 19.04, MS. = 9.40,p < .01].

As a check that streaming did occur in the appropriate manner with these stimuli, subsequent to the experiment an additional 5 subjects were asked to judge the

345Serial Position

Figure 1. Experiment lA: Serial position curves show the contrastbetween three auditory streams (one to the left, right, and center ofthe head), one stream (to the center of the head), and quiet. Error barsindicate standard error.


50

O-'-----,-----,---,-----,---,---r--,---'

60.--------------------.

MethodSubjects. Twenty undergraduate and graduate students (12

males and 8 females) were paid for participating. All reported normal hearing and normal (or corrected-to-normal) vision.

Apparatus and Materials. This experiment was identical toExpenrnent IB, except for the addition of the following conditions: stereo-steady state, in which a single syllable ("J") was repeated every 200 msec and successive items were assigned to different spatial positions in the order left-center-right (theoffset-to-onset interval being 400 msec per stream) as in thestereophonic conditions ofExperiments lA and IB; mono-steadystate, in which the same sequence was presented monaurally.These two conditions were contrasted with the stereo changingstate, mono changing-state, and control conditions of Experi-

EXPERIMENT ic

In Experiment 1C, we entertained yet a further possibility about the results of Experiment 1A-this time,that the effect of monaural presentation arises not because of its changing-state content, but from the factthat one location of irrelevant speech simply is more potent than three, all else being equal. This view, an alternative to the effect of streaming, predicts that even asingle repeated syllable should be less distracting ifit isdivided into three locations rather than presented at one.To test this possibility, in Experiment 1C two sets offactors were manipulated. As before, stereophonic presentation of a changing-state sequence ("V;" "J," "X") wascontrasted with its monaural presentation. This time,however, in addition to the contrast between a changingstate sequence presented monaurally and the same sequence presented stereophonically in order to give riseto three steady-state streams, a further two conditionswere included, in which a steady state stream ("J") waspresented either monaurally or stereophonically. If theresults ofExperiments IA and IB were due to the number of sound sources, with a single source giving rise tomore disruption than multiple sources, then the monaurally presented steady-state sequence should be moredisruptive than the stereophonically presented steadystate sequence. If, however, the effect is one of modulation of changing state by streaming, one would expectthat the two methods of presenting steady-state sequences should give broadly comparable results.

In summary, if the effect is related to the number ofsources of sound, only the method of presentation, notthe sequences of syllables, will be important. Specifically, by this account both the monaural conditions (witheither steady-state or changing-state sequences) shouldproduce appreciably more disruption than both conditions of stereophonic presentation (again with steadystate or changing-state conditions). The streaming explanation makes different predictions-namely, that thedegree ofdisruption is jointly determined by the methodofpresentation and by the content ofthe syllable sequence.Specifically, we predicted that the monaural changingstate condition would be significantly more disruptivethan all others, since the other three conditions all contained steady-state streams.

762

10

20

call. Third, the retention interval was Increased from 10 to 18 secso that the subject's exposure to sound on each trial would beroughly equated In the two experiments.

Procedure. As for Experiment IA.

40

___ Quiet

-+-- Stereo (3 steady state streams)

--.- Mono (1 changing state stream)

ResultsAs before, a 3 (auditory condition) X 7 (serial position)

ANOVA was carried out on the error data. Once againthere were main effects of auditory condition [F(2,38) =16.48, MSe = &.14,p < .01]and serialposition [F(6,II4) =27.77, MSe = 5.83,p < .01]. These two factors interactedsignificantly [F(l2,228) = 3.37, MSe = 2.03, p < .01],and the pattern of interaction (see Figure 2) is strikinglysimilar to that found in Experiment 1A. Planned comparisons revealed that each of the conditions was significantlydifferent from the others [quietvs. stereo, F(I,38) =8.73,MSe = 8.14; quietvs. mono,F(l,38) = 32.95,MSe =8.14; stereo vs. mono, F(l,38) = 7.76, MSe = 8.14; forall comparisons,p < .01].

The results are in line with those of Experiment IA.They show that the effect was not due to the particularsyllables used, as well as that the effect occurred afterthe material had been encoded and was being rehearsedin memory. We undertook a further check on the resultsin Experiment 1C.

3 4 5Serial Position

Figure 2. Experiment IB: Replication of the conditions of Experiment lA using diffferent syllables. Serial position curves show thecontrast between three auditory streams (stereo), one stream (mono),and a quiet control condition. Error bars indicate standard error.


ment IB. There were therefore five conditions in all, with conditions changing for each trial.

Procedure. This was the same as in Experiment IA, except thatthere were 16 trials for each condition.

ResultsAs in other experiments of this series, the error scores

were subject to an ANOVA with two factors, serial position (with seven levels) and auditory conditions (thistime with five levels). As before, there were main effectsof auditory condition [F(4,76) = 3.64, MS. = 7.22,p <.01] and serial position [F(6,114) = 13.82, MS. = 10.81,P < .01]. However, unlike in Experiments lA and lB,these two factors failed to interact significantly (F < 1). Inother studies we have found that the interaction of irrelevant speech with serial position is not always consistent(see Jones & Macken, in press; Jones et aI., 1993), butany inconsistency of the present data in this respect isnot material to the hypothesis under test and may safelybe disregarded. The mean data for the five conditions arepresented in Figure 3. Planned comparisons revealedthat only one contrast was significant-that betweenquiet and mono [changing state [F(l,76) = 7.49, MS. =7.22, P < .01). All the other contrasts with quiet w~renonsignificant (all Fs < 1). Thus, these results also dif-

fer slightly from those ofExperiments IA and IB in thatthe changing-state stereo condition did not increase disruption vis-a-vis quiet. However, the previous findingthat changing-state mono presentation is more disruptive than changing-state stereo was replicated [F(l, 76) =4.75, MSe = 7.22,p < .05].

This pattern of results is the one predicted by thestreaming hypothesis. It seems unlikely, therefore, thatthe results of Experiments 1A and IB arose because ofan effect of increasing the number of sources of soundalone; rather, the spatial distribution of sound must actin concert with the effect of changing state. As a series,the component parts ofExperiment 1 reveal a consistentand replicable effect of auditory streaming. In Experiment 2, we moved on to explore other ways in whichother types of organization of material can be used tomodulate the changing-state effect. In general terms, thepurpose ofExperiment 2 was the same as that ofExperiment I, but this time the sequence consisted of a singlerepeated syllable and an auditory streaming phenomenon was used to induce a changing-state effect on visualserial recall.

EXPERIMENT 2

Experiment 2 tested the possibility that an auditorystream composed ofa single repeated syllable might stillgive rise to disruption of serial recall ifperceptual organization based on temporal grouping should come intoplay. Typically, when a single sound, b~ it a tone or ~ syllable, is repeated at a regular rate, It forms a singlesteady-state stream. If, however, the same stimulus is repeated at an irregular rate, what is perceived in some circumstances is not a sequence of individual identicalstimuli but rather stimuli clustered together to formhigher order stimuli. The effect is based on such facto~s

as the temporal proximity of groups of sounds. ThISprocess is referred to as unitization (see,. e.g., Royer ~Robin, 1986). In this way, a temporally Irregular auditory input, although made up of identical phonologicalunits at one level, may be perceived as being composedof units of different length and composition at a higherlevel of organization. Hence, from the point of view ofstreaming, we hypothesized that even though the auditory input might contain a repeated syllable (and hen~e

show no irrelevant speech effect based on change inphonological terms alone), by changing the regularity ofthe syllable sequence, different sizes of phonologicalunit might be formed that would then qualify for conditions of changing state.

MethodSubjects. Twenty volunteers-undergraduates and postgradu

ates-were paid an honorarium for participating in the study. Thegroup was made up of II females and 9 males. All. reported normal hearing and normal (or corrected-to-normal) vision,

Apparatus and Materials. The lists were constructed and presented in the same way as for Experiment 1.

Auditory materials were prepared by using digital signal processing techniques to sample and edit a single example of the ut-

JMono

Quiet

r+-

r+ + + ri-

II II II IIo

30

10

20

25

5

"VJX" "VJX" JStereo Mono Stereo

AUditory Condition

Figure 3. Experiment 1C: Recall proportions in each of five conditions, showing the joint action of changing-state auditory sequen~es

(steady-state and changing-state) and the method of presentation(stereophonic vs, monophonic). Error bars indicate standard error.


40

35

45

1.92,p> .05]. Figure 4 gives the values of the serial position X treatments interaction.

Planned comparisons revealed that the irregular rateand quiet conditions were significantly different [F(1 ,38)= 6.84, MSe = 5.32, p < .01], as was the contrast between regular and irregular conditions [F(1,38) = 4.51,MSe = 5.32, p < .05]. The difference between quiet andregular presentation was not significant [F( 1,38) = 0.24,MSe = 5.32,p > .05].

As for Experiment 1, a test of the perception of the auditory material was undertaken with a fresh group of 5subjects. They were asked to make a forced-choice classification of either "regular" or "irregular" for each ofthe two sounds used in the experiment (half were regular and half were irregular). There was no case of misclassification in 20 trials.

The results complement those of Experiment 1 byshowing the effect of streaming (here on the basis oftemporal regularity rather than localization), this time byaugmenting the small disruptive effect usually foundwith repeated syllables. They also suggest an importantqualification to an earlier assertion that processes associated with the detection of syllable boundaries are inpart responsible for the irrelevant speech effect. Joneset al., (1993) argued that only when the auditory streamis broken up into separate segmented entities is there amarked disruption ofserial recall, and they further speculated that the unit of analysis for such segmentationmay correspond to the syllable. The results of Experiment 2 point to the fact that suprasyllabic units createdby temporal conjunction may also serve as the basis fora changing-state effect. However, organization based onhigher order analysis such as the meaning of the elements in a sound stream is an unlikely basis for the disruption. The fact that speech played backwards produced effects as potent as forward narrative speech (cf.Jones et al., 1990) suggests that higher order organization imposed by the meaning of the irrelevant soundplays no major role in the disruption. Again, that the predictability of the phonological content of the syllablestream does not influence the degree of disruption(Jones et al., 1992) further suggests that changing stateis being signified by a very low level of analysis, beforethe phonemic content of the sound is fully analyzed. Thefact that nonspeech sounds produce disruption similar inform and extent to speech again supports this point(Jones & Macken, 1993). However, the results of Experiment 2 suggest that suprasyllabic organization ispossible, but as a result of temporal organization. Takenwith the other evidence just reviewed, these results tendto confirm the generalization that organization of the irrelevant stream may be based solely on representationsat the precategoricalleve1; that is, the organization seemsto be independent of language. But this is not the onlypossibility. Another is that the disruptive effect occurs atthe point ofgreatest change in the irrelevant stream, thisbeing when a relatively long silence is broken or at theend ofa set ofmultiple stimuli.' This explanation wouldnot involve the idea of higher order objects, but instead

76

___ Quiet

__ Regular lSI

__ Irregular lSI

345Serial Position

2

5

10

O-L--r---...--------.---,----.---,..-----.---'

50,------------------,

Figure4.Experiment 2: Serial positioncurves showingthe contrastbetween a sequence of repeated syllables presented regularly(500 msec offset to onset) and irregularly (with a mean of 500 msecoffset to onset), and quiet. Error bars indicate standard error.

20

30

15

terance "ah" in a male voice recorded at a pitch of approximatelyC3 (131 Hz). The editing software was then used to edit digitizedcopies of the utterance into a sequence. Twotypes ofmaterial werepresented: unvarying, with the "ah" (300 msec long) repeated witha fixed 500-msec interval (offset to onset); and varying, with themterutterance interval randomly selected from the range 0, 100,200, 800,900, and 1,000 msec (thus giving a sequence with an average offset-to-onset interval of 500 msec). Pilot trials had indicated that a bimodal distribution of intervals was superior to a rectangular one in conveying a subjective impression of a brokenIrregular stream.

Procedure. The general procedure was identical to that of Expenment I. As before, conditions were allocated randomly to trials and the average level of sound was 65 dB(A). Again subjectswere asked to ignore any sound that they heard and they were toldthat they would not be tested on the contents of the auditory material. A typical experimental session lasted 35-40 min for eachsubject.

Results and DiscussionA two-factor repeated measures ANOVA was under

taken with auditory conditions (varying rate, unvaryingrate, and quiet) and serial position (1-7) as factors. Asusual, the effect of serial position was highly significant[F(6,114) = 12.76, MSe = 9.45,p < .01]. The effect ofauditory treatments was also significant [F(2,38) = 3.87,MSe = 5.32, p < .05], but serial position and treatmentdid not interact significantly [F(12,228) = 1.27, MSe =


the perception of the sharpest transience, and it has theattendant advantage of parsimony. Moreover, the resultwould harmonize well with findings ofJones et al. (1993)that cues to segmentation based on sharp transitions inpitch or loudness serve as the basis for segmenting theunattended stream.

In one respect, the results of Experiments 1 and 2 appear to be discrepant, since the degree ofdisruption produced by the steady-state conditions is less in Experiment 2 than it is in Experiment 1 (see Figures 1 and 4).At this juncture, it is worth recollecting that the maindifference between Experiments I and 2 is that in Experiment I there were up to three streams, whereas therewas always only one stream in Experiment 2. The discrepancy between conditions can be explained by supposing that there is a small residual degree ofdisruption,even in a single steady-state stream. By further supposing that this is additive across streams, we would thenpredict that as the number of streams increases, so willthe magnitude of this residual disruption. So, for thestereo (steady-state) conditions of Experiment 1 inwhich three streams were formed we would expect thiseffect to be larger than in the regular condition of Experiment 2 in which only one steady state stream wasformed. The degree of disruption brought about by thechanging-state conditions of Experiments 1 and 2 werealso appreciably different. This is not too surprising inview of the qualitative differences in the way in whichchanging state was induced in each case; we have to assume that such qualitative differences will lead to quantitative changes.

GENERAL DISCUSSION

The results, in summary, are the following: (I) Theusual disruption by changing-state stimuli of serial recall may be attenuated by assigning each syllable in thesequence to a different spatial location, so that they become steady-state streams (Experiment IA). (2) This effect is replicated with auditory stimuli that cannot beformed into a meaningful sequence and when the speechis presented only during a retention interval (Experiment IB). (3) The effects found in Experiments IA and1B are not due to a confounding of the number of locations. Ifwe co-vary the number of sources with the degree ofchanging state, we then find that disruption is dependent on both changing state and spatial location andthat maximal disruption occurs with changing-statestimuli from one spatial location, but there is no difference between mono and stereo steady-state conditions.(4) With sequences that consist ofa single repeated syllable, presentation of the syllable at regular intervalsproduces minimal disruption of serial recall, but thesame syllable, when presented irregularly, produces significant degrees of disruption. Amendments to Salameand Baddeley's (1982) account of the irrelevant speecheffect may be necessary in light of these findings. In addition, the results inform our own theoretical account ofthe irrelevant speech effect. Finally, the results carry

with them certain implications for the study of auditorystreaming, particularly in relation to whether streamingcan occur preattentively.

Implications for Models ofthe IrrelevantSpeech Effect

Without modification, the model of the irrelevantspeech effect offered by Salame and Baddeley (1982,1989) cannot account for the effects of streaming demonstrated here. In essence their model assumes that interference between what is heard and what is rehearsed isbased largely on the phonological similarity of the twostrands of material. In each of the present experiments,the relation between the phonology oflists' contents andthe irrelevant speech was fixed, yet it was possible toaugment or diminish the degree ofdisruption by streaming. Of course, the Salame and Baddeley (1982, 1989)position could be modified to incorporate the effects described here. One modification would be that nonchanging-state stimuli would not enter memory (thephonological store) and that organizational factors govern the streaming of sound to create either steady-stateor changing-state streams. However, even this modifiedapproach fails to explain why the processing undertakenwithin memory also partly determines the degree ofdisruption-in particular, that disruption by irrelevantspeech is minimal in tasks not requiring serial recall(Jones & Macken, 1993; Morris & Jones, 1990b; Salame& Baddeley, 1990).

Ofcourse, the present results are not sufficient inthemselves to overturn the Salame and Baddeley (1982, 1989)account. However, the weight of evidence, to which thepresent experiments contribute, is accumulating againsttheir position. There is only one clear demonstration ofthe phonological similarity effect with the irrelevantspeech paradigm (Salame & Baddeley, 1982, Experiment 5). In that study, irrelevant digits or words thatsounded like the to-be-remembered digits (e.g., tun forone, woo for two) disrupted serial recall more than didirrelevant dissimilar words (e.g., tennis, wicket, jelly).However, subsequent attempts to reproduce this effecthave been unsuccessful. In three experiments, Jones andMacken (in press) failed to find effects of phonologicalsimilarity similar to those of Salame and Baddeley (1982).Rather, this recent work suggests than within-streamphonological similarity is the important factor, so thatsequences ofirrelevant words that rhymed (e.g., sea,flea,key) interfered less than did those that did not rhyme(e.g., hat, cow, nest). Arguably, rhyming sequences contain less changing-state information than do those thatdo not rhyme.

The object-oriented episodic record model (O-OER;see Jones, 1993) attempts to explain the changing-stateeffect by supposing that auditory and visual event sequences are represented within memory as object!pointer combinations. An object is the representation ofa discrete event in the auditory environment. A pointerassociated with that object indicates, stochastically, thelocation of the next discrete object in a stimulus se-


quence. If an event is repeated, as in a steady-state sequence of sounds, then the pointer is self-referential;that is, only a single representation of the event is heldin memory, coupled to a single pointer that refers to theobject itself. If the sequence is made up of stimuli thatare different, as in a changing-state sequence, then theobjects and pointers form a related series. For example,in the case ofauditory stimuli, if a sequence arises froma single source such as the same speaker in a single spatiallocation, then the objects are joined by pointers thatencode the sequential organization of successive stimuli.

In the auditory case, it is assumed that objects are formedand linked according to the same principles as those ofauditory streaming. Therefore it is possible for severalstreams to be represented within memory, each made upofobject/pointer combinations. Interference ofserial recall by irrelevant speech arises because the process ofrehearsal also involves such objects and their associatedpointers. During rehearsal, the presence ofother object/pointer combinations such as those from changing-stateirrelevant speech makes the retracing of the sequence ofrepresentations of candidates for rehearsal more difficult. If the irrelevant sound consists ofa repeated syllable, only a single object with a single self-referentialpointer is set up. Here the retracing ofthe object/pointersequence for serial recall is less difficult, simply becausethere are fewer competing objects and pointers.

Implications for Auditory AttentionAt first blush, the results seem to fit with the idea that

factors related to auditory attention can provide a full account of the pattern of disruption found in the presentexperiments. Although we think that on balance thisseems to be a reasonable conclusion, a number ofcautionary notes need to be sounded. We can be reasonably confident that the results of Experiment I arise from thejoint action ofstreaming and changing state. The resultsof Experiment lC are particularly comforting in this regard by showing that a possible confounding effect ofthe mere number of locations is not responsible for theeffect. But we can be less sure about the results of Experiment 2, Here there is a possibility that the dominantfactor is not the suprasyllabic organization of sounds,but some factor related to the irregularity of the point ofgreatest change in the irrelevant stream, such as when arelatively long silence is broken, or at the end of a cluster of syllables. Here, the dominant factor is energy transition, rather than any higher order organizational factor.

Notwithstanding these reservations, the present results have a number ofgeneral implications for the studyof auditory streaming as well as for the study of memory, because we have been able to demonstrate that theeffects of streaming may be made manifest even whenthe subject's attention is directed away from the sound.Unlike previous work on auditory streaming in whichsubjective report was the only dependent variable, the irrelevant speech paradigm allows us to infer that streaming occurs when the sound is not in the attentional focus(but see Bregman & Rudnicky, 1975). Speculatively, we

may regard this organization as occurring at the preattentive level.

The advantage of irrelevant speech studies is that theindex ofstreaming is taken when the subject is instructedto focus attention away from the sound. It is thereforepossible to assert with more confidence than has hitherto been possible that auditory streaming is governedby preattentive processes (see also Jones et al., 1993, fora discussion). However, even in the irrelevant speechparadigm we cannot rule out the possibility that subjectsswitch attention momentarily to the irrelevant material,and thus it may only be safe to argue that the maintenance of the percept can occur without attention directed toward the auditory modality.

One way ofconstruing the different effects ofchangingstate and steady-state sequences on serial recall is to regard them as an effect of habituation. The notion of habituation seems to be inappropriate to the effects foundby us. First, it does not explain why the effect of speechdepends very much on the nature of the task undertakenby the subject; for example, only tasks requiring memoryfor serial order are disrupted by irrelevant speech (seeMorris & Jones, 1990b; Salame and Baddeley, 1990). Inaddition, it fails to predict or explain why the effect of irrelevant speech is greater for phonologically dissimilarlists than it is for phonologically similar lists (Colle &Welsh, 1976; Jones & Macken, in press; Salame & Baddeley, 1986a). Although habituation can explain the reduction in response to a repeated stimulus, it cannot,without considerable elaboration, be modified to accountfor the interaction of the effects of the repeated stimuluswith activities at a postcategoricallevel within memoryas the items from the to-be-remembered list are being rehearsed. Second, at an empirical level, there is a range offindings that do not fit neatly within an explanationbased on habituation. For example, in an examination ofthe possible effects of habituation over successive trialswith irrelevant speech conditions randomized over trials,there was no diminution in the irrelevant speech effect(Jones et al., 1992). Third, the typical exposure to soundin the present experiments is about 18 sec per trial, whichis a relatively short interval for habituation effects to develop. Habituation-like effects have been found (Morris& Jones, 1990a) for prior exposure to speech for the irrelevant speech effects, but for continuous exposures of20 min. Unpublished studies show little habituation over10 min (Bridges & Jones, 1992).

In reviewing the association between attention andmemory, Cowan (1988) notes:

A lingering doubt about all of the effects of selective attention is that they might occur in memory rather than perception; that is, subjects might initially encode attended andunattended stimuli equally well, but they might translatethe attended target into a response sooner while allowinginformation from an unattended or poorly attended channel to decay. (p. 175)

Although it may be premature to convert this doubtinto certainty on the basis of present research alone, we


argue that it is gradually being established that p.erceptual organization is an inherent property of what IS usually regarded as short-term storage, not a precursor to it,and we claim that phenomena of auditory attention maybe explained with reference to these organizational factors (see also Penney, 1989, for a discussion of factorsrelating to streaming in memory).

REFERENCES

BREGMAN, A. S. (1990). Auditory scene analysis. Boston, MA: MITPress.

BREGMAN, A. S., & RUDNICKY, A. (1975). Auditory segregation:Stream or streams? Journal of Experimental Psychology: HumanPerception & Performance, 1,263-267.

BRIDGES, A., & JONES, D. M. (1992). Habituation and the irrelevantspeech effect: The role ofduration. Unpublished manuscript.

COLLE, H. A. (1980). Auditory encoding in visual short-term recall:Effects of noise intensity and spatial location. Journal of VerbalLearning & VerbalBehavior, 19,722-735.

COLLE, H. A., & WELSH, A. (1976). Acoustic masking in primarymemory. Journal ofVerbalLearning & VerbalBehavior, 15, 17-32.

COWAN, N. (1988). Evolving conceptions ofmemory storage, selectIveattention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104,163-191.

HANDEL, S. (1989). Listening: An introduction to the perception ofau-ditory events. Cambridge, MA: MIT Press. .

JONES, D. M. (1993). Objects, streams and threads of auditory attention. In A. D. Baddeley & L. Weiskrantz (Eds.), Attention, awarenessand control (pp. 87-104). Oxford: Oxford University Press.

JO~ES,D. M. (1994). Disruption of memory for lipread lists by irrelevant speech: Further support for the changing state hypothesis.Quarterly Journal ofExperimental Psychology, 47A, 143-160.

JONES, D. M., & MACKEN, W. J. (1993). Irrelevant tones also producean irrelevant speech effect: Implications for phonological coding inmemory. Journal ofExperimental Psychology: Learning, Memory,& Cognition, 19, 369-381.

JONES, D. M., & MACKEN, W.J. (in press). Phonological similarity in theirrelevant speech effect: Within- or between-stream similarity? Journal ofExperimental Psychology: Learning, Memory, & Cognition.

JONES, D. M., MACKEN, W.J., & MURRAY, A. C. (1993). Disruption ofvisual short-term memory by changing-state auditory stimuli: Therole of segmentation. Memory & Cognition, 21, 3 I 8-328.

JONES, D. [M.], MADDEN, C., & MILES,C. (1992). Privileged access byIrrelevant speech to short-term memory: The role ofchanging state.Quarterly Journal ofExperimental Psychology, 44A, 645-669.

JONES, D. M., MILES, C, & PAGE, J. (1990). DisruptIon ofproof-readingby irrelevant speech: Effects of attention, arousal or memory? Applied Cognitive Psychology, 4, 89-108.

JONES, D. M., & MORRIS, N. (1992). Irrelevant speech and senal recall:Implications for theories ofattention and working memory. Scandinavian Journal ofPsychology, 33, 212-229.

MILES, C, JONES, D. M., & MADDEN, C. (1991). Locus of the irrelevantspeech effect in short-term memory. Journal ofExperimental Psychology: Learning, Memory, & Cognition, 17,578-584.

MORRIS, N., & JONES, D. M. (1990a). Habituation to irrelevant speech:Effects on a visual short-term memory task. Perception & Psycho-physics, 47, 291-297. .. .

MORRIS, N., & JONES, D. M. (I 990b). Memory updating In workingmemory: The role of the central executive. British Journal ofPsychology, 81, I II-121.

PENNEY, C. G. (1989). Modality effects and the structure ofshort-termverbal memory. Memory & Cognition, 17,398-422.

ROYER, F. L., & ROBIN, D. A. (1986). On the perceived unitizationof repetitive auditory patterns. Perception & Psychophysics, 39,9-18.

SALAME, P., & BADDELEY, A. D. (1982). Disruption of short-termmemory by unattended speech: Implications for the structure ofworking memory. Journal of Verbal Learning & Verbal Behavior,21, 150-164.

SALAME, P., & BADDELEY, A. D. (1986a). Phonological factors in STM:Similarity and the unattended speech effect. Bulletin ofthe Psychonomic Society, 24, 263-265.

SALAME, P., & BADDELEY, A. D. (l986b). The unattended speech effect: Perception or memory? Journal ofExperimental Psychology:Learning, Memory, & Cognition, 12, 525-529.

SALAME, P., & BADDELEY, A. D. (1987). NOise, unattended speech andshort-term memory. Ergonomics, 30,1185-1194.

SALAME, P., & BADDELEY, A. D. (1989). Effects of background musicon phonological short-term memory. Quarterly Journal ofExperimental Psychology, 41A, 107-122.

SALAME, P., & BADDELEY, A. [D.] (1990). The effects of irrelevantspeech on immediate free recall. Bulletin ofthe Psychonomic Society, 28, 540-542.

NarE

I. Thanks are due Nelson Cowan for making this suggestIon duringthe review process.

(Manuscript received August 26, 1993;revision accepted for publication March 14, 1994.)

Documents

Organizational factors in the effect of irrelevant speech