
BRAIN RESEARCH 1307 (2010) 78–88

available at www.sciencedirect.com

www.elsevier.com/locate/brainres

Research Report

Electrophysiological attention effects in a virtual cocktail-party setting

Thomas F. Münte a,b,⁎, Dörte K. Spring a, Gregor R. Szycik a,c, Toemme Noesselt d

a Department of Neuropsychology, University Magdeburg, P.O. Box 4120, D-39016 Magdeburg, Germany
b Center for Behavioral Brain Sciences, University Magdeburg, D-39016 Magdeburg, Germany
c Department of Psychiatry, Hannover Medical School, D-30623 Hannover, Germany
d Department of Neurology, University Magdeburg, P.O. Box 4120, D-39016 Magdeburg, Germany

ARTICLE INFO

⁎ Corresponding author. Department of Neuropsychology, University of Magdeburg, P.O. Box 4120, D-39016 Magdeburg, Germany. Fax: +49 391 6711947.

E-mail address: [email protected] (T.F. Münte).

0006-8993/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.brainres.2009.10.044

ABSTRACT

Article history:
Accepted 16 October 2009
Available online 22 October 2009

The selection of one of two concurrent speech messages for comprehension was investigated in healthy young adults in two event-related potential experiments. The stories were presented from virtual locations at 30° to the left and right azimuth by convolving the speech message with the appropriate head-related transfer function determined for each individual participant. In addition, task-irrelevant probe stimuli were presented in rapid sequence from the same virtual locations. In experiment 1, phoneme probes (/da/ voiced by the same talkers as attended and unattended messages) and band-pass filtered noise probes were presented. Phoneme probes coinciding with the attended message gave rise to a fronto-central negativity similar to the Nd-attention effect relative to the phoneme probes coinciding with the unattended speech message, whereas noise probes from the attended message's location showed a more positive frontal ERP response compared to probes from the unattended location, resembling the so-called rejection positivity. In experiment 2, phoneme probes (as in exp. 1) and frequency-shifted (+400 Hz) phoneme probes were compared. The latter were characterized by a succession of negative and positive components that were modulated by location. The results suggest that at least two different neural mechanisms contribute to stream segregation in a cocktail-party setting: enhanced neural processing of stimuli matching the attended message closely (indexed by the Nd-effect) and rejection of stimuli that do not match the attended message at the attended location only (indexed by the rejection positivity).

© 2009 Elsevier B.V. All rights reserved.

Keywords:
Auditory
Attention
Virtual reality
Spatial information
Event-related potential

1. Introduction

“One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it “the cocktail party problem.” No machine has been


constructed to do just this, to filter out one conversation from a number jumbled together” (Cherry, 1957).

This classic observation by Cherry illustrates adequately the exquisite ability of human listeners to actively discern one speaker in the presence of one or multiple competing sound sources. Indeed, at a cocktail-party a listener must perceptually


1 Localization in the vertical plane (elevation) and front–back discrimination requires an analysis of spectral shape cues that arise from direction-dependent reflections within the pinna (described by so-called head-related transfer functions [HRTFs]) (Blauert, 1996). The latter mechanism essentially constitutes a monaural localization cue (Van Wanrooij and Van Opstal, 2004, 2005). As anatomical properties of pinnae differ from individual to individual, HRTFs are also subject-specific. By convolving an auditory signal with the individual HRTF and presenting the resulting individualized signal via headphones, a realistic 3D virtual auditory environment with precise locations of stimuli can be created.
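The convolution step described in this footnote can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline: the function name is invented, and in practice a left/right head-related impulse response (HRIR) pair measured for the individual listener and the desired location would be used.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, hrir_left, hrir_right):
    """Place a mono signal at a virtual location by convolving it
    with a left/right head-related impulse response (HRIR) pair."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    # Stack the two ear signals into a stereo array for headphones
    return np.column_stack([left, right])
```

Played back over headphones, the resulting stereo signal yields the externalized percept described above, provided the HRIRs were measured for the individual listener.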


segregate the sequences of speech sounds (e.g., syllables, words) spoken by different individuals into separate streams. This implies two things: first, sounds emitted at different times by a given source must be treated as part of the same ongoing stream. Second, temporally adjacent or overlapping sounds from other sources must be segregated (Carlyon, 2004).

A plethora of studies has dealt with the question of which auditory features are used for the streaming processes required for listening at a cocktail-party in humans (Bronkhorst, 2000; Yost, 1997) and in similar situations in animals (Bee and Micheyl, 2008; Langemann and Klump, 2005). Likely cues for stream segregation include spatial information (e.g., interaural timing and level differences and information imprinted on the signal by the individual properties of the outer ear structures, see below), fundamental frequency (F0) and harmonic relationships, timbre, and patterns of amplitude modulation (Bregman, 1990; Bregman, 1993; Cusack et al., 2004; Moore and Gockel, 2002).

Whereas early electrophysiological investigations showed that spatial information (i.e., sound source localization) can lead to pronounced selection effects in the event-related potential (ERP; Hansen et al., 1983; Hansen and Hillyard, 1980; Hillyard et al., 1973; Woods et al., 1984), the majority of psychophysical studies in humans have implied that spatial cues might be of less importance for the perceptual integration and segregation of concurrent speech sounds than, e.g., harmonicity or onset synchrony (Culling and Summerfield, 1995; Darwin, 2006; Hukin and Darwin, 1995; but see Arbogast et al., 2002, 2005; Drennan et al., 2003; Kidd et al., 2005 for different views). This literature has been reviewed recently by Darwin (2008).

Indeed, the channelling theory of stream segregation (Beauvois and Meddis, 1996) posits that sounds that excite overlapping sets of peripheral filters tend to be heard as the same stream, and thus suggests that the harmonic organization of a sound source is of paramount importance. This has been called into question, however, by findings that even sounds that excite the same peripheral channels can be separated based on timbre or modulation rate differences (Grimault, Bacon, and Micheyl, 2002; Vliegen and Oxenham, 1999).

Taken together, no clear picture emerges, and it appears that the auditory system may be very flexible in its use of cues to separate sounds into different streams (Moore and Gockel, 2002). Also, many of the psychophysical and neurophysiological studies on stream segregation processes have used stimulus scenarios that were only remotely reminiscent of a true cocktail-party situation with two or more simultaneous speech messages.

We therefore set out to test auditory selection in a more demanding situation and to obtain the neural signatures of the selection processes involved using event-related brain potentials. In a previous study (Nager et al., 2008), building on earlier work (Hink and Hillyard, 1976; Morrell, 1969; Shucard et al., 1981; Woods et al., 1984), we had created a virtual acoustic environment by recording stories read from locations −70, 0 and 70° azimuth using an artificial (“dummy”) head with microphones embedded in the “auditory canals” to capture the interaural time and level differences as well as some of the filter properties of the outer ear structures (Damaske and Wagener, 1969; Minnaar et al., 2001; Møller et

al., 1999; Wersényi, 2007). When replayed via headphones, stories were perceived as being located outside of the head at the locations specified during the recording. The participants' task was to attend either to the rightmost or the leftmost message in order to comprehend the story. Superimposed on the speech messages, task-irrelevant probe stimuli (syllables sharing spatial and spectral characteristics with the speech messages, 4 probes/second) were presented that were used for the generation of ERPs. ERPs to probe stimuli were characterized by a negativity starting at 250 ms with a contralateral frontal maximum for the probes sharing spatial/spectral features of the attended story relative to those for the unattended message. This study extended earlier findings using presentation of speech messages to the left and right ear rather than to both ears at virtual locations. For example, Hink and Hillyard (1976) found that syllable probes within the attended message showed an increased negativity similar to attention effects obtained to similar stimuli when they were task-irrelevant (Hansen et al., 1983; Hillyard et al., 1973). Again using a dichotic presentation mode, Woods et al. (1984) used syllables (“but”, “a”) and tone bursts at the mean fundamental and second formant frequencies of the speaker's voice as irrelevant probes superimposed on speech messages. This study revealed an early onset (approximately 100 ms), long-lasting negative shift for the syllable probe “but”, whereas for the tone-burst probes an early positivity was observed between 200 and 300 ms. Woods et al. (1984) concluded that stimulus selection during attention to speech messages is specifically directed to speech sounds rather than simply to the attended ear or to constituent frequencies of the speech message.

While delivering important information, these previous studies have left a number of questions unanswered. We therefore conducted two further experiments employing the probe stimulus approach with the aim to (a) delineate the auditory features that are used for selection of a relevant message within several speech messages and (b) specifically re-examine the role of spatial information for these selection processes.

Improving over our previous study (Nager et al., 2008), we created a virtual auditory environment by convolving stimulus materials with individually determined head-related transfer functions (HRTF).1 The precision of localization of sound sources is improved over the dummy head approach (Minnaar et al., 2001) and over the use of non-individualized canonical HRTFs (Wenzel, Arruda, Kistler, and Wightman, 1993). In both experiments, participants were exposed to two concurrent and continuous speech messages presented via headphones at virtual locations coming from 30° to the left and 30° to the right, with the task to attend to either the left or


Fig. 1 – Experiment 1. Group average ERPs for irrelevant phoneme probe stimuli coinciding with the attended (blue line) and unattended (red line) speech messages. ERP data are collapsed for left and right probe stimuli, thus lateralized electrode pairs are displayed with regard to stimulus presentation (ipsi-/contralateral). “Attended” probe stimuli show a more negative ERP from 250 ms onwards which was slightly more pronounced at the contralateral frontal electrode. Spline-interpolated topographical map (“attended” minus “unattended” difference wave, mean amplitude 300–500 ms) shows a frontal distribution of this attention-related effect for the phoneme probes.


the right speech message to answer questions about the particular story afterwards. Rapid sequences of probe stimuli were superimposed on the speech messages and were used to obtain ERPs.

In the first experiment we used two kinds of probes: phoneme probes (/da/) that were prepared from utterances of the speakers and thus coincided with the attended (and unattended) speech messages with regard to both location and spectral content, and noise probes. The latter were created by band-pass filtering white noise. This probe stimulus was created to follow up on the observation of Woods et al. (1984), who obtained a positivity, rather than a negative “Nd”-attention effect, for probe stimuli coming from the attended location but not possessing the spectral features of the attended message. Later work has suggested the existence of a so-called rejection positivity (RP), which is thought to be associated with the rejection of sounds that do not match a neural representation (a.k.a. attentional trace) of the attended sound features (Alho et al., 1987; Alho et al., 1994; Degerman et al., 2008; Melara et al., 2002; Michie et al., 1990; Michie et al., 1993). We thus expected to see an enhanced negativity for phoneme probes coinciding with the attended message relative to phoneme probes matching the unattended message, whereas a more positive ERP response was predicted for attended-location noise probes relative to unattended-location noise probes. In the second experiment we contrasted phoneme probes cut from the speakers' utterances (as in experiment 1) with phoneme probes that were shifted in frequency (+400 Hz). Importantly, if a modulation of the ERP response to the frequency-incongruent phoneme probe as a function of attended vs. unattended location is seen, this would suggest a role for spatial information in message selection in this realistic cocktail-party situation.
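The noise probes were created by band-pass filtering white noise. A minimal sketch of how such a probe could be generated is shown below; the filter type, order, and corner frequencies are placeholders, as the text does not specify them here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_noise(duration_s, fs, lo_hz, hi_hz, seed=0):
    """White noise passed through a zero-phase Butterworth
    band-pass filter (order and corners are illustrative)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(duration_s * fs))
    # 4th-order Butterworth, applied forward and backward
    # (sosfiltfilt) so the probe onset is not phase-distorted
    sos = butter(4, [lo_hz, hi_hz], btype="bandpass", fs=fs,
                 output="sos")
    return sosfiltfilt(sos, noise)
```

Such a probe shares the location (via HRTF convolution) but not the spectral fine structure of the speech messages, which is the contrast the experiment exploits.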


2. Results, experiment 1

2.1. Behaviour

On average, 71% of the questions were answered correctly (range 50% to 92%). As the questions were open-type (e.g., “What was Andy's job?” or “From what disease did he suffer?”) requiring written answers by the participants rather than a yes/no response, it can be concluded that the participants had actively attended to the designated story.

2.2. Event-related potentials

Fig. 1 displays group average ERPs to phonemes on the attended and unattended side for a set of anterior electrode sites. After an initial positivity peaking at around 130 ms, the waveforms for phonemes coinciding with the attended and the unattended story started to diverge at around 200 ms: the phonemes coinciding with the attended story were considerably more negative for the rest of the epoch.

At the bottom of Fig. 1 the distribution of the attended–unattended phoneme difference wave (mean voltage between 300 and 500 ms) is displayed. A maximum at the Cz electrode and a slightly contralateral (with respect to the eliciting probe stimulus) distribution becomes apparent.

An ANOVA on the mean voltage between 300 and 500 ms for a set of frontocentral electrodes (F1/2, F3/4, F5/6, Fc1/2, Fc3/4, Fc5/6), for which the attentional effect was maximal, revealed a main effect of attention on the probe stimulus waveforms (F(1,16)=11.35; p=0.003). The attention × laterality interaction was non-significant (F(1,16)=2.1).
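The mean-voltage measure entering this ANOVA can be computed per epoch and channel as sketched below; the array names and shapes are assumptions about the data layout, not the authors' analysis code.

```python
import numpy as np

def mean_amplitude(epochs, times, t_min, t_max):
    """Mean voltage in the window [t_min, t_max] (seconds).

    epochs: array of shape (n_epochs, n_channels, n_times)
    times:  1-D array of sample times, same length as last axis
    Returns an (n_epochs, n_channels) array of window means.
    """
    window = (times >= t_min) & (times <= t_max)
    return epochs[:, :, window].mean(axis=-1)
```

Averaging these values over epochs per condition and electrode yields the cell means that a repeated-measures ANOVA on attention × laterality would operate on.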

The ERPs to the noise probes are displayed in Fig. 2. Compared to the phoneme probes, a rather different modulation as a function of attention was seen. Those noise probe stimuli which coincided with the attended speech message in location were associated with a more positive waveform starting at around 250 ms. Again, this effect was largest for frontocentral electrode sites. Statistically (same electrodes and time-window as above), this was reflected by a main effect of attention on the noise probe ERPs (F(1,16)=9.3; p<0.01). The attention × laterality interaction was non-significant (F(1,16)=1.2).

3. Discussion, experiment 1

Behavioural data indicated that focussing on the relevantspeech message effectively takes place in this complex


Fig. 2 – Experiment 1. Group average ERPs for irrelevant noise probe stimuli coinciding with the attended (black line) and unattended (grey line) speech messages. ERP data are collapsed for left and right probe stimuli, thus lateralized electrode pairs are displayed with regard to stimulus presentation (ipsi-/contralateral). “Attended” probe stimuli show a more positive ERP from 250 ms onwards. This contrasts sharply with the effects found for phoneme probes.


everyday listening situation. ERPs to both types of probe stimuli were modulated as a function of whether or not they coincided with the attended speech message, with a strikingly different pattern, however. Whereas the phoneme probes coinciding with the attended message showed an enhanced negativity much like the standard Nd attention effect as it has been described for task-relevant tone (Degerman et al., 2008; Hansen and Hillyard, 1980; Hansen and Hillyard, 1983), noise (Nager et al., 2003; Teder-Salejarvi et al., 1999), or phoneme (Hansen et al., 1983; Szymanski et al., 1999) stimuli, the noise probes coinciding with the attended speech message's location were rather associated with a positive shift relative to the unattended location. We interpret this effect as an instance of the rejection positivity (Alho et al., 1987; Alho et al., 1994; Degerman et al., 2008; Melara et al., 2002; Michie et al., 1990; Michie et al., 1993) that is seen for stimuli which do not match the relevant features of an attentional trace (Näätänen et al., 2002). It is of note that a difference between attended and unattended location noise probe ERPs was obtained, as the location (simulated by convolving the identical stimuli with the appropriate HRTF), and thus the spatial relation to the attended speech message, was the only difference between the two conditions. This suggests that in this realistic cocktail-party situation spatial information is used for the separation of relevant and irrelevant information. We will return to this point in the general discussion.

Fig. 3 – Experiment 2. Group average ERPs for irrelevant phoneme probe stimuli coinciding with the attended (black line) and unattended (grey line) speech messages. ERP data are collapsed for left and right probe stimuli, thus lateralized electrode pairs are displayed with regard to stimulus presentation (ipsi-/contralateral). “Attended” probe stimuli show a more negative ERP from 450 ms onwards. Spline-interpolated topographical map (“attended” minus “unattended” difference wave, mean voltage in the 390- to 530-ms interval) shows a centroparietal maximum of this attention-related effect for the phoneme probes.

4. Results, experiment 2

4.1. Behaviour

On average, 79% of the questions were answered correctly (range 54% to 98%). Again, this indicates that the participants had actively attended to the designated story.

4.2. Event-related potentials

Fig. 3 displays group average ERPs to frequency-congruent phoneme probes. As in experiment 1, a more negative waveform is observed for the probe stimuli coinciding with the attended story in location. The effect occurred considerably later (approximately 400 ms to 600 ms) and had a more posterior distribution than a typical Nd attention effect. Prior to this negative effect, there was an enhanced positivity between 250 and 390 ms. This was quantified by a mean amplitude measure (frontocentral electrodes, F3/4, Fc5/6) in the time-window 250–390 ms. A main effect of attention was obtained (F(1,23)=9.16, p=0.006). For the subsequent negativity (mean amplitude measure 400–600 ms), the main effect of attention was marginally significant for the frontocentral electrode set (F(1,23)=3.9, p=0.06). Because of the negativity's rather posterior distribution, an additional ANOVA was performed for a set of parietal electrode sites (Cp5/6, P3/4, same time-window), which revealed a main effect of attention (F(1,23)=7.9, p<0.01) as well as an attention × laterality interaction (F(1,23)=8.3, p<0.01).
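The frequency-incongruent probes in experiment 2 were phoneme probes shifted upward by 400 Hz. The paper does not describe the shifting procedure; one crude way to apply such a uniform spectral shift (which, unlike resampling, breaks the harmonic ratios of the original voice) is to roll the one-sided FFT spectrum, as sketched here. This is an illustration only, not the authors' method.

```python
import numpy as np

def shift_frequency(signal, fs, shift_hz):
    """Shift all spectral content of a real signal upward by
    shift_hz by rolling the one-sided FFT spectrum upward and
    discarding the topmost bins. Crude illustration only."""
    n = len(signal)
    spec = np.fft.rfft(signal)
    bins = int(round(shift_hz * n / fs))
    shifted = np.zeros_like(spec)
    shifted[bins:] = spec[:len(spec) - bins]
    return np.fft.irfft(shifted, n)
```

Because every component moves by the same absolute amount, a probe processed this way no longer matches the talker's fundamental frequency and harmonic structure, which is the mismatch the experiment requires.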

Fig. 4 illustrates the effects for frequency-incongruent probe stimuli, for which a quite different pattern than for the frequency-matching probes was seen. Those probe stimuli that emanated from the same location as the attended story

Fig. 4 – Experiment 2. Group average ERPs for irrelevant phoneme probes that were shifted in frequency by 400 Hz (upwards) coinciding with the attended (blue line) and unattended (red line) speech messages. ERP data are collapsed for left and right probe stimuli, thus lateralized electrode pairs are displayed with regard to stimulus presentation (ipsi-/contralateral). “Attended” probe stimuli show an enhanced negativity peaking at 180 ms followed by a positivity peaking at 260 ms. The topographic map illustrates the distribution of the positivity (mean amplitude of “attended” minus “unattended” difference wave in time-window 240–290 ms).


were associated with a phasic negativity peaking at 180 ms followed by a more posterior positivity peaking at 260 ms.

For the negativity (time-window 150–230 ms, F3/4, Fc5/6), a main effect of attention (F(1,23)=4.95, p<0.05) was seen, which was more pronounced ipsilaterally (attention × laterality, F(1,23)=11.26, p<0.003). The subsequent positivity (240–290 ms, Cp5/6, P3/4) showed a main effect of attention as well (F(1,23)=8.50, p<0.01; attention × laterality, F(1,23)=3.1, n.s.).

2 Whereas in experiment 2 ERPs to phoneme probes were modulated as a function of whether or not they coincided with the attended message in fundamental frequency and location, the modulation effect showed a different latency and topography compared to experiment 1. We will return to this issue later in the discussion.

5. Discussion, experiment 2

The effects seen for the frequency-congruent phoneme probes again showed an (albeit late) modulation as a function of their relation to the attended speech message that resembled the Nd effect. In our previous study (Nager et al., 2008), using three different speech messages placed at −70, 0, and 70°, we likewise observed a rather late attention-related effect on the phoneme probe ERPs, which we attributed to the complexity of the auditory scene used in that experiment.

More importantly, the ERPs to the frequency-incongruent probes showed a marked modulation as a function of their spatial relation to the attended speech message. For probes occurring at the location of the attended message, a biphasic effect with an early negativity (peak 180 ms) and a subsequent positivity (peak 260 ms) was observed. We suggest that this may reflect a mismatch negativity followed by a P3a-type response. As with the noise probes in experiment 1, this finding suggests that spatial information is used in the separation of relevant and irrelevant auditory information at the cocktail-party.

6. General discussion

We report two ERP experiments simulating a cocktail-party setting in virtual auditory space with one of the speech messages being relevant to the listener. By employing task-irrelevant probe stimuli we demonstrate different selection mechanisms: ERPs to phoneme probe stimuli that coincided with the attended speaker's voice and location were associated with a more negative ERP relative to when they coincided with the unattended speech message (experiments 1 and 2). In experiment 1, this negativity was similar to the Nd component (Hansen and Hillyard, 1980; Hillyard et al., 1998) that has been described for selective attention experiments which required attending to one stream of stimuli while ignoring a concurrent stream.2 Noise probes, on the other hand, were associated with a positive shift, which likely is an instance of the rejection positivity (Alho et al., 1987; Alho et al., 1994; Degerman et al., 2008; Melara et al., 2002; Michie et al., 1990; Michie et al., 1993). Finally, frequency-shifted phoneme probes in experiment 2 showed a biphasic negative–positive modulation as a function of whether or not they coincided with the attended speech message's location.

These results are remarkable for two reasons: First, on a general level, they underscore the utility of ERPs, as they allow, unlike purely behavioural methods, to study how the auditory system and the brain deal with stimuli that are not task relevant (Nager et al., 2003). Second, and more specifically, we will argue below that the results suggest that spatial information is used for the segregation of attended speech, which is at odds with a number of behavioural/psychophysical studies.

6.1. Role of spatial information: behavioural findings

As pointed out in the introduction, the majority of behavioural studies have suggested a rather minor role of spatial cues in segregation processes underlying perception in cocktail-party-like situations. An often-cited example is the study by Culling and Summerfield (1995). These researchers presented


listeners with four narrowband noises at center frequencies appropriate for the first two formants of, e.g., the vowels /i/ (bands 1 and 2) and /a/ (bands 3 and 4). If these noise-band pairs were presented to opposite ears, participants readily identified the appropriate vowel in their left or right ear. However, if interaural time delays (ITD) as natural spatial cues were manipulated rather than presenting the noise-bands to different ears, participants' performance greatly declined. By transferring the Culling and Summerfield paradigm into a freefield listening situation with presentation via loudspeakers, Drennan et al. (2003) could demonstrate a great improvement of participants' ability to perceive the vowels and concluded that “the full range of cues that normally determine perceived spatial location provided sufficient information for segregation.” Whereas Drennan et al.'s result suggests that spatial features may play a role, the paradigm used by both Culling and Summerfield (1995) and Drennan et al. (2003) only captures one particular aspect of the cocktail-party problem.

Darwin (2008) pointed out that for a mixture of two voices at roughly equal levels the main problem for the auditory system is not to detect features but to allocate each local spectrotemporal feature to the appropriate voice (Cooke and Ellis, 2001). In this situation, the importance of spatial properties should depend on whether there are other differences between the two talkers. To investigate this issue, often the so-called Coordinate Response Measure (CRM) task is used, in which stereotyped sentences (such as: “Ready Baron go to green three now”) are presented with the participant's task being to identify either the color or the digit of the target sentence. Using a variant of the CRM, Arbogast et al. (2002) studied the effects of spatial separation of masking sentences (synthesized from the original CRM materials) on the intelligibility of a speech signal for three types of maskers. The critical sentence was played from straight ahead and the masking sentence was played either from the same location or from 90° to the right. Signals and maskers were filtered into 15 frequency bands, and the envelopes from each band were used to modulate pure tones at the center frequencies of the bands. The critical sentence was generated by summing together 8 of the 15 frequencies. Maskers were generated from sentences (a) by summing together 6 frequency bands not present in the critical sentence (resulting in an intelligible sentence), (b) by convolving a sentence produced as specified in (a) with Gaussian noise (resulting in an unintelligible amplitude-modulated multi-tone complex sharing the spectral properties of (a)), and (c) by applying the same procedure as in (b) but to a sentence sharing the 8 frequencies with the critical sentence. Importantly, for sentence maskers of condition (a) the effect of spatial separation averaged 18 dB (at the 51% correct level). As there was no spectral overlap between the critical and masking sentence, this suggests that spatial cues are used in an obligatory fashion even in circumstances that would allow separation on the basis of spectral cues (see also Arbogast et al., 2005). Again using a variant of the CRM task, Kidd et al. (2005) studied the role of focused attention to spatial location in a multi-talker situation. The task of the listener was to identify key words from a target talker in the presence of two other simultaneous messages. When spatial location information was provided before the trial, the

performance was greatly improved, implying that the focus of attention along the spatial dimension can play a role in solving the “cocktail-party” problem.

6.2. Current study

Taken together, these recent behavioural investigations suggest that spatial information might be exploited in tasks that relatively closely mimic the cocktail-party situation, but they do not speak to the mechanisms that might be used for the separation of the relevant speech message by the auditory system. Our current study may give first hints as to the nature of these mechanisms. Phoneme probes coinciding with the attended speech message with regard to both location and talker identity were associated with a negative shift akin to the Nd effect. This attention-related negativity has received different interpretations over the years. For example, Hillyard et al. (1998) have suggested that the Nd reflects a selection process based upon easily discriminable information that distinguishes the relevant from the irrelevant “channel,” such as frequency and/or location. The later part of the Nd is thought to reflect the selection of targets from non-targets within the attended channel based on less discriminable cues. An alternative to this feature-based account has been proposed by Näätänen and colleagues (Näätänen et al., 2002), who suggested that the Nd reflects a selection process based upon a gradual comparison between the sensory input and an attentional trace, which is thought of as a temporary neuronal representation of the distinctive features of the task-relevant stimuli and which is actively formed and maintained during selective listening. According to Näätänen, all incoming stimuli are compared to the attentional trace, but the comparison process is aborted at some point for mismatching (task-irrelevant) stimuli. The onset of the Nd thus reflects the time needed to stop comparing the task-irrelevant stimuli. A third, object-based hypothesis has been introduced by Alain and Arnott (2000). According to this hypothesis, attention is allocated to an auditory object rather than a feature. As an auditory object may be defined by its physical features, attention to a particular auditory object would thus lead to attention-related changes in those brain areas involved in processing the properties of the object, such as frequency and location.

From the fact that only the frequency-congruent phoneme probes, but not the noise probes, in experiment 1 led to an Nd-like negativity, we conclude that speech-message selection in the current experiment cannot be driven solely by feature-based attention directed to the message's location, as this should have led to an Nd-like enhancement for all probes coming from the attended message's location, i.e. also for the noise probes. Whether the Nd to the frequency-congruent probes is driven by matching them with the attentional trace sensu Näätänen or by object-based selection mechanisms cannot be determined from the current set of data.

On the other hand, the fact that both noise probes (experiment 1) and frequency-incongruent phoneme probes (experiment 2) showed a differential ERP response as a function of whether the probes matched the attended message's location implies that spatial information plays an important role in segregation processes.


For the noise probes, a positive shift was seen for those probes coinciding with the attended message in location relative to probes coinciding with the other message's location. If only a spectral mismatch with the attended message drove the rejection of these probes, this should apply similarly to the noise probes coinciding with the attended message's location and those coinciding with the other message's location (in fact, HRTF-derived spatial cues are the only ones that differ between the two noise probes). Such positive modulations with anterior distributions have been described in the literature as reflecting the rejection of unattended sounds that do not match the attentional trace (Alho et al., 1987; Alho et al., 1994; Melara et al., 2002; Michie et al., 1990; Michie et al., 1993). In fact, one of the probe classes in an earlier dichotic listening task using the probe approach (Woods et al., 1984) was also associated with a positivity. Based on detailed mapping of magnetoencephalographic responses obtained in tasks requiring attention to either pitch or location, Degerman et al. (2008) recently concluded that the Nd and the rejection positivity are generated by at least partially separate processes. Because a rejection positivity was seen for the noise probes at the attended message's location only, it appears that such an active rejection mechanism was engaged only for those noise probes that came from the attended location. Thus, the pattern for the noise probes is consistent with a two-step segregation process: noise probes coinciding with the location of the attended message (first step: spatial selection) are subsequently rejected based on their spectral properties by a process indexed by the rejection positivity (second step: selection based on spectral properties).

The frequency-incongruent phoneme probes in experiment 2 coming from the location of the attended message were also associated with a different ERP compared to those coming from the unattended message's location. In this case, however, no positive shift was seen but rather a biphasic response with an initial negativity followed by a subsequent positivity. We propose that these correspond to a mismatch negativity and a P3a component (Näätänen et al., 2007), respectively. The exact identification of these effects (which will be addressed in a subsequent study) notwithstanding, the fact that the same physical stimuli contributed to both waveforms means that spatial information has been used in the segregation processes required by the current experiment, as the spatial coincidence with the attended speech message was the only difference between the two. As an interim summary, the present results thus clearly show that spatial and spectral properties both contribute to message selection.

We would now like to address two additional issues of concern. First, there was a marked difference between experiments 1 and 2 in the modulations for phoneme probes coinciding with the speaker's identity and location. Whereas the modulation seen in experiment 1 was quite similar in distribution and morphology to the Nd effect seen in experiments requiring the identification of targets within an attended stream of phoneme or tone stimuli in the presence of another concurrent stream, the effect in the second experiment occurred considerably later, showed a more posterior distribution, and was preceded by a positivity. As pointed out in the discussion of experiment 2, a previous study from our group using a similar design but non-individualized stimuli had also revealed a rather late negativity (Nager et al., 2008), but with a more typical anterior distribution. In fact, the latency of the Nd has been shown to vary considerably as a function of the difficulty of the selection process. One factor that possibly contributed to the later onset of the negative modulation for the phoneme probes in experiment 2 is that experiment 2 used two female voices, whereas experiment 1 employed male voices, which might have been easier to discern. However, differential task difficulty could only explain a latency shift, not a change in topography between the two experiments, and thus further studies are required to assess the topographies of probe-stimulus ERPs more systematically.

The second issue concerns the question whether, and to what extent, the presence of the probe stimuli affected the processing of the speech message. In both experiments, participants answered questions regarding the attended story with about 70 to 80% accuracy and were unaware of the content of the unattended story. Presenting the stories of experiment 1 without the probe stimuli to a new group of students (n=6) resulted in a mean percentage of correct answers to the comprehension questions of 86%. Thus, expectedly, the probe stimuli reduce the comprehensibility of the stories, as will, for example, the clinking of glasses or the presence of music at a cocktail party. It can be further asked, however, whether the presence (and the specific nature) of the probe stimuli also changes the cognitive and neural processes engaged in stream segregation and speaker selection. Because the probe stimuli in the present study were the only means to infer the underlying selection processes, it is impossible to answer this question from the current results alone. In the realm of social psychology, similar situations have been alluded to by Lindsay and Anderson (2000, page 544) as examples of a “psychological uncertainty principle” which, according to these authors, is akin to the Heisenberg uncertainty principle in quantum mechanics. This principle states that measuring one observable quantity increases the uncertainty with which other quantities can be known, because measurement of one variable disturbs (i.e., influences the values of) other related variables. Teder et al. (1993) have described an experimental set-up which allows the processing of the speech message itself to be tested. In their experiment, two concurrent stories were delivered via separate left and right loudspeakers, with the task to attend to one message while ignoring the other. The innovation of this study was the use of phonemes inherent in the speech message (word-initial Finnish /k/) to generate the ERPs. The use of such a paradigm would allow assessment of the influence of the presence of probes on the processing of the speech message, but it would not allow assessment of the nature of the selection processes involved in stream segregation.

7. Conclusion

In conclusion, the present study demonstrates that several different neural mechanisms appear to contribute to the processes that lead up to the successful selection of a speech message. The probe-ERPs illustrate that the neural processing of stimuli that share location and talker identity with the attended message is enhanced (Nd component), whereas stimuli that do not share talker identity but occur at the attended message's location are actively rejected, reflected by the rejection positivity. This underscores the role of both spatial and non-spatial information in this realistic setting. To further address the role of spatial information, probe stimuli sharing speaker identity but varying with respect to their distance from the attended message's location would be useful, as ERPs have been shown to reliably reflect the spatial gradient of attention (Nager et al., 2003; Teder-Salejarvi and Hillyard, 1998; Teder-Salejarvi et al., 1999).

Fig. 5 – Schematic illustration of the experimental set-up. Two speech messages were presented from two locations in virtual auditory space. Superimposed on these messages, phoneme probes and noise probes were presented.

8. Experimental procedures

All procedures were cleared by the ethical review board, and informed consent was obtained from all participants.

8.1. Experiment 1

8.1.1. Participants
Seventeen subjects (14 women, mean age 22.5, range 20–26) were recruited from the student population of the University of Magdeburg. They participated for 7 € per hour. All subjects were healthy and had normal or corrected-to-normal vision and normal hearing. The data of seven additional subjects had to be discarded because of too many artifacts (rejection rate >33% of the single-trial epochs) or technical problems.

8.1.2. Determination of individual head-related transfer functions
For each participant, an extensive set of head-related transfer function (HRTF) measurements was conducted using small microphones (Sennheiser KE4 electret microphone cartridges connected to a custom-built amplifier) inserted in the participant's ear canal. Measurements were performed in the Schinkel auditorium Magdeburg (a large concert hall with a high ceiling, for fewer reflections). The subject was placed on a swivel chair in the middle of the room. With a fixed loudspeaker (azimuth 0°), different sound angles were captured by changing the orientation of the subject. The different angles were marked on the floor, and the subjects were asked to orient the swivel chair towards one of these markers. A Mackie HR 824 studio monitor loudspeaker mounted 3.66 m from the subject was used to present the test sounds (shift-register noise; 100 ms duration; presented 80 successive times) for the acquisition of the HRTFs. Maximum-length (ML) pseudo-random binary sequences (80 dB SPL) were used to obtain the impulse responses at a sampling rate of 44.1 kHz. The measurements comprised impulse responses from the left and right ear. A total of 13 different positions separated by 15° intervals (from −90° (left) to 90° (right)) was used.

The measured ML sequences were transformed by a Hadamard transformation implemented in a MATLAB script, resulting in the impulse responses (IR). These IRs consisted of the head-related impulse response (HRIR/HRTF; henceforth HRTF will be used for simplicity) and the electro-acoustic impulse response (EAIR; the impulse response of the AD/DA converter [RME Cardbus & Multiface/ambient system]). In a reference measurement, the EAIR was captured so that the HRTFs could be determined by deconvolving the IR with the EAIR. Because the IRs contained a large number of floor reflections, each IR was cut manually and centered at its edges prior to transformation. The HRTFs corresponding to the various locations were then extracted.
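The deconvolution step behind an ML-sequence measurement can be illustrated in a few lines. This is a minimal sketch of the principle only (for an MLS excitation, circular cross-correlation with the excitation recovers the impulse response, which the fast Hadamard transform computes efficiently), not the authors' MATLAB implementation; the simulated impulse response is a dummy.

```python
import numpy as np
from scipy.signal import max_len_seq

# Maximum-length sequence (MLS) excitation, mapped from {0,1} to {-1,+1}.
seq, _ = max_len_seq(10)              # length 2**10 - 1 = 1023 samples
x = 2.0 * seq - 1.0

# Arbitrary stand-in for a measured head-related impulse response.
h = np.zeros(64)
h[0], h[5], h[20] = 1.0, 0.5, 0.25

# Steady-state response to the repeated MLS = circular convolution of x and h.
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, x.size)))

# Circular cross-correlation with the excitation recovers the IR;
# this is what the fast Hadamard transform computes efficiently for an MLS.
h_est = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x)))) / x.size

# Recovery is exact up to a small bias of order 1/N inherent to MLS correlation.
print(np.allclose(h_est[:h.size], h, atol=0.01))
```

In the real measurement, `y` would be the microphone recording, and the recovered IR would still contain the electro-acoustic path, which is removed by the reference-measurement deconvolution described above.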

8.1.3. Stimuli
All stimuli presented during the experiments were filtered with the HRTFs obtained from the individual participant. Thus, the participant's perception was that of externalized sound sources located at the original recording positions used during the determination of the HRTFs.

For this experiment, each of several stories was filtered with an individual participant's HRTF, yielding virtual locations either 30° to the left or 30° to the right. Thereafter, the stories were mixed together into one stereo audio file. The stories were taken from audio-books and were read by popular German actors (Ernest Hemingway—The Old Man and the Sea [German: Der alte Mann und das Meer], narrator: Rolf Boysen; Antoine de Saint-Exupéry—Vol de Nuit [German: Nachtflug], narrator: Gert Westphal).
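Spatialization of this kind amounts to convolving each mono story with the left- and right-ear impulse responses for the desired azimuth and summing the two renderings channel-wise. A minimal sketch with dummy signals and dummy HRIR pairs (the real pipeline would use the measured, participant-specific HRIRs):

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at a virtual location given its HRIR pair."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=0)

rng = np.random.default_rng(0)
story_a = rng.standard_normal(1000)     # stand-in for the left story
story_b = rng.standard_normal(1000)     # stand-in for the right story

# Dummy HRIR pairs for -30 and +30 degrees azimuth (normally measured).
hrir_l30 = (rng.standard_normal(128), rng.standard_normal(128))
hrir_r30 = (rng.standard_normal(128), rng.standard_normal(128))

# Mix the two spatialized stories into one stereo signal.
stereo = spatialize(story_a, *hrir_l30) + spatialize(story_b, *hrir_r30)
print(stereo.shape)   # (2, 1127): two channels, length 1000 + 128 - 1
```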

Sequences of task-irrelevant probe stimuli, either phonemes uttered by the narrators who also produced the stories or noise bursts, were presented superimposed on the two concurrent stories, as shown schematically in Fig. 5.

Instances of the phoneme /da/, cut from the stories and thus spoken by the same actor as the respective speech message (100 ms duration, also convolved with the individual HRTFs), were presented as probe stimuli. Moreover, noise bursts (100 ms; 5 ms rise and fall time; white noise band-pass filtered 200–5000 Hz) were similarly convolved with the individualized HRTFs to serve as an additional class of probe stimuli. Phonemes coinciding in location and speaker identity with the left or right stories and noise bursts coinciding with the left or right stories in location were presented in randomized order with an interstimulus interval (ISI) of 250 to 750 ms (uniform distribution). The probe sequence was presented using Presentation (Neurobehavioral Systems) software. The continuous stories and the probe sequence were mixed using a stereo mixer/amplifier unit for delivery to the participants via closed headphones.
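The structure of such a probe sequence can be sketched as follows; the condition labels and trial counts here are illustrative, with ISIs drawn uniformly from 250–750 ms as in the experiment:

```python
import numpy as np

rng = np.random.default_rng(42)

# Eight conditions: attention x location x probe type (500 each in the
# actual study; a smaller count is used here for illustration).
conditions = [(att, loc, typ) for att in ("attended", "unattended")
              for loc in ("left", "right")
              for typ in ("phoneme", "noise")]
labels = np.repeat(np.arange(len(conditions)), 50)
rng.shuffle(labels)                     # randomized presentation order

# ISIs drawn from a uniform distribution between 250 and 750 ms.
isis = rng.uniform(250, 750, size=labels.size)
onsets_ms = np.cumsum(isis)             # probe onset times

print(labels.size, isis.min() >= 250, isis.max() <= 750)   # 400 True True
```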

8.1.4. Procedure
The experiment took place in a sound-insulated experimental chamber with participants seated comfortably in a reclining chair. To reduce eye movements, subjects were required to fixate on a cross (1.5° visual angle) on a computer screen located in front of them during the entire recording. During each experimental run, the participants were presented with two different stories, one coming from a virtual location 30° to the left, the other from a location 30° to the right, as well as the randomized sequence of probe stimuli coming from the same locations. In a particular run, either the left or the right story was declared relevant, the subject's task being to listen for comprehension. No button presses were required. Each run lasted about 11 min and was followed by a questionnaire with questions pertaining to the content of the attended story. Two “attend right” and two “attend left” runs were administered to the participants in counterbalanced order.

A total of 4000 probe stimuli (500 probes for each of the eight conditions: attended/unattended × stimulus location left/right × probe type phoneme/noise) was presented.

8.1.5. EEG recording and data analysis
The electroencephalogram was recorded using an elastic cap fitted with 52 tin electrodes (positions: Fp1, Fp2, F1, F2, F3, F4, F5, F6, F7, F8, Ft7, Ft8, C1, C2, C3, C4, C5, C6, P1, P2, P3, P4, P5, P6, P7, P8, Fpz, Fz, Cz, Pz, T7, T8, Tp7, Tp8, Fc1, Fc2, Fc3, Fc4, Fc5, Fc6, Cp1, Cp2, Cp3, Cp4, Cp5, Cp6, Po3, Po4, Po7, Po8, O1, O2). The horizontal/vertical electrooculogram (EOG) was recorded using a bipolar montage between the left external canthus and a position located below the left eye. The EOG was registered to allow off-line rejection of ocular artifacts. All scalp electrodes were referenced to the left mastoid electrode. The EEG was amplified (time constant 10 s, low-pass filter 30 Hz, high-pass filter 0.05 Hz), digitized on-line with 4 ms resolution (sampling rate 250 Hz) and stored on hard disk for further processing.

After off-line artifact rejection, which excluded trials contaminated with ocular and other artifacts using individualized amplitude criteria (determined by inspecting the amplitude of blink artifacts in the eye and frontal channels), ERPs were obtained for epochs of 1024 ms including a 100-ms prestimulus interval used as baseline.
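An amplitude-criterion rejection of this kind can be sketched as below; the threshold value is a placeholder for the individually determined criteria, and the data are simulated:

```python
import numpy as np

def reject_epochs(epochs, threshold_uv):
    """Drop epochs whose peak-to-peak amplitude exceeds the criterion in
    any channel. epochs: (n_epochs, n_channels, n_samples) in microvolts."""
    ptp = epochs.max(axis=2) - epochs.min(axis=2)   # per epoch, per channel
    keep = (ptp < threshold_uv).all(axis=1)
    return epochs[keep], keep

rng = np.random.default_rng(1)
epochs = rng.normal(0.0, 10.0, size=(100, 52, 256))
epochs[::10, :, :50] += 200.0           # simulate blink-contaminated epochs
clean, keep = reject_epochs(epochs, threshold_uv=100.0)
print(clean.shape[0])                   # the 10 contaminated epochs are dropped
```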

The ERPs were averaged separately for attention condition (attended or unattended probes), probe class (phoneme/noise), and location (left/right). After preliminary analyses had indicated no difference between the effects for left- and right-sided stimuli, ERPs to left- and right-sided probe stimuli were collapsed to yield waveforms for electrode positions ipsi- and contralateral with regard to the location of the probes (e.g., ERPs from electrode F3 for left stimuli were averaged together with ERPs from electrode F4 for right stimuli to yield the frontal ipsilateral channel F3/4i).
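The collapsing scheme can be written compactly: for each homologous electrode pair, the ipsilateral waveform averages the left-hemisphere electrode for left-sided probes with the right-hemisphere electrode for right-sided probes, and vice versa for the contralateral channel. A sketch with dummy waveforms:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 256
# Averaged ERPs per electrode for left- and right-sided probes (dummy data).
erp_left = {"F3": rng.standard_normal(n_samples),
            "F4": rng.standard_normal(n_samples)}
erp_right = {"F3": rng.standard_normal(n_samples),
             "F4": rng.standard_normal(n_samples)}

# F3/4i: ipsilateral frontal channel = mean of F3|left probe and F4|right probe.
f34_ipsi = (erp_left["F3"] + erp_right["F4"]) / 2.0
# F3/4c: the contralateral counterpart.
f34_contra = (erp_left["F4"] + erp_right["F3"]) / 2.0

print(f34_ipsi.shape)   # (256,)
```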

The ERPs were quantified by mean amplitude measures (see the Results section for time windows and electrode sets), and the resulting data were subjected to repeated-measures analyses of variance (ANOVAs). The Huynh–Feldt correction was applied to correct for possible violations of the sphericity assumption as necessary (Huynh and Feldt, 1976).
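Mean-amplitude quantification reduces each baseline-corrected ERP to a single number per time window. A sketch at the 250 Hz sampling rate used here (the window boundaries are illustrative, not the ones used in the Results):

```python
import numpy as np

FS = 250                      # Hz -> 4 ms per sample
BASELINE_MS = 100             # prestimulus interval used as baseline

def mean_amplitude(erp, win_ms):
    """Mean voltage in win_ms=(start, end) relative to stimulus onset.
    erp starts 100 ms before onset (baseline samples included)."""
    erp = erp - erp[: BASELINE_MS * FS // 1000].mean()   # baseline correction
    i0 = (BASELINE_MS + win_ms[0]) * FS // 1000
    i1 = (BASELINE_MS + win_ms[1]) * FS // 1000
    return erp[i0:i1].mean()

# Dummy epoch: flat baseline, then a 1 microvolt step after onset.
erp = np.concatenate([np.zeros(25), np.ones(231)])       # 1024 ms at 250 Hz
print(mean_amplitude(erp, (100, 300)))                   # 1.0
```

The per-condition values produced this way are what enter the repeated-measures ANOVAs.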

8.2. Experiment 2

The methods used in experiment 2 were highly similar to those of experiment 1. We note only the pertinent changes.

8.2.1. Participants
The results of twenty-four subjects (20 women, 4 men; mean age 26.6, range 20–35) out of 32 were included in the statistical analyses. The data of the remaining 8 subjects had to be discarded because of too many artifacts (more than 30% of the trials rejected) or technical failures. For all participants, individual HRTFs had been determined in a separate session prior to the experiment.

8.2.2. Stimuli
Again, stimuli were prepared using the individual HRTFs obtained from the particular participant. Participants were exposed to two simultaneous stories coming from virtual locations at 30° to the left or 30° to the right (stories: Angelika Schrobsdorff—Von der Erinnerung geweckt, narrator: Angelika Schrobsdorff; Noëlle Châtelet—La dame en bleu [German: Die Dame in Blau], narrator: Marlen Diekhoff).

Additionally, instances of the phoneme /da/ (100 ms duration; also convolved with the individual HRTFs), either at the same fundamental frequency as the particular talker's voice or 400 Hz higher in fundamental frequency (frequency shift performed using Adobe Audition® software), were presented as probe stimuli. Phonemes coinciding in location and speaker identity with the left or right stories were presented in randomized order with an interstimulus interval (ISI) of 250 to 750 ms (uniform distribution). The probe sequence and the continuous stories were mixed using a stereo mixer/amplifier unit for delivery to the participants via closed headphones.
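The +400 Hz manipulation was performed in Adobe Audition. As an illustration of one simple frequency-shifting technique (not necessarily Audition's algorithm, and distinct from pitch scaling), single-sideband modulation of the analytic signal shifts every spectral component up by a fixed amount:

```python
import numpy as np
from scipy.signal import hilbert

FS = 16000
SHIFT_HZ = 400.0

t = np.arange(FS) / FS                        # 1 s of signal
x = np.sin(2 * np.pi * 1000.0 * t)            # 1000 Hz test tone

# Shift every spectral component up by SHIFT_HZ via the analytic signal.
shifted = np.real(hilbert(x) * np.exp(2j * np.pi * SHIFT_HZ * t))

# Dominant frequency moves from 1000 Hz to 1400 Hz (1 Hz per FFT bin here).
peak_hz = np.argmax(np.abs(np.fft.rfft(shifted)))
print(peak_hz)   # 1400
```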

During each of four experimental runs (11 min duration), participants had to attend to either the left or the right story in order to answer a questionnaire presented afterwards. A total of 4000 probe stimuli (500 probes for each of the eight conditions: attended/unattended × stimulus location left/right × probe type phoneme at speaker's F0/F0+400 Hz) was presented.

8.2.3. EEG recording and data analysis
Recording and analysis were performed as in experiment 1, except that only a 28-channel montage (Fp1, Fp2, F3, F4, F7, F8, C3, C4, Fpz, Fz, Cz, Pz, T7, T8, Fc5, Fc6, Fc1, Fc2, Cp5, Cp6, P7, P8, P3, P4, Po1, Po2, O1, O2) was used.

Acknowledgments

Funded by Deutsche Forschungsgemeinschaft SFB TR 31 andby BMBF grant 01GO0202 (Center for Advanced Imaging,Magdeburg). We thank Birger Kollmeier and the scientists ofthe “Haus des Hörens, Oldenburg” for performing the HRTFdetermination. Special thanks to Kimmo Alho for usefulsuggestions.


R E F E R E N C E S

Alain, C., Arnott, S.R., 2000. Selectively attending to auditory objects. Front. Biosci. 5, 202–212.

Alho, K., Tottola, K., Reinikainen, K., Sams, M., Näätänen, R., 1987. Brain mechanism of selective listening reflected by event-related potentials. Electroencephalogr. Clin. Neurophysiol. 68, 458–470.

Alho, K., Woods, D.L., Algazi, A., 1994. Processing of auditory stimuli during auditory and visual attention as revealed by event-related potentials. Psychophysiology 31, 469–479.

Arbogast, T.L., Mason, C.R., Kidd Jr., G., 2002. The effect of spatial separation on informational and energetic masking of speech. J. Acoust. Soc. Am. 112, 2086–2098.

Arbogast, T.L., Mason, C.R., Kidd Jr., G., 2005. The effect of spatial separation on informational masking of speech in normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 117, 2169–2180.

Beauvois, M.W., Meddis, R., 1996. Computer simulation of auditory stream segregation in alternating-tone sequences. J. Acoust. Soc. Am. 99, 2270–2280.

Bee, M.A., Micheyl, C., 2008. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J. Comp. Psychol. 122, 235–251.

Blauert, J., 1996. Spatial Hearing – Revised Edition: The Psychophysics of Human Sound Localization. MIT Press, Cambridge, MA.

Bregman, A.S., 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, Cambridge, MA.

Bregman, A.S., 1993. Auditory scene analysis: hearing in complex environments. In: Thinking in Sound: The Cognitive Psychology of Human Audition, pp. 10–36.

Bronkhorst, A.W., 2000. The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acustica 86, 117–128.

Carlyon, R.P., 2004. How the brain separates sounds. Trends Cogn. Sci. 8, 465–471.

Cherry, E.C., 1957. On Human Communication: A Review, Survey, and a Criticism. MIT Press, Cambridge, MA.

Cooke, M., Ellis, D.P.W., 2001. The auditory organization of speech and other sources in listeners and computational models. Speech Commun. 35, 141–177.

Culling, J.F., Summerfield, Q., 1995. Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J. Acoust. Soc. Am. 98, 785–797.

Cusack, R., Deeks, J., Aikman, G., Carlyon, R.P., 2004. Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J. Exp. Psychol. Hum. Percept. Perform. 30, 643–656.

Damaske, P., Wagener, B., 1969. Richtungshörversuche über einen nachgebildeten Kopf. Acustica 21, 30–35.

Darwin, C.J., 2006. Contributions of binaural information to the separation of different sound sources. Int. J. Audiol. 45.

Darwin, C.J., 2008. Spatial hearing and perceiving sources. In: Yost, W.A., Fay, R.R., Popper, A.N. (Eds.), Auditory Perception of Sound Sources. Springer, Berlin, pp. 215–232.

Degerman, A., Rinne, T., Särkkä, A.K., Salmi, J., Alho, K., 2008. Selective attention to sound location or pitch studied with event-related brain potentials and magnetic fields. Eur. J. Neurosci. 27, 3329–3341.

Drennan, W.R., Gatehouse, S., Lever, C., 2003. Perceptual segregation of competing speech sounds: the role of spatial location. J. Acoust. Soc. Am. 114, 2178–2189.

Grimault, N., Bacon, S.P., Micheyl, C., 2002. Auditory stream segregation on the basis of amplitude-modulation rate. J. Acoust. Soc. Am. 111, 1340–1348.

Hansen, J.C., Hillyard, S.A., 1980. Endogenous brain potentials associated with selective auditory attention. Electroencephalogr. Clin. Neurophysiol. 49, 277–290.

Hansen, J.C., Hillyard, S.A., 1983. Selective attention to multidimensional auditory stimuli. J. Exp. Psychol. Hum. Percept. Perform. 9, 1–19.

Hansen, J.C., Dickstein, P.W., Berka, C., Hillyard, S.A., 1983. Event-related potentials during selective attention to speech sounds. Biol. Psychol. 16, 211–224.

Hillyard, S.A., Hink, R.F., Schwent, V.L., Picton, T.W., 1973. Electrical signs of selective attention in the human brain. Science 182, 177–180.

Hillyard, S.A., Teder-Salejarvi, W.A., Münte, T.F., 1998. Temporal dynamics of early perceptual processing. Curr. Opin. Neurobiol. 8, 202–210.

Hink, R.F., Hillyard, S.A., 1976. Auditory evoked potentials during selective listening to dichotic speech messages. Percept. Psychophys. 20, 236–242.

Hukin, R.W., Darwin, C.J., 1995. Comparison of the effect of onset asynchrony on auditory grouping in pitch matching and vowel identification. Percept. Psychophys. 57, 191–196.

Huynh, H., Feldt, L.S., 1976. Estimation of the Box correction for degrees of freedom from sample data in randomized block and split-plot designs. J. Educ. Statist. 1, 69–82.

Kidd Jr., G., Arbogast, T.L., Mason, C.R., Gallun, F.J., 2005. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118, 3804–3815.

Langemann, U., Klump, G.M., 2005. Perception and acoustic communication networks. In: Animal Communication Networks, pp. 451–480.

Lindsay, J.J., Anderson, C.A., 2000. From antecedent conditions to violent actions: a general affective aggression model. Pers. Soc. Psychol. Bull. 26, 533–547.

Melara, R.D., Rao, A., Tong, Y., 2002. The duality of selection: excitatory and inhibitory processes in auditory selective attention. J. Exp. Psychol. Hum. Percept. Perform. 28, 279–306.

Michie, P.T., Bearpark, H.M., Crawford, J.M., Glue, L.C.T., 1990. The nature of selective attention effects on auditory event-related potentials. Biol. Psychol. 30, 219–250.

Michie, P.T., Solowij, N., Crawford, J.M., Glue, L.C., 1993. The effects of between-source discriminability on attended and unattended auditory ERPs. Psychophysiology 30, 205–220.

Minnaar, P., Olesen, S.K., Christensen, F., Møller, H., 2001. Localization with binaural recordings from artificial and human heads. J. Audio Eng. Soc. 49, 323–336.

Møller, H., Hammershøi, D., Jensen, C.B., Sorensen, M.F., 1999. Evaluation of artificial heads in listening tests. J. Acoust. Soc. Am. 47, 83–100.

Moore, B.C.J., Gockel, H., 2002. Factors influencing sequential stream segregation. Acta Acustica united with Acustica 88, 320–333.

Morrell, L., 1969. Discussion. In: Lindsey, D.B., Donchin, E. (Eds.), Evoked Potentials: Methods, Results and Evaluations (No. SP-191). Washington, DC, p. 331.

Näätänen, R., Alho, K., Schröger, E., 2002. Electrophysiology of attention. In: Pashler, H., Wixted, J. (Eds.), Steven's Handbook of Experimental Psychology, Third Edition, Volume Four: Methodology in Experimental Psychology. John Wiley, New York, pp. 601–653.

Näätänen, R., Paavilainen, P., Rinne, T., Alho, K., 2007. The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin. Neurophysiol. 118, 2544–2590.

Nager, W., Kohlmetz, C., Altenmuller, E., Rodriguez-Fornells, A., Münte, T.F., 2003. The fate of sounds in conductors' brains: an ERP study. Brain Res. Cogn. Brain Res. 17, 83–93.

Nager, W., Dethlefsen, C., Münte, T.F., 2008. Attention to human speakers in a virtual auditory environment: brain potential evidence. Brain Res. 1220, 164–170.


Shucard, D.W., Cummins, K.R., Thomas, D.G., Shucard, J.L., 1981. Evoked potentials to auditory probes as indices of cerebral specialization of function—replication and extension. Electroencephalogr. Clin. Neurophysiol. 52, 389–393.

Szymanski, M.D., Yund, E.W., Woods, D.L., 1999. Human brain specialization for phonetic attention. NeuroReport 10, 1605–1608.

Teder, W., Kujala, T., Näätänen, R., 1993. Selection of speech messages in free-field listening. NeuroReport 5, 307–309.

Teder-Salejarvi, W.A., Hillyard, S.A., 1998. The gradient of spatial auditory attention in free field: an event-related potential study. Percept. Psychophys. 60, 1228–1242.

Teder-Salejarvi, W.A., Hillyard, S.A., Roder, B., Neville, H.J., 1999. Spatial attention to central and peripheral auditory stimuli as indexed by event-related potentials. Brain Res. Cogn. Brain Res. 8, 213–227.

Van Wanrooij, M.M., Van Opstal, A.J., 2004. Contribution of head shadow and pinna cues to chronic monaural sound localization. J. Neurosci. 24, 4163–4171.

Van Wanrooij, M.M., Van Opstal, A.J., 2005. Relearning sound localization with a new ear. J. Neurosci. 25, 5413–5424.

Vliegen, J., Oxenham, A.J., 1999. Sequential stream segregation in the absence of spectral cues. J. Acoust. Soc. Am. 105, 339–346.

Wenzel, E.M., Arruda, M., Kistler, D.J., Wightman, F.L., 1993. Localization using nonindividualized head-related transfer functions. J. Acoust. Soc. Am. 94, 111–123.

Wersényi, G., 2007. Directional properties of the dummy-head in measurement techniques based on binaural evaluation. J. Eng. Comp. Arch. 1, 1–15.

Woods, D.L., Hillyard, S.A., Hansen, J.C., 1984. Event-related brain potentials reveal similar attentional mechanisms during selective listening and shadowing. J. Exp. Psychol. Hum. Percept. Perform. 10, 761–777.

Yost, W.A., 1997. The cocktail party problem: forty years later. In: Binaural and Spatial Hearing in Real and Virtual Environments, pp. 329–347.