
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.)
Progress in Brain Research, Vol. 156
ISSN 0079-6123
Copyright © 2006 Elsevier B.V. All rights reserved

CHAPTER 19

Investigating audiovisual integration of emotional signals in the human brain

Thomas Ethofer 1,2,*, Gilles Pourtois 3 and Dirk Wildgruber 2

1 Section of Experimental MR of the CNS, Department of Neuroradiology, Otfried-Müller-Str. 51, University of Tübingen, 72076 Tübingen, Germany
2 Department of General Psychiatry, University of Tübingen, Tübingen, Germany
3 Laboratory for Neurology and Imaging of Cognition, Departments of Neurology and Neurosciences, Centre Médical Universitaire, University of Geneva, Geneva, Switzerland

*Corresponding author. Tel.: +49-7071-2987385; Fax: +49-7071-294371; E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)56019-4

Abstract: Humans can communicate their emotional state via facial expression and affective prosody. This chapter reviews behavioural, neuroanatomical, electrophysiological and neuroimaging studies pertaining to audiovisual integration of emotional communicative signals. Particular emphasis will be given to neuroimaging studies using positron emission tomography (PET) or functional magnetic resonance imaging (fMRI). Conjunction analyses, interaction analyses, correlation analyses between haemodynamic responses and behavioural effects, and connectivity analyses have been employed to analyse neuroimaging data. There is no general agreement as to which of these approaches can be considered "optimal" to classify brain regions as multisensory. We argue that these approaches provide complementary information as they assess different aspects of multisensory integration of emotional information. Assets and drawbacks of the different analysis types are discussed and demonstrated on the basis of one fMRI data set.

Keywords: conjunction analysis; connectivity analysis; correlation analysis; emotion; facial expression; interaction analysis; multisensory; prosody

Behavioural studies

In a natural environment, most events generate stimulation via multiple sensory channels. Integration of inputs from different modalities enables a unified representation of the world and can provide information that is unavailable from any single modality in isolation. A compelling example of such merging of information is the McGurk effect (McGurk and MacDonald, 1976), in which a heard syllable /ba/ and a seen syllable /ga/ are perceived as /da/. Moreover, integration of information obtained from different modalities can result in enhanced perceptual sensitivity and shortened response latencies on a behavioural level (Miller, 1982, 1986; Schröger and Widmann, 1998). This is of particular importance for perception of emotionally relevant information, which can be simultaneously perceived via the visual modality (e.g. facial expression, body postures and gestures) and the auditory modality (e.g. affective prosody and propositional content). It has been demonstrated that congruency in information expressed via facial expression and affective prosody facilitates behavioural reactions to such emotion-laden stimuli (Massaro and Egan, 1996; de Gelder and Vroomen, 2000; Dolan et al., 2001). Furthermore, affective information obtained via one sense can alter information processing in another; for example, a facial expression is more likely to be perceived as fearful if accompanied by a fearful (as opposed to a neutral) voice (Massaro and Egan, 1996; Ethofer et al., 2006). Such crossmodal biases occur even under the explicit instruction to ignore information conveyed in the concurrent channel (de Gelder and Vroomen, 2000; Ethofer et al., 2006) and are unconstrained by the allocation of attentional resources (Vroomen et al., 2001). These findings argue for a mandatory nature of the processes underlying integration of facial and vocal affective information.

Neuroanatomical studies

In animal experiments, several areas with converging projections from visual and auditory cortices have been identified. Such convergence zones (Damasio, 1989) constitute candidate regions for mediation of audiovisual integration and crossmodal effects in humans (for a review, see Mesulam, 1998; Driver and Spence, 2000; Calvert, 2001). These regions include both cortical structures, such as the banks of the superior temporal sulcus (STS; Jones and Powell, 1970; Seltzer and Pandya, 1978), the insula (Mesulam and Mufson, 1982) and the orbitofrontal cortex (Jones and Powell, 1970; Chavis and Pandya, 1976), as well as subcortical structures comprising the superior colliculus (Fries, 1984), claustrum (Pearson et al., 1982) and several nuclei within the amygdala (Turner et al., 1980; Murray and Mishkin, 1985; McDonald, 1998; Pitkänen, 2000) and thalamus (Mufson and Mesulam, 1984).

At the single neuron level, the most intensively studied of these convergence zones is the superior colliculus (Gordon, 1973; Meredith and Stein, 1983; Peck, 1987; Wallace et al., 1993, 1996), which plays a fundamental role in attention and orientation behaviour (for review, see Stein and Meredith, 1993). On the basis of their detailed studies on multisensory neurons in deep layers of the superior colliculus, Stein and colleagues (Stein and Meredith, 1993) formulated a series of key "integration rules": First, multimodal stimuli that occur in close temporal and spatial proximity elicit supra-additive responses (i.e. the number of impulses to a bimodal stimulus exceeds the arithmetic sum of impulses to the respective unimodal stimuli). Second, the stronger these crossmodal interaction effects, the less effective are the unimodal stimuli in generating a response in the multisensory cell (inverse effectiveness rule). Third, spatially disparate crossmodal cues result in pronounced response depression in multisensory cells (i.e. the response to a stimulus can be severely diminished by a spatially incongruent stimulus from another modality).

A similar behaviour has been described for multisensory convergence sites in the cortex, such as the banks of the STS (Bruce et al., 1981; Hikosaka et al., 1988; Barraclough et al., 2005) and the posterior insula (Loe and Benevento, 1969; Fallon et al., 1978). However, neurons in these cortical regions and the superior colliculus do not project to or receive afferents from each other (Wallace et al., 1993) and show different sensitivity to spatial factors (Stein and Wallace, 1996). Therefore, it is believed that they fulfil different integrative functions (attention/orientating behaviour in the superior colliculus vs. perceptual judgements in the cortex; Stein et al., 1996).

Animal experiments provided insight into possible multisensory integration sites in the brain that enable definition of regions of interest for analysis of human neuroimaging studies. With a typical spatial resolution of 3 × 3 × 3 mm³, each data point acquired in a neuroimaging experiment is attributable to the averaged response of several million neurons (Goldman-Rakic, 1995). In view of the fact that only about 25% of the neurons in cerebral multisensory integration sites are sensitive to stimuli from more than one modality (Wallace et al., 1992), the effects elicited by multisensory integration processes are expected to be small. Restricting the search region to small anatomical structures strongly improves the sensitivity to identify such integration sites by reducing the problem of multiple comparisons (Worsley et al., 1996).

Electrophysiological studies

Recording of electric brain responses over the human scalp (event-related potentials or ERPs) has been primarily employed to investigate the time course of crossmodal binding of affective audiovisual information, given the high temporal resolution of this technique. De Gelder et al. (1999) demonstrated that a facial expression with conflicting emotional information in relation to a simultaneously presented affective voice evokes an early mismatch-negativity response around 180 ms after its onset. These ERP findings indicate that auditory processing is modulated by concurrent visual information. A subsequent ERP study from the same group demonstrated that the auditory N1 component occurring around 110 ms after presentation of an affective voice is significantly enhanced by an emotionally congruent facial expression. This effect occurs for upright but not for inverted faces (Pourtois et al., 2000). Recognition of emotional facial expressions is substantially hindered by face inversion (White, 1999). Thus, the observation that modulation of auditory ERP components is restricted to upright faces suggests that this effect is driven by the expressed facial affect and is not attributable to low-level pictorial features of the visual stimuli. Finally, an analysis focused on the positive deflection following the N1-P1 component around 220 ms poststimulus revealed a shortened latency of this deflection in emotionally congruent as compared to incongruent audiovisual trials (Pourtois et al., 2002). These faster ERP responses parallel behavioural effects showing facilitated responses to affective prosody when presented simultaneously with a congruent versus an incongruent emotional facial expression (de Gelder and Vroomen, 2000). In summary, electrophysiological studies on audiovisual integration of emotional information indicate that multisensory integration occurs at an early stage of cerebral processing (i.e. around 110-220 ms poststimulus). The observation that crosstalk between the modalities takes place during early perceptual rather than during late decisional stages offers an explanation for the finding that crossmodal biases between the modalities occur irrespective of attentional resources (Vroomen et al., 2001) and instructions to ignore a concurrent stimulus (de Gelder and Vroomen, 2000; Ethofer et al., 2006). Furthermore, these electrophysiological findings point to neuronal structures that conduct early steps in the processing of external information. However, the low spatial resolution of ERP data does not allow inference on which brain regions are involved in integration of multisensory information.

Neuroimaging studies

Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have been used to shed light on the functional neuroanatomy of multisensory integration of emotional information. However, definition of the appropriate analysis in multimodal studies is not trivial, and several approaches have been applied to model audiovisual integration or crossmodal effects in the brain. These approaches include conjunction analyses, interaction analyses, correlation analyses with effects observed on the behavioural level and connectivity analyses. We demonstrate the application of these approaches on the basis of one data set acquired in an event-related fMRI study conducted to investigate the neuronal correlates underlying audiovisual integration of emotional information from face and voice (Ethofer et al., 2006). Twelve right-handed subjects participated in this experiment conducted on a 1.5 T scanner (Siemens VISION, Erlangen, Germany) comprising two sessions with 36 visual (V) trials, two sessions with 36 auditory (A) trials and two sessions with 36 audiovisual (AV) trials. In the visual sessions, every trial consisted of a presentation of a facial expression shown for 1 s. These visual stimuli were obtained from the Ekman and Friesen series (1976) and comprised facial expressions ranging from neutral to 100% fear and from neutral to 100% happiness in incremental steps of 25% created by digital morphing techniques. In the auditory sessions, sentences spoken by professional actors in an either happy or fearful voice were presented. In the bimodal sessions, auditory and visual stimuli were presented, with facial expressions being shown during the last second of the spoken sentences. Subjects were instructed to judge the emotional valence of the stimuli on a nine-point self-assessment manikin scale (SAM; Bradley and Lang, 1994) by pressing buttons with their right or left hand. In all sessions, the SAM scale was presented for 4 s, beginning 200 ms after stimulus offset (see Fig. 1a). In the unimodal sessions, subjects rated the emotional valence of the presented stimuli (facial expressions or prosody). In the bimodal sessions, subjects were instructed to judge the emotional valence of the facial expression and ignore the concomitant affective voice. fMRI data were analysed using statistical parametric mapping software (SPM2, Wellcome Department of Imaging Neuroscience, London, UK, http://www.fil.ion.ucl.ac.uk/spm). Coordinates of activation clusters are given in MNI space (Montreal Neurological Institute; Collins et al., 1994). Main effects of presentation of unimodal (A and V) and bimodal (AV) stimuli are shown in Fig. 1b. As expected, unimodal presentation of either emotional facial expressions or affectively spoken sentences engaged bilateral primary and higher-order visual or auditory cortices, respectively, while in bimodal trials both visual and auditory cortices were found active. In all three conditions, activations were found in bilateral motor cortices and cerebellum, which are most probably attributable to the motor responses made by the subjects to judge the stimuli. Furthermore, dorsolateral prefrontal cortices of both hemispheres, presumably subserving working memory processes, showed responses to stimulus presentation in both unimodal and bimodal sessions.

Fig. 1. Experimental design (a) and brain activation (b) for auditory (upper panels), visual (middle panels) and audiovisual trials (lower panels) as compared to rest. Brain activations are thresholded at a height threshold of p<0.001 (uncorrected) and corrected for multiple comparisons at cluster level, k>50, p<0.05 (corrected).
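To make the modelling steps discussed in the following sections concrete, the sketch below illustrates how such an event-related design can be cast as a voxel-wise general linear model. It is written in Python purely for illustration; the onsets, repetition time and haemodynamic response function are simplifying assumptions and do not reproduce the parameters of the study, which was analysed with SPM2.

# Sketch of how the three session types (A, V, AV) could be modelled in a
# voxel-wise GLM. All onsets, the TR and the HRF shape are illustrative
# assumptions, not the parameters of the original study.
import numpy as np
from scipy.stats import gamma

TR = 3.0                      # repetition time in seconds (assumed)
n_scans = 120                 # scans per session (assumed)
frame_times = np.arange(n_scans) * TR

def hrf(t):
    """Canonical double-gamma haemodynamic response function."""
    peak = gamma.pdf(t, 6)          # positive response peaking around 5 s
    undershoot = gamma.pdf(t, 16)   # late undershoot
    return peak - 0.35 * undershoot

def regressor(onsets, duration=1.0):
    """Boxcar for the given onsets convolved with the HRF (1 s resolution)."""
    t_hi = np.arange(0, n_scans * TR, 1.0)
    box = np.zeros_like(t_hi)
    for onset in onsets:
        box[(t_hi >= onset) & (t_hi < onset + duration)] = 1.0
    conv = np.convolve(box, hrf(np.arange(0, 30, 1.0)))[:len(t_hi)]
    return np.interp(frame_times, t_hi, conv)

# Assumed trial onsets for auditory, visual and audiovisual events
onsets = {"A": np.arange(10, 350, 30), "V": np.arange(20, 350, 30),
          "AV": np.arange(30, 350, 30)}
X = np.column_stack([regressor(o) for o in onsets.values()] + [np.ones(n_scans)])

# Ordinary least-squares fit for one voxel time series y (simulated here)
y = X @ np.array([1.0, 0.5, 2.0, 100.0]) + np.random.randn(n_scans)
betas = np.linalg.lstsq(X, y, rcond=None)[0]
print(dict(zip(list(onsets) + ["constant"], np.round(betas, 2))))

The condition-level estimates obtained in this way form the basis for the conjunction and interaction contrasts discussed below.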

Conjunction analyses

Conjunction analyses were originally introduced by Price and Friston (1997). The objective of this approach is to test for commonalities in brain activation induced by different stimuli or tasks. Recently, conjunctions employed in the analysis of neuroimaging data have been revised and divided into those that test a global null hypothesis (H0: no effect in any of the components; H1: significant effect in at least one of the components; Friston et al., 1999, 2005) and those that test a conjunction null hypothesis (H0: no effect in at least one of the components; H1: significant effects in all of the components; Nichols et al., 2005). Since only rejection of the conjunction null hypothesis can be taken as evidence for a logical AND, all conjunction analyses reported here were carried out on the basis of a conjunction null hypothesis.
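The practical difference between the two null hypotheses can be illustrated on the minimum statistic over the two contrasts of interest. The sketch below uses simulated t-maps and an independence assumption for the global-null threshold; the degrees of freedom and map dimensions are arbitrary choices rather than values from the study.

# Minimal sketch contrasting the two conjunction tests on simulated t-maps for
# the contrasts [AV - V] and [AV - A].
import numpy as np
from scipy.stats import t as t_dist

df, alpha, n_contrasts = 11, 0.001, 2          # 12 subjects -> 11 df (assumed)
t_av_vs_v = np.random.standard_t(df, size=(40, 48, 40))   # stand-in t-maps
t_av_vs_a = np.random.standard_t(df, size=(40, 48, 40))
t_min = np.minimum(t_av_vs_v, t_av_vs_a)       # minimum statistic per voxel

# Conjunction null (Nichols et al., 2005): every contrast must individually
# exceed the single-contrast critical value -> a logical AND.
u_conjunction = t_dist.isf(alpha, df)
conj_mask = t_min > u_conjunction

# Global null (Friston et al., 1999, 2005): only requires that not all
# contrasts are null; with independent contrasts P(min > u) = P(t > u)^n,
# so the critical value is lower and the test more lenient.
u_global = t_dist.isf(alpha ** (1.0 / n_contrasts), df)
global_mask = t_min > u_global

print(f"critical t (conjunction null): {u_conjunction:.2f}")
print(f"critical t (global null):      {u_global:.2f}")
print(f"voxels surviving: {conj_mask.sum()} vs {global_mask.sum()}")

Because the conjunction-null threshold equals the single-contrast critical value, it is markedly stricter than the global-null threshold, which is why only the former licenses the logical AND interpretation.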

An obvious property of multisensory neural structures is their responsiveness to stimuli obtained from more than one modality. Thus, a straightforward approach to locate brain regions containing such structures is a conjunction analysis on responses to unimodal stimuli of both modalities (Unimodal 1 ∩ Unimodal 2). This approach has been applied in a PET study demonstrating multisensory convergence zones in the left intraparietal sulcus for spatial attention to vision and touch (Macaluso et al., 2000) and in an fMRI study identifying audiovisual integration sites of motion processing in lateral parietal cortex (Lewis et al., 2000). In our study, the intersection A ∩ V revealed significant responses in candidate areas for audiovisual convergence, such as the posterior thalamus extending into the superior colliculus and the right posterior temporo-occipito-parietal junction (see Fig. 2). However, areas presumably involved in the nonsensory components of the task, such as the dorsolateral prefrontal cortex (working memory) and motor cortex, supplementary motor area and cerebellum (motor responses), also showed significant responses during both unimodal sessions. These findings illustrate that the results provided by the simple intersection A ∩ V cannot separate multimodal convergence zones from unspecific activations attributable to nonsensory components of the task. Therefore, results produced by such a conjunction do not necessarily reflect multisensory convergence zones. Furthermore, brain regions responding exclusively to bimodal stimulation or achieving suprathreshold activations by supra-additive responses induced by the simultaneous presence of stimuli of two modalities will be missed, compromising the sensitivity of this approach.

Fig. 2. Intersection of brain activation during unimodal auditory and visual stimulation (A ∩ V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected) and corrected for multiple comparisons at cluster level, k>50, p<0.05 (corrected).

Both the lack of specificity for multimodal integration sites and the impaired sensitivity to detect regions responding exclusively to multimodal stimuli can be overcome by investigating brain areas that show a significantly stronger response to bimodal stimuli than to unimodal stimuli of both modalities. This can be achieved by computing the conjunction [Bimodal – Unimodal 1] ∩ [Bimodal – Unimodal 2]. This analytic strategy has been employed by Grefkes et al. (2002) to investigate brain regions subserving crossmodal transfer of visuotactile information and, in a comparable way, by Calvert et al. (1999) to detect neural structures involved in processing of audiovisual speech. Recently, a more elaborated form of this approach was used by Pourtois et al. (2005) in a PET study on audiovisual integration of emotional information. In this study, the experimental design included two different AV conditions in which subjects were instructed to judge either the information from the visual (AV(judge V)) or the auditory channel (AV(judge A)). The conjunction [AV(judge A) – A(judge A)] ∩ [AV(judge V) – V(judge V)] was computed, which represents a sound way to remove task-related brain responses.

However, it should be noted that any conjunction analysis based on a conjunction null hypothesis (Nichols et al., 2005) is a very conservative strategy which only gives an upper bound for the false positive rate (Friston et al., 2005). While such conjunction analyses remain valid even in statistical worst-case scenarios (Nichols et al., 2005), their conservativeness must be paid for by a loss of sensitivity (Friston et al., 2005). This is especially critical if two differential contrasts that are expected to yield small effects are submitted to such an analysis. Accordingly, in none of the studies that employed this approach (Calvert et al., 1999; Grefkes et al., 2002; Pourtois et al., 2005) were brain activations significant when corrected for multiple comparisons across the whole brain. Therefore, it is crucial to increase the sensitivity of such conjunctions by restricting the search volume to small anatomical regions (small volume correction (SVC); Worsley et al., 1996). Definition of regions of interest in our analysis relied on knowledge inferred from neuroanatomical studies and previous neuroimaging studies and comprised the cortex adjacent to the posterior STS (Jones and Powell, 1970; Seltzer and Pandya, 1978; Beauchamp et al., 2004a, 2004b; van Atteveldt et al., 2004), orbitofrontal cortex (Jones and Powell, 1970; Chavis and Pandya, 1976), insular cortex (Mesulam and Mufson, 1982), claustrum (Pearson et al., 1982; Olson et al., 2002), superior colliculus (Fries, 1984; Calvert et al., 2000), thalamus (Mufson and Mesulam, 1984) and amygdala (Turner et al., 1980; McDonald, 1998; Pitkänen, 2000; Dolan et al., 2001). The conjunction analysis [AV – V] ∩ [AV – A] revealed activation clusters in bilateral posterior STS, right orbitofrontal cortex, bilateral posterior thalamus and right posterior insula/claustrum (Fig. 3). The activation cluster in the left posterior STS was significant (SVC on a 6 mm radius spherical volume of interest centred at x = –50, y = –54, z = 6, a priori coordinates derived from Beauchamp et al., 2004b). This result is in keeping with reports showing stronger responses in the posterior STS cortices during audiovisual presentation of objects (Beauchamp et al., 2004b), letters (van Atteveldt et al., 2004) and speech (Wright et al., 2003; van Atteveldt et al., 2004) than during unimodal presentation of these stimuli. Thus, there is converging evidence implicating posterior STS cortices in integration of audiovisual stimuli irrespective of the type of information conveyed by these stimuli. Furthermore, a recent PET study (Pourtois et al., 2005) demonstrated increased cerebral blood flow in the left middle temporal gyrus during audiovisual presentation of emotional information as compared to isolated presentation of emotional faces or voices. However, it has to be noted that the activation cluster in our study was localized more posteriorly and superiorly than the cluster described by Pourtois et al. (2005). Differences in the imaging modality and task instructions (gender differentiation in the PET study of Pourtois et al., 2005 as compared to rating of emotional information in the fMRI study by Ethofer et al., 2006) might constitute an explanation for the different localization of the activation clusters within the left temporal lobe in the two studies.

Fig. 3. Conjunction analysis [AV – A] ∩ [AV – V] showing activations in (a) bilateral posterior superior temporal sulcus (x = –54, y = –51, z = 12, Z = 2.90, k = 51 and x = 51, y = –42, z = 12, Z = 2.79, k = 119 for the left and right STS, respectively), right orbitofrontal cortex (x = 39, y = 24, z = –12, Z = 2.69, k = 64), (b) bilateral posterior thalamus (x = –30, y = –30, z = 0, Z = 2.82, k = 120 and x = 12, y = –24, z = 9, Z = 2.88, k = 76 for the left and right thalamic cluster, respectively) and (c) right posterior insula/claustrum (x = 36, y = –3, z = 6, Z = 2.45, k = 124). Brain activations are thresholded at a height threshold of p<0.05 (uncorrected). (d) Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal (magenta) stimuli in the left posterior STS.
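The small volume correction used above restricts statistical inference to an a priori defined region, here approximated by a 6 mm sphere around the coordinate reported by Beauchamp et al. (2004b). The sketch below illustrates the principle with a Bonferroni correction over the voxels inside the sphere; the affine transform, voxel size and the correction itself are simplified stand-ins for the random-field-based procedure implemented in SPM (Worsley et al., 1996).

# Sketch of restricting inference to a small a priori volume: a 6 mm sphere
# around the posterior STS coordinate from Beauchamp et al. (2004b).
import numpy as np
from scipy.stats import t as t_dist

voxel_size = 3.0                                  # mm, isotropic (assumed)
affine = np.diag([voxel_size, voxel_size, voxel_size, 1.0])
affine[:3, 3] = [-60.0, -90.0, -60.0]             # assumed origin offset

t_map = np.random.standard_t(11, size=(40, 48, 40))   # stand-in group t-map

# Millimetre coordinates of every voxel centre
i, j, k = np.indices(t_map.shape)
vox = np.stack([i, j, k, np.ones_like(i)], axis=-1)
mm = vox @ affine.T

centre, radius = np.array([-50.0, -54.0, 6.0]), 6.0
sphere = np.linalg.norm(mm[..., :3] - centre, axis=-1) <= radius
print("voxels in sphere:", int(sphere.sum()))

# Crude small-volume correction: Bonferroni over the voxels inside the sphere
alpha, df = 0.05, 11
u_svc = t_dist.isf(alpha / sphere.sum(), df)
peak = t_map[sphere].max()
print(f"peak t in sphere {peak:.2f} vs small-volume threshold {u_svc:.2f}")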

Another promising approach employed by Pourtois et al. (2005) to investigate brain regions subserving integration of audiovisual emotional information is to define separate conjunction analyses for specific emotions, such as [AV(happy) – A(happy)] ∩ [AV(happy) – V(happy)] or [AV(fear) – A(fear)] ∩ [AV(fear) – V(fear)]. The results of this analysis enable inference on the localization of brain regions showing stronger responses if a certain emotion is signalled via two modalities as compared to unimodal presentation of this emotion in either modality. We submitted the fMRI data set reported here to an analogous conjunction analysis. In this analysis, facial expressions were considered to express happiness or fear if they showed at least 50% of the respective emotion. The conjunction [AV(happy) – A(happy)] ∩ [AV(happy) – V(happy)] revealed stronger responses for bimodal presentation of happiness as compared to unimodal presentation of either happy voices or happy facial expressions in the right posterior insula/claustrum (x = 39, y = –6, z = –3, Z = 3.66, k = 91, p<0.05 SVC corrected for the right insula). The only brain structure included in our regions of interest showing enhanced responses to audiovisual fear as compared to unimodally presented fear [AV(fear) – A(fear)] ∩ [AV(fear) – V(fear)] was the right amygdala (x = 27, y = –9, z = –24, Z = 2.00, k = 29). However, this activation failed to reach significance within the search volume comprising the amygdala. In conclusion, the results from conjunction analyses of our experiment suggest that neocortical areas in the vicinity of the STS might be more generally concerned with integration of audiovisual signals, while phylogenetically older structures, such as the posterior insula or the amygdala, show additive responses if certain emotions are expressed in a congruent way via different sensory channels.

However, one limitation of all analyses relying on conjunctions of the form [AV – V] ∩ [AV – A] is that they have the potential to detect brain regions in which responses to information from auditory and visual channels sum up in a linear way and might therefore simply reflect areas in which neurons responsive to unimodal auditory and unimodal visual information coexist, without the need for multimodal integration in these areas (Calvert, 2001; Calvert and Thesen, 2004).

Interaction analyses

Calvert and Thesen (2004) suggested that activations of multisensory integration sites should differ from the arithmetic sum of the respective activations to unimodal stimuli: if the response to a bimodal stimulus exceeds the sum of the unimodal responses [Bimodal > Unimodal 1 + Unimodal 2], this is defined as a positive interaction, while if the summed responses are greater than the bimodal response [Bimodal < Unimodal 1 + Unimodal 2], this is defined as a negative interaction effect (Calvert et al., 2000, 2001). Usually, the most efficient way to investigate interactions between two factors is a 2 × 2 factorial design. Theoretically, such a 2 × 2 factorial design for investigating interactions between two sensory modalities would include one bimodal and two unimodal conditions, in which the subject judges some aspect of the presented stimuli, and a control condition which contains all components of the other conditions (e.g. working memory, judgement and motor responses), but no sensory stimulation. However, for all paradigms including a behavioural task, it is practically impossible to implement such a control condition, since the subject cannot judge a specific aspect (e.g. gender or conveyed emotion) of a stimulus if it is not presented. Therefore, all imaging studies investigating interactions between the modalities omitted this control condition and simply compared haemodynamic responses obtained during bimodal stimulation to the sum of the responses of the unimodal conditions: Bimodal – [Unimodal 1 + Unimodal 2]. However, this omission of the control condition has serious consequences for the interpretation of both positive and negative interactions. For example, brain regions involved in nonsensory components of the task (e.g. working memory and

motor responses) showing similar positive responses in both the unimodal and the bimodal trials will produce a negative interaction effect. Furthermore, brain areas which deactivate in a similar way in all three conditions will produce a positive interaction effect. Positive and negative interactions, as computed by AV – (A+V), are shown in Fig. 4a in red and green, respectively. At first glance, the finding of a positive interaction in the right inferior parietal cortex and the right orbitofrontal cortex is an interesting result. Inspection of the event-related responses during unimodal auditory, unimodal visual and bimodal trials, however, reveals that in both these regions the haemodynamic response decreases in all three conditions. The finding of unspecific deactivations to stimulus presentation in the right inferior parietal cortex is in agreement with the view that this area belongs to the resting state network and is tonically active in the baseline state (Raichle et al., 2003; Fox et al., 2005). Thus, the positive response in this region as calculated by the interaction AV – (A+V) is caused by the fact that the added deactivations during unimodal trials exceed the deactivation during bimodal trials (see Fig. 4b). A more complex behaviour of the haemodynamic response was found in the orbitofrontal cortex, showing a decrease of the blood oxygen level dependent (BOLD) response with varying delay followed by a positive response, further complicating the interpretation of the positive interaction computed by AV – (A+V). Negative interactions were found in dorsolateral prefrontal areas, motor cortex and cerebellum. Event-related responses in the right dorsolateral prefrontal cortex (see Fig. 4d), however, show that the negative interaction in this region is due to very similar responses in all three conditions. The vulnerability of the interaction AV – (A+V) to unspecific activations attributable to the behavioural task, resulting in negative interactions, and to unspecific deactivations of resting state network components, producing positive interactions, demands caution in the application of this technique. Therefore, we suggest that interpretation of both positive and negative interactions calculated by this approach should be based on inspection of time series.

Fig. 4. (a) Positive (red) and negative (green) interactions as calculated by AV – (A+V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected). Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal (magenta) stimuli in (b) the right lateral inferior parietal cortex (MNI coordinates: x = 57, y = –66, z = 12, Z = 4.81, k = 36), (c) the right orbitofrontal cortex (MNI coordinates: x = 39, y = 21, z = –21, Z = 5.23, k = 46) and (d) the right dorsolateral prefrontal cortex (MNI coordinates: x = 45, y = 6, z = 21, Z = 4.71, k = 229).
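The two failure modes described above can be reproduced with a few invented response values, as the following sketch shows; the numbers carry no empirical meaning and merely illustrate why the contrast AV – (A+V) is ambiguous without a stimulus-free control condition.

# Tiny numeric illustration of why AV - (A + V) is vulnerable to unspecific
# (de)activations when no stimulus-free control condition is available.
def interaction(av, a, v):
    """Interaction contrast without a control condition: AV - (A + V)."""
    return av - (a + v)

# Resting-state-like region: the same deactivation in all three conditions
# yields a spurious *positive* interaction.
print(interaction(av=-1.0, a=-1.0, v=-1.0))   # -> +1.0

# Task-related region (e.g. motor cortex): the same activation in all three
# conditions yields a spurious *negative* interaction.
print(interaction(av=2.0, a=2.0, v=2.0))      # -> -2.0

# Genuinely supra-additive multisensory region: positive interaction driven by
# a bimodal response exceeding the sum of the unimodal responses.
print(interaction(av=3.0, a=1.0, v=1.0))      # -> +1.0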

Calvert et al. (2001) suggested that electrophysiological criteria for the investigation of multimodal integration should be applied to the BOLD effect. According to these criteria, cells subserving multimodal integration show responses to congruent information obtained via several modalities that exceed the sum of responses to the respective unimodal stimuli (supra-additivity: Bimodal(congruent) > Unimodal 1 + Unimodal 2). Furthermore, conflicting multimodal information results in response depression, in which the response to incongruent bimodal information is smaller than the stronger of the two unimodal responses (response depression: Bimodal(incongruent) < Maximum(Unimodal 1, Unimodal 2)). Calvert et al. (2001) demonstrated that BOLD responses within the superior colliculi fulfil these criteria, showing supra-additive responses to audiovisual nonspeech stimuli if they are presented in temporal synchrony and corresponding response depressions if the audiovisual stimuli are presented in an asynchronous manner.

We investigated whether one of our regions of interest shows a comparable behaviour if congruence of audiovisual information is defined by the emotional content conveyed via affective prosody and facial expressions. To this end, we compared responses to audiovisual trials with congruent emotional information (i.e. showing at least 50% of the emotion expressed via emotional prosody in the facial expression) to the sum of haemodynamic responses to unimodal visual and auditory trials (AV(congruent) – (A+V), see Fig. 5a). Obviously, this analysis technique is burdened with the same drawbacks as the simple interaction AV – (A+V) and, not surprisingly, positive interactions driven by similar decreases of the BOLD response in all three conditions were found in the right lateral inferior parietal and orbitofrontal cortex. In addition, significant supra-additive responses were also found in the right posterior insula/claustrum (see Fig. 5b). Inspection of event-related responses demonstrates that the interaction found in this region is of a completely different nature than those found in the parietal and orbitofrontal cortex, showing robust activation to congruent audiovisual trials and slightly negative responses to unimodal stimulation in either modality (see Fig. 5b). To investigate whether the responses of this region fulfil the criteria of response depression to conflicting emotional content as conveyed by voice and face, we plotted event-related responses to bimodal trials with incongruent emotional information (see Fig. 5b). No evidence for a depression of responses below the level of unimodal responses was found. Instead, the results of this analysis suggest that the right posterior insula/claustrum also responds, albeit to a lesser extent, to audiovisual information conveying conflicting emotional information.

The observation that this region shows its strongest haemodynamic responses if auditory and visual information are simultaneously available is concordant with results obtained from single-cell recordings of the posterior insular cortex (Loe and Benevento, 1969). The finding that activation in this region is stronger during congruent than during conflicting audiovisual emotional information is in keeping with observations from an fMRI experiment on synchronized and desynchronized audiovisual speech (Olson et al., 2002). In this experiment, audiovisual stimuli presented in temporal synchrony resulted in stronger haemodynamic responses in the left claustrum than asynchronously presented stimuli. Olson et al. (2002) reasoned that audiovisual integration might be better explained by a "communication relay" model, in which subcortical areas, such as the claustrum, receive their information directly from unimodal cortices, than by a "site-specific" model assuming integration in multisensory brain regions, such as the posterior STS. We argue that the increased haemodynamic responses found for temporally congruent information in the study of Olson et al. (2002) and for emotionally congruent information in our study might reflect a more general role of the claustrum in determining whether multisensory information matches or not.

Fig. 5. (a) Positive interaction in the right posterior insula/claustrum (MNI coordinates: x = 39, y = –6, z = –3, Z = 4.05, k = 14, p<0.05, SVC corrected) as calculated by AV(congruent) – (A+V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected). (b) Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal trials with congruent emotional information (magenta) and incongruent information (cyan) have been plotted.
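Expressed on condition-level response estimates, the two electrophysiologically motivated criteria reduce to simple inequalities. The sketch below applies them to invented values chosen to mimic the response pattern observed in the right posterior insula/claustrum; it illustrates the criteria, not the actual parameter estimates of the study.

# Sketch of applying the criteria of Calvert et al. (2001) to condition-level
# response estimates of a single region (values are invented).
def supra_additive(av_congruent, a, v):
    """Supra-additivity: Bimodal(congruent) > Unimodal 1 + Unimodal 2."""
    return av_congruent > a + v

def response_depression(av_incongruent, a, v):
    """Response depression: Bimodal(incongruent) < max(Unimodal 1, Unimodal 2)."""
    return av_incongruent < max(a, v)

# Illustrative estimates mimicking the insula/claustrum pattern described in
# the text: robust bimodal response, weak unimodal responses, and a reduced
# but not depressed response to incongruent pairs.
a, v = -0.1, -0.2
print(supra_additive(av_congruent=1.2, a=a, v=v))        # True
print(response_depression(av_incongruent=0.6, a=a, v=v)) # False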

A possible problem with the application of neurophysiological criteria to analyses of the BOLD signal arises from the fact that the signal acquired in fMRI is caused by haemodynamic phenomena. While supra-additivity is a well-known electrophysiological characteristic of multimodal cells (Stein and Meredith, 1993), there is reason to doubt that supra-additive neuronal responses must necessarily translate into supra-additive BOLD responses. There is compelling evidence that the BOLD response to two stimuli in temporal proximity is overpredicted by simply adding the responses to the two stimuli presented in isolation (Friston et al., 1998; Mechelli et al., 2001). This phenomenon has been named "haemodynamic refractoriness" and is specific to fMRI, since the major part of this nonlinear behaviour arises from the transformation of changes in regional cerebral blood flow into the BOLD response (Friston et al., 1998; Mechelli et al., 2001). The fact that a preceding or simultaneously presented stimulus can attenuate the BOLD response to a second stimulus might compromise the sensitivity of an fMRI data analysis in which responses during multimodal integration are expected to exceed the linear sum of the responses to unimodal stimuli. It might therefore be useful to estimate the neuronal response from the BOLD response via a plausible biophysical model (Friston et al., 2000; Gitelman et al., 2003) and search for regions in which the neuronal response exhibits supra-additive effects.

While construction of a full 2 × 2 factorial design for investigation of interactions between sensory modalities is compromised by the lack of an appropriate control condition, investigating interactions between emotions expressed via these modalities is not burdened with this problem. Furthermore, no unimodal trials are required in factorial analyses designed to investigate interactions between emotional information expressed via two different modalities. Instead, such designs include only bimodal trials, with two conditions in which emotional information is presented in a congruent way (e.g. happy face-happy voice (hH) and fearful face-fearful voice (fF)) and two conditions in which conflicting information is expressed via the two modalities (e.g. happy face-fearful voice (fH) and fearful face-happy voice (hF)). The interaction is then calculated by (hH – fH) – (hF – fF). For interpretation of results calculated by this interaction term, it is worth noting that the interaction of emotional information is mathematically equivalent to contrasting congruent with incongruent conditions: (hH + fF) – (fH + hF). This factorial design was used by Dolan et al. (2001) to investigate interactions between visually and auditorily presented fear and happiness. In this study, participants were instructed to categorize the presented facial expression as either fearful or happy, while ignoring the simultaneously presented emotional voice. Behaviourally, the authors found a facilitation of the emotion categorization, as indicated by shortened response latencies for congruent as compared to incongruent audiovisual trials. On a cerebral level, a significantly stronger activation of the left basolateral amygdala as calculated by (hH – fH) – (hF – fF) was found. Inspection of parameter estimates of the four conditions revealed that this interaction was mainly driven by an augmentation of haemodynamic responses during the fear congruency condition (fF). On the basis of their results, Dolan et al. (2001) concluded that it is the left amygdala that subserves crossmodal integration in fear processing. This interpretation is in line with observations from neuropsychological studies indicating that lesions of the amygdala can impair recognition of both fearful faces (Adolphs et al., 1994) and fearful voices (Scott et al., 1997), as well as neuroimaging studies of healthy subjects demonstrating enhanced activation to fear signalled via the face (Breiter et al., 1996; Morris et al., 1996) and the voice (Phillips et al., 1997). So far, little is known about the neuronal substrates underlying integration of audiovisual emotional information other than fear. The factorial design employed by Dolan et al. (2001) has the potential to clarify whether audiovisual integration of other basic emotions also occurs via the amygdala or is mediated by different neuroanatomical structures. However, we feel that the interpretability of the results provided by such factorial designs could be improved by using neutral facial expressions (N) and intonations (n) in combination with facial expressions (E) and intonations (e) of one emotion and then calculating the interaction accordingly (e.g. (eE – nE) – (eN – nN)).
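The algebraic equivalence between the emotion-by-modality interaction and the congruent-versus-incongruent contrast noted above can be checked directly, as in the following sketch with arbitrary condition estimates.

# Sketch verifying that the interaction used by Dolan et al. (2001) equals the
# contrast of congruent versus incongruent trials (values are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
hH, fH, hF, fF = rng.normal(size=4)    # happy/fearful voice x face estimates

interaction = (hH - fH) - (hF - fF)
congruent_minus_incongruent = (hH + fF) - (fH + hF)
assert np.isclose(interaction, congruent_minus_incongruent)

# Equivalently, as a contrast vector over the condition order [hH, fH, hF, fF]
contrast = np.array([1, -1, -1, 1])
print(contrast @ np.array([hH, fH, hF, fF]), interaction)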

Correlation analyses between brain activation and crossmodal behavioural effects

On a behavioural level, crossmodal integration results in shortened response latencies to congruent bimodal information (Miller, 1982; Schröger and Widmann, 1998; de Gelder and Vroomen, 2000). Furthermore, judgement of sensory information from one modality can be influenced by information obtained from another modality (McGurk and MacDonald, 1976). To investigate which brain regions mediate the McGurk illusion, Jones and Callan (2003) employed the McGurk paradigm in an fMRI experiment and correlated haemodynamic responses with the influence of visual cues on judgement of auditory information. Activity in the occipitotemporal junction was found to correlate with the strength of the McGurk illusion, suggesting that modulation of responses within this region might constitute the neural substrate of the McGurk effect (Jones and Callan, 2003). Crossmodal biases also occur in perception of emotional information (Massaro and Egan, 1996; de Gelder and Vroomen, 2000), and it has been demonstrated that fearful or neutral faces are perceived as more fearful when accompanied by a fearful voice (see Fig. 6a; Ethofer et al., 2006). To examine which brain regions mediate this shift in judgement of facial affect, we correlated the difference in brain responses to facial expressions in the presence and absence of a fearful voice with the difference in the subjects' valence ratings of the facial expressions in both conditions. A significant correlation was found in the left basolateral amygdala extending into the periamygdaloid cortex (see Fig. 6b). This finding indicates that cognitive evaluation of emotional information signalling threat or danger is modulated by amygdalar responses and is in agreement with the view that the amygdala has a key integrative role in processing of emotional content, particularly when fear is expressed across sensory channels (Dolan et al., 2001). Response-related correlation analyses between brain activation and crossmodal behavioural effects represent a useful approach to model systems associated with the behavioural outcome of multisensory integration.

Fig. 6. (a) Valence ratings of facial expressions ranging from 0% to 100% fear in presence (red) and absence (blue) of a fearfully spoken sentence. (b) Correlation analysis between crossmodal impact of fearful voices on judgement of faces and BOLD response revealed a cluster within the left amygdala (MNI coordinates: x = –24, y = –6, z = –24, Z = 3.84, k = 42, p<0.05, SVC). (c) Event-related responses of the left basolateral amygdala for trials in which faces were judged as being more fearful when presented together with a fearful voice (red) and trials where no such shift in interpretation occurred (blue). Modified from Ethofer et al. (2006).
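In practice, such a correlation analysis amounts to relating, across subjects, the crossmodal shift in behaviour to the corresponding difference in regional BOLD responses. The following sketch illustrates this second-level computation with simulated values for twelve subjects; the numbers are not those of the study.

# Sketch of the across-subject correlation between the crossmodal behavioural
# effect (shift in valence ratings of faces when a fearful voice is present)
# and the corresponding difference in BOLD responses for one region.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_subjects = 12
rating_shift = rng.normal(0.8, 0.4, n_subjects)             # SAM units (assumed)
bold_difference = 0.5 * rating_shift + rng.normal(0, 0.2, n_subjects)

r, p = pearsonr(rating_shift, bold_difference)
print(f"r = {r:.2f}, p = {p:.3f}")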

Connectivity analyses

All analysis techniques discussed so far are concerned with the localization of brain areas that mediate a certain cognitive process of interest. Recent developments (Friston et al., 2003; Gitelman et al., 2003) in modelling effective connectivity between brain regions (i.e. the influence one neural system exerts over another; Friston et al., 1997) offer the opportunity to investigate neural interactions between sensory systems. In an fMRI study on the neural substrates of speaker recognition, von Kriegstein et al. (2005) found that familiar voices activate the fusiform face area (FFA; Kanwisher et al., 1997). A psychophysiological interaction (PPI) analysis (Friston et al., 1997) with FFA activity as the physiological and familiarity of the voices as the psychological factor revealed that this crossmodal activation of face-sensitive areas by an auditory stimulus was driven by activity of voice-sensitive areas in the middle part of the STS. On the basis of these results, von Kriegstein et al. (2005) suggested that assessment of person familiarity does not necessarily engage a "supramodal cortical substrate" but can be the result of a "supramodal process" constituting an enhanced functional coupling of face- and voice-sensitive cortices.

In our study, significantly stronger activations were found in the right FFA (MNI coordinates: x = 24, y = –69, z = –15, Z = 3.91, k = 46) during judgement of facial expressions in the presence of fearful as compared to happy voices (Ethofer et al., 2006). To elucidate which cortical areas mediate this crossmodal effect of a fearful voice on processing within the fusiform face area, a PPI analysis with activity of the right FFA as the physiological and emotion expressed via affective prosody (fear or happiness) as the psychological factor was carried out. Increased effective connectivity between the right FFA and the left basolateral amygdala/periamygdaloid cortex (MNI coordinates: x = –18, y = –12, z = –30, Z = 2.68, k = 5, p<0.05 small volume corrected for the amygdalar cluster in which activity was correlated with behavioural responses, see above) was found during rating of facial expressions in the presence of a fearful as compared to a happy voice. Even at low thresholds (p<0.05, uncorrected), no evidence for modulation of responses in the right FFA by activity in STS regions was found, suggesting that the crossmodal effect of emotional auditory information on FFA responses is not mediated by direct coupling between voice- and face-sensitive cortices, as described previously for speaker identification (von Kriegstein et al., 2005), but rather via supramodal relay areas, such as the amygdala. The amygdala is anatomically well positioned to provide such a supramodal relay function (Murray and Mishkin, 1985) since it is bidirectionally connected with visual and auditory higher-order cortices (Pitkänen, 2000). Furthermore, augmented effective connectivity between amygdala and fusiform cortex in the context of a fearful voice is in agreement with data obtained from lesion (Adolphs et al., 1994; Scott et al., 1997) and neuroimaging studies (Breiter et al., 1996; Morris et al., 1996; Phillips et al., 1997) implicating the amygdala in fear processing, and results from a PET experiment suggest that the amygdala exerts top-down control on neural activity in extrastriate cortex (Morris et al., 1998). Although anatomically segregated, voice- and face-processing modules have to interact to form a unified percept of the emotional information expressed via different sensory channels. Analyses of effective connectivity have the potential to investigate the interaction between these modules and could elucidate whether integration is achieved via supramodal nodes or by direct coupling of face- and voice-sensitive cortices.
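At its core, a PPI analysis adds to the model a regressor formed from the product of the seed time course and the psychological variable and tests its effect in every other voxel. The sketch below illustrates this with simulated time series; note that SPM's implementation first deconvolves the seed signal to an estimated neuronal time course (Gitelman et al., 2003), a step omitted here for brevity.

# Sketch of a psychophysiological interaction (PPI) regressor: the product of
# the seed (right FFA) time course and the psychological factor (fearful vs.
# happy prosody). All time series here are simulated.
import numpy as np

rng = np.random.default_rng(2)
n_scans = 200
ffa_signal = rng.normal(size=n_scans)                  # seed time course (assumed)
prosody = np.where(np.arange(n_scans) % 40 < 20, 1.0, -1.0)  # fear = +1, happy = -1

ppi = (ffa_signal - ffa_signal.mean()) * prosody       # interaction regressor

# GLM for a target voxel (e.g. left amygdala): main effects plus PPI term.
X = np.column_stack([ffa_signal, prosody, ppi, np.ones(n_scans)])
target = 0.4 * ppi + rng.normal(size=n_scans)          # simulated target voxel
betas = np.linalg.lstsq(X, target, rcond=None)[0]
print(dict(zip(["seed", "psych", "ppi", "constant"], np.round(betas, 2))))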

Conclusion

Integration of information from different sensory channels is a complex phenomenon and recruits several cerebral structures. Application of different types of analyses aimed at identifying multisensory integration sites to the fMRI data set presented here revealed the cortex in the left posterior STS, the right posterior insula/claustrum and the left amygdala as being implicated in audiovisual integration.

The left posterior STS responded significantly more strongly to bimodal stimuli than to isolated presentation of either faces or voices, as determined by the conjunction [AV – V] ∩ [AV – A]. Notably, the left posterior STS did not show supra-additive BOLD responses as determined by the comparison of bimodal responses to the sum of unimodal responses (AV – (A+V)). This lack of supra-additivity in posterior STS regions in our study is in agreement with results obtained in previous fMRI experiments on audiovisual integration (Beauchamp et al., 2004a, b; van Atteveldt et al., 2004). On the basis of the observation that the BOLD response in posterior STS to audiovisual stimuli does not exceed the sum of responses to the respective unimodal stimuli, as would be expected from single-cell recordings (Bruce et al., 1981; Hikosaka et al., 1988; Barraclough et al., 2005), Beauchamp (2005) challenged the concept of supra-additivity as an appropriate criterion for the definition of multisensory regions in neuroimaging. The "haemodynamic refractoriness" (Friston et al., 1998) of BOLD responses to temporally proximate stimuli might constitute a possible explanation for this discrepancy between fMRI and electrophysiological data.

The right posterior insula/claustrum responded more strongly to congruent audiovisual emotional information than to unimodal information of either modality or to incongruent audiovisual information. This stronger responsiveness to congruent than to conflicting audiovisual information is in line with previous reports on the claustrum showing stronger responses to synchronized than to desynchronized audiovisual speech (Olson et al., 2002). These converging findings obtained across domains (the temporal domain in the study of Olson et al. (2002) and the emotional domain in our study) suggest a more general role of the claustrum in processes that determine whether information gathered across different channels matches or not. Future studies relying on analyses of effective connectivity should address the question whether the claustrum receives its information directly from unimodal cortices, as suggested by the "communication relay" model, or via multimodal cortices in the posterior STS.

Activation in the left amygdala was found to correlate with changes in rating of emotional facial expressions induced by a simultaneously presented fearful voice (Ethofer et al., 2006). Correlation of amygdalar activity with behavioural effects suggests that the amygdala modulates cognitive judgements. This finding is consistent with previous suggestions implicating the amygdala in integration of emotional information obtained from different modalities, particularly if this information signals threat or danger (Dolan et al., 2001). The right fusiform cortex showed stronger responses when facial expressions were rated in the presence of a fearful as compared to a happy voice (Ethofer et al., 2006). A psychophysiological interaction analysis revealed enhanced effective connectivity between the left amygdala and the right fusiform cortex, providing a possible neural basis for the observed behavioural effects.

The aim of this chapter was to review the different methodological approaches to model multisensory integration in neuroimaging, including conjunction analyses, interaction analyses, correlation analyses between brain responses and behavioural effects, and connectivity analyses. None of these approaches can be considered the optimal method to clarify which brain structures participate in multisensory integration. Rather, we would like to emphasize that each of these analyses elucidates different aspects of the interplay of brain regions in integrational processes and thus provides complementary information.

Abbreviations

A – auditory
AV – audiovisual
BOLD – blood oxygen level dependent
eE – bimodal trial with emotional voice and emotional face
eN – bimodal trial with emotional voice and neutral face
ERP – event-related potentials
fF – bimodal trial with fearful voice and fearful face
FFA – fusiform face area
fH – bimodal trial with fearful voice and happy face
fMRI – functional magnetic resonance imaging
H0 – null hypothesis
hF – bimodal trial with happy voice and fearful face
hH – bimodal trial with happy voice and happy face
MNI – Montreal Neurological Institute
nE – bimodal trial with neutral voice and emotional face
nN – bimodal trial with neutral voice and neutral face
PET – positron emission tomography
PPI – psychophysiological interaction
SAM – self-assessment manikin
SPM – statistical parametric mapping
STS – superior temporal sulcus
SVC – small volume correction
V – visual

Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft (SFB 550) and by the Junior Science Programme of the Heidelberg Academy of Sciences and Humanities.

References

Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994)

Impaired recognition of emotion in facial expressions fol-

lowing bilateral damage to the human amygdala. Nature,

372: 669–672.

Barraclough, N.F., Xiao, D., Baker, C.I., Oram, M.W. and

Perret, D.I. (2005) Integration of visual and auditory infor-

mation by superior temporal sulcus neurons responsive to the

sight of actions. J. Cogn. Neurosci., 17: 377–391.

Beauchamp, M.S. (2005) Statistical criteria in FMRI studies of

multisensory integration. Neuroinformatics, 3: 93–113.

Beauchamp, M.S., Argall, B.D., Bodurka, J., Duyn, J.H. and

Martin, A. (2004a) Unravelling multisensory integration:

patchy organization within human STS multisensory cortex.

Nat. Neurosci., 7: 1190–1192.

Beauchamp, M.S., Lee, K.E., Argall, B.D. and Martin, A.

(2004b) Integration of auditory and visual information about

objects in superior temporal sulcus. Neuron, 41: 809–823.

Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: The

self-assessment manikin and the semantic differential. J. Be-

hav. Ther. Exp. Psychiatry, 225: 49–59.

Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Ra-

uch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and

Rosen, B.R. (1996) Response and habituation of the human

amygdala during visual processing of facial expression. Neu-

ron, 2: 875–887.

Bruce, C.J., Desimone, R. and Gross, C.G. (1981) Visual prop-

erties of neurons in a polysensory area in superior temporal

sulcus of the macaque. J. Neurophysiol., 46: 369–384.

Calvert, G.A. (2001) Crossmodal processing in the human

brain: Insights from functional neuroimaging studies. Cereb.

Cortex, 11: 1110–1123.

Calvert, G.A., Brammer, M.J., Bullmore, E.T., Campbell, R.,

Iversen, S.D. and David, S.A. (1999) Response amplification

in sensory-specific cortices during crossmodal binding. Ne-

uroReport, 10: 2619–2623.

Calvert, G.A., Campbell, R. and Brammer, M. (2000) Evidence

from functional magnetic resonance imaging of crossmodal

binding in human heteromodal cortex. Curr. Biol., 10: 649–657.

Calvert, G.A., Hansen, P.C., Iversen, S.D. and Brammer, M.J.

(2001) Detection of audio-visual integration in humans by

application of electrophysiological criteria to the BOLD

effect. NeuroImage, 14: 427–438.

Calvert, G.A. and Thesen, T. (2004) Multisensory integration:

methodological approaches and emerging principles in the

human brain. J. Physiol. Paris, 98: 191–205.

Chavis, D.A. and Pandya, D.N. (1976) Further observations on

corticofrontal connections in the rhesus monkey. Brain Res.,

117: 369–386.

Collins, D.L., Neelin, P., Peters, T.M. and Evans, A.C. (1994)

Automatic 3D intersubject registration of MR volumetric

data in standardized Talairach space. J. Comput. Assist.

Tomogr., 18: 192–205.

Damasio, A.R. (1989) Time-locke multiregional retroactiva-

tion: a systems-level proposal for the neural substrates of

recall and recognition. Cognition, 33: 25–62.

de Gelder, B., Bocker, K.B.E., Tuomainen, J., Hensen, M. and Vroomen, J. (1999) The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses. Neurosci. Lett., 260: 133–136.
de Gelder, B. and Vroomen, J. (2000) The perception of emotion by ear and eye. Cognit. Emotion, 14: 289–312.
Dolan, R.J., Morris, J. and de Gelder, B. (2001) Crossmodal binding of fear in voice and face. Proc. Natl. Acad. Sci., 98: 10006–10010.
Driver, J. and Spence, C. (2000) Multisensory perception: beyond modularity and convergence. Curr. Biol., 10: 731–735.
Ekman, P. and Friesen, W.V. (1976) Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., Reiterer, S., Grodd, W. and Wildgruber, D. (2006) Impact of voice on emotional judgment of faces: an event-related fMRI study. Hum. Brain Mapp., in press. DOI: 10.1002/hbm.20212.
Fallon, J.H., Benevento, L.A. and Loe, P.R. (1978) Frequency-dependent inhibition to tones in neurons of cat insular cortex (AIV). Brain Res., 779: 314–319.
Fox, M.D., Snyder, A.Z., Vincent, J.L., Corbetta, M., Van Essen, D.C. and Raichle, M.E. (2005) The human brain is intrinsically organized into dynamic anticorrelated functional networks. Proc. Natl. Acad. Sci., 102: 9673–9678.
Fries, W. (1984) Cortical projections to the superior colliculus in the macaque monkey: a retrograde study using horseradish peroxidase. J. Comp. Neurol., 230: 55–76.
Friston, K.J., Buechel, C., Fink, G.R., Morris, J., Rolls, E. and Dolan, R.J. (1997) Psychophysiological and modulatory interactions in neuroimaging. NeuroImage, 6: 218–229.
Friston, K.J., Harrison, L. and Penny, W. (2003) Dynamic causal modelling. NeuroImage, 19: 1273–1302.
Friston, K.J., Holmes, A.P., Price, C.J., Buechel, C. and Worsley, K.J. (1999) Multisubject fMRI studies and conjunction analyses. NeuroImage, 10: 385–396.
Friston, K.J., Josephs, O., Rees, G. and Turner, R. (1998) Nonlinear event-related responses in fMRI. Magn. Reson. Med., 39: 41–52.
Friston, K.J., Mechelli, A., Turner, R. and Price, C.J. (2000) Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics. NeuroImage, 12: 466–477.
Friston, K.J., Penny, W. and Glaser, D.E. (2005) Conjunction revisited. NeuroImage, 25: 661–667.
Gitelman, D.R., Penny, W.D., Ashburner, J. and Friston, K.J. (2003) Modelling regional and psychophysiologic interactions in fMRI: the importance of hemodynamic deconvolution. NeuroImage, 19: 200–207.
Gordon, B.G. (1973) Receptive fields in the deep layers of the cat superior colliculus. J. Neurophysiol., 36: 157–178.
Goldman-Rakic, P. (1995) Architecture of the prefrontal cortex and the central executive. In: Grafman, J., Holyoak, K. and Boller, F. (Eds.), Structure and Functions of the Human Prefrontal Cortex. The New York Academy of Sciences, NY, USA, pp. 71–83.
Grefkes, C., Weiss, P.H., Zilles, K. and Fink, G.R. (2002) Crossmodal processing of object features in human anterior intraparietal cortex: an fMRI study implies equivalencies between humans and monkeys. Neuron, 35: 173–184.
Hikosaka, K., Iwai, E., Saito, H. and Tanaka, K. (1988) Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. J. Neurophysiol., 60: 1615–1637.
Jones, J.A. and Callan, D.E. (2003) Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect. NeuroReport, 14: 1129–1133.
Jones, E.G. and Powell, T.P.S. (1970) An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain, 93: 793–820.
Kanwisher, N., McDermott, J. and Chun, M.M. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17: 4302–4311.
Lewis, J.W., Beauchamp, M.S. and DeYoe, E.A. (2000) A comparison of visual and auditory motion processing in human cerebral cortex. Cereb. Cortex, 10: 873–888.
Loe, P.R. and Benevento, L.A. (1969) Auditory–visual interaction in single units in the orbito-insular cortex of the cat. Electroencephalogr. Clin. Neurophysiol., 26: 395–398.
Macaluso, E., Frith, C.D. and Driver, J. (2000) Selective spatial attention in vision and touch: unimodal and multimodal mechanisms revealed by PET. J. Neurophysiol., 83: 3062–3075.
Massaro, D.W. and Egan, J.W. (1996) Perceptual recognition of facial affect: cross-cultural comparisons. Mem. Cogn., 24: 812–822.
McDonald, A.J. (1998) Cortical pathways to the mammalian amygdala. Prog. Neurobiol., 55: 257–332.
McGurk, H. and MacDonald, J. (1976) Hearing lips and seeing voices. Nature, 264: 746–748.
Mechelli, A., Price, C.J. and Friston, K.J. (2001) Nonlinear coupling between evoked rCBF and BOLD signals: a simulation study on hemodynamic responses. NeuroImage, 14: 862–872.
Meredith, M.A. and Stein, B.E. (1983) Interactions among converging sensory inputs in the superior colliculus. Science, 221: 389–391.
Mesulam, M.M. (1998) From sensation to cognition. Brain, 121: 1013–1052.
Mesulam, M.M. and Mufson, E.J. (1982) Insula of the old world monkey. III: efferent cortical output and comments on function. J. Comp. Neurol., 212: 38–52.
Miller, J.O. (1982) Divided attention: evidence for coactivation with redundant signals. Cogn. Psychol., 14: 247–279.
Miller, J.O. (1986) Time course of coactivation in bimodal divided attention. Percept. Psychophys., 40: 331–343.
Morris, J.S., Friston, K.J., Buechel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121: 47–57.
Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815.
Mufson, E.J. and Mesulam, M.M. (1984) Thalamic connections of the insula in the rhesus monkey and comments on the paralimbic connectivity of the medial pulvinar nucleus. J. Comp. Neurol., 227: 109–120.
Murray, E.A. and Mishkin, M. (1985) Amygdalectomy impairs crossmodal association in monkeys. Science, 228: 604–606.
Nichols, T., Brett, M., Andersson, J., Wager, T. and Poline, J.B. (2005) Valid conjunction inference with the minimum statistic. NeuroImage, 25: 653–660.
Olson, I.R., Gatenby, J.C. and Gore, J.C. (2002) A comparison of bound and unbound audio–visual information processing in the human cerebral cortex. Brain Res. Cogn. Brain Res., 14: 129–138.
Peck, C.K. (1987) Auditory interactions in cat’s superior colliculus: their role in the control of gaze. Brain Res., 420: 162–166.
Pearson, R.C., Brodal, P., Gatter, K.C. and Powell, T.P. (1982) The organisation of the connections between the cortex and the claustrum in the monkey. Brain Res., 234: 435–441.
Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C., Giampietro, V., Williams, S.C., Bullmore, E.T., Brammer, M. and Gray, J.A. (1997) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. Ser. B, 264: 1809–1817.
Pitkanen, A. (2000) Connectivity of the rat amygdaloid complex. In: The Amygdala: A Functional Analysis. Oxford University Press, New York.
Pourtois, G., Debatisse, D., Despland, P.A. and de Gelder, B. (2002) Facial expressions modulate the time course of long latency auditory potentials. Cogn. Brain Res., 14: 99–105.
Pourtois, G., de Gelder, B., Bol, A. and Crommelinck, M. (2005) Perception of facial expression and voices and their combination in the human brain. Cortex, 41: 49–59.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B. and Crommelinck, M. (2000) The time-course of intermodal binding between seeing and hearing affective information. NeuroReport, 11: 1329–1333.
Price, C.J. and Friston, K.J. (1997) Cognitive conjunctions: a new approach to brain activation experiments. NeuroImage, 5: 261–270.
Raichle, M.E., MacLeod, A.M., Snyder, A.Z., Powers, W.J., Gusnard, D.A. and Shulman, G.L. (2003) A default mode of brain function. Proc. Natl. Acad. Sci., 98: 676–682.
Schröger, E. and Widmann, A. (1998) Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology, 35: 755–759.
Scott, S.K., Young, A.W., Calder, A.J., Hellawell, D.J., Aggleton, J.P. and Johnson, M. (1997) Impaired auditory recognition of fear and anger following bilateral amygdala lesions. Nature, 385: 254–257.
Seltzer, B. and Pandya, D.N. (1978) Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res., 149: 1–24.
Stein, B.E., London, N., Wilkinson, L.K. and Price, D.D. (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J. Cogn. Neurosci., 8: 497–506.
Stein, B.E. and Meredith, M.A. (1993) The Merging of the Senses. MIT Press, Cambridge.
Stein, B.E. and Wallace, M.T. (1996) Comparison of cross-modal integration in midbrain and cortex. Prog. Brain Res., 112: 289–299.
Turner, B.H., Mishkin, M. and Knapp, M. (1980) Organization of the amygdalopetal projections from modality-specific cortical association areas. J. Comp. Neurol., 191: 515–543.
van Atteveldt, N., Formisano, E., Goebel, R. and Blomert, L. (2004) Integration of letters and speech sounds in the human brain. Neuron, 43: 271–282.
von Kriegstein, K., Kleinschmidt, A., Sterzer, P. and Giraud, A.-L. (2005) Interaction of face and voice areas during speaker recognition. J. Cogn. Neurosci., 17: 367–376.
Vroomen, J., Driver, J. and de Gelder, B. (2001) Is cross-modal integration of emotional expressions independent of attentional resources? Cogn. Affect. Behav. Neurosci., 1: 382–387.
Wallace, M.T., Meredith, M.A. and Stein, B.E. (1992) Integration of multiple sensory modalities in cat cortex. Exp. Brain Res., 91: 484–488.
Wallace, M.T., Meredith, M.A. and Stein, B.E. (1993) Converging influences from visual, auditory, and somatosensory cortices onto output neurons of the superior colliculus. J. Neurophysiol., 69: 1797–1809.
Wallace, M.T., Wilkinson, L.K. and Stein, B.E. (1996) Representation and integration of multiple sensory inputs in primate superior colliculus. J. Neurophysiol., 76: 1246–1266.
White, M. (1999) Representation of facial expressions of emotion. Am. J. Psychol., 112: 371–381.
Worsley, K.J., Marrett, S., Neelin, P., Vandal, A.C., Friston, K.J. and Evans, A.C. (1996) A unified statistical approach for determining significant signals in images of cerebral activation. Hum. Brain Mapp., 4: 58–73.
Wright, T.M., Pelphrey, K.A., Allison, T., McKeown, M.J. and McCarthy, G. (2003) Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cereb. Cortex, 13: 34–43.