
ORIGINAL RESEARCH

Cross-modal representations in early visual and auditory cortices revealed by multi-voxel pattern analysis

Jin Gu 1 · Baolin Liu 2 · Xianglin Li 3 · Peiyuan Wang 4 · Bin Wang 3

© Springer Science+Business Media, LLC, part of Springer Nature 2019

* Corresponding author: Baolin Liu, [email protected]

1 College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, People's Republic of China
2 School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
3 Medical Imaging Research Institute, Binzhou Medical University, Yantai, Shandong 264003, People's Republic of China
4 Department of Radiology, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, Shandong 264003, People's Republic of China

Brain Imaging and Behavior, https://doi.org/10.1007/s11682-019-00135-2

Abstract  Primary sensory cortices can respond not only to their defined sensory modality but also to cross-modal information. Beyond this observed cross-modal phenomenon, it is worth investigating whether cross-modal information can be used to categorize stimuli and what effect other factors, such as experience and imagination, have on cross-modal processing. In this study, we investigated cross-modal information processing in the early visual cortex (EVC, comprising visual areas 1, 2, and 3 (V1, V2, and V3)) and the auditory cortex (primary (A1) and secondary (A2) auditory cortex). Images and sound clips were presented to participants separately in two experiments in which participants' imagination and expectations were restricted by an orthogonal fixation task, and the data were collected with functional magnetic resonance imaging (fMRI). Using multi-voxel pattern analysis (MVPA), we successfully decoded the categories of the cross-modal stimuli in all ROIs except V1. Familiar sounds also yielded higher classification accuracies in V2 and V3 than unfamiliar sounds. A cross-classification analysis showed no significant similarity between the activity patterns induced by the two stimulus modalities. Even though the cross-modal representation was robust under the restriction of top-down expectations and mental imagery in our experiments, sound experience affected cross-modal representation in V2 and V3. In addition, primary sensory cortices may receive information from different modalities in different ways, so the activity patterns evoked by the two modalities were not similar enough for cross-classification to succeed.

Keywords  Cross-modal · Auditory cortex · Early visual cortex · MVPA · fMRI

Introduction

We perceive the environment through multiple sensory channels, and sensations are perceived only if they are received and processed by the primary sensory cortices. As important cortical areas, the primary sensory cortices have received much attention in neuroimaging research, especially the early visual cortex (EVC) and the primary auditory cortex. The primary sensory cortices have traditionally been considered exclusive to their respective modalities (Mesulam 1998). However, it has also been proposed that the primary sensory cortices may be multisensory in nature (Ghazanfar and Schroeder 2006), and roles for sensory-specific brain areas in multisensory or cross-sensory perception have been uncovered in recent years (Driver and Noesselt 2008).

Many neuroimaging studies have indicated that multisensory interactions take place even in primary sensory cortices (Kayser et al. 2008; Werner and Noppeney 2010; Klemen and Chambers 2012), and the mechanism by which these interactions occur in early sensory areas may differ from that found in association cortices (Rohe and Noppeney 2016). Some researchers have proposed that the modulation of primary sensory cortices by cross-modal stimuli (e.g., of the auditory cortex by visual stimuli) relies on the fiber tracts between primary sensory areas (Beer et al. 2011, 2013).



Direct projections between primary sensory cortices have also been found in neurophysiological studies of animals, and these projections suggest a feedforward component of cross-sensory modulation (Klemen and Chambers 2012; Sieben et al. 2013; Iurilli et al. 2012). Primary sensory cortices not only are involved in multisensory interactions but also respond directly to cross-modal information without any input from their own modality. Silent videos that imply sounds (e.g., a video of a howling dog with the sound removed) have been successfully discriminated in the early auditory cortex (Meyer et al. 2010), visual objects can be decoded in the early somatosensory cortex (Meyer et al. 2011; Smith and Goodale 2015), and the early visual cortex can represent natural sound clips (Vetter et al. 2014). According to these studies, cross-modal stimuli can induce content-specific representations in the primary sensory cortices. Overall, many studies suggest that primary sensory cortices process cross-modal information, but further work is needed to explore the specific mechanisms underlying this processing.

Many factors are involved in the cross-modal phenomenon; for example, people may recall the sound produced by a familiar animal when shown a silent video of that animal, or imagine the sound produced when a silent video shows a man drumming his fingers on a table (Albers et al. 2013; Kok et al. 2014). Recall and imagination, which originate from unimodal sensory experience and contextual information, are thought to play a crucial role in cross-modal processing (Sieben et al. 2015; Hsieh et al. 2012). The EVC has been shown to carry a content-specific representation of imagery of sounds, which requires the use of imagination, and it also allowed successful classification of sounds even when participants performed orthogonal attention and working memory tasks (Vetter et al. 2014); thus, the role of recall and imagination in cross-modal processing remains unclear. However, animal experiments indicate that primary sensory cortices may respond to cross-modal stimuli directly, with multisensory processing emerging before cross-modal experience (Bieler et al. 2017b).

Previous research indicates that primary sensory cortices participate in processing cross-modal stimuli; however, it is still unclear whether stimulus category information can be successfully decoded from them and how experience and other internal factors, such as imagination, affect cross-modal processing. Following research on cross-modal representation in the EVC (Smith and Goodale 2015; de Haas et al. 2013), we asked whether auditory cortical responses can be induced by the mere appearance of stimuli when subjects' higher-order perception and top-down information, including imagination and experience, are restricted. Whereas previous research focused more on content-specific representation (Meyer et al. 2010), we chose to investigate the category-specific representation of cross-modal information, asking whether the category of cross-modal stimuli can be decoded in primary sensory cortices even when there is more variation within a category. Further, we examined whether the representation of cross-modal information shares a similar pattern with that induced by the corresponding modality. In the present study, we focused on cross-modal representation in the early visual cortex and the auditory cortex simultaneously. We hypothesized that these regions would respond to cross-modal information even when top-down information was restricted, and that the cross-modal information would be decodable according to the category of the stimuli. In addition, we speculated that the cross-modal representation would not be similar to the pattern induced by the corresponding modality. To test these hypotheses, we designed two experiments. The visual experiment contained eight categories in total based on three factors: animal or object, familiar or unfamiliar, and sound-implying or with little sound implication. Participants were asked to count the number of red points appearing at the screen center while they perceived static images. In the auditory experiment, sound clips corresponding to the images used in the visual experiment were presented, and participants were asked to avoid visualizing the stimuli.

The majority of previous neuroimaging research adopted traditional univariate analysis, such as the general linear model (GLM), to study activation under specific experimental conditions on a voxel-by-voxel basis. However, only the statistically significant voxels are reported in a univariate analysis, which leads to a loss of pattern information across voxels. In recent years, multi-voxel pattern analysis (MVPA) has been widely used for decoding neural activity patterns in the cortex, and it can reveal content-specific representations even when activation levels are low in the regions of interest (ROIs) (Harrison and Tong 2009). In this study, we first performed a univariate analysis to examine whether there was significant activation in the ROIs caused by cross-modal information. To investigate whether the patterns in the ROIs were category-specific to the cross-modal stimuli, MVPA was conducted to classify the stimulus categories. Furthermore, a cross-classification analysis based on MVPA was performed across the visual and auditory stimuli to explore whether the two modalities evoke similar patterns in the same sensory cortices.

Materials and methods

Subjects

Twenty healthy, right-handed volunteers without any history of neurological or psychiatric disorders participated in this study (10 males, 10 females; mean age 21.5 years, range 20–23 years). They all had normal or corrected-to-normal vision and no auditory deficits.


Two subjects were excluded from further data analyses due to excessive head movement (more than 3 mm in translation or rotation) during scanning. This study was approved by the Institutional Review Board (IRB) of Tianjin University. Written informed consent was collected from all participants before the experiment.

Stimuli

In the visual experiment, 64 full-color images were presented in a block-design protocol, comprising two semantic categories (animals and objects) that were further grouped into familiar and unfamiliar categories according to a familiarity evaluation performed by volunteers. Half of these images imply sounds, while the rest show the same exemplars without implying sounds strongly. For example, a picture of a horse standing quietly is a stimulus with little sound implication, while a picture of a horse neighing with its neck craned and mouth open belongs to the sound-implying category. All images were thus grouped into 8 subcategories (animals or objects falling into four groups each: familiar and sound-implying, unfamiliar and sound-implying, familiar with little sound implication, and unfamiliar with little sound implication). Figure 1a provides exemplars of each category. The final set of stimuli was constructed from a pool of 384 pictures chosen according to the definition of the 8 subcategories, and the degree of familiarity of each stimulus was rated by another 13 volunteers. We chose the unfamiliar stimuli from the lowest-scoring images and the familiar stimuli from the highest-scoring images; in total, 64 images were chosen for the final stimulus set. The images were edited to 400 × 400 pixels with the same brightness, contrast, and format using Adobe Photoshop.

The auditory experiment used 32 sound clips corresponding to the sound-implying images, which were divided into four categories (familiar animals, unfamiliar animals, familiar objects, and unfamiliar objects). Sounds for the objects were generated by performing an easily understood action on them; for example, the sound of paper was recorded as the paper was torn. All sound stimuli were edited to 2.5 s in duration and converted to two channels (mono, 44.1 kHz, 16-bit) at 80 dB C-weighted in both ears (GoldWave software). These stimuli were also assessed by the same 13 volunteers as the visual stimuli to ensure that they were recognizable.

Fig. 1  Stimulus exemplars and schematic representation of the experimental design. a Both visual and auditory stimuli comprised two semantic categories (animals and objects), which were grouped into familiar and unfamiliar categories. Images were also separated into two groups according to the degree of sound implication: Y indicates significant sound implication in the image, while N indicates no sound implication. Therefore, the auditory stimuli comprised 4 subcategories and the visual stimuli comprised 8 subcategories. b A typical run of the visual experiment using a block design. After a 10 s fixation at the beginning, 8 images from the same category appeared in a block. Then a button task was completed within 2 s, during which participants indicated the number of red dots they had seen in the previous block. After that, an 8 s fixation was presented before the start of the next block. c A typical run of the auditory experiment using a block design. After a 10 s fixation at the beginning, 8 sound clips from the same category were presented in a block, followed by a 10 s fixation before the start of the next block.



Task and procedure

A block design was adopted in both experiments. The visual experiment consisted of six runs, each lasting 4 min 58 s including an initial 10 s fixation. Figure 1b shows a schematic diagram of a typical run. Each run comprised 16 blocks (two blocks for each category) that were presented in random order. In each block, 8 different images belonging to the same category were presented in sequence; each image appeared for 800 ms, followed by a 200 ms inter-stimulus interval (ISI). While the stimuli were presented, participants were asked to fixate on a red or green dot presented above the image at the center of the screen. In each block, the red dot appeared once, twice, or three times, and participants reported how many times it had appeared via a button box within a 2 s interval after the end of the block. This task is orthogonal to the category of the visual stimuli and was used to control participants' attention. Both the static images and the orthogonal task helped to reduce imagination or memory of the corresponding sounds. After the 2 s response interval, a white fixation region on a gray background was presented for 8 s before the start of the next block. We used the orthogonal task to reduce imagination in the visual experiment, unlike previous studies that used videos as stimuli and did not restrict attention. Since subjects' imagination cannot be restricted completely, the sound-implying images help to isolate top-down influences more directly when responses to these images are compared with responses to images with little sound implication.

In the auditory experiment, two runs were conducted, each including an initial 10 s fixation and 8 randomly presented blocks (two blocks for each category) (Fig. 1c). Each run lasted 4 min 42 s. In each block, a 2.5 s sound clip followed by a 0.5 s ISI was repeated 8 times with different stimuli. Participants were blindfolded and instructed to keep their eyes closed and focus on the sound clips. There was a 10 s interval between stimulus blocks. All blocks, and all trials within each block, were presented randomly. Previous studies have suggested that auditory stimuli can be classified while subjects perform orthogonal attention and working memory tasks, whereas an orthogonal task was not used in the visual experiments of previous cross-modal research. Thus, we did not repeat the auditory orthogonal task. Instead, subjects were instructed to focus on the sound clips and to try to avoid imagery.
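The stated run durations follow directly from the block timing described above; the short check below (plain Python arithmetic, added here for illustration and not part of the original paper) reproduces both figures.

# Sanity check of the run durations implied by the block timing described above.
# Visual run: 10 s fixation + 16 blocks x (8 x (0.8 s image + 0.2 s ISI) + 2 s response + 8 s fixation)
visual_run = 10 + 16 * (8 * (0.8 + 0.2) + 2 + 8)
# Auditory run: 10 s fixation + 8 blocks x (8 x (2.5 s clip + 0.5 s ISI) + 10 s fixation)
auditory_run = 10 + 8 * (8 * (2.5 + 0.5) + 10)

print(visual_run)    # 298.0 s -> 4 min 58 s, as stated
print(auditory_run)  # 282.0 s -> 4 min 42 s, as stated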

MRI data acquisition

fMRI data were acquired on a 3.0 Tesla Siemens Skyra scanner with an 8-channel head coil at Yantai Affiliated Hospital of Binzhou Medical University. To reduce head motion and scanner noise, foam pads and earplugs were used during scanning (Yang et al. 2017; Y. Liang et al. 2018). Structural images were collected using a whole-brain T1-weighted MPRAGE sequence (repetition time (TR) = 1900 ms, echo time (TE) = 2.52 ms, voxel size = 1 × 1 × 1 mm3, matrix size = 256 × 256, flip angle (FA) = 90°). A gradient-echo planar imaging (EPI) sequence (TR = 2000 ms, TE = 30 ms, voxel size = 3.1 × 3.1 × 4.0 mm3, matrix size = 64 × 64, slices = 33, slice thickness = 4 mm, slice gap = 0.6 mm, FA = 90°) was used for functional data collection. Stimulus presentation and behavioral response collection were handled by E-Prime 2.0 Professional (Psychology Software Tools, Pittsburgh, PA, USA) through an audio-visual somatosensory stimulation system with high-resolution glasses and headphones.

Data preprocessing

Functional data were analyzed using the SPM8 package (http://www.fil.ion.ucl.ac.uk/spm). The first five functional volumes from each run were discarded to allow for T1 equilibration effects (Zeidman et al. 2015; Liang et al. 2013b). The remaining images were slice-time corrected to the middle slice and realigned to the first image to correct for head movement using a 6-parameter affine transformation. The high-resolution T1 images were then co-registered to the mean functional image, and the functional images were normalized to Montreal Neurological Institute (MNI) space using unified segmentation of the anatomical image and resampled to an isotropic voxel size of 3 mm. The normalized data were spatially smoothed with a 6 × 6 × 6 mm3 full-width at half maximum (FWHM) Gaussian kernel. Note that the smoothed and unsmoothed data were used separately in the univariate analysis and the pattern classification in the subsequent analyses.
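Two of these steps, discarding the initial volumes and spatial smoothing, are straightforward to illustrate outside SPM. The following is a minimal sketch using the nilearn package (a substitution for illustration; the original study used SPM8), with a hypothetical file name; realignment, co-registration, and MNI normalization would still be carried out in SPM8 as described above.

# Minimal sketch of two preprocessing steps described above (volume discarding
# and 6 mm FWHM smoothing), assuming nilearn is installed.
# The file name is a hypothetical placeholder.
from nilearn import image

func_img = image.load_img("run_01.nii")              # one functional run (4D)

# Discard the first five volumes to allow for T1 equilibration effects.
func_img = image.index_img(func_img, slice(5, None))

# Spatial smoothing with a 6 mm FWHM Gaussian kernel (used only for the
# univariate analysis; the MVPA uses unsmoothed data).
smoothed = image.smooth_img(func_img, fwhm=6)
smoothed.to_filename("run_01_smoothed.nii")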

ROI definition

The main objective of this study was to explore cross-modal information decoding in the primary visual and auditory cortices, so it was important to exclude any multimodal cortical areas from the analysis. The auditory cortex, in particular, lies in close proximity to the multisensory area of the superior temporal gyrus, so a functional localizer would increase the risk of labeling multimodal areas compared with an anatomical mask (Meyer et al. 2010). Therefore, the anatomical regions in the SPM Anatomy toolbox (Eickhoff et al. 2005) were used to define the ROIs used in this cross-modal study (M. Liang et al. 2013a).


It is worth noting that we restricted the primary auditory cortex to Heschl's gyrus as far as possible to limit multimodal areas. In addition, we also took the secondary auditory cortex (A2) into consideration as a reference for A1. For consistency with the definition of the primary auditory cortex, the EVC was also defined with probabilistic anatomical atlases in the SPM Anatomy toolbox at the single-subject level, comprising V1, V2, and V3 (M. Liang et al. 2013a). A1 was defined as the primary auditory cortex. To verify the classification performance outside the primary cortices, we defined an area in the anterior cingulate cortex, located with the WFU PickAtlas, as a control region to check the validity of the decoding method. The number of voxels included in the control ROI was approximately equal to the average number of voxels in the other ROIs. All ROIs are shown in Fig. 2.
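Once an anatomical ROI mask is available (e.g., exported from the SPM Anatomy toolbox or the WFU PickAtlas), the multi-voxel patterns within it can be extracted for the analyses below. This is a minimal sketch assuming a recent nilearn installation; the mask and image file names are hypothetical placeholders, not files from the original study.

# Extract an (n_volumes x n_voxels) pattern matrix from one anatomical ROI,
# assuming nilearn >= 0.10; file names are hypothetical placeholders.
from nilearn.maskers import NiftiMasker

masker = NiftiMasker(mask_img="V2_mask.nii", standardize=False)
roi_patterns = masker.fit_transform("run_01_preprocessed.nii")
print(roi_patterns.shape)  # (number of volumes, number of voxels in the ROI)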

Univariate analysis

The general linear model (GLM) was used to analyze the functional data from the visual and auditory experiments in SPM8. First, we defined contrasts on the stimulus subcategories and obtained the first-level statistical parametric maps for each participant using the smoothed data. Next, random effects were estimated using a one-sample t-test to explore the activated regions in the visual and auditory experiments. Finally, the mean beta values under the different conditions were computed in all ROIs, and paired t-tests were performed to test for differences between subcategories within the same ROI.
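For illustration, the sketch below sets up a comparable first-level GLM with nilearn rather than SPM8 (a substitution, not the authors' implementation). The events table, file names, and the single animal-versus-object contrast shown are assumptions made for the example.

# Sketch of a first-level GLM comparable to the SPM8 analysis described above,
# using nilearn. File names and the events table are hypothetical.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

events = pd.DataFrame({
    "onset":      [10.0, 28.0],          # block onsets in seconds (placeholders)
    "duration":   [8.0, 8.0],            # block durations (placeholders)
    "trial_type": ["animal", "object"],
})

glm = FirstLevelModel(t_r=2.0, smoothing_fwhm=None)   # data already smoothed
glm = glm.fit("run_01_smoothed.nii", events=events)

# Subject-level contrast map for one condition contrast, e.g. animal > object.
animal_vs_object = glm.compute_contrast("animal - object", output_type="z_score")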

Multi-voxel pattern analysis

To explore whether the neural activity in the non-corresponding primary sensory cortices was category-specific, MVPA was performed with a linear support vector machine (SVM) classifier (libSVM), trained with libSVM's default parameters (Chang and Lin 2011). The unsmoothed functional data used in the MVPA were normalized to z-scores before training the classifiers, which removed differences induced by baseline shifts between runs. Principal component analysis (PCA) was performed first for feature selection, reducing the voxel patterns to a suitable number of features for each ROI. The classification analysis was then conducted with a leave-one-run-out cross-validation approach: for example, in the visual experiment the classifier was trained on five runs and tested on the remaining run, this was repeated six times with each run serving as the test set, and the classification accuracies were averaged. Because there is little correlation between different runs, this approach yields reliable results (Liang et al. 2017; Wolbers et al. 2011; Axelrod and Yovel 2012). Afterward, one-sample t-tests were conducted on the classification accuracies to test for significance, and the mean accuracy across participants was calculated from the individual accuracies. Paired-samples t-tests were used to compare classification accuracies between two ROIs. The Shapiro-Wilk test was used to verify normality before each statistical test; the data followed a normal distribution (all p values greater than 0.05).
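The sketch below reimplements this pipeline with scikit-learn and SciPy rather than libSVM directly (a substitution for illustration, not the authors' code): per-run z-scoring (our reading of the between-run baseline correction described above), PCA, a linear SVM with default regularization, leave-one-run-out cross-validation, and a one-sample t-test of per-subject accuracies against chance. The array names, the number of PCA components, and the data-loading step are assumptions.

# Sketch of the MVPA pipeline described above, assuming X is an
# (n_blocks x n_voxels) array of ROI patterns, y holds category labels,
# and runs holds the run index of each block. The PCA dimensionality is an
# illustrative choice, not the value used in the paper.
import numpy as np
from scipy import stats
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def decode_roi(X, y, runs):
    # z-score each pattern within its run to remove between-run baseline shifts
    Xz = np.empty_like(X, dtype=float)
    for r in np.unique(runs):
        sel = runs == r
        Xz[sel] = stats.zscore(X[sel], axis=0)

    clf = make_pipeline(PCA(n_components=20),          # feature selection
                        SVC(kernel="linear", C=1.0))   # libSVM-style linear SVM
    scores = cross_val_score(clf, Xz, y, groups=runs, cv=LeaveOneGroupOut())
    return scores.mean()                               # leave-one-run-out accuracy

# Group-level inference on hypothetical per-subject accuracies:
# stats.shapiro(subject_accuracies)                           # normality check
# stats.ttest_1samp(subject_accuracies, popmean=0.5,
#                   alternative="greater")                    # test against chance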

We trained a pairwise classifier on the visual experiment data to discriminate the semantic categories (animals vs. objects), and an 8-way classifier to distinguish the 8 subcategories. In the auditory experiment, pairwise classifiers were likewise trained to discriminate animals from objects, and a 4-way classifier was trained to distinguish the 4 subcategories. Additionally, we performed pairwise animal-versus-object classifications while controlling familiarity and the degree of sound implication. We first distinguished familiar animals from familiar objects and then compared that classification performance with the accuracy of discriminating unfamiliar animals from unfamiliar objects, which helps to reveal the effects of higher-level information on cross-modal information representation.

Fig. 2  ROI locations. To exclude any multimodal cortices, we used anatomical regions obtained with the SPM Anatomy toolbox and the WFU PickAtlas as ROIs. A1: primary auditory cortex; A2: higher-order auditory cortex (as a reference for A1); V1, V2, and V3: the early visual areas; control region: a part of the anterior cingulate cortex.


The same analysis was conducted to examine the impact of sound implication in the images, which may contribute to the decoding of cross-modal information in primary sensory cortices.
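The familiarity and sound-implication comparisons above reduce to paired t-tests over the per-subject accuracies of the two classifiers; a minimal SciPy sketch is shown below, with hypothetical accuracy arrays standing in for real data.

# Compare animal-object decoding accuracies between two conditions
# (e.g., familiar vs. unfamiliar stimuli) across subjects with a paired,
# one-tailed t-test; the accuracy arrays are hypothetical placeholders.
import numpy as np
from scipy import stats

acc_familiar = np.array([0.62, 0.58, 0.65, 0.60])     # placeholder values
acc_unfamiliar = np.array([0.55, 0.57, 0.59, 0.54])   # placeholder values

t_stat, p_value = stats.ttest_rel(acc_familiar, acc_unfamiliar,
                                  alternative="greater")
print(t_stat, p_value)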

Results

Univariate analysis

As expected, most activation was distributed in the occipital areas when participants viewed images. Moreover, activation was also observed in parts of A1 despite the absence of sound (t = 2.13, p = 0.024, FDR corrected). Beyond the specific ROIs, activation was found in other areas including the superior temporal sulcus (STS), middle frontal gyrus (MFG), inferior frontal gyrus (IFG), and postcentral gyrus. Unsurprisingly, the main area activated in the auditory experiment was located in the temporal cortex. The sound stimuli did not produce significant activation in the EVC at the group level, even though activation could be found in the EVC in every single subject. In addition, we found weak activations in higher-order visual areas: parts of the middle occipital gyrus were activated at the individual level when a liberal threshold of p = 0.05 was used. Moreover, we observed significant activation in the frontal cortex, including the MFG and IFG. The group-level activations are shown in Fig. 3. In addition, the mean beta values were computed in all ROIs, and paired t-tests indicated no significant difference between the animal and object conditions in any ROI (p > 0.05).

MVPA analysis results

Classification analysis of the category information in specific ROIs

In the visual experiment, the pairwise classification analysis decoding the animal and object representations showed that the visual stimuli could be successfully decoded in A1 (t(17) = 2.866, p = 0.006), with accuracy significantly higher than the chance level of 50% according to one-sample t-tests (one-tailed). Except for the control region, the classification accuracy was also significantly above chance in the EVC and A2, with detailed results reported in Table 1. A paired t-test between A1 and A2 showed that the classification accuracy of A2 was significantly higher than that of A1 (t(17) = 2.87, p = 0.005). Next, an 8-way classification analysis was performed to decode the images according to their 8 subcategories; it was successful in all ROIs, as shown in Fig. 4b. Given the significant accuracy in the control region, we performed paired t-tests between the control region and the other ROIs, especially A1 and A2, to examine the decoding effectiveness. The results showed that the classification accuracy in the control region was significantly lower than that of A1 (t(17) = 3.09, p = 0.003) and A2 (t(17) = 4.21, p < 0.001).

In the auditory experiment, the 2-way classifiers showed statistically significant accuracies in A1 (t(17) = 11.872, p < 0.001), A2 (t(17) = 14.837, p < 0.001), V2 (t(17) = 4.036, p < 0.001), and V3 (t(17) = 5.062, p < 0.001) compared with the chance level (50%). However, the classifier could not distinguish the sound categories in V1 (t(17) = 1.136, p = 0.163) or the control region (t(17) = 0.569, p = 0.288). Figure 4a shows the classification accuracies between the two categories of images and sounds in each ROI.

Fig. 3  Group-level activations in the univariate analysis. Color coding for t-values is indicated by the color bar (right). a In the visual experiment, most activation was found in the occipital areas. Activation was also observed in parts of the auditory area, as well as in additional areas including the STS, MFG, IFG, and postcentral gyrus. b In the auditory experiment, most activation was found in the temporal cortex. The sound stimuli did not produce activation in the EVC, even at a liberal threshold (p < 0.05, uncorrected). Moreover, we observed significant activation in the frontal cortex, including the MFG and IFG.


In addition, the classification analysis of the subcategory information showed a similar performance to that for distinguishing animal and object stimuli, as shown in Fig. 4c; the detailed classification accuracies are given in Tables 1 and 2.

Influence of familiarity and degree of sound-implication

To investigate whether prior experience with the stimuli influenced animal-object decoding, classifiers were trained for each subject to classify animals and objects separately in the familiar and unfamiliar groups. These classification analyses were performed for both the visual and the auditory experiments. For the visual animal-object decoding, paired t-tests (one-tailed) indicated that stimulus familiarity had no impact on classification accuracy (p > 0.05, uncorrected), as shown in Fig. 5. However, accuracy was higher in V2 (t(17) = 2.700, p = 0.008) and V3 (t(17) = 1.974, p = 0.032) when classifying familiar animal and object sounds in the auditory experiment. Table 3 shows the classification performance for the familiar and unfamiliar groups in the visual and auditory experiments for all ROIs.

Table 1  Classification results of visual stimuli in each ROI

Visual stimuli decoding

          2-way classification                      8-way classification
          Accuracy (%)  SEM    t(17)   P            Accuracy (%)  SEM    t(17)   P
A1        52.011        0.702  2.866   0.006*       16.348        0.713  5.398   <0.001*
A2        54.905        0.858  5.714   <0.001*      17.650        0.835  6.165   <0.001*
V1        59.997        1.133  8.820   <0.001*      22.917        1.018  10.233  <0.001*
V2        60.330        1.112  9.289   <0.001*      25.897        1.257  10.660  <0.001*
V3        64.540        0.955  15.223  <0.001*      26.881        1.081  13.298  <0.001*
Control   51.606        0.997  1.611   0.063        13.954        0.640  2.273   0.018*

One-sample t-tests of classification accuracies for visual stimuli in each ROI (one-tailed). *Significant P values (FDR corrected for the number of ROIs). In the 8-way classification of visual stimuli, the control region showed significant decoding, but paired t-tests showed that its accuracy was significantly lower than that of the other ROIs.

Fig. 4  Classification performance in each ROI. The black line indicates the chance level; all error bars indicate the SEM. Stars indicate statistical significance (one-sample t-tests, one-tailed; *p < 0.05, **p < 0.01, ***p < 0.001). a Two-way classification accuracies in the visual and auditory experiments; the black line indicates the chance level (50%). Classification of sounds was not successful in V1 or the control area. b Eight-way classification accuracies for images; the chance level is 12.5%. Despite the significant classification accuracy in the control area, it was significantly lower than in the other ROIs (paired t-tests). c Four-way classification accuracies for sounds in all ROIs; the chance level is 25%.


The corresponding analysis in the visual experiment, in which we classified animals and objects separately for the two degrees of sound implication, showed that decoding accuracy was higher for sound-implying images in A2 (t(17) = 5.779, p < 0.001) and V3 (t(17) = 3.320, p = 0.002).

Cross-classification in ROIs

We performed a cross-classification analysis to investigate whether there were similar activity patterns between the images and the corresponding sound clips in all ROIs.

Table 2  Classification results of auditory stimuli in each ROI

Auditory stimuli decoding

          2-way classification                      4-way classification
          Accuracy (%)  SEM    t(17)   P            Accuracy (%)  SEM    t(17)   P
A1        67.419        1.467  11.872  <0.001*      40.191        1.397  10.872  <0.001*
A2        70.081        1.353  14.837  <0.001*      43.779        1.760  10.672  <0.001*
V1        51.968        1.732  1.136   0.163        27.517        1.675  1.503   0.091
V2        57.494        1.857  4.036   <0.001*      33.304        1.669  4.977   <0.001*
V3        58.362        1.652  5.062   <0.001*      33.333        1.830  4.555   <0.001*
Control   50.926        1.628  0.569   0.288        27.141        1.747  1.226   0.118

One-sample t-tests of classification accuracies for auditory stimuli in each ROI (one-tailed). *Significant P values (FDR corrected for the number of ROIs). Except for V1 and the control region, the accuracies were significantly higher than chance level.

Fig. 5  Results of animal-object classification under the influence of stimulus familiarity and the degree of sound implication of images in each ROI. The black line indicates the chance level (50%); all error bars indicate the SEM. Stars indicate significantly different decoding accuracies between the two conditions (paired t-tests, *p < 0.05, **p < 0.01, ***p < 0.001). a Classification accuracies when decoding images in the familiar and unfamiliar groups. All 5 ROIs succeeded in the 2-way classification, but the accuracies did not differ significantly between the two conditions. b Classification accuracies when decoding images with different degrees of sound implication. All ROIs succeeded in the 2-way classification. Compared with the decoding of images without sound implication, the classification accuracies for sound-implying images were significantly higher in A2 and V3; there was no significant difference in the other ROIs. c Classification accuracies when decoding sounds in the familiar and unfamiliar groups. V1 failed in the 2-way classification. Compared with unfamiliar sounds, classification performance for familiar sounds was significantly better in V2 and V3; there was no significant difference in the other ROIs.


The classifier was trained for each subject on the image trials and tested on the sound trials, and vice versa. We performed a 2-way (animal-object) cross-classification and found that the mean classification accuracy was statistically above the chance level (50%) only in V3 (t(17) = 2.595, p = 0.009, one-sample t-test, one-tailed). The 4-way cross-classification, in which images were grouped into four categories to match the category structure of the sounds, failed in all ROIs.
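A minimal sketch of this cross-classification step is shown below, assuming per-subject pattern matrices and animal/object labels for the two modalities from the same ROI (already z-scored per run). Averaging the two train/test directions is an illustrative choice, not a detail stated in the paper.

# Sketch of the cross-classification analysis described above: a classifier
# trained on image patterns is tested on sound patterns and vice versa.
# X_vis, y_vis, X_aud, y_aud are hypothetical (pattern x voxel) arrays and
# labels from the same ROI.
import numpy as np
from sklearn.svm import SVC

def cross_modal_accuracy(X_vis, y_vis, X_aud, y_aud):
    vis_to_aud = SVC(kernel="linear").fit(X_vis, y_vis).score(X_aud, y_aud)
    aud_to_vis = SVC(kernel="linear").fit(X_aud, y_aud).score(X_vis, y_vis)
    return np.mean([vis_to_aud, aud_to_vis])  # averaged over both directions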

Discussion

We investigated cross-modal responses in the early visual and auditory cortices. At the group level, significant activation was observed in A1 and A2 when participants perceived static visual stimuli without any sound imagery. MVPA showed that the cross-modal activation pattern was indeed category-specific in A1 and A2, with reliable decoding of the stimulus categories despite the use of static images and the restriction of participants' imagination. Decoding of sound categories was also successful in V2 and V3 but failed in V1. Our study thus verified that the cross-modal representation was robust enough to discriminate category information. In the primary auditory cortex, the animal-object image classification accuracy did not differ significantly with familiarity, and the same held when the influence of sound implication was examined. By contrast, the classification accuracy for familiar sounds showed significant advantages in V2 and V3, and the classification accuracy for sound-implying images showed a significant advantage in A2. Although both modalities could be successfully decoded in the same areas, the further cross-classification analysis did not succeed, indicating that activity patterns induced by auditory stimulation are coded differently from those induced by visual stimulation.

Cross-modal nature of the visual and auditory cortices

Our study showed that the activation patterns in V2, V3, and the auditory cortex represented stimulus category information when the corresponding bottom-up input was absent. These results are consistent with a set of studies showing that content-specific cross-modal information can be decoded in regions assumed to be unimodal sensory cortex (Meyer et al. 2010; Vetter et al. 2014). In a study by Meyer (Meyer et al. 2010), participants were shown silent video clips depicting events that implied sound, and these visual stimuli led to content-specific representations in primary auditory cortex. Participants reported that they spontaneously generated auditory imagery, as mentioned in Meyer's later work (Man et al. 2012), which makes it conceivable that the different patterns of activation observed in primary auditory cortex were driven by sound imagination and not merely by differences in visual appearance. Similar stimuli and experimental procedures were adopted by many studies of cross-modal processing (Smith and Goodale 2015; Vetter et al. 2014; Hsieh et al. 2012), in which the stimuli carried a wealth of connotative information from another modality and participants' imagination or memory played an important role in the perception of the different modalities (Albers et al. 2013; Kok et al. 2014). Although previous studies showed the content-specific nature of cross-modal representation in the early sensory cortices, it remained unclear whether the primary sensory cortex could respond to the category information of cross-modal stimuli when top-down information from higher-level cortical areas is controlled.

In our study, the visual stimuli were static images, which helped reduce participants' imagination compared with video stimuli. The orthogonal task helped participants focus on the appearance of the stimuli alone rather than imagining the corresponding cross-modal information, which attenuates possible top-down effects (de Haas et al. 2013). Even with these measures to control top-down influence, the image categories could still be successfully decoded in auditory cortex.

Table 3  Influence of familiarity and the degree of sound implication on the classification accuracies

          Visual stimuli                                         Auditory stimuli
          Familiar - Unfamiliar   Sound-implying - Quiet         Familiar - Unfamiliar
          t(17)    P              t(17)    P                     t(17)    P
A1        0.020    0.492          1.310    0.104                 0.050    0.480
A2        0.651    0.262          5.779    <0.001*               0.445    0.331
V1        0.761    0.228          0.464    0.324                 0.447    0.330
V2        1.051    0.154          0.752    0.231                 2.700    0.008*
V3        1.338    0.099          3.320    0.002*                1.974    0.032*

Paired t-tests (one-tailed) of classification accuracies: animal-object classification compared between familiar and unfamiliar stimuli, and between sound-implying and non-sound-implying stimuli. *Significant P values.


Similarly, reliable decoding accuracy was obtained in V2 and V3 when only sound stimuli were presented. We speculate that V1 needs more information for cross-modal representation, and therefore failed in decoding, even though a previous study suggested that it represents sound-imagery information (Vetter et al. 2014). Unlike studies in which discriminations among individual stimuli were performed (Hsieh et al. 2012; de Haas et al. 2013), our study focused on the category specificity of the activity patterns during cross-modal information representation. Compared with results from previous studies, the classification accuracies in our study were not very high. However, the classifiers performed better at discriminating subcategories than at classifying animals vs. objects. A reasonable explanation is that when more stimuli are grouped into one category, fewer specific features can be extracted from the average activity pattern induced by stimuli within that category (Iordan et al. 2015, 2016): discriminating two categories each comprising many different stimuli is relatively harder than discriminating two individual stimuli from two categories. In addition, primary sensory cortices are at a disadvantage compared with higher-order cortices in decoding semantic information, which is supported by the classification performance in our study, since the classification accuracy of A2 was higher than that of A1. We speculate that another reason for the lower overall classification accuracy is that much of the higher-order information was reduced in our study by the relatively brief presentation of the stimuli and the restriction of mental activity by the orthogonal task, increasing the difficulty of decoding the category information (de Haas et al. 2013).

In this study, we found reliable decoding performance after reducing a large amount of higher-level cognition, including spontaneous mental imagery and recall. Our results provide further evidence that cross-modal representation is not caused simply by reactivation of primary sensory cortices through top-down influences, and we infer that the primary sensory cortex can respond to information from a different modality directly. One recent study showed that a few neurons in the primary sensory cortices of rats directly encode cross-modal information (Bieler et al. 2017a), and such neurons may exist in the human primary sensory cortex as well. Many studies of blind individuals have shown the primary visual cortex responding to sounds (Ricciardi et al. 2014; Renier et al. 2014), and studies of deaf individuals have revealed that primary auditory cortex can respond to visual information (Lee and Whitt 2015; Benetti et al. 2017). Representing cross-modal information may be intrinsic to the primary sensory cortex, and it has even been observed in normal adults after short-term sensory deprivation (Lo Verde et al. 2017). This maintained plasticity may reflect the multisensory nature of the human cortex and supports the view that the primary sensory cortices may be inherently multisensory (Liang et al. 2013a). In addition, cross-modal information can be transferred between the primary sensory cortices directly or via other regions (Petro et al. 2017; Eckert et al. 2008). In recent years, functional and effective connectivity analyses have enabled more innovative research on multimodal integration (Zhang et al. 2017; Xu et al. 2016; Geng et al. 2018), and some studies have employed these methods to uncover information transmission between the primary visual and auditory cortices (Klinge et al. 2010; Collignon et al. 2013). Furthermore, research has shown the existence of direct white matter connections between auditory and visual cortex (Beer et al. 2011, 2013). Accordingly, the specifics of the network underlying cross-modal information processing will be a major question in our future work.

Role of familiarity and sound implication in cross-modal processing

Given that spontaneous and uncontrolled imagination and recall were restricted in our study, the familiarity and sound implication of the stimuli provided another way to explore the influence of higher-level information on primary sensory cortices. Familiar stimuli are thought to carry more information from perceptual experience (Meyer et al. 2010), so they should be more easily discriminated in the cross-modal sensory cortices. Visual-haptic cross-modal research (Smith and Goodale 2015) showed that familiar visual objects could be reliably decoded in the early somatosensory cortex, while unfamiliar visual objects could not. In our sound-discrimination analysis, V2 and V3 discriminated familiar sounds with higher classification accuracy than unfamiliar sounds, which is consistent with that research. In the visual experiment, however, the classification analyses showed no significant difference between familiar and unfamiliar images, suggesting that familiarity had no influence on the decoding power of the primary sensory cortices in our study. We infer that the orthogonal task in the visual experiment made participants focus on stimulus appearance, with less experiential information influencing the activity pattern (de Haas et al. 2013), which reduced any advantage of familiar stimuli over unfamiliar ones in providing additional cross-modal information. In addition, the categories being discriminated were superordinate categories (animals and objects). Although participants had little sensory experience with the unfamiliar visual stimuli, prior knowledge about animals and objects may have helped the primary sensory cortex classify superordinate categories, whereas very little prior knowledge was available to help discriminate unfamiliar sounds. Therefore, another appropriate superordinate category should be identified for future studies in this area.


The sound-implying visual stimuli were expected to be more easily discriminated in the primary auditory cortex because they could more readily evoke a mental representation of sound (Meyer et al. 2010). In our study, A1 showed no advantage in classification accuracy for sound-implying images, whereas A2 performed better when classifying them; A2 may receive more cross-modal information from other areas (Petro et al. 2017). Overall, our results revealed a robust cross-modal representation in the primary auditory cortex regardless of the amount of sound information implied by the stimuli.

Different patterns induced by different modalities

We can form similar concepts of an object whether we perceive its visual appearance or its sound, suggesting that there are modality-invariant representations in the brain. The modality-invariant areas mainly include higher-order association cortices (Damasio 1989; Meyer and Damasio 2009). The posterior superior temporal sulcus (pSTS), for example, exhibits similar activity patterns when activated by visual and auditory stimuli (Man et al. 2012), and cross-classification analyses have revealed audiotactile and visuotactile invariance in the postcentral gyrus and other areas (Man et al. 2015). In the present study, cross-classification allowed us to test for visual-auditory invariance in primary sensory cortices. The 4-way cross-classification results indicated that the same category information induced different activation patterns in the early visual and auditory cortices when the stimuli came from different modalities. A study by Vetter (Vetter et al. 2014) obtained successful cross-classification between auditory perception and imagery in V1 and V2 but not in auditory cortex. In contrast to those findings, the activity patterns of the different modalities carried less shared information in our results, likely because feedback information was restricted. In addition, we focused on the category specificity of the stimuli, which may differ from the content-specific approach of previous studies. Our results are also reasonable given that our within-modality classification accuracies were not very high, which sets a ceiling for cross-classification. The 2-way cross-classification succeeded in V3 in our study, which may be due to the larger difference between objects and animals compared with the subcategories used in the 4-way cross-classification. Meyer et al. (2010) found successful classification between animals and objects in the auditory cortex in two out of four participants; however, it is not clear from their supplementary results whether the average accuracy of 0.60 for the animal-object discrimination across the 4 participants was statistically above chance. Overall, our study did not find enough modality invariance in the primary sensory cortices for successful subcategory cross-classification, but this does not mean that the patterns induced by different modalities are entirely distinct. We speculate that different modalities may activate different areas or specific neurons in the primary sensory cortex, which could lead to different representation patterns. This possibility is supported by research on higher-level cortex, such as frontoparietal cortex, where gradients have been found according to stimulus modality (Braga et al. 2017).

Conclusion

This study revealed successful representation of category-specific stimulus information from different modalities in V2, V3, and the primary and secondary auditory cortices. However, the activation pattern in V1 did not discriminate between the categories of auditory stimuli in our study, which restricted the implication of other modalities within the stimuli as well as participants' imagination and recall. Familiarity had no significant influence on the classification accuracy of animal-object images, and classifying sound-implying images showed no advantage in the primary auditory cortex. The results of the visual experiment provided evidence that the decoding capacity was robust. Considered alongside previous studies, this robust representation adds to the evidence that multisensory neurons may exist in primary sensory cortices, allowing them to respond directly to the appearance of cross-modal stimuli. Because there were fewer restrictions in the auditory experiment, familiar sounds were better classified in V2 and V3, suggesting that experiential information could also be represented. The further cross-classification analysis indicated that stimulus information is processed in separate ways in the different sensory cortices. Additionally, there might be direct connectivity between the primary auditory and visual cortices transmitting stimulus information, which will be examined in future work.

Funding  This work was supported by the National Natural Science Foundation of China (No. U1736219 and No. 61571327).

Compliance with ethical standards

Conflict of interest  Jin Gu, Baolin Liu, Xianglin Li, Peiyuan Wang, and Bin Wang declare that they have no actual or potential conflict of interest, including any financial, personal, or other relationships with other people or organizations that could inappropriately influence this work. All of the authors declare that the work described in the manuscript is original research that has not been published previously and was not under consideration for publication elsewhere, in whole or in part.

Ethical approval  This study was approved by the Research Ethics Committee of Tianjin University. All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975 and its applicable revisions at the time of the investigation.

Informed consent  Informed consent was obtained from all subjects for being included in the study.


References

Albers, A. M., Kok, P., Toni, I., Dijkerman, H. C., & de Lange, F. P. (2013). Shared representations for working memory and mental imagery in early visual cortex. Current Biology, 23(15), 1427–1431. https://doi.org/10.1016/j.cub.2013.05.065.

Axelrod, V., & Yovel, G. (2012). Hierarchical processing of face viewpoint in human visual cortex. Journal of Neuroscience, 32(7), 2442–2452. https://doi.org/10.1523/JNEUROSCI.4770-11.2012.

Beer, A. L., Plank, T., & Greenlee, M. W. (2011). Diffusion tensor imaging shows white matter tracts between human auditory and visual cortex. Experimental Brain Research, 213(2–3), 299–308. https://doi.org/10.1007/s00221-011-2715-y.

Beer, A. L., Plank, T., Meyer, G., & Greenlee, M. W. (2013). Combined diffusion-weighted and functional magnetic resonance imaging reveals a temporal-occipital network involved in auditory-visual object processing. Frontiers in Integrative Neuroscience, 7, 5. https://doi.org/10.3389/fnint.2013.00005.

Benetti, S., van Ackeren, M. J., Rabini, G., Zonca, J., Foa, V., Baruffaldi, F., et al. (2017). Functional selectivity for face processing in the temporal voice area of early deaf individuals. Proceedings of the National Academy of Sciences of the United States of America, 114(31), E6437–E6446. https://doi.org/10.1073/pnas.1618287114.

Bieler, M., Sieben, K., Cichon, N., Schildt, S., Roder, B., & Hanganu-Opatz, I. L. (2017a). Rate and temporal coding convey multisensory information in primary sensory cortices. eNeuro, 4(2). https://doi.org/10.1523/ENEURO.0037-17.2017.

Bieler, M., Sieben, K., Schildt, S., Roder, B., & Hanganu-Opatz, I. L. (2017b). Visual-tactile processing in primary somatosensory cortex emerges before cross-modal experience. Synapse, 71(6). https://doi.org/10.1002/syn.21958.

Braga, R. M., Hellyer, P. J., Wise, R. J., & Leech, R. (2017). Auditory and visual connectivity gradients in frontoparietal cortex. Human Brain Mapping, 38(1), 255–270. https://doi.org/10.1002/hbm.23358.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1–27:27.

Collignon, O., Dormal, G., Albouy, G., Vandewalle, G., Voss, P., Phillips, C., et al. (2013). Impact of blindness onset on the functional organization and the connectivity of the occipital cortex. Brain, 136(Pt 9), 2769–2783. https://doi.org/10.1093/brain/awt176.

Damasio, A. R. (1989). Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition, 33(1–2), 25–62.

de Haas, B., Schwarzkopf, D. S., Urner, M., & Rees, G. (2013). Auditory modulation of visual stimulus encoding in human retinotopic cortex. NeuroImage, 70, 258–267. https://doi.org/10.1016/j.neuroimage.2012.12.061.

Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on 'sensory-specific' brain regions, neural responses, and judgments. Neuron, 57(1), 11–23. https://doi.org/10.1016/j.neuron.2007.12.013.

Eckert, M. A., Kamdar, N. V., Chang, C. E., Beckmann, C. F., Greicius, M. D., & Menon, V. (2008). A cross-modal system linking primary auditory and visual cortices: evidence from intrinsic fMRI connectivity analysis. Human Brain Mapping, 29(7), 848–857. https://doi.org/10.1002/hbm.20560.

Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25(4), 1325–1335. https://doi.org/10.1016/j.neuroimage.2004.12.034.

Geng, X., Xu, J., Liu, B., & Shi, Y. (2018). Multivariate classification of major depressive disorder using the effective connectivity and functional connectivity. Frontiers in Neuroscience, 12, 38. https://doi.org/10.3389/fnins.2018.00038.

Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10(6), 278–285. https://doi.org/10.1016/j.tics.2006.04.008.

Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632–635. https://doi.org/10.1038/nature07832.

Hsieh, P. J., Colas, J. T., & Kanwisher, N. (2012). Spatial pattern of BOLD fMRI activation reveals cross-modal information in auditory cortex. Journal of Neurophysiology, 107(12), 3428–3432. https://doi.org/10.1152/jn.01094.2010.

Iordan, M. C., Greene, M. R., Beck, D. M., & Fei-Fei, L. (2015). Basic level category structure emerges gradually across human ventral visual cortex. Journal of Cognitive Neuroscience, 27(7), 1427–1446. https://doi.org/10.1162/jocn_a_00790.

Iordan, M. C., Greene, M. R., Beck, D. M., & Fei-Fei, L. (2016). Typicality sharpens category representations in object-selective cortex. NeuroImage, 134, 170–179. https://doi.org/10.1016/j.neuroimage.2016.04.012.

Iurilli, G., Ghezzi, D., Olcese, U., Lassi, G., Nazzaro, C., Tonini, R., et al. (2012). Sound-driven synaptic inhibition in primary visual cortex. Neuron, 73(4), 814–828. https://doi.org/10.1016/j.neuron.2011.12.026.

Kayser, C., Petkov, C. I., & Logothetis, N. K. (2008). Visual modulation of neurons in auditory cortex. Cerebral Cortex, 18(7), 1560–1574. https://doi.org/10.1093/cercor/bhm187.

Klemen, J., & Chambers, C. D. (2012). Current perspectives and methods in studying neural mechanisms of multisensory interactions. Neuroscience and Biobehavioral Reviews, 36(1), 111–133. https://doi.org/10.1016/j.neubiorev.2011.04.015.

Klinge, C., Eippert, F., Roder, B., & Buchel, C. (2010). Corticocortical connections mediate primary visual cortex responses to auditory stimulation in the blind. Journal of Neuroscience, 30(38), 12798–12805. https://doi.org/10.1523/JNEUROSCI.2384-10.2010.

Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546–1554. https://doi.org/10.1162/jocn_a_00562.

Lee, H. K., & Whitt, J. L. (2015). Cross-modal synaptic plasticity in adult primary sensory cortices. Current Opinion in Neurobiology, 35, 119–126. https://doi.org/10.1016/j.conb.2015.08.002.

Liang, M., Mouraux, A., Hu, L., & Iannetti, G. D. (2013a). Primary sensory cortices contain distinguishable spatial patterns of activity for each sense. Nature Communications, 4, 1979. https://doi.org/10.1038/ncomms2979.

Liang, M., Mouraux, A., & Iannetti, G. D. (2013b). Bypassing primary sensory cortices–a direct thalamocortical pathway for transmitting salient sensory information. Cerebral Cortex, 23(1), 1–11. https://doi.org/10.1093/cercor/bhr363.

Liang, Y., Liu, B., Xu, J., Zhang, G., Li, X., Wang, P., et al. (2017). Decoding facial expressions based on face-selective and motion-sensitive areas. Human Brain Mapping, 38(6), 3113–3125. https://doi.org/10.1002/hbm.23578.

Liang, Y., Liu, B., Li, X., & Wang, P. (2018). Multivariate pattern classification of facial expressions based on large-scale functional connectivity. Frontiers in Human Neuroscience, 12. https://doi.org/10.3389/fnhum.2018.00094.

Lo Verde, L., Morrone, M. C., & Lunghi, C. (2017). Early cross-modal plasticity in adults. Journal of Cognitive Neuroscience, 29(3), 520–529. https://doi.org/10.1162/jocn_a_01067.

Man, K., Kaplan, J. T., Damasio, A., & Meyer, K. (2012). Sight and sound converge to form modality-invariant representations in temporoparietal cortex. Journal of Neuroscience, 32(47), 16629–16636. https://doi.org/10.1523/JNEUROSCI.2342-12.2012.

Man, K., Damasio, A., Meyer, K., & Kaplan, J. T. (2015). Convergent and invariant object representations for sight, sound, and touch. Human Brain Mapping, 36(9), 3629–3640. https://doi.org/10.1002/hbm.22867.

Mesulam, M. M. (1998). From sensation to cognition. Brain, 121(Pt 6), 1013–1052. https://doi.org/10.1093/brain/121.6.1013.

Meyer, K., & Damasio, A. (2009). Convergence and divergence in a neural architecture for recognition and memory. Trends in Neurosciences, 32(7), 376–382. https://doi.org/10.1016/j.tins.2009.04.002.

Meyer, K., Kaplan, J. T., Essex, R., Webber, C., Damasio, H., & Damasio, A. (2010). Predicting visual stimuli on the basis of activity in auditory cortices. Nature Neuroscience, 13(6), 667–668. https://doi.org/10.1038/nn.2533.

Meyer, K., Kaplan, J. T., Essex, R., Damasio, H., & Damasio, A. (2011). Seeing touch is correlated with content-specific activity in primary somatosensory cortex. Cerebral Cortex, 21(9), 2113–2121. https://doi.org/10.1093/cercor/bhq289.

Petro, L. S., Paton, A. T., & Muckli, L. (2017). Contextual modulation of primary visual cortex by auditory signals. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. https://doi.org/10.1098/rstb.2016.0104.

Renier, L., De Volder, A. G., & Rauschecker, J. P. (2014). Cortical plasticity and preserved function in early blindness. Neuroscience and Biobehavioral Reviews, 41, 53–63. https://doi.org/10.1016/j.neubiorev.2013.01.025.

Ricciardi, E., Bonino, D., Pellegrini, S., & Pietrini, P. (2014). Mind the blind brain to understand the sighted one! Is there a supramodal cortical functional architecture? Neuroscience and Biobehavioral Reviews, 41, 64–77. https://doi.org/10.1016/j.neubiorev.2013.10.006.

Rohe, T., & Noppeney, U. (2016). Distinct computational principles govern multisensory integration in primary sensory and association cortices. Current Biology, 26(4), 509–514. https://doi.org/10.1016/j.cub.2015.12.056.

Sieben, K., Roder, B., & Hanganu-Opatz, I. L. (2013). Oscillatory entrainment of primary somatosensory cortex encodes visual control of tactile processing. Journal of Neuroscience, 33(13), 5736–5749. https://doi.org/10.1523/JNEUROSCI.4432-12.2013.

Sieben, K., Bieler, M., Roder, B., & Hanganu-Opatz, I. L. (2015). Neonatal restriction of tactile inputs leads to long-lasting impairments of cross-modal processing. PLoS Biology, 13(11), e1002304. https://doi.org/10.1371/journal.pbio.1002304.

Smith, F. W., & Goodale, M. A. (2015). Decoding visual object categories in early somatosensory cortex. Cerebral Cortex, 25(4), 1020–1031. https://doi.org/10.1093/cercor/bht292.

Vetter, P., Smith, F. W., & Muckli, L. (2014). Decoding sound and imagery content in early visual cortex. Current Biology, 24(11), 1256–1262. https://doi.org/10.1016/j.cub.2014.04.020.

Werner, S., & Noppeney, U. (2010). Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. Journal of Neuroscience, 30(7), 2662–2675. https://doi.org/10.1523/JNEUROSCI.5091-09.2010.

Wolbers, T., Zahorik, P., & Giudice, N. A. (2011). Decoding the direction of auditory motion in blind humans. NeuroImage, 56(2), 681–687. https://doi.org/10.1016/j.neuroimage.2010.04.266.

Xu, J., Yin, X., Ge, H., Han, Y., Pang, Z., Liu, B., et al. (2016). Heritability of the effective connectivity in the resting-state default mode network. Cerebral Cortex. https://doi.org/10.1093/cercor/bhw332.

Yang, X., Xu, J., Cao, L., Li, X., Wang, P., Wang, B., et al. (2017). Linear representation of emotions in whole persons by combining facial and bodily expressions in the extrastriate body area. Frontiers in Human Neuroscience, 11, 653. https://doi.org/10.3389/fnhum.2017.00653.

Zeidman, P., Mullally, S. L., & Maguire, E. A. (2015). Constructing, perceiving, and maintaining scenes: Hippocampal activity and connectivity. Cerebral Cortex, 25(10), 3836–3855. https://doi.org/10.1093/cercor/bhu266.

Zhang, G., Cheng, Y., & Liu, B. (2017). Abnormalities of voxel-based whole-brain functional connectivity patterns predict the progression of hepatic encephalopathy. Brain Imaging and Behavior, 11(3), 784–796. https://doi.org/10.1007/s11682-016-9553-2.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
