51S. Lacey and R. Lawson (eds.), Multisensory Imagery, DOI 10.1007/978-1-4614-5879-1_4, Springer Science+Business Media, LLC 2013
Abstract Empirical fi ndings from studies on imagery of auditory features (pitch, timbre, loudness, duration, tempo, rhythm) and imagery of auditory objects (musical contour and melody, musical key and harmony, notational audiation, speech and text, environmental stimuli) are reviewed. Potential individual differences in auditory imagery (involving vividness, auditory hallucination, development, musical ability, training) are considered. It is concluded that auditory imagery (a) preserves many of the structural and temporal properties of auditory information present in auditory (or multisensory or crossmodal) stimuli, (b) can impact subsequent responding by in fl uencing perception or by in fl uencing expectancies regarding sub-sequent stimuli, (c) involves mechanisms similar to many of those used in auditory perception, and (d) is subserved by many of the same cortical structures as is audi-tory perception.
Keywords Auditory imagery Music Speech Reading Auditory perception Individual differences Auditory features Objects
Research on auditory imagery has often involved examination of how information regarding auditory features and auditory objects is preserved in imagery and how this information in fl uences other cognitive processes. This research is reviewed, and
Chapter 4 Auditory Aspects of Auditory Imagery
Timothy L. Hubbard
T. L. Hubbard (*) Department of Psychology , Texas Christian University , Fort Worth , TX 76132 , USA e-mail: firstname.lastname@example.org
52 T.L. Hubbard
some general properties of auditory imagery are suggested. The focus is on auditory components of auditory imagery (discussion of multisensory and crossmodal components can be found in Chap. 12 ). Individual differences in auditory imagery that involve the vividness of auditory imagery, relationship of auditory imagery to auditory hallucinations, development of auditory imagery, and relationship of audi-tory imagery to musical ability and training are considered. For the purposes here, auditory imagery is de fi ned as an introspective and nonhallucinatory experience of auditory sensory qualities in the absence of a corresponding auditory stimulus (cf. Baddeley and Logie 1992 ; Intons-Peterson 1992 ; Kosslyn et al. 1995 ) . Imagery of auditory features and auditory objects is considered in Sects. 4.2 and 4.3 , respec-tively, and individual differences in auditory imagery are considered in Sect. 4.4 . Suggestions regarding properties of auditory imagery are presented in Sect. 4.5 , and some conclusions are given in Sect. 4.6 .
4.2 Auditory Features
Most discussions of auditory features involve structural or temporal properties of an auditory stimulus (e.g., wavelength, amplitude, stress pattern). Accordingly, audi-tory features considered here include pitch, timbre, loudness, duration, and tempo and rhythm.
Farah and Smith ( 1983 ) had participants listen to or image tones of 715 or 1,000 Hz and simultaneously or subsequently detect whether a target tone of 715 or 1,000 Hz was presented. Imaging the same, rather than a different, frequency facilitated detection if detection was simultaneous with, or subsequent to, image generation. Hubbard and Stoeckig ( 1988 ) had participants image a single tone that varied across trials, and after participants indicated they had an image, a tar-get tone was presented. Participants judged whether the pitch of the target tone was the same as or different from the pitch of the imaged tone. Judgments were facilitated if the pitch of the image matched the pitch of the target. Okada and Matsuoka ( 1992 ) had participants image a tone of 800 Hz and then discriminate which of fi ve target tones was presented. Discrimination was poorer if the pitch of the image matched the pitch of the tone, and this seemed inconsistent with fi ndings of Farah and Smith and of Hubbard and Stoeckig. Okada and Matsuoka suggested the difference between their fi ndings and those of Farah and Smith re fl ected the difference between detection and discrimination (cf. Finke 1986 ) ; however, it is not clear if the same applies to the difference between their fi ndings and those of Hubbard and Stoeckig.
534 Auditory Imagery
Halpern ( 1989 ) had participants indicate the fi rst pitch of a well-known melody (e.g., Somewhere Over The Rainbow ) by humming or by pressing a key (located by touch on a visually occluded keyboard). Participants generally chose lower or higher starting pitches for melodies that ascended or descended, respectively, in pitch during the fi rst few notes. The same participants were subsequently pre-sented with their previously indicated starting pitch (not identi fi ed as such) and other possible starting pitches, and they generally preferred their previously indi-cated pitch or a pitch a speci fi c musical interval from their previously indicated pitch. Halpern suggested this was consistent with a form of absolute pitch in memory for the starting note (cf. Schellenberg and Trehub 2003 ) . Intons-Peterson et al. ( 1992 ) examined relative pitches of different pairs of stimuli. Verbal descrip-tions of common objects (e.g., cat purring, door slamming) were visually pre-sented, and participants formed auditory images of the sounds indicated by those descriptions. If participants judged whether the pitch of the fi rst sound was the same as the pitch of the second sound, then response times decreased with increases in pitch distance. If participants adjusted the pitch of one sound to match the pitch of the other sound, then response times increased with increases in pitch distance.
Janata ( 2001 ) examined emitted potentials related to auditory imagery. Participants listened to the initial portion of an ascending or descending instrumen-tal phrase and then imaged a continuation of that phrase. In some conditions, subse-quent expected notes were not presented, and participants exhibited an emitted potential similar to the evoked potentials for perceived pitches. Janata and Paroo ( 2006 ; Cebrian and Janata 2010b ) presented participants with the fi rst few notes of a scale and participants imaged the remaining notes. The fi nal note was presented, and participants judged whether that note was in tune. Judgments were not in fl uenced by whether participants listened to all of the notes or imaged some of the notes, and Janata and Paroo concluded participants had a high level of pitch acuity. Pecenka and Keller ( 2009 ) found that auditory images of pitch were more likely to be mis-tuned upward than downward. Cebrian and Janata ( 2010a ) reported that, for partici-pants with more accurate images, the amplitude of the N1 in response to a target tone after a series of imaged pitches was smaller and comparable to the N1 if the preceding pitches had been perceived. The P3a response to mistuned targets was larger for participants who formed more accurate auditory images. Also, auditory images were more accurate for pitches more closely associated with the tonal context than for pitches that were less closely associated (cf. Vuvan and Schmuckler 2011 ) .
Numerous studies suggest representation of perceived pitch involves a vertical dimension (e.g., Eitan and Granot 2006 ; Spence 2011 ) , and Elkin and Leuthold ( 2011 ) examined whether representation of imaged pitch similarly involved a ver-tical dimension. Participants compared the pitch of a perceived tone or an imaged tone with the pitch of a perceived comparison tone, and the response keys were oriented horizontally or vertically. Whether a pitch should be imaged at 1,000, 1,500, or 2,000 Hz was cued by visual presentation of the letters A, B, or C, respec-tively. For imagery and for perception, judgments of tones closer in pitch yielded
54 T.L. Hubbard
longer response times and higher error rates than did judgments of tones more distant in pitch (cf. Intons-Peterson et al. 1992 ) . Also, responses were faster if lower-pitched imaged tones involved a bottom-key or left-key response and higher-pitched imaged tones involved a top-key or right-key response, and Elkin and Leuthold suggested tones were coded spatially (cf. Keller et al. 2010 ; Keller and Koch 2008 ) . Such a pattern is consistent with the SMARC (spatial-musical association of response codes) effect in perceived pitch (e.g., see Rusconi et al. 2006 ) and with the hypothesis that imaged pitch involves the same properties as perceived pitch.
Crowder ( 1989 ) presented a sine wave tone and instructed participants to image that tone in the timbre of a speci fi c musical instrument (e.g., fl ute, guitar, piano). Participants judged whether the pitch of a subsequently perceived tone matched the pitch of the image; response times were faster if the subsequent tone was in the timbre participants had been instructed to image. Pitt and Crowder ( 1992 ; Crowder and Pitt 1992 ) found spectral elements of imaged timbre, but not dynamic elements of imaged timbre, in fl uenced response times to judgments of subsequently pre-sented tones. Halpern et al. ( 2004 ) had participants rate similarities of pairs of musical instrument timbres. Ratings of imaged timbres were highly correlated with ratings of perceived timbres. Timbre perception activated primary and secondary auditory cortex, but timbre imagery only activated secondary auditory cortex. Timbre imagery and perception activated superior temporal area, and timbre imag-ery also activated supplementary mo