Music and the Brain

What is the secret of music's strange power? Seeking an answer, scientists are piecing together a picture of what happens in the brains of listeners and musicians

By Norman M. Weinberger

Music surrounds us--and we wouldn't have it any other way. An exhilarating orchestral crescendo can bring tears to our eyes and send shivers down our spines. Background swells add emotive punch to movies and TV shows. Organists at ballgames bring us together, cheering, to our feet. Parents croon soothingly to infants.

And our fondness has deep roots: we have been making music since the dawn of culture. More than 30,000 years ago early humans were already playing bone flutes, percussive instruments and jaw harps--and all known societies throughout the world have had music. Indeed, our appreciation appears to be innate. Infants as young as two months will turn toward consonant, or pleasant, sounds and away from dissonant ones. And when a symphony's denouement gives delicious chills, the same kinds of pleasure centers of the brain light up as they do when eating chocolate, having sex or taking cocaine.

Therein lies an intriguing biological mystery: Why is music--universally beloved and uniquely powerful in its ability to wring emotions--so pervasive and important to us? Could its emergence have enhanced human survival somehow, such as by aiding courtship, as Geoffrey F. Miller of the University of New Mexico has proposed? Or did it originally help us by promoting social cohesion in groups that had grown too large for grooming, as suggested by Robin M. Dunbar of the University of Liverpool? On the other hand, to use the words of Harvard University's Steven Pinker, is music just "auditory cheesecake"--a happy accident of evolution that happens to tickle the brain's fancy?


Neuroscientists don't yet have the ultimate answers. But in recent years we have begun to gain a firmer understanding of where and how music is processed in the brain, which should lay a foundation for answering evolutionary questions. Collectively, studies of patients with brain injuries and imaging of healthy individuals have unexpectedly uncovered no specialized brain "center" for music. Rather, music engages many areas distributed throughout the brain, including those that are normally involved in other kinds of cognition. The active areas vary with the person's individual experiences and musical training. The ear has the fewest sensory cells of any sensory organ--3,500 inner hair cells occupy the ear versus 100 million photoreceptors in the eye. Yet our mental response to music is remarkably adaptable; even a little study can "retune" the way the brain handles musical inputs.

Inner Songs

Until the advent of modern imaging techniques, scientists gleaned insights about the brain's inner musical workings mainly by studying patients--including famous composers--who had experienced brain deficits as a result of injury, stroke or other ailments. For example, in 1933 French composer Maurice Ravel began to exhibit symptoms of what might have been focal cerebral degeneration, a disorder in which discrete areas of brain tissue atrophy. His conceptual abilities remained intact--he could still hear and remember his old compositions and play scales. But he could not write music. Speaking of his proposed opera Jeanne d'Arc, Ravel confided to a friend, "...this opera is here, in my head. I hear it, but I will never write it. It's over. I can no longer write my music." Ravel died four years later, following an unsuccessful neurosurgical procedure. The case lent credence to the idea that the brain might not have a specific center for music.

The experience of another composer additionally suggested that music and speech were processed independently. After suffering a stroke in 1953, Vissarion Shebalin, a Russian composer, could no longer talk or understand speech, yet he retained the ability to write music until his death 10 years later. Thus, the supposition of independent processing appears to be true, although more recent work has yielded a more nuanced understanding, relating to two of the features that music and language share: both are a means of communication, and each has a syntax, a set of rules that govern the proper combination of elements (notes and words, respectively). According to Aniruddh D. Patel of the Neurosciences Institute in San Diego, imaging findings suggest that a region in the frontal lobe enables proper construction of the syntax of both music and language, whereas other parts of the brain handle related aspects of language and music processing.

Imaging studies have also given us a fairly fine-grained picture of the brain's responses to music. These results make the most sense when placed in the context of how the ear conveys sounds in general to the brain. Like other sensory systems, the one for hearing is arranged hierarchically, consisting of a string of neural processing stations from the ear to the highest level, the auditory cortex. The processing of sounds, such as musical tones, begins with the inner ear (cochlea), which sorts complex sounds produced by, say, a violin, into their constituent elementary frequencies. The cochlea then transmits this information along separately tuned fibers of the auditory nerve as trains of neural discharges. Eventually these trains reach the auditory cortex in the temporal lobe. Different cells in the auditory system of the brain respond best to certain frequencies; neighboring cells have overlapping tuning curves so that there are no gaps. Indeed, because neighboring cells are tuned to similar frequencies, the auditory cortex forms a "frequency map" across its surface.
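To make "sorting complex sounds into their constituent elementary frequencies" concrete, here is a minimal, purely illustrative Python sketch (not from the article) that mixes two pure tones and recovers their frequencies with a Fourier transform, a rough computational stand-in for the cochlea's mechanical frequency analysis.

```python
import numpy as np

# Synthesize a crude "complex tone": two pure components, 440 Hz and 660 Hz,
# standing in for the many partials of a real violin note.
fs = 44_100                      # sample rate in Hz
t = np.arange(fs) / fs           # one second of time samples
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

# A Fourier transform separates the mixture back into its elementary frequencies,
# loosely analogous to the cochlea's frequency decomposition.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks.tolist()))    # -> [440.0, 660.0]
```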

The response to music per se, though, is more complicated. Music consists of a sequence of tones, and perception of it depends on grasping the relationships between sounds. Many areas of the brain are involved in processing the various components of music. Consider tone, which encompasses both the frequencies and loudness of a sound. At one time, investigators suspected that cells tuned to a specific frequency always responded the same way when that frequency was detected.

But in the late 1980s Thomas M. McKenna and I, working in my laboratory at the University of California at Irvine, raised doubts about that notion when we studied contour, which is the pattern of rising and falling pitches that is the basis for all melodies. We constructed melodies consisting of different contours using the same five tones and then recorded the responses of single neurons in the auditory cortices of cats. We found that cell responses (the number of discharges) varied with the contour. Responses depended on the location of a given tone within a melody; cells may fire more vigorously when that tone is preceded by other tones rather than when it is the first. Moreover, cells react differently to the same tone when it is part of an ascending contour (low to high tones) than when it is part of a descending or more complex one. These findings show that the pattern of a melody matters: processing in the auditory system is not like the simple relaying of sound in a telephone or stereo system.
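To make the notion of contour concrete, here is a small, purely illustrative Python sketch (the tone frequencies are arbitrary, not those used in the experiments) that reduces two orderings of the same five tones to their patterns of rising and falling steps, the feature that varied across the melodies in these studies.

```python
# Contour = the pattern of rising (+) and falling (-) steps between successive tones.
# The frequencies below are arbitrary example values, not those from the study.
def contour(melody_hz):
    return ["+" if b > a else "-" if b < a else "0"
            for a, b in zip(melody_hz, melody_hz[1:])]

ascending = [262, 294, 330, 392, 440]        # the same five tones, ascending contour
reordered = [330, 262, 440, 294, 392]        # the same five tones, more complex contour

print(contour(ascending))   # ['+', '+', '+', '+']
print(contour(reordered))   # ['-', '+', '-', '+']
```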

Although most research has focused on melody, rhythm (the relative lengths and spacing of notes), harmony (the relation of two or more simultaneous tones) and timbre (the characteristic difference in sound between two instruments playing the same tone) are also of interest. Studies of rhythm have concluded that one hemisphere is more involved, although they disagree on which hemisphere. The problem is that different tasks and even different rhythmic stimuli can demand different processing capacities. For example, the left temporal lobe seems to process briefer stimuli than the right temporal lobe and so would be more involved when the listener is trying to discern rhythm while hearing briefer musical sounds.

The situation is clearer for harmony. Imaging studies of the cerebral cortex find greater activation in the auditory regions of the right temporal lobe when subjects are focusing on aspects of harmony. Timbre also has been "assigned" a right temporal lobe preference. Patients whose temporal lobe has been removed (such as to eliminate seizures) show deficits in discriminating timbre if tissue from the right, but not the left, hemisphere is excised. In addition, the right temporal lobe becomes active in normal subjects when they discriminate between different timbres.

Brain responses also depend on the experiences and training of the listener. Even a little training can quickly alter the brain's reactions. For instance, until about 10 years ago, scientists believed that tuning was "fixed" for each cell in the auditory cortex. Our studies on contour, however, made us suspect that cell tuning might be altered during learning so that certain cells become extra sensitive to sounds that attract attention and are stored in memory.


To find out, Jon S. Bakin, Jean-Marc Edeline and I conducted a series of experiments during the 1990s in which we asked whether the basic organization of the auditory cortex changes when a subject learns that a certain tone is somehow important. Our group first presented guinea pigs with many different tones and recorded the responses of various cells in the auditory cortex to determine which tones produced the greatest responses. Next, we taught the subjects that a specific, nonpreferred tone was important by making it a signal for a mild foot shock. The guinea pigs learned this association within a few minutes. We then determined the cells' responses again, immediately after the training and at various times up to two months later. The neurons' tuning preferences had shifted from their original frequencies to that of the signal tone. Thus, learning retunes the brain so that more cells respond best to behaviorally important sounds. This cellular adjustment process extends across the cortex, "editing" the frequency map so that a greater area of the cortex processes important tones. One can tell which frequencies are important to an animal simply by determining the frequency organization of its auditory cortex.

The retuning was remarkably durable: it became stronger over time without additional training and lasted for months. These findings initiated a growing body of research indicating that one way the brain stores the learned importance of a stimulus is by devoting more brain cells to the processing of that stimulus. Although it is not possible to record from single neurons in humans during learning, brain-imaging studies can detect changes in the average magnitude of responses of thousands of cells in various parts of the cortex. In 1998 Ray Dolan and his colleagues at University College London trained human subjects in a similar type of task by teaching them that a particular tone was significant. The group found that learning produces the same type of tuning shifts seen in animals. The long-term effects of learning by retuning may help explain why we can quickly recognize a familiar melody in a noisy room and also why people suffering memory loss from neurodegenerative diseases such as Alzheimer's can still recall music that they learned in the past.

Even when incoming sound is absent, we all can "listen" by recalling a piece of music. Think of any piece you know and "play" it in your head. Where in the brain is this music playing? In 1999 Andrea R. Halpern of Bucknell University and Robert J. Zatorre of the Montreal Neurological Institute at McGill University conducted a study in which they scanned the brains of nonmusicians who either listened to music or imagined hearing the same piece of music. Many of the same areas in the temporal lobes that were involved in listening to the melodies were also activated when those melodies were merely imagined.

Well-Developed Brains

Studies of musicians have extended many of the findings noted above, dramatically confirming the brain's ability to revise its wiring in support of musical activities. Just as some training increases the number of cells that respond to a sound when it becomes important, prolonged learning produces more marked responses and physical changes in the brain. Musicians, who usually practice many hours a day for years, show such effects--their responses to music differ from those of nonmusicians; they also exhibit hyperdevelopment of certain areas in their brains.

Christo Pantev, then at the University of Münster in Germany, led one such study in 1998. He found that when musicians listen to a piano playing, about 25 percent more of their left-hemisphere auditory regions respond than do so in nonmusicians. This effect is specific to musical tones and does not occur with similar but nonmusical sounds. Moreover, the authors found that this expansion of response area is greater the younger the age at which lessons began. Studies of children suggest that early musical experience may facilitate development. In 2004 Antoine Shahin, Larry E. Roberts and Laurel J. Trainor of McMaster University in Ontario recorded brain responses to piano, violin and pure tones in four- and five-year-old children. Youngsters who had received greater exposure to music in their homes showed enhanced brain auditory activity, comparable to that of unexposed kids about three years older.

Musicians may display greater responses to sounds, in part because their auditory cortex is more extensive. Peter Schneider and his co-workers at the University of Heidelberg in Germany reported in 2002 that the volume of this cortex in musicians was 130 percent larger. The percentages of volume increase were linked to levels of musical training, suggesting that learning music proportionally increases the number of neurons that process it.

In addition, musicians' brains devote more area toward motor control of the fingers used to play an instrument. In 1995 Thomas Elbert of the University of Konstanz in Germany and his colleagues reported that the brain regions that receive sensory inputs from the second to fifth (index to pinkie) fingers of the left hand were significantly larger in violinists; these are precisely the fingers used to make rapid and complex movements in violin playing. In contrast, they observed no enlargement of the areas of the cortex that handle inputs from the right hand, which controls the bow and requires no special finger movements. Nonmusicians do not exhibit these differences. Further, Pantev, now at the Rotman Research Institute at the University of Toronto, reported in 2001 that the brains of professional trumpet players react in such an intensified manner only to the sound of a trumpet--not, for example, to that of a violin.

Musicians also must develop greater ability to use both hands, particularly for keyboard playing. Thus, one might expect that this increased coordination between the motor regions of the two hemispheres has an anatomical substrate. That seems to be the case. The anterior corpus callosum, which contains the band of fibers that interconnects the two motor areas, is larger in musicians than in nonmusicians. Again, the extent of increase is greater the earlier the music lessons began. Other studies suggest that the actual size of the motor cortex, as well as that of the cerebellum--a region at the back of the brain involved in motor coordination--is greater in musicians.

Ode to Joy--or Sorrow

Beyond examining how the brain processes the auditory aspects of music, investigators are exploring how it evokes strong emotional reactions. Pioneering work in 1991 by John A. Sloboda of Keele University in England revealed that more than 80 percent of sampled adults reported physical responses to music, including thrills, laughter or tears. In a 1995 study by Jaak Panksepp of Bowling Green State University, 70 percent of several hundred young men and women polled said that they enjoyed music "because it elicits emotions and feelings." Underscoring those surveys was the result of a 1997 study by Carol L. Krumhansl of Cornell University. She and her co-workers recorded heart rate, blood pressure, respiration and other physiological measures during the presentation of various pieces that were considered to express happiness, sadness, fear or tension. Each type of music elicited a different but consistent pattern of physiological change across subjects.

Until recently, scientists knew little about the brain mechanisms involved. One clue, though, comes from a woman known as I. R. (initials are used to maintain privacy), who suffered bilateral damage to her temporal lobes, including auditory cortical regions. Her intelligence and general memory are normal, and she has no language difficulties. Yet she can neither make sense of nor recognize any music, whether it is a previously known piece or a new piece that she has heard repeatedly. She cannot distinguish between two melodies no matter how different they are. Nevertheless, she has normal emotional reactions to different types of music; her ability to identify an emotion with a particular musical selection is completely normal! From this case we learn that the temporal lobe is needed to comprehend melody but not to produce an emotional reaction, which is subcortical and also involves aspects of the frontal lobes.

An imaging experiment in 2001 by Anne Blood and Zatorre of McGill sought to better specify the brain regions involved in emotional reactions to music. This study used mild emotional stimuli, those associated with people's reactions to musical consonance versus dissonance. Consonant musical intervals are generally those for which a simple ratio of frequencies exists between two tones. An example is middle C (about 260 hertz, or Hz) and middle G (about 390 Hz). Their ratio is 2:3, forming a pleasant-sounding "perfect fifth" interval when they are played simultaneously. In contrast, middle C and C sharp (about 277 Hz) have a "complex" ratio of about 15:16 and are considered unpleasant, having a "rough" sound.
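The interval arithmetic is easy to verify. The Python sketch below (purely illustrative) takes the approximate frequencies quoted in the text and finds the nearest small-integer ratio for each interval.

```python
from fractions import Fraction

# Approximate frequencies quoted in the text (Hz).
middle_c, middle_g, c_sharp = 260.0, 390.0, 277.0

# A consonant perfect fifth: C to G is very nearly the simple ratio 2:3.
print(Fraction(middle_c / middle_g).limit_denominator(20))   # -> 2/3

# The dissonant semitone C to C-sharp lands on a more complex ratio.
print(Fraction(middle_c / c_sharp).limit_denominator(20))    # -> 15/16
```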

What are the underlying brain mechanisms of that experience? PET (positron emission tomography) imaging conducted while subjects listened to consonant or dissonant chords showed that different localized brain regions were involved in the emotional reactions. Consonant chords activated the orbitofrontal area (part of the reward system) of the right hemisphere and also part of an area below the corpus callosum. In contrast, dissonant chords activated the right parahippocampal gyrus. Thus, at least two systems, each dealing with a different type of emotion, are at work when the brain processes emotions related to music. How the different patterns of activity in the auditory system might be specifically linked to these differentially reactive regions of the hemispheres remains to be discovered.

In the same year, Blood and Zatorre added a further clue to how music evokes pleasure. When they scanned the brains of musicians who had chills of euphoria when listening to music, they found that music activated some of the same reward systems that are stimulated by food, sex and addictive drugs.

Overall, findings to date indicate that music has a biological basis and that the brain has a functional organization for music. It seems fairly clear, even at this early stage of inquiry, that many brain regions participate in specific aspects of music processing, whether supporting perception (such as apprehending a melody) or evoking emotional reactions. Musicians appear to have additional specializations, particularly hyperdevelopment of some brain structures. These effects demonstrate that learning retunes the brain, increasing both the responses of individual cells and the number of cells that react strongly to sounds that become important to an individual. As research on music and the brain continues, we can anticipate a greater understanding not only about music and its reasons for existence but also about how multifaceted it really is.

Do You Hear What I Hear?

By Paul D. Lehrman

LEARNING TO LISTEN IN A MEDIATED WORLD

There's a priceless moment on the Firesign Theatre's third album when an authority figure (a prosecutor who is somehow also an auctioneer) bellows, "What do I hear?" and a stoned voice from the back of the room responds, "That's metaphysically absurd, man. How can I know what you hear?"

This brings to mind two questions. First of all, as we're professionals who depend on our hearing to produce sounds that will appeal to other people's ears, how do we know what our audience is actually hearing? And second, for that matter, how do we know what we're hearing? These two questions are becoming even more pressing today, as most music listeners are enjoying sounds on lo-fi playback systems or headphones far below the quality of studio monitors.

When it comes to our audience, you might as well ask, "What do you mean by green?" Physicists can agree on a range of wavelengths for green, while everyone else can point to different objects and get a general consensus from those around them that said objects are or are not the color in question. But no one can possibly put themselves into someone else's mind to see exactly how they experience green. As conscious beings, our perceptions are ours alone. Lily Tomlin's character Trudy the Bag Lady, in The Search for Signs of Intelligent Life in the Universe, put it perfectly when she said, "Reality is nothing but a collective hunch."

Similarly with sound, we can measure its volume, look at its spectrum, see how it changes over time and analyze the impulse response of the space in which it's produced. But there's that subjective response to the sound that's within our heads that can't be measured -- at least not without a sensor on every brain cell and synapse involved.

Because we're in the business of shaping the reality of sounds, it's fairly important that our hunches be correct. And it's our ears that we trust. No amount of visual or data analysis will allow us to decide that a sound is right without hearing it.

How do we make that decision? A crucial part of the act of hearing is making comparisons between what our ears are telling us at the moment and the models that live in our memory of what we've heard before. From the moment our auditory faculties first kick in, those memories are established and baselines are formed. The first sounds all humans hear are their mothers', and then they hear other family members, then domestic sounds, and gradually they take in the larger world outside. I imagine it's a safe bet to say that for most of us in this business, among those earliest aural experiences were the sounds of singing and musical instruments. Not only did these sounds intrigue and inspire us, but they provided us with the context in which we would listen to and judge the sounds we would work with in our professional lives.

So we know what things are supposed to sound like. As professionals, we learn something else: What we're hearing through the studio monitors isn't the same as what we hear when there's a direct acoustic path from the sound source to our ears. Ideally, speakers would be totally flat with no distortion or phase error and with perfect dispersion, but even the best monitors are still far from being totally transparent. In addition, every indoor space that's not an anechoic chamber has its peculiar colorations, which are different from any other space. We need to be able to compensate for these distortions, consciously or unconsciously, and block out the sound of the speakers and the room as we listen. Our experience and training as professionals teach us how to eliminate the medium and concentrate on the source.

But this weird thing has happened in the past hundred or so years, and the trend is accelerating: The proportion of musical sounds that people are exposed to throughout their lives that are produced by organic means has been decreasing and is quickly approaching zero. This means that the baselines that we, and our audiences, need to determine what sounds real and what doesn't are disappearing.

Before the end of the 19th century, the only music anyone heard was performed live. The sound that reached an audience member's ears was that of the instruments and the singers, with nothing mediating between the mechanism of production -- whether it was a stick hitting a dried goatskin, the plucking of a taut piece of feline intestine or the vibrations of a set of vocal cords -- and the mechanism of perception.

But with the invention of the radio and the phonograph, all of that changed. People could now listen to music 24 hours a day, every day if they wanted, and be nowhere near actual musicians. Compared to real instruments, wax cylinders and crystal sets sounded dreadful, but the convenience of hearing a huge variety of music at any time without leaving home more than made up for the loss in quality for most people.

The hi-fi boom that started in the 1950s improved things, as listeners began to appreciate better sound reproduction and the price of decent-sounding equipment fell to where even college students -- who soon became the music industry's most important market -- could afford it. Today's high-end and even medium-priced home audio equipment sounds better than ever.

But as the media for music delivery have blossomed -- from wax cylinders to XM Radio -- fewer people experience hearing acoustic music. Symphony orchestras are cutting back seasons or going out of business altogether all over America, and school music programs, which traditionally have given students the precious opportunity to hear what real instruments sound like from both a player's and a listener's perspective, are in the toilet. While there are certainly parts of the live music scene that are still healthy, they depend on sound systems that, as they get bigger and more complex to project to the farthest reaches of a large venue, serve to isolate the audiences even more from what's happening onstage acoustically.

And, as electronic sources of music have become more prolific, another thing has happened: Because it is now so easy to listen to music, people actually listen to it less, and it has become more of an environmental element, like aural wallpaper. Because audiences aren't focusing so much on the music, the quality of the systems that many listen to has been allowed to slip backward. Personal stereos have been a major factor in this: From the Sony Walkman to the iPod, people are listening to crummy sound reproduction at top volume, screening out any kind of sonic reality and replacing it with a lo-fi sound. Everyone can now have their own private soundtrack, as if they were perpetually walking alone through a theme park, without any other aural distractions, with a 15 dB dynamic range and nothing below 100 Hz.

I remember this hitting me like a ton of bricks one day in the summer of 1979. I had been out of the country for a few months, and soon after I returned to the U.S., I was walking in New York City's Central Park and came upon an amazing picture: On a patch of blacktop were several dozen gyrating disco-dancing roller skaters, but the only sound I could hear was that of the skate wheels on the pavement. Each of the dancers was sporting a pair of headphones with little antennae coming out of them. Inside each of the headphones, I soon realized, was an FM radio, and they were all dancing to music that I couldn't hear. But it became obvious after I watched them for a few minutes that they weren't all dancing to the same music; each was tuned to a different station.

The multimedia speaker systems that people now plug into their computers so they can listen to MP3 streams have taken us further down the same road. Companies that decades ago revolutionized speaker designs -- such as Advent, KLH and Altec Lansing -- have had their brands swallowed up by multinational electronics foundries that slap those once-revered names on tinny little underpowered speakers connected to subwoofers that produce a huge hump at 120 Hz so that consumers think they're getting something for their money.

More recently, the tools of personal audio wallpaper have entered the production chain. Again, one incident sticks out in my mind that showed me clearly where this was going: A couple of years ago, I went into a hip coffeehouse where the blaring post-punk music makes it impossible to hold a normal conversation and sat down at a table near a young man wearing earbuds and peering intently into a PowerBook. I glanced over, and to my amazement, I realized he was working on something in Digital Performer.

How many composers live in apartment buildings where they work late into the night and, for fear of disturbing their neighbors, never turn on their monitors but only mix on headphones? How many of your colleagues, or even you, boast of doing some of your best audio editing on a transcontinental plane flight?

A pessimist looking at this might conclude that we are approaching a kind of perfect storm in which we completely lose control over what our audience hears. No one ever finds out what a real instrument sounds like; the systems that we use to reproduce and disseminate music are getting worse. And because most people don't even listen closely to music anymore, they don't care.

In my own teaching, I've seen how the lack of proper aural context results in an inability to discriminate between good and bad, real and not-real sound. In one of my school's music labs, I use a 14-year-old synth that, although I really like it as a teaching tool, I'll be the first to admit has a factory program set that is a little dated. But one of my students recently said, "The sounds are so realistic, why would anyone need to use anything else?"

There are nine workstations in that lab, which means the students have to work on headphones. We use pretty decent closed-ear models, and the students generally don't have any complaints. That is, until we play back their assignments on the room's powered speakers. "Why does it sound so incredibly different?" one will invariably ask. I take this as a splendid opportunity to teach them something about acoustics: how reflections and room modes affect bass response, the role of head effects in stereo imaging and so on. They dutifully take it in, but then they say, "Yes, but why does it sound so incredibly different?" The idea of the music and the medium being separate from each other sometimes just doesn't sink in.

If you're looking for an answer or even a conclusion here, I haven't got one. But I do know that the next generation of audio engineers and mixers -- if there's going to be one -- will have a hard time if they don't have more exposure than the average young person to natural, unamplified and unprocessed sound. If every sound we ever hear comes through a medium (and most of them suck), then how are we ever going to agree on what we hear?

Which means that our ears and our judgment are still all we have. Try to take care of both of them. And keep listening and keep learning.

How We Localize Sound

Relying on a variety of cues, including intensity, timing, and spectrum, our brains recreate a three-dimensional image of the acoustic landscape from the sounds we hear.

By William M. Hartmann

For as long as we humans have lived on Earth, we have been able to use our ears to localize the sources of sounds. Our ability to localize warns us of danger and helps us sort out individual sounds from the usual cacophony of our acoustical world. Characterizing this ability in humans and other animals makes an intriguing physical, physiological, and psychological study (see figure 1). John William Strutt (Lord Rayleigh) understood at least part of the localization process more than 120 years ago.1 He observed that if a sound source is to the right of the listener's forward direction, then the left ear is in the shadow cast by the listener's head. Therefore, the signal in the right ear should be more intense than the signal in the left one, and this difference is likely to be an important clue that the sound source is located on the right.

Interaural level difference

The standard comparison between intensities in the left and right ears is known as the interaural level difference (ILD). In the spirit of the spherical cow, a physicist can estimate the size of the effect by calculating the acoustical intensity at opposite poles on the surface of a sphere, given an incident plane wave, and then taking the ratio. The level difference is that ratio expressed in decibels. As shown in figure 2, the ILD is a strong function of frequency over much of the audible spectrum (canonically quoted as 20-20,000 Hz). That is because sound waves are effectively diffracted when their wavelength is longer than the diameter of the head. At a frequency of 500 Hz, the wavelength of sound is 69 cm -- four times the diameter of the average human head. The ILD is therefore small for frequencies below 500 Hz, as long as the source is more than a meter away. But the scattering by the head increases rapidly with increasing frequency, and at 4000 Hz the head casts a significant shadow.
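The wavelength comparison driving that frequency dependence is easy to reproduce. The short Python sketch below is an illustration only, not the article's spherical-scattering calculation; the 344 m/s sound speed and 17.5 cm head diameter are nominal values consistent with the figures quoted in the text.

```python
# Crude comparison of wavelength to head size (a sketch, not the article's model).
SPEED_OF_SOUND = 344.0   # m/s, consistent with the 34,400 cm/s used later in the text
HEAD_DIAMETER = 0.175    # m, roughly the average head diameter implied by the text

for freq_hz in (125, 250, 500, 1000, 2000, 4000):
    wavelength_m = SPEED_OF_SOUND / freq_hz
    print(f"{freq_hz:5d} Hz: wavelength = {wavelength_m*100:5.1f} cm "
          f"({wavelength_m / HEAD_DIAMETER:4.1f} head diameters)")
```

At 500 Hz the wavelength comes out near 69 cm, about four head diameters, which is why the ILD is small there; by 4000 Hz the wavelength is only about half a head diameter and the head shadows effectively.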

Ultimately, the use of an ILD, small or large, depends on the sensitivity of the central nervous system to such differences. In evolutionary terms, it would make sense if the sensitivity of the central nervous system somehow reflected the ILD values that are actually physically present. In fact, that does not appear to be the case. Psychoacoustical experiments find that the central nervous system is about equally sensitive at all frequencies. The smallest detectable change in ILD is approximately 0.5 dB, no matter what the frequency.2 Therefore the ILD is a potential localization cue at any frequency where it is physically greater than a decibel. It is as though Mother Nature knew in advance that her offspring would walk around the planet listening to portable music through headphones. The spherical-head model is obviously a simplification. Human heads include a variety of secondary scatterers that can be expected to lead to structure in the higher-frequency dependence of the ILD. Conceivably, this structure can serve as an additional cue for sound localization. As it turns out, that is exactly what happens, but that is another story for later in this article.

In the long-wavelength limit, the spherical-head model correctly predicts that the ILD should become uselessly small. If sounds are localized on the basis of ILD alone, it should be very difficult to localize a sound with a frequency content that is entirely below 500 Hz. It therefore came as a considerable surprise to Rayleigh to discover that he could easily localize a steady-state low-frequency pure tone such as 256 or 128 Hz. Because he knew that localization could not be based on ILD, he finally concluded in 1907 that the ear must be able to detect the difference in waveform phases between the two ears.3

Interaural time difference

For a pure tone like Rayleigh used, a difference in phases is equivalent to a difference in arrival times of waveform features (such as peaks and positive-going zero crossings) at the two ears. A phase difference Δφ corresponds to an interaural time difference (ITD) of Δt = Δφ/(2πf) for a tone with frequency f. In the long-wavelength limit, the formula for diffraction by a sphere4 gives the interaural time difference Δt as a function of the azimuthal (left-right) angle θ:

Δt = (3a/c) sin θ,   (1)

where a is the radius of the head (approximately 8.75 cm) and c is the speed of sound (34 400 cm/s). Therefore, 3a/c = 763 μs.
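For readers who want to plug numbers into equation 1, here is a short Python sketch (illustrative only) that uses just the head radius and speed of sound quoted above.

```python
import math

HEAD_RADIUS_CM = 8.75          # a, from the text
SPEED_OF_SOUND_CM_S = 34_400   # c, from the text

def itd_microseconds(azimuth_deg: float) -> float:
    """Interaural time difference from equation 1: dt = (3a/c) * sin(theta)."""
    theta = math.radians(azimuth_deg)
    return 3 * HEAD_RADIUS_CM / SPEED_OF_SOUND_CM_S * math.sin(theta) * 1e6

for az in (0, 1, 10, 45, 90):
    print(f"azimuth {az:3d} deg -> ITD = {itd_microseconds(az):6.1f} microseconds")
```

The 90° case reproduces the 763 μs maximum, and the 1° case gives roughly 13 μs, the value discussed in the next paragraph.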

Psychoacoustical experiments show that human listeners can localize a 500 Hz sine tone with considerable accuracy. Near the forward direction (θ near zero), listeners are sensitive to differences as small as 1-2°. The idea that this sensitivity is obtained from an ITD initially seems rather outrageous. A 1° difference in azimuth corresponds to an ITD of only 13 μs. It hardly seems possible that a neural system, with synaptic delays on the order of a millisecond, could successfully encode such small time differences. However, the auditory system, unaware of such mathematical niceties, goes ahead and does it anyway. This ability can be proved in headphone experiments, in which the ITD can be presented independently of the ILD. The key to the brain's success in this case is parallel processing. The binaural system apparently beats the unfavorable timing dilemma by transmitting timing information through many neurons. Estimates of the number of neurons required, based on statistical decision theory, have ranged from 6 to 40 for each one-third-octave frequency band.

There remains the logical problem of just how the auditory system manages to use ITDs. There is now good evidence that the superior olive -- a processing center, or nucleus, in the midbrain -- is able to perform a cross-correlation operation on the signals in the two ears, as described in the box below. The headphone experiments with an ITD give the listener a peculiar experience. The position of the image is located to the left or right as expected, depending on the sign of the ITD, but the image seems to be within the listener's head -- it is not perceived to be in the real external world. Such an image is said to be lateralized and not localized. Although the lateralized headphone sensation is quite different from the sensation of a localized source, experiments show that lateralization is intimately connected to localization.

Figure 1. The sound localization facility at Wright Patterson Air Force Base in Dayton, Ohio, is a geodesic sphere, nearly 5 m in diameter, housing an array of 277 loudspeakers. Each speaker has a dedicated power amplifier, and the switching logic allows the simultaneous use of as many as 15 sources. The array is enclosed in a 6 m cubical anechoic room: Foam wedges 1.2 m long on the walls of the room make the room strongly absorbing for wavelengths shorter than about 5 m, or frequencies above 70 Hz. Listeners in localization experiments indicate perceived source directions by placing an electromagnetic stylus on a small globe. (Courtesy of Mark Ericson and Richard McKinley.)

Using headphones, one can measure the smallest detectable change in ITD as a function of the ITD itself. These ITD data can be used with equation 1 to predict the smallest detectable change in azimuth for a real source as a function of θ. When the actual localization experiment is done with a real source, the results agree with the predictions, as is to be expected if the brain relies on ITDs to make decisions about source location.

Like any phase-sensitive system, the binaural phase detector that makes possible the use of ITDs suffers from phase ambiguity when the wavelength is comparable to the distance between the two measurements. This problem is illustrated in figure 3. The equivalent temporal viewpoint is that, to avoid ambiguity, a half period of the wave must be longer than the delay between the ears. When the delay is exactly half a period, the signals at the two ears are exactly out of phase and the ambiguity is complete. For shorter periods, between twice the delay and the delay itself, the ITD leads to an apparent source location that is on the opposite side of the head compared to the true location. It would be better to have no ITD sensitivity at all than to have a process that gives such misleading answers. In fact, the binaural system solves this problem in what appears to be the best possible way: The binaural system rapidly loses sensitivity to any ITD at all as the frequency of the wave increases from 1000 to 1500 Hz -- exactly the range in which the interaural phase difference becomes ambiguous.
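A quick calculation, illustrative only and simply combining equation 1 with the half-period condition stated above, shows where that ambiguity sets in for sources at different azimuths.

```python
import math

MAX_ITD_S = 763e-6   # 3a/c from equation 1

# The ambiguity is complete when the interaural delay equals half a period,
# i.e. at frequency f = 1 / (2 * delay). Larger azimuths give longer delays
# and therefore lower ambiguity frequencies.
for azimuth_deg in (90, 60, 45, 30):
    delay = MAX_ITD_S * math.sin(math.radians(azimuth_deg))
    print(f"azimuth {azimuth_deg:2d} deg: ambiguity near {1 / (2 * delay):4.0f} Hz")
```

For moderate-to-large azimuths the ambiguity frequency lands at or below the 1000 to 1500 Hz range over which, as the text notes, the binaural system gives up on ITDs.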

One might imagine that the network of delay lines and coincidence detectors described in the box vanishes at frequencies greater than about 1500 Hz. Such a model would be consistent with the results of pure-tone experiments, but it would be wrong. In fact, the binaural system can successfully register an ITD that occurs at a high frequency such as 4000 Hz, if the signal is modulated. The modulation, in turn, must have a rate that is less than about 1000 Hz. Therefore, the failure of the binaural timing system to process sine tones above 1500 Hz cannot be thought of as a failure of the binaural neurons tuned to high frequency. Instead, the failure is best described in the temporal domain, as an inability to track rapid variations.

To summarize the matter of binaural differences, the physiology of the binaural system is sensitive to amplitude cues from ILDs at any frequency, but for incident plane waves, ILD cues exist physically only for frequencies above about 500 Hz. They become large and reliable for frequencies above 3000 Hz, making ILD cues most effective at high frequencies. In contrast, the binaural physiology is capable of using phase information from ITD cues only at low frequencies, below about 1500 Hz. For a sine tone of intermediate frequency, such as 2000 Hz, neither cue works well. As a result, human localization ability tends to be poor for signals in this frequency region.

The inadequacy of binaural difference cues

The binaural time and level differences are powerful cues for the localization of a source, but they have important limitations. Again, in the spherical-head approximation, the inadequacy of interaural differences is evident because, for a source of sound moving in the midsagittal plane (the perpendicular bisector of a line drawn through both ears), the signals to left and right ears -- and therefore binaural differences -- are the same. As a result, the listener with the hypothetical spherical head cannot distinguish between sources in back, in front, or overhead. Because of a fine sensitivity to binaural differences, this listener can detect displacements of only a degree side-to-side, but cannot tell back from front! This kind of localization difficulty does not correspond to our usual experience. There is another problem with this binaural difference model: If a tone or broadband noise is heard through headphones with an ITD, an ILD, or both, the listener has the impression of laterality -- coming from the left or right -- as expected, but, as previously mentioned, the sound image appears to be within the head, and it may also be diffuse and fuzzy instead of compact. This sensation, too, is unlike our experience of the real world, in which sounds are perceived to be externalized. The resolution of front-back confusion and the externalization of sound images turn on another sound localization cue, the anatomical transfer function.

Figure 2. Interaural level differences, calculated for a source in the azimuthal plane defined by the two ears and the nose. The source radiates frequency f and is located at an azimuth of 10° (green curve), 45° (red), or 90° (blue) with respect to the listener's forward direction. The calculations assume that the ears are at opposite poles of a rigid sphere.

The anatomical transfer function

Sound waves that come from different directions in space are differently scattered by the listener's outer ears, head, shoulders, and upper torso. The scattering leads to an acoustical filtering of the signals appearing at left and right ears. The filtering can be described by a complex response function -- the anatomical transfer function (ATF), also known as the head-related transfer function (HRTF). Because of the ATF, waves that come from behind tend to be boosted in the 1000 Hz frequency region, whereas waves that come from the forward direction are boosted near 3000 Hz. The most dramatic effects occur above 4000 Hz: In this region, the wavelength is less than 10 cm and details of the head, especially the outer ears, or pinnae, become significant scatterers. Above 6000 Hz, the ATF for different individuals becomes strikingly individualistic, but there are a few features that are found rather generally. In most cases, there is a valley-and-peak structure that tends to move to higher frequencies as the elevation of the source increases from below to above the head. For example, figure 4 shows the spectrum for sources in front, in back, and directly overhead, measured inside the ear of a Knowles Electronics Manikin for Acoustic Research (KEMAR). The peak near 7000 Hz is thought to be a particularly prominent cue for a source overhead. The direction-dependent filtering by the anatomy, used by listeners to resolve front-back confusion and to determine elevation, is also a necessary component of externalization. Experiments further show that getting the ATF correct with virtual reality techniques is sufficient to externalize the image. But there is an obvious problem in the application of the ATF. A priori, there is no way that a listener can know if a spectrally prominent feature comes from direction-dependent filtering or whether it is part of the original source spectrum. For instance, a signal with a strong peak near 7000 Hz may not necessarily come from above -- it might just come from a source that happens to have a lot of power near 7000 Hz.

Figure 3. Interaural time differences, given by the difference in arrival times of waveform features at the two ears, are useful localization cues only for long wavelengths. In (a), the signal comes from the right, and waveform features such as the peak numbered 1 arrive at the right ear before arriving at the left. Because the wavelength is greater than twice the head diameter, no confusion is caused by other peaks of the waveform, such as peaks 0 or 2. In (b), the signal again comes from the right, but the wavelength is shorter than twice the head diameter. As a result, every feature of cycle 2 arriving at the right ear is immediately preceded by a corresponding feature from cycle 1 at the left ear. The listener naturally concludes that the source is on the left, contrary to fact.

Confusion of this kind between the source spectrum and the ATF immediately appears with narrow-band sources such as pure tones or noise bands having a bandwidth of a few semitones. When a listener is asked to say whether a narrow-band sound comes from directly in front, in back, or overhead, the answer will depend entirely on the frequency of the sound -- the true location of the sound source is irrelevant.5 Thus, for narrow-band sounds, the confusion between source spectrum and location is complete. The listener can solve this localization problem only by turning the head so that the source is no longer in the midsagittal plane. In an interesting variation on this theme, Frederic Wightman and Doris Kistler at the University of Wisconsin-Madison have shown that it is not enough if the source itself moves -- the listener will still be confused about front and back. The confusion can be resolved, though, if the listener is in control of the source motion.6

Fortunately, most sounds of the everyday world are broadband and relatively benign in their spectral variation, so that listeners can both localize the source and identify it on the basis of the spectrum. It is still not entirely clear how this localization process works. Early models of the process that focused on particular spectral features (such as the peak at 7000 Hz for a source overhead) have given way, under the pressure of recent research, to models that employ the entire spectrum.

The Binaural Cross-Correlation Model

In 1948, Lloyd Jeffress proposed that the auditory system processes interaural time differences by using a network of neural delay lines terminating in ee neurons.10 An ee neuron is like an AND gate, responding only if excitation is present on both of two inputs (hence the name ee). According to the Jeffress model, one input comes from the left ear and the other from the right. Inputs are delayed by neural delay lines so that different ee cells experience a coincidence for different arrival times at the two ears. An illustration of how the network is imagined to work is shown in the figure. An array of ee cells is distributed along two axes: frequency and neural internal delay. The frequency axis is needed because binaural processing takes place in tuned channels. These channels represent frequency analysis -- the first stage of auditory processing. Any plausible auditory model must contain such channels.

Inputs from left ear (blue) and right ear (red) proceed down neural delay lines in each channel and coincide at the ee cells for which the neural delay exactly compensates for the fact that the signal started at one ear sooner than the other. For instance, if the source is off to the listener's left, then signals start along the delay lines sooner from the left side. They coincide with the corresponding signals from the right ear at neurons to the right of τ = 0, that is, at a positive value of τ. The coincidence of neural signals causes the ee neurons to send spikes to higher processing centers in the brain.

The expected value for the number of coincidences Nc at the ee cell specified by delay τ is given in terms of the rates PL(t) and PR(t) of neural spikes from left and right ears by the convolution-like integral

Nc(τ) = TW ∫₀^TS PL(t) PR(t + τ) dt,

where TW is the width of the neuron's coincidence window and TS is the duration of the stimulus.11 Thus, Nc is the cross-correlation between signals in the left and right ears. Neural delay and coincidence circuits of just this kind have been found in the superior olive in the midbrain of cats.12
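The boxed cross-correlation has a direct computational analogue. Here is a minimal Python sketch, an illustration under simplified assumptions rather than the authors' model or code: it delays a copy of a noise signal by 300 microseconds to stand in for the right-ear input, then scans a bank of candidate internal delays, Jeffress-style, and picks the delay that maximizes the correlation sum.

```python
import numpy as np

FS = 100_000          # sample rate, Hz (fine enough to resolve tens of microseconds)
TRUE_ITD_S = 300e-6   # simulated interaural delay: right ear lags by 300 microseconds

# Simulate spike-rate-like signals: noise at the left ear, a delayed copy at the right.
rng = np.random.default_rng(0)
left = rng.standard_normal(FS // 10)                 # 100 ms of "left ear" signal
lag = int(round(TRUE_ITD_S * FS))
right = np.roll(left, lag)                           # the right ear receives it later

# Jeffress-style scan: for each candidate internal delay, count "coincidences"
# (here, a correlation sum) between the left signal and the right signal
# advanced by that delay. The best-matching delay estimates the ITD.
candidate_lags = np.arange(-80, 81)                  # about -800 us ... +800 us
scores = [np.dot(left, np.roll(right, -k)) for k in candidate_lags]
best = candidate_lags[int(np.argmax(scores))]
print(f"estimated ITD = {best / FS * 1e6:.0f} microseconds (true: 300)")
```

The winning delay recovers the imposed 300 μs lag; in the real system, many tuned frequency channels and many neurons perform this computation in parallel.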

The experimental art

Most of what we know about sound localization has been learned from experiments using headphones. With headphones, the experimenter can precisely control the stimulus heard by the listener. Even experiments done on cats, birds, and rodents have these creatures wearing miniature earphones. In the beginning, much was learned about fundamental binaural capabilities from headphone experiments with simple differences in level and arrival time for tones of various frequencies and noises of various compositions.7 However, work on the larger question of sound localization had to await several technological developments to achieve an accurate rendering of the ATF in each ear. First were the acoustical measurements themselves, done with tiny probe microphones inserted in the listeners' ear canals to within a few millimeters of the eardrums. Transfer functions measured with these microphones allowed experimenters to create accurate simulations of the real world using headphones, once the transfer functions of the microphones and headphones themselves had been compensated by inverse filtering.

Adequate filtering requires fast, dedicated digital signal processors linked to the computer that runs experiments. The motion of the listener's head can be taken into account by means of an electromagnetic head tracker. The head tracker consists of a stationary transmitter, whose three coils produce low-frequency magnetic fields, and a receiver, also with three coils, that is mounted on the listener's head. The tracker gives a reading of all six degrees of freedom in the head motion, 60 times per second. Based on the motion of the head, the controlling computer directs the fast digital processor to refilter the signals to the ears so that the auditory scene is stable and realistic. This virtual reality technology is capable of synthesizing a convincing acoustical environment. Starting with a simple monaural recording of a conversation, the experimenter can place the individual talkers in space. If the listener's head turns to face a talker, the auditory image remains constant, as it does in real life. What is most important for the psychoacoustician, this technology has opened a large new territory for controlled experiments.

Making it wrong

With headphones, the experimenter can create conditions not found in nature to try to understand the role of different localization mechanisms. For instance, by introducing an ILD that points to the left opposed by an ITD that points to the right, one can study the relative strengths of these two cues. Not surprisingly, it is found that ILDs dominate at high frequency and ITDs dominate at low frequency. But perception is not limited to just pointlike localization; it also includes size and shape. Rivalry experiments such as contradictory ILDs and ITDs lead to a source image that is diffuse: The image occupies a fuzzy region within the head that a listener can consistently describe. The effect can also be measured as an increased variance in lateralization judgments.

Figure 4. The curves show the spectrum of a small loudspeaker as heard in the left ear of a manikin when the speaker is in front (red), overhead (blue), and in back (green). A comparison of the curves reveals the relative gains of the anatomical transfer function.

(left) The KEMAR manikin is, in every gross anatomical detail, a typical American. It has silicone outer ears and microphones in its head. The coupler between the ear canal and the microphone is a cavity tuned to have the input acoustical impedance of the middle ear. The KEMAR shown here is in an anechoic room accompanied by Tim, an undergraduate physics major at Michigan State.

Incorporating the ATF into headphone simulations considerably expands the menu of bizarre effects. An accurate synthesis of a broadband sound leads to perception that is like the real world: Auditory images are localized, externalized, and compact. Making errors in the synthesis, for example progressively zeroing the ITD of spectral lines while retaining the amplitude part of the ATF, can cause the image to come closer to the head, push on the face, and form a blob that creeps into the ear canal and finally enters the head. The process can be reversed by progressively restoring accurate ITD values.8 A wide variety of effects can occur, by accident or design, with inaccurate synthesis. There are a few general rules: Inaccuracies tend to expand the size of the image, put the images inside the head, and produce images that are in back rather than in front. Excellent accuracy is required to avoid front-back confusion. The technology permits a listener to hear the world with someone else's ears, and the usual result is an increase in confusion about front and back. Reduced accuracy often puts all source images in back, although they are nevertheless externalized. Further reduction in accuracy puts the images inside the back of the head.

Rooms and reflections

The operations of interaural level and time difference cues and of spectral cues have normally been tested with headphones or by sound localization experiments in anechoic rooms, where all the sounds travel in a straight path from the source to the listener. Most of our everyday listening, however, is done in the presence of walls, floors, ceilings, and other large objects that reflect sound waves. These reflections result in dramatic physical changes to the waveforms. It is hard to imagine how the reflected sounds, coming from all directions, can contribute anything but random variation to the cues used in localization. Therefore, it is expected that the reflections and reverberation introduced by the room are inevitably for the worse as far as sound localization is concerned. That is especially true for the ITD cue.

The ITD is particularly vulnerable because it depends on coherence between the signals in the two ears -- that is, the height of the cross-correlation function, as described in the box above. Reverberated sound contains no useful coherent information, and in a large room where reflected sound dominates the direct sound, the ITD becomes unreliable. By contrast, the ILD fares better. First, as shown by headphone experiments, the binaural comparison of intensities does not care whether the signals are binaurally coherent or not. Such details of neural timing appear to be stripped away as the ILD is computed. Of course, the ILD accuracy is adversely affected by standing waves in a room, but here the second advantage of the ILD appears: Almost every reflecting surface has the property that its acoustical absorption increases with increasing frequency; as a result, the reflected power becomes relatively smaller compared to the direct power.
Because the binaural neurophysiology is capable of using ILDs across the audible spectrum with equal success, it is normally to the listener's advantage to use the highest-frequency information that can be heard. Experiments in highly reverberant environments find listeners doing exactly that, using cues above 8000 Hz. A statistical decision theory analysis, using ILDs and ITDs measured with a manikin, shows that the pattern of localization errors observed experimentally can be understood by assuming that listeners rely entirely on ILDs and not at all on ITDs. This strategy of reweighting localization cues is entirely unconscious (a toy band-by-band ILD calculation appears after the next paragraph).

The precedence effect

There is yet another strategy that listeners unconsciously employ to cope with the distorted localization cues that occur in a room: they make their localization judgments instantly, based on the earliest-arriving waves in the onset of a sound. This strategy is known as the precedence effect, because the earliest-arriving sound wave (the direct sound, which carries accurate localization information) is given precedence over the subsequent reflections and reverberation, which convey inaccurate information. Anyone who has wandered around a room trying to locate the source of a pure tone without hearing its onset can appreciate the value of the effect. Without the action of the precedence effect on the first-arriving wave, localization is virtually impossible: there is no useful ITD information, and, because of standing waves, the loudness of the tone is essentially unrelated to the nearness of the source.
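Here is the toy sketch promised above (again the editor's construction, not the statistical decision analysis the article refers to; the band edges, the simple power ratio, and the fabricated 6-dB "head shadow" are arbitrary choices). It computes an ILD separately in a few frequency bands, the kind of quantity a listener in a reverberant room would weight toward the high-frequency bands, where absorption has weakened the reflections relative to the direct sound:

```python
# Hypothetical sketch: interaural level differences computed band by band.
import numpy as np

def band_ilds(left, right, fs, edges=(500.0, 2000.0, 8000.0, 16000.0)):
    """Level difference (dB, left minus right) in each frequency band."""
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(left.size, 1.0 / fs)
    ilds = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        power_l = np.sum(np.abs(spec_l[in_band]) ** 2)
        power_r = np.sum(np.abs(spec_r[in_band]) ** 2)
        ilds[(lo, hi)] = 10.0 * np.log10(power_l / power_r)
    return ilds

# Example: fabricate a head-shadow-like boost that makes the left ear 6 dB louder above 2 kHz.
fs = 44100
rng = np.random.default_rng(1)
noise = rng.standard_normal(fs // 10)                      # 100 ms of broadband noise
freqs = np.fft.rfftfreq(noise.size, 1.0 / fs)
boost = np.where(freqs >= 2000.0, 10 ** (6 / 20), 1.0)     # +6 dB above 2 kHz
left = np.fft.irfft(np.fft.rfft(noise) * boost, n=noise.size)
right = noise
print(band_ilds(left, right, fs))   # about 0 dB in the low band, about +6 dB in the higher bands
```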

Figure 5. Precedence effect demonstration with two loudspeakers reproducing the same pulsed wave. The pulse from the left speaker leads in the left ear by a few hundred microseconds, suggesting that the source is on the left. The pulse from the right speaker leads in the right ear by a similar amount, which provides a contradictory localization cue. Because the listener is closer to the left speaker, the left pulse arrives sooner and wins the competition: the listener perceives a single pulse coming from the left.
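To connect the figure to numbers, here is a small, hypothetical simulation of the competition (the editor's sketch, not the author's model; the 300-microsecond per-speaker ITDs, the 1.5-ms extra path delay for the farther speaker, roughly what a half-meter path difference produces, and the crude rule of keeping only the first millisecond after onset are all invented for illustration). Each speaker contributes a click to each ear; a localizer that weighs only the earliest-arriving energy sides with the nearer, left speaker:

```python
# Hypothetical sketch: two loudspeakers, one listener, and an onset-gated ITD estimate.
import numpy as np

fs = 96000                                   # high sample rate for microsecond timing
n = int(0.01 * fs)                           # 10 ms of signal

def click(delay_s):
    """A short Hanning click starting delay_s seconds after time zero."""
    x = np.zeros(n)
    start = int(delay_s * fs)
    x[start:start + 48] = np.hanning(48)     # roughly half-millisecond click
    return x

# Left speaker (nearer): reaches the left ear first, the right ear 300 us later.
# Right speaker (about 0.5 m farther): everything arrives about 1.5 ms later,
# and it reaches the right ear before the left ear (the contradictory cue).
left_ear  = click(0.0)    + click(0.0018)    # left-speaker click, then right-speaker click
right_ear = click(0.0003) + click(0.0015)

def itd_estimate(left, right, gate_s=None):
    """Lag of the cross-correlation peak; optionally keep only the first gate_s seconds after onset."""
    if gate_s is not None:
        onset = int(np.flatnonzero(np.abs(left) + np.abs(right))[0])
        stop = onset + int(gate_s * fs)
        left, right = left[:stop], right[:stop]
    max_lag = int(0.001 * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(left, np.roll(right, -k)) for k in lags]
    return lags[int(np.argmax(xcorr))] / fs

print(itd_estimate(left_ear, right_ear, gate_s=0.001))   # gated: about +300 us, i.e., "on the left"
print(itd_estimate(left_ear, right_ear))                 # no gate: two equal, opposing peaks; ambiguous
```

Real binaural processing is of course nothing like this crude truncation; the sketch only illustrates why giving the first-arriving wavefront priority resolves the conflict drawn in figure 5.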

The operation of the precedence effect is often thought of as a neural gate that is opened by the onset of a sound, accumulates localization information for about 1 ms, and then closes to shut off subsequent localization cues. This operation appears dramatically in experiments where it would be to the listener's advantage to attend to the subsequent cues, but the precedence effect prevents it. An alternative model regards precedence as a strong reweighting of localization cues in favor of the earliest sound, because the subsequent sound is never entirely excluded from the localization computation.

Precedence is easily demonstrated with a standard home stereo system set for monophonic reproduction, so that the same signal is sent to both loudspeakers. Standing midway between the speakers, the listener hears the sound from a forward direction. Moving half a meter closer to the left speaker causes the sound to appear to come entirely from that speaker. The analysis of this result is that each speaker sends a signal to both ears. Each speaker creates an ILD and, of particular importance, an ITD, and these cues compete, as shown in figure 5. Because of the precedence effect, the first sound (from the left speaker) wins the competition, and the listener perceives the sound as coming from the left. But although the sound appears to come from the left speaker alone, the right speaker continues to contribute loudness and a sense of spatial extent. This perception can be verified by suddenly unplugging the right speaker: the difference is immediately apparent. Thus, the precedence effect is restricted to the formation of a single fused image with a definite location. The precedence effect appears not to depend solely on interaural differences; it operates also on the spectral differences caused by anatomical filtering for sources in the midsagittal plane.9

Conclusions and conjectures

After more than a century of work, there is still much about sound localization that is not understood. It remains an active area of research in psychoacoustics and in the physiology of hearing. In recent years there has been growing correspondence between perceptual observations, physiological data on the binaural processing system, and neural modeling. There is good reason to expect that next year we will understand sound localization better than we do this year, but it would be wrong to think that we have only to fill in the details. It is likely that next year will bring a qualitatively improved understanding, with models that employ new ideas about neural signal processing.

In this environment it is risky to conjecture about future developments, but there are trends that give clues. Just a decade ago, it was thought that much of sound localization in general, and precedence in particular, might be a direct result of interaction at early stages of the binaural system, as in the superior olive. Recent research suggests that the process is more widely distributed, with peripheral centers of the brain such as the superior olive sending information (about ILD, about ITD, about spectrum, and about arrival order) to higher centers, where the incoming data are evaluated for self-consistency and plausibility and are probably compared with information obtained visually. Sound localization, therefore, is not simple; it is a large mental computation. But as the problem has become more complicated, our tools for studying it have become better.
Improved psychophysical techniques for flexible synthesis of realistic stimuli, physiological experiments probing different neural regions simultaneously, faster and more precise methods of brain imaging, and more realistic computational models will one day solve this problem of how we localize sound.

Bill Hartmann is a professor of physics at Michigan State University in East Lansing, Michigan ([email protected]; http://www.pa.msu.edu/acoustics). He is the author of the textbook Signals, Sound, and Sensation (AIP Press, 1997).

The author is grateful to his colleagues Brad Rakerd, Tim McCaskey, Zachary Constan, and Joseph Gaalaas for help with this article. His work on sound localization is supported by the National Institute on Deafness and Other Communication Disorders, one of the National Institutes of Health.

References
1. J. W. Strutt (Lord Rayleigh), Phil. Mag. 3, 456 (1877).
2. W. A. Yost, J. Acoust. Soc. Am. 70, 397 (1981).
3. J. W. Strutt (Lord Rayleigh), Phil. Mag. 13, 214 (1907).
4. G. F. Kuhn, J. Acoust. Soc. Am. 62, 157 (1977).
5. J. Blauert, Spatial Hearing, 2nd ed., J. S. Allen, trans., MIT Press, Cambridge, Mass. (1997).
6. F. L. Wightman, D. J. Kistler, J. Acoust. Soc. Am. 105, 2841 (1999).
7. N. I. Durlach, H. S. Colburn, in Handbook of Perception, vol. 4, E. Carterette, M. P. Friedman, eds., Academic, New York (1978).
8. W. M. Hartmann, A. T. Wittenberg, J. Acoust. Soc. Am. 99, 3678 (1996).
9. R. Y. Litovsky, B. Rakerd, T. C. T. Yin, W. M. Hartmann, J. Neurophysiol. 77, 2223 (1997).
10. L. A. Jeffress, J. Comp. Physiol. Psychol. 41, 35 (1948).
11. R. M. Stern, H. S. Colburn, J. Acoust. Soc. Am. 64, 127 (1978).
12. T. C. T. Yin, J. C. K. Chan, J. Neurophysiol. 64, 465 (1990).

© 1999 American Institute of Physics
