8/16/2019 1 - The Perception of Musical Tones, Pages 1-33
1/33
1 The Perception of Musical Tones
Andrew J. Oxenham
Department of Psychology, University of Minnesota, Minneapolis
I. Introduction
A. What Are Musical Tones?
The definition of a tone—a periodic sound that elicits a pitch sensation—encompasses
the vast majority of musical sounds. Tones can be either pure—sinusoidal variations
in air pressure at a single frequency—or complex. Complex tones can be divided into
two categories, harmonic and inharmonic. Harmonic complex tones are periodic,
with a repetition rate known as the fundamental frequency (F0), and are composed of
a sum of sinusoids with frequencies that are all integer multiples, or harmonics,
of the F0. Inharmonic complex tones are composed of multiple sinusoids that are not simple integer multiples of any common F0. Most musical instrumental or vocal tones are more or less harmonic, but some, such as bell chimes, can be inharmonic.
B. Measuring Perception
The physical attributes of a sound, such as its intensity and spectral content, can be
readily measured with modern technical instrumentation. Measuring the perception of sound is a different matter. Gustav Fechner, a 19th-century German scientist,
is credited with founding the field of psychophysics—the attempt to establish a
quantitative relationship between physical variables (e.g., sound intensity and fre-
quency) and the sensations they produce (e.g., loudness and pitch; Fechner, 1860).
The psychophysical techniques that have been developed since Fechner’s time to
tap into our perceptions and sensations (involving hearing, vision, smell, touch, and
taste) can be loosely divided into two categories of measures, subjective and objec-
tive. The subjective measures typically require participants to estimate or produce
magnitudes or ratios that relate to the dimension under study. For instance, in establishing a loudness scale, participants may be presented with a series of tones
at different intensities and then asked to assign a number to each tone, correspond-
ing to its loudness. This method of magnitude estimation thus produces a psycho-
physical function that directly relates loudness to sound intensity. Ratio estimation
follows the same principle, except that participants may be presented with two
The Psychology of Music. DOI: http://dx.doi.org/10.1016/B978-0-12-381460-9.00001-8
© 2013 Elsevier Inc. All rights reserved.
sounds and then asked to judge how much louder (e.g., twice or three times) one
sound is than the other. The complementary methods are magnitude production and
ratio production. In these production techniques, the participants are required to
vary the relevant physical dimension of a sound until it matches a given magnitude (number), or until it matches a specific ratio with respect to a reference sound.
In the latter case, the instructions may be something like “adjust the level of the
second sound until it is twice as loud as the first sound.” All four techniques have
been employed numerous times in attempts to derive appropriate psychophysical
scales (e.g., Buus, Muesch, & Florentine, 1998; Hellman, 1976; Hellman &
Zwislocki, 1964; Stevens, 1957; Warren, 1970). Other variations on these methods
include categorical scaling and cross-modality matching. Categorical scaling involves
asking participants to assign the auditory sensation to one of a number of fixed
categories; following our loudness example, participants might be asked to select a category ranging from very quiet to very loud (e.g., Mauermann, Long, & Kollmeier,
2004). Cross-modality matching avoids the use of numbers by, for instance, asking
participants to adjust the length of a line, or a piece of string, to match the perceived
loudness of a tone (e.g., Epstein & Florentine, 2005). Although all these methods
have the advantage of providing a more-or-less direct estimate of the relationship
between the physical stimulus and the sensation, they have a number of disadvan-
tages also. First, they are subjective and rely on introspection on the part of the
subject. Perhaps because of this they can be somewhat unreliable, variable across
and within participants, and prone to various biases (e.g., Poulton, 1977).
The other approach is to use an objective measure, where a right and wrong
answer can be verified externally. This approach usually involves probing the limits
of resolution of the sensory system, by measuring absolute threshold (the smallest
detectable stimulus), relative threshold (the smallest detectable change in a stimulus),
or masked threshold (the smallest detectable stimulus in the presence of another
stimulus). There are various ways of measuring threshold, but most involve a forced-
choice procedure, where the subject has to pick the interval that contains the target
sound from a selection of two or more. For instance, in an experiment measuring
absolute threshold, the subject might be presented with two successive time intervals, marked by lights; the target sound is played during one of the intervals, and the
subject has to decide which one it was. One would expect performance to change
with the intensity of the sound: at very low intensities, the sound will be completely
inaudible, and so performance will be at chance (50% correct in a two-interval task);
at very high intensities, the sound will always be clearly audible, so performance will
be near 100%, assuming that the subject continues to pay attention. A psychometric
function can then be derived, which plots the performance of a subject as a function
of the stimulus parameter. An example of a psychometric function is shown in
Figure 1, which plots percent correct as a function of sound pressure level. This type of forced-choice paradigm is usually preferable (although often more time-consuming) to more subjective measures, such as the method of limits, which is often used today
to measure audiograms. In the method of limits, the intensity of a sound is decreased
until the subject reports no longer being able to hear it, and then the intensity
of the sound is increased until the subject again reports being able to hear it.
The trouble with such measures is that they rely not just on sensitivity but also on
criterion—how willing the subject is to report having heard a sound if he or she is
not sure. A forced-choice procedure eliminates that problem by forcing participants
to guess, even if they are unsure which interval contained the target sound. Clearly,
testing the perceptual limits by measuring thresholds does not tell us everything
about human auditory perception; a primary concern is that these measures are typi-
cally indirect—the finding that people can detect less than a 1% change in frequency
does not tell us much about the perception of much larger musical intervals, such as
an octave. Nevertheless, it has proved extremely useful in helping us to gain a deeper understanding of perception and its relation to the underlying physiology of the
ear and brain.
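The logic of the forced-choice procedure and the psychometric function it yields can be sketched in a short simulation. Everything here is hypothetical: the logistic shape, midpoint, and slope are illustrative choices rather than measured values, and the 75% point is simply the conventional threshold criterion for a two-interval task.

```python
import math
import random

rng = random.Random(0)

def percent_correct(level_db, midpoint=5.0, slope=1.5):
    # Hypothetical logistic psychometric function for a two-interval
    # forced-choice task: chance is 50%, asymptote near 100%.
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint) / slope))

def simulate_run(level_db, n_trials=2000):
    # Each simulated trial is correct with probability given by the
    # psychometric function at that signal level.
    hits = sum(rng.random() < percent_correct(level_db) for _ in range(n_trials))
    return hits / n_trials

levels = [-5, 0, 5, 10, 15]          # signal levels in dB SPL, as in Figure 1
scores = [simulate_run(lv) for lv in levels]

def threshold_75(levels, scores):
    # Conventional threshold: the level giving 75% correct, found by
    # linear interpolation between the two bracketing measurements.
    for (l0, s0), (l1, s1) in zip(zip(levels, scores), zip(levels[1:], scores[1:])):
        if s0 <= 0.75 <= s1:
            return l0 + (0.75 - s0) * (l1 - l0) / (s1 - s0)
    return None

print(threshold_75(levels, scores))  # close to the assumed 5-dB midpoint
```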
Measures of reaction time, or response time (RT), have also been used to probe
sensory processing. The two basic forms of response time are simple response time
(SRT), where participants are instructed to respond as quickly as possible by push-
ing a single button once a stimulus is presented, and choice response time (CRT),
where participants have to categorize the stimulus (usually into one of two catego-
ries) before responding (by pressing button 1 or 2).
Although RT measures are more common in cognitive tasks, they also depend on some basic sound attributes, such as sound intensity, with higher intensity
sounds eliciting faster reactions, measured using both SRTs (Kohfeld, 1971;
Luce & Green, 1972) and CRTs (Keuss & van der Molen, 1982).
Finally, measures of perception are not limited to the quantitative or numerical
domain. It is also possible to ask participants to describe their percepts in words.
This approach has clear applications when dealing with multidimensional attributes,
such as timbre (see below, and Chapter 2 of this volume), but also has some inherent
difficulties, as different people may use descriptive words in different ways.
To sum up, measuring perception is a thorny issue that has many solutions, all with their own advantages and shortcomings. Perceptual measures remain a crucial
“systems-level” analysis tool that can be combined in both human and animal stud-
ies with various physiological and neuroimaging techniques, to help us discover
more about how the ears and brain process musical sounds in ways that elicit
music’s powerful cognitive and emotional effects.
Figure 1 A schematic example of a psychometric function, plotting percent correct (from 50% to 100%) in a two-alternative forced-choice task against the sound pressure level of a test tone (−5 to 15 dB SPL).
broadband sounds remains roughly constant when expressed as a ratio or in deci-
bels is in line with the well-known Weber’s law, which states that the JND between
two stimuli is proportional to the magnitude of the stimuli.
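Weber's law has a convenient corollary: a just-noticeable increment that is a fixed fraction of the stimulus is a fixed number of decibels, whatever the absolute intensity. A minimal sketch (the 25% Weber fraction is an illustrative assumption, not a measured value):

```python
import math

def jnd_in_db(intensity, weber_fraction=0.25):
    # Weber's law: the just-noticeable increment is a fixed fraction of
    # the stimulus, so the JND is constant when expressed in decibels.
    delta = weber_fraction * intensity
    return 10 * math.log10((intensity + delta) / intensity)

# The same 25% increment is ~0.97 dB at any absolute intensity:
for intensity_w_m2 in (1e-6, 1e-3, 1.0):
    print(f"{intensity_w_m2:g} W/m^2 -> JND = {jnd_in_db(intensity_w_m2):.3f} dB")
```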
In contrast to our ability to judge differences in sound level between two sounds presented one after another, our ability to categorize or label sound levels is rather
poor. In line with Miller’s (1956) famous “7 plus or minus 2” postulate for infor-
mation processing and categorization, our ability to categorize sound levels accu-
rately is fairly limited and is subject to a variety of influences, such as the context
of the preceding sounds. This may explain why the musical notation of loudness
(in contrast to pitch) has relatively few categories between pianissimo and
fortissimo—typically just six (pp, p, mp, mf, f, and ff).
2. Equal Loudness Contours and the Loudness Weighting Curves
There is no direct relationship between the physical sound level (in dB SPL) and
the sensation of loudness. There are many reasons for this, but an important one is
that loudness depends heavily on the frequency content of the sound. Figure 2
shows what are known as equal loudness contours. The basic concept is that two
pure tones with different frequencies, but with levels that fall on the same loudness
contour, have the same loudness. For instance, as shown in Figure 2, a pure tone
with a frequency of 1 kHz and a level of 40 dB SPL has the same loudness as a
pure tone with a frequency of 100 Hz and a level of about 64 dB SPL; in other words,
a 100-Hz tone has to be 24 dB higher in level than a 40-dB SPL 1-kHz tone in order
Figure 2 The equal-loudness contours, taken from ISO 226:2003, plotting sound pressure level in dB against frequency in Hz (16 Hz to 16 000 Hz) for loudness levels from hearing threshold to 100 phons. Original figure kindly provided by Brian C. J. Moore.
to be perceived as being equally loud. The equal loudness contours are incorporated
into an international standard (ISO 226) that was initially established in 1961 and was
last revised in 2003.
These equal loudness contours have been derived several times from painstaking psychophysical measurements, not always with identical outcomes (Fletcher &
Munson, 1933; Robinson & Dadson, 1956; Suzuki & Takeshima, 2004). The mea-
surements typically involve either loudness matching, where a subject adjusts the
level of one tone until it sounds as loud as a second tone, or loudness comparisons,
where a subject compares the loudness of many pairs of tones and the results are
compiled to derive points of subjective equality (PSE). Both methods are highly
susceptible to nonsensory biases, making the task of deriving a definitive set of
equal loudness contours a challenging one (Gabriel, Kollmeier, & Mellert, 1997).
The equal loudness contours provide the basis for the measure of “loudness level,” which has units of “phons.” The phon value of a sound is the dB SPL value
of a 1-kHz tone that is judged to have the same loudness as the sound. So, by defi-
nition, a 40-dB SPL tone at 1 kHz has a loudness level of 40 phons. Continuing the
preceding example, the 100-Hz tone at a level of about 64 dB SPL also has a loud-
ness level of 40 phons, because it falls on the same equal loudness contour as the
40-dB SPL 1-kHz tone. Thus, the equal loudness contours can also be termed the
equal phon contours.
Although the actual measurements are difficult, and the results somewhat conten-
tious, there are many practical uses for the equal loudness contours. For instance, in issues of community noise annoyance from rock concerts or airports, it is more use-
ful to know about the perceived loudness of the sounds in question, rather than just
their physical level. For this reason, an approximation of the 40-phon equal loudness
contour is built into most modern sound level meters and is referred to as the
“A-weighted” curve. A sound level that is quoted in dB (A) is an overall sound level
that has been filtered with the inverse of the approximate 40-phon curve. This means
that very low and very high frequencies, which are perceived as being less loud, are
given less weight than the middle of the frequency range.
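The filtering described above can be reproduced directly, because a common analytic form of the A-weighting curve is standardized in IEC 61672. The function below implements that approximation; it is near 0 dB at 1 kHz and strongly negative at the frequency extremes, mirroring the 40-phon contour.

```python
import math

def a_weighting_db(f):
    # A-weighting gain in dB at frequency f (Hz), using the analytic
    # approximation from IEC 61672; ~0 dB at 1 kHz, strongly negative
    # at very low and very high frequencies.
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes 1 kHz to ~0 dB

for f in (100.0, 1000.0, 10000.0):
    print(f"{f:g} Hz: {a_weighting_db(f):+.1f} dB")
```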
As with all useful tools, the A-weighted curve can be misused. Because it is based on the 40-phon curve, it is most suitable for low-level sounds; however, that
has not prevented it from being used in measurements of much higher-level sounds,
where a flatter filter would be more appropriate, such as that provided by the
much-less-used C-weighted curve. The ubiquitous use of the dB (A) scale for all
levels of sound therefore provides an example of a case where the convenience of a
single-number measure (and one that minimizes the impact of difficult-to-control
low frequencies) has outweighed the desire for accuracy.
3. Loudness Scales
Equal loudness contours and phons tell us about the relationship between loudness
and frequency. They do not, however, tell us about the relationship between loud-
ness and sound level. For instance, the phon, based as it is on the decibel scale at
1 kHz, says nothing about how much louder a 60-dB SPL tone is than a 30-dB
SPL tone. The answer, according to numerous studies of loudness, is not twice as
loud. There have been numerous attempts since Fechner’s day to relate the physical
sound level to loudness. Fechner (1860), building on Weber’s law, reasoned that if
JNDs were constant on a logarithmic scale, and if equal numbers of JNDs reflected an equal change in loudness, then loudness must be related logarithmically to sound
intensity. Harvard psychophysicist S. S. Stevens disagreed, claiming that JNDs
reflected “noise” in the auditory system, which did not provide direct insight into
the function relating loudness to sound intensity (Stevens, 1957). Stevens’s
approach was to use magnitude and ratio estimation and production techniques, as
described in Section I of this chapter, to derive a relationship between loudness and
sound intensity. He concluded that loudness (L) was related to sound intensity (I)
by a power law:

L = kI^α (Eq. 1)
where the exponent, α, has a value of about 0.3 at medium frequencies and for
moderate and higher sound levels. This law implies that a 10-dB increase in level
results in a doubling of loudness. At low levels, and at lower frequencies, the expo-
nent is typically larger, leading to a steeper growth-of-loudness function. Stevens
used this relationship to derive loudness units, called “sones.” By definition, 1 sone
is the loudness of a 1-kHz tone presented at a level of 40 dB SPL; 2 sones is twice
as loud, corresponding roughly to a 1-kHz tone presented at 50 dB SPL, and 4 sones corresponds to the same tone at about 60 dB SPL.
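The sone scale follows directly from the power law: with α ≈ 0.3, every 10-dB (10-phon) step doubles loudness, so loudness in sones is 2 raised to the power (phons − 40)/10. A small sketch of that arithmetic:

```python
def sones_from_phons(phons):
    # Stevens's power law with exponent ~0.3 implies a doubling of
    # loudness for each 10-phon step above the 40-phon anchor
    # (1 sone == a 1-kHz tone at 40 dB SPL).
    return 2.0 ** ((phons - 40.0) / 10.0)

for level in (40, 50, 60):  # dB SPL at 1 kHz equals phons, by definition
    print(level, "phons ->", sones_from_phons(level), "sones")
```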
Numerous studies have supported the basic conclusion that loudness can be
related to sound intensity by a power law. However, in part because of the variability
of loudness judgments, and the substantial effects of experimental methodology
(Poulton, 1979), different researchers have found different values for the best-fitting
exponent. For instance, Warren (1970) argued that presenting participants with sev-
eral sounds to judge invariably results in bias. He therefore presented each subject
with only one trial. Based on these single-trial judgments, Warren also derived a
power law, but he found an exponent value of 0.5. This exponent value is what one might expect if the loudness of a sound were inversely proportional to its distance from the receiver, leading to a 6-dB decrease in level for every doubling of distance. Yet
another study, which tried to avoid bias effects by using the entire (100-dB) level
range within each experiment, derived an exponent of only 0.1, implying a doubling
of loudness for every 30-dB increase in sound level (Viemeister & Bacon, 1988).
Overall, it is generally well accepted that the relationship between loudness and
sound intensity can be approximated as a power law, although methodological issues
and intersubject and intrasubject variability have made it difficult to derive a defini-
tive and uncontroversial function relating the sensation to the physical variable.
4. Partial Loudness and Context Effects
Most sounds that we encounter, particularly in music, are accompanied by other
sounds. This fact makes it important to understand how the loudness of a sound is
(Moore & Glasberg, 1997), and others have been extended to explain the loudness
of sounds that fluctuate over time (Chalupper & Fastl, 2002; Glasberg & Moore,
2002). However, none has yet attempted to incorporate context effects, such as
loudness recalibration or loudness enhancement.
B. Pitch
Pitch is arguably the most important dimension for conveying music. Sequences of
pitches form a melody, and simultaneous combinations of pitches form harmony—
two foundations of Western music. There is a vast body of literature devoted to
pitch research, from both perceptual and neural perspectives (Plack, Oxenham,
Popper, & Fay, 2005). The clearest physical correlate of pitch is the periodicity, or
repetition rate, of sound, although other dimensions, such as sound intensity, can have small effects (e.g., Verschuure & van Meeteren, 1975). For young people
with normal hearing, pure tones with frequencies between about 20 Hz and 20 kHz
are audible. However, only sounds with repetition rates between about 30 Hz and
5 kHz elicit a pitch percept that can be called musical and is strong enough to carry
a melody (e.g., Attneave & Olson, 1971; Pressnitzer, Patterson, & Krumbholz,
2001; Ritsma, 1962). Perhaps not surprisingly, these limits, which were determined
through psychoacoustical investigation, correspond quite well to the lower and
upper limits of pitch found on musical instruments: the lowest and highest notes of
a modern grand piano, which covers the ranges of all standard orchestral instruments, correspond to 27.5 Hz and 4186 Hz, respectively.
We tend to recognize patterns of pitches that form melodies (see Chapter 7 of
this volume). We do this presumably by recognizing the musical intervals between
successive notes (see Chapters 4 and 7 of this volume), and most of us seem rela-
tively insensitive to the absolute pitch values of the individual notes, so long as the
pitch relationships between notes are correct. However, exactly how the pitch is
extracted from each note and how it is represented in the auditory system remain
unclear, despite many decades of intense research.
1. Pitch of Pure Tones
Pure tones produce a clear, unambiguous pitch, and we are very sensitive to
changes in their frequency. For instance, well-trained listeners can distinguish
between two tones with frequencies of 1000 and 1002 Hz—a difference of only
0.2% (Moore, 1973). A semitone, the smallest step in the Western scale system,
is a difference of about 6%, or about a factor of 30 greater than the JND of
frequency for pure tones. Perhaps not surprisingly, musicians are generally better
than nonmusicians at discriminating small changes in frequency; what is more surprising is that it does not take much practice for people with no musical train-
ing to “catch up” with musicians in terms of their performance. In a recent study,
frequency discrimination abilities of trained classical musicians were compared
with those of untrained listeners with no musical background, using both pure
tones and complex tones (Micheyl, Delhommeau, Perrot, & Oxenham, 2006).
Initially thresholds were about a factor of 6 worse for the untrained listeners.
However, it took only between 4 and 8 hours of practice for the thresholds of the
untrained listeners to match those of the trained musicians, whereas the trained
musicians did not improve with practice. This suggests that most people are able
to discriminate very fine differences in frequency with very little in the way of specialized training.
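The arithmetic behind the comparison of the semitone with the frequency JND is easy to verify: an equal-tempered semitone is a frequency ratio of 2^(1/12), i.e., close to 6%, roughly 30 times the 0.2% JND cited above.

```python
semitone_ratio = 2 ** (1 / 12)             # equal-tempered semitone frequency ratio
semitone_pct = (semitone_ratio - 1) * 100  # ~5.9%, the "about 6%" in the text
jnd_pct = 0.2                              # trained-listener JND (Moore, 1973)
print(round(semitone_pct, 2))              # -> 5.95
print(round(semitone_pct / jnd_pct))       # -> 30, the "factor of 30"
```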
Two representations of a pure tone at 440 Hz (the orchestral A) are shown in
Figure 3. The upper panel shows the waveform—variations in sound pressure as a
function of time—that repeats 440 times a second, and so has a period of 1/440 s,
or about 2.27 ms. The lower panel provides the spectral representation, showing
that the sound has energy only at 440 Hz. This spectral representation is for an
“ideal” pure tone—one that has no beginning or end. In practice, spectral energy
spreads above and below the frequency of the pure tone, reflecting the effects of
onset and offset. These two representations (spectral and temporal) provide a good introduction to two ways in which pure tones are represented in the peripheral
auditory system.
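The two representations in Figure 3 can be reproduced numerically: synthesize a 440-Hz sinusoid, note its ~2.27-ms period, and confirm with a discrete Fourier transform that its energy sits at a single frequency. The sample rate and one-second window below are arbitrary choices that place 440 Hz exactly on a DFT bin.

```python
import math

fs = 4410  # sample rate (Hz); a 1-s window then gives 1-Hz DFT bins
x = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]

def dft_mag(x, k):
    # Magnitude of DFT bin k, i.e., energy at k Hz for this 1-s window.
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
    return math.hypot(re, im)

print(1000 / 440)  # period in ms: ~2.27, as in the upper panel of Figure 3
peak = max(range(430, 451), key=lambda k: dft_mag(x, k))
print(peak)        # -> 440: all energy at a single frequency (lower panel)
```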
The first potential code, known as the “place” code, reflects the mechanical fil-
tering that takes place in the cochlea of the inner ear. The basilar membrane, which
runs the length of the fluid-filled cochlea from the base to the apex, vibrates in
Figure 3 Schematic diagram of the time waveform (upper panel) and power spectrum (lower panel) of a pure tone with a frequency of 440 Hz.
considerably worse when the low-frequency temporal information was presented to
the “wrong” place in the cochlea, suggesting that place information is important.
In light of this mixed evidence, it may be safest to assume that the auditory sys-
tem uses both place and timing information from the auditory nerve in order to extract the pitch of pure tones. Indeed, some theories of pitch explicitly require both
accurate place and timing information (Loeb, White, & Merzenich, 1983). Gaining
a better understanding of how the information is extracted remains an important
research goal. The question is of particular clinical relevance, as deficits in pitch
perception are a common complaint of people with hearing loss and people with
cochlear implants. A clearer understanding of how the brain uses information from
the cochlea will help researchers to improve the way in which auditory prostheses,
such as hearing aids and cochlear implants, present sound to their users.
2. Pitch of Complex Tones
A large majority of musical sounds are complex tones of one form or another, and
most have a pitch associated with them. Most common are harmonic complex
tones, which are composed of the F0 (corresponding to the repetition rate of the
entire waveform) and upper partials, harmonics, or overtones, spaced at integer
multiples of the F0. The pitch of a harmonic complex tone usually corresponds to
the F0. In other words, if a subject is asked to match the pitch of a complex tone to
the pitch of a single pure tone, the best match usually occurs when the frequency of the pure tone is the same as the F0 of the complex tone. Interestingly, this is
true even when the complex tone has no energy at the F0 or the F0 is masked
(de Boer, 1956; Licklider, 1951; Schouten, 1940; Seebeck, 1841). This phenome-
non has been given various terms, including pitch of the missing fundamental, peri-
odicity pitch, residue pitch, and virtual pitch. The ability of the auditory system to
extract the F0 of a sound is important from the perspective of perceptual constancy:
imagine a violin note being played in a quiet room and then again in a room with a
noisy air-conditioning system. The low-frequency noise of the air-conditioning sys-
tem might well mask some of the lower-frequency energy of the violin, including the F0, but we would not expect the pitch (or identity) of the violin to change
because of it.
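The missing-fundamental phenomenon is easy to demonstrate numerically: a complex built only from upper harmonics still repeats at the period of the absent F0. A minimal sketch (the choice of F0 and harmonic numbers is arbitrary):

```python
import math

f0 = 220.0  # the absent fundamental

def sample(t_s):
    # Harmonics 2 through 8 only: the complex has no energy at 220 Hz.
    return sum(math.sin(2 * math.pi * h * f0 * t_s) for h in range(2, 9))

period_s = 1.0 / f0
# The waveform nevertheless repeats every 1/220 s, the periodicity cue
# underlying the "pitch of the missing fundamental."
times = [i / 44100.0 for i in range(200)]
max_diff = max(abs(sample(t) - sample(t + period_s)) for t in times)
print(max_diff)  # ~0: periodic at 1/220 s despite the absent fundamental
```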
Although the ability to extract the periodicity pitch is clearly an important one,
and one that is shared by many different species (Shofner, 2005), exactly how the
auditory system extracts the F0 remains for the most part unknown. The initial
stages in processing a harmonic complex tone are shown in Figure 4. The upper
two panels show the time waveform and the spectral representation of a harmonic
complex tone. The third panel depicts the filtering that occurs in the cochlea—each
point along the basilar membrane can be represented as a band-pass filter that responds to only those frequencies close to its center frequency. The fourth panel
shows the “excitation pattern” produced by the sound. This is the average response
of the bank of band-pass filters, plotted as a function of the filters’ center frequency
(Glasberg & Moore, 1990). The fifth panel shows an excerpt of the time waveform
at the output of some of the filters along the array. This is an approximation of the
Figure 4 Representations of a harmonic complex tone with a fundamental frequency (F0) of 440 Hz. The upper panel shows the time waveform. The second panel shows the power spectrum of the same waveform. The third panel shows the auditory filter bank, representing the filtering that occurs in the cochlea. The fourth panel shows the excitation pattern, or the time-averaged output of the filter bank. The fifth panel shows some sample time waveforms at the output of the filter bank, including filters centered at the F0 and the fourth harmonic, illustrating resolved harmonics, and filters centered at the 8th and 12th harmonics of the complex, illustrating harmonics that are less well resolved and show amplitude modulations at a rate corresponding to the F0.
waveform that drives the inner hair cells in the cochlea, which in turn synapse with
the auditory nerve fibers to produce the spike trains that the brain must interpret.
Considering the lower two panels of Figure 4, it is possible to see a transition
as one moves from the low-numbered harmonics on the left to the high-numbered harmonics on the right: the first few harmonics generate distinct peaks
in the excitation pattern, because the filters in that frequency region are narrower
than the spacing between successive harmonics. Note also that the time waveforms
at the outputs of filters centered at the low-numbered harmonics resemble pure
tones. At higher harmonic numbers, the bandwidths of the auditory filters become
wider than the spacing between successive harmonics, and so individual peaks in
the excitation pattern are lost. Similarly, the time waveform at the output of higher-
frequency filters no longer resembles a pure tone, but instead reflects the interac-
tion of multiple harmonics, producing a complex waveform that repeats at a rate corresponding to the F0.
Harmonics that produce distinct peaks in the excitation pattern and/or produce
quasi-sinusoidal vibrations on the basilar membrane are referred to as being
“resolved.” Phenomenologically, resolved harmonics are those that can be “heard
out” as separate tones under certain circumstances. Typically, we do not hear the
individual harmonics when we listen to a musical tone, but our attention can be
drawn to them in various ways, for instance by amplifying them or by switching
them on and off while the other harmonics remain continuous (e.g., Bernstein &
Oxenham, 2003; Hartmann & Goupell, 2006). The ability to resolve or hear out individual low-numbered harmonics as pure tones was already noted by Hermann
von Helmholtz in his classic work, On the Sensations of Tone
(Helmholtz, 1885/1954).
The higher-numbered harmonics, which do not produce individual peaks of
excitation and cannot typically be heard out, are often referred to as being “unresolved.” The transition between resolved and unresolved harmonics is thought to
lie somewhere between the 5th and 10th harmonic, depending on various factors,
such as the F0 and the relative amplitudes of the components, as well as on how
resolvability is defined (e.g., Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990; Moore & Gockel, 2011; Shackleton & Carlyon, 1994).
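Resolvability can be estimated from auditory filter bandwidths. The sketch below uses the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formula together with a deliberately crude criterion, assumed here only for illustration: a harmonic counts as resolved when the harmonic spacing (equal to the F0) exceeds the local filter bandwidth. Under that assumption, the transition for a 440-Hz F0 falls within the 5th-to-10th-harmonic range given in the text.

```python
def erb_hz(f):
    # Equivalent rectangular bandwidth (Hz) of the auditory filter at
    # center frequency f, from Glasberg & Moore (1990).
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

f0 = 440.0
# Crude resolvability criterion (an illustrative assumption): a harmonic
# is "resolved" if the harmonic spacing (= F0) exceeds the filter
# bandwidth at that harmonic's frequency.
for n in range(1, 13):
    f = n * f0
    label = "resolved" if f0 > erb_hz(f) else "unresolved"
    print(f"harmonic {n:2d} ({f:6.0f} Hz): ERB {erb_hz(f):5.1f} Hz -> {label}")
```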
Numerous theories and models have been devised to explain how pitch is extracted
from the information present in the auditory periphery (de Cheveigné, 2005). As with
pure tones, the theories can be divided into two basic categories—place and temporal
theories. The place theories generally propose that the auditory system uses the
lower-order, resolved harmonics to calculate the pitch (e.g., Cohen, Grossberg, &
Wyse, 1995; Goldstein, 1973; Terhardt, 1974b; Wightman, 1973). This could be
achieved by way of a template-matching process, with either “hard-wired” harmonic
templates or templates that develop through repeated exposure to harmonic series, which eventually become associated with the F0. Temporal theories typically involve
evaluating the time intervals between auditory-nerve spikes, using a form of autocor-
relation or all-interval spike histogram (Cariani & Delgutte, 1996; Licklider, 1951;
Meddis & Hewitt, 1991; Meddis & O’Mard, 1997; Schouten, Ritsma, & Cardozo,
1962). This information can be obtained from both resolved and unresolved harmonics.
Pooling these spikes from across the nerve array results in a dominant interval
emerging that corresponds to the period of the waveform (i.e., the reciprocal of the
F0). A third alternative involves using both place and temporal information. In one
version, coincident timing between neurons with harmonically related characteristic frequencies (CFs) is postulated to lead to a spatial network of coincidence detectors—a place-based template
that emerges through coincident timing information (Shamma & Klein, 2000). In
another version, the impulse-response time of the auditory filters, which depends on
the CF, is postulated to determine the range of periodicities that a certain tonotopic
location can code (de Cheveigné & Pressnitzer, 2006). Recent physiological studies
have supported at least the plausibility of place-time mechanisms to code pitch
(Cedolin & Delgutte, 2010).
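The autocorrelation idea can be illustrated numerically: even for a "missing fundamental" complex containing only higher harmonics, the autocorrelation of the waveform peaks at the period of the absent F0. The following sketch operates on the raw waveform rather than on simulated auditory-nerve spike trains, so it is only a schematic stand-in for the models cited above; the parameter choices are arbitrary.

```python
import numpy as np

fs = 16000                      # sample rate (Hz)
f0 = 200.0                      # fundamental, deliberately omitted below
t = np.arange(int(0.1 * fs)) / fs
# Harmonics 2-6 only: a "missing fundamental" complex
x = sum(np.sin(2 * np.pi * h * f0 * t) for h in range(2, 7))

# Autocorrelation; ignore lags shorter than the period of 500 Hz
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = int(fs / 500)
best_lag = min_lag + int(np.argmax(ac[min_lag:]))
print(fs / best_lag)            # ~200 Hz: the missing fundamental
```

The dominant interval emerges at the waveform period (1/F0) because that is the only lag at which every harmonic component is back in phase, mirroring the pooled-interval argument in the text.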
Distinguishing between place and temporal (or place-time) models of pitch has
proved very difficult. In part, this is because spectral and temporal representations of a signal are mathematically equivalent: any change in the spectral representation
will automatically lead to a change in the temporal representation, and vice versa.
Psychoacoustic attempts to distinguish between place and temporal mechanisms
have focused on the limits imposed by the peripheral physiology in the cochlea and
auditory nerve. For instance, the limits of frequency selectivity can be used to test
the place theory: if all harmonics are clearly unresolved (and therefore providing
no place information) and a pitch is still heard, then pitch cannot depend solely on
place information. Similarly, the putative limits of phase-locking can be used: if
the periodicity of the waveform and the frequencies of all the resolved harmonics are all above the limit of phase locking in the auditory nerve and a pitch is still
heard, then temporal information is unlikely to be necessary for pitch perception.
A number of studies have shown that pitch perception is possible even when
harmonic tone complexes are filtered to remove all the low-numbered, resolved
harmonics (Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990;
Kaernbach & Bering, 2001; Shackleton & Carlyon, 1994). A similar conclusion
was reached by studies that used amplitude-modulated broadband noise, which has
no spectral peaks in its long-term spectrum (Burns & Viemeister, 1976, 1981).
These results suggest that pitch can be extracted from temporal information alone, thereby ruling out theories that consider only place coding. However, the pitch sen-
sation produced by unresolved harmonics or modulated noise is relatively weak
compared with the pitch of musical instruments, which produce full harmonic
complex tones.
The more salient pitch that we normally associate with music is provided by
the lower-numbered resolved harmonics. Studies that have investigated the
relative contributions of individual harmonics have found that harmonics 3 to 5
(Moore, Glasberg, & Peters, 1985), or frequencies around 600 Hz (Dai, 2000),
seem to have the most influence on the pitch of the overall complex. This is where current temporal models also encounter some difficulty: they are able to extract the
F0 of a complex tone as well from unresolved harmonics as from resolved harmo-
nics, and therefore they do not predict the large difference in pitch salience and
accuracy between low- and high-numbered harmonics that is observed in psycho-
physical studies (Carlyon, 1998). In other words, place models do not predict good
enough performance with unresolved harmonics, whereas temporal models predict
performance that is too good. The apparently qualitative and quantitative difference
in the pitch produced by low-numbered and high-numbered harmonics has led to the
suggestion that there may be two pitch mechanisms at work, one to code the temporal envelope repetition rate from high-numbered harmonics and one to code the
F0 from the individual low-numbered harmonics (Carlyon & Shackleton, 1994),
although subsequent work has questioned some of the evidence proposed for the two
mechanisms (Gockel, Carlyon, & Plack, 2004; Micheyl & Oxenham, 2003).
The fact that low-numbered, resolved harmonics are important suggests that
place coding may play a role in everyday pitch. Further evidence comes from a
variety of studies. The study mentioned earlier, in which low-frequency
temporal information was transposed into a high-frequency range (Oxenham et al.,
2004), examined complex-tone pitch perception by transposing the information from harmonics 3, 4, and 5 of a 100-Hz F0 to high-frequency regions of the cochlea—
roughly 4 kHz, 6 kHz, and 10 kHz. If temporal information was sufficient to elicit
a periodicity pitch, then listeners should have been able to hear a pitch correspond-
ing to 100 Hz. In fact, none of the listeners reported hearing a low pitch or was
able to match the pitch of the transposed tones to that of the missing fundamental.
This suggests that, if temporal information is used, it may need to be presented to
the “correct” place along the cochlea.
Another line of evidence has come from revisiting early conclusions that no
pitch is heard when all the harmonics are above about 5 kHz (Ritsma, 1962). The initial finding led researchers to suggest that timing information was crucial and
that at frequencies above the limits of phase locking, periodicity pitch was not per-
ceived. A recent study revisited this conclusion and found that, in fact, listeners
were well able to hear pitches between 1 and 2 kHz, even when all the harmonics
were filtered to be above 6 kHz, and were sufficiently resolved to ensure that no
temporal envelope cues were available (Oxenham et al., 2011). This outcome leads
to an interesting dissociation: tones above 6 kHz on their own do not produce a
musically useful pitch; however, those same tones when combined with others in a
harmonic series can produce a musical pitch sufficient to convey a melody. The results suggest that the upper limit of musical pitch may not in fact be explained by
the upper limit of phase locking: the fact that pitch can be heard even when all
tones are above 5 kHz suggests either that temporal information is not necessary
for musical pitch or that usable phase locking in the human auditory nerve extends
to much higher frequencies than currently believed (Heinz, Colburn, & Carney,
2001; Moore & Sęk, 2009).
A further line of evidence for the importance of place information has come from
studies that have investigated the relationship between pitch accuracy and auditory
filter bandwidths. Moore and Peters (1992) investigated the relationship betweenauditory filter bandwidths, measured using spectral masking techniques (Glasberg &
Moore, 1990), pure-tone frequency discrimination, and complex-tone F0 discrimi-
nation in young and elderly people with normal and impaired hearing. People
with hearing impairments were tested because they often have auditory filter band-
widths that are broader than normal. A wide range of results were found—some
participants with normal filter bandwidths showed impaired pure-tone and
complex-tone pitch discrimination thresholds; others with abnormally wide filters
still had relatively normal pure-tone pitch discrimination thresholds. However,
none of the participants with broadened auditory filters had normal F0 discrimination thresholds, suggesting that perhaps broader filters resulted in fewer or no
resolved harmonics and that resolved harmonics are necessary for accurate F0 dis-
crimination. This question was pursued later by Bernstein and Oxenham (2006a,
2006b), who systematically increased the lowest harmonic present in a harmonic
complex tone and measured the point at which F0 discrimination thresholds wors-
ened. In normal-hearing listeners, there is quite an abrupt transition from good
to poor pitch discrimination as the lowest harmonic present is increased from the
9th to the 12th (Houtsma & Smurzynski, 1990). Bernstein and Oxenham reasoned
that if the transition point is related to frequency selectivity and the resolvability of the harmonics, then the transition point should decrease to lower harmonic numbers
as the auditory filters become wider. They tested this in hearing-impaired listeners
and found a significant correlation between the transition point and the estimated
bandwidth of the auditory filters (Bernstein & Oxenham, 2006b), suggesting that
harmonics may need to be resolved in order to elicit a strong musical pitch.
Interestingly, even though resolved harmonics may be necessary for accurate pitch
perception, they may not be sufficient . Bernstein and Oxenham (2003) increased
the number of resolved harmonics available to listeners by presenting alternating
harmonics to opposite ears. In this way, the spacing between successive compo-nents in each ear was doubled, thereby doubling the number of peripherally
resolved harmonics. Listeners were able to hear out about twice as many harmonics
in this new condition, but that did not improve their pitch discrimination thresholds
for the complex tone. In other words, providing access to harmonics that are
not normally resolved does not improve pitch perception abilities. These results are
consistent with theories that rely on pitch templates. If harmonics are not normally
available to the auditory system, they would be unlikely to be incorporated
into templates and so would not be expected to contribute to the pitch percept
when presented by artificial means, such as presenting them to alternate ears.
Most sounds in our world, including those produced by musical instruments,
tend to have more energy at low frequencies than at high; on average, spectral
amplitude decreases at a rate of about 1/ f , or -6 dB/octave. It therefore makes sense
that the auditory system would rely on the lower numbered harmonics to determine
pitch, as these are the ones that are most likely to be audible. Also, resolved harmo-
nics—ones that produce a peak in the excitation pattern and elicit a sinusoidal tem-
poral response—are much less susceptible to the effects of room reverberation than
are unresolved harmonics. Pitch discrimination thresholds for unresolved harmonics
are relatively good (∼2%) when all the components have the same starting phase
(as in a stream of pulses). However, thresholds are much worse when the phase
relationships are scrambled, as they would be in a reverberant hall or church, and
listeners’ discrimination thresholds can be as poor as 10%—more than a musical
semitone. In contrast, the response to resolved harmonics is not materially affected
by reverberation: changing the starting phase of a single sinusoid does not affect its
waveshape—it still remains a sinusoid, with frequency discrimination thresholds
of considerably less than 1%.
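The −6 dB/octave figure follows directly from the 1/f form: doubling the frequency halves the amplitude, and a factor of two in amplitude corresponds to 20·log10(2) ≈ 6.02 dB. A two-line check (the function name is just for illustration):

```python
import math

def level_db(f_hz):
    """Level (dB re 1) of a 1/f amplitude spectrum at frequency f_hz."""
    return 20.0 * math.log10(1.0 / f_hz)

octave_drop = level_db(880.0) - level_db(440.0)
print(round(octave_drop, 2))   # -6.02 dB per octave
```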
A number of physiological and neuroimaging studies have searched for represen-
tations of pitch beyond the cochlea (Winter, 2005). Potential correlates of periodicity have been found in single- and multi-unit studies of the cochlear nucleus (Winter,
Wiegrebe, & Patterson, 2001), in the inferior colliculus (Langner & Schreiner,
1988), and in auditory cortex (Bendor & Wang, 2005). Human neuroimaging studies
have also found correlates of periodicity in the brainstem (Griffiths, Uppenkamp,
Johnsrude, Josephs, & Patterson, 2001) as well as in auditory cortical structures
(Griffiths, Buchel, Frackowiak, & Patterson, 1998). More recently, Penagos,
Melcher, and Oxenham (2004) identified a region in human auditory cortex that
seemed sensitive to the degree of pitch salience, as opposed to physical parameters,
such as F0 or spectral region. However, these studies are also not without some controversy. For instance, Hall and Plack (2009) failed to find any single region in the
human auditory cortex that responded to pitch, independent of other stimulus para-
meters. Similarly, in a physiological study of the ferret’s auditory cortex, Bizley,
Walker, Silverman, King, and Schnupp (2009) found interdependent coding of pitch,
timbre, and spatial location and did not find any pitch-specific region.
In summary, the pitch of single harmonic complex tones is determined primarily
by the first 5 to 8 harmonics, which are also those thought to be resolved in the
peripheral auditory system. To extract the pitch, the auditory system must somehow
combine and synthesize information from these harmonics. Exactly how this occurs in the auditory system remains a matter of ongoing research.
C. Timbre
The official ANSI definition of timbre is: “That attribute of auditory sensation
which enables a listener to judge that two nonidentical sounds, similarly presented
and having the same loudness and pitch, are dissimilar” (ANSI, 1994). The stan-
dard goes on to note that timbre depends primarily on the frequency spectrum of the sound, but can also depend on the sound pressure and temporal characteristics.
In other words, anything that is not pitch or loudness is timbre. As timbre has its
own chapter in this volume (Chapter 2), it will not be discussed further here.
However, timbre makes an appearance in the next section, where its influence on
pitch and loudness judgments is addressed.
D. Sensory Interactions and Cross-Modal Influences
The auditory sensations of loudness, pitch, and timbre are for the most part studied
independently. Nevertheless, a sizeable body of evidence suggests that these sen-
sory dimensions are not strictly independent. Furthermore, other sensory modali-
ties, in particular vision, can have sizeable effects on auditory judgments of
musical sounds.
1. Pitch and Timbre Interactions
Pitch and timbre are the two dimensions most likely to be confused, particularly by people without any musical training. Increasing the F0 of a complex tone results in
an increase in pitch, whereas changing the spectral center of gravity of a tone increases
its brightness—one aspect of timbre (Figure 5). In both cases, when asked to describe
the change, many listeners would simply say that the sound was “higher.”
In general, listeners find it hard to ignore changes in timbre when making pitch
judgments. Numerous studies have shown that the JND for F0 increases when
the two sounds to be compared also vary in spectral content (e.g., Borchert,
Micheyl, & Oxenham, 2011; Faulkner, 1985; Moore & Glasberg, 1990). In principle,
this could be because the change in spectral shape actually affects pitch or because listeners have difficulty ignoring timbre changes and concentrating solely on pitch.
Studies using pitch matching have generally found that harmonic complex tones are
best matched with a pure-tone frequency corresponding to the F0, regardless of
the spectral content of the complex tone (e.g., Patterson, 1973), which means that the
detrimental effects of differing timbre may be related more to a “distraction” effect
than to a genuine change in pitch (Moore & Glasberg, 1990).
2. Effects of Pitch or Timbre Changes on the Accuracy of Loudness
Judgments
Just as listeners have more difficulty judging pitch in the face of varying timbre,
loudness comparisons between two sounds become much more challenging when
either the pitch or timbre of the two sounds differs. Examples include the difficulty
of making loudness comparisons between two pure tones of different frequency
[Figure 5 shows schematic spectra (level in dB versus frequency) for four tones, crossing low and high F0 with low and high spectral peak; increasing F0 raises the pitch, and raising the spectral peak increases the brightness.]
Figure 5 Representations of F0 and spectral peak, which primarily affect the sensations of
pitch and timbre, respectively.
(Gabriel et al., 1997; Oxenham & Buus, 2000), and the difficulty of making loud-
ness comparisons between tones of differing duration, even when they have the
same frequency (Florentine, Buus, & Robinson, 1998).
3. Visual Influences on Auditory Attributes
As anyone who has watched a virtuoso musician will know, visual input affects the
aesthetic experience of the audience. More direct influences of vision on auditory
sensations, and vice versa, have also been reported in recent years. For instance,
noise that is presented simultaneously with a light tends to be rated as louder than
noise presented without light (Odgaard, Arieh, & Marks, 2004). Interestingly, this
effect appears to be sensory in nature, rather than a “late-stage” decisional effect,
or shift in criterion; in contrast, similar effects of noise on the apparent brightness
of light (Stein, London, Wilkinson, & Price, 1996) seem to stem from higher-level
decisional and criterion-setting mechanisms (Odgaard, Arieh, & Marks, 2003).
On the other hand, recent combinations of behavioral and neuroimaging techniques
have suggested that the combination of sound with light can result in increased sen-
sitivity to low-level light, which is reflected in changes in activation of sensory cor-
tices (Noesselt et al., 2010).
Visual cues can also affect other attributes of sound. For instance, Schutz and
colleagues (Schutz & Kubovy, 2009; Schutz & Lipscomb, 2007) have shown that
the gestures made in musical performance can affect the perceived duration of a
musical sound: a short or “staccato” gesture by a marimba player led to shorter
judged durations of the tone than a long gesture by the player, even though the
tone itself was identical. Interestingly, this did not hold for sustained sounds, such
as a clarinet, where visual information had much less impact on duration judg-
ments. The difference may relate to the exponential decay of percussive sounds,
which have no clearly defined end, allowing the listeners to shift their criterion for
the end point to better match the visual information.
III. Perception of Sound Combinations
A. Object Perception and Grouping
When a musical tone, such as a violin note or a sung vowel, is presented, we normally
hear a single sound with a single pitch, even though the note actually consists of
many different pure tones, each with its own frequency and pitch. This “perceptual
fusion” is partly because all the pure tones begin and end at roughly the same time,
and partly because they form a single harmonic series (Darwin, 2005). The impor-
tance of onset and offset synchrony can be demonstrated by delaying one of the components relative to all the others. A delay of only a few tens of milliseconds is
sufficient for the delayed component to “pop out” and be heard as a separate
object. Similarly, if one component is mistuned compared to the rest of the com-
plex, it will be heard out as a separate object, provided the mistuning is sufficiently
large. For low-numbered harmonics, mistuning a harmonic by between 1 and 3% is
sufficient for it to “pop out” (Moore, Glasberg, & Peters, 1986). Interestingly, a
mistuned harmonic can be heard separately, but can still contribute to the overall
pitch of the complex; in fact, it continues to do so even when it is mistuned by as
much as 8%—well above the threshold for hearing it out as a separate object (Darwin & Ciocca,
1992; Darwin, Hukin, & al-Khatib, 1995; Moore et al., 1985). This is an example
of a failure of “disjoint allocation”—a single component is not disjointly allocated
to just a single auditory object (Liberman, Isenberg, & Rakerd, 1981; Shinn-
Cunningham, Lee, & Oxenham, 2007).
B. Perceiving Multiple Pitches
How many tones can we hear at once? Considering all the different instruments in
an orchestra, one might expect the number to be quite high, and a well-trained con-
ductor will in many cases be able to hear a wrong note played by a single instru-
ment within that orchestra. But are we aware of all the pitches being presented at
once, and can we count them? Huron (1989) suggested that the number of indepen-
dent “voices” we can perceive and count is actually rather low. He used
sounds of homogeneous timbre (organ notes) and played participants sections from a
piece of polyphonic organ music by J. S. Bach with between one and five voices
playing simultaneously. Despite the fact that most of the participants were musi-cally trained, their ability to judge accurately the number of voices present
decreased dramatically when the number of voices actually present exceeded three.
Using much simpler stimuli, consisting of several simultaneous pure tones,
Demany and Ramos (2005) made the interesting discovery that participants could
not tell whether a certain tone was present or absent from the chord, but they
noticed if its frequency was changed in the next presentation. In other words, lis-
teners detected a change in the frequency of a tone that was itself undetected.
Taken together with the results of Huron (1989), the data suggest that the pitches
of many tones can be processed simultaneously, but that listeners may only be consciously aware of a subset of three or four at any one time.
C. The Role of Frequency Selectivity in the Perception of Multiple Tones
1. Roughness
When two pure tones of differing frequency are added, the resulting waveform
fluctuates in amplitude at a rate corresponding to the difference of the two frequen-
cies. These amplitude fluctuations, or “beats,” are illustrated in Figure 6, which shows how the two tones are sometimes in phase, and add constructively (A), and
sometimes out of phase, and so cancel (B). At beat rates of less than about 10 Hz,
we hear the individual fluctuations, but once the rate increases above about 12 Hz,
we are no longer able to follow the individual fluctuations and instead perceive a
“rough” sound (Daniel & Weber, 1997; Terhardt, 1974a).
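The beating described here is just the trigonometric identity cos a + cos b = 2 cos((a−b)/2) cos((a+b)/2): the sum of two nearby tones is a carrier at the mean frequency whose envelope fluctuates at the difference frequency. A quick numerical confirmation (the frequencies are chosen arbitrarily):

```python
import numpy as np

fs, f1, f2 = 8000, 440.0, 446.0          # two tones 6 Hz apart
t = np.arange(fs) / fs                    # one second of samples
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Equivalent carrier/envelope form: a mean-frequency carrier whose
# envelope fluctuates at the difference frequency
carrier = np.cos(2 * np.pi * (f1 + f2) / 2 * t)
envelope = 2 * np.cos(2 * np.pi * (f1 - f2) / 2 * t)
assert np.allclose(x, carrier * envelope)

beat_rate = abs(f1 - f2)   # 6 beats/s: heard as distinct beats;
                           # above roughly 12 Hz the percept is "roughness"
```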
2. Pitch Perception of Multiple Sounds
Despite the important role of tone combinations or chords in music, relatively few
psychoacoustic studies have examined their perception. Beerends and Houtsma
(1989) used complex tones consisting of just two consecutive harmonics each.
Although the pitch of these two-component complexes is relatively weak, with prac-
tice, listeners can learn to accurately identify the F0 of such complexes. Beerends
and Houtsma found that listeners were able to identify the pitches of the two com-
plex tones, even if the harmonics from one sound were presented to different ears.
The only exception was when all the components were presented to one ear and
none of the four components was deemed to be “resolved.” In that case, listeners
were not able to identify either pitch accurately.
Carlyon (1996) used harmonic tone complexes with more harmonics and filtered
them so that they had completely overlapping spectral envelopes. He found that
when both complexes were composed of resolved harmonics, listeners were able to
hear out the pitch of one complex in the presence of the other. However, the sur-
prising finding was that when both complexes comprised only unresolved harmo-
nics, then listeners did not hear a pitch at all, but described the percept as an
unmusical “crackle.” To avoid ambiguity, Carlyon (1996) used harmonics that
were either highly resolved or highly unresolved. Because of this, it remained
unclear whether it is the resolvability of the harmonics before or after the two
sounds are mixed that determines whether each tone elicits a clear pitch. Micheyl
and colleagues addressed this issue, using a variety of combinations of spectral
region and F0 to vary the relative resolvability of the components (Micheyl,
Bernstein, & Oxenham, 2006; Micheyl, Keebler, & Oxenham, 2010). By compar-
ing the results to simulations of auditory filtering, they found that good pitch dis-
crimination was only possible when at least two of the harmonics from the target
sound were deemed resolved after being mixed with the other sound (Micheyl
et al., 2010). The results are consistent with place theories of pitch that rely on
resolved harmonics; however, it may be possible to adapt timing-based models of
pitch to similarly explain the phenomena (e.g., Bernstein & Oxenham, 2005).
D. Consonance and Dissonance
The question of how certain combinations of tones sound when played together
is central to many aspects of music theory. Combinations of two tones that form
certain musical intervals, such as the octave and the fifth, are typically deemed as
sounding pleasant or consonant, whereas others, such as the augmented fourth (tri-
tone), are often considered unpleasant or dissonant. These types of percepts involv-
ing tones presented in isolation from a musical context have been termed sensory consonance or dissonance. The term musical consonance (Terhardt, 1976, 1984)
subsumes sensory factors, but also includes many other factors that contribute to
whether a sound combination is judged as consonant or dissonant, including the
context (what sounds preceded it), the style of music (e.g., jazz or classical), and
presumably also the personal taste and musical history of the individual listener.
There has been a long-standing search for acoustic and physiological correlates
of consonance and dissonance, going back to the observations of Pythagoras that
strings whose lengths had a small-number ratio relationship (e.g., 2:1 or 3:2)
sounded pleasant together. Helmholtz (1885/1954) suggested that consonance may be related to the absence of beats (perceived as roughness) in musical sounds.
Plomp and Levelt (1965) developed the idea further by showing that the ranking by
consonance of musical intervals within an octave was well predicted by the number
of component pairs within the two complex tones that fell within the same auditory
filters and therefore caused audible beats (see also Kameoka & Kuriyagawa,
1969a, 1969b). When two complex tones form a consonant interval, such as an
octave or a fifth, the harmonics are either exactly coincident, and so do not produce
beats, or are spaced so far apart as to not produce strong beats. In contrast, when
the tones form a dissonant interval, such as a minor second, none of the components are coincident, but many are close enough to produce beats.
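Plomp and Levelt's account can be caricatured in a few lines: count the pairs of partials from the two tones that are non-coincident yet close enough to fall within a single auditory filter, here approximated by the ERB of Glasberg and Moore (1990). The six-harmonic limit, the ERB criterion, and the function names are simplifying assumptions for illustration only.

```python
def erb(f_hz):
    # Equivalent rectangular bandwidth, Glasberg & Moore (1990)
    return 24.7 * (0.00437 * f_hz + 1.0)

def beating_pairs(f0_a, f0_b, n_harmonics=6):
    """Count partial pairs that are non-coincident but fall within
    one ERB of each other -- a crude proxy for audible beating."""
    count = 0
    for i in range(1, n_harmonics + 1):
        for j in range(1, n_harmonics + 1):
            fa, fb = i * f0_a, j * f0_b
            diff = abs(fa - fb)
            if 1e-6 < diff < erb(min(fa, fb)):
                count += 1
    return count

fifth = beating_pairs(440.0, 660.0)                          # consonant
minor_second = beating_pairs(440.0, 440.0 * 2 ** (1 / 12))   # dissonant
print(fifth, minor_second)   # the fifth yields far fewer beating pairs
```

For the fifth, most harmonics either coincide exactly or are widely separated, whereas for the minor second nearly every harmonic of one tone has a close, non-coincident neighbor from the other, just as the text describes.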
An alternative theory of consonance is based on the “harmonicity” of the
sound combination, or how closely it resembles a single harmonic series. Consider,
for instance, two complex tones that form the interval of a perfect fifth, with F0s of
440 and 660 Hz. All the components from both tones are multiples of a single
F0—220 Hz—and so, according to the harmonicity account of consonance, should
sound consonant. In contrast, the harmonics of two tones that form an augmented
fourth, with F0s of 440 Hz and 622 Hz, do not approximate any single harmonic
series within the range of audible pitches and so should sound dissonant, as found empirically. The harmonicity theory of consonance can be implemented by using a
spectral template model (Terhardt, 1974b) or by using temporal information,
derived for instance from spikes in the auditory nerve (Tramo, Cariani, Delgutte, &
Braida, 2001).
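The harmonicity account lends itself to a direct check: the partials of the perfect fifth (440 + 660 Hz) are all integer multiples of 220 Hz, whereas those of the tritone (440 + 622 Hz) fit no common F0 in the audible pitch range. A sketch, far simpler than the template and temporal models cited above; the tolerance and function name are arbitrary choices.

```python
def fits_harmonic_series(f0s, base, n_harmonics=6, tol=0.01):
    """True if every partial of every tone lies within `tol` (as a
    fraction of one harmonic step) of an integer multiple of `base`."""
    for f0 in f0s:
        for k in range(1, n_harmonics + 1):
            ratio = k * f0 / base
            if abs(ratio - round(ratio)) > tol:
                return False
    return True

# Perfect fifth: every partial is a multiple of the implied 220-Hz F0
assert fits_harmonic_series([440.0, 660.0], 220.0)
# Tritone: the partials do not fit a 220-Hz series (622/220 = 2.83...)
assert not fits_harmonic_series([440.0, 622.0], 220.0)
```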
Because the beating and harmonicity theories of consonance and dissonance pro-
duce very similar predictions, it has been difficult to distinguish between them
experimentally. A recent study took a step toward this goal by examining individ-
ual differences in a large group (>200) of participants (McDermott, Lehr, &
Oxenham, 2010). First, listeners were asked to provide preference ratings for “diagnostic” stimuli that varied in beating but not harmonicity, or vice versa. Next,
listeners were asked to provide preference ratings for various musical sound
combinations, including dyads (two-note chords) and triads (three-note chords),
using natural and artificial musical instruments and voices. When the ratings in the
two types of tasks were compared, the correlations between the ratings for the har-
monicity diagnostic tests and the musical sounds were significant, but the correla-
tions between the ratings for the beating diagnostic tests and the musical sounds
were not. Interestingly, the number of years of formal musical training also corre-
lated with both the harmonicity and musical preference ratings, but not with the beating ratings. Overall, the results suggested that harmonicity, rather than lack of
beating, underlies listeners’ consonance preferences and that musical training may
amplify the preference for harmonic relationships.
Developmental studies have shown that infants as young as 3 or 4 months show
a preference for consonant over dissonant musical intervals (Trainor & Heinmiller,
1998; Zentner & Kagan, 1996, 1998). However, it is not yet known whether infants
are responding more to beats or inharmonicity, or both. It would be interesting to
discover whether the adult preferences for harmonicity revealed by McDermott
et al. (2010) are shared by infants, or whether infants initially base their preferences on acoustic beats.
IV. Conclusions and Outlook
Although the perception of musical tones should be considered primarily in musical
contexts, much about the interactions between acoustics, auditory physiology, and
perception can be learned through psychoacoustic experiments using relatively simple stimuli and procedures. Recent findings using psychoacoustics, alone or in
combination with neurophysiology and neuroimaging, have extended our knowl-
edge of how pitch, timbre, and loudness are perceived and represented neurally,
both for tones in isolation and in combination. However, much still remains to be
discovered. Important trends include the use of more naturalistic stimuli in experi-
ments and for testing computational models of perception, as well as the simultaneous combination of perceptual and neural measures when attempting to elucidate
neous combination of perceptual and neural measures when attempting to elucidate
the underlying neural mechanisms of auditory perception. Using the building
blocks provided by the psychoacoustics of individual and simultaneous musical tones, it is possible to proceed to answering much more sophisticated questions
regarding the perception of music as it unfolds over time. These and other issues
are tackled in the remaining chapters of this volume.
Acknowledgments
Emily Allen, Christophe Micheyl, and John Oxenham provided helpful comments on an
earlier version of this chapter. The work from the author’s laboratory is supported by funding from the National Institutes of Health (Grants R01 DC 05216 and R01 DC 07657).
References
American National Standards Institute. (1994). Acoustical terminology. ANSI S1.1-1994.
New York, NY: Author.
Arieh, Y., & Marks, L. E. (2003a). Recalibrating the auditory system: A speed-accuracy
analysis of intensity perception. Journal of Experimental Psychology: Human
Perception and Performance, 29, 523–536.
Arieh, Y., & Marks, L. E. (2003b). Time course of loudness recalibration: Implications for
loudness enhancement. Journal of the Acoustical Society of America, 114, 1550–1556.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147–166.
Beerends, J. G., & Houtsma, A. J. M. (1989). Pitch identification of simultaneous diotic and
dichotic two-tone complexes. Journal of the Acoustical Society of America, 85,
813–819.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature, 436, 1161–1165.
Bernstein, J. G., & Oxenham, A. J. (2003). Pitch discrimination of diotic and dichotic tone
complexes: Harmonic resolvability or harmonic number? Journal of the Acoustical
Society of America, 113, 3323–3334.
Bernstein, J. G., & Oxenham, A. J. (2005). An autocorrelation model with place dependence
to account for the effect of harmonic number on fundamental frequency discrimination.
Journal of the Acoustical Society of America, 117, 3816–3831.
Bernstein, J. G., & Oxenham, A. J. (2006a). The relationship between frequency selectivity
and pitch discrimination: Effects of stimulus level. Journal of the Acoustical Society of
America, 120, 3916–3928.
Bernstein, J. G., & Oxenham, A. J. (2006b). The relationship between frequency selectivity
and pitch discrimination: Sensorineural hearing loss. Journal of the Acoustical Society
of America, 120, 3929–3945.
Bizley, J. K., Walker, K. M., Silverman, B. W., King, A. J., & Schnupp, J. W. (2009).
Interdependent encoding of pitch, timbre, and spatial location in auditory cortex.
Journal of Neuroscience, 29, 2064–2075.
Borchert, E. M., Micheyl, C., & Oxenham, A. J. (2011). Perceptual grouping affects pitch
judgments across time and frequency. Journal of Experimental Psychology: Human
Perception and Performance, 37, 257–269.
Burns, E. M., & Viemeister, N. F. (1976). Nonspectral pitch. Journal of the Acoustical Society of America, 60, 863–869.
Burns, E. M., & Viemeister, N. F. (1981). Played again SAM: Further observations on the
pitch of amplitude-modulated noise. Journal of the Acoustical Society of America, 70,
1655–1660.
Buus, S., Muesch, H., & Florentine, M. (1998). On loudness at threshold. Journal of the
Acoustical Society of America, 104, 399–410.
Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones.
I. Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Carlyon, R. P. (1996). Encoding the fundamental frequency of a complex tone in the pres-
ence of a spectrally overlapping masker. Journal of the Acoustical Society of America, 99, 517–524.
Carlyon, R. P. (1998). Comments on “A unitary model of pitch perception” [Journal of the
Acoustical Society of America, 102, 1811–1820 (1997)]. Journal of the Acoustical
Society of America, 104, 1118–1121.
Carlyon, R. P., & Shackleton, T. M. (1994). Comparing the fundamental frequencies of
resolved and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the
Acoustical Society of America, 95, 3541–3554.
Cedolin, L., & Delgutte, B. (2010). Spatiotemporal representation of the pitch of harmonic
complex tones in the auditory nerve. Journal of Neuroscience, 30, 12712–12724.
Chalupper, J., & Fastl, H. (2002). Dynamic loudness model (DLM) for normal and hearing-
impaired listeners. Acta Acustica united with Acustica, 88, 378–386.
Chen, Z., Hu, G., Glasberg, B. R., & Moore, B. C. (2011). A new method of calculating
auditory excitation patterns and loudness for steady sounds. Hearing Research, 282(1–2), 204–215.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from
notched-noise data. Hearing Research, 47, 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying
sounds. Journal of the Audio Engineering Society, 50, 331–341.
Gockel, H., Carlyon, R. P., & Plack, C. J. (2004). Across-frequency interference effects in
fundamental frequency discrimination: Questioning evidence for two pitch mechanisms.
Journal of the Acoustical Society of America, 116, 1092–1104.
Goldstein, J. L. (1973). An optimum processor theory for the central formation of
the pitch of complex tones. Journal of the Acoustical Society of America, 54,
1496–1516.
Griffiths, T. D., Buchel, C., Frackowiak, R. S., & Patterson, R. D. (1998). Analysis of tem-
poral structure in sound by the human brain. Nature Neuroscience, 1, 422–427.
Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001).
Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience, 4, 633–637.
Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain.
Cerebral Cortex, 19, 576–585.
Hartmann, W. M., & Goupell, M. J. (2006). Enhancing and unmasking the harmonics of a
complex tone. Journal of the Acoustical Society of America, 120, 2142–2157.
Heinz, M. G., Colburn, H. S., & Carney, L. H. (2001). Evaluating auditory performance
limits: I. One-parameter discrimination using a computational model for the auditory
nerve. Neural Computation, 13, 2273–2316.
Hellman, R. P. (1976). Growth of loudness at 1000 and 3000 Hz. Journal of the Acoustical
Society of America, 60, 672–679.
Hellman, R. P., & Zwislocki, J. (1964). Loudness function of a 1000-cps tone in the presence
of a masking noise. Journal of the Acoustical Society of America, 36, 1618–1627.
Helmholtz, H. L. F. (1885/1954). On the sensations of tone (A. J. Ellis, Trans.). New York,
NY: Dover.
Henning, G. B. (1966). Frequency discrimination of random amplitude tones. Journal of the
Acoustical Society of America, 39, 336–339.
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for complex
tones with many harmonics. Journal of the Acoustical Society of America, 87, 304–310.
Huron, D. (1989). Voice denumerability in polyphonic music of homogenous timbres. Music
Perception, 6, 361–382.
Jesteadt, W., Wier, C. C., & Green, D. M. (1977). Intensity discrimination as a function of
frequency and sensation level. Journal of the Acoustical Society of America, 61,
169–177.
Kaernbach, C., & Bering, C. (2001). Exploring the temporal mechanism involved in the
pitch of unresolved harmonics. Journal of the Acoustical Society of America, 110,
1039–1048.
Kameoka, A., & Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads.
Journal of the Acoustical Society of America, 45, 1451–1459.
Kameoka, A., & Kuriyagawa, M. (1969b). Consonance theory part II: Consonance of com-
plex tones and its calculation method. Journal of the Acoustical Society of America, 45,
1460–1469.
Keuss, P. J., & van der Molen, M. W. (1982). Positive and negative effects of stimulus
intensity in auditory reaction tasks: Further studies on immediate arousal. Acta
Psychologica, 52, 61–72.
Kohfeld, D. L. (1971). Simple reaction time as a function of stimulus intensity in decibels of
light and sound. Journal of Experimental Psychology, 88, 251–257.
Kohlrausch, A., Fassel, R., & Dau, T. (2000). The influence of carrier level and frequency
on modulation and beat-detection thresholds for sinusoidal carriers. Journal of the Acoustical Society of America, 108, 723–734.
Langner, G., & Schreiner, C. E. (1988). Periodicity coding in the inferior colliculus of the
cat. I. Neuronal mechanisms. Journal of Neurophysiology, 60, 1799–1822.
Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop con-
sonants: Evidence for a phonetic mode. Perception & Psychophysics, 30, 133–143.
Licklider, J. C., Webster, J. C., & Hedlun, J. M. (1950). On the frequency limits of binaural
beats. Journal of the Acoustical Society of America, 22, 468–473.
Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia, 7, 128–133.
Loeb, G. E., White, M. W., & Merzenich, M. M. (1983). Spatial cross correlation: A pro-
posed mechanism for acoustic pitch perception. Biological Cybernetics, 47, 149–163.
Luce, R. D., & Green, D. M. (1972). A neural timing theory for response times and the psy-
chophysics of intensity. Psychological Review, 79, 14–57.
Mapes-Riordan, D., & Yost, W. A. (1999). Loudness recalibration as a function of level.
Journal of the Acoustical Society of America, 106, 3506–3511.
Marks, L. E. (1994). “Recalibrating” the auditory system: The perception of loudness.
Journal of Experimental Psychology: Human Perception and Performance, 20,
382–396.
Mauermann, M., Long, G. R., & Kollmeier, B. (2004). Fine structure of hearing threshold and
loudness perception. Journal of the Acoustical Society of America, 116, 1066–1080.
McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2010). Individual differences reveal the basis of consonance. Current Biology, 20, 1035–1041.
Meddis, R., & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer
model of the auditory periphery. I: Pitch identification. Journal of the Acoustical
Society of America, 89, 2866–2882.
Meddis, R., & O’Mard, L. (1997). A unitary model of pitch perception. Journal of the
Acoustical Society of America, 102, 1811–1820.
Micheyl, C., Bernstein, J. G., & Oxenham, A. J. (2006). Detection and F0 discrimination of
harmonic complex tones in the presence of competing tones or noise. Journal of the
Acoustical Society of America, 120, 1493–1505.
Micheyl, C., Delhommeau, K., Perrot, X., & Oxenham, A. J. (2006). Influence of musical
and psychoacoustical training on pitch discrimination. Hearing Research, 219, 36–47.
Micheyl, C., Keebler, M. V., & Oxenham, A. J. (2010). Pitch perception for mixtures of
spectrally overlapping harmonic complex tones. Journal of the Acoustical Society of
America, 128, 257–269.
Micheyl, C., & Oxenham, A. J. (2003). Further tests of the “two pitch mechanisms” hypothe-
sis. Journal of the Acoustical Society of America, 113, 2225.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our
capacity for processing information. Psychological Review, 63, 81–97.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the
Acoustical Society of America, 54, 610–619.
Moore, B. C. J., & Glasberg, B. R. (1990). Frequency discrimination of complex tones with
overlapping and non-overlapping harmonics. Journal of the Acoustical Society of
America, 87, 2163–2177.
Moore, B. C. J., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model.
Acustica, 82, 335–345.
Moore, B. C. J., & Glasberg, B. R. (1997). A model of loudness perception applied to
cochlear hearing loss. Auditory Neuroscience, 3, 289–311.
Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). A model for the prediction of thresholds,
loudness, and partial loudness. Journal of the Audio Engineering Society, 45, 224–240.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1985). Relative dominance of individual
partials in determining the pitch of complex tones. Journal of the Acoustical Society of
America, 77, 1853–1860.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1986). Thresholds for hearing mistuned
partials as separate tones in harmonic complexes. Journal of the Acoustical Society of
America, 80, 479–483.
Moore, B. C. J., Glasberg, B. R., & Vickers, D. A. (1999). Further evaluation of a model of
loudness perception applied to cochlear hearing loss. Journal of the Acoustical Society of America, 106, 898–907.
Moore, B. C. J., & Gockel, H. E. (2011). Resolvability of components in complex tones and
implications for theories of pitch perception. Hearing Research, 276, 88–97.
Moore, B. C. J., & Peters, R. W. (1992). Pitch discrimination and phase sensitivity in young
and elderly subjects and its relationship to frequency selectivity. Journal of the
Acoustical Society of America, 91, 2881–2893.
Moore, B. C. J., & Sęk, A. (2009). Sensitivity of the human auditory system to temporal fine
structure at high frequencies. Journal of the Acoustical Society of America, 125,
3186–3193.
Noesselt, T., Tyll, S., Boehler, C. N., Budinger, E., Heinze, H. J., & Driver, J. (2010). Sound-induced enhancement of low-intensity vision: Multisensory influences on human
sensory-specific cortices and thalamic bodies relate to perceptual enhancement of visual
detection sensitivity. Journal of Neuroscience, 30, 13609–13623.
Oberfeld, D. (2007). Loudness changes induced by a proximal sound: Loudness enhance-
ment, loudness recalibration, or both? Journal of the Acoustical Society of America,
121, 2137–2148.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2003). Cross-modal enhancement of perceived
brightness: Sensory interaction versus response bias. Perception & Psychophysics, 65,
123–132.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2004). Brighter noise: Sensory enhancement of perceived loudness by concurrent visual stimulation. Cognitive, Affective, & Behavioral
Neuroscience, 4, 127–132.
Oxenham, A. J., Bernstein, J. G. W., & Penagos, H. (2004). Correct tonotopic representation
is necessary for complex pitch perception. Proceedings of the National Academy of
Sciences USA, 101, 1421–1425.
Oxenham, A. J., & Buus, S. (2000). Level discrimination of sinusoids as a function of dura-
tion and level for fixed-level, roving-level, and across-frequency conditions. Journal of
the Acoustical Society of America, 107, 1605–1614.
Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A., & Santurette, S. (2011). Pitch per-
ception beyond the traditional existence region of pitch. Proceedings of the National
Academy of Sciences USA, 108, 7629–7634.
Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig
and its relation to the receptor potential of inner hair-cells. Hearing Research, 24,
1–15.
Shofner, W. P. (2005). Comparative aspects of pitch perception. In C. J. Plack, A. J. Oxenham,
R. Fay, & A. N. Popper (Eds.), Pitch: Neural coding and perception (pp. 56–98). New
York, NY: Springer Verlag.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived
visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive
Neuroscience, 8, 497–506.
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Suzuki, Y., & Takeshima, H. (2004). Equal-loudness-level contours for pure tones. Journal
of the Acoustical Society of America, 116, 918–933.
Terhardt, E. (1974a). On the perception of periodic sound fluctuations (roughness). Acustica,
30, 201–213.
Terhardt, E. (1974b). Pitch, consonance, and harmony. Journal of the Acoustical Society of
America, 55, 1061–1069.
Terhardt, E. (1976). Psychoakustisch begründetes Konzept der musikalischen Konsonanz [A psychoacoustically based concept of musical consonance]. Acustica, 36, 121–137.
Terhardt, E. (1984). The concept of musical consonance, a link between music and psycho-
acoustics. Music Perception, 1, 276–295.
Trainor, L. J., & Heinmiller, B. M. (1998). The development of evaluative responses to
music: Infants prefer to listen to consonance over dissonance. Infant Behavior and
Development, 21, 77–88.
Tramo, M. J., Cariani, P. A., Delgutte, B., & Braida, L. D. (2001). Neurobiological founda-
tions for the theory of harmony in western tonal music. Annals of the New York
Academy of Sciences, 930, 92–116.
van de Par, S., & Kohlrausch, A. (1997). A new approach to comparing binaural masking
level differences at low and high frequencies. Journal of the Acoustical Society of
America, 101, 1671–1680.
Verschuure, J., & van Meeteren, A. A. (1975). The effect of intensity on pitch. Acustica, 32,
33–44.
Viemeister, N. F. (1983). Auditory intensity discrimination at high frequencies in the pres-
ence of noise. Science, 221, 1206–1208.
Viemeister, N. F., & Bacon, S. P. (1988). Intensity discrimination, increment detection, and
magnitude estimation for 1-kHz tones. Journal of the Acoustical Society of America, 84,
172–178.
Wallace, M. N., Rutkowski, R. G., Shackleton, T. M., & Palmer, A. R. (2000). Phase-locked responses to pure tones in guinea pig auditory cortex. Neuroreport, 11, 3989–3993.
Warren, R. M. (1970). Elimination of biases in loudness judgements for tones. Journal of the
Acoustical Society of America, 48, 1397–1403.
Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the
Acoustical Society of America, 54, 407–416.
Winckel, F. W. (1962). Optimum acoustic criteria of concert halls for the performance of
classical music. Journal of the Acoustical Society of America, 34, 81–86.
Winter, I. M. (2005). The neurophysiology of pitch. In C. J. Plack, A. J. Oxenham, R.