8/16/2019 1 - The Perception of Musical Tones, Pages 1-33
1/33
1 The Perception of Musical Tones
Andrew J. Oxenham
Department of Psychology, University of Minnesota, Minneapolis
I. Introduction
A. What Are Musical Tones?
The definition of a tone—a periodic sound that elicits a pitch sensation—encompasses
the vast majority of musical sounds. Tones can be either pure—sinusoidal variations
in air pressure at a single frequency—or complex. Complex tones can be divided into
two categories, harmonic and inharmonic. Harmonic complex tones are periodic,
with a repetition rate known as the fundamental frequency (F0), and are composed of
a sum of sinusoids with frequencies that are all integer multiples, or harmonics,
of the F0. Inharmonic complex tones are composed of multiple sinusoids that are not simple integer multiples of any common F0. Most musical instrumental or vocal tones are more or less harmonic, but some, such as bell chimes, can be inharmonic.
B. Measuring Perception
The physical attributes of a sound, such as its intensity and spectral content, can be
readily measured with modern technical instrumentation. Measuring the perception of sound is a different matter. Gustav Fechner, a 19th-century German scientist,
is credited with founding the field of psychophysics—the attempt to establish a
quantitative relationship between physical variables (e.g., sound intensity and fre-
quency) and the sensations they produce (e.g., loudness and pitch; Fechner, 1860).
The psychophysical techniques that have been developed since Fechner’s time to
tap into our perceptions and sensations (involving hearing, vision, smell, touch, and
taste) can be loosely divided into two categories of measures, subjective and objec-
tive. The subjective measures typically require participants to estimate or produce
magnitudes or ratios that relate to the dimension under study. For instance, in establishing a loudness scale, participants may be presented with a series of tones
at different intensities and then asked to assign a number to each tone, correspond-
ing to its loudness. This method of magnitude estimation thus produces a psycho-
physical function that directly relates loudness to sound intensity. Ratio estimation
follows the same principle, except that participants may be presented with two
The Psychology of Music. DOI: http://dx.doi.org/10.1016/B978-0-12-381460-9.00001-8
© 2013 Elsevier Inc. All rights reserved.
sounds and then asked to judge how much louder (e.g., twice or three times) one
sound is than the other. The complementary methods are magnitude production and
ratio production. In these production techniques, the participants are required to
vary the relevant physical dimension of a sound until it matches a given magnitude (number), or until it matches a specific ratio with respect to a reference sound.
In the latter case, the instructions may be something like “adjust the level of the
second sound until it is twice as loud as the first sound.” All four techniques have
been employed numerous times in attempts to derive appropriate psychophysical
scales (e.g., Buus, Muesch, & Florentine, 1998; Hellman, 1976; Hellman &
Zwislocki, 1964; Stevens, 1957; Warren, 1970). Other variations on these methods
include categorical scaling and cross-modality matching. Categorical scaling involves
asking participants to assign the auditory sensation to one of a number of fixed
categories; following our loudness example, participants might be asked to select a category ranging from very quiet to very loud (e.g., Mauermann, Long, & Kollmeier,
2004). Cross-modality matching avoids the use of numbers by, for instance, asking
participants to adjust the length of a line, or a piece of string, to match the perceived
loudness of a tone (e.g., Epstein & Florentine, 2005). Although all these methods
have the advantage of providing a more-or-less direct estimate of the relationship
between the physical stimulus and the sensation, they have a number of disadvan-
tages also. First, they are subjective and rely on introspection on the part of the
subject. Perhaps because of this they can be somewhat unreliable, variable across
and within participants, and prone to various biases (e.g., Poulton, 1977).
The other approach is to use an objective measure, where a right and wrong
answer can be verified externally. This approach usually involves probing the limits
of resolution of the sensory system, by measuring absolute threshold (the smallest
detectable stimulus), relative threshold (the smallest detectable change in a stimulus),
or masked threshold (the smallest detectable stimulus in the presence of another
stimulus). There are various ways of measuring threshold, but most involve a forced-
choice procedure, where the subject has to pick the interval that contains the target
sound from a selection of two or more. For instance, in an experiment measuring
absolute threshold, the subject might be presented with two successive time intervals, marked by lights; the target sound is played during one of the intervals, and the
subject has to decide which one it was. One would expect performance to change
with the intensity of the sound: at very low intensities, the sound will be completely
inaudible, and so performance will be at chance (50% correct in a two-interval task);
at very high intensities, the sound will always be clearly audible, so performance will
be near 100%, assuming that the subject continues to pay attention. A psychometric
function can then be derived, which plots the performance of a subject as a function
of the stimulus parameter. An example of a psychometric function is shown in
Figure 1, which plots percent correct as a function of sound pressure level. This type of forced-choice paradigm is usually preferable (although often more time-consuming) to more subjective measures, such as the method of limits, which is often used today
to measure audiograms. In the method of limits, the intensity of a sound is decreased
until the subject reports no longer being able to hear it, and then the intensity
of the sound is increased until the subject again reports being able to hear it.
The trouble with such measures is that they rely not just on sensitivity but also on
criterion—how willing the subject is to report having heard a sound if he or she is
not sure. A forced-choice procedure eliminates that problem by forcing participants
to guess, even if they are unsure which interval contained the target sound. Clearly,
testing the perceptual limits by measuring thresholds does not tell us everything
about human auditory perception; a primary concern is that these measures are typi-
cally indirect—the finding that people can detect less than a 1% change in frequency
does not tell us much about the perception of much larger musical intervals, such as
an octave. Nevertheless, it has proved extremely useful in helping us to gain a deeper understanding of perception and its relation to the underlying physiology of the
ear and brain.
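The logic of the forced-choice procedure and the psychometric function it yields can be sketched in a short simulation. Everything here is hypothetical: the logistic shape, midpoint, and slope are illustrative choices rather than measured values, and the 75% point is simply the conventional threshold criterion for a two-interval task.

```python
import math
import random

rng = random.Random(0)

def percent_correct(level_db, midpoint=5.0, slope=1.5):
    # Hypothetical logistic psychometric function for a two-interval
    # forced-choice task: chance is 50%, asymptote near 100%.
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint) / slope))

def simulate_run(level_db, n_trials=2000):
    # Each simulated trial is correct with probability given by the
    # psychometric function at that signal level.
    hits = sum(rng.random() < percent_correct(level_db) for _ in range(n_trials))
    return hits / n_trials

levels = [-5, 0, 5, 10, 15]          # signal levels in dB SPL, as in Figure 1
scores = [simulate_run(lv) for lv in levels]

def threshold_75(levels, scores):
    # Conventional threshold: the level giving 75% correct, found by
    # linear interpolation between the two bracketing measurements.
    for (l0, s0), (l1, s1) in zip(zip(levels, scores), zip(levels[1:], scores[1:])):
        if s0 <= 0.75 <= s1:
            return l0 + (0.75 - s0) * (l1 - l0) / (s1 - s0)
    return None

print(threshold_75(levels, scores))  # close to the assumed 5-dB midpoint
```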
Measures of reaction time, or response time (RT), have also been used to probe
sensory processing. The two basic forms of response time are simple response time
(SRT), where participants are instructed to respond as quickly as possible by push-
ing a single button once a stimulus is presented, and choice response time (CRT),
where participants have to categorize the stimulus (usually into one of two catego-
ries) before responding (by pressing button 1 or 2).
Although RT measures are more common in cognitive tasks, they also depend on some basic sound attributes, such as sound intensity, with higher intensity
sounds eliciting faster reactions, measured using both SRTs (Kohfeld, 1971;
Luce & Green, 1972) and CRTs (Keuss & van der Molen, 1982).
Finally, measures of perception are not limited to the quantitative or numerical
domain. It is also possible to ask participants to describe their percepts in words.
This approach has clear applications when dealing with multidimensional attributes,
such as timbre (see below, and Chapter 2 of this volume), but also has some inherent
difficulties, as different people may use descriptive words in different ways.
To sum up, measuring perception is a thorny issue that has many solutions, all with their own advantages and shortcomings. Perceptual measures remain a crucial
“systems-level” analysis tool that can be combined in both human and animal stud-
ies with various physiological and neuroimaging techniques, to help us discover
more about how the ears and brain process musical sounds in ways that elicit
music’s powerful cognitive and emotional effects.
Figure 1 A schematic example of a psychometric function, plotting percent correct (from 50% to 100%) in a two-alternative forced-choice task against the sound pressure level of a test tone (−5 to 15 dB SPL).
broadband sounds remains roughly constant when expressed as a ratio or in deci-
bels is in line with the well-known Weber’s law, which states that the JND between
two stimuli is proportional to the magnitude of the stimuli.
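Weber's law has a convenient corollary: a just-noticeable increment that is a fixed fraction of the stimulus is a fixed number of decibels, whatever the absolute intensity. A minimal sketch (the 25% Weber fraction is an illustrative assumption, not a measured value):

```python
import math

def jnd_in_db(intensity, weber_fraction=0.25):
    # Weber's law: the just-noticeable increment is a fixed fraction of
    # the stimulus, so the JND is constant when expressed in decibels.
    delta = weber_fraction * intensity
    return 10 * math.log10((intensity + delta) / intensity)

# The same 25% increment is ~0.97 dB at any absolute intensity:
for intensity_w_m2 in (1e-6, 1e-3, 1.0):
    print(f"{intensity_w_m2:g} W/m^2 -> JND = {jnd_in_db(intensity_w_m2):.3f} dB")
```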
In contrast to our ability to judge differences in sound level between two sounds presented one after another, our ability to categorize or label sound levels is rather
poor. In line with Miller’s (1956) famous “7 plus or minus 2” postulate for infor-
mation processing and categorization, our ability to categorize sound levels accu-
rately is fairly limited and is subject to a variety of influences, such as the context
of the preceding sounds. This may explain why the musical notation of loudness
(in contrast to pitch) has relatively few categories between pianissimo and
fortissimo—typically just six (pp, p, mp, mf, f, and ff).
2. Equal Loudness Contours and the Loudness Weighting Curves
There is no direct relationship between the physical sound level (in dB SPL) and
the sensation of loudness. There are many reasons for this, but an important one is
that loudness depends heavily on the frequency content of the sound. Figure 2
shows what are known as equal loudness contours. The basic concept is that two
pure tones with different frequencies, but with levels that fall on the same loudness
contour, have the same loudness. For instance, as shown in Figure 2, a pure tone
with a frequency of 1 kHz and a level of 40 dB SPL has the same loudness as a
pure tone with a frequency of 100 Hz and a level of about 64 dB SPL; in other words,
a 100-Hz tone has to be 24 dB higher in level than a 40-dB SPL 1-kHz tone in order
Figure 2 The equal-loudness contours, taken from ISO 226:2003, plotting sound pressure level in dB against frequency in Hz (16 Hz to 16 000 Hz) for loudness levels from hearing threshold to 100 phons. Original figure kindly provided by Brian C. J. Moore.
to be perceived as being equally loud. The equal loudness contours are incorporated
into an international standard (ISO 226) that was initially established in 1961 and was
last revised in 2003.
These equal loudness contours have been derived several times from painstaking psychophysical measurements, not always with identical outcomes (Fletcher &
Munson, 1933; Robinson & Dadson, 1956; Suzuki & Takeshima, 2004). The mea-
surements typically involve either loudness matching, where a subject adjusts the
level of one tone until it sounds as loud as a second tone, or loudness comparisons,
where a subject compares the loudness of many pairs of tones and the results are
compiled to derive points of subjective equality (PSE). Both methods are highly
susceptible to nonsensory biases, making the task of deriving a definitive set of
equal loudness contours a challenging one (Gabriel, Kollmeier, & Mellert, 1997).
The equal loudness contours provide the basis for the measure of “loudness level,” which has units of “phons.” The phon value of a sound is the dB SPL value
of a 1-kHz tone that is judged to have the same loudness as the sound. So, by defi-
nition, a 40-dB SPL tone at 1 kHz has a loudness level of 40 phons. Continuing the
preceding example, the 100-Hz tone at a level of about 64 dB SPL also has a loud-
ness level of 40 phons, because it falls on the same equal loudness contour as the
40-dB SPL 1-kHz tone. Thus, the equal loudness contours can also be termed the
equal phon contours.
Although the actual measurements are difficult, and the results somewhat conten-
tious, there are many practical uses for the equal loudness contours. For instance, in issues of community noise annoyance from rock concerts or airports, it is more use-
ful to know about the perceived loudness of the sounds in question, rather than just
their physical level. For this reason, an approximation of the 40-phon equal loudness
contour is built into most modern sound level meters and is referred to as the
“A-weighted” curve. A sound level that is quoted in dB (A) is an overall sound level
that has been filtered with the inverse of the approximate 40-phon curve. This means
that very low and very high frequencies, which are perceived as being less loud, are
given less weight than the middle of the frequency range.
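The filtering described above can be reproduced directly, because a common analytic form of the A-weighting curve is standardized in IEC 61672. The function below implements that approximation; it is near 0 dB at 1 kHz and strongly negative at the frequency extremes, mirroring the 40-phon contour.

```python
import math

def a_weighting_db(f):
    # A-weighting gain in dB at frequency f (Hz), using the analytic
    # approximation from IEC 61672; ~0 dB at 1 kHz, strongly negative
    # at very low and very high frequencies.
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes 1 kHz to ~0 dB

for f in (100.0, 1000.0, 10000.0):
    print(f"{f:g} Hz: {a_weighting_db(f):+.1f} dB")
```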
As with all useful tools, the A-weighted curve can be misused. Because it is based on the 40-phon curve, it is most suitable for low-level sounds; however, that
has not prevented it from being used in measurements of much higher-level sounds,
where a flatter filter would be more appropriate, such as that provided by the
much-less-used C-weighted curve. The ubiquitous use of the dB (A) scale for all
levels of sound therefore provides an example of a case where the convenience of a
single-number measure (and one that minimizes the impact of difficult-to-control
low frequencies) has outweighed the desire for accuracy.
3. Loudness Scales
Equal loudness contours and phons tell us about the relationship between loudness
and frequency. They do not, however, tell us about the relationship between loud-
ness and sound level. For instance, the phon, based as it is on the decibel scale at
1 kHz, says nothing about how much louder a 60-dB SPL tone is than a 30-dB
SPL tone. The answer, according to numerous studies of loudness, is not twice as
loud. There have been numerous attempts since Fechner’s day to relate the physical
sound level to loudness. Fechner (1860), building on Weber’s law, reasoned that if
JNDs were constant on a logarithmic scale, and if equal numbers of JNDs reflected an equal change in loudness, then loudness must be related logarithmically to sound
intensity. Harvard psychophysicist S. S. Stevens disagreed, claiming that JNDs
reflected “noise” in the auditory system, which did not provide direct insight into
the function relating loudness to sound intensity (Stevens, 1957). Stevens’s
approach was to use magnitude and ratio estimation and production techniques, as
described in Section I of this chapter, to derive a relationship between loudness and
sound intensity. He concluded that loudness (L) was related to sound intensity (I)
by a power law:

L = kI^α (Eq. 1)
where the exponent, α, has a value of about 0.3 at medium frequencies and for
moderate and higher sound levels. This law implies that a 10-dB increase in level
results in a doubling of loudness. At low levels, and at lower frequencies, the expo-
nent is typically larger, leading to a steeper growth-of-loudness function. Stevens
used this relationship to derive loudness units, called “sones.” By definition, 1 sone
is the loudness of a 1-kHz tone presented at a level of 40 dB SPL; 2 sones is twice
as loud, corresponding roughly to a 1-kHz tone presented at 50 dB SPL, and 4 sones corresponds to the same tone at about 60 dB SPL.
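The sone scale follows directly from the power law: with α ≈ 0.3, every 10-dB (10-phon) step doubles loudness, so loudness in sones is 2 raised to the power (phons − 40)/10. A small sketch of that arithmetic:

```python
def sones_from_phons(phons):
    # Stevens's power law with exponent ~0.3 implies a doubling of
    # loudness for each 10-phon step above the 40-phon anchor
    # (1 sone == a 1-kHz tone at 40 dB SPL).
    return 2.0 ** ((phons - 40.0) / 10.0)

for level in (40, 50, 60):  # dB SPL at 1 kHz equals phons, by definition
    print(level, "phons ->", sones_from_phons(level), "sones")
```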
Numerous studies have supported the basic conclusion that loudness can be
related to sound intensity by a power law. However, in part because of the variability
of loudness judgments, and the substantial effects of experimental methodology
(Poulton, 1979), different researchers have found different values for the best-fitting
exponent. For instance, Warren (1970) argued that presenting participants with sev-
eral sounds to judge invariably results in bias. He therefore presented each subject
with only one trial. Based on these single-trial judgments, Warren also derived a
power law, but he found an exponent value of 0.5. This exponent value is what one might expect if the loudness of a sound were inversely proportional to its distance from the receiver, leading to a 6-dB decrease in level for every doubling of distance. Yet
another study, which tried to avoid bias effects by using the entire (100-dB) level
range within each experiment, derived an exponent of only 0.1, implying a doubling
of loudness for every 30-dB increase in sound level (Viemeister & Bacon, 1988).
Overall, it is generally well accepted that the relationship between loudness and
sound intensity can be approximated as a power law, although methodological issues
and intersubject and intrasubject variability have made it difficult to derive a defini-
tive and uncontroversial function relating the sensation to the physical variable.
4. Partial Loudness and Context Effects
Most sounds that we encounter, particularly in music, are accompanied by other
sounds. This fact makes it important to understand how the loudness of a sound is
(Moore & Glasberg, 1997), and others have been extended to explain the loudness
of sounds that fluctuate over time (Chalupper & Fastl, 2002; Glasberg & Moore,
2002). However, none has yet attempted to incorporate context effects, such as
loudness recalibration or loudness enhancement.
B. Pitch
Pitch is arguably the most important dimension for conveying music. Sequences of
pitches form a melody, and simultaneous combinations of pitches form harmony—
two foundations of Western music. There is a vast body of literature devoted to
pitch research, from both perceptual and neural perspectives (Plack, Oxenham,
Popper, & Fay, 2005). The clearest physical correlate of pitch is the periodicity, or
repetition rate, of sound, although other dimensions, such as sound intensity, can have small effects (e.g., Verschuure & van Meeteren, 1975). For young people
with normal hearing, pure tones with frequencies between about 20 Hz and 20 kHz
are audible. However, only sounds with repetition rates between about 30 Hz and
5 kHz elicit a pitch percept that can be called musical and is strong enough to carry
a melody (e.g., Attneave & Olson, 1971; Pressnitzer, Patterson, & Krumbholz,
2001; Ritsma, 1962). Perhaps not surprisingly, these limits, which were determined
through psychoacoustical investigation, correspond quite well to the lower and
upper limits of pitch found on musical instruments: the lowest and highest notes of
a modern grand piano, which covers the ranges of all standard orchestral instruments, correspond to 27.5 Hz and 4186 Hz, respectively.
We tend to recognize patterns of pitches that form melodies (see Chapter 7 of
this volume). We do this presumably by recognizing the musical intervals between
successive notes (see Chapters 4 and 7 of this volume), and most of us seem rela-
tively insensitive to the absolute pitch values of the individual notes, so long as the
pitch relationships between notes are correct. However, exactly how the pitch is
extracted from each note and how it is represented in the auditory system remain
unclear, despite many decades of intense research.
1. Pitch of Pure Tones
Pure tones produce a clear, unambiguous pitch, and we are very sensitive to
changes in their frequency. For instance, well-trained listeners can distinguish
between two tones with frequencies of 1000 and 1002 Hz—a difference of only
0.2% (Moore, 1973). A semitone, the smallest step in the Western scale system,
is a difference of about 6%, or about a factor of 30 greater than the JND of
frequency for pure tones. Perhaps not surprisingly, musicians are generally better
than nonmusicians at discriminating small changes in frequency; what is more surprising is that it does not take much practice for people with no musical train-
ing to “catch up” with musicians in terms of their performance. In a recent study,
frequency discrimination abilities of trained classical musicians were compared
with those of untrained listeners with no musical background, using both pure
tones and complex tones (Micheyl, Delhommeau, Perrot, & Oxenham, 2006).
Initially thresholds were about a factor of 6 worse for the untrained listeners.
However, it took only between 4 and 8 hours of practice for the thresholds of the
untrained listeners to match those of the trained musicians, whereas the trained
musicians did not improve with practice. This suggests that most people are able
to discriminate very fine differences in frequency with very little in the way of specialized training.
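The arithmetic behind the comparison of the semitone with the frequency JND is easy to verify: an equal-tempered semitone is a frequency ratio of 2^(1/12), i.e., close to 6%, roughly 30 times the 0.2% JND cited above.

```python
semitone_ratio = 2 ** (1 / 12)             # equal-tempered semitone frequency ratio
semitone_pct = (semitone_ratio - 1) * 100  # ~5.9%, the "about 6%" in the text
jnd_pct = 0.2                              # trained-listener JND (Moore, 1973)
print(round(semitone_pct, 2))              # -> 5.95
print(round(semitone_pct / jnd_pct))       # -> 30, the "factor of 30"
```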
Two representations of a pure tone at 440 Hz (the orchestral A) are shown in
Figure 3. The upper panel shows the waveform—variations in sound pressure as a
function of time—that repeats 440 times a second, and so has a period of 1/440 s,
or about 2.27 ms. The lower panel provides the spectral representation, showing
that the sound has energy only at 440 Hz. This spectral representation is for an
“ideal” pure tone—one that has no beginning or end. In practice, spectral energy
spreads above and below the frequency of the pure tone, reflecting the effects of
onset and offset. These two representations (spectral and temporal) provide a good introduction to two ways in which pure tones are represented in the peripheral
auditory system.
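The two representations in Figure 3 can be reproduced numerically: synthesize a 440-Hz sinusoid, note its ~2.27-ms period, and confirm with a discrete Fourier transform that its energy sits at a single frequency. The sample rate and one-second window below are arbitrary choices that place 440 Hz exactly on a DFT bin.

```python
import math

fs = 4410  # sample rate (Hz); a 1-s window then gives 1-Hz DFT bins
x = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]

def dft_mag(x, k):
    # Magnitude of DFT bin k, i.e., energy at k Hz for this 1-s window.
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
    return math.hypot(re, im)

print(1000 / 440)  # period in ms: ~2.27, as in the upper panel of Figure 3
peak = max(range(430, 451), key=lambda k: dft_mag(x, k))
print(peak)        # -> 440: all energy at a single frequency (lower panel)
```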
The first potential code, known as the “place” code, reflects the mechanical fil-
tering that takes place in the cochlea of the inner ear. The basilar membrane, which
runs the length of the fluid-filled cochlea from the base to the apex, vibrates in
Figure 3 Schematic diagram of the time waveform (upper panel) and power spectrum (lower panel) of a pure tone with a frequency of 440 Hz.
considerably worse when the low-frequency temporal information was presented to
the “wrong” place in the cochlea, suggesting that place information is important.
In light of this mixed evidence, it may be safest to assume that the auditory sys-
tem uses both place and timing information from the auditory nerve in order to extract the pitch of pure tones. Indeed, some theories of pitch explicitly require both
accurate place and timing information (Loeb, White, & Merzenich, 1983). Gaining
a better understanding of how the information is extracted remains an important
research goal. The question is of particular clinical relevance, as deficits in pitch
perception are a common complaint of people with hearing loss and people with
cochlear implants. A clearer understanding of how the brain uses information from
the cochlea will help researchers to improve the way in which auditory prostheses,
such as hearing aids and cochlear implants, present sound to their users.
2. Pitch of Complex Tones
A large majority of musical sounds are complex tones of one form or another, and
most have a pitch associated with them. Most common are harmonic complex
tones, which are composed of the F0 (corresponding to the repetition rate of the
entire waveform) and upper partials, harmonics, or overtones, spaced at integer
multiples of the F0. The pitch of a harmonic complex tone usually corresponds to
the F0. In other words, if a subject is asked to match the pitch of a complex tone to
the pitch of a single pure tone, the best match usually occurs when the frequency of the pure tone is the same as the F0 of the complex tone. Interestingly, this is
true even when the complex tone has no energy at the F0 or the F0 is masked
(de Boer, 1956; Licklider, 1951; Schouten, 1940; Seebeck, 1841). This phenome-
non has been given various terms, including pitch of the missing fundamental, peri-
odicity pitch, residue pitch, and virtual pitch. The ability of the auditory system to
extract the F0 of a sound is important from the perspective of perceptual constancy:
imagine a violin note being played in a quiet room and then again in a room with a
noisy air-conditioning system. The low-frequency noise of the air-conditioning sys-
tem might well mask some of the lower-frequency energy of the violin, including the F0, but we would not expect the pitch (or identity) of the violin to change
because of it.
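The missing-fundamental phenomenon is easy to demonstrate numerically: a complex built only from upper harmonics still repeats at the period of the absent F0. A minimal sketch (the choice of F0 and harmonic numbers is arbitrary):

```python
import math

f0 = 220.0  # the absent fundamental

def sample(t_s):
    # Harmonics 2 through 8 only: the complex has no energy at 220 Hz.
    return sum(math.sin(2 * math.pi * h * f0 * t_s) for h in range(2, 9))

period_s = 1.0 / f0
# The waveform nevertheless repeats every 1/220 s, the periodicity cue
# underlying the "pitch of the missing fundamental."
times = [i / 44100.0 for i in range(200)]
max_diff = max(abs(sample(t) - sample(t + period_s)) for t in times)
print(max_diff)  # ~0: periodic at 1/220 s despite the absent fundamental
```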
Although the ability to extract the periodicity pitch is clearly an important one,
and one that is shared by many different species (Shofner, 2005), exactly how the
auditory system extracts the F0 remains for the most part unknown. The initial
stages in processing a harmonic complex tone are shown in Figure 4. The upper
two panels show the time waveform and the spectral representation of a harmonic
complex tone. The third panel depicts the filtering that occurs in the cochlea—each
point along the basilar membrane can be represented as a band-pass filter that responds to only those frequencies close to its center frequency. The fourth panel
shows the “excitation pattern” produced by the sound. This is the average response
of the bank of band-pass filters, plotted as a function of the filters’ center frequency
(Glasberg & Moore, 1990). The fifth panel shows an excerpt of the time waveform
at the output of some of the filters along the array. This is an approximation of the
Figure 4 Representations of a harmonic complex tone with a fundamental frequency (F0) of 440 Hz. The upper panel shows the time waveform. The second panel shows the power spectrum of the same waveform. The third panel shows the auditory filter bank, representing the filtering that occurs in the cochlea. The fourth panel shows the excitation pattern, or the time-averaged output of the filter bank. The fifth panel shows some sample time waveforms at the output of the filter bank, including filters centered at the F0 and the fourth harmonic, illustrating resolved harmonics, and filters centered at the 8th and 12th harmonics of the complex, illustrating harmonics that are less well resolved and show amplitude modulations at a rate corresponding to the F0.
waveform that drives the inner hair cells in the cochlea, which in turn synapse with
the auditory nerve fibers to produce the spike trains that the brain must interpret.
Considering the lower two panels of Figure 4, it is possible to see a transition
as one moves from the low-numbered harmonics on the left to the high-numbered harmonics on the right: the first few harmonics generate distinct peaks
in the excitation pattern, because the filters in that frequency region are narrower
than the spacing between successive harmonics. Note also that the time waveforms
at the outputs of filters centered at the low-numbered harmonics resemble pure
tones. At higher harmonic numbers, the bandwidths of the auditory filters become
wider than the spacing between successive harmonics, and so individual peaks in
the excitation pattern are lost. Similarly, the time waveform at the output of higher-
frequency filters no longer resembles a pure tone, but instead reflects the interac-
tion of multiple harmonics, producing a complex waveform that repeats at a rate corresponding to the F0.
Harmonics that produce distinct peaks in the excitation pattern and/or produce
quasi-sinusoidal vibrations on the basilar membrane are referred to as being
“resolved.” Phenomenologically, resolved harmonics are those that can be “heard
out” as separate tones under certain circumstances. Typically, we do not hear the
individual harmonics when we listen to a musical tone, but our attention can be
drawn to them in various ways, for instance by amplifying them or by switching
them on and off while the other harmonics remain continuous (e.g., Bernstein &
Oxenham, 2003; Hartmann & Goupell, 2006). The ability to resolve or hear out individual low-numbered harmonics as pure tones was already noted by Hermann
von Helmholtz in his classic work, On the Sensations of Tone
(Helmholtz, 1885/1954).
The higher-numbered harmonics, which do not produce individual peaks of
excitation and cannot typically be heard out, are often referred to as being “unresolved.” The transition between resolved and unresolved harmonics is thought to
lie somewhere between the 5th and 10th harmonic, depending on various factors,
such as the F0 and the relative amplitudes of the components, as well as on how
resolvability is defined (e.g., Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990; Moore & Gockel, 2011; Shackleton & Carlyon, 1994).
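Resolvability can be estimated from auditory filter bandwidths. The sketch below uses the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formula together with a deliberately crude criterion, assumed here only for illustration: a harmonic counts as resolved when the harmonic spacing (equal to the F0) exceeds the local filter bandwidth. Under that assumption, the transition for a 440-Hz F0 falls within the 5th-to-10th-harmonic range given in the text.

```python
def erb_hz(f):
    # Equivalent rectangular bandwidth (Hz) of the auditory filter at
    # center frequency f, from Glasberg & Moore (1990).
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

f0 = 440.0
# Crude resolvability criterion (an illustrative assumption): a harmonic
# is "resolved" if the harmonic spacing (= F0) exceeds the filter
# bandwidth at that harmonic's frequency.
for n in range(1, 13):
    f = n * f0
    label = "resolved" if f0 > erb_hz(f) else "unresolved"
    print(f"harmonic {n:2d} ({f:6.0f} Hz): ERB {erb_hz(f):5.1f} Hz -> {label}")
```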
Numerous theories and models have been devised to explain how pitch is extracted
from the information present in the auditory periphery (de Cheveigné, 2005). As with
pure tones, the theories can be divided into two basic categories—place and temporal
theories. The place theories generally propose that the auditory system uses the
lower-order, resolved harmonics to calculate the pitch (e.g., Cohen, Grossberg, &
Wyse, 1995; Goldstein, 1973; Terhardt, 1974b; Wightman, 1973). This could be
achieved by way of a template-matching process, with either “hard-wired” harmonic
templates or templates that develop through repeated exposure to harmonic series, which eventually become associated with the F0. Temporal theories typically involve
evaluating the time intervals between auditory-nerve spikes, using a form of autocor-
relation or all-interval spike histogram (Cariani & Delgutte, 1996; Licklider, 1951;
Meddis & Hewitt, 1991; Meddis & O’Mard, 1997; Schouten, Ritsma, & Cardozo,
1962). This information can be obtained from both resolved and unresolved harmonics.
Pooling these spikes from across the nerve array results in a dominant interval
emerging that corresponds to the period of the waveform (i.e., the reciprocal of the
F0). A third alternative involves using both place and temporal information. In one
version, coincident timing between neurons with harmonically related characteristic frequencies (CFs) is postulated to lead to a spatial network of coincidence detectors—a place-based template
that emerges through coincident timing information (Shamma & Klein, 2000). In
another version, the impulse-response time of the auditory filters, which depends on
the CF, is postulated to determine the range of periodicities that a certain tonotopic
location can code (de Cheveigné & Pressnitzer, 2006). Recent physiological studies
have supported at least the plausibility of place-time mechanisms to code pitch
(Cedolin & Delgutte, 2010).
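The autocorrelation idea can be illustrated numerically: even for a "missing fundamental" complex containing only higher harmonics, the autocorrelation of the waveform peaks at the period of the absent F0. The following sketch operates on the raw waveform rather than on simulated auditory-nerve spike trains, so it is only a schematic stand-in for the models cited above; the parameter choices are arbitrary.

```python
import numpy as np

fs = 16000                      # sample rate (Hz)
f0 = 200.0                      # fundamental, deliberately omitted below
t = np.arange(int(0.1 * fs)) / fs
# Harmonics 2-6 only: a "missing fundamental" complex
x = sum(np.sin(2 * np.pi * h * f0 * t) for h in range(2, 7))

# Autocorrelation; ignore lags shorter than the period of 500 Hz
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = int(fs / 500)
best_lag = min_lag + int(np.argmax(ac[min_lag:]))
print(fs / best_lag)            # ~200 Hz: the missing fundamental
```

The dominant interval emerges at the waveform period (1/F0) because that is the only lag at which every harmonic component is back in phase, mirroring the pooled-interval argument in the text.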
Distinguishing between place and temporal (or place-time) models of pitch has
proved very difficult. In part, this is because spectral and temporal representations of a signal are mathematically equivalent: any change in the spectral representation
will automatically lead to a change in the temporal representation, and vice versa.
Psychoacoustic attempts to distinguish between place and temporal mechanisms
have focused on the limits imposed by the peripheral physiology in the cochlea and
auditory nerve. For instance, the limits of frequency selectivity can be used to test
the place theory: if all harmonics are clearly unresolved (and therefore providing
no place information) and a pitch is still heard, then pitch cannot depend solely on
place information. Similarly, the putative limits of phase-locking can be used: if
the periodicity of the waveform and the frequencies of all the resolved harmonics are all above the limit of phase locking in the auditory nerve and a pitch is still
heard, then temporal information is unlikely to be necessary for pitch perception.
A number of studies have shown that pitch perception is possible even when
harmonic tone complexes are filtered to remove all the low-numbered, resolved
harmonics (Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990;
Kaernbach & Bering, 2001; Shackleton & Carlyon, 1994). A similar conclusion
was reached by studies that used amplitude-modulated broadband noise, which has
no spectral peaks in its long-term spectrum (Burns & Viemeister, 1976, 1981).
These results suggest that pitch can be extracted from temporal information alone, thereby ruling out theories that consider only place coding. However, the pitch sen-
sation produced by unresolved harmonics or modulated noise is relatively weak
compared with the pitch of musical instruments, which produce full harmonic
complex tones.
The more salient pitch that we normally associate with music is provided by
the lower-numbered resolved harmonics. Studies that have investigated the
relative contributions of individual harmonics have found that harmonics 3 to 5
(Moore, Glasberg, & Peters, 1985), or frequencies around 600 Hz (Dai, 2000),
seem to have the most influence on the pitch of the overall complex. This is where current temporal models also encounter some difficulty: they are able to extract the
F0 of a complex tone as well from unresolved harmonics as from resolved harmo-
nics, and therefore they do not predict the large difference in pitch salience and
accuracy between low- and high-numbered harmonics that is observed in psycho-
physical studies (Carlyon, 1998). In other words, place models do not predict good
enough performance with unresolved harmonics, whereas temporal models predict
performance that is too good. The apparently qualitative and quantitative difference
in the pitch produced by low-numbered and high-numbered harmonics has led to the
suggestion that there may be two pitch mechanisms at work, one to code the temporal envelope repetition rate from high-numbered harmonics and one to code the
F0 from the individual low-numbered harmonics (Carlyon & Shackleton, 1994),
although subsequent work has questioned some of the evidence proposed for the two
mechanisms (Gockel, Carlyon, & Plack, 2004; Micheyl & Oxenham, 2003).
The fact that low-numbered, resolved harmonics are important suggests that
place coding may play a role in everyday pitch. Further evidence comes from a
variety of studies. The study mentioned earlier, in which low-frequency
temporal information was transposed into a high-frequency range (Oxenham et al.,
2004), examined complex-tone pitch perception by transposing the information from harmonics 3, 4, and 5 of a 100-Hz F0 to high-frequency regions of the cochlea—
roughly 4 kHz, 6 kHz, and 10 kHz. If temporal information was sufficient to elicit
a periodicity pitch, then listeners should have been able to hear a pitch correspond-
ing to 100 Hz. In fact, none of the listeners reported hearing a low pitch or was
able to match the pitch of the transposed tones to that of the missing fundamental.
This suggests that, if temporal information is used, it may need to be presented to
the “correct” place along the cochlea.
Another line of evidence has come from revisiting early conclusions that no
pitch is heard when all the harmonics are above about 5 kHz (Ritsma, 1962). The initial finding led researchers to suggest that timing information was crucial and
that at frequencies above the limits of phase locking, periodicity pitch was not per-
ceived. A recent study revisited this conclusion and found that, in fact, listeners
were well able to hear pitches between 1 and 2 kHz, even when all the harmonics
were filtered to be above 6 kHz, and were sufficiently resolved to ensure that no
temporal envelope cues were available (Oxenham et al., 2011). This outcome leads
to an interesting dissociation: tones above 6 kHz on their own do not produce a
musically useful pitch; however, those same tones when combined with others in a
harmonic series can produce a musical pitch sufficient to convey a melody. The results suggest that the upper limit of musical pitch may not in fact be explained by
the upper limit of phase locking: the fact that pitch can be heard even when all
tones are above 5 kHz suggests either that temporal information is not necessary
for musical pitch or that usable phase locking in the human auditory nerve extends
to much higher frequencies than currently believed (Heinz, Colburn, & Carney,
2001; Moore & Sęk, 2009).
A further line of evidence for the importance of place information has come from
studies that have investigated the relationship between pitch accuracy and auditory
filter bandwidths. Moore and Peters (1992) investigated the relationship betweenauditory filter bandwidths, measured using spectral masking techniques (Glasberg &
Moore, 1990), pure-tone frequency discrimination, and complex-tone F0 discrimi-
nation in young and elderly people with normal and impaired hearing. People
with hearing impairments were tested because they often have auditory filter band-
widths that are broader than normal. A wide range of results were found—some
participants with normal filter bandwidths showed impaired pure-tone and
complex-tone pitch discrimination thresholds; others with abnormally wide filters
still had relatively normal pure-tone pitch discrimination thresholds. However,
none of the participants with broadened auditory filters had normal F0 discrimination thresholds, suggesting that perhaps broader filters resulted in fewer or no
resolved harmonics and that resolved harmonics are necessary for accurate F0 dis-
crimination. This question was pursued later by Bernstein and Oxenham (2006a,
2006b), who systematically increased the lowest harmonic present in a harmonic
complex tone and measured the point at which F0 discrimination thresholds wors-
ened. In normal-hearing listeners, there is quite an abrupt transition from good
to poor pitch discrimination as the lowest harmonic present is increased from the
9th to the 12th (Houtsma & Smurzynski, 1990). Bernstein and Oxenham reasoned
that if the transition point is related to frequency selectivity and the resolvability of the harmonics, then the transition point should decrease to lower harmonic numbers
as the auditory filters become wider. They tested this in hearing-impaired listeners
and found a significant correlation between the transition point and the estimated
bandwidth of the auditory filters (Bernstein & Oxenham, 2006b), suggesting that
harmonics may need to be resolved in order to elicit a strong musical pitch.
Interestingly, even though resolved harmonics may be necessary for accurate pitch
perception, they may not be sufficient . Bernstein and Oxenham (2003) increased
the number of resolved harmonics available to listeners by presenting alternating
harmonics to opposite ears. In this way, the spacing between successive compo-nents in each ear was doubled, thereby doubling the number of peripherally
resolved harmonics. Listeners were able to hear out about twice as many harmonics
in this new condition, but that did not improve their pitch discrimination thresholds
for the complex tone. In other words, providing access to harmonics that are
not normally resolved does not improve pitch perception abilities. These results are
consistent with theories that rely on pitch templates. If harmonics are not normally
available to the auditory system, they would be unlikely to be incorporated
into templates and so would not be expected to contribute to the pitch percept
when presented by artificial means, such as presenting them to alternate ears.
Most sounds in our world, including those produced by musical instruments,
tend to have more energy at low frequencies than at high; on average, spectral
amplitude decreases at a rate of about 1/ f , or -6 dB/octave. It therefore makes sense
that the auditory system would rely on the lower numbered harmonics to determine
pitch, as these are the ones that are most likely to be audible. Also, resolved harmo-
nics—ones that produce a peak in the excitation pattern and elicit a sinusoidal tem-
poral response—are much less susceptible to the effects of room reverberation than
are unresolved harmonics. Pitch discrimination thresholds for unresolved harmonics
are relatively good (∼2%) when all the components have the same starting phase
(as in a stream of pulses). However, thresholds are much worse when the phase
relationships are scrambled, as they would be in a reverberant hall or church, and
listeners’ discrimination thresholds can be as poor as 10%—more than a musical
semitone. In contrast, the response to resolved harmonics is not materially affected
by reverberation: changing the starting phase of a single sinusoid does not affect its
waveshape—it still remains a sinusoid, with frequency discrimination thresholds
of considerably less than 1%.
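The −6 dB/octave figure follows directly from the 1/f form: doubling the frequency halves the amplitude, and a factor of two in amplitude corresponds to 20·log10(2) ≈ 6.02 dB. A two-line check (the function name is just for illustration):

```python
import math

def level_db(f_hz):
    """Level (dB re 1) of a 1/f amplitude spectrum at frequency f_hz."""
    return 20.0 * math.log10(1.0 / f_hz)

octave_drop = level_db(880.0) - level_db(440.0)
print(round(octave_drop, 2))   # -6.02 dB per octave
```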
A number of physiological and neuroimaging studies have searched for represen-
tations of pitch beyond the cochlea (Winter, 2005). Potential correlates of periodicity have been found in single- and multi-unit studies of the cochlear nucleus (Winter,
Wiegrebe, & Patterson, 2001), in the inferior colliculus (Langner & Schreiner,
1988), and in auditory cortex (Bendor & Wang, 2005). Human neuroimaging studies
have also found correlates of periodicity in the brainstem (Griffiths, Uppenkamp,
Johnsrude, Josephs, & Patterson, 2001) as well as in auditory cortical structures
(Griffiths, Buchel, Frackowiak, & Patterson, 1998). More recently, Penagos,
Melcher, and Oxenham (2004) identified a region in human auditory cortex that
seemed sensitive to the degree of pitch salience, as opposed to physical parameters,
such as F0 or spectral region. However, these studies are also not without some controversy. For instance, Hall and Plack (2009) failed to find any single region in the
human auditory cortex that responded to pitch, independent of other stimulus para-
meters. Similarly, in a physiological study of the ferret’s auditory cortex, Bizley,
Walker, Silverman, King, and Schnupp (2009) found interdependent coding of pitch,
timbre, and spatial location and did not find any pitch-specific region.
In summary, the pitch of single harmonic complex tones is determined primarily
by the first 5 to 8 harmonics, which are also those thought to be resolved in the
peripheral auditory system. To extract the pitch, the auditory system must somehow
combine and synthesize information from these harmonics. Exactly how this occurs in the auditory system remains a matter of ongoing research.
C. Timbre
The official ANSI definition of timbre is: “That attribute of auditory sensation
which enables a listener to judge that two nonidentical sounds, similarly presented
and having the same loudness and pitch, are dissimilar” (ANSI, 1994). The stan-
dard goes on to note that timbre depends primarily on the frequency spectrum of the sound, but can also depend on the sound pressure and temporal characteristics.
In other words, anything that is not pitch or loudness is timbre. As timbre has its
own chapter in this volume (Chapter 2), it will not be discussed further here.
However, timbre makes an appearance in the next section, where its influence on
pitch and loudness judgments is addressed.
D. Sensory Interactions and Cross-Modal Influences
The auditory sensations of loudness, pitch, and timbre are for the most part studied
independently. Nevertheless, a sizeable body of evidence suggests that these sen-
sory dimensions are not strictly independent. Furthermore, other sensory modali-
ties, in particular vision, can have sizeable effects on auditory judgments of
musical sounds.
1. Pitch and Timbre Interactions
Pitch and timbre are the two dimensions most likely to be confused, particularly by people without any musical training. Increasing the F0 of a complex tone results in
an increase in pitch, whereas changing the spectral center of gravity of a tone increases
its brightness—one aspect of timbre (Figure 5). In both cases, when asked to describe
the change, many listeners would simply say that the sound was “higher.”
In general, listeners find it hard to ignore changes in timbre when making pitch
judgments. Numerous studies have shown that the JND for F0 increases when
the two sounds to be compared also vary in spectral content (e.g., Borchert,
Micheyl, & Oxenham, 2011; Faulkner, 1985; Moore & Glasberg, 1990). In principle,
this could be because the change in spectral shape actually affects pitch or because listeners have difficulty ignoring timbre changes and concentrating solely on pitch.
Studies using pitch matching have generally found that harmonic complex tones are
best matched with a pure-tone frequency corresponding to the F0, regardless of
the spectral content of the complex tone (e.g., Patterson, 1973), which means that the
detrimental effects of differing timbre may be related more to a “distraction” effect
than to a genuine change in pitch (Moore & Glasberg, 1990).
2. Effects of Pitch or Timbre Changes on the Accuracy of Loudness
Judgments
Just as listeners have more difficulty judging pitch in the face of varying timbre,
loudness comparisons between two sounds become much more challenging when
either the pitch or timbre of the two sounds differs. Examples include the difficulty
of making loudness comparisons between two pure tones of different frequency
[Figure 5 shows schematic spectra (level in dB versus frequency) for four tones, crossing low and high F0 with low and high spectral peak; increasing F0 raises the pitch, and raising the spectral peak increases the brightness.]
Figure 5 Representations of F0 and spectral peak, which primarily affect the sensations of
pitch and timbre, respectively.
(Gabriel et al., 1997; Oxenham & Buus, 2000), and the difficulty of making loud-
ness comparisons between tones of differing duration, even when they have the
same frequency (Florentine, Buus, & Robinson, 1998).
3. Visual Influences on Auditory Attributes
As anyone who has watched a virtuoso musician will know, visual input affects the
aesthetic experience of the audience. More direct influences of vision on auditory
sensations, and vice versa, have also been reported in recent years. For instance,
noise that is presented simultaneously with a light tends to be rated as louder than
noise presented without light (Odgaard, Arieh, & Marks, 2004). Interestingly, this
effect appears to be sensory in nature, rather than a “late-stage” decisional effect,
or shift in criterion; in contrast, similar effects of noise on the apparent brightness
of light (Stein, London, Wilkinson, & Price, 1996) seem to stem from higher-level
decisional and criterion-setting mechanisms (Odgaard, Arieh, & Marks, 2003).
On the other hand, recent combinations of behavioral and neuroimaging techniques
have suggested that the combination of sound with light can result in increased sen-
sitivity to low-level light, which is reflected in changes in activation of sensory cor-
tices (Noesselt et al., 2010).
Visual cues can also affect other attributes of sound. For instance, Schutz and
colleagues (Schutz & Kubovy, 2009; Schutz & Lipscomb, 2007) have shown that
the gestures made in musical performance can affect the perceived duration of a
musical sound: a short or “staccato” gesture by a marimba player led to shorter
judged durations of the tone than a long gesture by the player, even though the
tone itself was identical. Interestingly, this did not hold for sustained sounds, such
as a clarinet, where visual information had much less impact on duration judg-
ments. The difference may relate to the exponential decay of percussive sounds,
which have no clearly defined end, allowing the listeners to shift their criterion for
the end point to better match the visual information.
III. Perception of Sound Combinations
A. Object Perception and Grouping
When a musical tone, such as a violin note or a sung vowel, is presented, we normally
hear a single sound with a single pitch, even though the note actually consists of
many different pure tones, each with its own frequency and pitch. This “perceptual
fusion” is partly because all the pure tones begin and end at roughly the same time,
and partly because they form a single harmonic series (Darwin, 2005). The impor-
tance of onset and offset synchrony can be demonstrated by delaying one of the components relative to all the others. A delay of only a few tens of milliseconds is
sufficient for the delayed component to “pop out” and be heard as a separate
object. Similarly, if one component is mistuned compared to the rest of the com-
plex, it will be heard out as a separate object, provided the mistuning is sufficiently
large. For low-numbered harmonics, mistuning a harmonic by between 1 and 3% is
sufficient for it to “pop out” (Moore, Glasberg, & Peters, 1986). Interestingly, a
mistuned harmonic can be heard separately, but can still contribute to the overall
pitch of the complex; in fact, it continues to do so even when it is mistuned by as
much as 8%—well above the threshold for hearing it out as a separate object (Darwin & Ciocca,
1992; Darwin, Hukin, & al-Khatib, 1995; Moore et al., 1985). This is an example
of a failure of “disjoint allocation”—a single component is not disjointly allocated
to just a single auditory object (Liberman, Isenberg, & Rakerd, 1981; Shinn-
Cunningham, Lee, & Oxenham, 2007).
B. Perceiving Multiple Pitches
How many tones can we hear at once? Considering all the different instruments in
an orchestra, one might expect the number to be quite high, and a well-trained con-
ductor will in many cases be able to hear a wrong note played by a single instru-
ment within that orchestra. But are we aware of all the pitches being presented at
once, and can we count them? Huron (1989) suggested that the number of indepen-
dent “voices” we can perceive and count is actually rather low. He used
sounds of homogeneous timbre (organ notes) and played participants sections from a
piece of polyphonic organ music by J. S. Bach with between one and five voices
playing simultaneously. Despite the fact that most of the participants were musi-cally trained, their ability to judge accurately the number of voices present
decreased dramatically when the number of voices actually present exceeded three.
Using much simpler stimuli, consisting of several simultaneous pure tones,
Demany and Ramos (2005) made the interesting discovery that participants could
not tell whether a certain tone was present or absent from the chord, but they
noticed if its frequency was changed in the next presentation. In other words, lis-
teners detected a change in the frequency of a tone that was itself undetected.
Taken together with the results of Huron (1989), the data suggest that the pitches
of many tones can be processed simultaneously, but that listeners may only be consciously aware of a subset of three or four at any one time.
C. The Role of Frequency Selectivity in the Perception of Multiple Tones
1. Roughness
When two pure tones of differing frequency are added, the resulting waveform
fluctuates in amplitude at a rate corresponding to the difference of the two frequen-
cies. These amplitude fluctuations, or “beats,” are illustrated in Figure 6, which shows how the two tones are sometimes in phase, and add constructively (A), and
sometimes out of phase, and so cancel (B). At beat rates of less than about 10 Hz,
we hear the individual fluctuations, but once the rate increases above about 12 Hz,
we are no longer able to follow the individual fluctuations and instead perceive a
“rough” sound (Daniel & Weber, 1997; Terhardt, 1974a).
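The beating described here is just the trigonometric identity cos a + cos b = 2 cos((a−b)/2) cos((a+b)/2): the sum of two nearby tones is a carrier at the mean frequency whose envelope fluctuates at the difference frequency. A quick numerical confirmation (the frequencies are chosen arbitrarily):

```python
import numpy as np

fs, f1, f2 = 8000, 440.0, 446.0          # two tones 6 Hz apart
t = np.arange(fs) / fs                    # one second of samples
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Equivalent carrier/envelope form: a mean-frequency carrier whose
# envelope fluctuates at the difference frequency
carrier = np.cos(2 * np.pi * (f1 + f2) / 2 * t)
envelope = 2 * np.cos(2 * np.pi * (f1 - f2) / 2 * t)
assert np.allclose(x, carrier * envelope)

beat_rate = abs(f1 - f2)   # 6 beats/s: heard as distinct beats;
                           # above roughly 12 Hz the percept is "roughness"
```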
2. Pitch Perception of Multiple Sounds
Despite the important role of tone combinations or chords in music, relatively few
psychoacoustic studies have examined their perception. Beerends and Houtsma
(1989) used complex tones consisting of just two consecutive harmonics each.
Although the pitch of these two-component complexes is relatively weak, with prac-
tice, listeners can learn to accurately identify the F0 of such complexes. Beerends
and Houtsma found that listeners were able to identify the pitches of the two com-
plex tones, even if the harmonics from one sound were presented to different ears.
The only exception was when all the components were presented to one ear and
none of the four components was deemed to be “resolved.” In that case, listeners
were not able to identify either pitch accurately.
Carlyon (1996) used harmonic tone complexes with more harmonics and filtered
them so that they had completely overlapping spectral envelopes. He found that
when both complexes were composed of resolved harmonics, listeners were able to
hear out the pitch of one complex in the presence of the other. However, the sur-
prising finding was that when both complexes comprised only unresolved harmo-
nics, then listeners did not hear a pitch at all, but described the percept as an
unmusical “crackle.” To avoid ambiguity, Carlyon (1996) used harmonics that
were either highly resolved or highly unresolved. Because of this, it remained
unclear whether it is the resolvability of the harmonics before or after the two
sounds are mixed that determines whether each tone elicits a clear pitch. Micheyl
and colleagues addressed this issue, using a variety of combinations of spectral
region and F0 to vary the relative resolvability of the components (Micheyl,
Bernstein, & Oxenham, 2006; Micheyl, Keebler, & Oxenham, 2010). By compar-
ing the results to simulations of auditory filtering, they found that good pitch dis-
crimination was only possible when at least two of the harmonics from the target
sound were deemed resolved after being mixed with the other sound (Micheyl
et al., 2010). The results are consistent with place theories of pitch that rely on
resolved harmonics; however, it may be possible to adapt timing-based models of
pitch to similarly explain the phenomena (e.g., Bernstein & Oxenham, 2005).
D. Consonance and Dissonance
The question of how certain combinations of tones sound when played together
is central to many aspects of music theory. Combinations of two tones that form
certain musical intervals, such as the octave and the fifth, are typically deemed as
sounding pleasant or consonant, whereas others, such as the augmented fourth (tri-
tone), are often considered unpleasant or dissonant. These types of percepts involv-
ing tones presented in isolation from a musical context have been termed sensory consonance or dissonance. The term musical consonance (Terhardt, 1976, 1984)
subsumes sensory factors, but also includes many other factors that contribute to
whether a sound combination is judged as consonant or dissonant, including the
context (what sounds preceded it), the style of music (e.g., jazz or classical), and
presumably also the personal taste and musical history of the individual listener.
There has been a long-standing search for acoustic and physiological correlates
of consonance and dissonance, going back to the observations of Pythagoras that
strings whose lengths had a small-number ratio relationship (e.g., 2:1 or 3:2)
sounded pleasant together. Helmholtz (1885/1954) suggested that consonance may be related to the absence of beats (perceived as roughness) in musical sounds.
Plomp and Levelt (1965) developed the idea further by showing that the ranking by
consonance of musical intervals within an octave was well predicted by the number
of component pairs within the two complex tones that fell within the same auditory
filters and therefore caused audible beats (see also Kameoka & Kuriyagawa,
1969a, 1969b). When two complex tones form a consonant interval, such as an
octave or a fifth, the harmonics are either exactly coincident, and so do not produce
beats, or are spaced so far apart as to not produce strong beats. In contrast, when
the tones form a dissonant interval, such as a minor second, none of the components are coincident, but many are close enough to produce beats.
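Plomp and Levelt's account can be caricatured in a few lines: count the pairs of partials from the two tones that are non-coincident yet close enough to fall within a single auditory filter, here approximated by the ERB of Glasberg and Moore (1990). The six-harmonic limit, the ERB criterion, and the function names are simplifying assumptions for illustration only.

```python
def erb(f_hz):
    # Equivalent rectangular bandwidth, Glasberg & Moore (1990)
    return 24.7 * (0.00437 * f_hz + 1.0)

def beating_pairs(f0_a, f0_b, n_harmonics=6):
    """Count partial pairs that are non-coincident but fall within
    one ERB of each other -- a crude proxy for audible beating."""
    count = 0
    for i in range(1, n_harmonics + 1):
        for j in range(1, n_harmonics + 1):
            fa, fb = i * f0_a, j * f0_b
            diff = abs(fa - fb)
            if 1e-6 < diff < erb(min(fa, fb)):
                count += 1
    return count

fifth = beating_pairs(440.0, 660.0)                          # consonant
minor_second = beating_pairs(440.0, 440.0 * 2 ** (1 / 12))   # dissonant
print(fifth, minor_second)   # the fifth yields far fewer beating pairs
```

For the fifth, most harmonics either coincide exactly or are widely separated, whereas for the minor second nearly every harmonic of one tone has a close, non-coincident neighbor from the other, just as the text describes.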
An alternative theory of consonance is based on the “harmonicity” of the
sound combination, or how closely it resembles a single harmonic series. Consider,
for instance, two complex tones that form the interval of a perfect fifth, with F0s of
440 and 660 Hz. All the components from both tones are multiples of a single
F0—220 Hz—and so, according to the harmonicity account of consonance, should
sound consonant. In contrast, the harmonics of two tones that form an augmented
fourth, with F0s of 440 Hz and 622 Hz, do not approximate any single harmonic
series within the range of audible pitches and so should sound dissonant, as found empirically. The harmonicity theory of consonance can be implemented by using a
spectral template model (Terhardt, 1974b) or by using temporal information,
derived for instance from spikes in the auditory nerve (Tramo, Cariani, Delgutte, &
Braida, 2001).
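The harmonicity account lends itself to a direct check: the partials of the perfect fifth (440 + 660 Hz) are all integer multiples of 220 Hz, whereas those of the tritone (440 + 622 Hz) fit no common F0 in the audible pitch range. A sketch, far simpler than the template and temporal models cited above; the tolerance and function name are arbitrary choices.

```python
def fits_harmonic_series(f0s, base, n_harmonics=6, tol=0.01):
    """True if every partial of every tone lies within `tol` (as a
    fraction of one harmonic step) of an integer multiple of `base`."""
    for f0 in f0s:
        for k in range(1, n_harmonics + 1):
            ratio = k * f0 / base
            if abs(ratio - round(ratio)) > tol:
                return False
    return True

# Perfect fifth: every partial is a multiple of the implied 220-Hz F0
assert fits_harmonic_series([440.0, 660.0], 220.0)
# Tritone: the partials do not fit a 220-Hz series (622/220 = 2.83...)
assert not fits_harmonic_series([440.0, 622.0], 220.0)
```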
Because the beating and harmonicity theories of consonance and dissonance pro-
duce very similar predictions, it has been difficult to distinguish between them
experimentally. A recent study took a step toward this goal by examining individ-
ual differences in a large group (>200) of participants (McDermott, Lehr, &
Oxenham, 2010). First, listeners were asked to provide preference ratings for “diagnostic” stimuli that varied in beating but not harmonicity, or vice versa. Next,
listeners were asked to provide preference ratings for various musical sound
combinations, including dyads (two-note chords) and triads (three-note chords),
using natural and artificial musical instruments and voices. When the ratings in the
two types of tasks were compared, the correlations between the ratings for the har-
monicity diagnostic tests and the musical sounds were significant, but the correla-
tions between the ratings for the beating diagnostic tests and the musical sounds
were not. Interestingly, the number of years of formal musical training also corre-
lated with both the harmonicity and musical preference ratings, but not with the beating ratings. Overall, the results suggested that harmonicity, rather than lack of
beating, underlies listeners’ consonance preferences and that musical training may
amplify the preference for harmonic relationships.
Developmental studies have shown that infants as young as 3 or 4 months show
a preference for consonant over dissonant musical intervals (Trainor & Heinmiller,
1998; Zentner & Kagan, 1996, 1998). However, it is not yet known whether infants
are responding more to beats or inharmonicity, or both. It would be interesting to
discover whether the adult preferences for harmonicity revealed by McDermott
et al. (2010) are shared by infants, or whether infants initially base their preferences on acoustic beats.
IV. Conclusions and Outlook
Although the perception of musical tones should be considered primarily in musical
contexts, much about the interactions between acoustics, auditory physiology, and
perception can be learned through psychoacoustic experiments using relatively simple stimuli and procedures. Recent findings using psychoacoustics, alone or in
combination with neurophysiology and neuroimaging, have extended our knowl-
edge of how pitch, timbre, and loudness are perceived and represented neurally,
both for tones in isolation and in combination. However, much still remains to be
discovered. Important trends include the use of more naturalistic stimuli in experi-
ments and for testing computational models of perception, as well as the simultaneous combination of perceptual and neural measures when attempting to elucidate
neous combination of perceptual and neural measures when attempting to elucidate
the underlying neural mechanisms of auditory perception. Using the building
blocks provided by the psychoacoustics of individual and simultaneous musical tones, it is possible to proceed to answering much more sophisticated questions
regarding the perception of music as it unfolds over time. These and other issues
are tackled in the remaining chapters of this volume.
Acknowledgments
Emily Allen, Christophe Micheyl, and John Oxenham provided helpful comments on an
earlier version of this chapter. The work from the author’s laboratory is supported by funding from the National Institutes of Health (Grants R01 DC 05216 and R01 DC 07657).
References
American National Standards Institute. (1994). Acoustical terminology. ANSI S1.1-1994.
New York, NY: Author.
Arieh, Y., & Marks, L. E. (2003a). Recalibrating the auditory system: A speed-accuracy
analysis of intensity perception. Journal of Experimental Psychology: Human
Perception and Performance, 29, 523–536.
Arieh, Y., & Marks, L. E. (2003b). Time course of loudness recalibration: Implications for
loudness enhancement. Journal of the Acoustical Society of America, 114, 1550–1556.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147–166.
Beerends, J. G., & Houtsma, A. J. M. (1989). Pitch identification of simultaneous diotic and
dichotic two-tone complexes. Journal of the Acoustical Society of America, 85,
813–819.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature, 436, 1161–1165.
Bernstein, J. G., & Oxenham, A. J. (2003). Pitch discrimination of diotic and dichotic tone
complexes: Harmonic resolvability or harmonic number? Journal of the Acoustical
Society of America, 113, 3323–3334.
Bernstein, J. G., & Oxenham, A. J. (2005). An autocorrelation model with place dependence
to account for the effect of harmonic number on fundamental frequency discrimination.
Journal of the Acoustical Society of America, 117, 3816–3831.
Bernstein, J. G., & Oxenham, A. J. (2006a). The relationship between frequency selectivity
and pitch discrimination: Effects of stimulus level. Journal of the Acoustical Society of
America, 120, 3916–3928.
Bernstein, J. G., & Oxenham, A. J. (2006b). The relationship between frequency selectivity
and pitch discrimination: Sensorineural hearing loss. Journal of the Acoustical Society
of America, 120, 3929–3945.
Bizley, J. K., Walker, K. M., Silverman, B. W., King, A. J., & Schnupp, J. W. (2009).
Interdependent encoding of pitch, timbre, and spatial location in auditory cortex.
Journal of Neuroscience, 29, 2064–2075.
Borchert, E. M., Micheyl, C., & Oxenham, A. J. (2011). Perceptual grouping affects pitch
judgments across time and frequency. Journal of Experimental Psychology: Human
Perception and Performance, 37, 257–269.
Burns, E. M., & Viemeister, N. F. (1976). Nonspectral pitch. Journal of the Acoustical Society of America, 60, 863–869.
Burns, E. M., & Viemeister, N. F. (1981). Played again SAM: Further observations on the
pitch of amplitude-modulated noise. Journal of the Acoustical Society of America, 70,
1655–1660.
Buus, S., Muesch, H., & Florentine, M. (1998). On loudness at threshold. Journal of the
Acoustical Society of America, 104, 399–410.
Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones.
I. Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Carlyon, R. P. (1996). Encoding the fundamental frequency of a complex tone in the pres-
ence of a spectrally overlapping masker. Journal of the Acoustical Society of America, 99, 517–524.
Carlyon, R. P. (1998). Comments on “A unitary model of pitch perception” [Journal of the
Acoustical Society of America, 102, 1811–1820 (1997)]. Journal of the Acoustical
Society of America, 104, 1118–1121.
Carlyon, R. P., & Shackleton, T. M. (1994). Comparing the fundamental frequencies of
resolved and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the
Acoustical Society of America, 95, 3541–3554.
Cedolin, L., & Delgutte, B. (2010). Spatiotemporal representation of the pitch of harmonic
complex tones in the auditory nerve. Journal of Neuroscience, 30, 12712–12724.
Chalupper, J., & Fastl, H. (2002). Dynamic loudness model (DLM) for normal and hearing-
impaired listeners. Acta Acustica united with Acustica, 88, 378–386.
Chen, Z., Hu, G., Glasberg, B. R., & Moore, B. C. (2011). A new method of calculating
auditory excitation patterns and loudness for steady sounds. Hearing Research, 282(1–2), 204–215.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from
notched-noise data. Hearing Research, 47, 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying
sounds. Journal of the Audio Engineering Society, 50, 331–341.
Gockel, H., Carlyon, R. P., & Plack, C. J. (2004). Across-frequency interference effects in
fundamental frequency discrimination: Questioning evidence for two pitch mechanisms.
Journal of the Acoustical Society of America, 116, 1092–1104.
Goldstein, J. L. (1973). An optimum processor theory for the central formation of
the pitch of complex tones. Journal of the Acoustical Society of America, 54,
1496–1516.
Griffiths, T. D., Buchel, C., Frackowiak, R. S., & Patterson, R. D. (1998). Analysis of tem-
poral structure in sound by the human brain. Nature Neuroscience, 1, 422–427.
Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001).
Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience, 4, 633–637.
Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain.
Cerebral Cortex, 19, 576–585.
Hartmann, W. M., & Goupell, M. J. (2006). Enhancing and unmasking the harmonics of a
complex tone. Journal of the Acoustical Society of America, 120, 2142–2157.
Heinz, M. G., Colburn, H. S., & Carney, L. H. (2001). Evaluating auditory performance
limits: I. One-parameter discrimination using a computational model for the auditory
nerve. Neural Computation, 13, 2273–2316.
Hellman, R. P. (1976). Growth of loudness at 1000 and 3000 Hz. Journal of the Acoustical
Society of America, 60, 672–679.
Hellman, R. P., & Zwislocki, J. (1964). Loudness function of a 1000-cps tone in the presence
of a masking noise. Journal of the Acoustical Society of America, 36, 1618–1627.
Helmholtz, H. L. F. (1885/1954). On the sensations of tone (A. J. Ellis, Trans.). New York,
NY: Dover.
Henning, G. B. (1966). Frequency discrimination of random amplitude tones. Journal of the
Acoustical Society of America, 39, 336–339.
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for complex
tones with many harmonics. Journal of the Acoustical Society of America, 87, 304–310.
Huron, D. (1989). Voice denumerability in polyphonic music of homogenous timbres. Music
Perception, 6, 361–382.
Jesteadt, W., Wier, C. C., & Green, D. M. (1977). Intensity discrimination as a function of
frequency and sensation level. Journal of the Acoustical Society of America, 61,
169–177.
Kaernbach, C., & Bering, C. (2001). Exploring the temporal mechanism involved in the
pitch of unresolved harmonics. Journal of the Acoustical Society of America, 110,
1039–1048.
Kameoka, A., & Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads.
Journal of the Acoustical Society of America, 45, 1451–1459.
Kameoka, A., & Kuriyagawa, M. (1969b). Consonance theory part II: Consonance of com-
plex tones and its calculation method. Journal of the Acoustical Society of America, 45,
1460–1469.
Keuss, P. J., & van der Molen, M. W. (1982). Positive and negative effects of stimulus
intensity in auditory reaction tasks: Further studies on immediate arousal. Acta
Psychologica, 52, 61–72.
Kohfeld, D. L. (1971). Simple reaction time as a function of stimulus intensity in decibels of
light and sound. Journal of Experimental Psychology, 88, 251–257.
Kohlrausch, A., Fassel, R., & Dau, T. (2000). The influence of carrier level and frequency
on modulation and beat-detection thresholds for sinusoidal carriers. Journal of the Acoustical Society of America, 108, 723–734.
Langner, G., & Schreiner, C. E. (1988). Periodicity coding in the inferior colliculus of the
cat. I. Neuronal mechanisms. Journal of Neurophysiology, 60, 1799–1822.
Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop con-
sonants: Evidence for a phonetic mode. Perception & Psychophysics, 30, 133–143.
Licklider, J. C., Webster, J. C., & Hedlun, J. M. (1950). On the frequency limits of binaural
beats. Journal of the Acoustical Society of America, 22, 468–473.
Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia, 7, 128–133.
Loeb, G. E., White, M. W., & Merzenich, M. M. (1983). Spatial cross correlation: A pro-
posed mechanism for acoustic pitch perception. Biological Cybernetics, 47, 149–163.
Luce, R. D., & Green, D. M. (1972). A neural timing theory for response times and the psy-
chophysics of intensity. Psychological Review, 79, 14–57.
Mapes-Riordan, D., & Yost, W. A. (1999). Loudness recalibration as a function of level.
Journal of the Acoustical Society of America, 106, 3506–3511.
Marks, L. E. (1994). “Recalibrating” the auditory system: The perception of loudness.
Journal of Experimental Psychology: Human Perception and Performance, 20,
382–396.
Mauermann, M., Long, G. R., & Kollmeier, B. (2004). Fine structure of hearing threshold and
loudness perception. Journal of the Acoustical Society of America, 116, 1066–1080.
McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2010). Individual differences reveal the basis of consonance. Current Biology, 20, 1035–1041.
Meddis, R., & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer
model of the auditory periphery. I: Pitch identification. Journal of the Acoustical
Society of America, 89, 2866–2882.
Meddis, R., & O’Mard, L. (1997). A unitary model of pitch perception. Journal of the
Acoustical Society of America, 102, 1811–1820.
Micheyl, C., Bernstein, J. G., & Oxenham, A. J. (2006). Detection and F0 discrimination of
harmonic complex tones in the presence of competing tones or noise. Journal of the
Acoustical Society of America, 120, 1493–1505.
Micheyl, C., Delhommeau, K., Perrot, X., & Oxenham, A. J. (2006). Influence of musical
and psychoacoustical training on pitch discrimination. Hearing Research, 219, 36–47.
Micheyl, C., Keebler, M. V., & Oxenham, A. J. (2010). Pitch perception for mixtures of
spectrally overlapping harmonic complex tones. Journal of the Acoustical Society of
America, 128, 257–269.
Micheyl, C., & Oxenham, A. J. (2003). Further tests of the “two pitch mechanisms” hypothe-
sis. Journal of the Acoustical Society of America, 113, 2225.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our
capacity for processing information. Psychological Review, 63, 81–97.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the
Acoustical Society of America, 54, 610–619.
Moore, B. C. J., & Glasberg, B. R. (1990). Frequency discrimination of complex tones with
overlapping and non-overlapping harmonics. Journal of the Acoustical Society of
America, 87, 2163–2177.
Moore, B. C. J., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model.
Acustica, 82, 335–345.
Moore, B. C. J., & Glasberg, B. R. (1997). A model of loudness perception applied to
cochlear hearing loss. Auditory Neuroscience, 3, 289–311.
Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). A model for the prediction of thresholds,
loudness, and partial loudness. Journal of the Audio Engineering Society, 45, 224–240.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1985). Relative dominance of individual
partials in determining the pitch of complex tones. Journal of the Acoustical Society of
America, 77, 1853–1860.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1986). Thresholds for hearing mistuned
partials as separate tones in harmonic complexes. Journal of the Acoustical Society of
America, 80, 479–483.
Moore, B. C. J., Glasberg, B. R., & Vickers, D. A. (1999). Further evaluation of a model of
loudness perception applied to cochlear hearing loss. Journal of the Acoustical Society of America, 106, 898–907.
Moore, B. C. J., & Gockel, H. E. (2011). Resolvability of components in complex tones and
implications for theories of pitch perception. Hearing Research, 276, 88–97.
Moore, B. C. J., & Peters, R. W. (1992). Pitch discrimination and phase sensitivity in young
and elderly subjects and its relationship to frequency selectivity. Journal of the
Acoustical Society of America, 91, 2881–2893.
Moore, B. C. J., & Sęk, A. (2009). Sensitivity of the human auditory system to temporal fine
structure at high frequencies. Journal of the Acoustical Society of America, 125,
3186–3193.
Noesselt, T., Tyll, S., Boehler, C. N., Budinger, E., Heinze, H. J., & Driver, J. (2010). Sound-induced enhancement of low-intensity vision: Multisensory influences on human
sensory-specific cortices and thalamic bodies relate to perceptual enhancement of visual
detection sensitivity. Journal of Neuroscience, 30, 13609–13623.
Oberfeld, D. (2007). Loudness changes induced by a proximal sound: Loudness enhance-
ment, loudness recalibration, or both? Journal of the Acoustical Society of America,
121, 2137–2148.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2003). Cross-modal enhancement of perceived
brightness: Sensory interaction versus response bias. Perception & Psychophysics, 65,
123–132.
Odgaard, E. C., Arieh, Y., & Marks, L. E. (2004). Brighter noise: Sensory enhancement of perceived loudness by concurrent visual stimulation. Cognitive, Affective, & Behavioral
Neuroscience, 4, 127–132.
Oxenham, A. J., Bernstein, J. G. W., & Penagos, H. (2004). Correct tonotopic representation
is necessary for complex pitch perception. Proceedings of the National Academy of
Sciences USA, 101, 1421–1425.
Oxenham, A. J., & Buus, S. (2000). Level discrimination of sinusoids as a function of dura-
tion and level for fixed-level, roving-level, and across-frequency conditions. Journal of
the Acoustical Society of America, 107, 1605–1614.
Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A., & Santurette, S. (2011). Pitch per-
ception beyond the traditional existence region of pitch. Proceedings of the National
Academy of Sciences USA, 108, 7629–7634.
Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig
and its relation to the receptor potential of inner hair-cells. Hearing Research, 24,
1–15.
Shofner, W. P. (2005). Comparative aspects of pitch perception. In C. J. Plack, A. J. Oxenham,
R. Fay, & A. N. Popper (Eds.), Pitch: Neural coding and perception (pp. 56–98). New
York, NY: Springer Verlag.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived
visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive
Neuroscience, 8, 497–506.
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Suzuki, Y., & Takeshima, H. (2004). Equal-loudness-level contours for pure tones. Journal
of the Acoustical Society of America, 116, 918–933.
Terhardt, E. (1974a). On the perception of periodic sound fluctuations (roughness). Acustica,
30, 201–213.
Terhardt, E. (1974b). Pitch, consonance, and harmony. Journal of the Acoustical Society of
America, 55, 1061–1069.
Terhardt, E. (1976). Psychoakustisch begründetes Konzept der musikalischen Konsonanz [A psychoacoustically based concept of musical consonance]. Acustica, 36, 121–137.
Terhardt, E. (1984). The concept of musical consonance, a link between music and psycho-
acoustics. Music Perception, 1, 276–295.
Trainor, L. J., & Heinmiller, B. M. (1998). The development of evaluative responses to
music: Infants prefer to listen to consonance over dissonance. Infant Behavior and
Development, 21, 77–88.
Tramo, M. J., Cariani, P. A., Delgutte, B., & Braida, L. D. (2001). Neurobiological founda-
tions for the theory of harmony in western tonal music. Annals of the New York
Academy of Sciences, 930, 92–116.
van de Par, S., & Kohlrausch, A. (1997). A new approach to comparing binaural masking
level differences at low and high frequencies. Journal of the Acoustical Society of
America, 101, 1671–1680.
Verschuure, J., & van Meeteren, A. A. (1975). The effect of intensity on pitch. Acustica, 32,
33–44.
Viemeister, N. F. (1983). Auditory intensity discrimination at high frequencies in the pres-
ence of noise. Science, 221, 1206–1208.
Viemeister, N. F., & Bacon, S. P. (1988). Intensity discrimination, increment detection, and
magnitude estimation for 1-kHz tones. Journal of the Acoustical Society of America, 84,
172–178.
Wallace, M. N., Rutkowski, R. G., Shackleton, T. M., & Palmer, A. R. (2000). Phase-locked responses to pure tones in guinea pig auditory cortex. Neuroreport, 11, 3989–3993.
Warren, R. M. (1970). Elimination of biases in loudness judgements for tones. Journal of the
Acoustical Society of America, 48, 1397–1403.
Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the
Acoustical Society of America, 54, 407–416.
Winckel, F. W. (1962). Optimum acoustic criteria of concert halls for the performance of
classical music. Journal of the Acoustical Society of America, 34, 81–86.
Winter, I. M. (2005). The neurophysiology of pitch. In C. J. Plack, A. J. Oxenham, R.