Carnegie Mellon
Department of Electrical and Computer Engineering
Synthesis of an Acoustic Guitar with a Digital String Model and
Linear Prediction
Kevin Bradley
1995
Advisor: Prof. Stonick
Synthesis of an Acoustic Guitar With a Digital String Model and Linear
Prediction
Kevin Bradley
submitted for partial fulfillment of the Master of Science degree requirements in the
Department of Electrical and Computer Engineering
Carnegie Institute of Technology
Carnegie Mellon University
May 1, 1995
Advisor: V.L. Stonick
Reader: R.M. Stern
1. Introduction
With the development of electronic synthesizers, much of the focus in music produc-
tion is shifting from the skilled performer, knowledgeable and facile on an instrument, to
the composer aided by an electronic music box. The composer may not be individually
skilled at each instrument, but with a synthesizer, may simultaneously compose and listen
to an entire orchestra at his fingertips. The beauty, quality, and creativity of the musical
composition relies on the capabilities of the synthesizer to create both natural-sounding
instruments, particularly the common acoustic instruments in orchestras and bands, and
innovative new sounds.
Music synthesizers currently implement acoustic guitars using a wide range of tech-
niques, including the creation of artificial spectra, as in FM synthesis; the playback of dig-
itally recorded segments of guitar sounds, as found on sampler-based systems; and
systems based on physical models of the guitar. The term "sampler" in this context refers
to an electronic instrument that digitally records segments of acoustic waveforms and then
can recreate these waveforms at a variety of playback rates, changing both pitch and dura-
tion of the original recorded sound. In contrast, the "sampling" of acoustic waveforms
refers to the process of converting continuous-time signals to discrete-time representa-
tions.
Music synthesis algorithms that are based on models of the physical behavior of musi-
cal instruments attempt to capture the major attributes of the instrument in response to
some ideal input, such as an impulse. Physical models are generally non-linear dynamic
systems and often require large mainframe computers to synthesize sounds in a reasonable
time-frame. Alternative methods have been proposed that use linear system models, which
are more computationally efficient and can be implemented using low-cost DSP-based systems.
The goal of this Master's Project is the development and implementation of a computational
model for music synthesis that produces realistic acoustic guitar sounds. The
computationally efficient, physically based model of an acoustic guitar developed here has
four parts: a digital string model, based on a digital waveguide filter; the guitar steady-
state response analysis and synthesis; the guitar transient analysis and synthesis; and a
linear IIR filter that models the impulse response of the guitar body. Model parameters--
including note frequency and feedback decay rate--and guitar characteristics--including
the guitar body impulse response and note pluck point--are determined from automated
analysis of plucked guitar strings and a "thumped" guitar body, sampled at 44.1 kHz with 16
bits of quantization, the CD standard. The analysis and synthesis of acoustic guitar
sounds using this model do not require special-purpose hardware and can be implemented
with relative simplicity and low cost in software.
In this report, a brief overview of current synthesizer practices and technology is pre-
sented in section 2, and the physical characteristics of an acoustic guitar are discussed in
section 3. Linear filter models for digital strings are developed in section 4, and the com-
plete guitar model is presented in section 5. Methods for estimating string model parame-
ters from sampled data are developed in section 6, followed by a discussion of linear body
modeling in section 7. A complete analysis of a sampled string is presented in section 8.
Section 9 discusses how the model parameters affect the resulting string sound. Finally, a
rough computational expense analysis of the complete synthesis system is presented in
section 10.
2. Current Music Synthesis Algorithms
Electronic music synthesis has had a long, variegated history. In the early 1900’s elec-
tronic organs produced sound by spinning disks at various rates with electric motors. In
later years, the development of the vacuum tube amplifier and oscillator circuits allowed
electronic instruments in a variety of forms [3]. The real development of synthesizers, how-
ever, has come about in the last 25 years following the invention of the microprocessor and
the corresponding surge in the microelectronics industry.
Current methods of electronic music synthesis have taken two distinct directions:
physical modeling, and "recording" (sampling and playback of data with some modification).
For the musician, samplers (and wavetable synthesizers) offer the most acoustically
accurate individual sounds, but the blending of sounds and potential musical expression
are limited. The musician cannot obtain the full range of expressivity on the synthesized
instrument, simply because the required sounds were not pre-recorded. To have greater
variety in instrumental expression, more samples must be stored in memory. In contrast,
physical modeling does allow for such variety, including the blending of sounds. While pre-
cise physical models produce excellent sounds, the computational complexity required to
implement these models is prohibitive for commercial synthesizers. Also, the production of
high quality sounds requires the same degree of skill as that required to play the physical
instrument, and does not easily allow for creative design of new sounds.
In recent years, much effort has gone into the development of synthesis techniques
that implement physical models of acoustic instruments and produce sounds similar to
the instruments themselves. This work has been bolstered in the last decade by the
development of digital filters that, when excited, simulate physically vibrating strings ([8], [11]).
Researchers have attempted to use these easily-implemented digital filters in combination
with other filters and techniques to produce a wide range of realistic sounds [7].
The following sections discuss frequency modulation (FM) and linear-additive (LA)
synthesis, both examples of artificial spectral synthesis; samplers and sample playback;
and current physical modeling methods. Emphasis is placed on comparing and contrasting
these different methods based on the quality of the sound produced for an acoustic
guitar, ease of creation of realistic sounds, flexibility in use, and computational complexity.
2.1 FM Synthesis
Frequency modulation (FM) synthesis, pioneered by Dr. John Chowning of Stanford in
1973, was made a commercial success by the Yamaha DX7, first introduced in 1983 [15].
FM synthesis works on a principle similar to FM radio: a cosine waveform, called the
carrier, has a frequency that is dependent on another source. The DX7 has as its basic unit a
simple block, consisting of an oscillator and a time-varying amplifier whose gain is
controlled by an attack-decay-sustain-release (A/D/S/R) envelope, as shown in Figure 2.1.1.
[Figure: block diagram of one FM unit. Labels: input waveform (reflecting input from
similar FM blocks), carrier frequency, A/D/S/R envelope, output.]
Figure 2.1.1 DX7 FM Synthesis Unit
By combining six of these units in varying combinations, a wide range of sounds can be
produced. As a simple example, consider cascading two such units. The first is set to pro-
duce a cosine wave at 75% of the maximum possible amplitude and to have a carrier fre-
quency twice as high as the second unit, which is scaled to be proportional to the key
pressed on the synthesizer. The resulting output is given by:
$$y[n] = \cos(\omega n + 0.75\cos(2\omega n))$$
The resulting y[n] resembles a square wave, although it does not have as many har-
monics as a true square wave, nor are the harmonics at exactly the right height. This is
the main drawback of FM synthesis: the inability to accurately reproduce a desired har-
monic spectrum. Perceived acoustical accuracy in instrument sounds can only be pro-
duced by trial and error, varying FM parameters until the synthesized sound appears
reasonably close to a desired instrumental sound (based on listening tests). The main
advantage of FM synthesis, on the other hand, is the ability to produce rich spectra that
cannot be produced any other way. FM synthesis has been used in many other Yamaha
synthesizers after the introduction of the DX7.
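
The two-unit cascade described above can be sketched numerically. This is a minimal illustration and not the DX7's implementation: the function name `fm_two_unit`, the 64-sample period, and the 16-period analysis window are assumptions, chosen so that each harmonic falls exactly on a DFT bin.

```python
import numpy as np

def fm_two_unit(omega, n_samples, index=0.75):
    """Return y[n] = cos(omega*n + index*cos(2*omega*n))."""
    n = np.arange(n_samples)
    modulator = index * np.cos(2.0 * omega * n)   # first unit, at twice the key frequency
    return np.cos(omega * n + modulator)          # second unit, phase-modulated by the first

period = 64                                       # samples per period (assumed)
n_periods = 16                                    # analysis window in periods (assumed)
y = fm_two_unit(2.0 * np.pi / period, n_periods * period)

# Harmonic k of the fundamental falls on DFT bin k * n_periods.
spectrum = np.abs(np.fft.rfft(y)) / (len(y) / 2)
harmonics = spectrum[n_periods::n_periods][:8]    # amplitudes of harmonics 1..8
```

In this sketch only the odd harmonics carry energy, which is why the waveform resembles a square wave. Their relative heights are set by Bessel functions of the modulation index rather than by the 1/k law of a true square wave, so they can be tuned only indirectly.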
The acoustical accuracy of an FM synthesizer, when compared to an actual instrument,
is clearly very poor. Using FM techniques, it is quite difficult to produce a specific desired
harmonic relationship since the analysis is nonlinear and cannot be automated. However,
FM synthesis is extremely easy to implement on a single chip and does produce harmoni-
cally rich sounds.
2.2 Linear-Additive Synthesis
The linear-additive (LA) synthesis method is found in Roland synthesizers. This method
works on the principle that a low-pass filtering of standard waveforms available on func-
tion generators, such as a triangle or square wave, can produce a variety of waveforms
having less power in high harmonics. This technique is commonly known as subtractive
synthesis. The desired output sound is formed by a linear combination of several
smoothed waveforms. If the filters have a time-varying cutoff frequency then a variety of
sounds can be produced.
The Roland D-50 is an example of a synthesizer that uses LA synthesis. The D-50 has
four tone generators; each tone generator can be either a PCM sample playback unit or a
filtered waveform (synthesized). The synthesized tone generator has a time-varying low-
pass filter, in which the cutoff frequency varies with time according to an ADSR envelope,
and a time-varying amplifier, in which the gain is controlled by an ADSR envelope. The
PCM wave generator produces pre-recorded samples at the desired pitch and is followed by
a time-varying amplifier in which the gain is controlled by an ADSR envelope. Two tone
generators (called patches) are grouped into a tone structure, and two tone structures
make a sound. Tones are stereo constructs, and can have different reverberation and
crossfade arrangements. The LA algorithm for the Roland D-50 synthesizer is shown in
Figure 2.2.1.
Using LA synthesis, a plucked guitar sound is created by combining a "pluck" PCM
sample and a filtered triangle wave. The resulting output is harmonically rich, since the
triangle wave has many harmonics, and sounds similar to an actual instrument, since a
triangle wave is a possible waveform for a plucked string. The PCM waveform mimics the attack
characteristics. However, the spectra and temperament of a specific instrument are
impossible to capture exactly, since only the waveforms produced by the oscillators are usable.
Computationally, the LA algorithm is quite simple, and can be implemented with a DSP or
specialized hardware.
[Figure: block diagram of the LA algorithm. Labels: patches, synthesized or PCM tone
generators, output.]
Figure 2.2.1 Roland D-50 LA Synthesis
2.3 Samplers and Sample Playback
A sampler works on a simple principle: record a sound digitally, and then play the sam-
ple back at varying rates. Samplers are becoming more commonplace in the music indus-
try, and sampled-data synthesizers (often called wavetable synthesizers) are also
appearing. Samplers record data and use it immediately for playback, while the wavetable
synthesizers have several waveforms stored in ROM and use a sample loop to extend the
sound duration.
The beauty of a sampler is that the sound reproduction is almost exact. What was
played will be faithfully reproduced when desired, and at different pitches--within reason.
For example, using a 440-Hz tone (A above middle C) recorded at 12 kHz to generate a low A
of 55 Hz requires a factor of 8 decrease in the playback rate (yielding 1.5 kHz), since 55 Hz
is one-eighth of 440 Hz. The change in playback rate relative to sample rate introduces
distortion: sharp transient effects become less transient (e.g. a sharp pistol shot becomes a
sustained cannon roar), and acoustic properties of instruments become distorted. Within
reasonable limits, samplers can accurately reproduce the instrument recorded.
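
The playback-rate arithmetic above can be illustrated with a short sketch. It assumes linear interpolation between stored samples, which the text does not specify, and `repitch` is a hypothetical helper name.

```python
import numpy as np

def repitch(recording, ratio):
    """Read the recording at `ratio` samples per output sample.

    ratio = target pitch / recorded pitch; values below 1 lower the pitch
    and stretch the duration by 1/ratio.
    """
    positions = np.arange(0.0, len(recording) - 1, ratio)
    return np.interp(positions, np.arange(len(recording)), recording)

fs = 12000                                        # recording rate from the example
n = np.arange(1200)                               # 0.1 s of a 440 Hz tone
recorded = np.sin(2.0 * np.pi * 440.0 * n / fs)
low_a = repitch(recorded, 55.0 / 440.0)           # one-eighth speed gives 55 Hz
```

The output is eight times longer than the recording, which is the duration change responsible for turning sharp transients into sustained ones.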
The major limitation of sampling technology is that instrumental versatility is lost.
There is only so much one can do to a recorded sound to generate interesting, realistic
effects. To have a variety of instrumental playing styles, the instrument must be recorded
playing in each style.
Samplers are very computationally inexpensive, but do require large amounts of mem-
ory, which has become relatively cost-effective in recent years. The ability of the musician
to create realistically expressive sounds is constrained only by the number of instrument
sounds able to be sampled and by the limitations of pitch shifting.
2.4 Computational Physical Modeling Efforts
Current efforts (most notably [14]) at using physical models to generate acoustic guitar
sounds focus on two aspects of the guitar: the plucked string signal, and the guitar body
system. These methods attempt to capture the essence of the guitar system without
overburdening the calculation requirements of the overall system. The method in [7] uses a
waveguide string model with Lagrange interpolation to implement non-integer periods and
uses a linear prediction error input to capture transient effects. The effect of the guitar
body on the various harmonics is modeled by a digital filter in the waveguide feedback
loop.
This method is relatively computationally inexpensive. Up to 3000 linear prediction
error samples are stored per note, and four multiplication and addition operations per
sample are required to produce output. An entire synthesizer was implemented on a single
DSP board in software, requiring no specialized hardware. The sound quality is extremely
accurate since the inputs and the model parameters are determined from the characteris-
tics of the guitar.
2.5 Summary of Current Synthesis Techniques
Below is a table summarizing the computational complexity, acoustic accuracy, and
the ease of designing a synthesizer sound to produce the sounds of a specific instrument
(i.e., matching an acoustic prototype).
Method                            Computational Complexity   Acoustic Accuracy   Ease of Matching Acoustic Prototype
FM Synthesis                      Simple: done in hardware   Poor                Poor
LA Synthesis                      Simple                     Good                Good
Sampling                          Simple                     Excellent           Excellent
Computational Physical Modeling   Fair                       Excellent           Excellent
Sampling and computational physical modeling produce the best sounds and have the
best ability to match an acoustic prototype. While sampling is a less complex method, it
has inherent limitations that are overcome by physical modeling. These limitations are:
instrumental variety (sampling requires that samples of every instrument to be reproduced
be stored); instrumental versatility (sampling only allows one playing style for an
instrument, i.e., the style in which the recording was generated, and can only change by storing
more recordings); and acoustic accuracy (sampling loses accuracy as the samples are
pitch-shifted further from the recording rate, and so requires more samples to cover the
possible range of an instrument). Physical modeling allows a performer to change model
parameters "on the fly" and, although requiring some knowledge of what the model param-
eters affect, does not require the performer to be an expert on the actual instrument to
produce accurate, realistic sounds.
3. The Acoustic Guitar
[Figure: drawing of an acoustic guitar. Labels: guitar body, neck and fretboard.]
Figure 3.0.1 Acoustic Guitar Anatomy
The instrument of interest is the acoustic guitar, which has the standard construction
and terminology as shown in Figure 3.0.1. Acoustic guitars come in a variety of styles,
from classical to folk, and each guitar is subtly different due to variations in the construc-
tion of the guitar body. Guitar strings are either nylon or steel. The guitar body is roughly
hourglass in shape, with a round hole in the middle of the top plate of the guitar body. The
strings of the guitar run along the neck of the guitar to the bridge, which is positioned just
below the sound hole in the top plate.
3.1 Guitar Physics: The String
The standard guitar has six strings, each of similar length, but having different density
and tension. These differences produce the observed changes in frequency from one string
to the next. We are particularly concerned with how the string behaves in response to
being plucked. The waveforms it produces have important implications for the quality of
the sound.
Waveforms emanating from the guitar string can be modeled as standing waves on a
medium with fixed ends [4]. The governing wave equation is shown in (2), where T is the
string tension and $\mu$ is the string density.

$$\frac{\partial^2 y}{\partial t^2} = \frac{T}{\mu}\,\frac{\partial^2 y}{\partial x^2} \qquad (2)$$
Solutions to this equation are known to have the form of (3) below.
$$y(t, x) = \sum_{n} C_n \sin(\omega_n t + \phi_n)\,\sin(k_n x) \qquad (3)$$
Each $C_n$ component measures the relative energy at the harmonics, $\omega_n$. Note that these
solutions take the form of sinusoids in both space and time. The final waveform shape is
dictated by the initial deformation imposed--the pluck or striking of the string--and the
changes imposed by the guitar body.
If we measure the output of an oscillating string over time, for example by recording a
plucked string, we only observe the response as a function of time. This response is
defined as the ideal plucked string response in that the pluck-point is modeled as an
infinitely sharp bend in the string.
The set of parameters $C_n$ and $\phi_n$ for an ideal response can be computed as in (4), where
L is the reciprocal of the proportion of the string length from the point where the string was
plucked to the bridge (e.g., 1/5 yields L=5) and h is the initial displacement.
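$$\phi_n = \frac{\pi}{2}, \qquad C_n = \frac{2 h L^2}{(L-1)\, n^2 \pi^2}\,\sin\left(\frac{n\pi}{L}\right), \quad 1 \le n < \infty \qquad (4)$$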
Such an ideal string shape is shown in Figure 3.1.1 for a pluck-point 1/4 (L = 4) of the way
along the string.
[Plot: One Period of Ideal Plucked String, L=4. Horizontal axis: time.]
Figure 3.1.1 Ideal Plucked String Response
Note that the $C_n$ harmonic amplitudes fall off as $1/n^2$, resulting in low power at high
harmonic frequencies. The sine term at $n\pi/L$ allows no energy at the Lth harmonic
frequency, nor at integer multiples of that frequency. The overall shape of the spectrum is
dictated by L. For integer values of L, nulls occur at harmonic frequencies that are
multiples of L. For non-integer values of L, nulls in the spectral envelope occur at frequencies
that are not necessarily related to the harmonic frequencies; e.g., a value of L of 3.5 would
yield observable nulls at the 7th and 14th harmonics, but the spectral envelope nulls at the
"3.5th" and "10.5th" would not be as easily observed from the harmonic content of the
spectrum.
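
The null pattern described above can be checked with a short sketch. The $1/n^2$ roll-off and the $\sin(n\pi/L)$ term come from the text. The overall scale factor $2hL^2/((L-1)\pi^2)$ is the usual textbook value for an ideal pluck of height h and is an assumption here, as is the function name `pluck_harmonic_amplitudes`.

```python
import numpy as np

def pluck_harmonic_amplitudes(L, h=1.0, n_harmonics=16):
    """Amplitudes C_n of an ideal pluck at 1/L of the string length.

    The prefactor is an assumed textbook scaling; the nulls depend only on
    the sin(n*pi/L) term.
    """
    n = np.arange(1, n_harmonics + 1)
    scale = 2.0 * h * L**2 / ((L - 1.0) * np.pi**2)
    return scale * np.sin(n * np.pi / L) / n**2

quarter = pluck_harmonic_amplitudes(4)      # nulls at harmonics 4, 8, 12, 16
offset = pluck_harmonic_amplitudes(3.5)     # nulls at harmonics 7 and 14 only
```

For L = 3.5 the harmonics nearest the envelope nulls at "3.5" and "10.5" (the 3rd, 4th, 10th and 11th) are reduced but not zero, which matches the observation that those nulls are harder to see in the harmonic content.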
Of course, the amplitude of oscillations of a guitar string decays with time. This implies
friction, or energy loss, from the system. This loss is best described as an output phenom-
enon--energy from the string is transferred to the air and to the guitar body.
A guitar string can vibrate in a three dimensional space, since it is suspended in air
between two endpoints, so there are generally more vibrations than represented by (4),
which describes vibration in one dimension only. A string tone can be thought of as a com-
bination of the two vibrational modes, one normal to the guitar body, called vertical, and
one parallel to the guitar body, called horizontal [4]. The coupling of the two modes is non-
linear and cannot be modeled completely, since it is dependent upon the guitar construc-
tion. For our purposes, however, it is adequate to assume that the two modes are
independent. Each mode is excited by a single guitar pluck, with some energy transferred
to each mode.
Each mode has different interactions with the bridge and the top plate of the guitar
body. The guitar tone for pure vertical plucking directions decays quite sharply, whereas
the tone for purely horizontal plucking decays quite slowly. The overall plucked string
response can be modeled as the sum of these two decaying modes, ignoring nonlinear cou-
pling. The relative amount of excitation transferred to horizontal and vertical modes varies
with the angle of the pluck.
4. Waveguide String Model
This section develops the digital string model solution to the wave equation and the
refinements necessary to allow for non-integer periods. This model, implemented as a digi-
tal filter, is used for both horizontal and vertical mode string oscillations.
4.1 Basic String Model
A digital simulation of a vibrating string can be constructed using digital waveguide
techniques, as in [11], yielding a general form for a string model, as shown in Figure 4.1.1.
The expression $g^N$ groups all energy loss in the string into one expression. Energy lost
generally goes through the bridge to the guitar body. The difference equation describing
the system is
$$y[n] = g^N y[n-N] + x[n] \qquad (5)$$
which yields a z-transform representation:
$$H(z) = \frac{1}{1 - g^N z^{-N}}$$
[Figure: block diagram of the feedback loop. Labels: input x[n], N-sample delay, y[n-N].]
Figure 4.1.1 General String Loopback Filter Model.
This filter essentially copies the signal sample values from N samples ago and multiplies
by a decay factor of $g^N < 1$. To generate a decaying oscillatory response, the delay line
is initialized to all zeros, x[n] introduces the first period of the oscillating signal into the
delay line, and the recursion equation produces the remainder of the response with $g^N$
acting as the decay rate (per period).
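
The recursion can be written out directly. The sketch below is a plain loop over the difference equation, not an optimized delay-line implementation; the function name `string_loop` and the parameter name `decay` (standing for $g^N$) are assumptions.

```python
import numpy as np

def string_loop(x, N, decay, n_out):
    """Run y[n] = decay * y[n-N] + x[n], where decay plays the role of g^N.

    x holds the excitation (typically one period of length N); it is treated
    as zero once it runs out, and the delay line starts at zero.
    """
    y = np.zeros(n_out)
    for n in range(n_out):
        excitation = x[n] if n < len(x) else 0.0
        feedback = decay * y[n - N] if n >= N else 0.0
        y[n] = feedback + excitation
    return y

N = 50
one_period = np.sin(2.0 * np.pi * np.arange(N) / N)
tone = string_loop(one_period, N, decay=0.9, n_out=5 * N)
```

Each successive period of `tone` is a copy of the excitation scaled by one more factor of `decay`, which is the per-period exponential decay described above.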
More general responses, including non-integer periods and frequency-dependent warping,
can be achieved by replacing $g^N$ with a feedback FIR filter $h_I[n]$ having z-transform
$H_I(z)$. The overall feedback system is then governed by the difference equation

$$y[n] = u[n] + h_I[n] * y[n-N]$$

where u[n] is an input to the system and y[n] is the resulting output. This more general
string model enables frequency-dependent losses with decay factors included in $h_I[n]$.
4.2 Non-integer Periods
A specific limitation of [11] is that only frequencies with integer periods can be
represented. For very low frequencies, this is not a terrible restriction. At higher harmonic
frequencies, however, an integer period approximation causes increasing error in
representable pitch, as shown in Figure 4.2.1.
Figure 4.2.1 Effects of Integer-Only Periods
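
The size of the integer-period error can be estimated with a few lines of arithmetic. This sketch assumes the period is rounded to the nearest integer number of samples at the 44.1 kHz rate used elsewhere in this report; the function name and the two example pitches are illustrative choices.

```python
import math

def integer_period_error_cents(f_hz, fs=44100.0):
    """Pitch error, in cents, when the period fs/f_hz is rounded to an integer."""
    n_samples = round(fs / f_hz)          # nearest integer period
    f_actual = fs / n_samples             # pitch the integer delay line produces
    return 1200.0 * math.log2(f_actual / f_hz)

low_e = integer_period_error_cents(82.41)      # open low E string
high_e = integer_period_error_cents(1318.51)   # two octaves above the open high E
```

Under these assumptions the low E is off by roughly 0.4 cents, while the high note is off by roughly 23 cents, an audible mistuning.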
Allowing only integer pitch periods is quite restrictive; however, non-integer periods can
pose quite a problem [6]. A simple solution is to interpolate between the samples at the
integer delays to approximate the signal value one non-integer period ago. One choice for
$H_I(z)$ is a Lagrange interpolation filter, with the constraint that the sum of the filter
coefficients be equal to the desired decay rate $g^N$. Assume that the non-integer period
$T = N + x$, where $0 < x < 1$. Then the current sample value can be generated from the values in
the prior period by interpolating between the integer samples by defining an interpolation filter:
$$H_I(z) = \sum_{i=-M/2+1}^{M/2} \alpha_i z^{-i}$$
where $\alpha_i$ is a Lagrange interpolation coefficient given by
$$\alpha_i(x) = \prod_{j=-M/2+1,\; j \ne i}^{M/2} \frac{x - j}{i - j}$$
and M is the number of coefficients (an even number). Note that the index i = 0
corresponds to an integer delay of N. While $H_I(z)$ is non-causal, it is only used at delay N so that
the overall feedback loop filter is causal. We have found that M=6 is sufficiently accurate for
providing reasonable sound using a 44.1 kHz sampling rate and a usable bandwidth of 11
kHz.
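
The coefficient formula can be evaluated directly. The sketch below is one straightforward reading of it; the function name `lagrange_coefficients` and the `gain` argument, which scales the coefficient sum to a chosen decay rate, are assumptions.

```python
import numpy as np

def lagrange_coefficients(x, M=6, gain=1.0):
    """Return tap indices i = -M/2+1 .. M/2 and coefficients alpha_i(x).

    x is the fractional part of the period (0 < x < 1). Unscaled Lagrange
    coefficients sum to 1; `gain` rescales the sum to the desired decay rate.
    """
    taps = np.arange(-(M // 2) + 1, M // 2 + 1)
    alpha = np.ones(M)
    for k, i in enumerate(taps):
        for j in taps:
            if j != i:
                alpha[k] *= (x - j) / (i - j)
    return taps, gain * alpha

taps, alpha = lagrange_coefficients(0.3)          # fractional delay of 0.3 samples
```

A useful property to check is that the weights reproduce any polynomial of degree below M exactly, so `np.sum(alpha * taps**3)` equals `0.3**3` up to rounding.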
[Plot: loop filter frequency response. Horizontal axis: frequency in rad/sec.]
Figure 4.2.2 Interpolation Loop Filter Frequency Response
In the frequency domain, the interpolation loop filter looks like a periodic series of peaks
(inverted notch filters), with a peak appearing at each harmonic, as shown in Figure 4.2.3.
The Lagrange interpolation introduces a lowpass filtering effect onto the string model, as is
clearly visible in Figure 4.2.2.
It should be noted that the string model as shown enhances energy at harmonic fre-
quencies relative to that at non-harmonic frequencies. This effect is evident from the fre-
quency magnitude spectrum in Figure 4.2.3; signals at non-harmonic frequencies are
attenuated.
This string model is very similar to the Karplus-Strong model, as presented in [5], [8],
and [12], where either white noise or an ideal string waveform is used as excitation to the
model. The KS model removes energy at non-harmonic frequencies resulting in a set of
harmonic tones. The LPF operation of the averaging filter helps to attenuate unwanted
aperiodic wide-band energy in the synthesized signal. The resulting output has an initial
burst of white noise that fades rapidly into a decaying tone. Although the random excita-
tion produces interesting sounds, it is not well-suited to make real-sounding synthetic
instruments because the signal energy is randomly distributed at each harmonic, rather
than exhibiting the spectral structure enforced by physical constraints.
Figure 4.2.3 Close-up of String Frequency Response
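
For comparison, the Karplus-Strong behavior discussed above can be sketched in a few lines. This uses the classic two-point averaging filter in the loop with a white-noise excitation; the function name, the uniform noise distribution, and the fixed seed are assumptions made for a repeatable example.

```python
import numpy as np

def karplus_strong(N, n_out, seed=0):
    """White-noise burst recirculated through an N-sample delay and a
    two-point average, y[n] = 0.5 * (y[n-N] + y[n-N-1])."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_out)
    y[:N] = rng.uniform(-1.0, 1.0, N)             # noise burst fills one period
    for n in range(N, n_out):
        older = y[n - N - 1] if n > N else 0.0
        y[n] = 0.5 * (y[n - N] + older)           # averaging acts as a lowpass in the loop
    return y

pluck = karplus_strong(N=100, n_out=4000)
```

Because the starting spectrum is random, each harmonic begins with an arbitrary amplitude, in contrast to the structured $1/n^2$ spectrum of the ideal plucked string.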
4.3 Waveforms on the String Model
The ideal string shape, as represented in (3), contains only harmonic frequencies. If the
ideal string shape can be synthesized for one period and used as an excitation to the string
model, the resulting output essentially will be copies of the first period with continuing
exponential attenuation [12].
If the waveform input to the string model contains only harmonic frequencies, the only
effect of the string model will be the attenuation of the input signal. For example, if a sinu-
soidal waveform with frequency fo is fed into the string model, all that will be apparent is
an exponential decay of the amplitude depending on the value of $g^N$. The frequency of the
signal will not change, and additional harmonic components will not appear.
5. Overall System Model
By combining the physical intuition and digital modeling techniques presented thus
far, we developed a computationally efficient and physically-based model for the guitar.
This model is an extension of that presented in [14]; here the model parameters are
estimated from sampled guitar string data, and the steady-state response of the guitar is
analyzed separately from the transient vertical response. The method in [14] makes several
compromises in accurate sound blending and overall reverberation, both of which are critical to
generating sounds characteristic of the guitar and to distinguishing between different guitars. The
method presented in this report takes advantage of an advanced IIR filter design algorithm
to make a resonant filter that provides excellent mixing of the guitar string sounds and
resonance based on the physical characteristics of each specific guitar body.
The overall system model used is shown in Figure 5.0.1, with the excitations forming
the input and the guitar sound as the output. $H_I(z)$ is a Lagrange interpolation filter, and T
is an integer period.
[Figure: block diagram. Labels: horizontal excitation, vertical excitation, horizontal
string model, guitar body model $H_3(z)$, synthesized guitar sound.]
Figure 5.0.1 Overall Synthesis Model
There are several important components to this model:
• Base interpolation filter $H_I(z)$
• Filter delay T (the integer pitch period)
• Filter decay rates $\rho_1$ and $\rho_2$, the $g^N$ values for the two modes
• Horizontal excitation
• Vertical excitation
• Guitar body model $H_3(z)$
• Wet and dry gains, which control the sound energy ratio between direct string radiation and the resonant body
The fundamental frequency of the sampled data is used to determine T and $H_I(z)$. The
parameters $\rho_1$ and $\rho_2$ are determined from the envelope of the sampled data and make the
interpolation filters $H_1(z)$ and $H_2(z)$. $\rho_1$ is found from the steady-state decay, as shown in
Figure 5.0.2, and $\rho_2$ is found from the transient decay. $H_3(z)$, found by fitting an IIR model
to a sampled guitar body impulse response, provides reverberation and sound blending.
The horizontal excitation is a single period of the steady-state response of the guitar,
which in general is a short-time stationary waveform that has some variation in frequency,
amplitude, and harmonic content. The vertical excitation is a linear prediction error of the
beginning of the sampled data and captures the transient response of the guitar pluck.
[Plot: Horizontal and Vertical Mode Exponential Fits, annotated with the vertical mode and
the fitted vertical and horizontal decay values. Horizontal axis: time.]
Figure 5.0.2 Horizontal and Vertical Modes of Sampled Guitar String
6. Analysis: Estimating String Model Parameters
From sampled guitar string data, the various model parameters (frequency, decay rates
for vertical and horizontal modes, and excitations) are found. The following sections detail
the analysis procedure, illustrated in Figure 6.0.1. Bold boxes contain the results of the
analysis.
[Figure: flow chart. Blocks: sampled guitar string data, frequency spectrum,
autocorrelation, envelope detection, pluck point, fundamental frequency, linear predictor,
horizontal excitation, vertical excitation.]
Figure 6.0.1 Analysis Outline
6.1 Note Frequency
Ideally, the pitch of each note has a strict mathematical relationship to the 440 Hz A.
An actual sampled sound is not likely to be of ideal pitch. Since the excitation calculation
depends upon matching the harmonic content of the signal, we must first match the fre-
quency of the sampled note.
Many techniques exist to determine the fundamental frequency of a note. Most of
these, however, are interested in only an integer approximation to the period. Since we
want a precise frequency estimate, interpolations on the integer results must be obtained.
Two methods for this are interpolation of autocorrelation values and interpolation of peaks
in the frequency spectrum.
The note frequency can be determined from the continuous autocorrelation function
$r_{xx}(\tau)$ of the sampled data. We assume that the signal is wide-sense stationary for the data
analyzed. Because the signal is periodic, the autocorrelation is also periodic; finding $\tau$ for
the largest value of $r_{xx}(\tau)$ for $\tau > 0$ yields the period of the note, and hence the frequency.
Since a precise note frequency is not likely to be captured by the discrete autocorrelation
function $\hat{r}_{xx}[m]$ obtained from sampled data, interpolation must be done to increase
the accuracy of the frequency estimate. A second order polynomial is fit to the autocorrelation
function using the points T - 1, T, and T + 1, where T is the integer index of the first
peak of $\hat{r}_{xx}[m]$. The maximum of this polynomial then is defined as the maximum of the
continuous autocorrelation function $r_{xx}(\tau)$, and the corresponding argument $\tau$ is assumed
to be the pitch period as illustrated in Figure 6.1.1.
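As a rough illustration of the autocorrelation interpolation, the following Python sketch fits a parabola through the three autocorrelation values around the integer peak and converts the interpolated period to a frequency. The function names, the lag search range, and the direct-sum autocorrelation are assumptions made for this example; they are not taken from the thesis.

```python
import math

def autocorrelation_at(x, lag):
    """Unbiased autocorrelation estimate at one non-negative lag (direct sum)."""
    n = len(x)
    return sum(x[k] * x[k + lag] for k in range(n - lag)) / (n - lag)

def parabola_vertex_offset(y_prev, y_peak, y_next):
    """Vertex position of the parabola through three equally spaced points,
    measured in samples relative to the middle point."""
    denom = y_prev - 2.0 * y_peak + y_next
    return 0.0 if denom == 0.0 else 0.5 * (y_prev - y_next) / denom

def estimate_frequency(x, fs, min_lag, max_lag):
    """Find the integer peak T in [min_lag, max_lag], then refine it using T-1, T, T+1."""
    r = {m: autocorrelation_at(x, m) for m in range(min_lag - 1, max_lag + 2)}
    peak = max(range(min_lag, max_lag + 1), key=lambda m: r[m])
    tau = peak + parabola_vertex_offset(r[peak - 1], r[peak], r[peak + 1])
    return fs / tau

# Hypothetical test signal: a 198 Hz cosine sampled at 44.1 kHz.
fs = 44100.0
signal = [math.cos(2.0 * math.pi * 198.0 * n / fs) for n in range(8000)]
estimate = estimate_frequency(signal, fs, min_lag=100, max_lag=300)
print(estimate)
```

The lag range (100 to 300 samples here) has to exclude the main lobe around lag zero, otherwise the search returns the trivial maximum instead of the first pitch peak.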
An alternative method involves interpolation on the samples of detected peaks in the
DFT spectrum. A rough guess at the frequency is made using the integer approximation to
the period from the autocorrelation, and then a peak in the frequency spectrum is detected
near this value as shown in Figure 6.1.2. The windowing function introduces local smooth-
ness to the data, so a second order polynomial approximation is used.
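The spectral alternative can be sketched in the same spirit. The example below evaluates the windowed spectrum on a fine frequency grid around a rough guess (which plays the role of the zero-padded DFT bins) and fits a parabola through the largest grid value and its two neighbors. The function names, the 0.67 Hz grid step, and the search span are assumptions chosen for illustration only.

```python
import cmath
import math

def windowed_spectrum_magnitude(x, freq_hz, fs):
    """Magnitude of the raised-cosine-windowed spectrum of x at one frequency."""
    total = len(x)
    acc = 0j
    for n, sample in enumerate(x):
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (total - 1))
        acc += window * sample * cmath.exp(-2j * math.pi * freq_hz * n / fs)
    return abs(acc)

def refine_frequency(x, fs, guess_hz, span_hz=5.0, step_hz=0.67):
    """Search a grid around guess_hz, then interpolate the peak with a parabola."""
    steps = int(round(span_hz / step_hz))
    freqs = [guess_hz + k * step_hz for k in range(-steps, steps + 1)]
    mags = [windowed_spectrum_magnitude(x, f, fs) for f in freqs]
    p = max(range(1, len(freqs) - 1), key=lambda i: mags[i])
    a, b, c = mags[p - 1], mags[p], mags[p + 1]
    denom = a - 2.0 * b + c
    offset = 0.0 if denom == 0.0 else 0.5 * (a - c) / denom
    return freqs[p] + offset * step_hz

# Hypothetical test signal: a 198 Hz cosine; the rough guess comes from an
# integer period of 223 samples.
fs = 44100.0
signal = [math.cos(2.0 * math.pi * 198.0 * n / fs) for n in range(4096)]
refined = refine_frequency(signal, fs, guess_hz=fs / 223)
print(refined)
```

Restricting the search to a few hertz around the rough guess is what keeps the method from locking onto a neighboring harmonic.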
The autocorrelation method is less likely to have a gross error in the frequency calcula-
tion, in that there is only one peak of interest possible so that the detection problem is
quite simple. If no guess is made of the initial frequency, the frequency spectrum interpo-
lation could potentially pick the wrong peak, even if there is a large SNR and small spuri-
ous data peaks.
For the examples shown in Figure 6.1.1 and Figure 6.1.2, the signal used is a cosine
wave at 198 Hz. The autocorrelation interpolation returned a frequency of 198.071235 Hz,
an error of 0.071235 Hz; the frequency spectrum interpolation returned a frequency of
197.98821 Hz, an error of 0.011789279 Hz. For a discussion of the accuracy and regions
of convergence for the two methods, see Appendix B.
[Plot: Autocorrelation Function and Interpolation; horizontal axis: Period]
Figure 6.1.1 Interpolation on Correlation Function
[Plot: Interpolation on Frequency; horizontal axis: Frequency in Hz]
Figure 6.1.2 Interpolation on Frequency Spectrum
The fundamental frequency is used to determine the pitch period and the Lagrange
interpolator Hi(z). The interpolator is then used in the linear predictor and in the creation
of the ideal string response.
6.2 Decay Rates and Initial Amplitudes
The decay rates and initial string amplitudes of the horizontal and vertical responses
can be found by a multistep process. Consider the sampled ’G’ string data shown in
Figure 6.2.1. The desired result for the horizontal response is the initial condition and
decay rate that best fit the envelope of the signal towards the end of the sample, in this
case from time 0.5 to 1.0 seconds. This choice of samples is made on the assumption that
the vertical response decays to a negligible amount by this point in time.
The decay process occurs once per period, so the model of the envelope is given by

\[ y[n] = C\,a^{n/T} \]

where T is the period of the note in samples, a is the decay rate, and C is the initial value
of the exponential fit (for n = 0). This model is chosen since it best fits the behavior of the
string model, which introduces a decay of gN once a period. The values of a and C can be
estimated from the data by first performing envelope detection (since the envelope
contains the decay information) and then taking logarithms:

\[ \log(y[n]) = \log(C) + \frac{n}{T}\log(a) \]

so we can express $\tilde{y} = \log(y)$ as a linear combination of $\alpha = \log(C)$ and $\beta = \log(a)$.
N is the number of samples occurring between 0.5 and 1.0 sec. The matrix A is an
N x 2 matrix containing a column of all 1’s and a column of values of n/T, where n ranges
from 0.5 fs to fs. The vector x is a 2 x 1 matrix with elements $\alpha$ and $\beta$. The variable e
represents error between the model and the sampled data, and is represented as an additive
term.

\[ \tilde{y} = Ax + e \tag{12} \]

The ideal C and a are computed in x to minimize the squared error between the
estimate Ax and $\tilde{y}$, that is, to minimize $e^T e$:

\[ \min\,(e^T e) = \min\left[(\tilde{y} - Ax)^T(\tilde{y} - Ax)\right] \tag{13} \]

Solving for x yields the least-squares estimate:

\[ \hat{x} = (A^T A)^{-1} A^T \tilde{y} \tag{14} \]

and the optimal $\alpha$ and $\beta$ can be found from the elements of $\hat{x}$. $C = e^{\alpha}$ is the initial value of
the horizontal response, and $a = e^{\beta}$ is the decay rate $\rho_1$ for the horizontal response.
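The least-squares fit can be sketched in a few lines of Python. Because the design matrix has only two columns (ones and n/T), the normal equations reduce to a 2 x 2 system that is solved in closed form below. The function name, the synthetic envelope, and the parameter values are hypothetical, and the envelope detector itself is not shown.

```python
import math

def fit_exponential_envelope(envelope, first_index, period):
    """Least-squares fit of envelope[i] ~ C * a**(n / period), with n = first_index + i.

    Works on the logarithm of the envelope, so all values must be positive.
    Returns the pair (C, a).
    """
    u = [(first_index + i) / period for i in range(len(envelope))]  # second column of A
    v = [math.log(y) for y in envelope]                              # log of the data
    count = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    det = count * suu - su * su              # determinant of A^T A
    log_c = (suu * sv - su * suv) / det      # first element of (A^T A)^-1 A^T v
    log_a = (count * suv - su * sv) / det    # second element
    return math.exp(log_c), math.exp(log_a)

# Synthetic envelope starting 0.5 s into a note sampled at 44.1 kHz.
period = 216.0
envelope = [114.5 * 0.995 ** (n / period) for n in range(22050, 24050)]
c_hat, a_hat = fit_exponential_envelope(envelope, 22050, period)
print(c_hat, a_hat)
```

With noiseless synthetic data the fit returns the generating values; on real data the quality of the result depends on how well the envelope detector isolates the slowly decaying mode.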
The horizontal decay rate is used to create the ideal string response having the desired
decay rate and is also used in synthesizing the desired horizontal response.
[Plot: sampled waveform; horizontal axis: Time, 0 to 1]
Figure 6.2.1 Sampled G String
6.3 Horizontal String Model Excitation
Using the "copy" property of string models presented in section 4.3, we can formulate
the horizontal excitation as a single period of a representative portion of the steady-state
string response. The steady-state behavior is best determined from the frequency spec-
trum of the sampled data, so the excitation is formed by a Fourier Series; the Fourier
Series coefficients are determined from the analysis of the spectrum. By assuming the
body of the guitar provides only amplification to the harmonics of the string, warping them
from the ideal response, the actual information determined in the analysis procedure is
the magnitude and phase change for each harmonic from the ideal string Fourier Series,
given by (4), to the sampled string frequency spectrum.
Referring to Figure 6.0.1, three pieces of information are needed for the horizontal
string excitation: the pluck-point, the sampled-data harmonics, and the ideal string har-
monics. The sampled-data harmonics are found from a section of the guitar steady-state
response and then used to identify the optimal pluck-point.
A short-time stationary section of the sampled waveform is chosen for analysis. The
harmonics are identified in the signal spectrum, as shown in Figure 6.3.1. This spectrum
was generated by windowing the waveform and performing a very long FFT--65536
points--resulting in 0.67 Hz / bin resolution. Since the window length is on the order of
10,000 samples, the width of the main lobe of the peak in the spectrum is at most 26 Hz,
so two peaks in the spectrum will not overlap (recall that the lowest frequency on a guitar
is 82 Hz). The harmonic peaks are extracted by searching for local maxima around the
expected frequencies. We assume that harmonics more than 50 dB below the largest harmonic are
not significant perceptually. Analysis has determined that fewer than 25 harmonics are
needed to synthesize the horizontal excitation for each note from our sampled data. See
Appendix A for a listing of notes and significant harmonics.
The pluck-point, L in (4), is determined from the data by finding the value of L that
minimizes the absolute value of the error between the ideal string spectrum and the
sampled-data spectrum. The $\ell_1$ norm shown in (15) yields accurate results in this application.
[Plot: Detected Harmonic Components; horizontal axis: Frequency in Hz, 0 to 4000]
Figure 6.3.1 String Spectrum With Harmonics
The values ha form a set of harmonics obtained from the spectral analysis of sampled
data as in Figure 6.3.1, and N is the number of significant harmonics.
(15)
An error surface for varied L is constructed by evaluating the error criterion for values of L
between 1 and 20 in intervals of 0.01, as illustrated in Figure 6.3.2, and the minimum of
this error is chosen as the pluck point. This method works well when a known pluck point
is to be estimated, and so we can only assume it works well in the case of sampled data.
[Plot: Minimum Error; horizontal axis: Guess at L, 0 to 20]
Figure 6.3.2 Error For Varying L
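The grid search over L can be sketched as follows. The ideal harmonic magnitudes are assumed here to follow the textbook plucked-string form |sin(k*pi/L)| / k^2, which may differ in scaling from equation (4). The normalization to unit peak, the function names, and the test data are likewise assumptions for this example.

```python
import math

def ideal_harmonics(pluck_l, count):
    """Assumed ideal magnitudes for a string plucked at 1/pluck_l of its length."""
    return [abs(math.sin(k * math.pi / pluck_l)) / (k * k) for k in range(1, count + 1)]

def normalize(values):
    """Scale a list so that its largest element equals one."""
    peak = max(values)
    return [v / peak for v in values]

def estimate_pluck_point(measured, lo=1.0, hi=20.0, step=0.01):
    """Return the grid value of L that minimizes the sum of absolute errors."""
    target = normalize(measured)
    best_l, best_err = None, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        candidate = lo + i * step
        ideal = ideal_harmonics(candidate, len(measured))
        if max(ideal) < 1e-9:  # L = 1 gives an all-zero spectrum; skip it
            continue
        err = sum(abs(a - b) for a, b in zip(normalize(ideal), target))
        if err < best_err:
            best_l, best_err = candidate, err
    return best_l

# Known pluck point, arbitrary overall gain, twelve harmonics.
measured = [3.0 * h for h in ideal_harmonics(7.49, 12)]
pluck_estimate = estimate_pluck_point(measured)
print(pluck_estimate)
```

An exhaustive grid is used rather than a gradient method because the error surface has many local minima as L varies.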
Referring to Figure 6.0.1, once L is determined, an ideal string response using the esti-
mated initial amplitude and decay rate is calculated. The harmonics of the ideal response
are compared to the harmonics of the sampled data (using the same sample points for the
spectral analysis), and the magnitude and phase changes for each harmonic are calcu-
lated. These changes represent the response of the guitar body and any non-idealities in
the guitar string.
The magnitude and phase of the change from ideal string to sampled data for each harmonic
are the only information that needs to be stored. This means that a waveform shape
can be characterized by a maximum of 50 numbers. The waveform can then be recreated
by incorporating these changes in the excitation calculation: evaluating the Fourier Series
by using the ideal waveform shape from (4) and changing each a as necessary. Since the
Fourier Series can be evaluated for any fundamental frequency, the waveform shape can be
calculated for any note as well.
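A minimal sketch of this evaluation is given below, assuming a cosine-form series with one magnitude and one phase per harmonic. The function name and the three-harmonic example values are hypothetical and do not come from the analysis data.

```python
import math

def excitation_period(magnitudes, phases, f0, fs):
    """One period of sum_k magnitudes[k-1] * cos(2*pi*k*f0*n/fs + phases[k-1])."""
    length = int(round(fs / f0))  # samples in one period, rounded
    return [
        sum(m * math.cos(2.0 * math.pi * k * f0 * n / fs + p)
            for k, (m, p) in enumerate(zip(magnitudes, phases), start=1))
        for n in range(length)
    ]

magnitudes = [1.0, 0.5, 0.25]   # stored per-harmonic magnitudes (made up)
phases = [0.0, 0.0, 0.0]        # stored per-harmonic phases (made up)

base = excitation_period(magnitudes, phases, f0=441.0, fs=44100.0)
# The same stored numbers evaluated one semitone higher.
shifted = excitation_period(magnitudes, phases, f0=441.0 * 2.0 ** (1.0 / 12.0), fs=44100.0)
print(len(base), len(shifted))
```

The stored magnitudes and phases are independent of the fundamental, so changing the note only changes f0 in the call.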
An alternative to the Fourier Series would be to design a digital filter that matches the
measured magnitude and phase of the change at each harmonic frequency for a number of
notes, but this has proven to be extremely difficult since the composite changes from anal-
ysis of different notes on the same string have inconsistent results. See Appendix C for a
discussion of the analysis results.
6.4 Vertical String Model and Excitation
Referring to Figure 6.0.1, the vertical excitation depends on the decay rates and the
sampled data. The same string model as was used for the horizontal response is used for
the vertical response, but with a different decay rate. The main difference between the ver-
tical and horizontal responses is that the vertical is of much shorter duration due to its
rapid decay rate, and thus must be identified from the initial segment of the signal.
The beginning of the sampled signal is used for analysis. The horizontal response is
subtracted so that, when the outputs of the two string models are added together, the
sampled string response is best approximated. The decay rate of the vertical response is
calculated in a similar fashion to the horizontal decay rate as detailed in section 6.2. In
this case, however, the starting point is chosen to be the maximum of the vertical response
since this initial value is available (in contrast to the horizontal response, where the initial
value must be estimated).
The string model is inverted and used as a linear predictor, as in [11]. The prediction
error equation is given by (16),

\[ e[n] = x[n] - h[n] * x[n-N] \tag{16} \]

where h[n] is the sixth-order Lagrange interpolator (8) multiplied by the vertical decay rate
and x[n] is the initial segment of the sampled string data.
The prediction error signal, if fed into the string model in its entirety, would allow the
exact signal to be recreated. Since storing error signals becomes prohibitive, some trunca-
tion is necessary. Fortunately, the error itself dies away exponentially, so a representative
version of it can be chosen. As an example, approximately 5000 samples of the error signal
are adequate to recreate the sampled G string.
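The inverse relationship between the string model and the predictor can be checked with a short round trip: feeding the full prediction error back into the recursive model reproduces the signal. In the sketch below the six tap values are placeholders that merely sum to a decay of 0.95 (they are not Lagrange coefficients), and the function names and the delay are assumptions for the example.

```python
def string_model(excitation, taps, first_delay):
    """Recursive model: y[n] = e[n] + sum_j taps[j] * y[n - first_delay - j]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(taps):
            idx = n - first_delay - j
            if idx >= 0:
                acc += a * y[idx]
        y.append(acc)
    return y

def prediction_error(x, taps, first_delay):
    """Inverse filter: e[n] = x[n] - sum_j taps[j] * x[n - first_delay - j]."""
    out = []
    for n, sample in enumerate(x):
        acc = sample
        for j, a in enumerate(taps):
            idx = n - first_delay - j
            if idx >= 0:
                acc -= a * x[idx]
        out.append(acc)
    return out

taps = [0.0, 0.0, 0.57, 0.38, 0.0, 0.0]   # placeholder taps, sum = 0.95
excitation = [1.0, -0.5, 0.25] + [0.0] * 297
response = string_model(excitation, taps, first_delay=48)
recovered = prediction_error(response, taps, first_delay=48)
max_diff = max(abs(a - b) for a, b in zip(excitation, recovered))
print(max_diff)
```

Truncating the recovered error signal, as described above, trades exact reconstruction for a much smaller stored excitation.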
7. Analysis: Estimating A Linear Body Model
The main motivation for a model of the guitar body is to provide resonance and sound
blending of the string sounds, similar to the acoustics of a resonant cavity, and to provide
a more physically-based method for the design of such a filter. Although an all-pass rever-
berator system provides good resonance and sound blending, it has little correlation with
the behavior of a physical instrument. Note that the exact spectral shaping of the guitar
body is not required since the horizontal and vertical excitation contain this information.
7.1 Difficulties In Body Modeling
Previously reported attempts at modeling the body of an acoustic guitar with a linear
filter ([14], [7]) found that either a very long FIR filter (1000+ taps) or a very high order
pole model (200+ poles) is required to adequately model the impulse response of a guitar
body. These filters cannot be used for real-time synthesis since they are prohibitively
computationally expensive. Using IIR filters (with both poles and zeros) yields a smaller model.
Mean-square optimal IIR filters are difficult to design since the optimal coefficients are
described by systems of nonlinear equations, the designed filter approximation has excessive
spectral smoothing due to the mean-square curve fits, and high-order filters succumb
to fixed-precision numerical computation effects and become unstable. To alleviate some
of these problems, prior work in IIR modeling explored perceptual preprocessing and
nonlinear optimization methods with limited success.
The current method uses the design algorithm proposed in [2]. Good real-time and
numerically robust performance is obtained by using parallel fourth order filters designed
with a hierarchical algorithm; this algorithm solves problems with multiple equivalent
solutions and captures spectral peaks and dips using simple filters and an iterated least-
squares procedure.
7.2 IIR Design Algorithm
The response of the guitar body is experimentally determined from a sampled impulse
response h[n]. The guitar is "thumped" on the bridge by a sharp instrument, and the
resulting sound is recorded with high-fidelity microphones and sampled at 44.1 kHz with
16 bit quantization. The sampled data is then used as an impulse response to an IIR filter
design algorithm [2]. Three fourth-order parallel filter structures are designed from the
impulse response using the following procedure: the filter is designed from the input
impulse response; the impulse response of the filter is calculated and subtracted from the
input response. This error response is then used as input to the design algorithm, and the
procedure is repeated, as shown in Figure 7.2.1. This method builds higher-order
structures from very low order filters, and therefore the filter performance is very robust with
respect to filter coefficient quantization. The repetition of the procedure allows for
capturing most of the resonant frequencies using a MSE measure. Experimental results show
that fourth-order filter increments are best for characterizing the resonant modes of the
guitar body.
[Block diagram: Algorithm; IIR Filter; Response]
Figure 7.2.1 Parallel Filter Design Algorithm.
[Plot: horizontal axis: frequency (Hz)]
Figure 7.2.2 Impulse Response Frequency Spectrum and IIR Filter Frequency Response
Figure 7.2.3 Sampled Impulse Response and Filter Impulse Response
[Plot: Poles (’x’) and Zeros (’o’) Of Body Filter]
Figure 7.2.4 Poles and Zeros Of Linear Body Filter
Consider the frequency spectrum in Figure 7.2.2, obtained from a sampled impulse
response, and the IIR filter response that was designed from it. The IIR filter picks out
most of the important resonant modes evident in the sampled impulse response. This IIR
filter has an impulse response that closely matches the sampled impulse response, as
shown in Figure 7.2.3.
Note that the impulse response of the IIR filter has the same characteristics as the
sampled impulse response, but does not have the high-frequency components of the sam-
pled impulse response. This IIR filter has the resonant properties of the impulse response,
and so will provide resonance to the string model sound. The poles and zeros of this filter
are, for the most part, within the unit circle, as shown in Figure 7.2.4; two very large zeros
are at angles 0 and $\pi$, not shown on the plot. Note that most of the poles and zeros are at
low frequency, since the interesting portion of the spectrum is at low frequency.
8. Example: Guitar ’G’ String Analysis
As an example of the analysis technique, the analysis of a sampled guitar ’G’ string
(196 Hz, ideally) is presented below.
8.1 Horizontal Excitation
The data was sampled at 44.1 kHz using 16 bits of resolution for one second. A plot of
the waveform is shown in Figure 8.1.1, which also shows the horizontal and vertical decay
rates calculated for this data. The two components are clearly visible: the horizontal mode
is best represented by the section 0.5 to 2 seconds, while the vertical mode is best repre-
sented by the section 0 to 0.25 seconds. Samples 35000 to 88200 are chosen for calcula-
tion of the horizontal excitation, and are shown in detail in Figure 8.1.2. Note that the
signal appears to be short-time stationary over this time period.
A 65,536-point DFT is computed using this data multiplied by a raised cosine window.
The frequency of this note was estimated to be 204.187 Hz, which is close to the expected
196 Hz. The harmonics are clearly visible in Figure 8.1.3; the lines show where the first
twelve harmonics were expected based on the calculated frequency, and the ’o’s show
where a local maximum in the spectrum was found. From Figure 8.1.3, two things are
noted. First, twelve harmonics are significant since none are visible at frequencies above
2500 Hz. Secondly, there is an obvious spectral "null" at the eighth harmonic (around
1600 Hz), but there is no null at the fourth harmonic, where there would be if L in (4)
were 8. The optimal value of L came out to be 7.487, so a low point occurs at the eighth
harmonic, as expected.
Using the approach in section 6.2, the decay rate and initial amplitude of the horizon-
tal waveform were calculated to be 0.995 and 114.5 after normalizing the signal power
over one second. The vertical waveform was calculated to decay at 0.9519 and have an ini-
tial amplitude of 1255.1. The mixture between horizontal and vertical modes indicates that
the string was plucked in a mostly vertical direction and that the overall string sound has
a sharp "pluck" sound.
[Plot: Horizontal and Vertical Mode Exponential Fits; horizontal axis: Time. Vertical Decay: 0.951934; Horizontal Decay: 0.994991]
Figure 8.1.1 Sampled Guitar ’G’ String And Exponential Curve Fit
[Plot: waveform detail; horizontal axis: Time, 0.8 to 0.835]
Figure 8.1.2 Samples 35000 to 37000
[Plot: spectrum; horizontal axis: Frequency in Hz, 0 to 4000]
Figure 8.1.3 ’G’ String Spectrum
[Plot: Change in dB: Ideal and Sampled String; horizontal axis: Frequency in rad/sec]
Figure 8.1.4 Change In dB From Ideal String
[Plot: Horizontal Excitation: One Period @ 204.6 Hz -> 217 samples; horizontal axis: Sample Number (t * 44100)]
Figure 8.1.5 Horizontal Excitation
In order to get the correct Fourier Series coefficients for the horizontal excitation, we
need to include the decay rate and other effects of the string model. We calculate an ideal
excitation using (4), and simulate the string response using the ideal excitation as input.
The ideal response is then compared to the sampled response at the 12 harmonics, and
the differences in magnitude and phase are calculated. The changes in spectral magnitude
from the ideal to the sampled string response are shown in Figure 8.1.4, with the ’x’s
representing the change in dB at that harmonic.
By using the first twelve Fourier Series coefficients of the ideal excitation, and incorpo-
rating the changes in magnitude and phase just calculated, the horizontal waveform can
be calculated using the Fourier Series; the horizontal excitation is shown in Figure 8.1.5.
From this analysis, we have determined the sampled note frequency, decay rates for
the vertical and horizontal modes, and an excitation to the horizontal string model. The
vertical excitation is all that remains.
8.2 Vertical Excitation
The vertical excitation is the simplest to calculate since it is a linear prediction error
generated by filtering. The horizontal string response is subtracted from the sampled data
resulting in the signal x[n] shown in Figure 8.2.1.
The long-term prediction filter in (17) has the same coefficients as the vertical string
model, where $a_i$ is the sixth-order Lagrange interpolation coefficient multiplied by the
exponential decay of the vertical mode.

\[ e[n] = x[n] - \sum_{i=-2}^{3} a_i\, x[n-T-i] \tag{17} \]

The linear prediction error e[n] obtained from (17) is the vertical mode excitation. A plot
of this error for the ’G’ string is in Figure 8.2.2. The error signal has an exponential decay,
which is a good sign, implying that the linear prediction is a reasonable model. The first
0.11 seconds are chosen to represent the significant part of the error signal. Note that the
signal has a value close to zero at t = 0.11 sec.
The vertical excitation has a fairly high bandwidth (13 kHz before the spectrum is
consistently 50 dB from its maximum value). It is stored in a table to save computational costs
in creating this error signal from a Fourier Series or other spectral representation.
[Plot: Mode For Prediction]
Figure 8.2.1 Data For Vertical Excitation Calculation
[Plot: Linear Prediction Error of Vertical Mode]
Figure 8.2.2 Error Signal for ’G’ String
9. Synthesis
Synthesis is much easier than analysis; the excitations and filter coefficients are
retrieved from a table lookup or calculated, then are modified by parameters specified by
the user, and the excitations are then processed through the string model and guitar body
filters.
User-specified parameters allow for flexibility, expression, and creativity in synthesis.
For example, different plucking styles result in different string sounds, different string
types (nylon versus steel) change the overall tone, and note volume, pitch, and external
decay factors modify the overall sound as well. In addition, the choice of guitar body affects
the overall guitar tone.
9.1 String Type and Plucking Style
The easiest way to replicate the sounds generated by different playing methods is to
vary the overall excitation: the percentage of each excitation (vertical and horizontal). For
example, a harsh, mostly vertical pluck would be composed mostly of a vertical excitation
with little horizontal; in contrast, a softer, gentler excitation would have more equal
portions. The type of string determines the actual excitation used.
9.2 Volume
Volume is controlled by the excitation gain. Essentially, this gain corresponds to "how
hard was the string plucked"? Since we use a linear model, no distortion parameters, such
as saturation, distension, or spatial limitations, are considered.
9.3 Pitch
The desired pitch affects both the interpolation coefficient values and required
excitations. Interpolation coefficients are modified to account for the different pitches, and the
excitations are modified to have the necessary frequency characteristics associated with
each note. For example, to play a G# using the model obtained from sampled G note data,
the horizontal excitation is pitch shifted by a factor of 1.05946, a relatively simple opera-
tion requiring only recalculation of the Fourier Series using the new fundamental fre-
quency. The string model converts the vertical excitation to the correct pitch by removing
energy that is not at the harmonic frequencies.
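The factor 1.05946 is the equal-tempered semitone ratio, the twelfth root of two. The short Python sketch below derives it from the 440 Hz reference; the function name and the semitone offsets (G3 taken as 14 semitones below A4) are assumptions of this example.

```python
def equal_tempered_frequency(semitones_from_a4, a4_hz=440.0):
    """Frequency of the note a given number of semitones above (or below) A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)

g3 = equal_tempered_frequency(-14)        # open G string, ideally about 196 Hz
g_sharp3 = equal_tempered_frequency(-13)  # one semitone higher
shift_factor = g_sharp3 / g3
print(round(g3, 3), round(shift_factor, 5))
```

Shifting by k semitones multiplies the fundamental by the k-th power of this ratio, so the Fourier Series for any fret can be re-evaluated from the same stored harmonic data.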
9.4 Decay Rate
The string decay rate is used to reflect changes in the damping of the string caused by
either a different string type or playing style. A slower decay rate on the horizontal mode
allows the string to resonate for a longer period of time, while a shorter decay rate attenu-
ates the string sound more rapidly.
9.5 Body Model
The guitar sound is greatly affected by the choice of the guitar body. Without any guitar
body the strings sound disjoint and harsh. The sound blending provided by the guitar
body model combines the string sounds smoothly, and the resonant frequencies in the
body model add life and fullness to the string sounds. Different body models have different
characteristics, so a different body results in a different guitar sound. The synthesist has a
choice of body models depending on the type of guitar (e.g. folk, classical, Spanish, etc.)
and a choice of sound mixture to allow for different guitar tones.
10. Computational Analysis of Synthesis
Each sample calculation using this model requires 14 multiplications (six for each
interpolation, and one for the gain term on the input) and 14 additions (the two inputs and
the 12 interpolation results). This is extremely cheap. The string models need to be
duplicated for polyphonic synthesis; six string models running in parallel require 84
multiplications and 84 additions. The linear body model requires 26 multiplications and 26
additions, for a total of 110 multiplications and 110 additions. 220 floating point operations at
44.1 kHz requires 9.7 Mflops, a reasonable computational load for a DSP or general
purpose processor.
There is, however, the overhead in calculating and/or storing the excitations. If it is
possible to do the excitation calculations in real-time, then only the data for the basic six
strings needs to be stored. Realistically, the horizontal and vertical excitations for each
note to be played would be stored in a table and looked up as necessary. At a maximum of
5000 16-bit samples per vertical excitation, and 14 excitations (allowing for poor
interpolation on lower frequency notes), plus 120 horizontal excitations (20 frets per string times six
strings) at an average of 176 samples each, a single instrument requires roughly 180 kB of
memory (140 kB for the vertical excitations and about 42 kB for the horizontal excitations).
This allows for the storage of a greater variety of excitations, with very interesting
possibilities. Instead of having one acoustic guitar on a synthesizer, it would be possible to
have a folk guitar, a classical guitar, an unplugged electric guitar, and many others.
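The operation and storage counts in this section can be tallied directly from the figures quoted in the text. In the Python sketch below the variable names are illustrative, and 16-bit samples are counted as two bytes each.

```python
SAMPLE_RATE = 44100          # samples per second
STRINGS = 6

# Per-sample arithmetic, from the counts given in the text.
string_ops = STRINGS * (14 + 14)   # multiplications plus additions, all six strings
body_ops = 26 + 26                 # linear body model
ops_per_sample = string_ops + body_ops
mflops = ops_per_sample * SAMPLE_RATE / 1e6

# Excitation table storage at two bytes per 16-bit sample.
vertical_bytes = 14 * 5000 * 2     # 14 vertical excitations of up to 5000 samples
horizontal_bytes = 120 * 176 * 2   # 120 horizontal excitations, 176 samples on average
total_bytes = vertical_bytes + horizontal_bytes

print(ops_per_sample, mflops)
print(vertical_bytes, horizontal_bytes, total_bytes)
```

The vertical excitations alone account for 140,000 bytes; the horizontal tables add a further 42,240 bytes under these assumptions.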
11. Conclusion
This thesis has presented a technique using analysis, synthesis, and physically-based
computational modeling to cost-effectively synthesize realistic acoustic guitar sounds from
prior analysis of sampled data. Since the physical modeling was based on easily-obtained
instrument information (the plucked-string sound and the "thump" of the instrument
body), this technique is not limited to guitars only, and can be applied to almost any
plucked-string instrument, such as harpsichords, string basses, and pizzicato violins. A
variety of parameters are available to the performer to change the characteristics of the
sound produced.
Since the computational requirements per sample are quite small, this system can be
easily implemented in software and run in real-time. Although there is a considerable
overhead in calculation and storage of the excitations to the model, the resulting output is
extremely realistic when compared to artificial methods like FM and LA synthesis, and is
comparable to the sampling and other computational modeling methods in terms of overall
quality, computational complexity, and data storage. This method has more variety in the
number of sounds it can produce compared to sampling since the parameters of the syn-
thesis algorithm can be changed readily to produce new guitar sounds from the same
input data. In addition, the analysis is automated, so new instruments can be designed
very easily and quickly, allowing for more accurate synthesis of different guitars.
Future work in realistic synthesis is in: modeling non-linear string properties;
incorporating other guitar string effects, like inter-string frequency stimulation ("wolf" notes) and
beat patterns; including the effect of the guitar body in the analysis procedure with an
inverse of the body filter; applying the analysis and synthesis techniques to other instru-
ments; implementing the synthesis procedure in a real-time environment and providing a
user interface to the model parameters; modeling the vertical excitation as a combination
of deterministic and stochastic signals for further compression; and investigating the
response of the guitar body in greater detail.
At the present time, the synthesis produces good guitar notes, but the system is not yet
sufficiently advanced to serve as a usable, high-quality real-time synthesizer. More work in
interfacing to the algorithm is required to make it a viable synthesis method.
12. References
[1] Borin, G., et al. Sound Synthesis by Dynamic Systems Interaction. From Readings in Computer Generated Music, D. Baggi, Ed. IEEE Computer Society Press, 1992.
[2] Cheng, M. Analysis of Least-Squares Approaches With Applications For Pole-Zero Modeling. Ph.D. Thesis, Carnegie Mellon University, 1995.
[3] Dorf, Richard H. Electronic Musical Instruments, Third Edition. New York: Radiofile, 1968.
[4] Fletcher, N.H. and Rossing, T.D. The Physics of Musical Instruments. New York: Springer-Verlag, 1991.
[5] Jaffe, D.A. and Smith, J.O. III. Extensions of the Karplus-Strong Plucked-String Algorithm. Computer Music Journal, Vol. 7, No. 2. MIT Press, 1983.
[6] Karjalainen, M. and Laine, U. A Model for Real-Time Sound Synthesis of Guitar On a Floating-Point Signal Processor. IEEE Transactions on Signal Processing, 1991.
[7] Karjalainen, M. et al. Towards High-Quality Sound Synthesis of the Guitar and String Instruments. Proceedings of the ICMC, 1993, pg. 56-63.
[8] Karplus, K. and Strong, A. Digital Synthesis of Plucked-String and Drum Timbres. Computer Music Journal, Vol. 7, No. 2. MIT Press, 1983.
[9] Marple, S. Lawrence Jr. Digital Spectral Analysis. Englewood Cliffs: Prentice-Hall, Inc., 1987.
[10] Oppenheim, Alan V. and Schafer, Ronald W. Discrete-Time Signal Processing. Englewood Cliffs: Prentice-Hall, 1989.
[11] Smith, J.O. III. Efficient and Physically Accurate Simulation of Strings, Bores, and Horns using Digital Waveguide Techniques. From the CCRMA Associates Conference, Stanford University, May 1991.
[12] Stonick, V.L. and Massie, D. "ARMA Filter Design for Music Analysis/Synthesis," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, March 1992, vol. II, pg. 256-260.
[13] Sullivan, C. Extending the Karplus-Strong Algorithm to Synthesize Electric Guitar Timbres with Distortion and Feedback. Computer Music Journal, Vol. 14, No. 3. MIT Press, 1990.
[14] Valimaki, V., et al. "Physical Modeling of Plucked String Instruments with Application to Real-Time Sound Synthesis", Presented at the 98th Convention of the Audio Engineering Society, Paris, 1995.
[15] Yelton, Geary. The Rock Synthesizer Manual. Woodstock, GA: Rock Tech Publications, 1986.
13. Appendix A: Note Information
For our sampled data, the following characteristics were noted and parameters were chosen to fit the model.

TABLE 1. String Analysis, Open Strings

String   Frequency     Horizontal  Vertical  Number of Significant  Length of Linear
                       Decay       Decay     Harmonics              Prediction Error
E        82.5073 Hz    0.9930      0.9650    14                     3255
A        109.3729 Hz   0.9925      0.9400    16                     2851
D        145.9846 Hz   0.9975      0.9300    11                     2203
G        195.5931 Hz   0.9928      0.9450    14                     3266
B        246.1802 Hz   0.9950      0.9850    17                     1724
High E   326.1793 Hz   0.9972      0.9850    21                     3236
14. Appendix B: Notes on Interpolation
In section 6.1, two methods of interpolation for estimation of the fundamental
frequency of a note were presented. The accuracy of these methods and the constraints on
this accuracy are of some interest.
14.1 Accuracy of Frequency Prediction With Autocorrelation
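Spectral estimation theory tells us that unbiased autocorrelation estimates, as in (18),
have a variance (19) that is inversely proportional to the length of the data sample [9]. This variance
introduces error in the estimation process.

\[ \hat{r}_{xx}[m] = \frac{1}{N-|m|} \sum_{n=0}^{N-|m|-1} x[n+|m|]\,x[n] \tag{18} \]

\[ \mathrm{var}\{\hat{r}_{xx}[m]\} \approx \frac{N}{(N-|m|)^2} \sum_{k=-\infty}^{\infty} \left( r_{xx}^2[k] + r_{xx}[k+m]\,r_{xx}[k-m] \right) \tag{19} \]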
Unbiased estimates of the autocorrelation are calculated efficiently from the data by
using circular convolution and the FFT, as described by [10]. Finding the maximum of the
autocorrelation yields the fundamental period of a note and hence the fundamental fre-
quency.
Consider Figure 14.1.1, which shows an estimated autocorrelation from 10000 data
points and an analytical autocorrelation function. The largest error between the ideal
function and the estimated function is 0.0025, which leads to an estimated frequency error of
0.08 Hz. Experimental results show that for a fixed frequency, the estimated frequency has
an error curve that is inversely proportional to the number of sample points taken, as shown in
Figure 14.1.2.
Figure 14.1.1 Estimated vs. Ideal Autocorrelation Functions (horizontal axis: Tau)
Figure 14.1.2 Error of Frequency Estimation vs. Sample Length (horizontal axis: length of sample)
There is another consideration: the accuracy of this method over a range of
frequencies. In Figure 14.1.3 the error of the autocorrelation method is determined for a
fixed number of samples and for varying frequency.
Figure 14.1.3 Error of Estimated Frequency From Correlation vs. Frequency (horizontal axis: frequency)
From these plots we can learn several things:
• This method of frequency estimation approaches the true value from above.
• Error depends on sample length (this is also known from spectral estimation theory).
• Accuracy is limited to regions where the autocorrelation function is flat near its peak (i.e., fundamentals below 2000 Hz). Above this point the polynomial interpolation is not a valid operation.
14.2 Accuracy of Frequency Prediction With Frequency Interpolation
Frequency interpolation is performed on the magnitude of the estimated spectrum. A
local maximum is detected, and a polynomial is fit to the point containing this maximum
and its nearest neighbors. The maximum of this polynomial is taken as the maximum of the
DFT for this region. Since windowing introduces local smoothness to the region, the
polynomial interpolation is a valid operation.
Consider Figure 14.2.1, where the interpolation of a 100 Hz cosine wave is performed.
The maximum of the interpolating polynomial occurs at 99.991 Hz, an error of 0.009 Hz.
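A minimal sketch of this kind of spectral peak interpolation is given below. Several details are assumptions of the sketch rather than facts from the text: the Hann window, the use of a second-order polynomial through three points, fitting on the magnitude in dB, and the function name `spectral_peak_frequency`.

```python
import numpy as np


def spectral_peak_frequency(x, fs, nfft=None):
    """Refine the location of the largest spectral peak with a three-point parabola."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = nfft or n
    # Windowing smooths the spectrum locally (Hann is an assumed choice here).
    window = np.hanning(n)
    mag = np.abs(np.fft.rfft(x * window, nfft))
    # Skip the first and last bins so that both neighbours of the peak exist.
    k = int(np.argmax(mag[1:-1])) + 1
    # Fit the parabola on the dB magnitude of the peak bin and its neighbours.
    a, b, c = 20.0 * np.log10(mag[k - 1:k + 2] + 1e-300)
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    # Convert the fractional bin index to a frequency in Hz.
    return (k + offset) * fs / nfft
```

Called on a cosine whose frequency falls between two DFT bins, the function returns an estimate that is much closer to the true frequency than the bin spacing `fs / nfft` alone would allow. The residual bias depends on the window and on where the peak sits between bins.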
Figure 14.2.1 Polynomial Interpolation On Frequency Spectrum (horizontal axis: frequency in Hz)
This method becomes more accurate as the length of the sample increases, primarily
due to the greater frequency resolution of the sampled spectrum. The calculation error as
a function of sample length was found to be as shown in Figure 14.2.2.
Figure 14.2.2 Error of Estimated Frequency vs. Sample Length (horizontal axis: length of sample; logarithmic error axis)
For a fixed sample length, this method has roughly consistent error over a range of
frequencies.
Figure 14.2.3 Error of Estimated Frequency From Spectrum vs. Frequency (horizontal axis: note frequency)
This method has some advantages over the autocorrelation method in that it is accurate
over a much wider range and is more accurate in the actual prediction, but it suffers from
inaccuracy in the initial guess. If the two methods are combined, as in section 6.1, very
good results can be obtained.
15. Appendix C: Results of Body Response Estimation
In section 6.3 the horizontal mode of the guitar is synthesized from one period of a
representative waveform of the steady-state response of the guitar. It is assumed that the
guitar response is a result of taking an ideal string response (calculated from the string
pluck point) and passing it through a filter that modifies the shape of the harmonics.
By taking a section of the sampled waveform and the corresponding section of the
"ideal" response, i.e., the response calculated from an ideal excitation, the harmonics of
each waveform can be compared to determine the filter that would convert the ideal string
to the sampled string. Note that this assumption also folds the different plucking
possibilities (fingered, guitar pick, etc.) into the body response.
Data was obtained from a guitar by recording the plucked string sound onto DAT at
44.1 kHz with 16-bit sampling. The data was transferred from DAT to computer disk for
analysis. The guitar used was a Yamaha C-55A. The microphone was placed approximately
six inches from the sound hole.
The same analysis procedure was run independently on each note played. The
fundamental frequency was determined using the correlation estimate, and the vertical and
horizontal mode decay rates were estimated. The pluck point was determined using the
harmonics of the string and the method outlined in section 6.3. An ideal string response
was synthesized. Figure 15.0.1 and Figure 15.0.2 show the analysis and synthesis of the
ideal response. For this waveform, the initial amplitude was found to be 471.8 and the
decay rate to be 0.9886 (recall that this is the decay from one period to the next).
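To give a feel for what a per-period decay rate means, the short sketch below generates the exponential envelope and converts the decay rate into a decay time. The helper names and the choice of a 60 dB drop as the reference level are assumptions made for illustration only. The 204.847 Hz fundamental used in the usage note is taken from the Note 13 row of Table 2, whose horizontal decay matches the 0.9886 quoted here.

```python
import math


def envelope(initial_amplitude, decay_per_period, n_periods):
    """Peak amplitude of each successive period: A * d**k for k = 0..n_periods-1."""
    return [initial_amplitude * decay_per_period ** k for k in range(n_periods)]


def periods_to_decay(decay_per_period, level_db=-60.0):
    """Number of periods needed for the envelope to fall by level_db decibels."""
    # Solve d**k = 10**(level_db / 20) for k.
    return (level_db / 20.0) * math.log(10.0) / math.log(decay_per_period)


def decay_time_seconds(decay_per_period, f0_hz, level_db=-60.0):
    """Same quantity expressed in seconds for a note with fundamental f0_hz."""
    return periods_to_decay(decay_per_period, level_db) / f0_hz
```

With a decay rate of 0.9886 and a fundamental near 204.8 Hz, `periods_to_decay(0.9886)` is on the order of 600 periods and `decay_time_seconds(0.9886, 204.847)` is roughly 2.9 s.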
The harmonics of the sampled string and the ideal string are calculated from 10764
data points (corresponding to 50 periods of data), starting at the point where the
exponential model is fit, using a 65536-point FFT and a Nuttall (sum of cosines) window.
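The comparison of harmonics can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the function names, the particular four-term cosine-sum coefficients (one common variant attributed to Nuttall), and the rule of searching a quarter of the fundamental on either side of each harmonic are all assumptions. For reference, 50 periods at 204.847 Hz and 44.1 kHz come to about 10764 samples, which matches the figure in the text.

```python
import numpy as np

# Four-term cosine-sum coefficients (an assumed Nuttall variant).
NUTTALL_COEFFS = (0.3635819, 0.4891775, 0.1365995, 0.0106411)


def cosine_sum_window(n, coeffs=NUTTALL_COEFFS):
    """Symmetric sum-of-cosines window: sum_k (-1)^k a_k cos(2 pi k i / (n - 1))."""
    phase = 2.0 * np.pi * np.arange(n) / (n - 1)
    return sum((-1) ** k * a * np.cos(k * phase) for k, a in enumerate(coeffs))


def harmonic_levels_db(x, fs, f0, n_harmonics, nfft=65536):
    """Peak spectral level in dB near each of the first n_harmonics multiples of f0."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * cosine_sum_window(len(x)), nfft))
    # Search a quarter of the fundamental on either side of each harmonic.
    half_width = max(1, int(round(0.25 * f0 * nfft / fs)))
    levels = []
    for h in range(1, n_harmonics + 1):
        centre = int(round(h * f0 * nfft / fs))
        levels.append(20.0 * np.log10(np.max(mag[centre - half_width:centre + half_width + 1])))
    return np.array(levels)


def body_response_db(sampled, ideal, fs, f0, n_harmonics, nfft=65536):
    """Change in dB needed at each harmonic to turn the ideal string into the sampled one."""
    return (harmonic_levels_db(sampled, fs, f0, n_harmonics, nfft)
            - harmonic_levels_db(ideal, fs, f0, n_harmonics, nfft))
```

Because both signals pass through the same window and FFT, the window gain cancels in the difference, leaving only the per-harmonic change in level.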
Figure 15.0.1 Analysis Portion of Sampled Response
Figure 15.0.2 Envelope of Ideal Response
The harmonics of the input and output are shown in Figure 15.0.3, with the 'x'
representing the change in dB to obtain the output from the input.
Figure 15.0.3 Change in Magnitude (panel title: Change in dB, Ideal to Sampled String; horizontal axis: frequency in rad/sec)
This experiment was repeated for the first six notes along the 'G' string (ideal 196 Hz).
The frequencies, decay rates, and other information detected are in Table 2, and the
magnitude change in dB is in Figure 15.0.4.
TABLE 2. Results of Analysis

Note      Frequency               Horizontal  Vertical  Number of
Number    (Hz)       Pluck Point  Decay       Decay     Harmonics
Note 13   204.8470   2.2787       0.9886      0.9598    14
Note 14   218.3551   2.6315       0.9899      0.9954    13
Note 15   231.1937   2.6615       0.9928      0.9854    9
Note 16   244.4782   6.6475       0.9947      0.9939    9
Note 17   259.0775   3.4460       0.9942      0.9856    12
Note 18   274.6258   4.7181       0.9948      0.9985    8
The results in the table are roughly consistent: the pluck point increases as the string
length decreases (with the exception of Note 16), which makes sense if there is a constant
pluck point on the string. The horizontal decay rates are very accurate, but the vertical
decay rates suffer from a lack of data (since the vertical decay is harder to detect). The
most interesting trend is the decrease in the number of significant harmonics (with the
exception of Note 17), which demonstrates the low-pass effect of the body, since the
harmonics lie at progressively higher frequencies as the fundamental increases.
In Figure 15.0.4, the harmonics of each note are plotted as a group; i.e., the
fundamentals form the first curve, the second harmonics form the second curve, etc. The
curves are disjoint since each note does not have all the harmonics. The magnitude change
data is more inconsistent. Trends are observable, but there are striking inconsistencies.
The data is muddled between 0.25 and 0.3 rad/sec (corresponding to 1.75 kHz to 2.1 kHz).
The trends are a strong indication of linearity in the guitar body, but the inconsistencies
are an indication that something else is afoot. The most consistent region is from 0.2 to
0.25 rad/sec, where the harmonics of lower notes are on the same curve as harmonics of
higher notes.
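The frequency axis in these plots is a normalized radian frequency. Assuming it is measured in radians per sample at the 44.1 kHz recording rate (an assumption, although it reproduces the 1.75 kHz and 2.1 kHz values quoted in the text), the conversion to hertz works out as:

```latex
f = \frac{\omega f_s}{2\pi}, \qquad f_s = 44100\ \text{Hz}

\omega = 0.20:\quad f = \frac{0.20 \times 44100}{2\pi} \approx 1404\ \text{Hz}

\omega = 0.25:\quad f = \frac{0.25 \times 44100}{2\pi} \approx 1755\ \text{Hz}

\omega = 0.30:\quad f = \frac{0.30 \times 44100}{2\pi} \approx 2106\ \text{Hz}
```

Under this reading, the most consistent region of the plot corresponds to roughly 1.4 kHz to 1.75 kHz.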
Figure 15.0.4 Magnitude Change for Six Notes (horizontal axis: frequency in rad/sec)