Carnegie Mellon
Department of Electrical and Computer Engineering

Synthesis of an Acoustic Guitar with a Digital String Model and Linear Prediction

Kevin Bradley

1995

Advisor: Prof. Stonick

Synthesis of an Acoustic Guitar With a Digital String Model and Linear Prediction

Kevin Bradley

submitted for partial fulfillment of the Master of Science degree requirements in the

Department of Electrical and Computer Engineering

Carnegie Institute of Technology

Carnegie Mellon University

May 1, 1995

Advisor: V. L. Stonick

Reader: R. M. Stern

1. Introduction

With the development of electronic synthesizers, much of the focus in music production is shifting from the skilled performer, knowledgeable and facile on an instrument, to the composer aided by an electronic music box. The composer may not be individually skilled at each instrument but, with a synthesizer, may simultaneously compose and listen to an entire orchestra at his fingertips. The beauty, quality, and creativity of the musical composition rely on the capabilities of the synthesizer to create both natural-sounding instruments, particularly the common acoustic instruments in orchestras and bands, and innovative new sounds.

Music synthesizers currently implement acoustic guitars using a wide range of techniques, including the creation of artificial spectra, as in FM synthesis; the playback of digitally recorded segments of guitar sounds, as found on sampler-based systems; and systems based on physical models of the guitar. The term "sampler" in this context refers to an electronic instrument that digitally records segments of acoustic waveforms and then can recreate these waveforms at a variety of playback rates, changing both pitch and duration of the original recorded sound. In contrast, the "sampling" of acoustic waveforms refers to the process of converting continuous-time signals to discrete-time representations.

Music synthesis algorithms that are based on models of the physical behavior of musical instruments attempt to capture the major attributes of the instrument in response to some ideal input, such as an impulse. Physical models are generally nonlinear dynamic systems and often require large mainframe computers to synthesize sounds in a reasonable time frame. Alternative methods have been proposed that are typically based on linear system models. Such models are more computationally efficient and can be implemented using low-cost DSP-based systems.

The goal of this Master's Project is the development and implementation of a computational model for music synthesis that produces realistic acoustic guitar sounds. The computationally efficient, physically based model of an acoustic guitar developed here has four parts: a digital string model, based on a digital waveguide filter; the guitar steady-state response analysis and synthesis; the guitar transient analysis and synthesis; and a linear IIR filter that models the impulse response of the guitar body. Model parameters (including note frequency and feedback decay rate) and guitar characteristics (including the guitar body impulse response and note pluck point) are determined from automated analysis of plucked guitar strings and a "thumped" guitar body sampled at 44.1 kHz with 16 bits of quantization, the CD standard. The analysis and synthesis of acoustic guitar sounds using this model do not require special-purpose hardware and can be implemented with relative simplicity and low cost in software.

In this report, a brief overview of current synthesizer practices and technology is presented in section 2, and the physical characteristics of an acoustic guitar are discussed in section 3. Linear filter models for digital strings are developed in section 4, and the complete guitar model is presented in section 5. Methods for estimating string model parameters from sampled data are developed in section 6, followed by a discussion of linear body modeling in section 7. A complete analysis of a sampled string is presented in section 8. Section 9 discusses how the model parameters affect the resulting string sound. Finally, a rough computational expense analysis of the complete synthesis system is presented in section 10.

2. Current Music Synthesis Algorithms

Electronic music synthesis has had a long, variegated history. In the early 1900s, electronic organs produced sound by spinning disks at various rates with electric motors. In later years, the development of the vacuum tube amplifier and oscillator circuits allowed electronic instruments in a variety of forms [3]. The real development of synthesizers, however, has come about in the last 25 years following the invention of the microprocessor and the corresponding surge in the microelectronics industry.

Current methods of electronic music synthesis have taken two distinct directions: physical modeling, and "recording," that is, sampling and playback of data with some modification. For the musician, samplers (and wavetable synthesizers) offer the most acoustically accurate individual sounds, but the blending of sounds and potential musical expression are limited. The musician cannot obtain the full range of expressivity on the synthesized instrument, simply because the required sounds were not pre-recorded. To have greater variety in instrumental expression, more samples must be stored in memory. In contrast, physical modeling does allow for such variety, including the blending of sounds. While precise physical models produce excellent sounds, the computational complexity required to implement these models is prohibitive for commercial synthesizers. Also, the production of high quality sounds requires the same degree of skill as that required to play the physical instrument, and does not easily allow for creative design of new sounds.

In recent years, much effort has gone into the development of synthesis techniques that implement physical models of acoustic instruments and produce sounds similar to the instruments themselves. This work has been bolstered in the last decade by the development of digital filters that, when excited, simulate physically vibrating strings ([8], [11]). Researchers have attempted to use these easily implemented digital filters in combination with other filters and techniques to produce a wide range of realistic sounds [7].

The following sections will discuss frequency modulation (FM) and linear-additive (LA) synthesis, both examples of artificial spectral synthesis, samplers and sample playback, and current physical modeling methods. Emphasis is placed on comparing and contrasting these different methods based on the quality of the sound produced for an acoustic guitar, ease of creation of realistic sounds, flexibility in use, and computational complexity.

2.1 FM Synthesis

Frequency modulation (FM) synthesis, pioneered by Dr. John Chowning of Stanford in 1973, was made a commercial success by the Yamaha DX7, first introduced in 1983 [15]. FM synthesis works on a principle similar to FM radio: a cosine waveform, called the carrier, has a frequency that is dependent on another source. The DX7 has as its basic unit a simple block, consisting of an oscillator and a time-varying amplifier whose gain is controlled by an attack-decay-sustain-release (A/D/S/R) envelope, as shown in Figure 2.1.1.

Figure 2.1.1 DX7 FM Synthesis Unit (block diagram: an input waveform, reflecting input from similar FM blocks, and a carrier frequency drive the oscillator; an A/D/S/R envelope controls the amplifier that produces the output)

By combining six of these units in varying combinations, a wide range of sounds can be produced. As a simple example, consider cascading two such units. The first is set to produce a cosine wave at 75% of the maximum possible amplitude and to have a carrier frequency twice as high as the second unit, which is scaled to be proportional to the key pressed on the synthesizer. The resulting output is given by:

$$y[n] = \cos\bigl(\omega n + 0.75\cos(2\omega n)\bigr)$$

The resulting y[n] resembles a square wave, although it does not have as many harmonics as a true square wave, nor do the harmonics have exactly the right amplitudes. This is the main drawback of FM synthesis: the inability to accurately reproduce a desired harmonic spectrum. Perceived acoustical accuracy in instrument sounds can only be produced by trial and error, varying FM parameters until the synthesized sound appears reasonably close to a desired instrumental sound (based on listening tests). The main advantage of FM synthesis, on the other hand, is the ability to produce rich spectra that cannot be produced any other way. FM synthesis has been used in many other Yamaha synthesizers after the introduction of the DX7.
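The two-unit cascade described above can be tried numerically. The following is a minimal sketch, not the DX7 implementation: the function name `fm_two_operator`, the parameter names, and the choice of 64 samples per carrier period are all illustrative assumptions.

```python
import math

def fm_two_operator(omega, num_samples, index=0.75, ratio=2.0):
    """Return y[n] = cos(omega*n + index*cos(ratio*omega*n)) for n = 0..num_samples-1.

    omega is the carrier frequency in radians per sample; the modulating unit
    runs at ratio*omega with amplitude index (0.75 and 2.0 follow the text).
    """
    return [math.cos(omega * n + index * math.cos(ratio * omega * n))
            for n in range(num_samples)]

# Hypothetical setting: 64 samples per carrier period, two periods of output.
omega = 2.0 * math.pi / 64
y = fm_two_operator(omega, 128)
```

Changing `index` or `ratio` and listening to (or plotting) the result illustrates the trial-and-error tuning that the text describes.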

The acoustical accuracy of an FM synthesizer, when compared to an actual instrument, is clearly very poor. Using FM techniques, it is quite difficult to produce a specific desired harmonic relationship since the analysis is nonlinear and cannot be automated. However, FM synthesis is extremely easy to implement on a single chip and does produce harmonically rich sounds.

2.2 Linear-Additive Synthesis

The linear-additive (LA) synthesis method is found in Roland synthesizers. This method works on the principle that a low-pass filtering of standard waveforms available on function generators, such as a triangle or square wave, can produce a variety of waveforms having less power in high harmonics. This technique is commonly known as subtractive synthesis. The desired output sound is formed by a linear combination of several smoothed waveforms. If the filters have a time-varying cutoff frequency then a variety of sounds can be produced.

The Roland D-50 is an example of a synthesizer that uses LA synthesis. The D-50 has four tone generators; each tone generator can be either a PCM sample playback unit or a filtered waveform (synthesized). The synthesized tone generator has a time-varying low-pass filter, in which the cutoff frequency varies with time according to an ADSR envelope, and a time-varying amplifier, in which the gain is controlled by an ADSR envelope. The PCM wave generator produces pre-recorded samples at the desired pitch and is followed by a time-varying amplifier in which the gain is controlled by an ADSR envelope. Two tone generators (called patches) are grouped into a tone structure, and two tone structures make a sound. Tones are stereo constructs, and can have different reverberation and crossfade arrangements. The LA algorithm for the Roland D-50 synthesizer is shown in Figure 2.2.1.

Using LA synthesis, a plucked guitar sound is created by combining a "pluck" PCM sample and a filtered triangle wave. The resulting output is harmonically rich since the triangle wave has many harmonics, and it sounds similar to an actual instrument since a triangle wave is a possible waveform for a plucked string. The PCM waveform mimics the attack characteristics. However, the spectra and temperament of a specific instrument are impossible to capture exactly since only the waveforms produced by the oscillators are usable. Computationally, the LA algorithm is quite simple, and can be implemented with a DSP or specialized hardware.

Figure 2.2.1 Roland D-50 LA Synthesis (block diagram: patches, each built from synthesized or PCM wave tone generators, are combined to form the output)

2.3 Samplers and Sample Playback

A sampler works on a simple principle: record a sound digitally, and then play the sample back at varying rates. Samplers are becoming more commonplace in the music industry, and sampled-data synthesizers (often called wavetable synthesizers) are also appearing. Samplers record data and use it immediately for playback, while the wavetable synthesizers have several waveforms stored in ROM and use a sample loop to extend the sound duration.

The beauty of a sampler is that the sound reproduction is almost exact. What was played will be faithfully reproduced when desired, and at different pitches, within reason. Using a 440 Hz tone (A above middle C) recorded at 12 kHz to generate a low A of 55 Hz requires a factor of 8 decrease in the playback rate (yielding 1.5 kHz), since 55 Hz is one-eighth of 440 Hz. The change in playback rate relative to sample rate introduces distortion: sharp transient effects become less transient (e.g., a sharp pistol shot becomes a sustained cannon roar), and acoustic properties of instruments become distorted. Within reasonable limits, samplers can accurately reproduce the instrument recorded.
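The playback-rate arithmetic in this example can be written out as a small sketch. The function name `playback_rate` is illustrative and does not refer to any particular sampler.

```python
def playback_rate(record_rate_hz, recorded_pitch_hz, target_pitch_hz):
    """Playback rate that shifts a recording from recorded_pitch_hz to target_pitch_hz."""
    return record_rate_hz * target_pitch_hz / recorded_pitch_hz

# The example from the text: a 440 Hz tone recorded at 12 kHz, replayed as a 55 Hz A.
rate = playback_rate(12000.0, 440.0, 55.0)   # 1500.0 Hz, i.e. 1.5 kHz
stretch = 12000.0 / rate                     # the note lasts 8 times as long
```

The same factor that lowers the pitch also stretches the duration, which is why sharp transients become less transient as the shift grows.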

The major limitation of sampling technology is that instrumental versatility is lost.

There is only so much one can do to a recorded sound to generate interesting, realistic

effects. To have a variety of instrumental playing styles, the instrument must be recorded

playing in each style.

Samplers are very computationally inexpensive, but do require large amounts of memory, which has become relatively cost-effective in recent years. The ability of the musician to create realistically expressive sounds is constrained only by the number of instrument sounds able to be sampled and by the limitations of pitch shifting.

2.4 Computational Physical Modeling Efforts

Current efforts (most notably [14]) at using physical models to generate acoustic guitar sounds focus on two aspects of the guitar: the plucked string signal, and the guitar body system. These methods attempt to capture the essence of the guitar system without overburdening the calculation requirements of the overall system. The method in [7] uses a waveguide string model with Lagrange interpolation to implement non-integer periods and uses a linear prediction error input to capture transient effects. The effect of the guitar body on the various harmonics is modeled by a digital filter in the waveguide feedback loop.

This method is relatively computationally inexpensive. Up to 3000 linear prediction error samples are stored per note, and four multiplication and addition operations per sample are required to produce output. An entire synthesizer was implemented on a single DSP board in software, requiring no specialized hardware. The sound quality is extremely accurate since the inputs and the model parameters are determined from the characteristics of the guitar.

2.5 Summary of Current Synthesis Techniques

Below is a table summarizing the computational complexity, acoustic accuracy, and the ease of designing a synthesizer sound to produce the sounds of a specific instrument (i.e., matching an acoustic prototype).

Method                            Computational Complexity    Acoustic Accuracy    Ease of Matching Acoustic Prototype

FM Synthesis                      Simple: done in hardware    Poor                 Poor

LA Synthesis                      Simple                      Good                 Good

Sampling                          Simple                      Excellent            Excellent

Computational Physical Modeling   Fair                        Excellent            Excellent

Sampling and computational physical modeling produce the best sounds and have the best ability to match an acoustic prototype. While sampling is a less complex method, it has inherent limitations that are overcome by physical modeling. These limitations are: instrumental variety (sampling requires that samples of every instrument to be reproduced be stored); instrumental versatility (sampling only allows one playing style for an instrument, i.e., the style in which the recording was generated, and can only change by storing more recordings); and acoustic accuracy (sampling loses accuracy as the samples are pitch-shifted further from the recording rate, and so requires more samples to cover the possible range of an instrument). Physical modeling allows a performer to change model parameters "on the fly" and, although requiring some knowledge of what the model parameters affect, does not require the performer to be an expert on the actual instrument to produce accurate, realistic sounds.

3. The Acoustic Guitar

Figure 3.0.1 Acoustic Guitar Anatomy (labeled parts include the guitar body and the neck and fretboard)

The instrument of interest is the acoustic guitar, which has the standard construction and terminology shown in Figure 3.0.1. Acoustic guitars come in a variety of styles, from classical to folk, and each guitar is subtly different due to variations in the construction of the guitar body. Guitar strings are either nylon or steel. The guitar body is roughly hourglass in shape, with a round hole in the middle of the top plate of the guitar body. The strings of the guitar run along the neck of the guitar to the bridge, which is positioned just below the sound hole in the top plate.

3.1 Guitar Physics: The String

The standard guitar has six strings, each of similar length, but having different density and tension. These differences produce the observed changes in frequency from one string to the next. We are particularly concerned with how the string behaves in response to being plucked. The waveforms it produces have important implications for the quality of the sound.

Waveforms emanating from the guitar string can be modeled as standing waves on a medium with fixed ends [4]. The governing wave equation is shown in (2), where $T$ is the string tension and $\mu$ is the string density.

$$\frac{\partial^2 y}{\partial t^2} = \frac{T}{\mu}\,\frac{\partial^2 y}{\partial x^2} \qquad (2)$$

Solutions to this equation are known to have the form of (3) below.

$$y(t, x) = \sum_{n} C_n \sin(\omega_n t + \phi_n)\,\sin(k_n x) \qquad (3)$$

Each $C_n$ component measures the relative energy at the harmonics, $\omega_n$. Note that these solutions take the form of sinusoids in both space and time. The final waveform shape is dictated by the initial deformation imposed (the pluck or striking of the string) and the changes imposed by the guitar body.

If we measure the output of an oscillating string over time, for example by recording a plucked string, we only observe the response as a function of time. This response is defined as the ideal plucked string response in that the pluck-point is modeled as an infinitely sharp bend in the string.

The set of parameters $C_n$ and $\phi_n$ for an ideal response can be computed as in (4), where $L$ is the reciprocal of the proportion of the string length from the point where the string was plucked to the bridge (e.g., 1/5 yields L=5) and $h$ is the initial displacement.

$$\phi_n = \ldots, \qquad C_n = \frac{2 L^2 h}{n^2 \pi^2 (L - 1)}\,\sin\!\left(\frac{n\pi}{L}\right), \quad 1 \le n < \infty \qquad (4)$$

Such an ideal string shape is shown in Figure 3.1.1 for a pluck-point 1/4 (L = 4) of the way along the string.

Figure 3.1.1 Ideal Plucked String Response (one period of the ideal plucked string, L = 4; horizontal axis: time)

Note that the $C_n$ harmonic amplitudes fall off as $1/n^2$, resulting in low power at high harmonic frequencies. The sine term $\sin(n\pi/L)$ allows no energy at the Lth harmonic frequency, nor at integer multiples of that frequency. The overall shape of the spectrum is dictated by L. For integer values of L, nulls occur at harmonic frequencies that are multiples of L. For non-integer values of L, nulls in the spectral envelope occur at frequencies that are not necessarily related to the harmonic frequencies; e.g., a value of L of 3.5 would yield observable nulls at the 7th and 14th harmonics, but the spectral envelope nulls at the "3.5th" and "10.5th" would not be as easily observed from the harmonic content of the spectrum.
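The null pattern can be checked numerically. The sketch below assumes the standard textbook amplitude for an ideal triangular pluck, $C_n = 2hL^2 \sin(n\pi/L) / (n^2\pi^2(L-1))$; the function name `pluck_amplitudes` and the tolerance used to detect a null are illustrative choices, not part of the original analysis.

```python
import math

def pluck_amplitudes(L, h, num_harmonics):
    """Harmonic amplitudes C_1..C_num_harmonics of an ideal pluck at 1/L of the string.

    Assumes the textbook triangular-pluck result; h is the initial displacement.
    """
    scale = 2.0 * h * L * L / (math.pi ** 2 * (L - 1.0))
    return [scale * math.sin(n * math.pi / L) / n ** 2
            for n in range(1, num_harmonics + 1)]

# Pluck one quarter of the way along the string (L = 4), unit displacement.
c = pluck_amplitudes(L=4.0, h=1.0, num_harmonics=12)

# Harmonics whose amplitude vanishes (up to floating-point rounding).
nulls = [n for n, cn in enumerate(c, start=1) if abs(cn) < 1e-9]
```

For L = 4 the vanishing harmonics are the multiples of 4, and calling the function with L = 3.5 shows the first nulls moving to the 7th and 14th harmonics, as stated above.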

Of course, the amplitude of oscillations of a guitar string decays with time. This implies friction, or energy loss, from the system. This loss is best described as an output phenomenon: energy from the string is transferred to the air and to the guitar body.

A guitar string can vibrate in a three-dimensional space, since it is suspended in air between two endpoints, so there are generally more vibrations than represented by (4), which describes vibration in one dimension only. A string tone can be thought of as a combination of two vibrational modes, one normal to the guitar body, called vertical, and one parallel to the guitar body, called horizontal [4]. The coupling of the two modes is nonlinear and cannot be modeled completely, since it is dependent upon the guitar construction. For our purposes, however, it is adequate to assume that the two modes are independent. Each mode is excited by a single guitar pluck, with some energy transferred to each mode.

Each mode has different interactions with the bridge and the top plate of the guitar body. The guitar tone for pure vertical plucking directions decays quite sharply, whereas the tone for purely horizontal plucking decays quite slowly. The overall plucked string response can be modeled as the sum of these two decaying modes, ignoring nonlinear coupling. The relative amount of excitation transferred to horizontal and vertical modes varies with the angle of the pluck.

4. Waveguide String Model

This section develops the digital string model solution to the wave equation and the refinements necessary to allow for non-integer periods. This model, implemented as a digital filter, is used for both horizontal and vertical mode string oscillations.

4.1 Basic String Model

A digital simulation of a vibrating string can be constructed using digital waveguide techniques, as in [11], yielding a general form for a string model, as shown in Figure 4.1.1. The expression $g^N$ groups all energy loss in the string into one expression. Energy lost generally goes through the bridge to the guitar body. The difference equation describing the system is

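$$y[n] = g^N\, y[n-N] + x[n] \qquad (5)$$

which yields a z-transform representation:

$$H(z) = \frac{1}{1 - g^N z^{-N}}$$

Figure 4.1.1 General String Loopback Filter Model (block diagram: the input x[n] is summed with the output of an N-sample delay, y[n-N], scaled by the loop gain)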

This filter essentially copies the signal sample values from N samples ago and multiplies by a decay factor of $g^N < 1$. To generate a decaying oscillatory response, the delay line is initialized to all zeros, x[n] introduces the first period of the oscillating signal into the delay line, and the recursion equation produces the remainder of the response with $g^N$ acting as the decay rate (per period).
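The recursion just described can be illustrated with a short sketch. The function name `string_loop` and the toy four-sample excitation are assumptions chosen for readability; `decay` plays the role of the per-period factor $g^N$.

```python
def string_loop(excitation, period, decay, num_samples):
    """Run y[n] = decay * y[n - period] + x[n] with an initially empty delay line.

    excitation holds x[n] for its first len(excitation) samples; x[n] is zero afterwards.
    """
    y = [0.0] * num_samples
    for n in range(num_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        feedback = decay * y[n - period] if n >= period else 0.0
        y[n] = x + feedback
    return y

# Toy example: a 4-sample first period, halved on every trip around the loop.
period = 4
first_period = [1.0, 0.5, -0.5, -1.0]
out = string_loop(first_period, period, decay=0.5, num_samples=12)
```

Each later period of `out` is the previous period scaled by `decay`, which is the exponentially decaying periodic response the text refers to.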

More general responses, including non-integer periods and frequency-dependent warping, can be achieved by replacing $g^N$ with a feedback FIR filter $h_I[n]$ having z-transform $H_I(z)$. The overall feedback system is then governed by the difference equation

$$y[n] = u[n] + h_I[n] * y[n-N]$$

where u[n] is an input to the system and y[n] is the resulting output. This more general string model enables frequency-dependent losses with decay factors included in $h_I[n]$.

4.2 Non-integer Periods

A specific limitation of [11] is that only frequencies with integer periods can be represented. For very low frequencies, this is not a terrible restriction. At higher harmonic frequencies, however, an integer period approximation causes increasing error in representable pitch, as shown in Figure 4.2.1.

Figure 4.2.1 Effects of Integer-Only Periods
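The size of the integer-period error can be estimated with a few lines of arithmetic. This is an illustrative sketch: the function name and the two test pitches (approximately the open low E string and the E three octaves above the low E's octave, 82.41 Hz and 1318.51 Hz) are assumptions, and the error is expressed in cents (hundredths of a semitone).

```python
import math

def integer_period_error_cents(freq_hz, sample_rate_hz=44100.0):
    """Pitch error, in cents, from rounding the pitch period to a whole number of samples."""
    ideal_period = sample_rate_hz / freq_hz
    rounded_period = round(ideal_period)
    actual_freq = sample_rate_hz / rounded_period
    return 1200.0 * math.log2(actual_freq / freq_hz)

low = integer_period_error_cents(82.41)      # period near 535 samples
high = integer_period_error_cents(1318.51)   # period near 33 samples
```

At 44.1 kHz the low note is off by well under one cent, while the high note is off by more than twenty cents, consistent with the trend described above.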

Allowing only integer pitch periods is quite restrictive; however, non-integer periods can pose quite a problem [6]. A simple solution is to interpolate between the samples at the integer delays to approximate the signal value one non-integer period ago. One choice for $H_I(z)$ is a Lagrange interpolation filter, with the constraint that the sum of the filter coefficients be equal to the desired decay rate $g^N$. Assume that the non-integer period is $T = N + x$, where $0 < x < 1$. Then the current sample value can be generated from the values in the prior period by interpolating between the integer samples by defining an interpolation filter:

$$H_I(z) = \sum_{i=-\frac{M}{2}+1}^{\frac{M}{2}} \alpha_i\, z^{-i}$$

where $\alpha_i$ is a Lagrange interpolation coefficient given by

$$\alpha_i(x) = \prod_{\substack{j=-\frac{M}{2}+1 \\ j \neq i}}^{\frac{M}{2}} \frac{x - j}{i - j}$$

and M is the number of coefficients (an even number). Note that the index i = 0 corresponds to an integer delay of N. While $H_I(z)$ is non-causal, it is only used at delay N so that the overall feedback loop filter is causal. We have found that M = 6 is sufficiently accurate for providing reasonable sound using a 44.1 kHz sampling rate and a usable bandwidth of 11 kHz.
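A minimal sketch of the coefficient computation follows. The function name `lagrange_coefficients` and the `decay` argument are illustrative; scaling every coefficient by `decay` is one simple way to satisfy the constraint that the coefficients sum to the desired decay rate, assumed here rather than taken from the original implementation.

```python
def lagrange_coefficients(x, M=6, decay=1.0):
    """Lagrange coefficients alpha_i(x) for i = -M/2+1 .. M/2, scaled to sum to decay.

    x is the fractional part of the period (0 < x < 1); M must be even.
    """
    indices = list(range(-M // 2 + 1, M // 2 + 1))
    coeffs = []
    for i in indices:
        a = 1.0
        for j in indices:
            if j != i:
                a *= (x - j) / (i - j)
        coeffs.append(decay * a)
    return indices, coeffs

# Fractional delay of 0.3 samples with a hypothetical per-period decay of 0.99.
indices, coeffs = lagrange_coefficients(0.3, M=6, decay=0.99)
```

Unscaled Lagrange coefficients always sum to one, so the scaled set sums to `decay`; at x = 0 the filter reduces to a pure integer delay with a single nonzero tap at i = 0.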

Figure 4.2.2 Interpolation Loop Filter Frequency Response (horizontal axis: frequency in rad/sec)

In the frequency domain, the interpolation loop filter looks like a periodic series of peaks (inverted notch filters), with a peak appearing at each harmonic, as shown in Figure 4.2.3. The Lagrange interpolation introduces a lowpass filtering effect onto the string model, as is clearly visible in Figure 4.2.2.

It should be noted that the string model as shown enhances energy at harmonic frequencies relative to that at non-harmonic frequencies. This effect is evident from the frequency magnitude spectrum in Figure 4.2.3; signals at non-harmonic frequencies are attenuated.

This string model is very similar to the Karplus-Strong model, as presented in [5], [8], and [12], where either white noise or an ideal string waveform is used as excitation to the model. The KS model removes energy at non-harmonic frequencies, resulting in a set of harmonic tones. The LPF operation of the averaging filter helps to attenuate unwanted aperiodic wide-band energy in the synthesized signal. The resulting output has an initial burst of white noise that fades rapidly into a decaying tone. Although the random excitation produces interesting sounds, it is not well-suited to make real-sounding synthetic instruments because the signal energy is randomly distributed at each harmonic, rather than exhibiting the spectral structure enforced by physical constraints.
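For comparison, the basic Karplus-Strong recursion with a two-point averaging filter can be sketched as follows. This is the commonly published form of the algorithm, written with illustrative names and a fixed random seed; it is not the model developed in this report.

```python
import random

def karplus_strong(period, num_samples, seed=0):
    """Basic Karplus-Strong tone: a noise burst recirculated through a two-point average."""
    rng = random.Random(seed)
    # Fill the delay line (period + 1 samples) with white noise as the excitation.
    y = [rng.uniform(-1.0, 1.0) for _ in range(period + 1)]
    for n in range(period + 1, num_samples):
        # Averaging adjacent delayed samples is the lowpass loop filter.
        y.append(0.5 * (y[n - period] + y[n - period - 1]))
    return y

tone = karplus_strong(period=50, num_samples=2000)
early_energy = sum(v * v for v in tone[:50])
late_energy = sum(v * v for v in tone[1000:1050])
```

Comparing `early_energy` with `late_energy` shows the noise burst losing energy as it recirculates, with the high frequencies removed first by the averaging filter.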

Figure 4.2.3 Close-up of String Frequency Response

4.3 Waveforms on the String Model

The ideal string shape, as represented in (3), contains only harmonic frequencies. If the ideal string shape can be synthesized for one period and used as an excitation to the string model, the resulting output essentially will be copies of the first period with continuing exponential attenuation [12].

If the waveform input to the string model contains only harmonic frequencies, the only effect of the string model will be the attenuation of the input signal. For example, if a sinusoidal waveform with frequency $f_0$ is fed into the string model, all that will be apparent is an exponential decay of the amplitude depending on the value of $g^N$. The frequency of the signal will not change, and additional harmonic components will not appear.

5. Overall System Model

By combining the physical intuition and digital modeling techniques presented thus far, we developed a computationally efficient and physically based model for the guitar. This model is an extension of that presented in [14]; here the model parameters are estimated from sampled guitar string data, and the steady-state response of the guitar is analyzed separately from the transient vertical response. The method in [14] makes several compromises in accurate sound blending and overall reverberation, both of which are critical to generating sounds characteristic of the guitar and to distinguishing between different guitars. The method presented in this report takes advantage of an advanced IIR filter design algorithm to make a resonant filter that provides excellent mixing of the guitar string sounds and resonance based on the physical characteristics of each specific guitar body.

The overall system model used is shown in Figure 5.0.1, with the excitations forming the input and the guitar sound as the output. $H_I(z)$ is a Lagrange interpolation filter, and T is an integer period.

Figure 5.0.1 Overall Synthesis Model (block diagram: the horizontal and vertical excitations drive the string models, whose outputs feed the guitar body model $H_3(z)$ to produce the synthesized guitar sound)

There are several important components to this model:

• Base interpolation filter $H_I(z)$

• Filter delay T (the integer pitch period)

• Filter decay rates $\rho_1$ and $\rho_2$, the $g^N$ values for the two modes

• Horizontal excitation

• Vertical excitation

• Guitar body model $H_3(z)$

• Wet and dry gains, which control the sound energy ratio between direct string radiation and the resonant body

The fundamental frequency of the sampled data is used to determine T and $H_I(z)$. The parameters $\rho_1$ and $\rho_2$ are determined from the envelope of the sampled data and are used to form the interpolation filters $H_1(z)$ and $H_2(z)$. $\rho_1$ is found from the steady-state decay, as shown in Figure 5.0.2, and $\rho_2$ is found from the transient decay. $H_3(z)$, found by fitting an IIR model to a sampled guitar body impulse response, provides reverberation and sound blending.

The horizontal excitation is a single period of the steady-state response of the guitar,

which in general is a short-time stationary waveform that has some variation in frequency,

amplitude, and harmonic content. The vertical excitation is a linear prediction error of the

beginning of the sampled data and captures the transient response of the guitar pluck.

Figure 5.0.2 Horizontal and Vertical Modes of Sampled Guitar String (plot titled "Horizontal and Vertical Mode Exponential Fits"; horizontal axis: time; the legend lists the fitted vertical and horizontal decay values)

6. Analysis: Estimating String Model Parameters

From sampled guitar string data, the various model parameters (frequency, decay rates

for vertical and horizontal modes, and excitations) are found. The following sections detail

the analysis procedure, illustrated in Figure 6.0.1. Bold boxes contain the results of the

analysis.

Figure 6.0.1 Analysis Outline (block diagram: sampled guitar string data feeds the frequency spectrum, autocorrelation, and envelope detection stages; the remaining labeled blocks are pluck point, fundamental frequency, horizontal excitation, linear predictor, and vertical excitation)

6.1 Note Frequency

Ideally, the pitch of each note has a strict mathematical relationship to the 440 Hz A. An actual sampled sound is not likely to be of ideal pitch. Since the excitation calculation depends upon matching the harmonic content of the signal, we must first match the frequency of the sampled note.

Many techniques exist to determine the fundamental frequency of a note. Most of these, however, yield only an integer approximation to the period. Since we want a precise frequency estimate, the integer results must be refined by interpolation. Two methods for this are interpolation of autocorrelation values and interpolation of peaks in the frequency spectrum.

The note frequency can be determined from the continuous autocorrelation function $r_{xx}(\tau)$ of the sampled data. We assume that the signal is wide-sense stationary for the data analyzed. Because the signal is periodic, the autocorrelation is also periodic; finding $\tau$ for the largest value of $r_{xx}(\tau)$ for $\tau > 0$ yields the period of the note, and hence the frequency.

Since a precise note frequency is not likely to be captured by the discrete autocorrela-

tion function ~xx[m] obtained from sampled data, interpolation must be done to increase

the accuracy of the frequency estimate. A second order polynomial is fit to the autocorrela-

tion function using the points T- 1, T, and T + 1, where T is the integer index of the first

peak of ~xx[m]. The maximum of this polynomial then is defined as the maximum of the

continuous autocorrelation function r~o~(z), and the corresponding argument ~ is assumed

to be the pitch period as illustrated in Figure 6. I. 1.
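The autocorrelation interpolation can be sketched in a few lines. This is a minimal illustration, not the code used for the results reported here: the function names, the lag search range, the signal length of 12000 samples, and the direct (non-FFT) evaluation of the autocorrelation are all assumptions.

```python
import math

def unbiased_autocorr(x, lag):
    """Unbiased autocorrelation estimate of the sequence x at one lag >= 0."""
    count = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(count)) / count

def parabola_vertex(y_prev, y_peak, y_next):
    """Offset of the vertex of the parabola through the points
    (-1, y_prev), (0, y_peak), (1, y_next); 0.0 if they are collinear."""
    curvature = y_prev - 2.0 * y_peak + y_next
    if curvature == 0.0:
        return 0.0
    return 0.5 * (y_prev - y_next) / curvature

def estimate_period(x, min_lag, max_lag):
    """Fractional pitch period: integer peak T of the autocorrelation in
    [min_lag, max_lag], refined by a parabola through T-1, T, T+1."""
    r = {m: unbiased_autocorr(x, m) for m in range(min_lag - 1, max_lag + 2)}
    T = max(range(min_lag, max_lag + 1), key=lambda m: r[m])
    return T + parabola_vertex(r[T - 1], r[T], r[T + 1])

# Test signal: a 198 Hz cosine sampled at 44.1 kHz (length is an assumption).
fs = 44100.0
x = [math.cos(2.0 * math.pi * 198.0 * n / fs) for n in range(12000)]
period = estimate_period(x, 150, 300)
f_est = fs / period
print("estimated frequency: %.3f Hz" % f_est)
```

The lag range must exclude the peak at zero lag and contain only the first period peak; here it is chosen by hand for the test signal.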

An alternative method involves interpolation on the samples of detected peaks in the

DFT spectrum. A rough guess at the frequency is made using the integer approximation to

the period from the autocorrelation, and then a peak in the frequency spectrum is detected

near this value as shown in Figure 6.1.2. The windowing function introduces local smooth-

ness to the data, so a second order polynomial approximation is used.
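A comparable sketch for the spectral method is given below. To stay self-contained it evaluates only the needed bins of a zero-padded DFT by direct summation rather than with an FFT; the window length of 4096 samples, the search width, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def zero_padded_dft_mag(x, k, nfft):
    """|X[k]| of the nfft-point DFT of x, with x implicitly zero-padded."""
    step = -2j * math.pi * k / nfft
    return abs(sum(v * cmath.exp(step * n) for n, v in enumerate(x)))

def estimate_frequency(x, fs, guess_hz, nfft=65536, search_bins=15):
    """Peak of the windowed spectrum near guess_hz, refined by a parabola
    through the peak bin and its two neighbors."""
    n = len(x)
    # Raised cosine window applied before the transform.
    xw = [v * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
          for i, v in enumerate(x)]
    k0 = int(round(guess_hz * nfft / fs))
    mags = {k: zero_padded_dft_mag(xw, k, nfft)
            for k in range(k0 - search_bins - 1, k0 + search_bins + 2)}
    kp = max(range(k0 - search_bins, k0 + search_bins + 1),
             key=lambda k: mags[k])
    a, b, c = mags[kp - 1], mags[kp], mags[kp + 1]
    curvature = a - 2.0 * b + c
    offset = 0.0 if curvature == 0.0 else 0.5 * (a - c) / curvature
    return (kp + offset) * fs / nfft

# Rough guess from an integer period of 223 samples, then refinement.
fs = 44100.0
x = [math.cos(2.0 * math.pi * 198.0 * n / fs) for n in range(4096)]
f_est = estimate_frequency(x, fs, guess_hz=fs / 223)
print("estimated frequency: %.3f Hz" % f_est)
```

Restricting the search to a few bins around the rough guess is what keeps the method from locking onto a different spectral peak.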

The autocorrelation method is less likely to have a gross error in the frequency calcula-

tion, in that there is only one peak of interest possible so that the detection problem is

quite simple. If no guess is made of the initial frequency, the frequency spectrum interpo-

lation could potentially pick the wrong peak, even if there is a large SNR and small spuri-

ous data peaks.

For the examples shown in Figure 6.1.1 and Figure 6.1.2, the signal used is a cosine

wave at 198 Hz. The autocorrelation interpolation returned a frequency of 198.071235 Hz,

an error of 0.071235 Hz; the frequency spectrum interpolation returned a frequency of

197.98821 Hz, an error of 0.011789279 Hz. For a discussion of the accuracy and regions

of convergence for the two methods, see Appendix B.

[Plot: Autocorrelation Function and Interpolation, versus period]

Figure 6.1.1 Interpolation on Correlation Function

[Plot: Interpolation on Frequency Spectrum, versus frequency in Hz]

Figure 6.1.2 Interpolation on Frequency Spectrum

The fundamental frequency is used to determine the pitch period and the Lagrange

interpolator Hi(z). The interpolator is then used in the linear predictor and in the creation

of the ideal string response.

6.2 Decay Rates and Initial Amplitudes

The decay rates and initial string amplitudes of the horizontal and vertical responses

can be found by a multistep process. Consider the sampled ’G’ string data shown in

Figure 6.2.1. The desired result for the horizontal response is the initial condition and

decay rate that best fit the envelope of the signal towards the end of the sample, in this

case from time 0.5 to 1.0 seconds. This choice of samples is made on the assumption that

the vertical response decays to a negligible amount by this point in time.

The decay process occurs once per period, so the model of the envelope is given by

$$y[n] = C\,a^{n/T}$$

where $T$ is the period of the note in samples, $a$ is the decay rate, and $C$ is the initial value of the exponential fit (for $n = 0$). This model is chosen since it best fits the behavior of the string model, which introduces a decay of gN once a period. The values of $a$ and $C$ can be estimated from the data by first performing envelope detection (since the envelope contains the decay information) and then taking logarithms:

$$\log(y[n]) = \log(C) + \frac{n}{T}\log(a)$$

so we can express $\tilde{y} = \log(y)$ as a linear combination of $\tilde{C} = \log(C)$ and $\tilde{a} = \log(a)$.

$N$ is the number of samples occurring between 0.5 and 1.0 sec. The matrix $A$ is an $N \times 2$ matrix containing a column of all 1’s and a column of values of $n/T$, where $n$ ranges from $0.5 f_s$ to $f_s$. The vector $x$ is a $2 \times 1$ matrix with elements $\tilde{C}$ and $\tilde{a}$. The variable $e$ represents error between the model and the sampled data, and is represented as an additive term.

$$\tilde{y} = Ax + e \qquad (12)$$

The ideal $C$ and $a$ are computed in $x$ to minimize the squared error between the estimate $Ax$ and $\tilde{y}$, that is, to minimize $e^T e$:

$$\min_x\,(e^T e) = \min_x \left[ (\tilde{y} - Ax)^T (\tilde{y} - Ax) \right] \qquad (13)$$

Solving for $x$ yields the least-squares estimate:

$$\hat{x} = (A^T A)^{-1} A^T \tilde{y} \qquad (14)$$

and the optimal $\tilde{C}$ and $\tilde{a}$ can be found from the elements of $\hat{x}$. $C = e^{\tilde{C}}$ is the initial value of the horizontal response, and $a = e^{\tilde{a}}$ is the decay rate $p_1$ for the horizontal response.
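The least-squares fit of the log envelope can be sketched as below. Because the matrix has only two columns, the normal equations are solved in closed form. The sketch assumes the envelope has already been detected and is strictly positive; the function name, the synthetic test envelope, and the period of 216 samples are assumptions for illustration.

```python
import math

def fit_decay(envelope, start, period):
    """Least-squares fit of log(envelope[i]) = log(C) + (n/period)*log(a),
    where n = start + i is the absolute sample index.
    Returns (C, a)."""
    s0 = s1 = s2 = b0 = b1 = 0.0
    for i, y in enumerate(envelope):
        u = (start + i) / period   # second column of A (first column is 1)
        ly = math.log(y)           # element of the log-envelope vector
        s0 += 1.0
        s1 += u
        s2 += u * u
        b0 += ly
        b1 += u * ly
    det = s0 * s2 - s1 * s1        # determinant of A^T A
    log_c = (s2 * b0 - s1 * b1) / det
    log_a = (s0 * b1 - s1 * b0) / det
    return math.exp(log_c), math.exp(log_a)

# Synthetic envelope with known parameters, analyzed from 0.5 s to 1.0 s.
fs = 44100
T = 216.0
true_c, true_a = 114.5, 0.995
start = fs // 2
envelope = [true_c * true_a ** (n / T) for n in range(start, fs)]
c_est, a_est = fit_decay(envelope, start, T)
print("C = %.4f, a = %.6f" % (c_est, a_est))
```

With a noise-free envelope the fit returns the generating parameters; with real data the residual of the fit indicates how well a single exponential describes the chosen segment.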

The horizontal decay rate is used to create the ideal string response having the desired

decay rate and is also used in synthesizing the desired horizontal response.

[Plot: sampled waveform amplitude versus time]

Figure 6.2.1 Sampled G String

6.3 Horizontal String Model Excitation

Using the "copy" property of string models presented in section 4.3, we can formulate

the horizontal excitation as a single period of a representative portion of the steady-state

string response. The steady-state behavior is best determined from the frequency spec-

trum of the sampled data, so the excitation is formed by a Fourier Series; the Fourier

Series coefficients are determined from the analysis of the spectrum. By assuming the

body of the guitar provides only amplification to the harmonics of the string, warping them

from the ideal response, the actual information determined in the analysis procedure is

the magnitude and phase change for each harmonic from the ideal string Fourier Series,

given by (4), to the sampled string frequency spectrum.

Referring to Figure 6.0.1, three pieces of information are needed for the horizontal

string excitation: the pluck-point, the sampled-data harmonics, and the ideal string har-

monics. The sampled-data harmonics are found from a section of the guitar steady-state

response and then used to identify the optimal pluck-point.

A short-time stationary section of the sampled waveform is chosen for analysis. The

harmonics are identified in the signal spectrum, as shown in Figure 6.3.1. This spectrum

was generated by windowing the waveform and performing a very long FFT--65536

points--resulting in 0.67 Hz / bin resolution. Since the window length is on the order of

10,000 samples, the width of the main lobe of the peak in the spectrum is at most 26 Hz,

so two peaks in the spectrum will not overlap (recall that the lowest frequency on a guitar

is 82 Hz). The harmonic peaks are extracted by searching for local maxima around the expected frequencies. We assume that harmonics more than 50 dB below the largest harmonic are not significant perceptually. Analysis has determined that fewer than 25 harmonics are needed to synthesize the horizontal excitation for each note from our sampled data. See Appendix A for a listing of notes and significant harmonics.
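The peak search and the 50 dB significance test can be sketched as follows. The magnitude spectrum is assumed to be available as a list of linear magnitudes with a known bin spacing; the search width of 10 Hz, the function name, and the synthetic test spectrum are assumptions for illustration.

```python
import math

def find_harmonics(mag, bin_hz, f0, max_harmonics=25,
                   search_hz=10.0, floor_db=50.0):
    """Return (harmonic number, frequency in Hz, magnitude) for each
    harmonic whose local peak is within floor_db of the largest peak."""
    half = int(round(search_hz / bin_hz))
    peaks = []
    for h in range(1, max_harmonics + 1):
        center = int(round(h * f0 / bin_hz))
        lo = max(center - half, 0)
        hi = min(center + half, len(mag) - 1)
        if lo > hi:
            break  # expected frequency lies beyond the available spectrum
        k = max(range(lo, hi + 1), key=lambda i: mag[i])
        peaks.append((h, k * bin_hz, mag[k]))
    if not peaks:
        return []
    largest = max(m for _, _, m in peaks)
    threshold = largest * 10.0 ** (-floor_db / 20.0)
    return [p for p in peaks if p[2] >= threshold]

# Synthetic spectrum: 12 harmonics, each 6 dB below the previous one.
bin_hz = 44100.0 / 65536
f0 = 204.187
mag = [0.0] * 6000
for h in range(1, 13):
    amp = 10.0 ** (-6.0 * (h - 1) / 20.0)
    center = h * f0 / bin_hz
    for k in range(int(center) - 30, int(center) + 31):
        mag[k] += amp * math.exp(-0.5 * ((k - center) / 5.0) ** 2)

found = find_harmonics(mag, bin_hz, f0)
for h, freq, m in found:
    print("harmonic %2d at %8.2f Hz, %6.1f dB" % (h, freq, 20.0 * math.log10(m)))
```

In this synthetic case the tenth and higher harmonics fall more than 50 dB below the fundamental and are discarded.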

The pluck-point, $L$ in (4), is determined from the data by finding the value of $L$ that minimizes the absolute value of the error between the ideal string spectrum and the sampled-data spectrum. The $\ell_1$ norm shown in (15) yields accurate results in this application.

[Plot: Detected Harmonic Components, magnitude versus frequency in Hz]

Figure 6.3.1 String Spectrum With Harmonics


The values $h_n$ form a set of harmonics obtained from the spectral analysis of sampled data as in Figure 6.3.1, and $N$ is the number of significant harmonics.

(15)

An error surface for varied L is constructed by evaluating the error criterion for values of L

between 1 and 20 in intervals of 0.01, as illustrated in Figure 6.3.2, and the minimum of

this error is chosen as the pluck point. This method works well when a known pluck point

is to be estimated, and so we can only assume it works well in the case of sampled data.
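The grid search over L can be sketched as below, under two loudly stated assumptions. First, the ideal harmonic magnitudes are taken from the textbook series for an ideal string plucked at 1/L of its length, proportional to |sin(n*pi/L)| / n^2; this stands in for the ideal string Fourier Series, which is not reproduced in this section. Second, the error is taken as the sum of absolute differences between magnitude sets normalized to unit sum, which is one plausible reading of the error criterion rather than a transcription of it. All names are hypothetical.

```python
import math

def ideal_magnitudes(L, count):
    """Assumed ideal-string harmonic magnitudes for a pluck at 1/L of the
    string length: |sin(n*pi/L)| / n**2 for n = 1..count."""
    return [abs(math.sin(n * math.pi / L)) / n ** 2
            for n in range(1, count + 1)]

def normalize(values):
    """Scale to unit sum; None if the values are numerically all zero."""
    total = sum(values)
    if total <= 1e-12:
        return None
    return [v / total for v in values]

def estimate_pluck_point(measured, lo=1.0, hi=20.0, step=0.01):
    """Evaluate the error for L on a grid and return (best L, error)."""
    target = normalize(measured)
    if target is None:
        raise ValueError("measured harmonics are all zero")
    best_L, best_err = None, float("inf")
    for k in range(int(round((hi - lo) / step)) + 1):
        L = lo + k * step
        ideal = normalize(ideal_magnitudes(L, len(measured)))
        if ideal is None:
            continue  # e.g. L = 1, where every harmonic vanishes
        err = sum(abs(t - i) for t, i in zip(target, ideal))
        if err < best_err:
            best_L, best_err = L, err
    return best_L, best_err

# Known pluck point: twelve harmonics generated with L = 7.487.
measured = ideal_magnitudes(7.487, 12)
L_est, err = estimate_pluck_point(measured)
print("estimated L = %.2f (error %.2e)" % (L_est, err))
```

This reproduces only the known-pluck-point check described in the text; on sampled data the body response perturbs the harmonic magnitudes and the minimum is correspondingly less sharp.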

[Plot: error versus guess at L, with the minimum error marked]

Figure 6.3.2 Error For Varying L

Referring to Figure 6.0.1, once L is determined, an ideal string response using the esti-

mated initial amplitude and decay rate is calculated. The harmonics of the ideal response

are compared to the harmonics of the sampled data (using the same sample points for the

spectral analysis), and the magnitude and phase changes for each harmonic are calcu-

lated. These changes represent the response of the guitar body and any non-idealities in

the guitar string.

The magnitude and phase of the change from ideal string to sampled data for each harmonic are the only information that needs to be stored. This means that a waveform shape can be characterized by a maximum of 50 numbers. The waveform can then be recreated by incorporating these changes in the excitation calculation: evaluating the Fourier Series by using the ideal waveform shape from (4) and changing each a as necessary. Since the Fourier Series can be evaluated for any fundamental frequency, the waveform shape can be calculated for any note as well.
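Recreating one period of the excitation from the stored changes can be sketched as below. The cosine-with-phase form of the series, the rounding of the period to a whole number of samples, and the three example coefficients are assumptions for illustration; the actual ideal coefficients come from the ideal string Fourier Series.

```python
import math

def excitation_period(ideal_coeffs, change_db, change_phase, f0, fs=44100.0):
    """One period of sum_k g_k * a_k * cos(2*pi*k*f0*t + phi_k), where a_k
    are ideal coefficients and g_k (from dB) and phi_k (radians) are the
    stored magnitude and phase changes for harmonic k."""
    length = int(round(fs / f0))
    terms = list(zip(ideal_coeffs, change_db, change_phase))
    out = []
    for n in range(length):
        t = n / fs
        out.append(sum(
            a * 10.0 ** (db / 20.0) * math.cos(2.0 * math.pi * k * f0 * t + ph)
            for k, (a, db, ph) in enumerate(terms, start=1)))
    return out

# Hypothetical stored data for three harmonics.
ideal = [1.0, 0.5, 0.25]
gain_db = [0.0, -3.0, 2.0]
phase = [0.0, 0.3, -0.2]

g_note = excitation_period(ideal, gain_db, phase, 196.0)
g_sharp = excitation_period(ideal, gain_db, phase, 196.0 * 2.0 ** (1.0 / 12.0))
print(len(g_note), len(g_sharp))
```

The same stored numbers yield the waveform for a neighboring note simply by changing the fundamental, which is the property the text relies on.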

An alternative to the Fourier Series would be to design a digital filter that matches the

measured magnitude and phase of the change at each harmonic frequency for a number of

notes, but this has proven to be extremely difficult since the composite changes from anal-

ysis of different notes on the same string have inconsistent results. See Appendix C for a

discussion of the analysis results.

6.4 Vertical String Model and Excitation

Referring to Figure 6.0.1, the vertical excitation depends on the decay rates and the

sampled data. The same string model as was used for the horizontal response is used for

the vertical response, but with a different decay rate. The main difference between the ver-

tical and horizontal responses is that the vertical is of much shorter duration due to its

rapid decay rate, and thus must be identified from the initial segment of the signal.

The beginning of the sampled signal is used for analysis. The horizontal response is

subtracted so that, when the outputs of the two string models are added together, the

sampled string response is best approximated. The decay rate of the vertical response is

calculated in a similar fashion to the horizontal decay rate as detailed in section 6.2. In

this case, however, the starting point is chosen to be the maximum of the vertical response

since this initial value is available (in contrast to the horizontal response, where the initial

value must be estimated).

The string model is inverted and used as a linear predictor, as in [11]. The prediction error equation is given by (16),

$$e[n] = x[n] - h[n] \ast x[n-N] \qquad (16)$$

where $h[n]$ is the sixth-order Lagrange interpolator (8) multiplied by the vertical decay rate and $x[n]$ is the initial segment of the sampled string data.

The prediction error signal, if fed into the string model in its entirety, would allow the

exact signal to be recreated. Since storing error signals becomes prohibitive, some trunca-

tion is necessary. Fortunately, the error itself dies away exponentially, so a representative

version of it can be chosen. As an example, approximately 5000 samples of the error signal

are adequate to recreate the sampled G string.
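The relationship between the string model and its inverse can be sketched as a round trip: an excitation is passed through a feedback loop with a fractional delay, and the prediction error of the result returns the excitation. Several details are assumptions: the Lagrange interpolator is realized with seven taps (order six), the loop delay is split so that the fractional delay sits near the middle of the interpolator, and the excitation is random test data. The function names are hypothetical.

```python
import random

def lagrange_fir(delay, order=6):
    """Lagrange interpolation coefficients for a delay of `delay` samples."""
    h = []
    for k in range(order + 1):
        c = 1.0
        for m in range(order + 1):
            if m != k:
                c *= (delay - m) / (k - m)
        h.append(c)
    return h

def loop_filter(period, decay, order=6):
    """Integer delay M and decayed interpolator taps for one loop period."""
    M = int(round(period)) - order // 2
    h = [decay * c for c in lagrange_fir(period - M, order)]
    return M, h

def delayed_sum(signal, n, M, h):
    """sum_k h[k] * signal[n - M - k], skipping samples before time zero."""
    return sum(hk * signal[n - M - k]
               for k, hk in enumerate(h) if n - M - k >= 0)

def string_model(excitation, M, h):
    """Feedback loop: y[n] = e[n] + sum_k h[k] * y[n - M - k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + delayed_sum(y, n, M, h))
    return y

def prediction_error(x, M, h):
    """Inverse of the loop: e[n] = x[n] - sum_k h[k] * x[n - M - k]."""
    return [xn - delayed_sum(x, n, M, h) for n, xn in enumerate(x)]

rng = random.Random(0)
excitation = [rng.uniform(-1.0, 1.0) for _ in range(200)] + [0.0] * 2800
M, h = loop_filter(44100.0 / 204.187, 0.9519)
note = string_model(excitation, M, h)
recovered = prediction_error(note, M, h)
worst = max(abs(a - b) for a, b in zip(excitation, recovered))
print("largest round-trip difference: %.3e" % worst)
```

On sampled data the prediction error does not vanish after a fixed number of samples, which is why a truncated, representative portion of it is kept.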

7. Analysis: Estimating A Linear Body Model

The main motivation for a model of the guitar body is to provide resonance and sound

blending of the string sounds, similar to the acoustics of a resonant cavity, and to provide

a more physically-based method for the design of such a filter. Although an all-pass rever-

berator system provides good resonance and sound blending, it has little correlation with

the behavior of a physical instrument. Note that the exact spectral shaping of the guitar

body is not required since the horizontal and vertical excitation contain this information.

7.1 Difficulties In Body Modeling

Previously reported attempts at modeling the body of an acoustic guitar with a linear filter ([14], [7]) found that either a very long FIR filter (1000+ taps) or a very high order pole model (200+ poles) is required to adequately model the impulse response of a guitar

body. These filters cannot be used for real-time synthesis since they are prohibitively com-

putationally expensive. Using IIR filters (with both poles and zeros) yields a smaller model.

Mean-square optimal IIR filters are difficult to design since the optimal coefficients are described by systems of nonlinear equations, the designed filter approximation has excessive spectral smoothing due to the mean-square curve fits, and high-order filters succumb to fixed-precision numerical computation effects and become unstable. To alleviate some of these problems, prior work in IIR modeling explored perceptual preprocessing and nonlinear optimization methods with limited success.

The current method uses the design algorithm proposed in [2]. Good real-time and

numerically robust performance is obtained by using parallel fourth order filters designed

with a hierarchical algorithm; this algorithm solves problems with multiple equivalent

solutions and captures spectral peaks and dips using simple filters and an iterated least-

squares procedure.

7.2 IIR Design Algorithm

The response of the guitar body is experimentally determined from a sampled impulse

response h[n]. The guitar is "thumped" on the bridge by a sharp instrument, and the

resulting sound is recorded with high-fidelity microphones and sampled at 44.1 kHz with

16 bit quantization. The sampled data is then used as an impulse response to an IIR filter

design algorithm [2]. Three fourth-order parallel filter structures are designed from the

impulse response using the following procedure: the filter is designed from the input

impulse response; the impulse response of the filter is calculated, and subtracted from the input response. This error response is then used as input to the design algorithm, and the procedure is repeated, as shown in Figure 7.2.1. This method builds higher-order structures from very low order filters, and therefore the filter performance is very robust with respect to filter coefficient quantization. The repetition of the procedure allows for capturing most of the resonant frequencies using an MSE measure. Experimental results show that fourth-order filter increments are best for characterizing the resonant modes of the guitar body.

Figure 7.2.1 Parallel Filter Design Algorithm.

Figure 7.2.2 Impulse Response Frequency Spectrum and IIR Filter Frequency Response

Figure 7.2.3 Sampled Impulse Response and Filter Impulse Response

Figure 7.2.4 Poles and Zeros Of Linear Body Filter

Consider the frequency spectrum in Figure 7.2.2, obtained from a sampled impulse

response, and the IIR filter response that was designed from it. The IIR filter picks out

most of the important resonant modes evident in the sampled impulse response. This IIR

filter has an impulse response that closely matches the sampled impulse response, as

shown in Figure 7.2.3.

Note that the impulse response of the IIR filter has the same characteristics as the

sampled impulse response, but does not have the high-frequency components of the sam-

pled impulse response. This IIR filter has the resonant properties of the impulse response,

and so will provide resonance to the string model sound. The poles and zeros of this filter

are, for the most part, within the unit circle, as shown in Figure 7.2.4; two very large zeros

are at angles 0 and $\pi$, not shown on the plot. Note that most of the poles and zeros are at

low frequency, since the interesting portion of the spectrum is at low frequency.

8. Example: Guitar ’G’ String Analysis

As an example of the analysis technique, the analysis of a sampled guitar ’G’ string

(196 Hz, ideally) is presented below.

8.1 Horizontal Excitation

The data was sampled at 44.1 kHz using 16 bits of resolution for one second. A plot of

the waveform is shown in Figure 8.1.1, which also shows the horizontal and vertical decay

rates calculated for this data. The two components are clearly visible: the horizontal mode

is best represented by the section 0.5 to 2 seconds, while the vertical mode is best repre-

sented by the section 0 to 0.25 seconds. Samples 35000 to 88200 are chosen for calcula-

tion of the horizontal excitation, and are shown in detail in Figure 8.1.2. Note that the

signal appears to be short-time stationary over this time period.

A 65,536-point DFT is computed using this data multiplied by a raised cosine window.

The frequency of this note was estimated to be 204.187 Hz, which is close to the expected

196 Hz. The harmonics are clearly visible in Figure 8.1.3; the lines show where the first

twelve harmonics were expected based on the calculated frequency, and the ’o’s show

where a local maximum in the spectrum was found. From Figure 8.1.3, two things are

noted. First, twelve harmonics are significant since none are visible at frequencies above

2500 Hz. Secondly, there is an obvious spectral "null" at the eighth harmonic (around

1600 Hz), but there is no null at the fourth harmonic, where there would be if L in (4)

were 8. The optimal value of L came out to be 7.487, so a low point occurs at the eighth

harmonic, as expected.

Using the approach in section 6.2, the decay rate and initial amplitude of the horizon-

tal waveform were calculated to be 0.995 and 114.5 after normalizing the signal power

over one second. The vertical waveform was calculated to decay at 0.9519 and have an ini-

tial amplitude of 1255.1. The mixture between horizontal and vertical modes indicates that

the string was plucked in a mostly vertical direction and that the overall string sound has

a sharp "pluck" sound.

[Plot: Horizontal and Vertical Mode Exponential Fits, amplitude versus time. Vertical Decay: 0.951934. Horizontal Decay: 0.994991.]

Figure 8.1.1 Sampled Guitar ’G’ String And Exponential Curve Fit


Figure 8.1.2 Samples 35000 to 37000

[Plot: spectrum magnitude versus frequency in Hz]

Figure 8.1.3 ’G’ String Spectrum

[Plot: Change in dB: Ideal and Sampled String, versus frequency in rad/sec]

Figure 8.1.4 Change In dB From Ideal String

[Plot: Horizontal Excitation: One Period @ 204.6 Hz -> 217 samples, versus sample number (t * 44100)]

Figure 8.1.5 Horizontal Excitation

In order to get the correct Fourier Series coefficients for the horizontal excitation, we

need to include the decay rate and other effects of the string model. We calculate an ideal

excitation using (4), and simulate the string response using the ideal excitation as input.

The ideal response is then compared to the sampled response at the 12 harmonics, and

the differences in magnitude and phase are calculated. The changes in spectral magnitude

from the ideal to the sampled string response are shown in Figure 8.1.4, with the ’x’s representing the change in dB at that harmonic.

By using the first twelve Fourier Series coefficients of the ideal excitation, and incorpo-

rating the changes in magnitude and phase just calculated, the horizontal waveform can

be calculated using the Fourier Series; the horizontal excitation is shown in Figure 8.1.5.

From this analysis, we have determined the sampled note frequency, decay rates for

the vertical and horizontal modes, and an excitation to the horizontal string model. The

vertical excitation is all that remains.

8.2 Vertical Excitation

The vertical excitation is the simplest to calculate since it is a linear prediction error

generated by filtering. The horizontal string response is subtracted from the sampled data

resulting in the signal x[n] shown in Figure 8.2.1.

The long-term prediction filter in (17) has the same coefficients as the vertical string model, where $a_i$ is the sixth-order Lagrange interpolation coefficient multiplied by the exponential decay of the vertical mode.

$$e[n] = x[n] - \sum_{i}^{3} a_i\, x[n-T-i] \qquad (17)$$

The linear prediction error $e[n]$ obtained from (17) is the vertical mode excitation. A plot

of this error for the ’G’ string is in Figure 8.2.2. The error signal has an exponential decay,

which is a good sign, implying that the linear prediction is a reasonable model. The first

0.11 seconds are chosen to represent the significant part of the error signal. Note that the

signal has a value close to zero at t = 0.11 sec.

The vertical excitation has a fairly high bandwidth (13 kHz before the spectrum is consistently 50 dB from its maximum value). It is stored in a table to save computational costs in creating this error signal from a Fourier Series or other spectral representation.

Figure 8.2.1 Data For Vertical Excitation Calculation

[Plot: Linear Prediction Error of Vertical Mode]

Figure 8.2.2 Error Signal for ’G’ String

9. Synthesis

Synthesis is much easier than analysis; the excitations and filter coefficients are

retrieved from a table lookup or calculated, then are modified by parameters specified by

the user, and the excitations are then processed through the string model and guitar body

filters.

User-specified parameters allow for flexibility, expression, and creativity in synthesis.

For example, different plucking styles result in different string sounds, different string

types (nylon versus steel) change the overall tone, and note volume, pitch, and external

decay factors modify the overall sound as well. In addition, the choice of guitar body affects

the overall guitar tone.

9.1 String Type and Plucking Style

The easiest way to replicate the sounds generated by different playing methods is to

vary the overall excitation: the percentage of each excitation (vertical and horizontal). For

example, a harsh, mostly vertical pluck would be composed mostly of a vertical excitation

with little horizontal; in contrast, a softer gentler excitation would have more equal por-

tions. The type of string determines the actual excitation used.

9.2 Volume

Volume is controlled by the excitation gain. Essentially, this gain corresponds to how hard the string was plucked. Since we use a linear model, no distortion parameters, such

as saturation, distension, or spatial limitations, are considered.

9.3 Pitch

The desired pitch affects both the interpolation coefficient values and required excitations. Interpolation coefficients are modified to account for the different pitches, and the

excitations are modified to have the necessary frequency characteristics associated with

each note. For example, to play a G# using the model obtained from sampled G note data,

the horizontal excitation is pitch shifted by a factor of 1.05946, a relatively simple opera-

tion requiring only recalculation of the Fourier Series using the new fundamental fre-

quency. The string model converts the vertical excitation to the correct pitch by removing

energy that is not at the harmonic frequencies.

9.4 Decay Rate

The string decay rate is used to reflect changes in the damping of the string caused by

either a different string type or playing style. A slower decay rate on the horizontal mode

allows the string to resonate for a longer period of time, while a faster decay rate attenuates the string sound more rapidly.

9.5 Body Model

The guitar sound is greatly affected by the choice of the guitar body. Without any guitar

body the strings sound disjoint and harsh. The sound blending provided by the guitar

body model combines the string sounds smoothly, and the resonant frequencies in the

body model add life and fullness to the string sounds. Different body models have different

characteristics, so a different body results in a different guitar sound. The synthesist has a

choice of body models depending on the type of guitar (e.g. folk, classical, Spanish, etc.)

and a choice of sound mixture to allow for different guitar tones.

10. Computational Analysis of Synthesis

Each sample calculation using this model requires 14 multiplications (six for each interpolation, and one for the gain term on the input) and 14 additions (the two inputs and the 12 interpolation results). This is extremely cheap. The string models need to be duplicated for polyphonic synthesis; six string models running in parallel require 84 multiplications and 84 additions. The linear body model requires 26 multiplications and 26 additions, for a total of 110 multiplications and 110 additions. These 220 floating point operations per sample at 44.1 kHz amount to 9.7 Mflops, a reasonable computational load for a DSP or general purpose processor.
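The operation count can be checked with a few lines of arithmetic. The split of the 14 multiplications into 12 interpolation multiplies plus one gain on each of the two excitation inputs is an interpretation of the count given above; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 44100
NUM_STRINGS = 6

# Per-sample cost of one string model (horizontal plus vertical loop).
# Interpretation: 2 interpolations x 6 multiplies, plus a gain per input.
mults_per_string = 2 * 6 + 2
adds_per_string = 12 + 2

# Per-sample cost of the linear body model, as stated in the text.
body_mults = 26
body_adds = 26

total_mults = NUM_STRINGS * mults_per_string + body_mults
total_adds = NUM_STRINGS * adds_per_string + body_adds
flops_per_sample = total_mults + total_adds
flops_per_second = flops_per_sample * SAMPLE_RATE_HZ

print(total_mults, total_adds, flops_per_sample, flops_per_second / 1e6)
# prints: 110 110 220 9.702
```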

There is, however, the overhead in calculating and/or storing the excitations. If it is

possible to do the excitation calculations in real-time, then only the data for the basic six

strings needs to be stored. Realistically, the horizontal and vertical excitations for each

note to be played would be stored in a table and looked up as necessary. At a maximum of

5000 16-bit samples per vertical excitation, and 14 excitations (allowing for poor interpola-

tion on lower frequency notes), plus 120 horizontal excitations (20 frets per string times six

strings) at an average of 176 samples each, a single instrument requires 140 kB of memory. This allows for the storage of a greater variety of excitations, with very interesting possibilities. Instead of having one acoustic guitar on a synthesizer, it would be possible to

have a folk guitar, a classical guitar, an unplugged electric guitar, and many others.

11. Conclusion

This thesis has presented a technique using analysis, synthesis, and physically-based

computational modeling to cost-effectively synthesize realistic acoustic guitar sounds from

prior analysis of sampled data. Since the physical modeling was based on easily-obtained

instrument information (the plucked-string sound and the "thump" of the instrument

body), this technique is not limited to guitars only, and can be applied to almost any

plucked-string instrument, such as harpsichords, string basses, and pizzicato violins. A

variety of parameters are available to the performer to change the characteristics of the

sound produced.

Since the computational requirements per sample are quite small, this system can be

easily implemented in software and run in real-time. Although there is a considerable

overhead in calculation and storage of the excitations to the model, the resulting output is

extremely realistic when compared to artificial methods like FM and LA synthesis, and is

comparable to the sampling and other computational modeling methods in terms of overall

quality, computational complexity, and data storage. This method has more variety in the

number of sounds it can produce compared to sampling since the parameters of the syn-

thesis algorithm can be changed readily to produce new guitar sounds from the same

input data. In addition, the analysis is automated, so new instruments can be designed

very easily and quickly, allowing for more accurate synthesis of different guitars.

Future work in realistic synthesis is in: modeling non-linear string properties; incorporating other guitar string effects, like inter-string frequency stimulation ("wolf" notes) and

beat patterns; including the effect of the guitar body in the analysis procedure with an

inverse of the body filter; applying the analysis and synthesis techniques to other instru-

ments; implementing the synthesis procedure in a real-time environment and providing a

user interface to the model parameters; modeling the vertical excitation as a combination

of deterministic and stochastic signals for further compression; and investigating the

response of the guitar body in greater detail.

At the present time, the synthesis produces good guitar notes, but a usable synthesizer

is not yet sufficiently advanced to produce high-quality real-time synthesis. More work in

interfacing to the algorithm is required to make it a viable synthesis method.

12. References

[1] Borin, G., et al. Sound Synthesis by Dynamic Systems Interaction. From Readings in Computer Generated Music, D. Baggi, Ed. IEEE Computer Society Press, 1992.

[2] Cheng, M. Analysis of Least-Squares Approaches With Applications For Pole-Zero Modeling. Ph.D. Thesis, Carnegie Mellon University, 1995.

[3] Dorf, Richard H. Electronic Musical Instruments, Third Edition. New York: Radiofile, 1968.

[4] Fletcher, N.H. and Rossing, T.D. The Physics of Musical Instruments. New York: Springer-Verlag, 1991.

[5] Jaffe, D. A. and Smith, J. O. III. Extensions of the Karplus-Strong Plucked-String Algorithm. Computer Music Journal, Vol. 7, No. 2. MIT Press, 1983.

[6] Karjalainen, M. and Laine, U. A Model for Real-Time Sound Synthesis of Guitar On a Floating-Point Signal Processor. IEEE Transactions on Signal Processing, 1991.

[7] Karjalainen, M. et al. Towards High-Quality Sound Synthesis of the Guitar and String Instruments. Proceedings of the ICMC, 1993. pg. 56-63.

[8] Karplus, K. and Strong, A. Digital Synthesis of Plucked-String and Drum Timbres. Computer Music Journal, Vol. 7, No. 2. MIT Press, 1983.

[9] Marple, S. Lawrence Jr. Digital Spectral Analysis. Englewood Cliffs: Prentice-Hall, Inc., 1987.

[10] Oppenheim, Alan V. and Schafer, Ronald W. Discrete-Time Signal Processing. Englewood Cliffs: Prentice-Hall, 1989.

[11] Smith, J. O. III. Efficient and Physically Accurate Simulation of Strings, Bores, and Horns using Digital Waveguide Techniques. From the CCRMA Associates Conference, Stanford University, May 1991.

[12] Stonick, V. L. and Massie, D. "ARMA Filter Design for Music Analysis/Synthesis," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, March 1992, vol. II, pg. 256-260.

[13] Sullivan, C. Extending the Karplus-Strong Algorithm to Synthesize Electric Guitar Timbres with Distortion and Feedback. Computer Music Journal, Vol. 14, No. 3. MIT Press, 1990.

[14] Valimaki, V., et al. "Physical Modeling of Plucked String Instruments with Application to Real-Time Sound Synthesis", Presented at the 98th Convention of the Audio Engineering Society, Paris, 1995.

[15] Yelton, Geary. The Rock Synthesizer Manual. Woodstock, GA: Rock Tech Publications, 1986.

13. Appendix A: Note Information

For our sampled data, the following characteristics were noted and parameters were chosen to fit the model.

TABLE 1. String Analysis, Open Strings

String   Frequency     Horizontal  Vertical  Number of    Length of Linear
                       Decay       Decay     Significant  Prediction Error
                                             Harmonics
E        82.5073 Hz    0.9930      0.9650    14           3255
A        109.3729 Hz   0.9925      0.9400    16           2851
D        145.9846 Hz   0.9975      0.9300    11           2203
G        195.5931 Hz   0.9928      0.9450    14           3266
B        246.1802 Hz   0.9950      0.9850    17           1724
High E   326.1793 Hz   0.9972      0.9850    21           3236

14. Appendix B: Notes on Interpolation

In section 6.1, two methods of interpolation for estimation of the fundamental frequency of a note were presented. The accuracy of these methods and the constraints on

this accuracy are of some interest.

14.1 Accuracy of Frequency Prediction With Autocorrelation

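Spectral estimation theory tells us that unbiased autocorrelation estimates, as in (18), have a variance (19) that is proportional to the length of the data sample [9]. This variance introduces error in the estimation process.

$$\hat{r}_{xx}[m] = \frac{1}{N - m} \sum_{n=0}^{N-m-1} x[n+m]\,x[n] \qquad (18)$$

$$\mathrm{var}\{\hat{r}_{xx}[m]\} \approx \frac{N}{(N - m)^2} \sum_{k=-\infty}^{\infty} \left( r_{xx}^2[k] + r_{xx}[k+m]\,r_{xx}[k-m] \right) \qquad (19)$$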

Unbiased estimates of the autocorrelation are calculated efficiently from the data by

using circular convolution and the FFT, as described by [10]. Finding the maximum of the

autocorrelation yields the fundamental period of a note and hence the fundamental fre-

quency.

Consider Figure 14.1.1, which shows an estimated autocorrelation from 10000 data points and an analytical autocorrelation function. The largest error between the ideal function and the estimated function is 0.0025, which leads to an estimated frequency error of

0.08 Hz. Experimental results show that for a fixed frequency, the estimated frequency has

an error curve that is proportional to the number of sample points taken, as shown in

Figure 14.1.2.

Figure 14.1.1 Estimated vs. Ideal Autocorrelation Functions (horizontal axis: Tau)

Figure 14.1.2 Error of Frequency Estimation vs. Sample Length (horizontal axis: Length of Sample)

There is another consideration: the accuracy of this method over a range of frequencies. In Figure 14.1.3 the error of the autocorrelation method is determined for a fixed number of samples and for varying frequency.

Figure 14.1.3 Error of Estimated Frequency From Correlation vs. Frequency (horizontal axis: Frequency)

From these plots we can learn several things:

• This method of frequency estimation approaches the true value from above.

• Error depends on sample length (this is also known from spectral estimation theory).

• Accuracy is limited to regions where the autocorrelation function is flat (i.e., less than 2000 Hz). Above this frequency the polynomial interpolation is not a valid operation.

14.2 Accuracy of Frequency Prediction With Frequency Interpolation

Frequency interpolation is performed on the magnitude of the estimated spectrum. A local maximum is detected, and a polynomial is fit to the points containing this maximum and the nearest neighbors. The maximum of this function is taken as the maximum of the DFT for this region. Since windowing introduces local smoothness to the region, the polynomial interpolation is a valid operation.

Consider Figure 14.2.1, where the interpolation of a 100 Hz cosine wave is performed. The maximum of the interpolating polynomial occurs at 99.991 Hz, an error of 0.009 Hz.
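
A minimal sketch of this interpolation in Python with NumPy is given here. Several details are assumptions rather than facts from the text: the window is taken to be a Hann window, the polynomial is a parabola through the peak bin and its two neighbors, and the function name and the optional zero-padding length `nfft` are illustrative.

```python
import numpy as np

def estimate_f0_spectrum(x, fs, nfft=None):
    """Estimate the frequency of the strongest spectral peak by fitting a
    parabola to the windowed DFT magnitude at the peak bin and its neighbors."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = nfft or n
    window = np.hanning(n)  # assumed window; the text does not name one here
    mag = np.abs(np.fft.rfft(x * window, nfft))
    # Skip the first and last bins so that both neighbors of the peak exist.
    k = int(np.argmax(mag[1:-1])) + 1
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    shift = 0.5 * (a - c) / (a - 2.0 * b + c)  # vertex offset in bins
    return (k + shift) * fs / nfft

if __name__ == "__main__":
    fs = 44100.0
    t = np.arange(10000) / fs
    x = np.cos(2 * np.pi * 100.0 * t)
    print(estimate_f0_spectrum(x, fs, nfft=65536))
```

Because the function always picks the strongest peak in the whole spectrum, it returns a harmonic rather than the fundamental whenever a harmonic is stronger.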

Figure 14.2.1 Polynomial Interpolation On Frequency Spectrum (horizontal axis: Frequency in Hz)

This method becomes more accurate as the length of the sample increases, primarily because of the greater frequency resolution of the sampled spectrum. The calculation error as a function of sample length is shown in Figure 14.2.2.

Figure 14.2.2 Error of Estimated Frequency vs. Sample Length (horizontal axis: Length of Sample)

For a fixed sample length, this method has roughly consistent error over a range of frequencies.

Figure 14.2.3 Error of Estimated Frequency From Spectrum vs. Frequency (horizontal axis: Note Frequency)

This method has some advantages over the autocorrelation method in that it is accurate over a much wider range and is more accurate in the actual prediction, but it suffers from inaccuracy in the initial guess. If the two methods are combined, as in section 6.1, very good results can be obtained.
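
One plausible way to combine the two methods is sketched here in Python with NumPy. It is a guess at the general idea, not a transcription of section 6.1: the autocorrelation peak supplies a coarse estimate, and the spectral interpolation is then restricted to a band around it. The function names, the Hann window, the 65536-point FFT, and the 10% search band are all assumptions.

```python
import numpy as np

def coarse_f0_autocorr(x, fs, f_min, f_max):
    """Integer-lag peak of the unbiased autocorrelation, used as a first guess."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad: circular equals linear
    spec = np.fft.rfft(x, nfft)
    r = np.fft.irfft(spec * np.conj(spec), nfft)[:n] / (n - np.arange(n))
    lo = max(1, int(fs / f_max))
    hi = min(n - 1, int(np.ceil(fs / f_min)))
    return fs / (lo + int(np.argmax(r[lo:hi + 1])))

def refine_f0_spectrum(x, fs, f_guess, nfft=65536, rel_band=0.1):
    """Parabolic interpolation of the spectral peak nearest the first guess."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), nfft))
    k_lo = max(1, int(np.floor(f_guess * (1 - rel_band) * nfft / fs)))
    k_hi = min(len(mag) - 2, int(np.ceil(f_guess * (1 + rel_band) * nfft / fs)))
    k = k_lo + int(np.argmax(mag[k_lo:k_hi + 1]))
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    return (k + 0.5 * (a - c) / (a - 2.0 * b + c)) * fs / nfft

def estimate_f0_combined(x, fs, f_min, f_max):
    """Coarse autocorrelation guess followed by spectral refinement."""
    return refine_f0_spectrum(x, fs, coarse_f0_autocorr(x, fs, f_min, f_max))

if __name__ == "__main__":
    fs = 44100.0
    t = np.arange(10000) / fs
    # Weak fundamental, strong second harmonic: a global spectral peak search
    # would lock onto the harmonic, while the combined estimate should not.
    x = 0.3 * np.cos(2 * np.pi * 196.3 * t) + np.cos(2 * np.pi * 392.6 * t)
    print(estimate_f0_combined(x, fs, 150.0, 250.0))
```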


15. Appendix C: Results of Body Response Estimation

In section 6.3 the horizontal mode of the guitar is synthesized from one period of a representative waveform of the steady-state response of the guitar. It is assumed that the guitar response is a result of taking an ideal string response (calculated from the string pluck point) and passing it through a filter that modifies the shape of the harmonics.

By taking a section of the sampled waveform and the corresponding section of the "ideal" response, i.e., the response calculated from an ideal excitation, the harmonics of each waveform can be compared to determine the filter that would convert the ideal string to the sampled string. Note that this assumption also folds the different plucking possibilities (fingered, guitar pick, etc.) into the body response.

Data was obtained from a guitar by recording the plucked string sound onto DAT at 44.1 kHz with 16-bit sampling. The data was transferred from DAT to computer disk for analysis. The guitar used was a Yamaha C-55A. The microphone was placed approximately six inches from the sound hole.

The same analysis procedure was run independently on each note played. The fundamental frequency was determined using the correlation estimate, and the vertical and horizontal mode decay rates were estimated. The pluck point was determined using the harmonics of the string and the method outlined in section 6.3. An ideal string response was synthesized. Figure 15.0.1 and Figure 15.0.2 show the analysis and synthesis of the ideal response. For this waveform, the initial amplitude was found to be 471.8 and the decay rate to be 0.9886 (recall that this is the decay from one period to the next).
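
To put the per-period decay rate in more familiar terms, the short Python sketch here converts it to an envelope value and a decay time. The function names are illustrative, and the 204.847 Hz fundamental is an assumption: it is the Table 2 entry for Note 13, whose horizontal decay matches the 0.9886 quoted above.

```python
import math

def envelope(initial_amplitude, decay_per_period, periods):
    """Envelope amplitude after a given number of periods of exponential decay."""
    return initial_amplitude * decay_per_period ** periods

def periods_to_decay(decay_per_period, drop_db):
    """Number of periods needed for the envelope to fall by drop_db decibels."""
    return -drop_db / (20.0 * math.log10(decay_per_period))

a0, d = 471.8, 0.9886   # initial amplitude and per-period decay from the text
f0 = 204.847            # Hz; assumed to be Note 13 of Table 2
n60 = periods_to_decay(d, 60.0)

print(envelope(a0, d, 50))   # envelope after 50 periods
print(n60, n60 / f0)         # periods and seconds for a 60 dB drop
```

With these numbers the envelope takes roughly 600 periods, a little under three seconds, to fall by 60 dB.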

The harmonics of the sampled string and the ideal string are calculated from 10764 data points (corresponding to 50 periods of data), starting at the point where the exponential model is fit, using a 65536-point FFT and a Nuttall (sum of cosines) window.
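
The harmonic comparison can be sketched in Python with NumPy as follows. This is a hedged illustration, not the analysis code used here: the function names are made up, the four Nuttall coefficients are one common variant (the text does not say which was used), and each harmonic level is read as the largest magnitude within a quarter of the fundamental on either side of its nominal frequency.

```python
import numpy as np

# Assumed 4-term Nuttall coefficients; other Nuttall variants exist.
NUTTALL = (0.3635819, 0.4891775, 0.1365995, 0.0106411)

def nuttall_window(n):
    """Symmetric sum-of-cosines window of length n."""
    k = 2.0 * np.pi * np.arange(n) / (n - 1)
    a0, a1, a2, a3 = NUTTALL
    return a0 - a1 * np.cos(k) + a2 * np.cos(2 * k) - a3 * np.cos(3 * k)

def harmonic_levels_db(x, fs, f0, n_harmonics, nfft=65536):
    """Peak magnitude in dB near each of the first n_harmonics multiples of f0."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * nuttall_window(len(x)), nfft))
    half_width = int(round(0.25 * f0 * nfft / fs))  # search f0/4 either side
    levels = []
    for h in range(1, n_harmonics + 1):
        centre = int(round(h * f0 * nfft / fs))
        band = mag[max(centre - half_width, 0):centre + half_width + 1]
        levels.append(20.0 * np.log10(band.max()))
    return np.array(levels)

def body_change_db(sampled, ideal, fs, f0, n_harmonics):
    """Change in dB needed to turn each ideal harmonic into the sampled one."""
    return (harmonic_levels_db(sampled, fs, f0, n_harmonics)
            - harmonic_levels_db(ideal, fs, f0, n_harmonics))
```

As a consistency check on the segment length, 50 periods at 204.847 Hz (the Table 2 frequency for Note 13, assumed to be the note analyzed here) and a 44.1 kHz sampling rate come to round(50 × 44100 / 204.847) = 10764 samples. The sketch assumes every requested harmonic lies below half the sampling rate.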

Figure 15.0.1 Analysis Portion of Sampled Response

Figure 15.0.2 Envelope of Ideal Response

The harmonics of the input and output are shown in Figure 15.0.3, with the 'x' representing the change in dB to obtain the output from the input.

Figure 15.0.3 Change in Magnitude (panel title: Change in dB, Ideal to Sampled String; horizontal axis: Frequency in rad/sec)

This experiment was repeated for the first six notes along the 'G' string (ideal 196 Hz). The frequencies, decay rates, and other information detected are in Table 2, and the magnitude change in dB is in Figure 15.0.4.



TABLE 2. Results of Analysis

Note Number   Frequency (Hz)   Pluck Point   Horizontal   Vertical   Number of
                                             Decay        Decay      Harmonics
Note 13       204.8470         2.2787        0.9886       0.9598     14
Note 14       218.3551         2.6315        0.9899       0.9954     13
Note 15       231.1937         2.6615        0.9928       0.9854      9
Note 16       244.4782         6.6475        0.9947       0.9939      9
Note 17       259.0775         3.4460        0.9942       0.9856     12
Note 18       274.6258         4.7181        0.9948       0.9985      8

The results in the table are roughly consistent: the pluck point increases as the string length decreases (with the exception of Note 16), which makes sense if there is a constant pluck point on the string. The horizontal decay rates are very accurate, but the vertical decay rates suffer from a lack of data (since the vertical decay is harder to detect). The most interesting trend is the decrease in the number of significant harmonics (with the exception of Note 17), which demonstrates the low-pass effect of the body, since the harmonics move higher in frequency as the fundamental increases.

In Figure 15.0.4, the harmonics of each note are plotted as a group; i.e., the fundamentals are the first curve, the second harmonics are the second curve, etc. The curves are disjoint since not every note has all the harmonics. The magnitude change data is less consistent. Trends are observable, but there are striking inconsistencies. The data is muddled between 0.25 and 0.3 rad/sec (corresponding to 1.75 kHz to 2.1 kHz). The trends are a strong indication of linearity in the guitar body, but the inconsistencies are an indication that something else is afoot. The most consistent region is from 0.2 to 0.25 rad/sec, where the harmonics of lower notes are on the same curve as harmonics of higher notes.
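
The frequency-axis values quoted above convert to hertz as follows if they are read as normalized frequencies in radians per sample at the 44.1 kHz recording rate. That reading is an assumption, but it reproduces the 1.75 kHz and 2.1 kHz figures given in the text:

$$f = \frac{\omega f_s}{2\pi}, \qquad \frac{0.25 \times 44100}{2\pi} \approx 1755\ \text{Hz}, \qquad \frac{0.3 \times 44100}{2\pi} \approx 2106\ \text{Hz}$$

By the same conversion, 0.2 on this axis corresponds to roughly 1404 Hz.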

Figure 15.0.4 Magnitude Change for Six Notes (horizontal axis: Frequency in rad/sec)
