Matthieu Hodgkinson Ph.D. Thesis defence April 2012 Dept. Of computer science NUI Maynooth


Physically Informed Subtraction of a String’s Resonances from Monophonic, Discretely Attacked Tones : A Phase Vocoder Approach

Chair : Prof. Raymond O’Neill

External Examiner : Prof. Rudolf Rabenstein

Internal Examiner : Dr. Tomás Ward

Supervisor : Dr. Joseph Timoney

String Extraction

Idea : subtract string resonances from monophonic, plucked or hit string tones.

If input = string + excitation, then the remainder is input – string = excitation.

Then “string extraction” reduces to “excitation extraction”.

[Figure: three waveforms (amplitude vs. time): Input (Viola Pizzicato), Excitation, String.]

If input = string + excitation + other, then input – string = excitation + other.

Now a few examples of other...

Environmental noise I

Electric guitars : very short excitation (non-resonant body), electric buzz audible after string extraction.

[Figure: waveform (amplitude vs. time) of a Stratocaster tone.]

Environmental noise II

Accidental background noises in recording room.

[Figure: waveform (amplitude vs. time) of a Martin guitar tone (custom recording).]

Other strings

It is sometimes awkward to mute open strings. Non-muted strings respond to the excitation and to the vibrations of the target string.

[Figure: waveform (amplitude vs. time) of an acoustic guitar tone (Open D); sound examples: Input, Output.]

Granulation and Extraction

The string extraction uses a transparent Phase Vocoder scheme. The waveform is processed repeatedly over short time intervals.

Each short-time output is added to the long-term output. This process is known as overlap-add. It is illustrated in the next few slides.

Granulation and Extraction

Waveform is multiplied by window to make grain.

Sinusoidal components of string are subtracted.

Residual is added to output.

Process is repeated at regular time intervals, and residuals are added.
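The four steps above can be sketched as a small overlap-add loop. This is only an illustration under stated assumptions: the periodic Hann window, the 50% overlap, and the names `overlap_add` and `process` are hypothetical choices, not the settings used in the thesis. With an identity `process`, the overlapped windows sum to one, so the scheme is transparent away from the edges of the signal.

```python
import numpy as np

def overlap_add(x, N, hop, process):
    """Window grains of length N every `hop` samples, process them, and sum them."""
    n = np.arange(N)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic Hann (assumed choice)
    out = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        grain = window * x[start:start + N]   # waveform multiplied by window
        out[start:start + N] += process(grain)  # residual added to the output
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
N, hop = 512, 256
# Identity processing: nothing is subtracted, so the output should equal the input.
y = overlap_add(x, N, hop, lambda grain: grain)
```

In a string-extraction setting, `process` would be replaced by a routine that subtracts the string's sinusoidal components from the grain.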

[Figure: Input and Output waveforms over 0 to 0.025 s (amplitude vs. time in seconds); legend: window, grain, processed grain.]


Short-Time String Cancelation

Within each grain, the sinusoidal components of the string are measured and subtracted.


[Figure: magnitude (dB) vs. frequency (Hz), 0 to 4000 Hz; legend: original spectrum.]

The subtraction takes place in the frequency domain.

The first harmonic is detected with a peak search.

The complex spectrum is used to measure the parameters of the sinusoid.

A spectrum of this sinusoid is synthesized and subtracted.

The process is repeated for each harmonic.
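A minimal sketch of this detect, measure, synthesize and subtract loop is given below. It is not the thesis implementation: it assumes a complex-valued synthetic test signal, a Blackman window, a peak search that simply takes the largest remaining peak, and a hypothetical function name `cancel_partials`. The frequency and decay rate are read from the ratio of two spectra taken one sample apart, and the complex amplitude from the ratio of the grain spectrum to the spectrum of a unit-amplitude synthetic partial.

```python
import numpy as np

def cancel_partials(x, N, num_partials, window):
    """Subtract `num_partials` exponentially decaying sinusoids from one grain."""
    n = np.arange(N)
    X = np.fft.fft(window * x[:N])        # spectrum of the grain
    Y = np.fft.fft(window * x[1:N + 1])   # spectrum of the grain shifted by one sample
    found = []
    for _ in range(num_partials):
        k = int(np.argmax(np.abs(X)))     # peak search on the remaining spectrum
        z = np.log(Y[k] / X[k])           # decay rate + j * frequency
        S = np.fft.fft(window * np.exp(z * n))  # unit-amplitude, zero-phase partial
        c = X[k] / S[k]                   # complex amplitude of the partial
        X = X - c * S                     # subtract the synthesized spectrum
        Y = Y - c * np.exp(z) * S         # keep the shifted spectrum consistent
        found.append((z.imag, z.real, c))
    return X, found

# Synthetic three-harmonic test tone (all values are illustrative assumptions).
N = 1024
n = np.arange(N + 1)
omega0 = 0.2  # radians per sample
partials = [(1.0, -0.001), (0.5, -0.0015), (0.25, -0.002)]  # (amplitude, decay rate)
x = sum(a * np.exp((g + 1j * h * omega0) * n)
        for h, (a, g) in enumerate(partials, start=1))
window = np.blackman(N)
X0 = np.fft.fft(window * x[:N])
residual, found = cancel_partials(x, N, 3, window)
ratio = np.sum(np.abs(residual) ** 2) / np.sum(np.abs(X0) ** 2)
```

Here `ratio` is the fraction of spectral energy left after cancelation, and `found` lists the measured frequency, decay rate and complex amplitude of each subtracted partial.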

[Figure: magnitude (dB) vs. frequency (Hz), 0 to 4000 Hz, shown in successive animation frames; legend: original spectrum, synthesized harmonic, processed spectrum.]

Modeling and measurement of harmonics

The harmonics are modeled as sinusoids of 1st-order phase and amplitude (i.e. constant frequency and exponential amplitude):

\[ x[n] = \exp\big(\lambda + j\phi + (\gamma + j\omega)\,n\big) \]

The measurement of these partials is based on the Complex Spectral Phase-Magnitude Evolution (CSPME) method, a generalisation of the CSPE method (Short and Garcia, 2006) to 1st-order amplitude signals, introduced in the thesis.


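\[ x[n] = \exp\big(\lambda + j\phi + (\gamma + j\omega)\,n\big) \]

\[ y[n] = x[n+1] = x[n]\,\exp(\gamma + j\omega) \]

\[ X = \mathrm{DFT}\{x\} \]

\[ Y = X \exp(\gamma + j\omega) \]

\[ \gamma = \mathrm{real}\,\log\frac{X^{*}Y}{|X|^{2}} \]

\[ \omega = \mathrm{imag}\,\log\frac{X^{*}Y}{|X|^{2}} \]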
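The estimation of the decay rate γ and frequency ω from two spectra taken one sample apart can be sketched as follows. This is an illustrative sketch rather than the thesis code: the function name `cspme`, the Hann window and the complex-valued test signal are assumptions. Because both grains share the same window, the ratio of their spectra equals exp(γ + jω) for a single exponentially decaying complex sinusoid.

```python
import numpy as np

def cspme(x, N, window=None):
    """Return the spectrum X and per-bin decay rate and frequency estimates."""
    w = np.hanning(N) if window is None else window
    X = np.fft.fft(w * x[:N])          # spectrum of x[n]
    Y = np.fft.fft(w * x[1:N + 1])     # spectrum of y[n] = x[n + 1]
    ratio = np.log(np.conj(X) * Y / np.abs(X) ** 2)
    return X, ratio.real, ratio.imag   # X, gamma per bin, omega per bin

# Synthetic partial (illustrative values, per-sample units).
N = 1024
n = np.arange(N + 1)
gamma_true, omega_true = -0.002, 0.3
lam, phi = np.log(0.5), 0.7
x = np.exp(lam + 1j * phi + (gamma_true + 1j * omega_true) * n)

X, gamma, omega = cspme(x, N)
k = int(np.argmax(np.abs(X)))          # only the peak maximum is needed
gamma_est, omega_est = gamma[k], omega[k]
```

For real-valued signals or several partials, leakage from other components perturbs the ratio, so the estimates are then approximate rather than exact.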

Modeling and measurement of harmonics

The standard magnitude spectrum along with the CSPME frequency ω and decay rate γ spectra are shown here for illustration.

ω and γ nevertheless need only be evaluated at the peak maxima.

With this method, zero-padding is superfluous (note the “angularity” of the spectra).

[Figure: three panels vs. frequency (Hz): Magnitude Spectrum (decibels), Frequency Spectrum (hertz), Decay Rate Spectrum.]


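After 1st-order phase and amplitude terms (i.e. frequency and decay rate) are evaluated, the 0th-order terms can be evaluated in turn.

\[ s[n] = \exp\big((\gamma + j\omega)\,n\big) \]

The spectrum of a synthetic signal s[n] is used in a process known as demodulation (Marchand, 1998, Zölzer, 2002, Short and Garcia, 2006).

\[ s[n] = \exp\big((\gamma + j\omega)\,n\big) = \frac{1}{\exp(\lambda + j\phi)}\,x[n] \]

\[ \lambda = \mathrm{real}\,\log\frac{X}{S} \]

\[ \phi = \mathrm{imag}\,\log\frac{X}{S} \]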
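Demodulation can be sketched in a few lines. The sketch assumes that γ and ω are already known, uses a Hann window and a complex-valued test signal, and the name `demodulate` is hypothetical. The log-amplitude λ and phase φ are read from the ratio of the grain spectrum to the spectrum of the synthetic signal s[n] at the peak bin.

```python
import numpy as np

def demodulate(x, gamma, omega, window):
    """Estimate log-amplitude and phase of a partial of known gamma and omega."""
    N = len(window)
    n = np.arange(N)
    s = np.exp((gamma + 1j * omega) * n)   # synthetic signal with lambda = phi = 0
    X = np.fft.fft(window * x[:N])
    S = np.fft.fft(window * s)
    k = int(np.argmax(np.abs(X)))          # evaluate at the peak maximum only
    ratio = np.log(X[k] / S[k])
    return ratio.real, ratio.imag          # lambda, phi

# Synthetic partial (illustrative values).
N = 1024
n = np.arange(N)
lam_true, phi_true = np.log(0.5), 0.7
gamma, omega = -0.002, 0.3
x = np.exp(lam_true + 1j * phi_true + (gamma + 1j * omega) * n)

lam_est, phi_est = demodulate(x, gamma, omega, np.hanning(N))
```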

Modeling and measurement of harmonics

Likewise, the phase φ and amplitude λ are shown at all spectral indices for illustration.

Note that, due to the exponential decay of the partials, the partial amplitude measurements of the middle plot slightly differ from the magnitude maxima of the upper plot.

[Figure: three panels vs. frequency (Hz): Magnitude Spectrum (decibels), Amplitude Spectrum (decibels), Phase Spectrum (radians).]


Thereafter, the complete partial can be synthesized, Fourier-transformed, and subtracted.

Instead of taking the DFT of the synthetic partial, the spectrum can be directly synthesised with Fourier-series approximation (derived in thesis).

The spectral values of the main lobe can thus be synthesised alone instead of an entire spectrum, allowing computational savings.

[Equation: Fourier-series approximation of the partial's spectrum X[b] (derived in thesis).]

The problem of Inharmonicity

The phenomenon of inharmonicity makes the problem of excitation/string extraction much more delicate, for the reasons that we are going to see.

Strings with negligible stiffness exhibit linear frequency series.

\[ \omega_k = k\,\omega_0 \]

Very often, stiffness causes a nonlinear “stretch” of this series.

\[ \omega_k = k\,\omega_0 \sqrt{1 + \beta k^2} \]

k is the harmonic number, ω0 is the fundamental frequency, and β, the inharmonicity coefficient.
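As a quick numerical illustration of the stretched series, the sketch below evaluates both formulas. The function name `partial_frequencies` is hypothetical, and the values FF = 110 Hz and IC = 7×10^-5 are borrowed from the acoustic guitar open A example shown elsewhere in this presentation.

```python
import numpy as np

def partial_frequencies(f0, beta, k):
    """Frequencies k * f0 * sqrt(1 + beta * k^2) of a stiff string's partials."""
    k = np.asarray(k, dtype=float)
    return k * f0 * np.sqrt(1.0 + beta * k ** 2)

k = np.arange(1, 60)
stiff = partial_frequencies(110.0, 7e-5, k)   # stretched series
ideal = partial_frequencies(110.0, 0.0, k)    # linear series (beta = 0)
stretch_hz = stiff - ideal                    # deviation from the linear series
```

The deviation `stretch_hz` is a small fraction of a hertz for the first partial and grows to several hundred hertz around the 50th.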

Spectral Stretch

The stretch caused by inharmonicity is negligible at low harmonic indices, but often substantial at high harmonic indices.

A linear model cannot be assumed for the identification of the harmonics.

A simple, robust and accurate method was presented in (Hodgkinson et al., 2009).

[Figure: magnitude spectrum and linearised spectrum of an acoustic guitar open A (FF = 110 Hz, IC = 7×10^-5), with harmonics numbered, over three frequency ranges: 0 to 6000 Hz, 2200 to 3000 Hz, and 6000 to 7300 Hz.]

Appearance of Phantom Partials

A series of partials due to longitudinal vibrations has the same fundamental frequency as the transverse series, but 1/4 of its inharmonicity (Bank and Sujbert, 2003).

When inharmonicity is 0, this series is merged with the main series. Otherwise, phantom partials can be salient, and must be subtracted as well.

In this spectrum, the transverse partials are numbered in black, and the phantom partials, in red.
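The relation between the two series can be illustrated numerically. The sketch below assumes that phantom partials follow the same stiff-string formula as the transverse partials with the inharmonicity coefficient divided by four; the function names and the values FF = 110 Hz and IC = 7×10^-5 are illustrative assumptions.

```python
import numpy as np

def transverse(f0, beta, k):
    """Transverse partial frequencies k * f0 * sqrt(1 + beta * k^2)."""
    k = np.asarray(k, dtype=float)
    return k * f0 * np.sqrt(1.0 + beta * k ** 2)

def phantom(f0, beta, k):
    """Phantom partial frequencies: same series with 1/4 of the inharmonicity."""
    return transverse(f0, beta / 4.0, k)

f0, beta = 110.0, 7e-5
k = np.arange(19, 45)
t = transverse(f0, beta, k)
p = phantom(f0, beta, k)

# Distance (Hz) from each phantom partial to the nearest transverse partial.
gap = np.min(np.abs(p[:, None] - t[None, :]), axis=1)
closest = int(k[np.argmin(gap)])   # phantom index most likely to overlap
```

Small values of `gap` flag the regions of the spectrum where a phantom partial and a transverse partial are likely to overlap.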

[Figure: magnitude spectrum, 2100 to 5200 Hz, with transverse partials (black) and phantom partials (red) numbered from 19 to 44; legend: found.]

Overlap of phantom partials

Transverse and phantom partials may overlap.

This compromises the accuracy of the partial measurements.

In situations of overlap, a partial may come out as a bulge instead of a peak.

Transverse partials are generally larger, but not always!

[Figure: magnitude spectrum, approximately 3520 to 3890 Hz, with transverse partials 31 to 34 and phantom partials 32 to 34 numbered; legend: predicted. Annotation: a transverse partial (black) under a phantom partial (red)!]

Algorithmic search

This greatly complicates the search and cancelation of the partials.

An algorithmic integrated detection/cancelation process is proposed in the thesis.

First-come, first-served does not always apply!

The linearity of our subtractive cancelation process is used to tackle situations of overlap.

[Figure: successive animation frames of the detection/cancelation process on the spectrum between 3450 and 3950 Hz.]

Time-varying FF and IC

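Large vibrational amplitude of the string may cause a downward glide in Fundamental Frequency, and upward trend in Inharmonicity Coefficient (Hodgkinson et al., 2010).

We propose simplified models, based on string length and tension derivations by (Legge and Fletcher, 1984) and (Bank, 2009).

\[ \omega_0(n) = \omega_\Delta\, e^{\gamma_\omega n} + \omega_\infty, \qquad \omega_\infty = \lim_{n\to\infty} \omega_0(n) \]

\[ \beta(n) = \beta_\Delta\, e^{\gamma_\beta n} + \beta_\infty, \qquad \beta_\infty = \lim_{n\to\infty} \beta(n) \]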

Time-varying FF and IC

The measurements shown alongside were taken from an acoustic guitar open E3, played fortissimo.

To fit the coefficients ωΔ, ω∞, γω, βΔ, β∞ and γβ, a fast and robust fitting method based on Fourier analysis (FEPCF) was developed (Hodgkinson, 2011).

[Figure: two panels vs. time (s), 0 to 1 s: Fundamental Frequency and Inharmonicity Coefficient (×10^-4); legend: measurements, FEPCF fit.]

Time-varying FF and IC

Once these coefficients are found, an entire spectrum of partial tracks can be deployed with only 6 coefficients (Hodgkinson et al., 2010).

Alongside, we show the Ovation tone’s spectrogram fitted with our model (solid lines).

To show the importance of the time-varying IC, we also show tracks with constant IC (dashed lines).
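How six coefficients deploy a whole set of partial tracks can be sketched as follows. The sketch assumes exponential-plus-constant trajectories for the fundamental frequency and the inharmonicity coefficient, combined with the stiff-string series k·ω0·sqrt(1 + βk²). The function names and all numerical values are hypothetical, chosen only so that the fundamental glides downwards while the inharmonicity coefficient rises; frequencies are given in hertz for readability.

```python
import numpy as np

def epc(n, delta, limit, rate):
    """Exponential-plus-constant trajectory: limit + delta * exp(rate * n)."""
    return limit + delta * np.exp(rate * n)

def partial_tracks(n, num_partials, w_delta, w_inf, g_w, b_delta, b_inf, g_b):
    """Frequency tracks of all partials, shape (num_partials, len(n))."""
    w0 = epc(n, w_delta, w_inf, g_w)         # time-varying fundamental frequency
    beta = epc(n, b_delta, b_inf, g_b)       # time-varying inharmonicity coefficient
    k = np.arange(1, num_partials + 1)[:, None]
    return k * w0 * np.sqrt(1.0 + beta * k ** 2)

n = np.arange(0, 44100, 441)                 # 100 analysis instants (samples)
tracks = partial_tracks(n, 40,
                        w_delta=0.3, w_inf=82.4, g_w=-1e-4,
                        b_delta=-5e-6, b_inf=1.16e-4, g_b=-1e-4)

# Same fundamental trajectory, but with the IC frozen at its final value.
constant_ic = partial_tracks(n, 40, 0.3, 82.4, -1e-4, 0.0, 1.16e-4, -1e-4)
```

Comparing `tracks` with `constant_ic` at high partial indices gives a rough idea of how much a constant IC misplaces the upper tracks at the start of the tone.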


Onset-overlapping grains

Another major difficulty is the cancelation of the partials for the grains that overlap with the onset of the tone.

If the onset takes place at time t=ν, then the sinusoidal model in attack-overlapping grains must be formulated as

\[ x[n] = h[n-\nu]\,\exp\big(\lambda + j\phi + (\gamma + j\omega)\,n\big) \]

where h[n] is the unit-step function,

\[ h[n] = \begin{cases} 0, & n < 0 \\ 1, & n \geq 0 \end{cases} \]

Attack-overlapping grains

In an attack-overlapping grain, the signal can be seen as windowed by a “unit-stepped” window.

Unfortunately, the frequency-domain characteristics of such windows are far from optimal.

The lower plot compares the spectrum of a 2nd-order continuous window with that of a unit-stepped window.
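The loss of spectral quality can be checked numerically. The sketch below uses a Hann window as a stand-in for the continuous window (an assumption; the window used in the thesis may differ), places a hypothetical onset in the middle of the grain, and compares the highest sidelobe found more than ten bins away from the main lobe centre.

```python
import numpy as np

def sidelobe_level_db(window, min_bin=10, pad=8):
    """Highest normalised magnitude (dB) at least `min_bin` bins away from DC."""
    N = len(window)
    W = np.abs(np.fft.rfft(window, pad * N))   # zero-padded magnitude spectrum
    W /= W[0]                                  # normalise to the main lobe peak
    return 20 * np.log10(np.max(W[min_bin * pad:]))

N = 1024
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # continuous window (assumed choice)
onset = N // 2                                 # hypothetical onset position
stepped = hann * (n >= onset)                  # "unit-stepped" window

hann_db = sidelobe_level_db(hann)
stepped_db = sidelobe_level_db(stepped)
```

The discontinuity introduced by the unit step makes the sidelobes of `stepped` decay far more slowly than those of `hann`, which is the leakage problem described above.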

[Figure: upper panel, amplitude vs. time; legend: unit-step function, unit-stepped window. Lower panel, magnitude (dB) vs. frequency (Hz).]

Attack-overlapping grains

Further illustration of the situation with the spectra of an attack-overlapping grain (black) and a regular grain (white).

[Figure: upper panel, amplitude vs. time. Lower panel, magnitude (dB) vs. frequency (Hz), 0 to 550 Hz.]

Attack-overlapping grains

String extraction in onset-overlapping grains can nevertheless contribute to the aural quality of the excitation.

The sound example shows string extraction starting from first non-onset-overlapping grain only (white), and from the grain before (black, overlap factor 1/3).

[Figure: amplitude vs. time (s), 0 to 0.2 s; legend: no overlap, 1/3 overlap.]

Attack-overlapping grains

A few remarks concerning string extraction in onset-overlapping grains.

The CSPME frequency and exponential decay estimates are sensitive to excessive leakage seen in onset-overlapping grains. A standard quadratic-fit approach may be preferable there.

Because of the poor resolution of the partials, the search for phantom partials seems futile.

For the application of Commuted Waveguide Synthesis (Karjalainen et al., 1993, Smith, 1993), it is desirable to leave some sinusoidal energy at the onset of the tone. Onset-overlapping frames can then be left unprocessed.
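For reference, a standard quadratic-fit frequency estimate can be sketched as below: a parabola is fitted through the log-magnitudes of the peak bin and its two neighbours, and the position of its vertex gives the refined peak location. This is the generic textbook technique, not necessarily the exact variant considered in the thesis; the function name `quadratic_peak`, the Hann window and the test signal are assumptions.

```python
import numpy as np

def quadratic_peak(mag_db, k):
    """Refine the peak location around bin k with a parabola through 3 points."""
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    return k + 0.5 * (a - c) / (a - 2.0 * b + c)

# Complex sinusoid lying between two bins (illustrative values).
N = 1024
n = np.arange(N)
true_bin = 100.3
x = np.exp(2j * np.pi * true_bin * n / N)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann window

mag_db = 20 * np.log10(np.abs(np.fft.fft(window * x)))
k = int(np.argmax(mag_db))
est_bin = quadratic_peak(mag_db, k)              # fractional bin estimate
est_omega = 2 * np.pi * est_bin / N              # radians per sample
```

The estimate is biased by a small fraction of a bin without zero-padding, which is the usual price paid for the robustness of this approach.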

Recapitulation on contributions

Fundamental Frequency and Inharmonicity Coefficient estimation method (Hodgkinson et al., 2009).

FF and IC time-varying models (Hodgkinson et al., 2010).

Exponential-Plus-Constant fitting method (Hodgkinson, 2011).

Inharmonicity-related complications “revealed”.

Bulge search replaces peak search.

Frequency-domain subtractive approach to excitation/string extraction.

Generalisation of the CSPE to exponential-amplitude signals.

Analytical formulation of exponential-amplitude-modulated cosine-window spectra.

Onset-overlapping frames approached.

Thesis Contents

Introduction : Conceptual definition of String Extraction ; applicability ; applications.

Chapter 1 : Comprehensive string model :
• Basis (Fletcher and Rossing, 1991, Raichel, 2000, Steiglitz, 1996)
• Damping (Chaigne and Askenfelt, 1993, Trautman and Rabenstein, 2003)
• Inharmonicity (Fletcher et al., 1962)
• Longitudinal vibrations (Morse and Ingard, 1986, Giordano and Korty, 1996, Bank and Sujbert, 2003)
• Tension modulation (Legge and Fletcher, 1984, Bank, 2009, Hodgkinson et al., 2010).

Thesis contents

Chapter 2 : Frequency-domain component estimation and cancelation
• Windowing (Harris, 1978, Nuttall, 1981)
• Fourier-series approximation of cosine window DFTs
• Estimation of partial frequencies with CSPE (Short and Garcia, 2006) and exponential-amplitude generalisation (CSPME).

Chapter 3 : Phase Vocoder (PV) approach to String Extraction
• PV scheme (Portnoff, 1981)
• Unit-step modeling of attack
• Inharmonicity estimation (Hodgkinson et al., 2009)
• Phantom partials.

Chapter 4 : Tests and results
• CSPME
• Fourier series approximation
• Onset-overlapping frames
• Phantom partials.

Conclusion : Aims, organisation, contributions and future work.

References

(Legge and Fletcher, 1984) Nonlinear generation of missing modes on a vibrating string. Journal of the Acoustical Society of America, 76(1), 1984

(Bank, 2009) Balázs Bank. Energy-based synthesis of tension modulation in strings. In Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Como, Italy, 2009.

(Hodgkinson, 2011) Exponential-plus-constant fitting based on Fourier analysis. In JIM2011 – 17èmes Journées d’Informatique Musicale, Saint-Etienne, France, 2011.

(Fletcher and Rossing, 1991) Neville H. Fletcher and Thomas D. Rossing. The Physics of Musical Instruments. Springer-Verlag New York Inc., New-York, USA, 1991.

(Bank and Sujbert, 2003) Balázs Bank and László Sujbert. Modeling the longitudinal vibration of piano strings. In Proceedings of the Stockholm Music Acoustics Conference, August 6-9, 2003 (SMAC 03), Stockholm, Sweden, 2003.
(Morse and Ingard, 1986) Philip M. Morse and K. Uno Ingard. Theoretical Acoustics. Princeton University Press,

(Marchand, 1998) Improving spectral analysis precision with an enhanced phase vocoder using signal derivatives. In Proc. DAFx98 Digital Audio Effects Workshop, pages 114-118. MIT Press, 1998.

(Karjalainen et al., 1993) Towards high-quality sound synthesis of the guitar and string instruments. In International Computer Music Conference, Tokyo, Japan, 1993.

(Smith, 1993) Efficient synthesis of stringed musical instruments. In International Computer Music Conference, Tokyo, Japan, 1993.

(Short and Garcia, 2006) Kevin M. Short and Ricardo A. Garcia. Signal analysis using the complex spectral phase evolution (CSPE) method. In AES 120th Convention, Paris, France, 2006.

(Hodgkinson et al., 2009) Matthieu Hodgkinson, Jian Wang, Joseph Timoney and Victor Lazzarini “Handling Inharmonic Series with Median-Adjustive Trajectories”, Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Como, Italy, 2009.

(Hodgkinson et al., 2010) Matthieu Hodgkinson, Joseph Timoney and Victor Lazzarini “A Model of Partial Tracks for Tension-Modulated, Steel-String Guitar Tones”, Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, 2010.

(Raichel, 2000) Daniel R. Raichel. The Science and Applications of Acoustics. Springer-Verlag New York Inc., New-York, USA, 2000.

(Steiglitz, 1996) Ken Steiglitz. A Digital Signal Processing Primer. Addison-Wesley Publishing Company, Inc., Menlo Park, California, USA, 1996.

(Chaigne and Askenfelt, 1993) Antoine Chaigne and Anders Askenfelt. Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods. Journal of the Acoustical Society of America, 95(2), 1993.

(Trautman and Rabenstein, 2003) Lutz Trautman and Rudolf Rabenstein. Digital Sound Synthesis by Physical Modeling Using the Functional Transformation Method. Kluwer Academic/Plenum Publishers, New York, 2003.

(Giordano and Korty, 1996) N. Giordano and A. J. Korty. Motion of a piano string: Longitudinal vibrations and the role of the bridge. Journal of the Acoustical Society of America, 100(6), 1996.

(Harris, 1978) Frederic J. Harris. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE 66(1), 1978.

(Nuttall, 1981) Albert H. Nuttall. Some windows with very good sidelobe behavior. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-29(1), 1981.
(Portnoff, 1981) Michael R. Portnoff. Time-scale modification of speech based on short-time Fourier analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(3), 1981.
