EE E6820: Speech & Audio Processing & Recognition Lecture 6: Nonspeech and Music Dan Ellis <[email protected]> Michael Mandel <[email protected]> Columbia University Dept. of Electrical Engineering http://www.ee.columbia.edu/dpwe/e6820 February 26, 2009 1 Music & nonspeech 2 Environmental Sounds 3 Music Synthesis Techniques 4 Sinewave Synthesis E6820 (Ellis & Mandel) L6: Nonspeech and Music February 26, 2009 1 / 30


EE E6820: Speech & Audio Processing & Recognition

Lecture 6: Nonspeech and Music

Dan Ellis <[email protected]>, Michael Mandel <[email protected]>

Columbia University Dept. of Electrical Engineering
http://www.ee.columbia.edu/~dpwe/e6820

February 26, 2009

1 Music & nonspeech

2 Environmental Sounds

3 Music Synthesis Techniques

4 Sinewave Synthesis


Music & nonspeech

What is ‘nonspeech’?
- according to research effort: a little music
- in the world: most everything

[Figure: sound classes placed by origin (natural to man-made) and information content (low to high): wind & water, animal sounds, speech, music, machines & engines, contact/collision]

attributes?


Sound attributes

Attributes suggest model parameters

What do we notice about ‘general’ sound?
- psychophysics: pitch, loudness, ‘timbre’
- bright/dull; sharp/soft; grating/soothing
- sound is not ‘abstract’: tendency is to describe by source-events

Ecological perspective
- what matters about sound is ‘what happened’
  → our percepts express this more-or-less directly


Motivations for modeling

Describe/classify
- cast sound into model because want to use the resulting parameters

Store/transmit
- model implicitly exploits limited structure of signal

Resynthesize/modify
- model separates out interesting parameters

[Figure: diagram relating Sound, Model, and parameter space]


Analysis and synthesis

Analysis is the converse of synthesis:

[Figure: Sound and Model / representation, linked by Analysis and Synthesis]

Can exist apart:
- analysis for classification
- synthesis of artificial sounds

Often used together:
- encoding/decoding of compressed formats
- resynthesis based on analyses
- analysis-by-synthesis


Environmental Sounds

Where sound comes from: mechanical interactions
- contact / collisions
- rubbing / scraping
- ringing / vibrating

Interest in environmental sounds
- carry information about events around us
  .. including indirect hints
- need to create them in virtual environments
  .. including soundtracks

Approaches to synthesis
- recording / sampling
- synthesis algorithms


Collision sounds

Factors influencing:
- colliding bodies: size, material, damping
- local properties at contact point (hardness)
- energy of collision

Source-filter model
- “source” = excitation of collision event (energy, local properties at contact)
- “filter” = resonance and radiation of energy (body properties)

Variety of strike/scraping sounds
- resonant freqs ∼ size/shape
- damping ∼ material
- HF content in excitation/strike ∼ mallet, force
- (from Gaver, 1993)
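The source-filter view of a strike can be sketched as a sum of decaying resonances. This is a minimal illustration, not Gaver's algorithm: the function name `strike`, the mode lists, and the way `hardness` weights the high modes are all assumptions chosen for the example.

```python
import math

def strike(modes, fs=8000, dur=0.5, hardness=1.0):
    """Sum of exponentially decaying sinusoids excited by a single strike.

    modes: list of (freq_hz, decay_per_s) pairs standing in for the body's
    resonances (the "filter"); hardness in (0, 1] controls how strongly the
    excitation (the "source") feeds the higher modes. Illustrative values only.
    """
    n_samples = int(fs * dur)
    f_low = min(f for f, _ in modes)
    out = [0.0] * n_samples
    for freq, decay in modes:
        # Softer mallets excite high modes less: gain falls with frequency ratio.
        gain = (f_low / freq) ** (1.0 / hardness - 1.0)
        for n in range(n_samples):
            t = n / fs
            out[n] += gain * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    return out

# Same resonant frequencies (size/shape), different damping (material).
wood = strike([(400, 60.0), (1100, 90.0), (2300, 140.0)], hardness=0.5)
metal = strike([(400, 4.0), (1100, 5.0), (2300, 7.0)], hardness=1.0)
```

Changing only the decay rates moves the result from a dull knock toward a ringing tone, which is the sense in which damping tracks material.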


Sound textures

What do we hear in:
- a city street
- a symphony orchestra

How do we distinguish:
- waterfall
- rainfall
- applause
- static

[Figure: spectrograms of Applause04 and Rain01; time 0–4 s, freq 0–5000 Hz]

Levels of ecological description...


Sound texture modeling (Athineos)

Model broad spectral structure with LPC
- could just resynthesize with noise

Model fine temporal structure in residual with linear prediction in time domain

[Figure: block diagram. Sound $y[n]$ → TD-LP, $y[n] = \sum_i a_i y[n-i]$ (per-frame spectral parameters) → whitened residual $e[n]$ → DCT → residual spectrum $E[k]$ → FD-LP, $E[k] = \sum_i b_i E[k-i]$ (per-frame temporal envelope parameters)]

- precise dual of LPC in frequency
- ‘poles’ model temporal events

[Figure: Temporal envelopes (40 poles, 256 ms); amplitude vs. time / sec]

Allows modification / synthesis?
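A simplified sketch of the FD-LP stage only, assuming an unnormalized DCT-II and autocorrelation-method linear prediction; it skips the TD-LP whitening step and is not the authors' implementation. The helper names (`dct2`, `lpc`, `temporal_envelope`) are hypothetical. The all-pole model fitted across DCT index is read out along time, so a pole pair marks where in the frame the energy sits.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, O(N^2); fine for a short illustrative frame."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def lpc(seq, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin.

    Returns a = [1, a1, ..., ap] with residual seq[k] + sum_i a_i * seq[k - i]
    (the predictor coefficients in the slide's convention are -a_i).
    """
    r = [sum(seq[k] * seq[k + m] for k in range(len(seq) - m)) for m in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j]
        a[i] = refl
        err *= (1.0 - refl * refl)
    return a

def temporal_envelope(x, order):
    """Fit an all-pole model to the DCT of x and evaluate it along time."""
    a = lpc(dct2(x), order)
    n_pts = len(x)
    env = []
    for n in range(n_pts):
        theta = math.pi * (n + 0.5) / n_pts      # time index mapped onto [0, pi]
        re = sum(a[i] * math.cos(i * theta) for i in range(len(a)))
        im = -sum(a[i] * math.sin(i * theta) for i in range(len(a)))
        env.append(1.0 / (re * re + im * im))
    return env

frame = [0.0] * 64
frame[20] = 1.0                                  # a single click at sample 20
env = temporal_envelope(frame, order=2)
peak = max(range(len(env)), key=env.__getitem__)
```

With one click in the frame, the two poles place the envelope maximum at or next to the click, which is the temporal counterpart of a formant peak in ordinary LPC.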


Music synthesis techniques

What is music?
- could be anything → flexible synthesis needed!

Key elements of conventional music
- instruments → note-events (time, pitch, accent level) → melody, harmony, rhythm
- patterns of repetition & variation

Synthesis framework:
- instruments: common framework for many notes
- score: sequence of (time, pitch, level) note events

[Figure: opening bars of the “Hallelujah!” chorus (no. 44) from G. F. Handel's Messiah, scored for soprano, alto, tenor, and bass; © 2000 by Score on Line S.A., http://www.score-on-line.com]


The nature of musical instrument notes

Characterized by instrument (register), note, loudness/emphasis, articulation...

[Figure: spectrograms of Piano, Violin, Clarinet, and Trumpet notes; Time 0–4, Frequency 0–4000]

Distinguish how?


Development of music synthesis

Goals of music synthesis:
- generate realistic / pleasant new notes
- control / explore timbre (quality)

Earliest computer systems in 1960s (voice synthesis, algorithmic)

Pure synthesis approaches:
- 1970s: Analog synths
- 1980s: FM (Stanford/Yamaha)
- 1990s: Physical modeling, hybrids

Analysis-synthesis methods:
- sampling / wavetables
- sinusoid modeling
- harmonics + noise (+ transients)

others?


Analog synthesis

The minimum to make an ‘interesting’ sound

[Figure: block diagram with Oscillator, Filter, Envelope, Gain, and Sound output; control inputs Pitch, Trigger, Vibrato, and Cutoff freq]

Elements:
- harmonics-rich oscillators
- time-varying filters
- time-varying envelope
- modulation: low frequency + envelope-based

Result:
- time-varying spectrum, independent pitch
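The oscillator, filter, and envelope chain can be imitated digitally in a few lines. This is only a sketch under assumed choices: a naive (aliasing) sawtooth, a one-pole lowpass standing in for the analog filter, and made-up envelope and vibrato settings.

```python
import math

def analog_note(pitch_hz, dur=0.5, fs=8000, vibrato_hz=5.0, vibrato_depth=0.01):
    """Sawtooth oscillator -> one-pole lowpass -> amplitude envelope."""
    n_samples = int(dur * fs)
    phase, lp_state, out = 0.0, 0.0, []
    for n in range(n_samples):
        t = n / fs
        # Envelope: 20 ms linear attack, then exponential decay.
        env = min(t / 0.02, 1.0) * math.exp(-4.0 * t)
        # Vibrato: slow sinusoidal modulation of the oscillator pitch.
        freq = pitch_hz * (1.0 + vibrato_depth * math.sin(2 * math.pi * vibrato_hz * t))
        phase = (phase + freq / fs) % 1.0
        saw = 2.0 * phase - 1.0                      # harmonics-rich source
        # The envelope also opens the filter: cutoff sweeps 200 Hz to 2200 Hz.
        cutoff = 200.0 + 2000.0 * env
        alpha = 1.0 - math.exp(-2 * math.pi * cutoff / fs)
        lp_state += alpha * (saw - lp_state)         # time-varying one-pole lowpass
        out.append(env * lp_state)
    return out

note = analog_note(220.0)
```

Pitch is set by the oscillator alone, while the envelope shapes both loudness and brightness, which gives the time-varying spectrum with independent pitch.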


FM synthesis

Fast frequency modulation → sidebands:

$\cos(\omega_c t + \beta \sin(\omega_m t)) = \sum_{n=-\infty}^{\infty} J_n(\beta) \cos((\omega_c + n\omega_m)t)$

- a harmonic series if $\omega_c = r\omega_m$

$J_n(\beta)$ is a Bessel function:

[Figure: $J_0$ to $J_4$ plotted against modulation index $\beta$ from 0 to 9; $J_n(\beta) \approx 0$ for $\beta < n - 2$]

→ Complex harmonic spectra by varying $\beta$

[Figure: spectrogram; time 0–0.8 s, freq 0–4000 Hz]

- $\omega_c$ = 2000 Hz, $\omega_m$ = 200 Hz
- what use?
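The sideband expansion can be checked numerically. The sketch below uses the carrier and modulator values from the slide, with an assumed sample rate of 8 kHz and an assumed $\beta = 2$. It measures the amplitude at each $\omega_c + n\omega_m$ with a single-bin DFT and computes $J_n(\beta)$ from Bessel's integral; the helper names are hypothetical.

```python
import cmath
import math

def bessel_j(n, beta, steps=2000):
    """J_n(beta) from the integral (1/pi) * int_0^pi cos(n*tau - beta*sin(tau)) dtau."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        weight = 0.5 if i in (0, steps) else 1.0     # trapezoid rule end weights
        total += weight * math.cos(n * tau - beta * math.sin(tau))
    return total * h / math.pi

def component_amplitude(x, freq_hz, fs):
    """Amplitude of the sinusoidal component of x at freq_hz (single DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)

fs, fc, fm, beta = 8000, 2000.0, 200.0, 2.0
x = [math.cos(2 * math.pi * fc * i / fs + beta * math.sin(2 * math.pi * fm * i / fs))
     for i in range(800)]                            # 0.1 s: whole periods of fc and fm
sidebands = {k: component_amplitude(x, fc + k * fm, fs) for k in range(-3, 4)}
```

Each measured sideband amplitude should agree with $|J_n(\beta)|$, so sweeping `beta` over time redistributes energy among the harmonics without touching the pitch.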


Sampling synthesis

Resynthesis from real notes → vary pitch, duration, level

Pitch: stretch (resample) waveform

[Figure: waveforms over 0–0.008 s at 596 Hz and, resampled, at 894 Hz]

Duration: loop a ‘sustain’ section

[Figure: note waveform over 0–0.3 s with close-ups at 0.174–0.176 s and 0.204–0.206 s]

Level: cross-fade different examples

[Figure: Soft and Loud example waveforms over 0–0.15 s, mixed according to velocity]

- need to ‘line up’ source samples
- good & bad?
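Pitch change by resampling and duration change by looping might look like the following sketch. Linear interpolation and the helper names are assumptions; a real sampler would use a better interpolator and choose loop points where the waveform lines up.

```python
import math

def resample(wave, ratio):
    """Play `wave` back `ratio` times faster using linear interpolation."""
    n_out = int((len(wave) - 1) / ratio) + 1
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = wave[min(j + 1, len(wave) - 1)]
        out.append((1.0 - frac) * wave[j] + frac * nxt)
    return out

def loop_sustain(wave, loop_start, loop_end, n_out):
    """Extend a note to n_out samples by repeating wave[loop_start:loop_end]."""
    out = list(wave[:loop_end])
    while len(out) < n_out:
        out.extend(wave[loop_start:loop_end])
    return out[:n_out]

def zero_crossing_rate_hz(wave, fs):
    """Rough pitch estimate: upward zero crossings per second."""
    ups = sum(1 for a, b in zip(wave, wave[1:]) if a < 0.0 <= b)
    return ups * fs / len(wave)

fs = 16000
note = [math.sin(2 * math.pi * 596.0 * n / fs) for n in range(fs)]   # 1 s at 596 Hz
shifted = resample(note, 1.5)                                        # up a fifth
```

Resampling by 1.5 raises the pitch from about 596 Hz to about 894 Hz but also shortens the note by the same factor, which is why looping is needed to control duration separately.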


Sinewave synthesis

If patterns of harmonics are what matter, why not generate them all explicitly:

$s[n] = \sum_k A_k[n] \cos(k \cdot \omega_0[n] \cdot n)$

- particularly powerful model for pitched signals

Analysis (as with speech):
- find peaks in STFT $|S[\omega, n]|$ & track
- or track fundamental $\omega_0$ (harmonics / autocorrelation) & sample STFT at $k \cdot \omega_0$

→ set of $A_k[n]$ to duplicate tone:

[Figure: spectrogram (time 0–0.2 s, freq 0–8000 Hz) and extracted harmonic magnitudes vs. time and freq]

Synthesis via bank of oscillators
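A direct reading of the harmonic sum, assuming a constant fundamental and an invented amplitude contour in which higher harmonics decay faster. The function name and all numeric values are illustrative.

```python
import math

def harmonic_tone(f0_hz, amps, fs=16000, dur=0.2):
    """s[n] = sum_k A_k[n] * cos(k * w0 * n) with constant w0 and decaying A_k[n]."""
    w0 = 2 * math.pi * f0_hz / fs            # fundamental in radians per sample
    out = []
    for n in range(int(fs * dur)):
        sample = 0.0
        for k, a_k in enumerate(amps, start=1):
            # Assumed amplitude contour: higher harmonics decay faster.
            a_kn = a_k * math.exp(-3.0 * k * n / fs)
            sample += a_kn * math.cos(k * w0 * n)
        out.append(sample)
    return out

tone = harmonic_tone(440.0, amps=[1.0, 0.5, 0.25, 0.125])
```

The formula with $k \cdot \omega_0[n] \cdot n$ is only exact when $\omega_0$ is constant, as here; for a time-varying fundamental an oscillator bank would accumulate phase instead.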


Steps to sinewave modeling - 1

The underlying STFT:

$X[k, n_0] = \sum_{n=0}^{N-1} x[n + n_0] \cdot w[n] \cdot \exp\left(-j \frac{2\pi k n}{N}\right)$

- what value for $N$ (FFT length & window size)?
- what value for $H$ (hop size: $n_0 = r \cdot H$, $r = 0, 1, 2, \ldots$)?

STFT window length determines freq. resolution:

$X_w(e^{j\omega}) = X(e^{j\omega}) * W(e^{j\omega})$

Choose $N$ long enough to resolve harmonics → 2-3x longest (lowest) fundamental period
- e.g. 30-60 ms = 480-960 samples @ 16 kHz
- choose $H \le N/2$

$N$ too long → lost time resolution
- limits sinusoid amplitude rate of change
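The window-length rule and the STFT definition can be tried out as follows. This is a sketch: the Hann window, the helper names, and the short demo frame are assumptions, and the direct $O(N^2)$ DFT stands in for an FFT.

```python
import cmath
import math

def window_length(f0_min_hz, fs, periods=3):
    """Samples needed to span `periods` cycles of the lowest expected fundamental."""
    return int(round(periods * fs / f0_min_hz))

def stft(x, n_fft, hop):
    """X[k, n0] = sum_n x[n + n0] w[n] exp(-j 2 pi k n / N), Hann w, n0 = r * hop."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for n0 in range(0, len(x) - n_fft + 1, hop):
        seg = [x[n0 + n] * win[n] for n in range(n_fft)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                           for n in range(n_fft))
                       for k in range(n_fft // 2 + 1)])
    return frames

fs = 16000
n_long = window_length(100.0, fs, periods=3)      # 3 periods of 100 Hz = 30 ms
# Short demo frame so the direct DFT stays fast; a real system would use an FFT.
n_fft, hop = 64, 32
x = [math.cos(2 * math.pi * 2000.0 * n / fs) for n in range(256)]
frames = stft(x, n_fft, hop)
peak_bin = max(range(n_fft // 2 + 1), key=lambda k: abs(frames[0][k]))
```

Three periods of a 100 Hz fundamental at 16 kHz give the 480-sample lower figure quoted above, and the hop of `n_fft // 2` follows the $H \le N/2$ guideline.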


Steps to sinewave modeling - 2

Choose candidate sinusoids at each time by picking peaks in each STFT frame:

[Figure: spectrogram (time 0–0.18 s, freq 0–8000 Hz) and one frame's spectrum (level / dB vs. freq 0–7000 Hz)]

Quadratic fit for peak:

[Figure: level / dB and phase / rad around a peak between 400 and 800 Hz; parabola $y = ax(x - b)$ with labels $b/2$ and $ab^2/4$]

+ linear interpolation of unwrapped phase
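The quadratic fit is commonly done with three samples around a local maximum. The sketch below uses that three-point form, which is an assumption about how the fit is carried out; the function name is hypothetical.

```python
def quadratic_peak(y_prev, y_peak, y_next):
    """Fit a parabola through three equally spaced points around a local maximum.

    Returns (offset, height): the vertex position in bins relative to the
    middle sample (between -0.5 and 0.5) and the interpolated peak value.
    """
    denom = y_prev - 2.0 * y_peak + y_next       # negative for a true maximum
    offset = 0.5 * (y_prev - y_next) / denom
    height = y_peak - 0.25 * (y_prev - y_next) * offset
    return offset, height

# Samples of y = 20 - 2 * (x - 0.3)**2 at x = -1, 0, 1 (levels in dB).
levels = [20 - 2 * (x - 0.3) ** 2 for x in (-1, 0, 1)]
offset, height = quadratic_peak(*levels)
```

Fitting on dB levels rather than linear magnitude is usual because the main lobe of a window is closer to a parabola on a log scale; the refined frequency is `(k + offset) * fs / N` for peak bin `k`.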


Steps to sinewave modeling - 3

Which peaks to pick?

Want ‘true’ sinusoids, not noise fluctuations
- ‘prominence’ threshold above smoothed spectrum

[Figure: spectrum, level / dB vs. freq 0–7000 Hz]

Sinusoids exhibit stability...
- of amplitude in time
- of phase derivative in time
→ compare with adjacent time frames to test?
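One way to express a prominence threshold is to compare each local maximum with a moving average of the dB spectrum. The smoothing width, the 6 dB threshold, and the function name below are assumptions for illustration.

```python
def prominent_peaks(level_db, smooth_halfwidth=4, threshold_db=6.0):
    """Indices of local maxima at least threshold_db above a moving-average spectrum."""
    n_bins = len(level_db)
    peaks = []
    for k in range(1, n_bins - 1):
        if not (level_db[k - 1] < level_db[k] >= level_db[k + 1]):
            continue                                  # not a local maximum
        lo, hi = max(0, k - smooth_halfwidth), min(n_bins, k + smooth_halfwidth + 1)
        smoothed = sum(level_db[lo:hi]) / (hi - lo)   # crude smoothed spectrum
        if level_db[k] - smoothed >= threshold_db:
            peaks.append(k)
    return peaks

# Toy spectrum: a -40 dB floor with 1 dB ripples and two strong partials.
spectrum = [-40.0 + (1.0 if k % 3 == 0 else 0.0) for k in range(40)]
spectrum[10] = 0.0
spectrum[25] = -10.0
found = prominent_peaks(spectrum)
```

The ripples are local maxima too, but only the two partials clear the threshold; the stability tests across frames would then be applied to the survivors.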


Steps to sinewave modeling - 4

‘Grow’ tracks by appending newly-found peaks to existing tracks:

[Figure: time-frequency sketch of existing tracks and new peaks, with a track birth and a track death marked]

- ambiguous assignments possible

Unclaimed new peak
- ‘birth’ of new track
- backtrack to find earliest trace?

No continuation peak for existing track
- ‘death’ of track
- or: reduce peak threshold for hysteresis
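Track growing with birth and death can be sketched with greedy nearest-frequency matching. The matching rule, the `max_jump_hz` limit, and the dict layout are assumptions; they ignore amplitude and do not attempt backtracking or hysteresis.

```python
def grow_tracks(tracks, new_peaks, max_jump_hz=50.0):
    """Extend each live track with its nearest unclaimed peak in the new frame.

    tracks: list of dicts {"freqs": [...], "alive": bool}.
    new_peaks: peak frequencies (Hz) found in the current frame.
    Greedy matching is a simplistic way of resolving ambiguous assignments.
    """
    unclaimed = sorted(new_peaks)
    for track in tracks:
        if not track["alive"]:
            continue
        last = track["freqs"][-1]
        best = min(unclaimed, key=lambda f: abs(f - last), default=None)
        if best is not None and abs(best - last) <= max_jump_hz:
            track["freqs"].append(best)       # continuation
            unclaimed.remove(best)
        else:
            track["alive"] = False            # 'death': no continuation peak
    for freq in unclaimed:                    # 'birth': start a new track
        tracks.append({"freqs": [freq], "alive": True})
    return tracks

tracks = []
for frame_peaks in ([440.0, 880.0], [445.0, 870.0, 1320.0], [450.0, 1330.0]):
    grow_tracks(tracks, frame_peaks)
```

In this run the 880 Hz track dies in the third frame and a 1320 Hz track is born in the second; because tracks claim peaks in list order, an earlier track can take a peak that a later one matched better, which is one form of the ambiguity noted above.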


Resynthesis of sinewave models

After analysis, each track defines contours in frequency, amplitude $f_k[n]$, $A_k[n]$ (+ phase?)
- use to drive sinewave oscillators & sum up

[Figure: contours $f_k[n]$ (freq / Hz) and $A_k[n]$ (level) over 0–0.2 s driving an oscillator $A_k[n] \cdot \cos(2\pi f_k[n] \cdot t)$]

‘Regularize’ to exactly harmonic $f_k[n] = k \cdot f_0[n]$

[Figure: track frequencies over 0–0.2 s (0–6000 Hz) and the fundamental contour (550–700 Hz)]
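A sketch of oscillator-bank resynthesis from frame-rate contours, assuming linear interpolation between frames and phase accumulation (measured phases are ignored). The function names, the hop, and the contour values are illustrative.

```python
import math

def synth_track(freqs_hz, amps, hop, fs):
    """Oscillator driven by per-frame frequency/amplitude contours.

    Parameters are linearly interpolated between frames, and phase is
    accumulated sample by sample so the waveform stays continuous.
    """
    out, phase = [], 0.0
    for r in range(len(freqs_hz) - 1):
        for i in range(hop):
            frac = i / hop
            f = (1 - frac) * freqs_hz[r] + frac * freqs_hz[r + 1]
            a = (1 - frac) * amps[r] + frac * amps[r + 1]
            out.append(a * math.cos(phase))
            phase += 2 * math.pi * f / fs
    return out

def synth_tracks(tracks, hop, fs):
    """Sum the oscillators for all tracks (each a (freqs, amps) pair of equal length)."""
    parts = [synth_track(f, a, hop, fs) for f, a in tracks]
    return [sum(vals) for vals in zip(*parts)]

fs, hop = 8000, 80
f0 = [600.0, 610.0, 605.0, 600.0]                       # frame-rate f0 contour
# 'Regularized' harmonic tracks: f_k[n] = k * f0[n], amplitude 1/k fading out.
tracks = [([k * f for f in f0], [1.0 / k, 0.8 / k, 0.5 / k, 0.0]) for k in (1, 2, 3)]
signal = synth_tracks(tracks, hop, fs)
```

Because every oscillator starts at zero phase, the output matches the analysed contours but not the original waveform sample for sample, which is what the “(+ phase?)” question is about.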


Modification in sinewave resynthesis

Change duration by warping timebase
- may want to keep onset unwarped

[Figure: time-frequency plot; time 0–0.5 s, freq 0–5000 Hz]

Change pitch by scaling frequencies
- either stretching or resampling envelope

[Figure: two spectra, level / dB (0–40) vs. freq 0–4000 Hz]

Change timbre by interpolating parameters
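The two pitch-change options can be contrasted on a single frame of partials. In this sketch the spectral envelope is assumed to be the piecewise-linear curve through the original partial levels; the helper names and values are hypothetical.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

def shift_pitch(freqs_hz, levels_db, ratio, keep_envelope=True):
    """Scale partial frequencies by `ratio`.

    keep_envelope=False stretches the spectral envelope along with the partials;
    keep_envelope=True resamples the original envelope at the new frequencies.
    """
    new_freqs = [ratio * f for f in freqs_hz]
    if keep_envelope:
        new_levels = [interp(f, freqs_hz, levels_db) for f in new_freqs]
    else:
        new_levels = list(levels_db)
    return new_freqs, new_levels

freqs = [200.0, 400.0, 600.0, 800.0]
levels = [40.0, 30.0, 35.0, 10.0]
stretched = shift_pitch(freqs, levels, 1.5, keep_envelope=False)
resampled = shift_pitch(freqs, levels, 1.5, keep_envelope=True)
```

Stretching moves the resonances with the pitch, as resampling a recording would; resampling the envelope keeps them fixed, which tends to preserve the character of the instrument.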


Sinusoids + residual

Only ‘prominent peaks’ became tracks
- remainder of spectral energy was noisy?
→ model residual energy with noise

How to obtain ‘non-harmonic’ spectrum?
- zero-out spectrum near extracted peaks?
- or: resynthesize (exactly) & subtract waveforms

$e_s[n] = s[n] - \sum_k A_k[n] \cos(2\pi n \cdot f_k[n])$

.. must preserve phase!

[Figure: spectra (mag / dB vs. freq 0–7000 Hz) of original, sinusoids, residual, and LPC fit]

Can model residual signal with LPC → flexible representation of noisy residual
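Why phase matters for waveform subtraction can be seen with one sinusoid in noise. This toy example assumes the analysis recovered amplitude and frequency exactly; only the phase handling differs between the two residuals, and all values are invented.

```python
import math
import random

def sinusoid(amp, freq_hz, phase, n_samples, fs):
    """amp * cos(2 pi f n / fs + phase) for n = 0 .. n_samples - 1."""
    return [amp * math.cos(2 * math.pi * freq_hz * n / fs + phase)
            for n in range(n_samples)]

def energy(x):
    return sum(v * v for v in x)

random.seed(0)
fs, n_samples = 8000, 800
noise = [0.05 * random.gauss(0.0, 1.0) for _ in range(n_samples)]
tone = sinusoid(1.0, 440.0, 0.7, n_samples, fs)
s = [t + e for t, e in zip(tone, noise)]

# Residual with the analysed phase kept: only the noise remains.
good = [a - b for a, b in zip(s, sinusoid(1.0, 440.0, 0.7, n_samples, fs))]
# Residual with the phase discarded (assumed zero): the sinusoid does not cancel.
bad = [a - b for a, b in zip(s, sinusoid(1.0, 440.0, 0.0, n_samples, fs))]
```

With the phase discarded, the "residual" is dominated by the difference of two offset sinusoids rather than by the noise, so a noise model fitted to it would be badly wrong.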


Sinusoids + noise + transients

Sound represented as sinusoids and noise:

$s[n] = \sum_k A_k[n] \cos(2\pi n \cdot f_k[n]) + h_n[n] * b[n]$

where the sum is the sinusoids and $h_n[n] * b[n]$ is the residual $e_s[n]$.

Parameters are $A_k[n]$, $f_k[n]$, $h_n[n]$

[Figure: three spectrograms; time 0–0.6 s, freq 0–8000 Hz]

Separate out abrupt transients in residual?

$e_s[n] = \sum_k t_k[n] + h_n[n] * b'[n]$

- more specific → more flexible


Summary

‘Nonspeech audio’
- i.e. sound in general
- characteristics: ecological

Music synthesis
- control of pitch, duration, loudness, articulation
- evolution of techniques
- sinusoids + noise + transients

Music analysis...

and beyond?


References

W. W. Gaver. Synthesizing auditory icons. In Proc. Conference on Human Factors in Computing Systems INTERCHI-93, pages 228–235. Addison-Wesley, 1993.

M. Athineos and D. P. W. Ellis. Autoregressive modeling of temporal envelopes. IEEE Tr. Signal Proc., 15(11):5237–5245, 2007. URL http://www.ee.columbia.edu/~dpwe/pubs/AthinE07-fdlp.pdf.

X. Serra and J. Smith III. Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition. Computer Music Journal, 14(4):12–24, 1990.

T. S. Verma and T. H. Y. Meng. An analysis/synthesis tool for transient signals that allows a flexible sines + transients + noise model for audio. In Proc. ICASSP, pages VI–3573–3576, Seattle, 1998.
