CSC475 Music Information Retrieval

Sinusoids and DSP notation

George Tzanetakis

University of Victoria

2014

G. Tzanetakis 1 / 38

Table of Contents I

1 Time and Frequency

2 Sinusoids and Phasors

Motivation

Frequently the computer science students who take this course have no background in Digital Signal Processing (DSP), so I always try to do a few lectures introducing some DSP fundamentals. An introduction to DSP typically requires an entire course, and learning DSP is a lifelong pursuit, so what one can do in a few lectures is rather limited. My goal is to stress intuition and attempt to demystify the basics of the mathematical notation used. In addition, DSP contains some beautiful mathematical ideas that connect the continuous mathematics of the physical world with the discrete mathematics needed by computers. I hope this material will motivate you to learn more about DSP.

Digital Audio Recordings

Recordings in analog media (like vinyl or magnetic tape) degrade over time

Digital audio representations can, in theory, remain accurate without any loss of information because copying them only copies patterns of bits.

MIR requires distilling information from an extremely large amount of data

Digitally storing 3 minutes of audio requires approximately 16 million numbers. A tempo extraction program must somehow convert these to a single numerical estimate of the tempo.
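The figure of roughly 16 million numbers can be checked with a short calculation. The sketch below assumes CD-quality stereo audio (44.1 kHz, two channels); the slides do not state which format is meant, so these parameters are an assumption that happens to reproduce the quoted figure.

```python
# Assumed CD-quality parameters (not stated in the slides).
SAMPLE_RATE_HZ = 44100   # samples per second, per channel
CHANNELS = 2             # stereo
DURATION_S = 3 * 60      # three minutes, in seconds


def num_samples(duration_s, sample_rate_hz, channels):
    """Total count of numbers needed to store the recording."""
    return duration_s * sample_rate_hz * channels


total = num_samples(DURATION_S, SAMPLE_RATE_HZ, CHANNELS)
print(total)  # 15876000, i.e. approximately 16 million
```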

Production and Perception of Periodic Sounds

Animal sound generation and perception

The sound generation and perception systems of animals have evolved to help them survive in their environment. From an evolutionary perspective the intentional sounds generated by animals should be distinct from the random sounds of the environment.

Repetition

Repetition is a key property of sounds that can make them more identifiable as coming from other animals (predators, prey, potential mates), and therefore animal hearing systems have evolved to be good at detecting periodic sounds.

Pitch Perception

Pitch

When the same sound is repeated more than 10-20 times per second, instead of being perceived as a sequence of individual sound events it is fused into a single sonic event with a property we call pitch, which is related to the underlying period of repetition. Note that this fusion is something our perception does; it does not reflect any change in the underlying signal other than the decrease of the repetition period.

Time-Frequency Representations

Music Notation

When listening to mixtures of sounds (including music) we are interested in when specific sounds take place (time) and what their source of origin is (pitch, timbre). This is also reflected in music notation, which fundamentally represents time from left to right and pitch from bottom to top.

Spectrum

Informal definition of Spectrum

A fundamental concept in DSP is the notion of a spectrum. Informally, complex sounds such as the ones produced by musical instruments and their combinations can be modeled as linear combinations of simple elementary sinusoidal signals with different frequencies. A spectrum shows how “much” each such basis sinusoidal component contributes to the overall mixture. It can be used to extract information about the sound such as its perceived pitch or what instrument(s) are playing. A spectrum corresponds to a short snapshot of the sound in time.
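As a rough illustration of this idea (not taken from the slides), the sketch below builds a mixture of two sinusoids and measures how much each frequency contributes using a naive discrete Fourier transform. The helper name `dft_magnitudes`, the signal length and the two chosen frequencies are illustrative assumptions; practical code would use an FFT library instead of this slow direct sum.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude of DFT bins 0 .. N/2 - 1.

    Scaled so that a sinusoid of amplitude A that falls exactly on
    bin k (with k >= 1) produces the value A in position k.
    """
    n = len(x)
    mags = []
    for k in range(n // 2):
        s = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(2 * abs(s) / n)
    return mags


N = 64  # hypothetical number of samples in the snapshot
# Mixture: amplitude 1.0 at 4 cycles per snapshot, amplitude 0.5 at 10 cycles.
signal = [
    1.0 * math.sin(2 * math.pi * 4 * i / N)
    + 0.5 * math.sin(2 * math.pi * 10 * i / N)
    for i in range(N)
]

spectrum = dft_magnitudes(signal)
# spectrum[4] is close to 1.0, spectrum[10] is close to 0.5,
# and every other bin is close to 0.
```

The two non-zero bins recover both which sinusoids are present and how strongly each contributes, which is the informal meaning of a spectrum given above.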

Spectrum example

Spectrum of a tenor saxophone note

Spectrograms

Spectrograms

Music and sound change over time. A spectrum does not provide any information about the time evolution of different frequencies. It just shows the relative contribution of each frequency to the mixture signal over the duration analyzed. In order to capture the time evolution of sound and music the standard approach is to segment the audio signal into small chunks (called windows or frames) and calculate the spectrum for each of these windows. The assumption is that during the relatively short period of analysis (typically less than a second) there is not much change and therefore the calculated short-time spectrum is an accurate representation of the underlying signal. The resulting sequence of spectra over time is called a spectrogram.
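The framing procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: frames do not overlap, no window function is applied, and a slow direct DFT is used. The names `magnitude_spectrum` and `spectrogram`, the frame size and the test signal are all hypothetical choices, not something specified in the slides.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0 .. N/2 - 1 of one frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        for k in range(n // 2)
    ]


def spectrogram(x, frame_size, hop):
    """Sequence of short-time spectra, one per frame of the signal."""
    starts = range(0, len(x) - frame_size + 1, hop)
    return [magnitude_spectrum(x[s:s + frame_size]) for s in starts]


FRAME = 32  # hypothetical frame size in samples
# A signal that changes over time: a low tone followed by a higher tone.
low = [math.sin(2 * math.pi * 3 * i / FRAME) for i in range(2 * FRAME)]
high = [math.sin(2 * math.pi * 7 * i / FRAME) for i in range(2 * FRAME)]

spec = spectrogram(low + high, FRAME, FRAME)

# For each frame, find the bin with the largest magnitude.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(peak_bins)  # [3, 3, 7, 7]
```

A single spectrum of the whole signal would show both tones but not their order; the per-frame peaks show that the low tone comes first.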

Examples of spectrograms

Spectrogram of a few tenor saxophone notes

Waterfall spectrogram view

Waterfall display using sndpeek

Why is DSP important for MIR?

A large amount of MIR research deals with audio signals.

Audio signals are represented digitally as very long sequences of numbers.

Digital Signal Processing techniques are essential in extracting information from audio signals.

The mathematical ideas behind DSP are amazing. For example, it is through DSP that you can understand how any sound that you can hear can be expressed as a sum of sine waves or represented as a long sequence of 1’s and 0’s.

DSP for MIR

Digital Signal Processing is a large field and therefore impossible to cover adequately in this course. The main goal of the lectures focusing on DSP will be to provide you with some intuition behind the main concepts and techniques that form the foundation of many MIR algorithms. I hope that they serve as a seed for growing a long-term passion and interest for DSP, and the textbook provides some pointers for further reading.

Sinusoids

We start our exposition by discussing sinusoids, which are elementary signals that are crucial in understanding both DSP concepts and the mathematical notation used to understand them. The ultimate goal of the DSP lectures is to make equations such as the following less intimidating and more meaningful:

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2 \pi f t}\, dt \tag{1}$$

What is a sinusoid?

A family of elementary signals that have a particular shape/pattern of repetition. sin(ωt) and cos(ωt) are particular examples of sinusoids that can be described by the more general equation:

$$x(t) = A \sin(\omega t + \phi) \tag{2}$$

where A is the amplitude, ω is the frequency (in radians per second) and φ is the phase. There is an infinite number of continuous periodic signals that belong to the sinusoid family. Each is characterized by three numbers: the amplitude, the frequency and the phase.
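A minimal sketch of the three numbers that characterize a sinusoid is given below. The function name `sinusoid` and the 440 Hz example are illustrative assumptions; the code only evaluates the general sinusoid equation at a given time and checks two of its basic properties.

```python
import math


def sinusoid(t, amplitude, omega, phase):
    """Value at time t of a sinusoid with the given amplitude,
    angular frequency omega (radians per second) and phase (radians)."""
    return amplitude * math.sin(omega * t + phase)


OMEGA = 2 * math.pi * 440.0    # hypothetical example: 440 cycles per second
PERIOD = 2 * math.pi / OMEGA   # time after which the pattern repeats

t = 0.001
# The signal repeats exactly after one period.
same_after_period = math.isclose(
    sinusoid(t, 0.8, OMEGA, 0.3),
    sinusoid(t + PERIOD, 0.8, OMEGA, 0.3),
    abs_tol=1e-9,
)
# A sine with a phase of pi/2 is a cosine, so both belong to the same family.
sine_shifted_is_cosine = math.isclose(
    sinusoid(t, 1.0, OMEGA, math.pi / 2),
    math.cos(OMEGA * t),
    abs_tol=1e-9,
)
```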

Figure: Simple sinusoids

4 motivating viewpoints for sinusoids

Solutions to the differential equations that describe simple systems of vibration

Family of signals that pass “unchanged” through LTI systems

Phasors (rotating vectors) providing geometric intuition about DSP concepts and notation

Basis functions of the Fourier Transform

Simple vibration I

Consider striking the tine of a tuning fork. The tine will deform, then be restored to its original position; inertia will make it overshoot and deform in the other direction, and the pattern will repeat. At any particular displacement x, Newton’s second law applies, with a restoring force proportional to the displacement (m is the mass and k the stiffness):

$$F = ma = -kx \tag{3}$$

The acceleration is the second derivative of the displacement x with respect to t, so the equation can be rewritten:

$$\frac{d^2 x}{dt^2} = -\frac{k}{m}\, x \tag{4}$$

Sinusoids satisfy the equation

We are looking for a signal x(t) that satisfies the equation describing simple vibrations, i.e., we are looking for a signal that is proportional to its second derivative.

$$\frac{d}{dt} \sin(\omega t) = \omega \cos(\omega t)$$

$$\frac{d^2}{dt^2} \sin(\omega t) = -\omega^2 \sin(\omega t) \tag{5}$$

Comparing (5) with (4) shows that sin(ωt) is a solution when ω² = k/m. So it turns out that sinusoidal signals arise as the solutions to the physics equations that describe simple systems of vibration that can potentially generate sound.
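One way to convince yourself of this is to integrate the vibration equation numerically and compare the result with a sinusoid of angular frequency sqrt(k/m). The sketch below is only an illustration: the values of k, m, the initial displacement and the time step are hypothetical, and the velocity Verlet integration scheme is a choice made here, not something taken from the slides.

```python
import math


def simulate_spring(k, m, x0, dt, steps):
    """Integrate d2x/dt2 = -(k/m) x with velocity Verlet.

    Starts at displacement x0 with zero velocity and returns the
    list of displacements at times 0, dt, 2*dt, ...
    """
    x, v = x0, 0.0
    a = -(k / m) * x
    xs = [x]
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -(k / m) * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
        xs.append(x)
    return xs


# Hypothetical parameters chosen for illustration.
K, M, X0, DT, STEPS = 4.0, 1.0, 1.0, 1e-3, 5000
OMEGA = math.sqrt(K / M)   # predicted angular frequency: omega^2 = k / m

trajectory = simulate_spring(K, M, X0, DT, STEPS)

# With zero initial velocity the matching sinusoid is x0 * cos(omega * t).
max_error = max(
    abs(x - X0 * math.cos(OMEGA * i * DT)) for i, x in enumerate(trajectory)
)
```

With these settings `max_error` stays very small over the whole simulated interval, so the simulated motion and the sinusoid are practically indistinguishable.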

Linear Time Invariant Systems

Definition

Systems are transformations of signals. They take as input a signal x(t) and produce a corresponding output signal y(t). Example: $y(t) = [x(t)]^2 + 5$.

LTI Systems

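Linearity means that one can calculate the output of the system for the sum of two input signals by summing the system outputs for each input signal individually. Formally, if $y_1(t) = S\{x_1(t)\}$ and $y_2(t) = S\{x_2(t)\}$, then $S\{x_1(t) + x_2(t)\} = y_{sum}(t) = y_1(t) + y_2(t)$. (A full definition of linearity also requires that scaling the input scales the output by the same factor.) Time invariance means that a shift in the input results in the same shift in the output.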
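The additivity property can be tested numerically on discrete signals. The sketch below is an illustration under assumptions: it checks a discrete version of the system y = x² + 5 and, for contrast, a two-point averaging system that is not mentioned in the slides. Passing the check for one pair of inputs is evidence of additivity, not a proof.

```python
def square_plus_five(x):
    """Discrete version of the example system y = x^2 + 5."""
    return [v * v + 5 for v in x]


def two_point_average(x):
    """y[n] = (x[n] + x[n-1]) / 2, taking x[-1] as 0 (hypothetical example)."""
    return [(x[n] + (x[n - 1] if n > 0 else 0.0)) / 2 for n in range(len(x))]


def is_additive(system, x1, x2, tol=1e-12):
    """Check S{x1 + x2} == S{x1} + S{x2} for one pair of inputs."""
    output_of_sum = system([a + b for a, b in zip(x1, x2)])
    sum_of_outputs = [a + b for a, b in zip(system(x1), system(x2))]
    return all(abs(a - b) <= tol for a, b in zip(output_of_sum, sum_of_outputs))


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 2.0, 0.0]

print(is_additive(square_plus_five, x1, x2))   # False
print(is_additive(two_point_average, x1, x2))  # True
```

The squaring system fails the check, so it is a system but not a linear one, while the averaging system behaves as linearity requires.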

Sinusoids and LTI Systems

When a sinusoid of frequency ω goes through an LTI system it “stays” in the family of sinusoids of frequency ω, i.e., only the amplitude and the phase are changed by the system. Because of linearity, this implies that if a complex signal is a sum of sinusoids of different frequencies then the system output will not contain any new frequencies. The behavior of the system can be completely understood by simply analyzing how it responds to elementary sinusoids. Examples of LTI systems in music: guitar body, vocal tract, outer ear, concert hall.
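This property can be observed on a very small LTI system. The sketch below uses a two-point average, y[n] = (x[n] + x[n-1]) / 2, which is a hypothetical example chosen here and not one from the slides. Its effect on a sinusoid of frequency ω (in radians per sample) is summarized by the complex number H = (1 + e^{-jω}) / 2: the magnitude of H is the amplitude change and its angle is the phase change.

```python
import cmath
import math


def two_point_average(x):
    """y[n] = (x[n] + x[n-1]) / 2 for n = 1 .. len(x) - 1."""
    return [(x[n] + x[n - 1]) / 2 for n in range(1, len(x))]


OMEGA = 2 * math.pi * 0.05   # hypothetical frequency, radians per sample
COUNT = 200

x = [math.sin(OMEGA * n) for n in range(COUNT)]
y = two_point_average(x)     # y[i] is the output at sample n = i + 1

# Complex gain of this system at frequency OMEGA.
H = (1 + cmath.exp(-1j * OMEGA)) / 2
gain, phase_shift = abs(H), cmath.phase(H)

# Prediction: same frequency, only amplitude and phase are changed.
predicted = [gain * math.sin(OMEGA * n + phase_shift) for n in range(1, COUNT)]
max_diff = max(abs(a - b) for a, b in zip(y, predicted))
```

Here `max_diff` is at the level of floating-point rounding, so the output is a sinusoid of the same frequency with a smaller amplitude and a shifted phase.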

Thinking in circles

Key insight

Think of a sinusoidal signal as a vector rotating at a constant speed in the plane (a phasor) rather than a single-valued signal that goes up and down.

Amplitude = Length

Frequency = Speed

Phase = Angle at time t

Projecting a phasor

The projection of the rotating vector or phasor on the x-axis is a cosine wave and on the y-axis a sine wave.

Notating a phasor

Complex numbers

An elegant notation system for describing and manipulating rotating vectors.

$$x + jy$$

where x is called the real part and y is called the imaginary part. If we represent a sinusoid as a rotating vector then using complex number notation we can simply write:

$$\cos(\omega t) + j \sin(\omega t)$$
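Python's built-in complex numbers make it easy to experiment with this notation. In the sketch below, the function name `phasor` and all parameter values are illustrative assumptions. It samples a rotating vector and reads off its two projections: the real part traces a cosine wave and the imaginary part a sine wave, while the length of the vector stays constant.

```python
import cmath
import math


def phasor(amplitude, omega, phase, t):
    """Rotating vector at time t, as a complex number.

    Length = amplitude, rotation speed = omega (radians per second),
    angle at time 0 = phase (radians).
    """
    return cmath.rect(amplitude, omega * t + phase)


# Hypothetical values chosen for illustration.
AMPLITUDE, OMEGA, PHASE = 0.5, 2 * math.pi * 2.0, 0.25

times = [n / 100 for n in range(100)]
points = [phasor(AMPLITUDE, OMEGA, PHASE, t) for t in times]

cosine_wave = [p.real for p in points]   # projection on the x-axis
sine_wave = [p.imag for p in points]     # projection on the y-axis
lengths = [abs(p) for p in points]       # always equal to AMPLITUDE
```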

Multiplication by j

Multiplication by j is an operation of rotation in the plane. You can think of it as a rotation by +90 degrees (counter-clockwise). Starting from 1 on the positive real axis, two successive rotations by +90 degrees bring us to the negative real axis, hence $j^2 = -1$. This geometric viewpoint shows that there is nothing imaginary or strange about complex numbers.
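A quick check of this rotation view, using Python's built-in complex type (where j is written `1j`). The starting point 3 + 4j is an arbitrary example.

```python
import cmath
import math

z = 3 + 4j                          # arbitrary example point in the plane
rotated_once = z * 1j               # rotated by +90 degrees
rotated_twice = rotated_once * 1j   # rotated by +180 degrees in total

print(rotated_once)    # (-4+3j)
print(rotated_twice)   # (-3-4j)

# The angle grows by a quarter turn and the length is unchanged.
angle_change = cmath.phase(rotated_once) - cmath.phase(z)
quarter_turn = math.isclose(angle_change, math.pi / 2)
same_length = math.isclose(abs(rotated_once), abs(z))
```

Two quarter turns give a half turn, which is the same as multiplying by -1, so `rotated_twice` equals `-z`.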

Complex number multiplication

Complex number addition is the same as vector addition, i.e., we add the x-coordinates (real parts) and the y-coordinates (imaginary parts). Complex numbers draw their power from multiplication. Complex number multiplication can be done by following the rules of algebra blindly and replacing $j^2$ with −1 when needed. However, complex number multiplication makes more sense when we represent the complex numbers as vectors in polar form. In polar form, complex number multiplication has the property that the magnitude of the product is the product of the magnitudes and the angle of the product is the sum of the angles. This is the underlying reason why complex numbers are a great notation for dealing with rotations.
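The polar-form property is easy to verify numerically. The magnitudes and angles below are arbitrary example values; `cmath.rect` builds a complex number from a magnitude and an angle, and `cmath.polar` converts back.

```python
import cmath
import math

# Arbitrary example values: magnitude 2 at 30 degrees, magnitude 3 at 45 degrees.
z1 = cmath.rect(2.0, math.radians(30))
z2 = cmath.rect(3.0, math.radians(45))

product = z1 * z2
magnitude, angle = cmath.polar(product)

# Magnitudes multiply (2 * 3 = 6) and angles add (30 + 45 = 75 degrees).
# Note: cmath.polar reports angles in (-pi, pi], so for larger angles
# the sum would appear wrapped around by a full turn.
magnitude_ok = math.isclose(magnitude, 6.0)
angle_ok = math.isclose(math.degrees(angle), 75.0)
```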

Euler’s formula

Key insight

The rotating vector that represents a sinusoid is just a single complex number raised to progressively higher and higher powers.

Consider a rotating vector of unit magnitude. Let E(θ) be the function that represents the vector at some arbitrary angle θ. Then from simple geometry:

$$E(\theta) = \cos(\theta) + j \sin(\theta)$$

and

$$\frac{dE(\theta)}{d\theta} = -\sin(\theta) + j \cos(\theta) = jE(\theta)$$

As can be seen, this is a function whose derivative is proportional to the function itself, and from calculus we know that only the exponential function has this property. Since E(0) = 1, we can write our function E(θ) as:

$$E(\theta) = e^{j\theta} \tag{6}$$

So now we can express the fact that a rotating vector arising from simple harmonic motion can be notated as a complex number raised to higher and higher powers using the famous Euler formula, named after the Swiss mathematician Leonhard Euler (1707-1783):

$$e^{j\theta} = \cos(\theta) + j \sin(\theta) \tag{7}$$
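Both the formula and the "higher and higher powers" insight can be checked numerically. The angle used below is an arbitrary example value.

```python
import cmath
import math

theta = 0.3   # arbitrary example angle in radians

# Euler's formula: both sides should agree.
left_side = cmath.exp(1j * theta)
right_side = complex(math.cos(theta), math.sin(theta))
euler_gap = abs(left_side - right_side)

# A rotating vector as powers of a single complex number:
# step**n sits at angle n * theta on the unit circle.
step = cmath.exp(1j * theta)
powers = [step ** n for n in range(8)]
expected = [cmath.exp(1j * n * theta) for n in range(8)]
powers_gap = max(abs(a - b) for a, b in zip(powers, expected))
```

Both `euler_gap` and `powers_gap` are at the level of floating-point rounding, so each successive power advances the vector by the same angle θ around the unit circle.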

Complex Conjugate

A bit of notation that will be used later. Given a complex number $z = Re^{j\theta}$, its complex conjugate is defined as $z^* = Re^{-j\theta}$. Geometrically, $z^*$ is the reflection of z in the real axis.
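A small numerical illustration of the conjugate, with arbitrary example values for R and θ. It also shows two standard consequences that are not stated on the slide: adding a number to its conjugate cancels the imaginary parts, and multiplying them gives the squared magnitude.

```python
import cmath
import math

R, THETA = 2.0, math.pi / 6   # arbitrary example magnitude and angle

z = cmath.rect(R, THETA)            # R * e^{j theta}
z_conj = z.conjugate()              # built-in complex conjugate
reflected = cmath.rect(R, -THETA)   # R * e^{-j theta}

conjugate_gap = abs(z_conj - reflected)

sum_with_conjugate = z + z_conj       # equals 2 * R * cos(theta), purely real
product_with_conjugate = z * z_conj   # equals R**2, purely real
```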

Adding sinusoids of the same frequency I

Adding sinusoids of the same frequency II

Geometric view of the property that sinusoids (phasors) of a particular frequency ω are closed under addition.
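The closure property can be sketched with complex numbers: adding two sinusoids of the same frequency amounts to adding their phasors at time 0 as vectors, and the resulting vector gives the amplitude and phase of the sum. The function name `add_sinusoids` and all numeric values are illustrative assumptions.

```python
import cmath
import math


def add_sinusoids(a1, phi1, a2, phi2):
    """Amplitude and phase of a1*cos(wt + phi1) + a2*cos(wt + phi2).

    The frequency does not appear: both phasors rotate at the same
    speed, so their vector sum keeps a fixed shape while rotating.
    """
    total = cmath.rect(a1, phi1) + cmath.rect(a2, phi2)
    return cmath.polar(total)


OMEGA = 2 * math.pi * 3.0   # hypothetical shared frequency
A1, PHI1 = 1.0, 0.0
A2, PHI2 = 0.5, math.pi / 3

A, PHI = add_sinusoids(A1, PHI1, A2, PHI2)


def direct_sum(t):
    return A1 * math.cos(OMEGA * t + PHI1) + A2 * math.cos(OMEGA * t + PHI2)


def single_sinusoid(t):
    return A * math.cos(OMEGA * t + PHI)


times = [n / 200 for n in range(200)]
max_gap = max(abs(direct_sum(t) - single_sinusoid(t)) for t in times)
```

The sample-by-sample sum and the single sinusoid agree to rounding error, so the sum is again a sinusoid of frequency ω.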

Negative frequencies and phasors

Measuring the amplitude of a sinusoid

Other DSP concepts with phasors

Many DSP concepts can be visualized and understood nicely using phasors. It is fun to create animations similar to the ones I showed in this lecture to illustrate concepts such as:

Sampling, Nyquist frequency and aliasing (taking snapshots of the phasor as it goes around the circle)

Filtering (effect of a simple low-pass filter)

Beating (phasors that are close in frequency)
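The sampling and aliasing item can be sketched by taking snapshots of a phasor at a fixed rate. The function name `phasor_snapshots`, the sampling rate of 8 snapshots per second and the chosen frequencies are illustrative assumptions. With this rate the Nyquist frequency is 4 Hz, and a 9 Hz phasor produces exactly the same snapshots as a 1 Hz phasor.

```python
import cmath
import math


def phasor_snapshots(freq_hz, sample_rate_hz, count):
    """Positions of a unit phasor of the given frequency, observed
    `count` times at `sample_rate_hz` snapshots per second."""
    return [
        cmath.exp(2j * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]


FS = 8.0   # hypothetical sampling rate; the Nyquist frequency is FS / 2

slow = phasor_snapshots(1.0, FS, 16)
fast = phasor_snapshots(1.0 + FS, FS, 16)   # one extra full turn per snapshot
backward = phasor_snapshots(FS - 1.0, FS, 16)

# The 9 Hz phasor is indistinguishable from the 1 Hz phasor.
alias_gap = max(abs(a - b) for a, b in zip(slow, fast))

# The 7 Hz phasor looks like a 1 Hz phasor rotating the other way.
mirror_gap = max(abs(a.conjugate() - b) for a, b in zip(slow, backward))
```

Both gaps are at the level of rounding error: frequencies above the Nyquist frequency cannot be told apart from lower ones once only the snapshots are available.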

Book that inspired this DSP exposition

A Digital Signal Processing Primer by Ken Steiglitz

Summary

Sinusoidal signals are fundamental in understanding DSP

Representing them as phasors (i.e., vectors rotating at a constant speed) can help in understanding several DSP concepts intuitively

Complex numbers are an elegant system for expressing rotations and can be used to notate phasors in a way that leverages our knowledge of algebra

Thinking this way makes $e^{j\omega t}$ more intuitive.
