CSC475 Music Information Retrieval (marsyas.cs.uvic.ca/mirBook/csc475_sinusoids.pdf)

CSC475 Music Information Retrieval

Sinusoids and DSP notation

George Tzanetakis

University of Victoria

Table of Contents I

1 Time and Frequency

2 Sinusoids and Phasors

Motivation

Frequently the computer science students who take this course have no background in Digital Signal Processing (DSP), so I always try to do a few lectures introducing some DSP fundamentals. An introduction to DSP typically requires an entire course, and learning DSP is a lifelong pursuit, so what one can do in a few lectures is rather limited. My goal is to stress intuition and attempt to demystify the basics of the mathematical notation used. In addition, DSP contains some beautiful mathematical ideas that connect the continuous mathematics of the physical world with the discrete mathematics needed by computers. I hope this material will motivate you to learn more about DSP.

Digital Audio Recordings

Recordings in analog media (like vinyl or magnetic tape) degrade over time

Digital audio representations can theoretically remain accurate, without any loss of information, through copying of patterns of bits.

MIR requires distilling information from an extremely large amount of data

Digitally storing 3 minutes of audio requires approximately 16 million numbers. A tempo extraction program must somehow convert these to a single numerical estimate of the tempo.
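The figure of roughly 16 million numbers can be reproduced with a short calculation. The sketch below assumes CD-quality stereo audio (44,100 samples per second, two channels); these parameters are an assumption, since the slide does not state them.

```python
# Assumed CD-quality parameters (not stated in the slides).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
NUM_CHANNELS = 2          # stereo
DURATION_S = 3 * 60       # three minutes

# One number is stored per sample, per channel.
num_samples = SAMPLE_RATE_HZ * NUM_CHANNELS * DURATION_S
print(f"{num_samples:,} numbers")  # 15,876,000 numbers
```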

Production and Perception of Periodic Sounds

Animal sound generation and perception

The sound generation and perception systems of animals have evolved to help them survive in their environment. From an evolutionary perspective the intentional sounds generated by animals should be distinct from the random sounds of the environment.

Repetition

Repetition is a key property of sounds that can make them more identifiable as coming from other animals (predators, prey, potential mates), and therefore animal hearing systems have evolved to be good at detecting periodic sounds.

Pitch Perception

When the same sound is repeated more than 10-20 times per second, instead of being perceived as a sequence of individual sound events it is fused into a single sonic event with a property we call pitch, which is related to the underlying period of repetition. Note that this fusion is something that our perception does; it does not reflect any underlying signal change other than the decrease of the repetition period.

Time-Frequency Representations

Music Notation

When listening to mixtures of sounds (including music) we are interested in when specific sounds take place (time) and what their source of origin is (pitch, timbre). This is also reflected in music notation, which fundamentally represents time from left to right and pitch from bottom to top.

Spectrum

Informal definition of Spectrum

A fundamental concept in DSP is the notion of a spectrum. Informally, complex sounds such as the ones produced by musical instruments and their combinations can be modeled as linear combinations of simple elementary sinusoidal signals with different frequencies. A spectrum shows how “much” each such basis sinusoidal component contributes to the overall mixture. It can be used to extract information about the sound such as its perceived pitch or what instrument(s) are playing. A spectrum corresponds to a short snapshot of the sound in time.
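As a rough illustration of this informal definition, the sketch below builds a mixture of two sinusoids and measures how much each frequency contributes using a direct (slow) discrete Fourier transform. The sample rate, the two frequencies and the helper name `dft_magnitudes` are hypothetical choices made for the example; practical code would use an FFT library.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return |X[k]| / N for k = 0 .. N//2 using a direct O(N^2) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

SAMPLE_RATE = 800  # Hz (hypothetical)
N = 200            # number of samples analyzed

# Mixture: 1.0 * sin(2*pi*100*t) + 0.5 * sin(2*pi*220*t)
mixture = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
           + 0.5 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
           for i in range(N)]

mags = dft_magnitudes(mixture)
bin_hz = SAMPLE_RATE / N  # frequency spacing between DFT bins

# The two largest bins identify the two sinusoidal components.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * bin_hz for k in strongest)
print(peak_freqs)
```

The frequencies here were picked to fall exactly on DFT bins, so the two peaks are clean; with arbitrary frequencies the energy spreads over neighboring bins.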

Spectrum example

Spectrum of a tenor saxophone note

Spectrograms

Music and sound change over time. A spectrum does not provide any information about the time evolution of different frequencies. It just shows the relative contribution of each frequency to the mixture signal over the duration analyzed. In order to capture the time evolution of sound and music, the standard approach is to segment the audio signal into small chunks (called windows or frames) and calculate the spectrum for each of these windows. The assumption is that during the relatively short period of analysis (typically less than a second) there is not much change, and therefore the calculated short-time spectrum is an accurate representation of the underlying signal. The resulting sequence of spectra over time is called a spectrogram.
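The windowing procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: frames do not overlap and no tapering window is applied, whereas practical spectrograms usually do both. The signal (a 100 Hz "note" followed by a 200 Hz "note"), the sample rate and the function names are hypothetical.

```python
import cmath
import math

def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a direct (slow) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size):
    """Split the signal into consecutive frames and take a spectrum of each."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [frame_spectrum(samples[s:s + frame_size]) for s in starts]

SAMPLE_RATE = 800  # Hz (hypothetical)
FRAME_SIZE = 80    # samples per frame, i.e. 0.1 seconds

note_a = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(160)]
note_b = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(160)]
signal = note_a + note_b

spec = spectrogram(signal, FRAME_SIZE)
bin_hz = SAMPLE_RATE / FRAME_SIZE

# Strongest frequency in each frame shows the change over time.
dominant = [max(range(len(f)), key=lambda k: f[k]) * bin_hz for f in spec]
print(dominant)
```

A single spectrum of the whole signal would show both frequencies but not their order; the per-frame view recovers when each one occurs.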

Examples of spectrograms

Spectrogram of a few tenor saxophone notes

Waterfall spectrogram view

Waterfall display using sndpeek

Why is DSP important for MIR?

A large amount of MIR research deals with audio signals.

Audio signals are represented digitally as very long sequences of numbers.

Digital Signal Processing techniques are essential in extracting information from audio signals.

The mathematical ideas behind DSP are amazing. For example, it is through DSP that you can understand how any sound that you can hear can be expressed as a sum of sine waves or represented as a long sequence of 1’s and 0’s.

DSP for MIR

Digital Signal Processing is a large field and therefore impossible to cover adequately in this course. The main goal of the lectures focusing on DSP will be to provide you with some intuition behind the main concepts and techniques that form the foundation of many MIR algorithms. I hope that they serve as a seed for growing a long-term passion and interest for DSP, and the textbook provides some pointers for further reading.

Sinusoids

We start our exposition by discussing sinusoids, which are elementary signals that are crucial in understanding both DSP concepts and the mathematical notation used to express them. The ultimate goal of the DSP lectures is to make equations such as the following less intimidating and more meaningful:

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2 \pi f t}\, dt \qquad (1)$$

What is a sinusoid?

A family of elementary signals that have a particular shape/pattern of repetition. sin(ωt) and cos(ωt) are particular examples of sinusoids that can be described by the more general equation:

$$x(t) = \sin(\omega t + \phi) \qquad (2)$$

where ω is the frequency and φ is the phase. There is an infinite number of continuous periodic signals that belong to the sinusoid family. Each is characterized by three numbers: the amplitude (a factor scaling the right-hand side of equation (2), equal to 1 as written), the frequency and the phase.

Figure: Simple sinusoids
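A minimal sketch of the three-parameter family follows. The function name `sinusoid` and the choice to specify frequency in Hz (so that ω = 2πf) are assumptions made for the example.

```python
import math

def sinusoid(amplitude, freq_hz, phase, t):
    """Value at time t (seconds) of amplitude * sin(omega * t + phase)."""
    omega = 2 * math.pi * freq_hz  # angular frequency in radians per second
    return amplitude * math.sin(omega * t + phase)

# sin and cos are members of the same family: cos is sin with phase pi/2.
t = 0.1
as_sine = sinusoid(1.0, 2.0, math.pi / 2, t)
as_cosine = math.cos(2 * math.pi * 2.0 * t)
print(abs(as_sine - as_cosine) < 1e-12)  # True

# Periodicity: the value repeats after one period T = 1 / freq_hz.
print(abs(sinusoid(0.8, 5.0, 0.3, t) - sinusoid(0.8, 5.0, 0.3, t + 1 / 5.0)) < 1e-12)  # True
```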

4 motivating viewpoints for sinusoids

Solutions to the differential equations that describe simple systems of vibration

Family of signals that pass “unchanged” through LTI systems

Phasors (rotating vectors) providing geometric intuition about DSP concepts and notation

Basis functions of the Fourier Transform

Simple vibration I

Consider striking the tine of a tuning fork. The tine will deform, then be restored to its original position; inertia will make it overshoot and deform in the other direction, and the pattern will repeat. At any particular displacement x, Newton’s second law applies, with a restoring force proportional to the displacement:

$$F = ma = -kx \qquad (3)$$

The acceleration is the second derivative of the displacement x with respect to t, so the equation can be rewritten:

$$\frac{d^2 x}{dt^2} = -(k/m)\, x \qquad (4)$$

Sinusoids satisfy the equation

We are looking for a signal x(t) that satisfies the equation describing simple vibrations, i.e. we are looking for a signal that is proportional to its second derivative.

$$\frac{d}{dt} \sin(\omega t) = \omega \cos(\omega t)$$

$$\frac{d^2}{dt^2} \sin(\omega t) = -\omega^2 \sin(\omega t) \qquad (5)$$

Comparing with equation (4), sin(ωt) is a solution when $\omega^2 = k/m$. So it turns out that sinusoidal signals arise as the solutions to the physics equations that describe simple systems of vibration that can potentially generate sound.
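The claim can be checked numerically. The sketch below approximates the second derivative of sin(ωt) with a central finite difference and confirms that it matches −(k/m)x when ω = sqrt(k/m). The values of K and M are hypothetical, and the finite-difference check is an illustration rather than a proof.

```python
import math

K = 4.0  # stiffness (hypothetical value)
M = 1.0  # mass (hypothetical value)
OMEGA = math.sqrt(K / M)  # frequency that makes sin(OMEGA * t) a solution

def x(t):
    """Candidate solution of d^2x/dt^2 = -(K/M) x."""
    return math.sin(OMEGA * t)

def second_derivative(f, t, h=1e-3):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

# Residual of the differential equation at a few sample times.
residuals = [abs(second_derivative(x, t) + (K / M) * x(t))
             for t in (0.0, 0.3, 1.0, 2.5)]
print(max(residuals) < 1e-4)  # True
```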

Linear Time Invariant Systems

Definition

Systems are transformations of signals. They take as input a signal x(t) and produce a corresponding output signal y(t). Example: $y(t) = [x(t)]^2 + 5$.

LTI Systems

Linearity means that one can calculate the output of the system for the sum of two input signals by summing the system outputs for each input signal individually. Formally, if $y_1(t) = S\{x_1(t)\}$ and $y_2(t) = S\{x_2(t)\}$ then $S\{x_1(t) + x_2(t)\} = y_{sum}(t) = y_1(t) + y_2(t)$. Time invariance means that a shift in the input results in the same shift in the output.
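The additivity property can be tested on short discrete signals. In the sketch below, a two-point average (a hypothetical example of an LTI system) passes the check, while the squaring example from the definition above fails it. Passing on one pair of inputs is only evidence, not a proof of linearity, and the function names are made up for the illustration.

```python
def two_point_average(x):
    """y[n] = (x[n] + x[n-1]) / 2, treating x[-1] as 0."""
    return [(x[n] + (x[n - 1] if n > 0 else 0.0)) / 2 for n in range(len(x))]

def square_plus_five(x):
    """The example system y = x^2 + 5, applied sample by sample."""
    return [v * v + 5 for v in x]

def is_additive(system, x1, x2, tol=1e-12):
    """Check S{x1 + x2} == S{x1} + S{x2} for one pair of inputs."""
    output_of_sum = system([a + b for a, b in zip(x1, x2)])
    sum_of_outputs = [a + b for a, b in zip(system(x1), system(x2))]
    return all(abs(a - b) <= tol for a, b in zip(output_of_sum, sum_of_outputs))

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 0.0, 2.0]

print(is_additive(two_point_average, x1, x2))  # True
print(is_additive(square_plus_five, x1, x2))   # False
```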

Sinusoids and LTI Systems

When a sinusoid of frequency ω goes through an LTI system it “stays” in the family of sinusoids of frequency ω, i.e. only the amplitude and the phase are changed by the system. Because of linearity this implies that if a complex signal is a sum of sinusoids of different frequencies then the system output will not contain any new frequencies. The behavior of the system can be completely understood by simply analyzing how it responds to elementary sinusoids. Examples of LTI systems in music: guitar body, vocal tract, outer ear, concert hall.
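To make this concrete, the sketch below feeds a sampled cosine through a two-point averaging filter (a hypothetical stand-in for an LTI system) and compares the output with a cosine of the same frequency whose amplitude and phase are predicted from the complex number H = (1 + e^{-jω}) / 2. The frequency value and variable names are assumptions for the example.

```python
import cmath
import math

OMEGA = 0.3  # frequency in radians per sample (hypothetical)
N = 50       # number of output samples

# Input cosine for n = -1 .. N-1, so that y[0] has a valid previous sample.
x = [math.cos(OMEGA * n) for n in range(-1, N)]

# LTI system: y[n] = (x[n] + x[n-1]) / 2 for n = 0 .. N-1.
y = [(x[i] + x[i - 1]) / 2 for i in range(1, len(x))]

# Response of this system to a phasor of frequency OMEGA.
H = (1 + cmath.exp(-1j * OMEGA)) / 2
gain, phase_shift = abs(H), cmath.phase(H)

# Same frequency, scaled amplitude, shifted phase.
predicted = [gain * math.cos(OMEGA * n + phase_shift) for n in range(N)]
max_error = max(abs(a - b) for a, b in zip(y, predicted))
print(max_error < 1e-12)  # True
```

The output is fully described by two numbers, the gain and the phase shift, which is why analyzing responses to sinusoids is enough to characterize the system.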

Thinking in circles

Key insight

Think of a sinusoidal signal as a vector rotating at a constant speed in the plane (a phasor) rather than a single-valued signal that goes up and down.

Amplitude = Length

Frequency = Speed

Phase = Angle at time t

Projecting a phasor

The projection of the rotating vector or phasor on the x-axis is a cosine wave and on the y-axis a sine wave.
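A small sketch of the projection idea, using a Python complex number to hold the tip of the rotating vector. The function name `phasor` and the parameter values are hypothetical.

```python
import cmath
import math

def phasor(amplitude, omega, phase, t):
    """Tip of a vector of given length rotating at omega radians per second."""
    return cmath.rect(amplitude, omega * t + phase)

AMPLITUDE = 2.0
OMEGA = 2 * math.pi  # one full rotation per second
t = 0.125            # one eighth of a rotation

p = phasor(AMPLITUDE, OMEGA, 0.0, t)

# x-axis projection is the cosine wave, y-axis projection is the sine wave.
print(abs(p.real - AMPLITUDE * math.cos(OMEGA * t)) < 1e-12)  # True
print(abs(p.imag - AMPLITUDE * math.sin(OMEGA * t)) < 1e-12)  # True
```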

Notating a phasor

Complex numbers

An elegant notation system for describing and manipulating rotating vectors.

$$x + jy$$

where x is called the real part and y is called the imaginary part. If we represent a sinusoid as a rotating vector then using complex number notation we can simply write:

$$\cos(\omega t) + j \sin(\omega t)$$

Multiplication by j

Multiplication by j is an operation of rotation in the plane. You can think of it as a rotation by 90 degrees counter-clockwise. Two successive rotations by 90 degrees bring us to the negative real axis, hence $j^2 = -1$. This geometric viewpoint shows that there is nothing imaginary or strange about complex numbers.
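The rotation view is easy to try out, since Python writes the imaginary unit as `1j`. The starting point `z` below is an arbitrary choice.

```python
import cmath
import math

z = 3 + 1j          # the point (3, 1) in the plane
rotated = z * 1j    # one quarter turn counter-clockwise

print(rotated)      # (-1+3j)
print(1j * 1j)      # (-1+0j), i.e. two quarter turns give j^2 = -1

# The length is unchanged and the angle grows by 90 degrees (pi / 2).
angle_change = cmath.phase(rotated) - cmath.phase(z)
print(abs(angle_change - math.pi / 2) < 1e-12)  # True

# Four quarter turns return to the starting point.
print(z * 1j * 1j * 1j * 1j == z)  # True
```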

Complex number multiplication

Complex number addition is the same as vector addition, i.e. we add the x-coordinates (real parts) and the y-coordinates (imaginary parts). Where complex numbers draw their power is when they are multiplied. Complex number multiplication can be done by following the rules of algebra blindly, and replacing $j^2$ with −1 when needed. However, complex number multiplication makes more sense when we represent the complex numbers as vectors in polar form. When represented in polar form, complex number multiplication has the property that the magnitude of the product is the product of the magnitudes and the angle of the product is the sum of the angles. This is the underlying reason why complex numbers are a great notation for dealing with rotations.
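Both ways of multiplying can be compared directly. The sketch below first multiplies in rectangular form, then checks the polar rule (magnitudes multiply, angles add) on two arbitrarily chosen numbers.

```python
import cmath

# Rectangular form: (1 + 2j)(3 + 4j) = 3 + 4j + 6j + 8j^2 = -5 + 10j
print((1 + 2j) * (3 + 4j))  # (-5+10j)

# Polar form: build numbers from (magnitude, angle) pairs.
a = cmath.rect(2.0, 0.5)  # magnitude 2, angle 0.5 radians
b = cmath.rect(3.0, 1.0)  # magnitude 3, angle 1.0 radians
product = a * b

magnitude, angle = cmath.polar(product)
print(abs(magnitude - 2.0 * 3.0) < 1e-12)  # True: magnitudes multiply
print(abs(angle - (0.5 + 1.0)) < 1e-12)    # True: angles add
```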

Euler’s formula

Key insight

The rotating vector that represents a sinusoid is just a single complex number raised to progressively higher and higher powers.

Consider a rotating vector of unit magnitude. Let E(θ) be the function that represents the vector at some arbitrary angle θ. Then from simple geometry:

$$E(\theta) = \cos(\theta) + j \sin(\theta)$$

and

$$\frac{dE(\theta)}{d\theta} = -\sin(\theta) + j \cos(\theta) = j E(\theta)$$

As can be seen, this is a function whose derivative is proportional to the function itself, and from calculus we know that only the exponential function has this property, so we can write our function E(θ) as:

$$E(\theta) = e^{j\theta} \qquad (6)$$

So now we can express the fact that a rotating vector arising from simple harmonic motion can be notated as a complex number raised to higher and higher powers, using the famous Euler formula, named after the Swiss mathematician Leonhard Euler (1707-1783):

$$e^{j\theta} = \cos(\theta) + j \sin(\theta) \qquad (7)$$
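Both the formula and the "higher and higher powers" insight can be checked numerically. In the sketch below, the angle `theta` and the choice of an eighth of a turn per step are arbitrary values picked for illustration.

```python
import cmath
import math

# Euler's formula at an arbitrary angle.
theta = 0.7
lhs = cmath.exp(1j * theta)
rhs = complex(math.cos(theta), math.sin(theta))
print(abs(lhs - rhs) < 1e-12)  # True

# A single complex number raised to successive powers walks around the circle.
step = cmath.exp(1j * 2 * math.pi / 8)       # one eighth of a turn
positions = [step ** n for n in range(9)]    # powers 0 .. 8

print(abs(positions[2] - 1j) < 1e-12)  # True: two steps is a quarter turn
print(abs(positions[8] - 1) < 1e-12)   # True: eight steps is a full turn
```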

Complex Conjugate

A bit of notation that will be used later. Given a complex number $z = R e^{j\theta}$, its complex conjugate is defined as $z^* = R e^{-j\theta}$. Geometrically $z^*$ is the reflection of z in the real axis.
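A short check of the definition, with arbitrarily chosen R and θ. It also shows two standard consequences that follow from the definition: z times its conjugate is R squared, and z plus its conjugate is the real number 2R cos(θ).

```python
import cmath
import math

R, theta = 2.0, 0.6
z = cmath.rect(R, theta)   # R * e^{j theta}
z_conj = z.conjugate()     # reflection in the real axis

# Same magnitude, opposite angle.
print(abs(z_conj - cmath.rect(R, -theta)) < 1e-12)  # True

# z * z_conj is real and equals R^2.
print(abs(z * z_conj - R * R) < 1e-12)  # True

# z + z_conj is real: the imaginary parts cancel, leaving 2 R cos(theta).
print(abs((z + z_conj) - 2 * R * math.cos(theta)) < 1e-12)  # True
```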

Adding sinusoids of the same frequency I

Adding sinusoids of the same frequency II

Geometric view of the property that sinusoids (phasors) of a particular frequency ω are closed under addition.
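The closure property can be verified by adding the two phasors as vectors at time zero and reading off the amplitude and phase of the result. The amplitudes, phases and frequency below are hypothetical values.

```python
import cmath
import math

OMEGA = 2 * math.pi * 3           # shared frequency (3 Hz)
amp1, phase1 = 1.0, 0.0
amp2, phase2 = 0.5, math.pi / 3

# Vector sum of the two phasors at t = 0.
total = cmath.rect(amp1, phase1) + cmath.rect(amp2, phase2)
amp_sum, phase_sum = abs(total), cmath.phase(total)

# The sum of the two cosines is one cosine of the same frequency.
times = [n / 100 for n in range(100)]
max_error = max(
    abs(amp1 * math.cos(OMEGA * t + phase1)
        + amp2 * math.cos(OMEGA * t + phase2)
        - amp_sum * math.cos(OMEGA * t + phase_sum))
    for t in times
)
print(max_error < 1e-12)  # True
```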

Negative frequencies and phasors

Measuring the amplitude of a sinusoid

Other DSP concepts with phasors

Many DSP concepts can be visualized and understood nicely using phasors. It is fun to create animations similar to the ones I showed in this lecture to illustrate concepts such as:

Sampling, Nyquist frequency and aliasing (taking snapshots of the phasor as it goes around the circle)

Filtering (effect of simple low-pass filter)

Beating (phasors that are close in frequency)
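As one illustration of the snapshot view of sampling, the sketch below samples a 1 Hz cosine and a 9 Hz cosine at 8 samples per second. Between snapshots the faster phasor completes one extra full turn, so the two sets of samples coincide (aliasing). The sample rate and frequencies are hypothetical values chosen so that the effect is exact.

```python
import math

SAMPLE_RATE = 8   # snapshots per second (Nyquist frequency is 4 Hz)
LOW_HZ = 1        # below the Nyquist frequency
HIGH_HZ = 9       # LOW_HZ + SAMPLE_RATE, above the Nyquist frequency

def sample_cosine(freq_hz, num_samples):
    """Snapshots of cos(2*pi*freq_hz*t) taken at t = n / SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]

low = sample_cosine(LOW_HZ, 16)
high = sample_cosine(HIGH_HZ, 16)

# The sampled values are indistinguishable up to rounding error.
max_difference = max(abs(a - b) for a, b in zip(low, high))
print(max_difference < 1e-9)  # True
```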

Book that inspired this DSP exposition

A Digital Signal Processing Primer by Ken Steiglitz

Summary

Sinusoidal signals are fundamental in understanding DSP

Representing them as phasors (i.e. vectors rotating at a constant speed) can help understand intuitively several concepts in DSP

Complex numbers are an elegant system for expressing rotations and can be used to notate phasors in a way that leverages our knowledge of algebra

Thinking this way makes $e^{j\omega t}$ more intuitive.
