
MUS421/EE367B Lecture 2
Review of the Discrete Fourier Transform (DFT)

Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA)
Department of Music, Stanford University
Stanford, California 94305

April 10, 2018

Outline

• Domains of Definition

• Discrete Fourier Transform

• Properties of the Fourier Transform

For more details, see

• Mathematics of the DFT (Music 320 text): http://ccrma.stanford.edu/~jos/mdft/

• Chapter 2 and Appendix B of Spectral Audio Signal Processing (our text): http://ccrma.stanford.edu/~jos/sasp/


Domains of Definition

The Fourier Transform can be defined for signals that are

• Discrete or Continuous Time

• Finite or Infinite Duration

This results in four cases:

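| Time | Finite duration | Infinite duration |
|---|---|---|
| Continuous time $t$ | Fourier Series (FS): $X(k) = \int_0^P x(t)\,e^{-j\omega_k t}\,dt$, $k = -\infty,\dots,+\infty$ | Fourier Transform (FT): $X(\omega) = \int_{-\infty}^{+\infty} x(t)\,e^{-j\omega t}\,dt$, $\omega \in (-\infty,+\infty)$ |
| Discrete time $n$ | Discrete FT (DFT): $X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j\omega_k n}$, $k = 0, 1, \dots, N-1$ | Discrete Time FT (DTFT): $X(\omega) = \sum_{n=-\infty}^{+\infty} x(n)\,e^{-j\omega n}$, $\omega \in (-\pi,+\pi)$ |
| Frequency | discrete freq. $k$ | continuous freq. $\omega$ |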

Geometric Interpretation of the Fourier Transform

In all four cases,

$$X(\omega) = \langle x, s_\omega \rangle$$

where $s_\omega$ is a complex sinusoid at radian frequency $\omega$ rad/s:

• $e^{j\omega t}$ (Fourier transform case),

• $e^{j\omega n}$ (DTFT case),

• $e^{j2\pi kn/N}$ (DFT case).

Geometrically, $X(\omega) = \langle x, s_\omega \rangle$ is proportional to the coefficient of projection of the signal $x$ onto the signal $s_\omega$.

Signal and Transform Notation

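• $n, k \in \mathbb{Z}$ (integers) or $\mathbb{Z}_N$ (integers modulo $N$)

• $x(n) \in \mathbb{R}$ (reals) or $\mathbb{C}$ (complex numbers)

• $x \in \mathbb{C}^N$ means $x$ is a length $N$ complex sequence

• $x = x(\cdot)$

• $X = \mathrm{DFT}(x) \in \mathbb{C}^N$, or $x \leftrightarrow X$, where “↔” is read as “corresponds to”.

• $X(k) = \mathrm{DFT}_k(x) = \mathrm{DFT}_{N,k}(x) \in \mathbb{C}$

• $x(n) = \mathrm{IDFT}_n(X) = \mathrm{IDFT}_{N,n}(X)$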

• For $x \in \mathbb{C}^\infty$, $X = \mathrm{DTFT}(x) = \mathrm{DFT}_\infty(x) \in \mathbb{C}^{\infty}_{2\pi}$

• $\overline{x}$ = conjugate of $x$

• $\angle x$ = phase of $x$

The notation $XY$ or $X \cdot Y$ denotes the vector containing $(XY)_k = X(k)Y(k)$, $k = 0, \dots, N-1$. This is denoted by ‘X .* Y’ in Matlab, where X and Y may be a pair of column vectors, or a pair of row vectors.

The Discrete Fourier Transform

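The “$k$th bin” of the Discrete Fourier Transform (DFT) is defined as

$$X(k) \triangleq \mathrm{DFT}_k(x) \triangleq \langle x, s_k \rangle \triangleq \sum_{n=0}^{N-1} x(t_n)\,e^{-j\omega_k t_n}$$

$$s_k(n) \triangleq e^{j\omega_k t_n}; \quad k = 0, 1, \dots, N-1$$

$$\omega_k \triangleq 2\pi\frac{k}{N} f_s = \frac{2\pi k}{NT}; \quad t_n \triangleq nT$$

We may interpret the DFT as the coefficients of projection of the signal vector $x$ onto the $N$ sinusoidal basis signals $s_k$, $k = 0, 1, \dots, N-1$:

$$X(k) = \langle x, s_k \rangle$$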
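As a rough illustration of the projection view (not from the original slides), the sketch below computes each bin as the inner product $\langle x, s_k \rangle$ and compares the result with the built-in fft. It assumes $T = 1$, so that $\omega_k = 2\pi k/N$, and uses an arbitrary test vector x.

```matlab
% Sketch: DFT bin k as the inner product <x, s_k>, assuming T = 1.
x = [1 2 3 4 0 -1 -2 -3];          % arbitrary test signal (assumption)
N = length(x);
n = 0:N-1;                          % time indices, t_n = n*T with T = 1
X = zeros(1,N);
for k = 0:N-1
    wk = 2*pi*k/N;                  % bin frequency omega_k in rad/sample
    sk = exp(1j*wk*n);              % basis sinusoid s_k(n)
    X(k+1) = sum(x .* conj(sk));    % <x, s_k> = sum_n x(n) conj(s_k(n))
end
err = max(abs(X - fft(x)));         % difference from the built-in FFT
```

The loop costs on the order of $N^2$ operations, so it is only meant to make the definition concrete.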

Inverse DFT

The inverse DFT is given by

$$x(t_n) = \sum_{k=0}^{N-1} \frac{\langle x, s_k \rangle}{\| s_k \|^2}\, s_k(t_n) = \frac{1}{N} \sum_{k=0}^{N-1} X(\omega_k)\, e^{j\omega_k t_n}$$

It can be interpreted as the superposition of the projections, i.e., the sum of the sinusoidal basis signals weighted by their respective coefficients of projection:

$$x = \sum_k \frac{\langle x, s_k \rangle}{\| s_k \|^2}\, s_k$$
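The superposition of projections can be checked numerically. The following is a minimal sketch, again assuming $T = 1$ and an arbitrary test vector, that rebuilds x by summing the basis sinusoids weighted by their projection coefficients.

```matlab
% Sketch: rebuild x as the sum of its projections onto the s_k (T = 1 assumed).
x = [1 2 3 4 0 -1 -2 -3];                % arbitrary test signal (assumption)
N = length(x);
n = 0:N-1;
xr = zeros(1,N);
for k = 0:N-1
    sk = exp(1j*2*pi*k*n/N);             % basis sinusoid s_k
    Xk = sum(x .* conj(sk));             % projection coefficient <x, s_k>
    xr = xr + (Xk / norm(sk)^2) * sk;    % ||s_k||^2 equals N
end
err = max(abs(xr - x));                  % reconstruction error
```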

The DFT, Cont’d

There are several ways to think about the DFT:

1. Projection onto the set of “basis” sinusoids (frequencies at $N$ roots of unity)

2. Coordinate transformation (“natural” $\mathbb{R}^N$ basis to “sinusoidal” basis)

3. Matrix multiplication $X = W^{*}x$, where $W^{*}[k,n] = e^{-j\omega_k t_n}$

4. Sampled uniform filter bank output

This course will emphasize interpretations 1 and 4.
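Interpretation 3 can be sketched directly. The code below builds the matrix $W^{*}$ for a small $N$ (with $T = 1$ assumed) and checks that the matrix product agrees with fft; the variable name Wstar is chosen here only for illustration.

```matlab
% Sketch: the DFT as a matrix multiplication X = W^* x (T = 1 assumed).
N = 8;
n = 0:N-1;                          % column index: time
k = (0:N-1).';                      % row index: frequency bin
Wstar = exp(-1j*2*pi*k*n/N);        % N-by-N matrix, W^*[k,n] = e^{-j w_k n}
x = [1 2 3 4 0 -1 -2 -3].';         % arbitrary column vector (assumption)
X = Wstar * x;                      % DFT by matrix multiplication
err = max(abs(X - fft(x)));         % difference from the built-in FFT
```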

Properties of the DFT

We will manipulate signals and their Fourier transforms throughout this class. It is important to understand how changes we make in one domain affect the other domain. The Fourier theorems are helpful for this purpose.

Derivations of the Fourier theorems for the DTFT case may be found in Chapter 2 of the text, and in Mathematics of the DFT [1] (Music 320 text) for the DFT case.

[1] http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html

Linearity

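$$\alpha x_1 + \beta x_2 \leftrightarrow \alpha X_1 + \beta X_2$$

or

$$\mathrm{DFT}(\alpha x_1 + \beta x_2) = \alpha \cdot \mathrm{DFT}(x_1) + \beta \cdot \mathrm{DFT}(x_2)$$

$$\alpha, \beta \in \mathbb{C}, \qquad x_1, x_2, X_1, X_2 \in \mathbb{C}^N$$

The Fourier Transform “commutes with mixing.”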

Symmetries for Real Signals

If a time-domain signal $x$ is real, then its Fourier transform $X$ is conjugate symmetric (Hermitian):

$$x \in \mathbb{R}^N \Leftrightarrow X(-k) = \overline{X(k)}$$

or

Real ↔ Hermitian

Hermitian symmetry implies

• Real part Symmetric (even): $\mathrm{re}\{X(-k)\} = \mathrm{re}\{X(k)\}$

• Imaginary part Antisymmetric (skew-symmetric, odd): $\mathrm{im}\{X(-k)\} = -\mathrm{im}\{X(k)\}$

• Magnitude Symmetric (even): $|X(-k)| = |X(k)|$

• Phase Antisymmetric (odd): $\angle X(-k) = -\angle X(k)$
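A quick numerical check of Hermitian symmetry, offered as a sketch with an arbitrary real test vector. Negative bin indices are taken modulo $N$, consistent with the DFT indexing convention.

```matlab
% Sketch: for real x, X(-k) equals conj(X(k)), with indices taken modulo N.
x = [1 2 3 4 0 -1 -2 -3];          % arbitrary real signal (assumption)
N = length(x);
X = fft(x);
k = 0:N-1;
Xneg = X(mod(-k, N) + 1);          % X(-k); the +1 converts to 1-based indexing
err = max(abs(Xneg - conj(X)));    % should be at round-off level
```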

Time Reversal

Definition:

$$\mathrm{Flip}_n(x) \triangleq x(-n) \triangleq x(N-n)$$

Note: $x(n) \triangleq x(n \bmod N)$ for signals in $\mathbb{C}^N$ (DFT case).

When computing a sampled DTFT using the DFT, we interpret time indices $n = 1, 2, \dots, N/2-1$ as positive time indices, and $n = N-1, N-2, \dots, N/2$ as the negative time indices $n = -1, -2, \dots, -N/2$. Under this interpretation, the Flip operator simply reverses a signal in time.

Fourier theorems:

$$\mathrm{Flip}(x) \leftrightarrow \mathrm{Flip}(X)$$

for $x \in \mathbb{C}^N$. In the typical special case of real signals ($x \in \mathbb{R}^N$), we have $\mathrm{Flip}(X) = \overline{X}$ so that

$$\mathrm{Flip}(x) \leftrightarrow \overline{X}$$

Time-reversing a real signal conjugates its spectrum.
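The Flip operator and both forms of the theorem can be sketched as follows. The index vector idx is a helper introduced here for illustration; it implements $-n$ modulo $N$ in Matlab's 1-based indexing.

```matlab
% Sketch: Flip via modulo-N indexing, and the two Flip theorems.
x = [1 2 3 4 0 -1 -2 -3];                    % arbitrary real signal (assumption)
N = length(x);
idx = mod(-(0:N-1), N) + 1;                  % 1-based positions of x(-n mod N)
flipx = x(idx);                              % Flip(x)
X = fft(x);
errFlip = max(abs(fft(flipx) - X(idx)));     % Flip(x) <-> Flip(X)
errConj = max(abs(fft(flipx) - conj(X)));    % real x: Flip(x) <-> conj(X)
```

Note that sample 0 stays in place under Flip, so this is not the same as simply reversing the vector.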

Shift Theorem

The Shift operator is defined as $\mathrm{Shift}_{l,n}(y) \triangleq y(n-l)$. Since indexing is defined modulo $N$, $\mathrm{Shift}_l(y)$ is a circular right-shift by $l$ samples.

$$\mathrm{Shift}_l(y) \leftrightarrow e^{-j(\cdot)l}\,Y$$

or, more loosely,

$$y(n-l) \leftrightarrow e^{-j\omega l}\,Y(\omega)$$

i.e.,

$$\mathrm{DFT}_k[\mathrm{Shift}_l(y)] = \left(e^{-j\omega_k l}\right) Y(\omega_k)$$

$e^{-j\omega_k l}$ = Linear Phase Term, slope $= -l$

• $\angle Y(\omega_k) \mathrel{+}= -\omega_k l$

• Multiplying a spectrum $Y$ by a linear phase term $e^{-j\omega_k l}$ with phase slope $-l$ corresponds to a circular right-shift in the time domain by $l$ samples:

  • negative slope ⇒ time delay

  • positive slope ⇒ time advance
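A small sketch of the shift theorem, assuming $T = 1$ and an arbitrary test vector. It uses circshift for the circular right-shift and compares the spectrum of the shifted signal with the original spectrum multiplied by the linear phase term.

```matlab
% Sketch: circular right-shift by l samples <-> multiplication by e^{-j w_k l}.
y = [1 2 3 4 0 -1 -2 -3];          % arbitrary test signal (assumption)
N = length(y);
l = 3;                              % shift in samples (assumption)
wk = 2*pi*(0:N-1)/N;                % bin frequencies omega_k
ys = circshift(y, [0 l]);           % ys(n) = y(n - l), indices modulo N
lhs = fft(ys);
rhs = exp(-1j*wk*l) .* fft(y);      % linear phase term with slope -l
err = max(abs(lhs - rhs));
```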

Convolution

The cyclic convolution of $x$ and $y$ is defined as

$$(x * y)(n) \triangleq \sum_{m=0}^{N-1} x(m)\,y(n-m), \quad x, y \in \mathbb{C}^N$$

Cyclic convolution is also called circular convolution, since $y(n-m) \triangleq y(n-m \pmod N)$.

Convolution is cyclic in the time domain for the DFT and FS cases, and acyclic for the DTFT and FT cases.

The Convolution Theorem is then

$$(x * y) \leftrightarrow X \cdot Y$$
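To make the definition concrete, the sketch below evaluates the cyclic convolution sum directly with modulo-$N$ indexing and compares it with the product of DFTs. The test vectors are arbitrary assumptions.

```matlab
% Sketch: direct cyclic convolution versus multiplication of DFTs.
x = [1 2 3 4];                      % arbitrary test signals (assumptions)
y = [1 0 -1 2];
N = length(x);
z = zeros(1,N);
for n = 0:N-1
    for m = 0:N-1
        z(n+1) = z(n+1) + x(m+1) * y(mod(n-m, N) + 1);   % y(n-m) modulo N
    end
end
zfft = real(ifft(fft(x) .* fft(y)));   % convolution theorem (real inputs)
err = max(abs(z - zfft));
```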

Linear Convolution of Short Signals

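x(t) → [ h ] → y(t) = (x ∗ h)(t)

Convolution theorem for DFTs:

$$(h * x) \leftrightarrow H \cdot X$$

or

$$\mathrm{DFT}_k(h * x) = H(\omega_k)\,X(\omega_k)$$

where $h, x \in \mathbb{C}^N$, and $H$ and $X$ are the $N$-point DFTs of $h$ and $x$, respectively.

The DFT performs circular (or cyclic) convolution:

$$y(n) \triangleq (x * h)(n) \triangleq \sum_{m=0}^{N-1} x(m)\,h(n-m)_N$$

where $(n-m)_N$ means “$(n-m)$ modulo $N$”.

Another way to look at this is as the inner product of $x$ and $\mathrm{Shift}_n[\mathrm{Flip}(h)]$, i.e.,

$$y(n) = \langle x, \mathrm{Shift}_n[\mathrm{Flip}(h)] \rangle$$

(for real $h$, since the inner product conjugates its second argument).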

FFT Convolution

The convolution theorem $h * x \leftrightarrow H \cdot X$ shows us that there are two ways to perform circular convolution.

• direct calculation of the summation = $O(N^2)$

• frequency-domain approach = $O(N \lg N)$

  • Fourier Transform both signals

  • Perform term by term multiplication of the transformed signals

  • Inverse transform the result to get back to the time domain

Remember ... this still gives us cyclic convolution.

Idea: If we add enough trailing zeros to the signals being convolved, we can get the same results as in acyclic convolution (in which the convolution summation goes from $m = 0$ to $\infty$).

Question: How many zeros do we need to add?

[Figure: $x$ (length $N_x$) ∗ $h$ (length $N_h$) = result of length $N_x + N_h - 1$, each signal zero-padded to length $N$.]

• If we perform an acyclic convolution of two signals, $x$ and $h$, with lengths $N_x$ and $N_h$, the resulting signal is length $N_y = N_x + N_h - 1$.

• Therefore, to implement acyclic convolution using the DFT, we must add enough zeros to $x$ and $h$ so that the cyclic convolution result is length $N_y$ or longer.

  • If we don’t add enough zeros, some of our convolution terms “wrap around” and add back upon others (due to modulo indexing).

  • This can be called time domain aliasing.

• We typically zero-pad even further (to the next power of 2) so we can use the Cooley-Tukey FFT for maximum speed.

A sampling-theorem based insight:

Zero-padding in the time domain results in more samples (closer spacing) in the frequency domain. This can be thought of as a higher ‘sampling rate’ in the frequency domain. If we have a high enough frequency-domain sampling rate, we can avoid time domain aliasing.
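Time domain aliasing can be seen in a small experiment. This sketch, using assumed test vectors, convolves with an FFT size that is too short and checks that the result equals the acyclic result with its tail wrapped around and added to the first samples. An FFT size of at least $N_x + N_h - 1$ removes the effect.

```matlab
% Sketch: too-short FFT size causes wrap-around (time domain aliasing).
x = [1 2 3 4 5 6];                  % test signals (assumptions)
h = [1 1 1];
ylin = conv(x, h);                  % acyclic reference, length Nx+Nh-1 = 8
nshort = 6;                         % too short: fewer than 8 points
yshort = real(ifft(fft(x, nshort) .* fft(h, nshort)));
tail = ylin(nshort+1:end);          % terms that do not fit in nshort samples
wrapped = ylin(1:nshort);
wrapped(1:numel(tail)) = wrapped(1:numel(tail)) + tail;   % tail wraps to the start
nok = 8;                            % long enough: no wrap-around
yok = real(ifft(fft(x, nok) .* fft(h, nok)));
errShort = max(abs(yshort - wrapped));
errOk = max(abs(yok - ylin));
```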

Example FFT Convolution

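    % matlab/fftconvexample.m
    % Acyclic convolution of x and h using zero-padded FFTs

    x = [1 2 3 4 5 6];
    h = [1 1 1];

    nx = length(x);
    nh = length(h);
    nfft = 2^nextpow2(nx+nh-1)      % next power of 2 >= nx+nh-1 (printed)

    xzp = [x, zeros(1,nfft-nx)];    % zero-pad x to length nfft
    hzp = [h, zeros(1,nfft-nh)];    % zero-pad h to length nfft
    X = fft(xzp);
    H = fft(hzp);

    Y = H .* X;                     % convolution theorem
    y = real(ifft(Y))               % drop round-off imaginary part (printed)

Program output:

    octave:10> fftconvexample
    nfft = 8
    y =
       1   3   6   9  12  15  11   6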

FFT Convolution vs. Direct Convolution

Let’s compare the number of operations needed to perform the convolution of two length $N$ sequences:

• It takes ≈ $N^2$ multiply/add operations to calculate the convolution summation directly.

• It takes on the order of $N \log(N)$ operations to compute an FFT. (Note: $H(\omega_k)$ can be calculated in advance for time-invariant filtering.)

| N | FFT | Direct Convolution |
|---|---|---|
| 4 | 176 | 16 |
| 32 | 2560 | 1024 |
| 64 | 5888 | 4096 |
| 128 | 13,312 | 16,384 |
| 256 | 29,696 | 65,536 |
| 2048 | 311,296 | 4,194,304 |

In this example (from Strum and Kirk), the FFT (software) beats direct time-domain convolution at length 128 and higher.

Correlation

The cross-correlation of $x$ and $y$ in $\mathbb{C}^N$ is defined as:

$$(x \star y)(n) \triangleq \sum_{m=0}^{N-1} \overline{x(m)}\,y(n+m), \quad x, y \in \mathbb{C}^N$$

Using this definition we have the correlation theorem:

$$(x \star y) \leftrightarrow \overline{X(\omega_k)}\,Y(\omega_k)$$

The correlation theorem is often used in the context of spectral analysis of filtered noise signals.

Autocorrelation

The autocorrelation of a signal $x \in \mathbb{C}^N$ is simply the cross-correlation of $x$ with itself:

$$(x \star x)(n) \triangleq \sum_{m=0}^{N-1} \overline{x(m)}\,x(m+n), \quad x \in \mathbb{C}^N$$

From the correlation theorem, we have

$$(x \star x) \leftrightarrow |X(\omega_k)|^2$$
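The correlation theorem can be checked with a direct sum. This sketch assumes the conjugate sits on the first argument of the cross-correlation and uses modulo-$N$ indexing; the complex test vectors are arbitrary.

```matlab
% Sketch: cyclic cross-correlation <-> conj(X) .* Y, and autocorrelation <-> |X|^2.
x = [1, 2+1j, 3, 4-2j];             % arbitrary complex signals (assumptions)
y = [2, -1, 1j, 1];
N = length(x);
rxy = zeros(1,N);
rxx = zeros(1,N);
for n = 0:N-1
    for m = 0:N-1
        p = mod(n+m, N) + 1;                    % index n+m modulo N, 1-based
        rxy(n+1) = rxy(n+1) + conj(x(m+1)) * y(p);
        rxx(n+1) = rxx(n+1) + conj(x(m+1)) * x(p);
    end
end
X = fft(x);
Y = fft(y);
errCross = max(abs(fft(rxy) - conj(X) .* Y));
errAuto = max(abs(fft(rxx) - abs(X).^2));
```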

Power Theorem

The inner product of two signals is defined as:

$$\langle x, y \rangle \triangleq \sum_n x_n \overline{y_n}$$

Using this notation, we have the following:

$$\langle x, y \rangle = \frac{1}{N} \langle X, Y \rangle$$

When we consider the inner product of a signal with itself, we have a special case known as Parseval’s Theorem:

$$\| x \|^2 = \langle x, x \rangle = \frac{1}{N} \langle X, X \rangle = \frac{\| X \|^2}{N}$$

(Also called Rayleigh’s Energy Theorem.)
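Both statements are easy to verify numerically. The following sketch uses arbitrary complex test vectors and the unnormalized DFT computed by fft, which is why the factor $1/N$ appears on the frequency-domain side.

```matlab
% Sketch: Power theorem and Parseval's theorem for the unnormalized DFT.
x = [1, 2+1j, 3, 4-2j];             % arbitrary complex signals (assumptions)
y = [2, -1, 1j, 1];
N = length(x);
X = fft(x);
Y = fft(y);
ipTime = sum(x .* conj(y));         % <x, y>
ipFreq = sum(X .* conj(Y)) / N;     % (1/N) <X, Y>
errPower = abs(ipTime - ipFreq);
errParseval = abs(norm(x)^2 - norm(X)^2 / N);
```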

Stretch

We define the Stretch operator such that:

$$\mathrm{Stretch}_L : \mathbb{C}^N \to \mathbb{C}^{NL}$$

That is, it transforms a length $N$ complex signal into a length $NL$ signal. Specifically, we do this by inserting $L-1$ zeros between each pair of samples of the signal.

[Figure: a signal $x$ and the stretched signal $y = \mathrm{Stretch}_2(x)$.]

Repeat or Scale

Similarly, the $\mathrm{Repeat}_L$ operator, defined on the unit circle, frequency-scales its input spectrum by the factor $L$:

$$\omega \leftarrow L\omega$$

The original spectrum is repeated $L$ times as $\omega$ traverses the unit circle. This is illustrated in the following diagram for $L = 3$:

[Figure: a spectrum $X$ and $Y = \mathrm{Repeat}_3(X)$, each plotted versus $\omega$.]

Using these definitions, we have the Stretch Theorem:

$$\mathrm{Stretch}_L(x) \leftrightarrow \mathrm{Repeat}_L(X)$$

Application: Upsampling by any integer factor $L$: Passing the stretched signal through an ideal lowpass filter cutting off at $\omega \ge \pi/L$ yields ideal bandlimited interpolation of the original signal by the factor $L$.
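In the DFT case the Stretch Theorem can be sketched in a few lines. Here stretching is done by writing x into every $L$th slot of a zero vector, and $\mathrm{Repeat}_L(X)$ is modeled with repmat; both choices are illustrative assumptions rather than the course's own code.

```matlab
% Sketch: Stretch_L in time <-> Repeat_L in frequency (DFT case).
x = [1 2 3 4];                      % arbitrary test signal (assumption)
L = 3;                              % stretch factor (assumption)
N = length(x);
y = zeros(1, N*L);
y(1:L:end) = x;                     % L-1 zeros inserted after each sample
X = fft(x);
Y = fft(y);
err = max(abs(Y - repmat(X, 1, L)));   % spectrum repeated L times around the circle
```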

Zero-Padding ↔ Interpolation

Zero padding in the time domain corresponds to ideal interpolation in the frequency domain.

Proof: http://ccrma.stanford.edu/~jos/mdft/Zero_Padding_Theorem_Spectral.html

Downsampling ↔ Aliasing

The downsampling operation $\mathrm{Downsample}_M$ selects every $M$th sample of a signal:

$$\mathrm{Downsample}_{M,n}(x) \triangleq x(Mn)$$

In the DFT case, $\mathrm{Downsample}_M$ maps $\mathbb{C}^N$ to $\mathbb{C}^{N/M}$, while for the DTFT, $\mathrm{Downsample}_M$ maps $\mathbb{C}^\infty$ to $\mathbb{C}^\infty$.

The Aliasing Theorem states that downsampling in time corresponds to aliasing in the frequency domain:

$$\mathrm{Downsample}_M(x) \leftrightarrow \frac{1}{M}\,\mathrm{Alias}_M(X)$$

where the Alias operator is defined for $X \in \mathbb{C}^N$ (DFT case) as

$$\mathrm{Alias}_{M,l}(X) \triangleq \sum_{k=0}^{M-1} X\!\left(l + k\frac{N}{M}\right), \quad l = 0, 1, \dots, \frac{N}{M}-1$$

For $X \in \mathbb{C}^\infty$ (DTFT case), the Alias operator is

$$\mathrm{Alias}_{M,\omega}(X) \triangleq \sum_{k=0}^{M-1} X\!\left(e^{j\left(\frac{\omega}{M} + k\frac{2\pi}{M}\right)}\right), \quad -\pi \le \omega < \pi$$

$$\triangleq \sum_{k=0}^{M-1} X\!\left(W_M^k\, z^{\frac{1}{M}}\right)$$

where $W_M \triangleq e^{j2\pi/M}$ is a common notation for the primitive $M$th root of unity, and $z = e^{j\omega}$ as usual. This normalization corresponds to $T = 1$ after downsampling. Thus, $T = 1/M$ prior to downsampling.

The summation terms above for $k \ne 0$ are called aliasing components.

The aliasing theorem points out that in order to downsample by factor $M$ without aliasing, we must first lowpass-filter the spectrum to $[-\pi f_s/M, \pi f_s/M]$. This filtering essentially zeroes out the spectral regions which alias upon sampling.
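For the DFT case, the Aliasing Theorem can be checked with a short sketch. The reshape call is an implementation convenience assumed here: it places the $M$ spectral blocks of length $N/M$ in columns, so that summing across columns forms $\mathrm{Alias}_M(X)$.

```matlab
% Sketch: Downsample_M(x) <-> (1/M) Alias_M(X) in the DFT case.
x = [1 2 3 4 0 -1 -2 -3];                 % arbitrary test signal (assumption)
M = 2;                                     % downsampling factor, must divide N
N = length(x);
xd = x(1:M:end);                           % x(M*n), n = 0..N/M-1
X = fft(x);
A = sum(reshape(X, N/M, M), 2).';          % A(l) = sum_k X(l + k*N/M)
err = max(abs(fft(xd) - A / M));
```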

Ideal Spectral Interpolation

Recall:

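$$X(\omega) \triangleq \langle x, s_\omega \rangle$$

where

$$s_\omega(t) \triangleq e^{j\omega t} \quad \text{(FT)}$$

$$s_\omega(t_n) \triangleq e^{j\omega t_n} \triangleq e^{j\omega n} \quad \text{(DTFT)}$$

For signals in the DTFT domain which happen to be time limited to $n \in [-N/2, N/2-1]$,

$$X(\omega) \triangleq \langle x, s_\omega \rangle = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} = \sum_{n=-N/2}^{N/2-1} x(n)\,e^{-j\omega n}$$

• This can be interpreted as a 0-centered DFT evaluated at $\omega$ instead of $\omega_k = 2\pi k/N$

• It arises as the DTFT of a finite-length signal

• Same as DFT plus infinite zero padding

• Such signals can be sampled at $\omega = \omega_k = 2\pi k/N$ without loss of information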

Meaning of Spectral Interpolation

• Let $X(\omega_k)$ denote the spectrum to be interpolated.

• Then the corresponding time signal is $x = \mathrm{IDFT}_N(X)$.

• We define the spectral interpolation $X(\omega)$ as the projection of our signal $x$ onto an arbitrary sinusoid $s_\omega = e^{j\omega nT}$.

• This is equivalent to $X(\omega) = \mathrm{DTFT}_\omega(x)$:

$$X(\omega) \triangleq \langle x, s_\omega \rangle = \sum_n x(n)\,e^{-j\omega nT} = \mathrm{DTFT}_\omega(\dots, 0, x, 0, \dots) \approx \mathrm{FFT}_{\omega_k}(\mathrm{ZeroPad}_L(x))$$

for some sufficiently large zero-padding factor $L$.

• In the Quadratically Interpolated FFT (QIFFT) method for measuring parameters of spectral peaks, we will choose $L$ to be sufficient in conjunction with quadratic interpolation of spectral log magnitude samples at each peak.

Interpolating a DFT

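Starting with a sampled spectrum $X(\omega_k)$, $k = 0, 1, \dots, N-1$, we may interpolate ideally by taking the DTFT of the zero-padded IDFT:

$$\begin{aligned}
X(\omega) &= \mathrm{DTFT}_\omega(\mathrm{ZeroPad}_\infty(\mathrm{IDFT}_N(X))) \\
&\triangleq \sum_{n=-N/2}^{N/2-1} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(\omega_k)\,e^{j\omega_k n} \right] e^{-j\omega n} \\
&= \sum_{k=0}^{N-1} X(\omega_k)\, \frac{1}{N} \sum_{n=-N/2}^{N/2-1} e^{j(\omega_k - \omega)n} \\
&= \sum_{k=0}^{N-1} X(\omega_k)\, \mathrm{asinc}_N(\omega - \omega_k) \\
&= \left\langle X, \mathrm{Sample}_{\Omega_N}(\mathrm{Shift}_\omega(\mathrm{asinc}_N)) \right\rangle \\
&= (X \circledast \mathrm{asinc}_N)_\omega,
\end{aligned}$$

where ⊛ denotes convolution between a discrete ($X$) and continuous (asinc) signal. (If math operators adapt to their argument types like perl functions, we can simply use ∗ as usual.)

• Zero-padding in the time domain corresponds to “$\mathrm{asinc}_N$ interpolation” in the frequency domain

• This is “ideal time-limited spectral interpolation”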

Practical Zero Padding

To interpolate a uniformly sampled spectrum $X(\omega_k)$ by the factor $L$, we may take the inverse DFT, append zeros, and take the FFT (which is very fast):

$$X(\omega_l) = \mathrm{FFT}_{LN,l}(\mathrm{ZeroPad}_{LN}(\mathrm{IDFT}_N(X))), \quad l = 0, \dots, LN-1$$

This operation creates $L-1$ new bins between each pair of original bins in $X$, thus increasing the number of spectral samples around the unit circle from $N$ to $LN$.

In matlab, we can specify zero-padding by simply providing the optional FFT-size argument:

    X = fft(x,N); % FFT size N > length(x)
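As a sketch of the interpolation recipe, the code below takes the inverse DFT of a sampled spectrum, zero-pads causally through the FFT-size argument, and confirms that every $L$th bin of the result reproduces an original bin. The test vector and the factor L = 4 are assumptions.

```matlab
% Sketch: interpolate a sampled spectrum by the factor L via zero padding.
x = [1 2 3 4 0 -1 -2 -3];          % arbitrary test signal (assumption)
N = length(x);
L = 4;                              % interpolation factor (assumption)
X = fft(x);                         % sampled spectrum, N bins
Xi = fft(ifft(X), L*N);             % IDFT, zero-pad to L*N, FFT
err = max(abs(Xi(1:L:end) - X));    % original bins are preserved
```

The $L-1$ bins between each pair of retained bins are the new interpolated samples.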

Reasons for Zero Padding (Spectral Interpolation)

• Zero-padding makes our FFTs look like DTFTs when displaying spectra.

• Zero-padding enables us to use the FFT with any window length $M$. When $M$ is not a power of 2, we append enough zeros to make the FFT size $N > M$ a power of 2.

• For sinusoidal peak-finding, spectral interpolation via zero-padding gets us closer to the true maximum of the main lobe when we simply take the maximum-magnitude FFT-bin as our estimate.

Zero Padding Examples

Let’s look at the effect of zero padding on the Fourier transform of the popular (causal) Hamming window:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{M}\right), \quad n = 0, 1, 2, \dots, M-1$$

where $M = 21$ in our examples.

We will look at shifts of the

• critically sampled window transform $W(\omega_k - \omega_0)$, and

• 2× oversampled window transform $W(\omega_{k'} - \omega_0)$

where $\omega_0 = 2\pi \cdot 3/M = 2\pi/7 \approx 0.9$ rad/samp is the normalized radian frequency of the test sinusoid to which the window is applied.

Critically Sampled Hamming Window Transform

Consider performing a length $M$ DFT on a length $M$ windowed signal:

• $N \triangleq$ DFT size $= M \triangleq$ Window length

• DFT frequency samples at $\omega_k = k\frac{2\pi}{M}$ (critically sampled DTFT)

• Window sequence and windowed-sinusoid spectrum:

[Figure: (a) Causal Hamming window, M = 21, no zero padding; amplitude versus time (samples). (b) Magnitude (dB) versus normalized frequency (radians/sample).]

• DFT bin width $= \frac{2\pi}{N} = \frac{2\pi}{M}$ (critically sampled)

• 4 samples per main lobe (Hamming window)

2X Oversampled Hamming Window Transform

Let’s now zero-pad by a factor of 2 in the time domain, before we perform our DFT:

• Zero-padding factor $L \triangleq \frac{N}{M} = 2$

• $N$ = DFT size = $2M$

• DFT frequency samples at $\omega_{k'} = k'\frac{2\pi}{N} = k'\frac{2\pi}{2M}$

Causal zero-padding by a factor of two ($L = 2$):

[Figure: (a) Causal Hamming window, M = 21, zero padding factor = 2; amplitude versus time (samples). (b) Magnitude (dB) versus normalized frequency (radians/sample).]

• DFT bin width $= \frac{1}{L}\frac{2\pi}{M} = \frac{2\pi}{2M}$ (2× oversampled)

• 8 samples per main lobe (Hamming window)

Oversampled Spectral Peaks

Note that zero-padding helps in finding the true peak of the sampled window transform.

[Figure: magnitude (linear) versus normalized frequency (radians/sample), zero pad factor = 8.]
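The benefit for peak finding can be sketched with the Hamming window defined earlier. The test frequency below is deliberately placed between critically sampled bins (3.3 bins, an assumption made for this illustration), and a complex sinusoid is used so that the spectrum is a single shifted window transform. The frequency of the maximum-magnitude bin is then compared with the true frequency for zero-padding factors 1 and 8.

```matlab
% Sketch: max-magnitude bin as a peak-frequency estimate, L = 1 versus L = 8.
M = 21;
n = 0:M-1;
w = 0.54 - 0.46*cos(2*pi*n/M);       % causal Hamming window from the text
w0 = 2*pi*3.3/M;                     % off-bin test frequency (assumption)
xw = w .* exp(1j*w0*n);              % windowed complex sinusoid
Lvals = [1 8];                       % zero-padding factors to compare
werr = zeros(size(Lvals));
for ii = 1:numel(Lvals)
    Nfft = Lvals(ii) * M;                    % FFT size N = L*M
    [~, kmax] = max(abs(fft(xw, Nfft)));     % 1-based index of the largest bin
    west = 2*pi*(kmax-1)/Nfft;               % frequency of that bin
    werr(ii) = abs(west - w0);               % estimation error in rad/sample
end
```

With more zero padding the bin spacing shrinks, so the worst-case error of this simple estimate shrinks with it.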

Zero-Centered Zero-Padding

[Figure: Blackman Windowed Sinusoid. (a) Blackman window overlaid with windowed data; amplitude versus time (samples). (b) Zero-padded and loaded into FFT input buffer, with regions labeled positive time and negative time; amplitude versus time (samples).]

• Use zero-centered zero padding with zero-phase windows

• Use causal zero padding with causal windows
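Zero-centered zero padding can be sketched as follows. The window length, FFT size, and the zero-phase Blackman formula written on a zero-centered time axis are assumptions for illustration. Samples at nonnegative times go to the start of the FFT buffer and samples at negative times go to the end, so a symmetric window yields a real spectrum.

```matlab
% Sketch: load a zero-phase window into an FFT buffer with zero-centered padding.
M = 31;                                  % odd window length (assumption)
N = 64;                                  % FFT size (assumption)
half = (M-1)/2;
n = -half:half;                          % zero-centered time axis
w = 0.42 + 0.5*cos(2*pi*n/(M-1)) + 0.08*cos(4*pi*n/(M-1));   % zero-phase Blackman
buf = zeros(1,N);
buf(1:half+1) = w(half+1:M);             % n = 0..half at the start of the buffer
buf(N-half+1:N) = w(1:half);             % n = -half..-1 at the end of the buffer
W = fft(buf);
maxImag = max(abs(imag(W)));             % round-off only: spectrum is real
```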

Zero-Centered Spectra

[Figure: (a) FFT magnitude data, as returned by the FFT; magnitude (linear) versus frequency (bins), with positive frequencies followed by negative frequencies. (b) FFT magnitude spectrum “rotated” to a more “physical” frequency axis in bin numbers, with negative frequencies followed by positive frequencies.]

fftshift

Matlab and Octave have a simple utility called fftshift that performs this bin rotation. Consider the following example:

    octave:4> fftshift([1 2 3 4])
    ans =
       3   4   1   2
    octave:5>

Note that both Matlab and Octave regard the spectral sample at half the sampling rate as a negative frequency.

For odd $N$, the only reasonable answer is

    octave:4> fftshift([1 2 3])
    ans =
       3   1   2
    octave:5>

corresponding to frequencies $-f_s/3$, $0$, $f_s/3$, respectively.
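A frequency axis to go with the rotated spectrum can be sketched as below. The sampling rate fs and the FFT size are assumed values; for even $N$ the bin at half the sampling rate is labeled as a negative frequency, consistent with the fftshift behavior noted above.

```matlab
% Sketch: bin numbers and frequencies matching fftshift output (even N).
N = 8;                                  % FFT size (assumption)
fs = 8000;                              % sampling rate in Hz (assumption)
k = 0:N-1;                              % bin numbers as returned by fft
kc = fftshift(k);                       % bin order after rotation
kc(kc >= N/2) = kc(kc >= N/2) - N;      % relabel the upper half as negative bins
f = kc * fs / N;                        % frequency axis in Hz
```

Plotting fftshift(abs(X)) against f then gives the zero-centered display described above.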