MUS421/EE367B Lecture 2
Review of the Discrete Fourier Transform (DFT)
Julius O. Smith III ([email protected])
Center for Computer Research in Music and Acoustics (CCRMA)
Department of Music, Stanford University
Stanford, California 94305
April 10, 2018
Outline
• Domains of Definition
• Discrete Fourier Transform
• Properties of the Fourier Transform
For more details, see
• Mathematics of the DFT (Music 320 text): http://ccrma.stanford.edu/~jos/mdft/
• Chapter 2 and Appendix B of Spectral Audio Signal Processing (our text): http://ccrma.stanford.edu/~jos/sasp/
Domains of Definition
The Fourier Transform can be defined for signals that are
• Discrete or Continuous Time
• Finite or Infinite Duration
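This results in four cases:
• Continuous time t, finite duration: Fourier Series (FS)
$$X(k) = \int_0^P x(t)\, e^{-j\omega_k t}\, dt, \qquad k = -\infty, \ldots, +\infty$$
• Continuous time t, infinite duration: Fourier Transform (FT)
$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt, \qquad \omega \in (-\infty, +\infty)$$
• Discrete time n, finite duration: Discrete FT (DFT)
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\omega_k n}, \qquad k = 0, 1, \ldots, N-1$$
• Discrete time n, infinite duration: Discrete Time FT (DTFT)
$$X(\omega) = \sum_{n=-\infty}^{+\infty} x(n)\, e^{-j\omega n}, \qquad \omega \in (-\pi, +\pi)$$
The finite-duration cases (FS, DFT) have a discrete frequency index k; the infinite-duration cases (FT, DTFT) have a continuous frequency variable ω.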
Geometric Interpretation of the Fourier Transform
In all four cases,
$$X(\omega) = \langle x, s_\omega \rangle$$
where $s_\omega$ is a complex sinusoid at radian frequency ω rad/s:
• $e^{j\omega t}$ (Fourier transform case),
• $e^{j\omega n}$ (DTFT case),
• $e^{j2\pi kn/N}$ (DFT case).
Geometrically, $X(\omega) = \langle x, s_\omega \rangle$ is proportional to the coefficient of projection of the signal x onto the signal $s_\omega$.
Signal and Transform Notation
• n, k ∈ Z (integers) or $\mathbb{Z}_N$ (integers modulo N)
• x(n) ∈ R (reals) or C (complex numbers)
• $x \in \mathbb{C}^N$ means x is a length N complex sequence
• x = x(·)
• $X = \mathrm{DFT}(x) \in \mathbb{C}^N$, or x ↔ X, where “↔” is read as “corresponds to”.
• $X(k) = \mathrm{DFT}_k(x) = \mathrm{DFT}_{N,k}(x) \in \mathbb{C}$
• $x(n) = \mathrm{IDFT}_n(X) = \mathrm{IDFT}_{N,n}(X)$
• For $x \in \mathbb{C}^\infty$, $X = \mathrm{DTFT}(x) = \mathrm{DFT}_\infty(x) \in \mathbb{C}^\infty_{2\pi}$
• $\overline{x}$ = conjugate of x
• ∠x = phase of x
The notation XY or X · Y denotes the vector containing $(XY)_k = X(k)Y(k)$, k = 0, . . . , N − 1. This is denoted by ‘X .* Y’ in Matlab, where X and Y may be a pair of column vectors, or a pair of row vectors.
The Discrete Fourier Transform
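The “kth bin” of the Discrete Fourier Transform (DFT) is defined as
$$X(k) \triangleq \mathrm{DFT}_k(x) \triangleq \langle x, s_k \rangle \triangleq \sum_{n=0}^{N-1} x(t_n)\, e^{-j\omega_k t_n}$$
$$s_k(n) \triangleq e^{j\omega_k t_n}, \qquad k = 0, 1, \ldots, N-1$$
$$\omega_k \triangleq 2\pi \frac{k}{N} f_s = \frac{2\pi k}{NT}, \qquad t_n \triangleq nT$$
We may interpret the DFT as the coefficients of projection of the signal vector x onto the N sinusoidal basis signals $s_k$, k = 0, 1, . . . , N − 1:
$$X(k) = \langle x, s_k \rangle$$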
Inverse DFT
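The inverse DFT is given by
$$x(t_n) = \sum_{k=0}^{N-1} \frac{\langle x, s_k \rangle}{\| s_k \|^2}\, s_k(t_n) = \frac{1}{N} \sum_{k=0}^{N-1} X(\omega_k)\, e^{j\omega_k t_n}$$
(the second form uses $\| s_k \|^2 = N$). It can be interpreted as the superposition of the projections, i.e., the sum of the sinusoidal basis signals weighted by their respective coefficients of projection:
$$x = \sum_k \frac{\langle x, s_k \rangle}{\| s_k \|^2}\, s_k$$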
The DFT, Cont’d
There are several ways to think about the DFT:
1. Projection onto the set of “basis” sinusoids (frequencies at N roots of unity)
2. Coordinate transformation (“natural” $\mathbb{R}^N$ basis to “sinusoidal” basis)
3. Matrix multiplication $X = W^* x$, where $W^*[k, n] = e^{-j\omega_k t_n}$
4. Sampled uniform filter bank output
This course will emphasize interpretations 1 and 4.
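Interpretation 3 can be checked numerically. The following is a minimal Matlab/Octave sketch, assuming T = 1 (so that $\omega_k t_n = 2\pi kn/N$); the length N = 8, the test vector, and the variable names are illustrative choices only.

```matlab
% Sketch: DFT as the matrix multiplication X = W^* x (assumes T = 1).
N = 8;                            % illustrative DFT length
x = (1:N).';                      % arbitrary length-N column vector
n = 0:N-1;                        % time indices (row vector)
k = n.';                          % bin indices (column vector)
Wstar = exp(-1j*2*pi*k*n/N);      % Wstar(k+1,n+1) = e^{-j*omega_k*t_n}
X = Wstar * x;                    % direct evaluation, order N^2 operations
err_fwd = max(abs(X - fft(x)));   % compare against the built-in FFT
xr = (Wstar' * X) / N;            % inverse DFT: conjugate transpose, scaled by 1/N
err_inv = max(abs(xr - x));       % should also be near machine precision
```

Both error values should be at round-off level, since the FFT is just a fast algorithm for this same matrix product.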
Properties of the DFT
We are going to be performing manipulations on signals and their Fourier transforms throughout this class. It is important to understand how changes we make in one domain affect the other domain. The Fourier theorems are helpful for this purpose.
Derivations of the Fourier theorems for the DTFT case may be found in Chapter 2 of the text, and in Mathematics of the DFT [1] (Music 320 text) for the DFT case.
[1] http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html
Linearity
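$$\alpha x_1 + \beta x_2 \leftrightarrow \alpha X_1 + \beta X_2$$
or
$$\mathrm{DFT}(\alpha x_1 + \beta x_2) = \alpha \cdot \mathrm{DFT}(x_1) + \beta \cdot \mathrm{DFT}(x_2)$$
$$\alpha, \beta \in \mathbb{C}, \qquad x_1, x_2, X_1, X_2 \in \mathbb{C}^N$$
The Fourier Transform “commutes with mixing.”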
Symmetries for Real Signals
If a time-domain signal x is real, then its Fourier transform X is conjugate symmetric (Hermitian):
$$x \in \mathbb{R}^N \;\Leftrightarrow\; X(-k) = \overline{X(k)}$$
or
Real ↔ Hermitian
Hermitian symmetry implies
• Real part Symmetric (even):
re X(−k) = re X(k)
• Imaginary part Antisymmetric (skew-symmetric, odd):
im X(−k) = −im X(k)
• Magnitude Symmetric (even):
|X(−k)| = |X(k)|
• Phase Antisymmetric (odd):
∠X(−k) = −∠X(k)
Time Reversal
Definition:
$$\mathrm{Flip}_n(x) \triangleq x(-n) \triangleq x(N-n)$$
Note: $x(n) \triangleq x(n \bmod N)$ for signals in $\mathbb{C}^N$ (DFT case).
When computing a sampled DTFT using the DFT, we interpret time indices n = 1, 2, . . . , N/2 − 1 as positive time indices, and n = N − 1, N − 2, . . . , N/2 as the negative time indices n = −1, −2, . . . , −N/2. Under this interpretation, the Flip operator simply reverses a signal in time.
Fourier theorems:
$$\mathrm{Flip}(x) \leftrightarrow \mathrm{Flip}(X)$$
for $x \in \mathbb{C}^N$. In the typical special case of real signals ($x \in \mathbb{R}^N$), we have $\mathrm{Flip}(X) = \overline{X}$ so that
$$\mathrm{Flip}(x) \leftrightarrow \overline{X}$$
Time-reversing a real signal conjugates its spectrum.
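As a quick numerical check of the real-signal case, here is a small Matlab/Octave sketch. The test signal and variable names are arbitrary illustrations; the modulo indexing implements Flip as defined above.

```matlab
% Sketch: Flip(x) <-> conj(X) for a real signal x (modulo-N indexing).
N = 8;
x = [3 1 4 1 5 9 2 6];            % arbitrary real test signal
n = 0:N-1;
flipx = x(mod(-n, N) + 1);        % Flip_n(x) = x(-n mod N); +1 for 1-based indexing
X = fft(x);
err = max(abs(fft(flipx) - conj(X)));   % should be at round-off level
```

Note that Flip leaves sample 0 in place and reverses the remaining N − 1 samples, which differs from a plain array reversal.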
Shift Theorem
The Shift operator is defined as $\mathrm{Shift}_{l,n}(y) \triangleq y(n-l)$. Since indexing is defined modulo N, $\mathrm{Shift}_l(y)$ is a circular right-shift by l samples.
$$\mathrm{Shift}_l(y) \leftrightarrow e^{-j(\cdot)l}\, Y$$
or, more loosely,
$$y(n-l) \leftrightarrow e^{-j\omega l}\, Y(\omega)$$
i.e.,
$$\mathrm{DFT}_k[\mathrm{Shift}_l(y)] = \left(e^{-j\omega_k l}\right) Y(\omega_k)$$
$e^{-j\omega_k l}$ = Linear Phase Term, slope = −l
• The phase $\angle Y(\omega_k)$ is incremented by $-\omega_k l$
• Multiplying a spectrum Y by a linear phase term $e^{-j\omega_k l}$ with phase slope −l corresponds to a circular right-shift in the time domain by l samples:
  • negative slope ⇒ time delay
  • positive slope ⇒ time advance
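The shift theorem can be verified with a few lines of Matlab/Octave. This is only a sketch: the signal, the shift l = 3, and the variable names are illustrative, and $\omega_k = 2\pi k/N$ assumes T = 1.

```matlab
% Sketch: circular right-shift by l samples <-> linear phase term e^{-j*omega_k*l}.
N = 8;  l = 3;
y  = [1 2 3 4 0 0 0 0];           % arbitrary test signal (row vector)
k  = 0:N-1;
wk = 2*pi*k/N;                    % omega_k with T = 1
ys = circshift(y, [0 l]);         % ys(n) = y(n-l), indices modulo N
err = max(abs(fft(ys) - exp(-1j*wk*l) .* fft(y)));   % round-off level
```

Here ys is [0 0 0 1 2 3 4 0], i.e., y delayed by three samples, and its spectrum equals Y multiplied by a phase term of slope −3.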
Convolution
The cyclic convolution of x and y is defined as
$$(x * y)(n) \triangleq \sum_{m=0}^{N-1} x(m)\, y(n-m), \qquad x, y \in \mathbb{C}^N$$
Cyclic convolution is also called circular convolution, since $y(n-m) \triangleq y((n-m) \bmod N)$.
Convolution is cyclic in the time domain for the DFT and FS cases, and acyclic for the DTFT and FT cases.
The Convolution Theorem is then
$$(x * y) \leftrightarrow X \cdot Y$$
Linear Convolution of Short Signals
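[Figure: block diagram of a filter h with input x(t) and output y(t) = (x ∗ h)(t).]
Convolution theorem for DFTs:
$$(h * x) \leftrightarrow H \cdot X$$
or
$$\mathrm{DFT}_k(h * x) = H(\omega_k)\, X(\omega_k)$$
where $h, x \in \mathbb{C}^N$, and H and X are the N-point DFTs of h and x, respectively.
The DFT performs circular (or cyclic) convolution:
$$y(n) \triangleq (x * h)(n) \triangleq \sum_{m=0}^{N-1} x(m)\, h(n-m)_N$$
where $(n-m)_N$ means “(n − m) modulo N”.
Another way to look at this is as the inner product of x and $\mathrm{Shift}_n[\mathrm{Flip}(h)]$, i.e.,
$$y(n) = \langle x, \mathrm{Shift}_n[\mathrm{Flip}(h)] \rangle$$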
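To make the circular summation concrete, the sketch below evaluates it directly and compares with the product of DFTs. It is written in Matlab/Octave; the short test signals and names are illustrative only.

```matlab
% Sketch: cyclic convolution by direct summation versus the DFT product.
x = [1 2 3 4];
h = [1 1 0 0];                    % h = delta(n) + delta(n-1)
N = length(x);
y = zeros(1, N);
for n = 0:N-1
  for m = 0:N-1
    y(n+1) = y(n+1) + x(m+1) * h(mod(n-m, N) + 1);   % h((n-m) mod N)
  end
end
yf  = real(ifft(fft(x) .* fft(h)));   % same result via the convolution theorem
err = max(abs(y - yf));
```

The direct sum gives y = [5 3 5 7]: each output is x(n) + x(n−1), and the first output wraps around to pick up x(3) = 4.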
FFT Convolution
The convolution theorem h ∗ x ↔ H · X shows us that there are two ways to perform circular convolution:
• direct calculation of the summation: $O(N^2)$
• frequency-domain approach: $O(N \lg N)$
  • Fourier transform both signals
  • Perform term-by-term multiplication of the transformed signals
  • Inverse transform the result to get back to the time domain
Remember: this still gives us cyclic convolution.
Idea: If we add enough trailing zeros to the signals being convolved, we can get the same results as in acyclic convolution (in which the convolution summation goes from m = 0 to ∞).
Question: How many zeros do we need to add?
[Figure: a length-$N_x$ signal convolved with a length-$N_h$ signal gives a result of length $N_x + N_h - 1$; all three are shown zero-padded to length N.]
• If we perform an acyclic convolution of two signals, x and h, with lengths $N_x$ and $N_h$, the resulting signal has length $N_y = N_x + N_h - 1$.
• Therefore, to implement acyclic convolution using the DFT, we must add enough zeros to x and h so that the cyclic convolution result is length $N_y$ or longer.
• If we don’t add enough zeros, some of our convolution terms “wrap around” and add back upon others (due to modulo indexing).
• This can be called time domain aliasing.
• We typically zero-pad even further (to the next power of 2) so we can use the Cooley-Tukey FFT for maximum speed.
A sampling-theorem based insight:
Zero-padding in the time domain results in more samples (closer spacing) in the frequency domain. This can be thought of as a higher ‘sampling rate’ in the frequency domain. If we have a high enough frequency-domain sampling rate, we can avoid time domain aliasing.
Example FFT Convolution
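% matlab/fftconvexample.m
x = [1 2 3 4 5 6];               % input signal
h = [1 1 1];                     % filter impulse response
nx = length(x);
nh = length(h);
nfft = 2^nextpow2(nx+nh-1)       % next power of 2 >= nx+nh-1 (no semicolon: prints)
xzp = [x, zeros(1,nfft-nx)];     % zero-pad both signals to length nfft
hzp = [h, zeros(1,nfft-nh)];
X = fft(xzp);
H = fft(hzp);
Y = H .* X;                      % spectral product <-> time-domain convolution
y = real(ifft(Y))                % discard round-off imaginary part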
Program output:
octave:10> fftconvexample
nfft = 8
y =
1 3 6 9 12 15 11 6
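For contrast, the following Matlab/Octave sketch convolves the same two signals with an FFT size that is too small (N = 6 instead of at least 8), so that time domain aliasing occurs. The variable names are illustrative.

```matlab
% Sketch: time-domain aliasing when the FFT is shorter than nx+nh-1.
x = [1 2 3 4 5 6];
h = [1 1 1];
N = length(x);                    % N = 6 < nx+nh-1 = 8: not enough zero padding
yc = round(real(ifft(fft(x, N) .* fft(h, N))))   % fft(h,N) zero-pads h to length N
% yc = 12 9 6 9 12 15
```

The last two samples of the acyclic result (11 and 6) have wrapped around and added to the first two (1 and 3), giving 12 and 9.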
FFT Convolution vs. Direct Convolution
Let’s compare the number of operations needed to perform the convolution of two length-N sequences:
• It takes ≈ $N^2$ multiply/add operations to calculate the convolution summation directly.
• It takes on the order of N · log(N) operations to compute an FFT. (Note: $H(\omega_k)$ can be calculated in advance for time-invariant filtering.)

   N       FFT        Direct Convolution
   4       176        16
   32      2560       1024
   64      5888       4096
   128     13,312     16,384
   256     29,696     65,536
   2048    311,296    4,194,304

In this example (from Strum and Kirk), the FFT (software) beats direct time-domain convolution at length 128 and higher.
Correlation
The cross-correlation of x and y in $\mathbb{C}^N$ is defined as:
$$(x \star y)(n) \triangleq \sum_{m=0}^{N-1} \overline{x(m)}\, y(n+m), \qquad x, y \in \mathbb{C}^N$$
Using this definition we have the correlation theorem:
$$(x \star y) \leftrightarrow \overline{X(\omega_k)}\, Y(\omega_k)$$
The correlation theorem is often used in the context of spectral analysis of filtered noise signals.
Autocorrelation
The autocorrelation of a signal $x \in \mathbb{C}^N$ is simply the cross-correlation of x with itself:
$$(x \star x)(n) \triangleq \sum_{m=0}^{N-1} \overline{x(m)}\, x(m+n), \qquad x \in \mathbb{C}^N$$
From the correlation theorem, we have
$$(x \star x) \leftrightarrow |X(\omega_k)|^2$$
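The autocorrelation case can be checked numerically. The Matlab/Octave sketch below assumes the conjugated, circular definition of correlation (first argument conjugated, indices modulo N); the complex test signal and names are arbitrary.

```matlab
% Sketch: DFT of the cyclic autocorrelation equals |X|^2.
N = 8;
x = [1 -2 3 0 1 4 -1 2] + 1j*[0 1 0 -1 2 0 1 0];   % arbitrary complex signal
r = zeros(1, N);
for n = 0:N-1
  for m = 0:N-1
    r(n+1) = r(n+1) + conj(x(m+1)) * x(mod(m+n, N) + 1);   % conj(x(m)) x(m+n)
  end
end
R   = abs(fft(x)).^2;             % |X(omega_k)|^2
err = max(abs(fft(r) - R));       % round-off level
```

The lag-zero value r(1) is the signal energy, the sum of |x(n)|^2.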
Power Theorem
The inner product of two signals is defined as:
$$\langle x, y \rangle \triangleq \sum_n x_n\, \overline{y_n}$$
Using this notation, we have the following:
$$\langle x, y \rangle = \frac{1}{N} \langle X, Y \rangle$$
When we consider the inner product of a signal with itself, we have a special case known as Parseval’s Theorem:
$$\| x \|^2 = \langle x, x \rangle = \frac{1}{N} \langle X, X \rangle = \frac{\| X \|^2}{N}$$
(Also called Rayleigh’s Energy Theorem.)
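A short Matlab/Octave sketch of both statements follows. The two length-4 test signals are arbitrary, and the inner product conjugates its second argument, as in the definition above.

```matlab
% Sketch: power theorem and Parseval's theorem for the DFT.
x = [1 2 3 4] + 1j*[0 -1 0 1];    % arbitrary complex signal
y = [2 0 -1 1];                   % arbitrary real signal
N = length(x);
X = fft(x);  Y = fft(y);
ip_time = sum(x .* conj(y));      % <x,y> = 3 + 1j for these signals
ip_freq = sum(X .* conj(Y)) / N;  % (1/N) <X,Y>
energy_time = sum(abs(x).^2);     % ||x||^2 = 32
energy_freq = sum(abs(X).^2) / N; % ||X||^2 / N
```

The time-domain and frequency-domain values agree to within round-off in both cases.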
Stretch
We define the Stretch operator such that:
$$\mathrm{Stretch}_L : \mathbb{C}^N \to \mathbb{C}^{NL}$$
This means that it transforms a length N complex signal into a length NL signal. Specifically, we do this by inserting L − 1 zeros in between each pair of samples of the signal.
[Figure: a signal x and y = Stretch_2(x), with one zero inserted between each pair of samples.]
Repeat or Scale
Similarly, the $\mathrm{Repeat}_L$ operator, defined on the unit circle, frequency-scales its input spectrum by the factor L:
$$\omega \leftarrow L\omega$$
The original spectrum is repeated L times as ω traverses the unit circle. This is illustrated in the following diagram for L = 3:
[Figure: a spectrum X versus ω, and Y = Repeat_3(X) versus ω, in which X appears three times around the unit circle.]
Using these definitions, we have the Stretch Theorem:
$$\mathrm{Stretch}_L(x) \leftrightarrow \mathrm{Repeat}_L(X)$$
Application: Upsampling by any integer factor L: Passing the stretched signal through an ideal lowpass filter that rejects frequencies |ω| ≥ π/L yields ideal bandlimited interpolation of the original signal by the factor L.
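The Stretch Theorem is easy to confirm in the DFT case. In this Matlab/Octave sketch, the signal, the factor L = 3, and the use of repmat to model Repeat are illustrative choices.

```matlab
% Sketch: Stretch_L(x) <-> Repeat_L(X) in the DFT case.
x = [1 2 3 4];  L = 3;
N = length(x);
y = zeros(1, N*L);
y(1:L:end) = x;                   % Stretch: L-1 zeros after each sample
Y    = fft(y);                    % length-NL spectrum of the stretched signal
Xrep = repmat(fft(x), 1, L);      % X repeated L times around the unit circle
err  = max(abs(Y - Xrep));        % round-off level
```

Lowpass filtering y to keep only the first spectral image would then complete the upsampling described above.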
Zero-Padding ↔ Interpolation
Zero padding in the time domain corresponds to ideal interpolation in the frequency domain.
Proof: http://ccrma.stanford.edu/~jos/mdft/Zero_Padding_Theorem_Spectral.html
Downsampling ↔ Aliasing
The downsampling operation $\mathrm{Downsample}_M$ selects every Mth sample of a signal:
$$\mathrm{Downsample}_{M,n}(x) \triangleq x(Mn)$$
In the DFT case, $\mathrm{Downsample}_M$ maps $\mathbb{C}^N$ to $\mathbb{C}^{N/M}$, while for the DTFT, $\mathrm{Downsample}_M$ maps $\mathbb{C}^\infty$ to $\mathbb{C}^\infty$.
The Aliasing Theorem states that downsampling in time corresponds to aliasing in the frequency domain:
$$\mathrm{Downsample}_M(x) \leftrightarrow \frac{1}{M}\, \mathrm{Alias}_M(X)$$
where the Alias operator is defined for $X \in \mathbb{C}^N$ (DFT case) as
$$\mathrm{Alias}_{M,l}(X) \triangleq \sum_{k=0}^{M-1} X\!\left(l + k\frac{N}{M}\right), \qquad l = 0, 1, \ldots, \frac{N}{M} - 1$$
For $X \in \mathbb{C}^\infty$ (DTFT case), the Alias operator is
$$\mathrm{Alias}_{M,\omega}(X) \triangleq \sum_{k=0}^{M-1} X\!\left(e^{j\left(\frac{\omega}{M} + k\frac{2\pi}{M}\right)}\right), \qquad -\pi \le \omega < \pi$$
$$\triangleq \sum_{k=0}^{M-1} X\!\left(W_M^k\, z^{\frac{1}{M}}\right)$$
where $W_M \triangleq e^{j2\pi/M}$ is a common notation for the primitive Mth root of unity, and $z = e^{j\omega}$ as usual. This normalization corresponds to T = 1 after downsampling. Thus, T = 1/M prior to downsampling.
The summation terms above for k ≠ 0 are called aliasing components.
The aliasing theorem points out that in order to downsample by factor M without aliasing, we must first lowpass-filter the spectrum to $[-\pi f_s/M, \pi f_s/M]$. This filtering essentially zeroes out the spectral regions which alias upon sampling.
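The DFT case of the Aliasing Theorem can be verified as follows. This Matlab/Octave sketch uses an arbitrary length-12 test signal and M = 3; the reshape-and-sum step is one possible way to form the Alias operator and is not taken from the lecture.

```matlab
% Sketch: Downsample_M(x) <-> (1/M) Alias_M(X) in the DFT case.
N = 12;  M = 3;
x  = cos(2*pi*(0:N-1)/N) + 0.5*(0:N-1);   % arbitrary test signal
xd = x(1:M:end);                  % x(Mn), n = 0, ..., N/M - 1
X  = fft(x);
% Column k+1 of the reshaped array holds X(l + k*N/M), l = 0, ..., N/M - 1,
% so summing across columns gives Alias_{M,l}(X).
A   = sum(reshape(X, N/M, M), 2).';
err = max(abs(fft(xd) - A/M));    % round-off level
```

Because this x is not lowpass-filtered first, the M = 3 spectral segments genuinely overlap in the sum.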
Ideal Spectral Interpolation
Recall:
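$$X(\omega) \triangleq \langle x, s_\omega \rangle$$
where
$$s_\omega(t) \triangleq e^{j\omega t} \quad \text{(FT)}$$
$$s_\omega(t_n) \triangleq e^{j\omega t_n} \triangleq e^{j\omega n} \quad \text{(DTFT)}$$
For signals in the DTFT domain which happen to be time limited to n ∈ [−N/2, N/2 − 1],
$$X(\omega) \triangleq \langle x, s_\omega \rangle = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} = \sum_{n=-N/2}^{N/2-1} x(n)\, e^{-j\omega n}$$
• This can be interpreted as a 0-centered DFT evaluated at ω instead of $\omega_k = 2\pi k/N$
• It arises as the DTFT of a finite-length signal
• Same as DFT plus infinite zero padding
• Such signals can be sampled at $\omega = \omega_k = 2\pi k/N$ without loss of information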
Meaning of Spectral Interpolation
• Let $X(\omega_k)$ denote the spectrum to be interpolated.
• Then the corresponding time signal is $x = \mathrm{IDFT}_N(X)$.
• We define the spectral interpolation X(ω) as the projection of our signal x onto an arbitrary sinusoid $s_\omega = e^{j\omega nT}$.
• This is equivalent to $X(\omega) = \mathrm{DTFT}_\omega(x)$:
$$X(\omega) \triangleq \langle x, s_\omega \rangle = \sum_n x(n)\, e^{-j\omega nT} = \mathrm{DTFT}_\omega(\ldots, 0, x, 0, \ldots) \approx \mathrm{FFT}_{\omega_k}(\mathrm{ZeroPad}_L(x))$$
for some sufficiently large zero-padding factor L.
• In the Quadratically Interpolated FFT (QIFFT) method for measuring parameters of spectral peaks, we will choose L to be sufficient in conjunction with quadratic interpolation of spectral log-magnitude samples at each peak.
Interpolating a DFT
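Starting with a sampled spectrum $X(\omega_k)$, k = 0, 1, . . . , N − 1, we may interpolate ideally by taking the DTFT of the zero-padded IDFT:
$$\begin{aligned}
X(\omega) &= \mathrm{DTFT}_\omega(\mathrm{ZeroPad}_\infty(\mathrm{IDFT}_N(X))) \\
&\triangleq \sum_{n=-N/2}^{N/2-1} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(\omega_k)\, e^{j\omega_k n} \right] e^{-j\omega n} \\
&= \sum_{k=0}^{N-1} X(\omega_k)\, \frac{1}{N} \sum_{n=-N/2}^{N/2-1} e^{j(\omega_k - \omega)n} \\
&= \sum_{k=0}^{N-1} X(\omega_k)\, \mathrm{asinc}_N(\omega - \omega_k) \\
&= \left\langle X, \mathrm{Sample}_{\Omega_N}(\mathrm{Shift}_\omega(\mathrm{asinc}_N)) \right\rangle \\
&= (X \circledast \mathrm{asinc}_N)_\omega,
\end{aligned}$$
where ⊛ denotes convolution between a discrete (X) and continuous (asinc) signal. (If math operators adapt to their argument types like Perl functions, we can simply use ∗ as usual.)
• Zero-padding in the time domain corresponds to “$\mathrm{asinc}_N$ interpolation” in the frequency domain
• This is “ideal time-limited spectral interpolation”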
Practical Zero Padding
To interpolate a uniformly sampled spectrum $X(\omega_k)$ by the factor L, we may take the inverse DFT, append zeros, and take the FFT (which is very fast):
$$X(\omega_l) = \mathrm{FFT}_{LN,l}(\mathrm{ZeroPad}_{LN}(\mathrm{IDFT}_N(X))), \qquad l = 0, \ldots, LN - 1$$
This operation creates L − 1 new bins between each pair of original bins in X, thus increasing the number of spectral samples around the unit circle from N to LN.
In Matlab, we can specify zero-padding by simply providing the optional FFT-size argument:
X = fft(x,N); % FFT size N > length(x)
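The three steps (inverse DFT, zero-pad, FFT) can be sketched in Matlab/Octave as below. The spectrum being interpolated is generated from an arbitrary test signal, causal zero padding is assumed, and L = 4 is an illustrative choice.

```matlab
% Sketch: interpolate a sampled spectrum by the factor L via zero padding.
X = fft([1 2 3 4 3 2 1 0]);       % spectrum to interpolate (N = 8 bins)
N = length(X);  L = 4;
x  = ifft(X);                     % step 1: inverse DFT
Xi = fft(x, L*N);                 % steps 2-3: zero-pad to length LN and take the FFT
err = max(abs(Xi(1:L:end) - X));  % every Lth interpolated bin is an original bin
```

The interpolated spectrum Xi has LN = 32 bins, with L − 1 = 3 new bins between each pair of original ones.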
Reasons for Zero Padding (Spectral Interpolation)
• Zero-padding makes our FFTs look like DTFTs when displaying spectra.
• Zero-padding enables us to use the FFT with any window length M. When M is not a power of 2, we append enough zeros to make the FFT size N > M a power of 2.
• For sinusoidal peak-finding, spectral interpolation via zero-padding gets us closer to the true maximum of the main lobe when we simply take the maximum-magnitude FFT-bin as our estimate.
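The peak-finding point can be illustrated with a small Matlab/Octave experiment. Everything specific here is an assumption made for the sketch: a Hamming window of the form 0.54 − 0.46 cos(2πn/M) with M = 21, a complex test sinusoid placed between bins at 3.3 bin widths, and zero-padding factors 1 and 8.

```matlab
% Sketch: the max-magnitude bin lies closer to the true peak with zero padding.
M  = 21;
n  = 0:M-1;
w  = 0.54 - 0.46*cos(2*pi*n/M);   % causal Hamming window
w0 = 2*pi*3.3/M;                  % true frequency, deliberately off-bin
x  = w .* exp(1j*w0*n);           % windowed complex sinusoid
L  = [1 8];                       % zero-padding factors to compare
est = zeros(size(L));
for i = 1:numel(L)
  Nfft = L(i)*M;
  [~, kmax] = max(abs(fft(x, Nfft)));   % index of the max-magnitude bin
  est(i) = 2*pi*(kmax-1)/Nfft;          % its frequency in rad/sample
end
err = abs(est - w0);              % frequency error for each factor
```

With no zero padding the estimate is off by 0.3 bin widths of the length-M DFT; with L = 8 the error drops to 0.05 of that bin width, since the bin spacing is 8 times finer.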
Zero Padding Examples
Let’s look at the effect of zero padding on the Fourier transform of the popular (causal) Hamming window:
$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{M}\right), \qquad n = 0, 1, 2, \ldots, M - 1$$
where M = 21 in our examples.
We will look at shifts of the
• critically sampled window transform $W(\omega_k - \omega_0)$, and
• 2× oversampled window transform $W(\omega_{k'} - \omega_0)$
where $\omega_0 = 2\pi \cdot 3/M = 2\pi/7 \approx 0.9$ rad/samp is the normalized radian frequency of the test sinusoid to which the window is applied.
Critically Sampled Hamming Window Transform
Consider performing a length M DFT on a length M windowed signal:
• $N \triangleq$ DFT size = $M \triangleq$ Window length
• DFT frequency samples at $\omega_k = k\frac{2\pi}{M}$ (critically sampled DTFT)
• Window sequence and windowed-sinusoid spectrum:
[Figure: (a) Causal Hamming window, M = 21, no zero padding; amplitude vs. time (samples). (b) Magnitude (dB) vs. normalized frequency (radians/sample).]
• DFT bin width = $\frac{2\pi}{N} = \frac{2\pi}{M}$ (critically sampled)
• 4 samples per main lobe (Hamming window)
2X Oversampled Hamming Window Transform
Let’s now zero-pad by a factor of 2 in the time domain, before we perform our DFT:
• Zero-padding factor $L \triangleq N/M = 2$
• N = DFT size = 2M
• DFT frequency samples at $\omega_{k'} = k'\frac{2\pi}{N} = k'\frac{2\pi}{2M}$
Causal zero-padding by a factor of two (L = 2):
[Figure: (a) Causal Hamming window, M = 21, zero padding factor = 2; amplitude vs. time (samples). (b) Magnitude (dB) vs. normalized frequency (radians/sample).]
• DFT bin width = $\frac{1}{L}\frac{2\pi}{M} = \frac{2\pi}{2M}$ (2× oversampled)
• 8 samples per main lobe (Hamming window)
Oversampled Spectral Peaks
Note that zero-padding helps in finding the true peak of the sampled window transform.
[Figure: magnitude (linear) vs. normalized frequency (radians/sample); zero pad factor = 8.]
Zero-Centered Zero-Padding
[Figure: Blackman Windowed Sinusoid. (a) Blackman window overlaid with windowed data; amplitude vs. time (samples), −15 to 15. (b) Zero-padded and loaded into FFT input buffer; amplitude vs. time (samples), 0 to 60, with positive time at the start of the buffer and negative time at the end.]
• Use zero-centered zero padding with zero-phase windows
• Use causal zero padding with causal windows
Zero-Centered Spectra
[Figure: (a) FFT magnitude data, as returned by the FFT; magnitude (linear) vs. frequency (bins), 0 to 60, positive frequencies followed by negative frequencies. (b) FFT magnitude spectrum “rotated” to a more “physical” frequency axis in bin numbers, −30 to 30, negative frequencies followed by positive frequencies.]
fftshift
Matlab and Octave have a simple utility called fftshift that performs this bin rotation. Consider the following example:
octave:4> fftshift([1 2 3 4])
ans =
3 4 1 2
octave:5>
Note that both Matlab and Octave regard the spectral sample at half the sampling rate as a negative frequency.
For odd N, the only reasonable answer is
octave:4> fftshift([1 2 3])
ans =
3 1 2
octave:5>
corresponding to frequencies $-f_s/3$, 0, $f_s/3$, respectively.
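When plotting a rotated spectrum, a matching bin-number axis is needed. The Matlab/Octave sketch below shows one way to build it for both even and odd N; the formula (0:N−1) − floor(N/2) is an illustrative choice consistent with the two fftshift examples above, not something stated in the lecture.

```matlab
% Sketch: signed bin-number axis that matches the output order of fftshift.
ok = true;
for N = [4 3]
  k     = fftshift(0:N-1);            % original bin numbers, in rotated order
  kneg  = k - N*(k >= N/2);           % map bins at or above N/2 to negative numbers
  kaxis = (0:N-1) - floor(N/2);       % proposed axis: -floor(N/2), ..., ceil(N/2)-1
  ok = ok && isequal(kneg, kaxis);    % the two constructions agree
end
```

For N = 4 the axis is [−2 −1 0 1] (bin N/2 treated as negative), and for N = 3 it is [−1 0 1]; multiplying by $f_s/N$ converts bin numbers to frequencies.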