Linear Convolution Using DFT

Recall that linear convolution is

$$x_3[n] = \sum_{m=-\infty}^{\infty} x_1[m]\, x_2[n-m].$$

When the lengths of x1[n] and x2[n] are L and P, respectively, the length of x3[n] is L+P−1.

Thus a useful property is that the linear convolution of two finite-length sequences (of lengths L and P, respectively) is equivalent to the N-point circular convolution (N ≥ L+P−1) of the two N-point sequences obtained by zero padding.

Another useful property is that we can perform a shorter circular convolution and see how many points remain the same as those of the linear convolution. When P < L and an L-point circular convolution is performed, the first (P−1) points are corrupted by circulation (time aliasing), and the remaining points from n=P−1 to n=L−1 (i.e., the last L−P+1 points) are the same as the linear convolution result.
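The two properties above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `linear_convolve` and `circular_convolve` and the sample sequences are illustrative assumptions, not part of the original notes.

```python
def linear_convolve(x1, x2):
    """Direct linear convolution; the result has len(x1) + len(x2) - 1 points."""
    y = [0] * (len(x1) + len(x2) - 1)
    for m, a in enumerate(x1):
        for k, b in enumerate(x2):
            y[m + k] += a * b
    return y


def circular_convolve(x1, x2, N):
    """N-point circular convolution; both inputs are zero-padded to length N."""
    a = list(x1) + [0] * (N - len(x1))
    b = list(x2) + [0] * (N - len(x2))
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]


x1 = [1, 2, 3, 4, 5]   # length L = 5
x2 = [1, -1, 2]        # length P = 3
L, P = len(x1), len(x2)

lin = linear_convolve(x1, x2)                   # L + P - 1 = 7 points
padded = circular_convolve(x1, x2, L + P - 1)   # equals the linear convolution
short = circular_convolve(x1, x2, L)            # first P - 1 points are aliased

print(lin)     # [1, 1, 3, 5, 7, 3, 10]
print(padded)  # [1, 1, 3, 5, 7, 3, 10]
print(short)   # [4, 11, 3, 5, 7]
```

In the L-point result, only the first P−1 = 2 points differ from the linear convolution; the points n = 2, 3, 4 agree.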

Block convolution (for implementing an FIR filter)

FIR filtering is equal to the linear convolution of a (possibly) infinite-length input sequence with a finite-length impulse response.

To avoid delay in processing, and also to make the computation efficient, we would like to segment the signal into sections of length L. Each L-length section can then be convolved with the finite-length impulse response, and the filtered sections fitted together in an appropriate way. This is called block convolution.

When each section is sufficiently large, we usually use circular convolution (instead of linear convolution) to compute each section, since it will be shown that there are fast algorithms, the fast Fourier transform (FFT), that compute circular convolutions highly efficiently.

Two methods for circular-convolution-based block convolution: the overlapping-add method and the overlapping-save method.

Overlapping-add method (for implementing an FIR filter)

When segmenting into L-length segments, the signal x[n] can be represented as

$$x[n] = \sum_{r=-\infty}^{\infty} x_r[n - rL],$$

where

$$x_r[n] = \begin{cases} x[n + rL], & 0 \le n \le L-1 \\ 0, & \text{otherwise.} \end{cases}$$

Because convolution is an LTI operation, it follows that

$$y[n] = x[n] * h[n] = \sum_{r=-\infty}^{\infty} y_r[n - rL],$$

where

$$y_r[n] = x_r[n] * h[n].$$

Since xr[n] is of length L and h[n] is of length P, each yr[n] has length (L+P−1). So we can use zero-padding to form two N-point sequences, N=L+P−1, from xr[n] and h[n], and perform N-point circular convolution (instead of linear convolution) to compute yr[n]. The last P−1 points of each filtered section overlap the first P−1 points of the next one and are added to them, hence the name of the method.

For example, consider two sequences h[n] and x[n] as follows.

Segmenting x[n] into L-length sequences. Each segment is padded by P−1 zero values.

FIR filtering by using the overlapping-add method.
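The overlapping-add procedure can be sketched as follows. This is a minimal illustration, assuming a finite input and using a direct (slow) circular convolution in place of an FFT; the function names and the test signal are hypothetical.

```python
def circular_convolve(a, b, N):
    """N-point circular convolution; both inputs are zero-padded to length N."""
    a = list(a) + [0] * (N - len(a))
    b = list(b) + [0] * (N - len(b))
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]


def direct_convolve(x, h):
    """Reference linear convolution used only for comparison."""
    y = [0] * (len(x) + len(h) - 1)
    for m, a in enumerate(x):
        for k, b in enumerate(h):
            y[m + k] += a * b
    return y


def overlap_add(x, h, L):
    """Filter x with the FIR response h using non-overlapping L-point blocks."""
    P = len(h)
    N = L + P - 1                      # circular convolution size
    y = [0] * (len(x) + P - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]     # x_r[n]; the last block may be shorter
        y_r = circular_convolve(block, h, N)
        for n, value in enumerate(y_r):
            if start + n < len(y):
                y[start + n] += value  # overlapping tails are added
    return y


x = list(range(1, 12))   # 11 input samples
h = [1, 2, -1]           # P = 3
print(overlap_add(x, h, 4) == direct_convolve(x, h))  # True
```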

Overlapping-save method (for implementing an FIR filter)

Can we perform L-point circular convolution, instead of (L+P−1)-point circular convolution?

If an L-point sequence is circularly convolved with a P-point sequence (P<L), the first (P−1) points of the result are incorrect, while the remaining points are identical to those that would be obtained by linear convolution.

We therefore separate x[n] into overlapping sections of length L, so that each section overlaps the preceding section by (P−1) points:

$$x_r[n] = x[n + r(L-P+1) - P + 1], \quad 0 \le n \le L-1.$$

Then

$$y[n] = \sum_{r=0}^{\infty} y_r[n - r(L-P+1) + P - 1],$$

where $y_r[n]$ is the result of circularly convolving $x_r[n]$ with h[n], with its first (P−1) points discarded (set to zero).

Example of overlapping-save method

Decompose x[n] into overlapping sections of length L

Example of overlapping-save method (continued)

Result of circularly convolving each section with h[n]. The portions of each filtered section to be discarded in forming the linear convolution are indicated.

See the following reference for suggestions on choosing L and P: M. Borgerding, “Turning Overlap-Save into a Multiband Mixing, Downsampling Filter Bank,” IEEE Signal Processing Magazine, pp. 158-161, 2006.
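The overlapping-save procedure can be sketched in the same style. This is a minimal illustration, assuming a finite input that is preceded by P−1 zeros and using a direct L-point circular convolution rather than an FFT; the function names are hypothetical.

```python
def circular_convolve(a, b, N):
    """N-point circular convolution; both inputs are zero-padded to length N."""
    a = list(a) + [0] * (N - len(a))
    b = list(b) + [0] * (N - len(b))
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]


def direct_convolve(x, h):
    """Reference linear convolution used only for comparison."""
    y = [0] * (len(x) + len(h) - 1)
    for m, a in enumerate(x):
        for k, b in enumerate(h):
            y[m + k] += a * b
    return y


def overlap_save(x, h, L):
    """Filter x with h using L-point sections that overlap by P - 1 points."""
    P = len(h)
    step = L - P + 1                         # valid output points per section
    out_len = len(x) + P - 1
    num_sections = -(-out_len // step)       # ceiling division
    # P - 1 leading zeros give the first section its "history";
    # trailing zeros complete the last section.
    padded = [0] * (P - 1) + list(x) + [0] * (num_sections * step - len(x))
    y = []
    for r in range(num_sections):
        section = padded[r * step:r * step + L]
        y_r = circular_convolve(section, h, L)
        y.extend(y_r[P - 1:])                # discard the first P - 1 points
    return y[:out_len]


x = list(range(1, 12))
h = [1, 2, -1]
print(overlap_save(x, h, 5) == direct_convolve(x, h))  # True
```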

Fast Fourier Transform (FFT)

DFT pairs:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1$$

$W_N = e^{-j2\pi/N}$ is a root of the equation $W^N = 1$.

Direct computation of the DFT requires $N^2$ complex multiplications and (N−1)N complex additions. Each complex multiplication needs four real multiplications and two real additions, and each complex addition requires two real additions. In total, it requires $4N^2$ real multiplications and N(4N−2) real additions.
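As a reference point for the fast algorithms, the DFT pair can be evaluated directly from its definition. The sketch below is an illustrative $O(N^2)$ implementation; the names `dft` and `idft` are assumptions made for this example.

```python
import cmath


def dft(x):
    """Direct evaluation of X[k] = sum_n x[n] W_N^{kn}, with W_N = exp(-j 2 pi / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct evaluation of x[n] = (1/N) sum_k X[k] W_N^{-kn}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


x = [1, 2, 3, 4]
X = dft(x)            # approximately [10, -2+2j, -2, -2-2j]
x_back = idft(X)      # recovers x up to rounding error
```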

Goertzel algorithm

Since

$$W_N^{-kN} = 1,$$

we have

$$X[k] = W_N^{-kN} \sum_{r=0}^{N-1} x[r]\, W_N^{kr} = \sum_{r=0}^{N-1} x[r]\, W_N^{-k(N-r)}.$$

Define

$$y_k[n] = \sum_{r=-\infty}^{\infty} x[r]\, W_N^{-k(n-r)}\, u[n-r].$$

The above equation can be interpreted as a discrete convolution of the finite-duration sequence x[n], 0 ≤ n ≤ N−1, with the sequence $W_N^{-kn} u[n]$, which is the impulse response of an LTI system. Note that x[r] is nonzero only when 0 ≤ r < N. We can easily verify that

$$X[k] = y_k[n]\big|_{n=N}.$$

The Goertzel algorithm computes the DFT by implementing the above LTI system. The system function is the z-transform of $W_N^{-kn} u[n]$:

$$H_k(z) = \frac{1}{1 - W_N^{-k} z^{-1}}.$$

The signal-flow graph of the LTI system for obtaining $y_k[n]$ corresponds to the first-order recursion $y_k[n] = W_N^{-k}\, y_k[n-1] + x[n]$.

The implementation can be further simplified as

$$H_k(z) = \frac{1}{1 - W_N^{-k} z^{-1}} = \frac{1 - W_N^{k} z^{-1}}{(1 - W_N^{-k} z^{-1})(1 - W_N^{k} z^{-1})} = \frac{1 - W_N^{k} z^{-1}}{1 - 2\cos(2\pi k/N)\, z^{-1} + z^{-2}}.$$

Flow graph of second-order computation of X[k] (Goertzel algorithm), with intermediate sequence $v_k[n]$.

Since we only need to bring the system to a state from which $y_k[N]$ can be computed, the complex multiplication by $-W_N^k$ required to implement the zero of the system need not be performed at every iteration, but only after the N-th iteration, by the following difference equations:

$$v_k[n] = x[n] + 2\cos(2\pi k/N)\, v_k[n-1] - v_k[n-2], \quad 0 \le n \le N,$$

$$X[k] = y_k[N] = v_k[N] - W_N^{k}\, v_k[N-1].$$

Each iteration requires 2 real multiplications and 4 real additions to compute $v_k[n]$ (which may be a complex sequence). The multiplication by $-W_N^k$ is performed only when n=N, which requires 4 real multiplications and 4 real additions. In total, 2N+4 real multiplications and 4N+4 real additions are required.

To compute all the X[k], k=0, …, N−1, we need 2N(N+2) real multiplications and 4N(N+1) real additions, so the number of multiplications is reduced by almost a half compared with direct computation.

The Goertzel algorithm is usually used when X[k] is needed for only a single k or a small number of k values.
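The second-order recursion can be sketched as follows. This is a minimal illustration of the difference equations, assuming x[N] = 0 and zero initial conditions; the function names `goertzel` and `dft_bin` are hypothetical.

```python
import cmath
import math


def goertzel(x, k):
    """Compute X[k] of the N-point sequence x with the second-order recursion."""
    N = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / N)
    v1 = 0.0   # holds v_k[n-1]
    v2 = 0.0   # holds v_k[n-2]
    for n in range(N + 1):
        xn = x[n] if n < N else 0.0      # the input is zero at n = N
        v0 = xn + coeff * v1 - v2        # v_k[n]
        v2, v1 = v1, v0
    # Here v1 = v_k[N] and v2 = v_k[N-1]; apply the zero of H_k(z) once.
    return v1 - cmath.exp(-2j * math.pi * k / N) * v2


def dft_bin(x, k):
    """Direct evaluation of X[k], used only for comparison."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))


x = [1, 2, 3, 4]
print(goertzel(x, 1))   # approximately (-2+2j)
```

The recursion itself uses only the real coefficient 2cos(2πk/N); the single complex multiplication happens after the loop.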

Decimation-in-time FFT algorithm

The algorithm is most conveniently illustrated by considering the special case of N an integer power of 2, i.e., $N = 2^v$. Since N is an even integer, we can consider computing X[k] by separating x[n] into two (N/2)-point sequences consisting of the even-numbered points in x[n] and the odd-numbered points in x[n]:

$$X[k] = \sum_{n\ \text{even}} x[n]\, W_N^{nk} + \sum_{n\ \text{odd}} x[n]\, W_N^{nk}$$

or, with the substitution of variable n=2r for n even and n=2r+1 for n odd,

$$X[k] = \sum_{r=0}^{(N/2)-1} x[2r]\, W_N^{2rk} + \sum_{r=0}^{(N/2)-1} x[2r+1]\, W_N^{(2r+1)k}$$

$$\phantom{X[k]} = \sum_{r=0}^{(N/2)-1} x[2r]\, (W_N^{2})^{rk} + W_N^{k} \sum_{r=0}^{(N/2)-1} x[2r+1]\, (W_N^{2})^{rk}.$$

Since

$$W_N^{2} = e^{-2j(2\pi/N)} = e^{-j2\pi/(N/2)} = W_{N/2},$$

that is, $W_N^2$ is a root of the equation $W^{N/2} = 1$.

Consequently,

$$X[k] = \sum_{r=0}^{(N/2)-1} x[2r]\, W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{(N/2)-1} x[2r+1]\, W_{N/2}^{rk} = G[k] + W_N^{k} H[k], \quad k = 0, 1, \ldots, N-1.$$

Both G[k] and H[k] can be computed by (N/2)-point DFTs, where G[k] is the (N/2)-point DFT of the even-numbered points of the original sequence and H[k] is the (N/2)-point DFT of the odd-numbered points of the original sequence. Although the index k ranges over N values, k = 0, 1, …, N−1, each of the sums needs to be computed only for k between 0 and (N/2)−1, since G[k] and H[k] are each periodic in k with period N/2.

Decomposing an N-point DFT into two (N/2)-point DFTs for the case of N=8

We can further decompose the (N/2)-point DFT into two (N/4)-point DFTs. For example, the upper half of the previous diagram can be decomposed as

Hence, the 8-point DFT can be obtained by the following diagram with four 2-point DFTs.

Flow graph of a 2-point DFT

Finally, each 2-point DFT can be implemented by the following signal-flow graph, where no multiplications are needed.

Flow graph of complete decimation-in-time decomposition of an 8-point DFT.

In each stage of the decimation-in-time FFT algorithm, there is a basic structure called the butterfly computation:

Flow graph of a basic butterfly computation in FFT.

Because $W_N^{r+N/2} = -W_N^{r}$, the butterfly computation can be simplified as follows:

$$X_m[p] = X_{m-1}[p] + W_N^{r}\, X_{m-1}[q]$$

$$X_m[q] = X_{m-1}[p] - W_N^{r}\, X_{m-1}[q]$$

Simplified butterfly computation.

Flow graph of 8-point FFT using the simplified butterfly computation

In the above, we have introduced the decimation-in-time algorithm of FFT. Here, we assume that N is a power of 2. For $N = 2^v$, it requires $v = \log_2 N$ stages of computation. The number of complex multiplications and additions required is $N + N + \cdots + N = Nv = N \log_2 N$.

When N is not a power of 2, we can apply the same principle that was applied in the power-of-2 case, provided that N is a composite integer. For example, if N=RQ, it is possible to express an N-point DFT as either the sum of R Q-point DFTs or as the sum of Q R-point DFTs.

In practice, by zero-padding a sequence into an N-point sequence with $N = 2^v$, we can use a power-of-two FFT algorithm of the nearest sufficient length for implementing a DFT.

The power-of-two FFT algorithm is also called the Cooley-Tukey algorithm, after Cooley and Tukey, who proposed it.

For short sequences, the Goertzel algorithm might be more efficient.
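The decomposition X[k] = G[k] + $W_N^k$ H[k], together with the simplified butterfly, leads directly to a recursive implementation. The following is a minimal sketch assuming the length is a power of 2; it favors clarity over speed (no in-place computation or bit reversal), and the function names are illustrative.

```python
import cmath


def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of 2")
    G = fft_dit(x[0::2])    # (N/2)-point DFT of the even-numbered points
    H = fft_dit(x[1::2])    # (N/2)-point DFT of the odd-numbered points
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # W_N^k H[k]
        X[k] = G[k] + t              # upper butterfly output
        X[k + N // 2] = G[k] - t     # lower output uses W_N^{k+N/2} = -W_N^k
    return X


def dft_direct(x):
    """O(N^2) reference DFT, used only for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


x = [1, 2, 3, 4, 0, -1, 2, 5]
err = max(abs(a - b) for a, b in zip(fft_dit(x), dft_direct(x)))
print(err < 1e-9)   # True
```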

Two-dimensional Fourier Transform

Two-dimensional transforms can be formulated by directly extending the one-dimensional transform, e.g.:

DFT of a two-dimensional signal (e.g., an image):

$$v[k,l] = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} u[m,n]\, W_N^{km}\, W_N^{ln}$$

$$u[m,n] = \frac{1}{N} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} v[k,l]\, W_N^{-mk}\, W_N^{-nl}$$

Two-dimensional convolution (circular convolution):

$$u_2[m,n] = \sum_{m'=0}^{N-1} \sum_{n'=0}^{N-1} h[(m-m') \bmod N,\ (n-n') \bmod N]\; u_1[m',n']$$

Decimation-in-frequency FFT algorithm

The decimation-in-time FFT algorithms are all based on structuring the DFT computation by forming smaller and smaller subsequences of the input sequence x[n]. Alternatively, we can consider dividing the output sequence X[k] into smaller and smaller subsequences in the same manner.

Starting from

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1,$$

the even-numbered frequency samples are

$$X[2r] = \sum_{n=0}^{N-1} x[n]\, W_N^{n(2r)} = \sum_{n=0}^{(N/2)-1} x[n]\, W_N^{n(2r)} + \sum_{n=N/2}^{N-1} x[n]\, W_N^{n(2r)}$$

$$X[2r] = \sum_{n=0}^{(N/2)-1} x[n]\, W_N^{2nr} + \sum_{n=0}^{(N/2)-1} x[n+(N/2)]\, W_N^{2r(n+(N/2))}.$$

Since

$$W_N^{2r[n+(N/2)]} = W_N^{2rn}\, W_N^{rN} = W_N^{2rn}$$

and

$$W_N^{2} = W_{N/2},$$

we obtain

$$X[2r] = \sum_{n=0}^{(N/2)-1} \big(x[n] + x[n+(N/2)]\big)\, W_{N/2}^{rn}, \quad r = 0, 1, \ldots, (N/2)-1.$$

The above equation is the (N/2)-point DFT of the (N/2)-point sequence obtained by adding the first half and the last half of the input sequence.

Adding the two halves of the input sequence represents time aliasing, consistent with the fact that in computing only the even-numbered frequency samples, we are sub-sampling the Fourier transform of x[n].

We now consider obtaining the odd-numbered frequency points:

$$X[2r+1] = \sum_{n=0}^{N-1} x[n]\, W_N^{n(2r+1)} = \sum_{n=0}^{(N/2)-1} x[n]\, W_N^{n(2r+1)} + \sum_{n=N/2}^{N-1} x[n]\, W_N^{n(2r+1)}.$$

Since

$$\sum_{n=N/2}^{N-1} x[n]\, W_N^{n(2r+1)} = \sum_{n=0}^{(N/2)-1} x[n+(N/2)]\, W_N^{(n+(N/2))(2r+1)}$$

$$= W_N^{(N/2)(2r+1)} \sum_{n=0}^{(N/2)-1} x[n+(N/2)]\, W_N^{n(2r+1)}$$

$$= -\sum_{n=0}^{(N/2)-1} x[n+(N/2)]\, W_N^{n(2r+1)},$$

where the last step uses $W_N^{(N/2)(2r+1)} = W_N^{Nr}\, W_N^{N/2} = -1$, we obtain

$$X[2r+1] = \sum_{n=0}^{(N/2)-1} \big(x[n] - x[n+(N/2)]\big)\, W_N^{n(2r+1)}$$

$$= \sum_{n=0}^{(N/2)-1} \big(x[n] - x[n+(N/2)]\big)\, W_N^{n}\, W_{N/2}^{nr}, \quad r = 0, 1, \ldots, (N/2)-1.$$

The above equation is the (N/2)-point DFT of the sequence obtained by subtracting the second half of the input sequence from the first half and multiplying the resulting sequence by $W_N^n$.

Let g[n] = x[n]+x[n+N/2] and h[n] = x[n]−x[n+N/2]. The DFT can be computed by forming the sequences g[n] and h[n], then computing $h[n]\, W_N^n$, and finally computing the (N/2)-point DFTs of these two sequences.

Flow graph of decimation-in-frequency decomposition of an N-point DFT (N=8).

Recursively, we can further decompose the (N/2)-point DFT into smaller substructures:

Finally, we have

Butterfly structure for decimation-in-frequency FFT algorithm:

The decimation-in-frequency FFT algorithm also has a computational complexity of $O(N \log_2 N)$.
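The decimation-in-frequency decomposition can also be written recursively: form g[n] and h[n]$W_N^n$, transform each, and interleave the results as the even- and odd-numbered outputs. The sketch below assumes the length is a power of 2; the function names are illustrative.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of 2")
    half = N // 2
    # g[n] = x[n] + x[n + N/2] feeds the even-numbered outputs X[2r].
    g = [x[n] + x[n + half] for n in range(half)]
    # (x[n] - x[n + N/2]) W_N^n feeds the odd-numbered outputs X[2r+1].
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(g)
    X[1::2] = fft_dif(h)
    return X


def dft_direct(x):
    """O(N^2) reference DFT, used only for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


x = [1, 2, 3, 4, 0, -1, 2, 5]
err = max(abs(a - b) for a, b in zip(fft_dif(x), dft_direct(x)))
print(err < 1e-9)   # True
```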

Chirp Transform Algorithm (CTA)

This algorithm is not optimal in minimizing any measure of computational complexity, but it has been used to compute any set of equally spaced samples of the DTFT on the unit circle.

To derive the CTA, we let x[n] denote an N-point sequence and $X(e^{j\omega})$ its DTFT. We consider the evaluation of M samples of $X(e^{j\omega})$ that are equally spaced in angle on the unit circle, at frequencies

$$\omega_k = \omega_0 + k\,\Delta\omega \quad (k = 0, 1, \ldots, M-1,\ \Delta\omega = 2\pi/M).$$

When $\omega_0 = 0$ and M=N, we obtain the special case of the DFT.

The DTFT values evaluated at $\omega_k$ are

$$X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n]\, e^{-j\omega_k n}, \quad k = 0, 1, \ldots, M-1.$$

With W defined as

$$W = e^{-j\Delta\omega},$$

we have

$$X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n]\, e^{-j\omega_0 n}\, W^{nk}, \quad k = 0, 1, \ldots, M-1.$$

The chirp transform represents $X(e^{j\omega_k})$ as a convolution. To achieve this purpose, we represent nk as

$$nk = \tfrac{1}{2}\left[n^2 + k^2 - (k-n)^2\right].$$

Then, the DTFT value evaluated at $\omega_k$ is

$$X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n]\, e^{-j\omega_0 n}\, W^{n^2/2}\, W^{k^2/2}\, W^{-(k-n)^2/2}.$$

Letting

$$g[n] = x[n]\, e^{-j\omega_0 n}\, W^{n^2/2},$$

we can then write

$$X(e^{j\omega_k}) = W^{k^2/2} \left( \sum_{n=0}^{N-1} g[n]\, W^{-(k-n)^2/2} \right), \quad k = 0, 1, \ldots, M-1.$$

To interpret the above equation, we obtain more familiar notation by replacing k by n and n by k:

$$X(e^{j\omega_n}) = W^{n^2/2} \left( \sum_{k=0}^{N-1} g[k]\, W^{-(n-k)^2/2} \right), \quad n = 0, 1, \ldots, M-1.$$

$X(e^{j\omega_n})$ corresponds to the convolution of the sequence g[n] with the sequence $W^{-n^2/2}$, followed by multiplication by $W^{n^2/2}$.

The block diagram of the chirp transform algorithm is

Since only the outputs for n=0,1,…,M−1 are required, let h[n] be the following impulse response with finite length (FIR filter):

$$h[n] = \begin{cases} W^{-n^2/2}, & -(N-1) \le n \le M-1 \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$X(e^{j\omega_n}) = W^{n^2/2} \left( g[n] * h[n] \right), \quad n = 0, 1, \ldots, M-1.$$

The block diagram of the chirp transform algorithm for FIR is

Then the output y[n] satisfies

$$X(e^{j\omega_n}) = y[n], \quad n = 0, 1, \ldots, M-1.$$

Evaluating frequency responses using the procedure of the chirp transform has a number of potential advantages:

We do not require N=M as in the FFT algorithms, and neither N nor M need be composite numbers. Hence the frequency values can be evaluated in a more flexible manner.

The convolution involved in the chirp transform can still be implemented efficiently using an FFT algorithm. The FFT size must be no smaller than (M+N−1). It can be chosen, for example, to be an appropriate power of 2.

In the above, the FIR filter h[n] is non-causal. For certain real-time implementations it must be modified to obtain a causal system. Since h[n] is of finite duration, this modification is easily accomplished by delaying h[n] by (N−1) to obtain a causal impulse response:

$$h_1[n] = \begin{cases} W^{-(n-N+1)^2/2}, & n = 0, 1, \ldots, M+N-2 \\ 0, & \text{otherwise,} \end{cases}$$

and the DTFT values are

$$X(e^{j\omega_n}) = y_1[n+N-1], \quad n = 0, 1, \ldots, M-1,$$

where $y_1[n]$ denotes the output of the system that uses $h_1[n]$ in place of h[n].

In hardware implementation, a fixed and pre-specified causal FIR filter can be implemented by certain technologies, such as charge-coupled devices (CCD) and surface acoustic wave (SAW) devices.
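The three steps of the chirp transform (pre-multiply by a chirp, convolve with a chirp, post-multiply by a chirp) can be sketched as follows. This is a minimal illustration that evaluates the convolution directly rather than with an FFT; the function names are hypothetical, and the frequency step `dw` is left as a free parameter here even though the notes take it to be 2π/M.

```python
import cmath


def chirp_transform(x, M, w0, dw):
    """Samples X(e^{j(w0 + n dw)}), n = 0..M-1, via the chirp decomposition."""
    N = len(x)

    def W(p):
        # W**p for W = exp(-j dw); p may be a half-integer.
        return cmath.exp(-1j * dw * p)

    # Pre-multiplication: g[n] = x[n] e^{-j w0 n} W^{n^2/2}.
    g = [x[n] * cmath.exp(-1j * w0 * n) * W(n * n / 2) for n in range(N)]
    out = []
    for n in range(M):
        # Convolution with h[m] = W^{-m^2/2}; here m = n - k stays within
        # -(N-1) <= m <= M-1, the support of the FIR filter.
        acc = sum(g[k] * W(-((n - k) ** 2) / 2) for k in range(N))
        out.append(W(n * n / 2) * acc)      # post-multiplication
    return out


def dtft_samples(x, M, w0, dw):
    """Direct evaluation of the same DTFT samples, used only for comparison."""
    return [sum(x[n] * cmath.exp(-1j * (w0 + k * dw) * n) for n in range(len(x)))
            for k in range(M)]


x = [1, 2, 3, 4, 5]                 # N = 5
M, w0 = 7, 0.3
dw = 2 * cmath.pi / M
a = chirp_transform(x, M, w0, dw)
b = dtft_samples(x, M, w0, dw)
print(max(abs(p - q) for p, q in zip(a, b)) < 1e-9)   # True
```

Note that N = 5 and M = 7 are unequal and prime, which illustrates the flexibility mentioned above.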

Two-dimensional Transform Revisited

(cf. Fundamentals of Digital Image Processing, A. K. Jain, Prentice Hall, 1989)

One-dimensional orthogonal (unitary) transforms

$$\mathbf{v} = A\mathbf{u} \;\rightarrow\; v[k] = \sum_{n=0}^{N-1} a_{k,n}\, u[n], \quad 0 \le k \le N-1$$

$$\mathbf{u} = A^{*T}\mathbf{v} = A^{H}\mathbf{v} \;\rightarrow\; u[n] = \sum_{k=0}^{N-1} a_{k,n}^{*}\, v[k], \quad 0 \le n \le N-1$$

where $A^{*T} = A^{-1}$, i.e., $AA^H = A^HA = I$. That is, the columns of $A^H$ form a set of orthonormal bases, and so do the columns of A.

The vectors $a_k^* \equiv \{a_{k,n}^*,\ 0 \le n \le N-1\}$ are called the basis vectors of A. The series coefficients v[k] give a representation of the original sequence u[n], and are useful in filtering, data compression, feature extraction, and other analysis.

Two-dimensional orthogonal (unitary) transforms

Let {u[m,n]} be an N×N image.

$$v[k,l] = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} u[m,n]\, a_{k,l}[m,n], \quad 0 \le k, l \le N-1$$

$$u[m,n] = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} v[k,l]\, a_{k,l}^{*}[m,n], \quad 0 \le m, n \le N-1$$

where $\{a_{k,l}[m,n]\}$, called an image transform, is a set of complete orthonormal discrete basis functions satisfying the properties:

Orthonormality:

$$\sum_{m=0}^{N-1} \sum_{n=0}^{N-1} a_{k,l}[m,n]\, a_{k',l'}^{*}[m,n] = \delta[k-k',\ l-l']$$

Completeness:

$$\sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_{k,l}[m,n]\, a_{k,l}^{*}[m',n'] = \delta[m-m',\ n-n']$$

where δ[a,b] is the 2D delta function, which is one only when a=b=0, and is zero otherwise.

V = {v[k,l]} is called the transformed image. The orthonormality property assures that, for any truncated expansion in the basis images of the form

$$u_{P,Q}[m,n] = \sum_{k=0}^{P-1} \sum_{l=0}^{Q-1} v'[k,l]\, a_{k,l}^{*}[m,n], \quad P \le N,\ Q \le N,$$

the sum of squared errors between u[m,n] and $u_{P,Q}[m,n]$ is minimized by the truncated series with

$$v'[k,l] = v[k,l].$$

When P=Q=N, the minimized error is zero.

Separable Unitary Transforms

The number of multiplications and additions required to compute the transform coefficients v[k,l] is $O(N^4)$, which is quite excessive.

The complexity can be reduced to $O(N^3)$ when the transform is restricted to be separable.

A transform $\{a_{k,l}[m,n]\}$ is separable iff for all 0≤k,l,m,n≤N−1, it can be decomposed as follows:

$$a_{k,l}[m,n] = a_k[m]\, b_l[n],$$

where $A \equiv \{a[k,m]\} = \{a_k[m]\}$ and $B \equiv \{b[l,n]\} = \{b_l[n]\}$ should be unitary matrices themselves, i.e., $AA^H = A^HA = I$ and $BB^H = B^HB = I$.

Often one chooses B to be the same as A, so that

$$v[k,l] = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} a_k[m]\, u[m,n]\, a_l[n]$$

$$u[m,n] = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k^{*}[m]\, v[k,l]\, a_l^{*}[n]$$

Hence, we can simplify the transform as

$$V = AUA^T, \quad \text{and} \quad U = A^{*T} V A^{*},$$

where V = {v[k,l]} and U = {u[m,n]}.

A more general form: for an M×N rectangular image, the transform pair is

$$V = A_M U A_N^T, \quad \text{and} \quad U = A_M^{*T} V A_N^{*},$$

where $A_M$ and $A_N$ are M×M and N×N unitary matrices, respectively.

These are called two-dimensional separable transforms. The complexity of computing the coefficient image is $O(N^3)$.

The computation can be decomposed as computing $T = UA^T$ first, and then computing $V = AT$ (for an N×N image).

Computing $T = UA^T$ requires $N^2$ inner products (of N-point vectors). Each inner product requires N operations, and so $O(N^3)$ operations in total.

Similarly, $V = AT$ also requires $O(N^3)$ operations, and so overall we need $O(N^3)$ operations to compute V.

A closer look at $T = UA^T$:

Let the rows of U be $\{U_1, U_2, \ldots, U_N\}$. Then

$$T = UA^T = [U_1^T, U_2^T, \ldots, U_N^T]^T A^T = [(U_1A^T)^T, (U_2A^T)^T, \ldots, (U_NA^T)^T]^T,$$

i.e., the i-th row of T is $U_iA^T$.

Note that each $U_iA^T$ (i = 1, …, N) is a one-dimensional unitary transform of the row $U_i$. That is, this step performs N one-dimensional transforms on the rows of the image U, obtaining a temporary image T.

Then, the step $V = AT$ performs N 1-D unitary transforms on the columns of T.

In total, 2N 1-D transforms are performed. Each 1-D transform is $O(N^2)$.
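The two-step computation T = UA^T followed by V = AT can be compared with the direct four-index sum on a small example. The sketch below assumes, purely for illustration, the 2×2 real unitary matrix A = (1/√2)[[1, 1], [1, −1]] and a 2×2 image U; the helper names are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]            # assumed example of a (real) unitary matrix
U = [[1.0, 2.0], [3.0, 4.0]]     # assumed example image
N = len(U)

# Separable computation: N row transforms, then N column transforms.
T = matmul(U, transpose(A))      # row i of T is the 1-D transform of row i of U
V = matmul(A, T)                 # 1-D transforms on the columns of T

# Direct O(N^4) evaluation: v[k,l] = sum_m sum_n a_k[m] u[m,n] a_l[n].
V_direct = [[sum(A[k][m] * U[m][n] * A[l][n]
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]

# Inverse: U = A^{*T} V A^{*}; A is real here, so A^{*} = A.
U_back = matmul(matmul(transpose(A), V), A)

print(V)   # approximately [[5, -1], [-2, 0]]
```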

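Remember that the two-dimensional DFT is

$$v[k,l] = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} u[m,n]\, W_N^{km}\, W_N^{ln}$$

$$u[m,n] = \frac{1}{N} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} v[k,l]\, W_N^{-mk}\, W_N^{-nl}$$

where

$$W_N = e^{-j(2\pi/N)}.$$

The 2D DFT is separable, and so it can be represented as

V = FUF

where F is the N×N matrix whose element in the k-th row and n-th column is $W_N^{kn}/\sqrt{N}$ (F is symmetric, so $F^T = F$):

$$F = \left\{ \frac{1}{\sqrt{N}}\, W_N^{kn} \right\}, \quad 0 \le k, n \le N-1.$$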

Fast computation of two-dimensional DFT:

According to V = FUF, the computation can be decomposed into 2N 1-D DFTs.

Each 1-D DFT requires $N \log_2 N$ computations when an FFT is used.

So, the 2-D DFT can be efficiently implemented with a time complexity of $O(N^2 \log_2 N)$.
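The row-column decomposition can be sketched as follows. For brevity, this illustration uses a direct $O(N^2)$ unitary 1-D DFT for each row and column; substituting an FFT for `unitary_dft` would give the $O(N^2 \log_2 N)$ cost stated above. The function names and the test image are assumptions made for this example.

```python
import cmath
import math


def unitary_dft(x):
    """1-D unitary DFT: multiplication by F = {W_N^{kn} / sqrt(N)}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            / math.sqrt(N) for k in range(N)]


def dft2_row_column(U):
    """2-D DFT V = F U F computed as N row transforms plus N column transforms."""
    N = len(U)
    T = [unitary_dft(row) for row in U]                        # T = U F
    cols = [unitary_dft([T[m][l] for m in range(N)]) for l in range(N)]
    return [[cols[l][k] for l in range(N)] for k in range(N)]  # V = F T


def dft2_direct(U):
    """Direct evaluation of v[k,l] = (1/N) sum_m sum_n u[m,n] W^{km} W^{ln}."""
    N = len(U)
    return [[sum(U[m][n] * cmath.exp(-2j * cmath.pi * (k * m + l * n) / N)
                 for m in range(N) for n in range(N)) / N
             for l in range(N)] for k in range(N)]


U = [[1, 2, 0, -1],
     [3, 4, 1, 0],
     [0, 1, 2, 5],
     [2, -2, 1, 3]]
V1 = dft2_row_column(U)
V2 = dft2_direct(U)
err = max(abs(V1[k][l] - V2[k][l]) for k in range(4) for l in range(4))
print(err < 1e-9)   # True
```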

The 2-D DFT inherits many properties of the 1-D DFT (e.g., conjugate symmetry, shifting, scaling, convolution, etc.). A property that does not come from the 1-D DFT is the rotation property.

Rotation property: if we represent (m,n) and (k,l) in polar coordinates,

$$(m,n) = (r\cos\theta,\ r\sin\theta)$$

and

$$(k,l) = (w\cos\varphi,\ w\sin\varphi),$$

then

$$u[r,\ \theta + \Delta\theta] \;\overset{\text{DFT}}{\Longleftrightarrow}\; v[w,\ \varphi + \Delta\theta].$$

That is, the rotation of an image implies the rotation of its DFT.
