Download pdf - Chapter 10: Pipelined and Parallel Recursive and …people.ece.umn.edu/users/parhi/SLIDES/chap10.pdfChapter 10: Pipelined and Parallel Recursive and Adaptive Filters Keshab K. Parhi

Chapter 10: Pipelined and ParallelRecursive and Adaptive Filters

Keshab K. Parhi

Chapter 10 2

Outline

• Introduction• Pipelining in 1st-Order IIR Digital Filters• Pipelining in Higher-Order IIR Digital Filters• Parallel Processing for IIR Filters• Combined Pipelining and Parallel Processing for IIR

Filters

Chapter 10 3

First-Order IIR Filter• Consider a 1st-order linear time-invariant recursion (see Fig. 1)

• The iteration period of this filter is , where representword-level multiplication time and addition time

Look-Ahead Computation

{ }am TT + { }am TT ,)()()1( nubnyany ⋅+⋅=+ (10.1)

• In look-ahead transformation, the linear recursion is first iterated a fewtimes to create additional concurrency.

• By recasting this recursion, we can express y(n+2) as a function of y(n)to obtain the following expression (see Fig. 2(a))

• The iteration bound of this recursion is , the same as theoriginal version, because the amount of computation and the number oflogical delays inside the recursive loop have both doubled

( ) 22 am TT +[ ] )1()()()2( +++=+ nbunbunayany (10.2)

Chapter 10 4

• Another recursion equivalent to (10.2) is (10.3). Shown on Fig.2(b), itsiteration bound is , a factor of 2 lower than before.

• Applying (M-1) steps of look-ahead to the iteration of (10.1), we canobtain an equivalent implementation described by (see Fig. 3)

– Note: the loop delay is instead of , which means that the loopcomputation must be completed in M clock cycles (not 1 clock cycle). Theiteration bound of this computation is , which correspondsto a sample rate M times higher than that of the original filter

– The terms in (10.4) can be pre-computed(referred to as pre-computation terms). The second term in RHS of (10.4)is the look-ahead computation term (referred to as the look-aheadcomplexity); it is non-recursive and can be easily pipelined

)1()()()2( 2 +⋅+⋅+⋅=+ nubnuabnyany( ) 2am TT +

(10.3)

∑−

=

−−+⋅⋅+⋅=+1

0

)1()()(M

i

iM iMnubanyaMny (10.4)Mz− 1−z

( ) MTT am +

{ }MM ababaab ,,,, 12 −⋅⋅⋅

Chapter 10 5

Fig.2.(a)

Fig.2.(b)

2D

D

b

a

u(n+1)

y(n+1)

u(n)

b

a

y(n+2)y(n)

2D

D

ab a

u(n+1) u(n)

y(n+2)y(n)

b 2

D

b a

u(n)

y(n)y(n+1)

Fig. 1

Chapter 10 6

MD

D

ab a

u(n+M-1) u(n+M-2)

y(n+M)

y(n)

b M

D

a b

u(n)

M-1

Fig. 3: M-stage Pipelinable 1st-Order IIR Filter

• Look-ahead computation has allowed a single serial computation to betransformed into M independent concurrent computations, and topipeline the feedback loop to achieve high speed filtering of a singletime series while maintaining full hardware utilization.

• Provided the multiplier and the adder can be conveniently pipelined,the iteration bound can be achieved by retiming or cutsettransformation (see Chapter 4)

Chapter 10 7

Pipelining in 1st-Order IIR Digital Filters

– Look-ahead techniques add canceling poles and zeros with equalangular spacing at a distance from the origin which is same asthat of the original pole. The pipelined filters are always stableprovided that the original filter is stable

– The pipelined realizations require a linear increase in complexitybut decomposition techniques can be used to obtain animplementation with logarithmic increase in hardware withrespect to the number of loop pipeline stages

– Example: Consider the 1st-order IIR filter transfer function

• The output sample y(n) can be computed using the input sampleu(n) and the past output sample as follows:

• The sample rate of this recursive filter is limited by thecomputation time of one multiply-add operation

111

)( −⋅−=

zazH (10.5)

)()1()( nunyany +−⋅= (10.6)

Chapter 10 8

Pipelining in 1st-Order IIR Digital Filters (continued)

1. Look-Ahead Pipelining for 1st-Order IIR Filters• Look-ahead pipelining adds canceling poles and zeroes to the transfer

function such that the coefficients of in thedenominator of the transfer function are zero. Then, the output sampley(n) can be computed using the inputs and the output sample y(n-M)such that there are M delay elements in the critical loop, which in turncan be used to pipeline the critical loop by M stages and the samplerate can be increased by a factor M

• Example: Consider the 1st-order filter, , whichhas a pole at z=a (a≤1). A 3-stage pipelined equivalent stable filter canbe derived by adding poles and zeroes at , and is givenby

( )},,{ 11 −−− ⋅⋅⋅ Mzz

( )111)( −⋅−= zazH

( )32πjaez ±=

33

221

11

)(−

−−

⋅−⋅+⋅+

=za

zazazH

Chapter 10 9


2. Look-Ahead Pipelining with Power-of-2 Decomposition• With power-of-2 decomposition, an M-stage (for power-of-2 M)

pipelined implementation for 1st-order IIR filter can be obtained by sets of transformations

• Example: Consider a 1st-order recursive filter transfer functiondescribed by . The equivalent pipelinedtransfer function can be described using the decomposition techniqueas follows

– This pipelined implementation is derived by adding (M-1) poles and zerosat identical locations.

– The original transfer function has a single pole at (see Fig.4(a)).

M2log

( )MM

M

i

za

zazbzH

ii

−

−

=−−

⋅−

⋅+⋅= ∏

1

1)(

1log

0221 2

(10.7)

( ) ( )11 1)( −− ⋅−⋅= zazbzH

az =

Chapter 10 10

– The pipelined transfer function has poles at the following locations (seeFig.4(b) for M=8):

– The decomposition of the canceling zeros is shown in Fig.4(c). The i-thstage of the decomposed non-recursive portion implements zeroslocated at:

– The i-th stage of the decomposed non-recursive portion requires a singlepipelined multiplication operation independent of the stage number i

– The multiplication complexity of the pipelined implementation is

– The finite-precision pipelined filters suffer from inexact pole-zerocancellation, which leads to magnitude and phase error. These errors canbe reduced by increasing the wordlength (see p. 323)

( ) ( ){ }MMjMj aeaea ππ 212 ,,, ⋅−⋅⋅⋅

i2

( ) ( )( ),212exp injaz π+= (10.8)( )12,,1,0 −⋅⋅⋅= in

( )2log 2 +M

Chapter 10 11

Fig.

4

•Po

le re

pres

enta

tion

of a

1ST

-ord

er re

curs

ive

filte

r•

Pole

/zer

o re

pres

enta

tion

of a

1ST

-ord

er II

R w

ith 8

loop

sta

ges

•D

ecom

posi

tion

base

d on

pip

elin

ed II

R fo

r M =

8

Chapter 10 12


3. Look-Ahead Pipelining with General Decomposition• The idea of decomposition can be extended to any arbitrary number of

loop pipelining stages M. If , then the non-recursivestages implement , , zeros,respectively, totaling (M-1) zeros

• Example (Example 10.3.3, p.325) Consider the 1st-order IIR

– A 12-stage pipelined decomposed implementation is given by

pMMMM ⋅⋅⋅= 21( )11 −M ( )121 −MM ( )1, 121 −⋅⋅⋅⋅⋅⋅ − pp MMMM

111

)( −⋅−=

zazH

( )( )( )1212

6644221

1212

11

0

1111

1)( −

−−−−

−=

−

⋅−++++

=⋅−

⋅= ∑

zazazazaaz

za

zazH i

ii

- This implementation is based on a decomposition (see Fig. 5)

232 ××

Chapter 10 13

– The first section implements 1 zero at –a, the second section implements 4zeros at , and the third section implements 6 zeros at

– Another decomposed implementation ( decomposition) is givenby

• The first section implements 1 zero at –a, the second sectionimplements 2 zeros at , and the third section implements 8 zerosat

{ }323, ππ jj aeae ±±

{ }656 ,, ππ jj aeaeja ±±±

( )( )( )1212

8844221

1111

)(−

−−−−

⋅−++++

=za

zazazaazzH

322 ××

{ }653236 ,,, ππππ jjjj aeaeaeae ±±±±ja±

– The third decomposition ( decomposition) is given by

• The first section implements 2 zeros at , the second sectionimplements 3 zeros at , and the third sectionimplements 6 zero at

223 ××( )( )( )

1212

6633221

1111

)(−

−−−−

⋅−++++

=za

zazazaazzH

{ }32πjae±

{ }3, πjaea ±−{ }656 ,, ππ jj aejaae ±± ±

Chapter 10 14

Fig.5(a)

Fig.5(b)

Pole-zero location of a 12-stage pipelined 1ST-order IIR

Decomposition of the zerosof the pipelined filter fora 2X3X2 decomposition

Chapter 10 15

Pipelining in Higher-order IIR Digital Filters

• Higher-order IIR digital filters can be pipelined by using clusteredlook-ahead or scattered look-ahead techniques. (For 1st-order IIR filters,these two look-ahead techniques reduce to the same form)– Clustered look-ahead: Pipelined realizations require a linear

complexity in the number of loop pipelined stages and are notalways guaranteed to be stable

– Scattered look-ahead: Can be used to derive stable pipelined IIRfilters

– Decomposition technique: Can also be used to obtain area-efficientimplementation for higher-order IIR for scattered look-ahead filters

– Constrained filter design techniques: Achieve pipelining withoutpole-zero cancellation

Chapter 10 16

• The transfer function of an N-th order direct-form recursive filter isdescribed by

• Equivalently, the output y(n) can be described in terms of the inputsample u(n) and the past input/output samples as follows

– The sample rate of this IIR filter realization is limited by the throughput of1 multiplication and 1 addition, since the critical path contains a singledelay element

∑∑

=−

=−

−= N

ii

i

N

ii

i

za

zbzH

1

0

1)( (10.9)

(10.10))()(

)()()(

1

01

nzinya

inubinyany

N

ii

N

ii

N

ii

+−=

−+−=

∑

∑∑

=

==

Chapter 10 17

1. Clustered Look-Ahead Pipelining

• The basic idea of clustered look-ahead:– Add canceling poles and zeros to the filter transfer function such

that the coefficients of in the denominator of thetransfer function are 0, and the output samples y(n) can bedescribed in terms of the cluster of N past outputs:

– Hence the critical loop of this implementation contains M delayelements and a single multiplication. Therefore, this loop can bepipelined by M stages, and the sample rate can be increased by afactor M. This is referred to as M-stage clustered look-aheadpipelining

{ })1(1 ,, −−− ⋅⋅⋅ Mzz

{ })1(,),( +−−⋅⋅⋅− NMnyMny

Pipelining in Higher-order IIR Digital Filters (cont’d)

Chapter 10 18

• Example: (Example 10.4.1, p.327) Consider the all-pole 2nd-order IIR filterwith poles at . The transfer function of this filter is

– A 2-stage pipelined equivalent IIR filter can be obtained by eliminatingthe term in the denominator (i.e., multiplying both the numerator anddenominator by ). The transformed transfer function isgiven by:

– From, the transfer function, we can see that the coefficient of in thedenominator is zero. Hence, the critical path of this filter contains 2 delayelements and can be pipelined by 2 stages

{ }43,21

2831

451

1)( −− +−

=zz

zH(10.11)

1−z( )1451 −+ z

332152

1619

145

145

145

2831

45

11

11

11

)(

−−

−

−

−

−−

+−+

=

++

⋅+−

=

zzz

zz

zzzH

(10.12)

1−z

Chapter 10 19

• Computation complexity: The numerator (non-recursive portion) of thispipelined filter needs (N+M) multiplications, and the denominator (recursiveportion) needs N multiplications. Thus, the total complexity of this pipelinedimplementation is (N+N+M) multiplications

• Stability: The canceling poles and zeros are utilized for pipelining IIR filters.However, when the additional poles lie outside the unit circle, the filterbecomes unstable. Note that the filters in (10.12) and (10.13) are unstable.

– Similarly, a 3-stage pipelined realization can be derived by eliminatingthe terms of in the denominator of (10.21), which can bedone by multiplying both numerator and denominator by

.– The new transfer function is given by:

{ }21 , −− zz

( )21 1619451 −− ++ zz

4128573

6465

216191

45

11

)( −−

−−

+−++

=zz

zzzH (10.13)

Chapter 10 20

• If the desired pipeline delay M does not produce a stable filter, Mshould be increased until a stable pipelined filter is obtained. To obtainthe optimal pipelining level M, numerical search methods are generallyused

• Example (Example 10.4.3, p.330) Consider a 5-level (M=5) pipelinedimplementation of the following 2nd-order transfer function

– By the stability analysis, it is shown that (M=5) does not meet the stabilitycondition. Thus M is increased to M=6 to obtain the following stablepipelined filter as

21 6889.05336.111

)( −− +−=

zzzH

(10.14)

76

54321

5011.03265.117275.01454.14939.16630.15336.11

)( −−

−−−−−

+−+++++

=zz

zzzzzzH

(10.15)

2. Stable Clustered Look-Ahead Filter Design

Chapter 10 21

3. Scattered Look-Ahead Pipelining

• Scattered look-ahead pipelining: The denominator of the transferfunction in (10.9) is transformed in a way that it contains the N terms

. Equivalently, the state y(n) is computed interms of N past scattered states y(n-M),y(n-2M),…, and y(n-NM)

• In scattered look-ahead, for each poles in the original filter, weintroduce (M-1) canceling poles and zeros with equal angular spacingat a distance from the origin the same as that of the original pole.– Example: if the original filter has a pole at , we add (M-1)

poles and zeros at toderive a pipelined realization with M loop pipeline stages

• Assume that the denominator of the transfer function can be factorizedas follows:

pz=

{ }NMMM zzz −−− ⋅⋅⋅ ,,, 2

( ){ }1,,2,1,2exp −⋅⋅⋅== MkMkjpz π

∏ =−−=

N

i i zpzD1

1)1()( (10.16)

Chapter 10 22

– Then, the pipelining process using the scattered look-aheadapproach can be described by

(continued)

( )( ) )('

)('

1

1)(

)()(

)(

1

1

012

1

1

1

12

MN

i

M

kMkj

i

N

i

M

k

Mkji

zDzN

zep

zepzN

zDzN

zH

=−

−=

=

∏ ∏∏ ∏

=

−

=−

=

−

=

−

π

π

(10.17)

• Example (Example 10.4.5, p.332) Consider the 2nd-order filter with complexconjugate poles at . The filter transfer function is given by

– We can pipeline this filter by 3 stages by introducing 4 additionalpoles and zeros at if

θjrez ±=

221cos211

)( −− +−=

zrzrzH

θ

( ) ( ){ }3232 , πθπθ −±+± == jj rezrez 32πθ ≠

Chapter 10 23

– (cont’d) The equivalent pipelined filter is then given by

– When , then only 1 additional pole and zero at isrequired for 3-stage pipelining since and . The equivalent pipelined filter is thengiven by

• Example (Example 10.4.6, p.332) Consider the 2nd-order filter with realpoles at . The transfer function is given by

( )( ) 6633

4433221

3cos21cos22cos21cos21

)( −−

−−−−

+⋅−+⋅+⋅++⋅+

=zrzr

zrzrzrzrzH

θθθθ

32πθ = rz =

( ) rrez j == −± 32πθ

( ) θπθ jj rerez ±+± == 32

( )( ) 33

1

1221

1

11

111

)( −

−

−−−

−

⋅−⋅−

=⋅−⋅+⋅+

⋅−=

zrzr

zrzrzrzr

zH

{ }21, rzrz ==

( ) 221

1211

1)( −− ⋅+⋅+−

=zrrzrr

zH

Chapter 10 24

– (cont’d) A 3-stage pipelined realization is derived by addingpoles(and zeros) at . The pipelinedrealization is given by

– The pole-zero locations of a 3-stage pipelined 2nd-order filter withpoles at z=1/2 and z=3/4 are shown in Fig. 6

• CONCLUSIONS:– If the original filter is stable, then the scattered look-ahead

approach leads to stable pipelined filters, because the distance ofthe additional poles from the original is the same as that of theoriginal filter

– Multiplication Complexity: (NM+1) for the non-recursive portionin (10.17), and N for the recursive portion. Total pipelined filtermultiplication complexity is (NM+N+1)

{ }322

321 , ππ jj erzerz ±± ==

( ) ( ) ( )( ) 63

23

133

23

1

422

21

32121

22221

21

121

11

)(−−

−−−−

+⋅+−++++++++

=zrrzrr

zrrzrrrrzrrrrzrrzH

Chapter 10 25

21

43 1 Re[z]

Im[z]

Fig. 6: Pole-zero representation of a 3-stage pipelined equivalent stable filter derived using scattered look-ahead approach

– (cont’d) The multiplication complexity is linear with respect to M.and is much greater than that of clustered look-ahead

– Also the latch complexity is square in M, because each multiplieris pipelined by M stages

Chapter 10 26

4. Scattered Look-Ahead Pipelining with Power-of-2Decomposition

• This kind of decomposition technique will lead to a logarithmicincrease in multiplication complexity (hardware) with respect to thelevel of pipelining

• Let the transfer function of a recursive digital filter be

– A 2-stage pipelined implementation can be obtained bymultiplying by in the numerator anddenominator. The equivalent 2-stage pipelined implementation isdescribed by

)()(

1)(

1

0

zDzN

za

zbzH N

i

ii

N

i

ii =

⋅−=

∑∑

=−

=−

(10.18)

( )∑ =−⋅−−

N

i

ii

i za1

)1(1

( ) ( )( )( )( ) )('

)('

)1(11

11)(

11

10

zDzN

zaza

zazbzH N

ii

iiN

ii

i

N

ii

iiN

ii

i =⋅−−⋅−

⋅−−=

∑∑∑∑

=−

=−

=−

=−

(10.19)

Chapter 10 27

– (cont’d) Similarly, subsequent transformations can be applied toobtain 4, 8, and 16 stage pipelined implementations, respectively

– Thus, to obtain an M-stage pipelined implementation (for power-of-2 M), sets of such transformations need to be applied.

– By applying sets of such transformations, anequivalent transfer function (with M pipelining stages inside therecursive loop) can be derived, which requires a complexity of

multiplications, a logarithmic complexitywith respect to M.

M2log( )1log 2 −M

( )1log2 2 ++ MNN

– Note: the number of delays (or latches) is linear: the total number ofdelays (or latches) is approximately , about NMdelays in non-recursive portion, and delays forpipelining each of the multipliers by M stages

– In the decomposed realization, the 1st stage implements an N-thorder non-recursive section, and the subsequent stages respectivelyimplement 2N, 4N,…, NM/2-order non-recursive sections

( )1log 2 +MNMMNM 2log

MN 2log

Chapter 10 28

• Example (Example 10.4.7, p.334) Consider a 2nd-order recursive filterdescribed by

– The poles of the system are located at (see Fig. 7(a)).– The pipelined filter requires multiplications and is

described by

221

22

110

cos21)()(

)( −−

−−

+⋅−++

==zrzr

zbzbbzUzY

zHθ

{ }θjrez ±=

×+⋅−

= −−=

−∑MMMM

ii

i

zrzMr

zbzH 22

2

0

cos21)(

θ

( )∏ −

=−− ++

+⋅+1log

022222 11

2cos21M

ii iiii

zrzr θ

{ }Mwhere πθ 2≠

– The 2M poles of the transformed transfer function (shown in Fig.7(b)) are located at

( )( ) ( )1,2,1,0,2 −⋅⋅⋅== +± Mirez Mij πθ

( )5log2 2 +M

Chapter 10 29

Fig. 7: (also see Fig.10.11, p.336) Pole-zero representation for Example 10.4.7,the decompositions of the zeros is shown in (c)

Chapter 10 30

Parallel Processing in IIR Filters• Parallel processing can also be used in design of IIR filters• First discuss parallel processing for a simple 1st-order IIR filter, then

we discuss higher order filters• Example: (Example 10.5.1, p.339) Consider the transfer function of a 1st-

order IIR filter given by

– where for stability. This filter has only 1 pole located at . Thecorresponding input-output can be written as y(n+1)=ay(n)+u(n)

– Consider the design of a 4-parallel architecture (L=4) for the foregoingfilter. Note that in the parallel system, each delay element is referred to asa block delay, where the clock period of the block system is 4 times thesample period. Therefore, the loop update equation should update y(n+4)by using inputs and y(n).

1

1

1)(

−

−

−=

azz

zH

1≤a az =

Chapter 10 31

– By iterating the recursion (or by applying look-ahead technique), we get

– Substituting

– The corresponding architecture is shown in Fig.8.

)3()2()1()()()4( 234 +++++++=+ nunaunuanuanyany

kn 4=

)34()24()14()4()4()44( 234

+++++++=+

kukaukuakuakyaky

D

2a 3aa

)24( +ku )14( +ku )4( ku)34( +ku4a

)44( +ky )4( ky

Fig. 8: (also see Fig.10.14, p. 340)

(10.20)

Chapter 10 32

– The pole of the original filter is at , whereas the pole for theparallel system is at , which is much closer to the origin since

– An important implication of this pole movement is the improvedrobustness of the system to the round-off noise

– A straightforward block processing structure for L=4 obtained bysubstituting n=4k+4, 4k+5, 4k+6 and 4k+7 in (10.20) is shown in Fig. 9.

– Hardware complexity of this architecture: multiply-add operations(Because L multiply-add operations are required for each output and thereare L outputs in total)

– The square increase in hardware complexity can be reduced by exploitingthe concurrency in the computation (the decomposition property in thescattered look-ahead mode can not be exploited in the block processing modebecause one hardware delay element represents L sample delays)

az =4az =

{ }1since,4 ≤≤ aaa

2L

Chapter 10 33

Fig. 9: (also Fig.10.15, p.341) A 4-parallel 1st-order recursive filter

D

2a3a

)44( +ku )54( +ku )64( +ku)34( +ku4a

)74( +ky )34( +ky

a

D

2a3a

4a

)64( +ky )24( +ky

a

D

D

2a3a

4a

)54( +ky )14( +ky

a

D

3a 2a

4a

)64( +ky )4( ky

a

D

D

Chapter 10 34

• Trick:– Instead of computing y(4k+1), y(4k+2) & y(4k+3) independently,

we can use y(4k) to compute y(4k+1), use y(4k+1) to computey(4k+2), and use y(4k+2) to compute y(4k+3), at the expense of anincrease in the system latency, which leads to a significantreduction in hardware complexity.

– This method is referred as incremental block processing, andy(4k+1), y(4k+2) and y(4k+3) are computed incrementally.

• Example (Example 10.5.2, p.341) Consider the same 1st-order filter in lastexample. To derive its 4-parallel filter structure with the minimum hardwarecomplexity instead of simply repeating the hardware 4 times as in Fig.15, theincremental computation technique can be used to reduce hardware complexity– First, design the circuit for computing y(4k) (same as Fig.14)– Then, derive y(4k+1) from y(4k), y(4k+2) from y(4k+1), y(4k+3) from

y(4k+2) by using

+++=++++=+

+=+

)24()24()34()14()14()24(

)4()4()14(

kukaykykukayky

kukayky

Chapter 10 35

– The complete architecture is shown in Fig.10– The hardware complexity has reduced from to (2L-1) at the expense of

an increase in the computation time for y(4k+1), y(4k+2) and y(4k+3)

Fig.10: (also see Fig.10.16, p.342) Incremental block filter structure with L=4

2L

D

2a 3aa

)24( +ku )14( +ku )4( ku)34( +ku4a

)44( +ky)4( ky

a

a

a

)14( +ky

)24( +ky

)34( +ky

Chapter 10 36

• Example (Example 10.5.3, p.342) Consider a 2nd-order IIR filter describedby the transfer function (10.21). Its pole-zero locations are shown inFig.11. Derive a 3-parallel IIR filter where in every clock cycle 3inputs are processed and 3 outputs are generated

– Since the filter order is 2, 2 outputs need to be updated independently andthe 3rd output can be computed incrementally outside the feedback loopusing the 2 updated outputs. Assume that y(3k) and y(3k+1) are computedusing loop update operations and y(3k+2) is computed incrementally.From the transfer function, we have:

– The loop update process for the 3-parallel system is shown in Fig.12where y(3k+3) and y(3k+4) are computed using y(3k) and y(3k+1)

2831

45

21

1)1(

)( −−

−

+−+

=zz

zzH (10.21)

)2()1(2)()(

);()2()1()( 83

45

−+−+=

+−−−=

nunununf

nfnynyny(10.22)

Chapter 10 37

21

43 11−

D

D

y(3k+3) y(3k)

y(3k+1)y(3k+4)

Fig.11: Pole-zero plots for the transfer function

Fig.12: Loop update for block size=3

Chapter 10 38

– The computation of y(3k+3) using y(3k) & y(3k+1) can be carried out ify(n+3) can be computed using y(n) & y(n+1). Similarly y(3k+4) can becomputed using y(3k) & y(3k+1) if y(n+4) can be expressed in terms ofy(n) & y(n+1) (see Fig.13). These state update operations correspond toclustered look-ahead operation for M=2 and 3 cases. The 2-stage and 3-stage clustered look-ahead equations are derived as:

[ ]

[ ])()1(

)3()2()4()3(

)(

)2()1()3()2(

)()2()1()(

45

3215

83

45

1619

83

83

45

45

83

45

nfnf

nynfnyny

nf

nynfnyny

nfnynyny

+−+

−−−+−−−=+

−−−+−−−=

+−−−=

(10.23)

Chapter 10 39

– Substituting n=3k+3 & n=3k+4 into (10.23), we have the following 2loop update equations:

– The output y(3k+2) can be obtained incrementally as follows:

– The block structure is shown in Fig. 14

)43()33(

)23()3()13()43(

)23(

)23()3()13()33(

45

1619

12857

6465

45

3215

1619

++++

++−+=+++

++−+=+

kfkf

kfkykyky

kf

kfkykyky

(10.24)

)23()3()13()23( 38

45 ++−+=+ kfkykyky

3

2n+2 n+3 n+4n+1n

Fig.13: Relationship of the recursive outputs

Chapter 10 40

Fig. 14: Block structure of the 2nd-order IIR filter (L=3)(also see Fig.10.20, p.344)

D

2

)43( +ku

D

2

2

D

D

45 16

19

45

1619

3215−

6465

12851−

83−

45

)33( +ku )23( +ku

)43(1 +kf

)33(1 +kf

)23(1 +kf

)43(2 +kf

)33(2 +kf

)13( +ky

)3( ky

)23( +ky

Chapter 10 41

• Comments– The original sequential system has 2 poles at {1/2, 3/4}. Now consider the

pole locations of the new parallel system. Rewrite the 2 state updateequations in matrix form: Y(3k+3)=AY(3k)+F, i.e.

– The eigenvalues of system matrix A are , which are the poles ofthe new parallel system. Thus, the parallel system is more stable. Note:the parallel system has the same number of poles as the original system

– For a 2nd-order IIR filter (N=2), there are total 3L+[(L-2)+(L-1)]+4+2(L-2)=7L-3 multiplications, (the numerator part — 3L; the overhead of loopupdate — [(L-2)+(L-1)]; the loop multiplications — 4; the incrementalcomputation — 2(L-2)). The multiplication complexity is linear functionof block size L. This multiplication complexity can be further reduced byusing fast parallel filter structures and substructure sharing for theincrementally-computed outputs

+

+

⋅

=

++

−

−

2

1

6465

12857

1619

3215

)13()3(

)43()33(

ff

kyky

kyky

(10.25)

( ) ( )3433

21 ,

Chapter 10 42

Combined Pipelining and Parallel ProcessingFor IIR Filters

• Pipelining and parallel processing can also be combined for IIR filtersto achieve a speedup in sample rate by a factor L×M, where L denotesthe levels of block processing and M denotes stages of pipelining, or toachieve power reduction at the same speed

• Example (Example 10.6.1, p.345) Consider the 1st-order IIR with transferfunction (10.26). Derive the filter structure with 4-level pipelining and3-level block processing (i.e., M=4, L=3)

– Because the filter order is 1, only 1 loop update operation is required.Theother 3 outputs can be computed incrementally.

111

)( −⋅−=

zazH (10.26)

Chapter 10 43

– Since pipelining level M=4, the loop must contain 4 delay elements(shown in Fig.15). Since the block size L=3, each delay elementrepresents a block delay (corresponds to 3 sample delays). Therefore,y(3k+12) needs to be expressed in terms of y(3k) and inputs (see Fig. 15).

Fig.15: Loop update for the pipelined block system(also see Fig.10.21, p. 346)

4Dy(3k)y(3k+12)

12a

Chapter 10 44

– Substituting n=3k+12, we get:

– Finally, we have:

– The parallel-pipelined filter structure is shown in Fig. 16

)()11()12(

)()1()(

1112 nunuanya

nunayny

+⋅⋅⋅⋅⋅⋅+−+−=

⋅⋅⋅⋅⋅⋅=+−=

)123()93()63()3(

)123()13()3()123(

113

2612

1112

++++++=

++⋅⋅⋅⋅⋅⋅+++=+

kfkfakfakya

kukuakyaky

+++=+

+++++=+

)123()93()123(

)123()113()103()123(

113

2

21

kfkfakf

kukaukuakfwhere

+++=+++=+

++++++=+

)23()13()23()13()3()13(

)123()93()63()3()123( 113

2612

kukaykykukayky

kfkfakfakyaky

(10.27)

Chapter 10 45

Fig. 16: The filter structure of the pipelined block systemwith L=3 & M=4 (also see Fig.10.22, p.347)

3D

)103( +ku

2a

)113( +ku)123( +ku

a

3D

D3a

2D3a

4D

12a

)123(1 +kf )123(2 +kf

)13( +ku

)23( +ku )23( +ky

)13( +ky

)3( ky

Chapter 10 46

• Comments– The parallel-pipelined filter has 4 poles: . Since the

pipelining level is 4 and the filter order is 1, there are total 4 poles in thenew system, which are separated by the same angular distance. Since theblock size is 3, the distance of the poles from the origin is .

– Note: The decomposition method is used here in the pipelining phase.– The multiplication complexity ( assuming the pipelining level M to be

power of 2) can be calculated as (10.28), which is linear with respect to L,and logarithmic with respect to M:

• Example (Example 10.6.2, p. 347) Consider the 2nd-order filter in Example10.5.3 again, design a pipelined-block system for L=3 and M=2

( )3333 ,,, jajaaa −−

3a

( ) ( ) MLLML 22 log1211log1 +−=−+++−

)2()1(2)()(

);()2()1()( 83

45

−+−+=

+−−−=

nunununf

nfnynyny

(10.28)

(10.29)

Chapter 10 47

– A method similar to clustered look-ahead can be used to update y(3k+6)and y(3k+7) using y(3k) and y(3k+1). Then by index substitution, the finalsystem of equations can be derived.

– Suppose the system update matrix is A. Since the poles of the originalsystem are , the eigenvalues of A can be verified to be

– The poles of the new parallel-pipelined second-order filter are the squareroots of eigenvalues of A, i.e.,

• Comments: In general, the systematic approach below can be used tocompute the pole location of the new parallel pipelined system:– 1. Write the loop update equations using LM-level look-ahead, where M

and L denote the level of pipelining and parallel processing, respectively.– 2. Write the state space representation of the parallel pipelined filter,

where state matrix A has dimension N×N and N is the filter order– 3. Compute the eigenvalues of matrix A,– 4. The NM poles of the new parallel-pipelined system correspond to the

M-th roots of the eigenvalues of A, i.e.,

( )43

21 , ( ) ( )6

436

21 ,

( ) ( ) ( ) ( )3433

433

213

21 ,,, −−

iλ

Ni ≤≤1( )Mi

1

λ