Chapter 10: Pipelined and Parallel Recursive and Adaptive Filters
Keshab K. Parhi
Outline
• Introduction
• Pipelining in 1st-Order IIR Digital Filters
• Pipelining in Higher-Order IIR Digital Filters
• Parallel Processing for IIR Filters
• Combined Pipelining and Parallel Processing for IIR Filters
Look-Ahead Computation
First-Order IIR Filter
• Consider a 1st-order linear time-invariant recursion (see Fig. 1)

$$y(n+1) = a \cdot y(n) + b \cdot u(n) \qquad (10.1)$$

• The iteration period of this filter is $(T_m + T_a)$, where $\{T_m, T_a\}$ represent the word-level multiplication time and addition time
• In look-ahead transformation, the linear recursion is first iterated a few times to create additional concurrency.
• By recasting this recursion, we can express y(n+2) as a function of y(n) to obtain the following expression (see Fig. 2(a))

$$y(n+2) = a\,[a\,y(n) + b\,u(n)] + b\,u(n+1) \qquad (10.2)$$

• The iteration bound of this recursion is $2(T_m + T_a)/2$, the same as the original version, because the amount of computation and the number of logical delays inside the recursive loop have both doubled
• Another recursion equivalent to (10.2) is (10.3). Shown in Fig. 2(b), its iteration bound is $(T_m + T_a)/2$, a factor of 2 lower than before.

$$y(n+2) = a^2 \cdot y(n) + ab \cdot u(n) + b \cdot u(n+1) \qquad (10.3)$$

• Applying (M-1) steps of look-ahead to the iteration of (10.1), we can obtain an equivalent implementation described by (see Fig. 3)

$$y(n+M) = a^M \cdot y(n) + \sum_{i=0}^{M-1} a^i \cdot b \cdot u(n+M-1-i) \qquad (10.4)$$

– Note: the loop delay is $z^{-M}$ instead of $z^{-1}$, which means that the loop computation must be completed in M clock cycles (not 1 clock cycle). The iteration bound of this computation is $(T_m + T_a)/M$, which corresponds to a sample rate M times higher than that of the original filter
– The terms $\{ab, a^2b, \ldots, a^{M-1}b, a^M\}$ in (10.4) can be pre-computed (referred to as pre-computation terms). The second term in the RHS of (10.4) is the look-ahead computation term (referred to as the look-ahead complexity); it is non-recursive and can be easily pipelined
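The equivalence between (10.1) and the M-step look-ahead form (10.4) can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `direct` and `look_ahead` are illustrative, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def direct(a, b, u, y0):
    """Run y(n+1) = a*y(n) + b*u(n) sample by sample, as in (10.1)."""
    y = [y0]
    for un in u:
        y.append(a * y[-1] + b * un)
    return y

def look_ahead(a, b, u, y0, M):
    """Compute y(M), y(2M), ... with the M-step look-ahead recursion (10.4)."""
    coeffs = [a**i * b for i in range(M)]   # pre-computed terms b, ab, ..., a^(M-1) b
    aM = a**M                               # pre-computed loop coefficient a^M
    y = [y0]
    for n in range(0, len(u) - M + 1, M):
        # non-recursive look-ahead term: sum_i a^i b u(n+M-1-i)
        fir = sum(coeffs[i] * u[n + M - 1 - i] for i in range(M))
        y.append(aM * y[-1] + fir)
    return y

a, b = Fraction(1, 2), Fraction(2)
u = [Fraction(k) for k in range(12)]
y_direct = direct(a, b, u, Fraction(1))
y_look = look_ahead(a, b, u, Fraction(1), M=4)
```

Here `y_look` holds every 4th sample of `y_direct`; the loop in `look_ahead` contains a single multiplication by `a**M`, while the sum over `coeffs` is feed-forward and can be pipelined freely.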
Fig. 1: Block diagram of the 1st-order recursion (10.1)
Fig. 2(a): Block diagram of the look-ahead recursion (10.2)
Fig. 2(b): Block diagram of the look-ahead recursion (10.3)
Fig. 3: M-stage Pipelinable 1st-Order IIR Filter
• Look-ahead computation has allowed a single serial computation to be transformed into M independent concurrent computations, and to pipeline the feedback loop to achieve high speed filtering of a single time series while maintaining full hardware utilization.
• Provided the multiplier and the adder can be conveniently pipelined, the iteration bound can be achieved by retiming or cutset transformation (see Chapter 4)
Pipelining in 1st-Order IIR Digital Filters
– Look-ahead techniques add canceling poles and zeros with equal angular spacing at a distance from the origin which is the same as that of the original pole. The pipelined filters are always stable provided that the original filter is stable
– The pipelined realizations require a linear increase in complexity, but decomposition techniques can be used to obtain an implementation with a logarithmic increase in hardware with respect to the number of loop pipeline stages
– Example: Consider the 1st-order IIR filter transfer function

$$H(z) = \frac{1}{1 - a \cdot z^{-1}} \qquad (10.5)$$

• The output sample y(n) can be computed using the input sample u(n) and the past output sample as follows:

$$y(n) = a \cdot y(n-1) + u(n) \qquad (10.6)$$

• The sample rate of this recursive filter is limited by the computation time of one multiply-add operation
Pipelining in 1st-Order IIR Digital Filters (continued)
1. Look-Ahead Pipelining for 1st-Order IIR Filters
• Look-ahead pipelining adds canceling poles and zeros to the transfer function such that the coefficients of $\{z^{-1}, \ldots, z^{-(M-1)}\}$ in the denominator of the transfer function are zero. Then, the output sample y(n) can be computed using the inputs and the output sample y(n-M), such that there are M delay elements in the critical loop, which in turn can be used to pipeline the critical loop by M stages, and the sample rate can be increased by a factor M
• Example: Consider the 1st-order filter $H(z) = 1/(1 - a \cdot z^{-1})$, which has a pole at z=a ($|a| \le 1$). A 3-stage pipelined equivalent stable filter can be derived by adding poles and zeros at $z = a e^{\pm j2\pi/3}$, and is given by

$$H(z) = \frac{1 + a \cdot z^{-1} + a^2 \cdot z^{-2}}{1 - a^3 \cdot z^{-3}}$$
Pipelining in 1st-Order IIR Digital Filters (continued)
2. Look-Ahead Pipelining with Power-of-2 Decomposition
• With power-of-2 decomposition, an M-stage (for power-of-2 M) pipelined implementation for a 1st-order IIR filter can be obtained by $\log_2 M$ sets of transformations
• Example: Consider a 1st-order recursive filter transfer function described by $H(z) = (b \cdot z^{-1})/(1 - a \cdot z^{-1})$. The equivalent pipelined transfer function can be described using the decomposition technique as follows

$$H(z) = \frac{b \cdot z^{-1} \cdot \prod_{i=0}^{\log_2 M - 1} \left(1 + a^{2^i} z^{-2^i}\right)}{1 - a^M \cdot z^{-M}} \qquad (10.7)$$

– This pipelined implementation is derived by adding (M-1) poles and zeros at identical locations.
– The original transfer function has a single pole at $z = a$ (see Fig.4(a)).
– The pipelined transfer function has poles at the following locations (see Fig.4(b) for M=8):

$$\left\{a,\ a e^{j2\pi/M},\ \ldots,\ a e^{j(M-1)2\pi/M}\right\}$$

– The decomposition of the canceling zeros is shown in Fig.4(c). The i-th stage of the decomposed non-recursive portion implements $2^i$ zeros located at:

$$z = a \exp\left(j(2n+1)\pi/2^i\right), \quad n = 0, 1, \ldots, (2^i - 1) \qquad (10.8)$$

– The i-th stage of the decomposed non-recursive portion requires a single pipelined multiplication operation independent of the stage number i
– The multiplication complexity of the pipelined implementation is $(\log_2 M + 2)$
– The finite-precision pipelined filters suffer from inexact pole-zero cancellation, which leads to magnitude and phase error. These errors can be reduced by increasing the wordlength (see p. 323)
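The power-of-2 decomposition in (10.7) can be checked by multiplying the original denominator $(1 - a z^{-1})$ by each decomposed stage. The sketch below is illustrative only; `poly_mul`, `power_of_two_stages` and `pipelined_denominator` are hypothetical helper names, and polynomials are stored as coefficient lists in powers of $z^{-1}$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials in x = z^-1 given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def power_of_two_stages(a, M):
    """Return the log2(M) non-recursive stages (1 + a^(2^i) x^(2^i))."""
    stages, k = [], 1
    while k < M:
        stages.append([1] + [0] * (k - 1) + [a**k])
        k *= 2
    return stages

def pipelined_denominator(a, M):
    """Multiply (1 - a x) by every stage; the result should be 1 - a^M x^M."""
    d = [1, -a]
    for s in power_of_two_stages(a, M):
        d = poly_mul(d, s)
    return d

a = Fraction(3, 4)
den8 = pipelined_denominator(a, 8)
```

For M = 8 there are three stages, each with a single non-trivial coefficient, and `den8` has only the constant term and the $z^{-8}$ term, which is what allows 8 delays in the loop.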
Fig. 4: (a) Pole representation of a 1st-order recursive filter; (b) pole/zero representation of a 1st-order IIR with 8 loop stages; (c) decomposition-based pipelined IIR for M = 8
Pipelining in 1st-Order IIR Digital Filters (continued)
3. Look-Ahead Pipelining with General Decomposition
• The idea of decomposition can be extended to any arbitrary number of loop pipelining stages M. If $M = M_1 M_2 \cdots M_p$, then the non-recursive stages implement $(M_1 - 1)$, $M_1(M_2 - 1)$, $\ldots$, $M_1 M_2 \cdots M_{p-1}(M_p - 1)$ zeros, respectively, totaling (M-1) zeros
• Example (Example 10.3.3, p.325) Consider the 1st-order IIR

$$H(z) = \frac{1}{1 - a \cdot z^{-1}}$$

– A 12-stage pipelined decomposed implementation is given by

$$H(z) = \frac{\sum_{i=0}^{11} a^i z^{-i}}{1 - a^{12} z^{-12}} = \frac{(1 + a z^{-1})(1 + a^2 z^{-2} + a^4 z^{-4})(1 + a^6 z^{-6})}{1 - a^{12} z^{-12}}$$

– This implementation is based on a $2 \times 3 \times 2$ decomposition (see Fig. 5)
– The first section implements 1 zero at –a, the second section implements 4 zeros at $\{a e^{\pm j\pi/3}, a e^{\pm j2\pi/3}\}$, and the third section implements 6 zeros at $\{\pm ja, a e^{\pm j\pi/6}, a e^{\pm j5\pi/6}\}$
– Another decomposed implementation ($2 \times 2 \times 3$ decomposition) is given by

$$H(z) = \frac{(1 + a z^{-1})(1 + a^2 z^{-2})(1 + a^4 z^{-4} + a^8 z^{-8})}{1 - a^{12} z^{-12}}$$

• The first section implements 1 zero at –a, the second section implements 2 zeros at $\pm ja$, and the third section implements 8 zeros at $\{a e^{\pm j\pi/6}, a e^{\pm j\pi/3}, a e^{\pm j2\pi/3}, a e^{\pm j5\pi/6}\}$
– The third decomposition ($3 \times 2 \times 2$ decomposition) is given by

$$H(z) = \frac{(1 + a z^{-1} + a^2 z^{-2})(1 + a^3 z^{-3})(1 + a^6 z^{-6})}{1 - a^{12} z^{-12}}$$

• The first section implements 2 zeros at $\{a e^{\pm j2\pi/3}\}$, the second section implements 3 zeros at $\{-a, a e^{\pm j\pi/3}\}$, and the third section implements 6 zeros at $\{\pm ja, a e^{\pm j\pi/6}, a e^{\pm j5\pi/6}\}$
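All three 12-stage decompositions should expand to the same numerator $\sum_{i=0}^{11} a^i z^{-i}$. A short sketch of that check follows; the helper names (`geometric_factor`, `decomposed_numerator`) are illustrative and not from the text.

```python
from fractions import Fraction
from functools import reduce

def poly_mul(p, q):
    """Multiply two polynomials in x = z^-1 given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def geometric_factor(a, step, terms):
    """1 + a^step x^step + ... + a^(step*(terms-1)) x^(step*(terms-1))."""
    coeffs = [0] * (step * (terms - 1) + 1)
    for k in range(terms):
        coeffs[step * k] = a ** (step * k)
    return coeffs

def decomposed_numerator(a, factors):
    """Numerator for M = M1*M2*...*Mp, with the stages in the order given."""
    stages, step = [], 1
    for m in factors:
        stages.append(geometric_factor(a, step, m))
        step *= m
    return reduce(poly_mul, stages)

a = Fraction(2, 3)
expected = [a**i for i in range(12)]
results = {f: decomposed_numerator(a, f) for f in [(2, 3, 2), (2, 2, 3), (3, 2, 2)]}
```

Each stage with $M_k$ terms needs $(M_k - 1)$ multiplications, so the ordering of the factors changes where the zeros are implemented but not the overall transfer function.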
Fig. 5(a): Pole-zero location of a 12-stage pipelined 1st-order IIR
Fig. 5(b): Decomposition of the zeros of the pipelined filter for a 2×3×2 decomposition
Pipelining in Higher-order IIR Digital Filters
• Higher-order IIR digital filters can be pipelined by using clustered look-ahead or scattered look-ahead techniques. (For 1st-order IIR filters, these two look-ahead techniques reduce to the same form)
– Clustered look-ahead: Pipelined realizations require a linear complexity in the number of loop pipelined stages and are not always guaranteed to be stable
– Scattered look-ahead: Can be used to derive stable pipelined IIR filters
– Decomposition technique: Can also be used to obtain area-efficient implementation for higher-order IIR for scattered look-ahead filters
– Constrained filter design techniques: Achieve pipelining without pole-zero cancellation
• The transfer function of an N-th order direct-form recursive filter is described by

$$H(z) = \frac{\sum_{i=0}^{N} b_i z^{-i}}{1 - \sum_{i=1}^{N} a_i z^{-i}} \qquad (10.9)$$

• Equivalently, the output y(n) can be described in terms of the input sample u(n) and the past input/output samples as follows

$$y(n) = \sum_{i=1}^{N} a_i\, y(n-i) + \sum_{i=0}^{N} b_i\, u(n-i) = \sum_{i=1}^{N} a_i\, y(n-i) + z(n) \qquad (10.10)$$

– The sample rate of this IIR filter realization is limited by the throughput of 1 multiplication and 1 addition, since the critical loop contains a single delay element
Pipelining in Higher-order IIR Digital Filters (cont’d)
1. Clustered Look-Ahead Pipelining
• The basic idea of clustered look-ahead:
– Add canceling poles and zeros to the filter transfer function such that the coefficients of $\{z^{-1}, \ldots, z^{-(M-1)}\}$ in the denominator of the transfer function are 0, and the output samples y(n) can be described in terms of the cluster of N past outputs: $\{y(n-M), \ldots, y(n-M-N+1)\}$
– Hence the critical loop of this implementation contains M delay elements and a single multiplication. Therefore, this loop can be pipelined by M stages, and the sample rate can be increased by a factor M. This is referred to as M-stage clustered look-ahead pipelining
• Example: (Example 10.4.1, p.327) Consider the all-pole 2nd-order IIR filter with poles at $\{1/2, 3/4\}$. The transfer function of this filter is

$$H(z) = \frac{1}{1 - \frac{5}{4} z^{-1} + \frac{3}{8} z^{-2}} \qquad (10.11)$$

– A 2-stage pipelined equivalent IIR filter can be obtained by eliminating the $z^{-1}$ term in the denominator (i.e., multiplying both the numerator and denominator by $(1 + \frac{5}{4} z^{-1})$). The transformed transfer function is given by:

$$H(z) = \frac{1}{1 - \frac{5}{4} z^{-1} + \frac{3}{8} z^{-2}} \cdot \frac{1 + \frac{5}{4} z^{-1}}{1 + \frac{5}{4} z^{-1}} = \frac{1 + \frac{5}{4} z^{-1}}{1 - \frac{19}{16} z^{-2} + \frac{15}{32} z^{-3}} \qquad (10.12)$$

– From the transfer function, we can see that the coefficient of $z^{-1}$ in the denominator is zero. Hence, the critical loop of this filter contains 2 delay elements and can be pipelined by 2 stages
– Similarly, a 3-stage pipelined realization can be derived by eliminating the terms of $\{z^{-1}, z^{-2}\}$ in the denominator of (10.11), which can be done by multiplying both numerator and denominator by $(1 + \frac{5}{4} z^{-1} + \frac{19}{16} z^{-2})$.
– The new transfer function is given by:

$$H(z) = \frac{1 + \frac{5}{4} z^{-1} + \frac{19}{16} z^{-2}}{1 - \frac{65}{64} z^{-3} + \frac{57}{128} z^{-4}} \qquad (10.13)$$

• Computation complexity: The numerator (non-recursive portion) of this pipelined filter needs (N+M) multiplications, and the denominator (recursive portion) needs N multiplications. Thus, the total complexity of this pipelined implementation is (N+N+M) multiplications
• Stability: The canceling poles and zeros are utilized for pipelining IIR filters. However, when the additional poles lie outside the unit circle, the filter becomes unstable. Note that the filters in (10.12) and (10.13) are unstable: (10.12) has an added pole at $z = -5/4$, and the two added poles of (10.13) have magnitude $\sqrt{19/16} > 1$.
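The multiplier polynomial used in clustered look-ahead can be generated term by term: each new coefficient is chosen so that the corresponding power of $z^{-1}$ cancels in the product. The sketch below reproduces (10.12) and (10.13) with exact fractions; `clustered_look_ahead` is an illustrative name, not a routine from the text.

```python
from fractions import Fraction as F

def clustered_look_ahead(den, M):
    """den = [1, d1, ..., dN] holds the coefficients of D(z) in powers of z^-1.

    Returns (mult, new_den): mult has M terms, and new_den = den * mult has
    zero coefficients for z^-1 ... z^-(M-1).
    """
    N = len(den) - 1
    mult = [F(1)]
    for k in range(1, M):
        # pick the z^-k coefficient so that the z^-k term of the product cancels
        acc = sum(den[i] * mult[k - i] for i in range(1, min(k, N) + 1))
        mult.append(-acc)
    new_den = [F(0)] * (N + M)
    for i, d in enumerate(den):
        for j, m in enumerate(mult):
            new_den[i + j] += d * m
    return mult, new_den

den = [F(1), F(-5, 4), F(3, 8)]          # denominator of (10.11)
mult2, den2 = clustered_look_ahead(den, 2)
mult3, den3 = clustered_look_ahead(den, 3)
```

`mult2` and `mult3` are the numerators of (10.12) and (10.13), and `den2`, `den3` are the corresponding denominators. The last coefficient of `mult3` is the product of the added poles, 19/16, which already shows that at least one of them lies outside the unit circle.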
2. Stable Clustered Look-Ahead Filter Design
• If the desired pipeline delay M does not produce a stable filter, M should be increased until a stable pipelined filter is obtained. To obtain the optimal pipelining level M, numerical search methods are generally used
• Example (Example 10.4.3, p.330) Consider a 5-level (M=5) pipelined implementation of the following 2nd-order transfer function

$$H(z) = \frac{1}{1 - 1.5336 z^{-1} + 0.6889 z^{-2}} \qquad (10.14)$$

– By the stability analysis, it is shown that (M=5) does not meet the stability condition. Thus M is increased to M=6 to obtain the following stable pipelined filter

$$H(z) = \frac{1 + 1.5336 z^{-1} + 1.6630 z^{-2} + 1.4939 z^{-3} + 1.1454 z^{-4} + 0.7275 z^{-5}}{1 - 0.3265 z^{-6} + 0.5011 z^{-7}} \qquad (10.15)$$
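The search for a stable pipelining level can be sketched as follows. This is a hedged illustration, not the procedure from the text: the stability check is a standard Schur-Cohn (step-down) test chosen here for convenience, and the names `clustered_look_ahead` and `is_stable` are hypothetical.

```python
def clustered_look_ahead(den, M):
    """Return (num, new_den) for D(z) = den[0] + den[1] z^-1 + ... (floats)."""
    N = len(den) - 1
    num = [1.0]
    for k in range(1, M):
        num.append(-sum(den[i] * num[k - i] for i in range(1, min(k, N) + 1)))
    new_den = [0.0] * (N + M)
    for i, d in enumerate(den):
        for j, c in enumerate(num):
            new_den[i + j] += d * c
    return num, new_den

def is_stable(den):
    """Step-down test: all poles are inside the unit circle exactly when every
    reflection coefficient has magnitude below 1."""
    a = [c / den[0] for c in den]
    while len(a) > 1:
        k = a[-1]                       # reflection coefficient of this order
        if abs(k) >= 1.0:
            return False
        scale = 1.0 - k * k
        P = len(a) - 1
        a = [(a[i] - k * a[P - i]) / scale for i in range(P)]
    return True

den = [1.0, -1.5336, 0.6889]            # denominator of (10.14)
num5, den5 = clustered_look_ahead(den, 5)
num6, den6 = clustered_look_ahead(den, 6)
stable5, stable6 = is_stable(den5), is_stable(den6)
```

A numerical search would simply increase M until `is_stable` returns `True`. For M = 5 the last numerator coefficient is about 1.1454, which is the product of the four added poles, so at least one of them must lie outside the unit circle.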
3. Scattered Look-Ahead Pipelining
• Scattered look-ahead pipelining: The denominator of the transfer function in (10.9) is transformed in a way that it contains the N terms $\{z^{-M}, z^{-2M}, \ldots, z^{-NM}\}$. Equivalently, the state y(n) is computed in terms of N past scattered states y(n-M), y(n-2M), …, and y(n-NM)
• In scattered look-ahead, for each pole in the original filter, we introduce (M-1) canceling poles and zeros with equal angular spacing at a distance from the origin the same as that of the original pole.
– Example: if the original filter has a pole at $z = p$, we add (M-1) poles and zeros at $\{z = p \exp(j2\pi k/M),\ k = 1, 2, \ldots, M-1\}$ to derive a pipelined realization with M loop pipeline stages
• Assume that the denominator of the transfer function can be factorized as follows:

$$D(z) = \prod_{i=1}^{N} (1 - p_i z^{-1}) \qquad (10.16)$$
(continued)
– Then, the pipelining process using the scattered look-ahead approach can be described by

$$H(z) = \frac{N(z)}{D(z)} = \frac{N(z) \prod_{i=1}^{N} \prod_{k=1}^{M-1} \left(1 - p_i e^{j2\pi k/M} z^{-1}\right)}{\prod_{i=1}^{N} \prod_{k=0}^{M-1} \left(1 - p_i e^{j2\pi k/M} z^{-1}\right)} = \frac{N'(z)}{D'(z^M)} \qquad (10.17)$$

• Example (Example 10.4.5, p.332) Consider the 2nd-order filter with complex conjugate poles at $z = r e^{\pm j\theta}$. The filter transfer function is given by

$$H(z) = \frac{1}{1 - 2r\cos\theta\, z^{-1} + r^2 z^{-2}}$$

– We can pipeline this filter by 3 stages by introducing 4 additional poles and zeros at $\{z = r e^{\pm j(\theta + 2\pi/3)},\ z = r e^{\pm j(\theta - 2\pi/3)}\}$ if $\theta \neq 2\pi/3$
– (cont’d) The equivalent pipelined filter is then given by

$$H(z) = \frac{1 + 2r\cos\theta\, z^{-1} + (1 + 2\cos 2\theta)\, r^2 z^{-2} + 2r^3\cos\theta\, z^{-3} + r^4 z^{-4}}{1 - 2r^3\cos 3\theta\, z^{-3} + r^6 z^{-6}}$$

– When $\theta = 2\pi/3$, then only 1 additional pole and zero at $z = r$ is required for 3-stage pipelining, since $z = r e^{\pm j(\theta - 2\pi/3)} = r$ and $z = r e^{\pm j(\theta + 2\pi/3)} = r e^{\mp j\theta}$. The equivalent pipelined filter is then given by

$$H(z) = \frac{1 - r z^{-1}}{(1 + r z^{-1} + r^2 z^{-2})(1 - r z^{-1})} = \frac{1 - r z^{-1}}{1 - r^3 z^{-3}}$$

• Example (Example 10.4.6, p.332) Consider the 2nd-order filter with real poles at $\{z = r_1, z = r_2\}$. The transfer function is given by

$$H(z) = \frac{1}{1 - (r_1 + r_2) z^{-1} + r_1 r_2 z^{-2}}$$
– (cont’d) A 3-stage pipelined realization is derived by adding poles (and zeros) at $\{z = r_1 e^{\pm j2\pi/3},\ z = r_2 e^{\pm j2\pi/3}\}$. The pipelined realization is given by

$$H(z) = \frac{1 + (r_1 + r_2) z^{-1} + (r_1^2 + r_1 r_2 + r_2^2) z^{-2} + r_1 r_2 (r_1 + r_2) z^{-3} + r_1^2 r_2^2 z^{-4}}{1 - (r_1^3 + r_2^3) z^{-3} + r_1^3 r_2^3 z^{-6}}$$

– The pole-zero locations of a 3-stage pipelined 2nd-order filter with poles at z=1/2 and z=3/4 are shown in Fig. 6
• CONCLUSIONS:
– If the original filter is stable, then the scattered look-ahead approach leads to stable pipelined filters, because the distance of the additional poles from the origin is the same as that of the original poles
– Multiplication Complexity: (NM+1) for the non-recursive portion in (10.17), and N for the recursive portion. Total pipelined filter multiplication complexity is (NM+N+1)
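The construction in (10.17) can be sketched directly from the pole locations: every original pole contributes M equally spaced poles to the new denominator, and the M-1 added ones also appear as canceling zeros. The code below is an illustration for an all-pole filter; the name `scattered_look_ahead` is an assumption, and complex floating-point arithmetic is used, so results are only accurate to rounding.

```python
import cmath

def poly_mul(p, q):
    """Multiply two polynomials in x = z^-1 given as coefficient lists."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def scattered_look_ahead(poles, M):
    """Return (num, den) coefficient lists in powers of z^-1, following (10.17)."""
    num, den = [1 + 0j], [1 + 0j]
    for p in poles:
        for k in range(M):
            root = p * cmath.exp(2j * cmath.pi * k / M)
            den = poly_mul(den, [1, -root])
            if k > 0:                      # the M-1 added poles are also zeros
                num = poly_mul(num, [1, -root])
    return num, den

r1, r2 = 0.5, 0.75
num, den = scattered_look_ahead([r1, r2], 3)
```

For these two real poles, `den` should contain only $z^0$, $z^{-3}$ and $z^{-6}$ terms (up to rounding), and `num` should match the five numerator coefficients of the 3-stage realization in Example 10.4.6.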
Fig. 6: Pole-zero representation of a 3-stage pipelined equivalent stable filter derived using scattered look-ahead approach (axes: Re[z], Im[z]; original poles at 1/2 and 3/4)
– (cont’d) The multiplication complexity is linear with respect to M, and is much greater than that of clustered look-ahead
– Also, the latch complexity is quadratic in M, because each multiplier is pipelined by M stages
4. Scattered Look-Ahead Pipelining with Power-of-2 Decomposition
• This kind of decomposition technique will lead to a logarithmic increase in multiplication complexity (hardware) with respect to the level of pipelining
• Let the transfer function of a recursive digital filter be

$$H(z) = \frac{\sum_{i=0}^{N} b_i z^{-i}}{1 - \sum_{i=1}^{N} a_i z^{-i}} = \frac{N(z)}{D(z)} \qquad (10.18)$$

– A 2-stage pipelined implementation can be obtained by multiplying by $\left(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\right)$ in the numerator and denominator. The equivalent 2-stage pipelined implementation is described by

$$H(z) = \frac{\left(\sum_{i=0}^{N} b_i z^{-i}\right)\left(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\right)}{\left(1 - \sum_{i=1}^{N} a_i z^{-i}\right)\left(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\right)} = \frac{N'(z)}{D'(z^2)} \qquad (10.19)$$
– (cont’d) Similarly, subsequent transformations can be applied to obtain 4, 8, and 16 stage pipelined implementations, respectively
– Thus, to obtain an M-stage pipelined implementation (for power-of-2 M), $\log_2 M$ sets of such transformations need to be applied.
– By applying $(\log_2 M - 1)$ further sets of such transformations to (10.19), an equivalent transfer function (with M pipelining stages inside the recursive loop) can be derived, which requires a complexity of $(2N + N\log_2 M + 1)$ multiplications, a logarithmic complexity with respect to M.
– Note: the number of delays (or latches) is roughly linear in M (up to a $\log_2 M$ factor): the total number of delays (or latches) is approximately $NM(\log_2 M + 1)$, about NM delays in the non-recursive portion, and $NM\log_2 M$ delays for pipelining each of the $N\log_2 M$ multipliers by M stages
– In the decomposed realization, the 1st stage implements an N-th order non-recursive section, and the subsequent stages respectively implement 2N, 4N, …, NM/2-order non-recursive sections
• Example (Example 10.4.7, p.334) Consider a 2nd-order recursive filter described by

$$H(z) = \frac{Y(z)}{U(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 - 2r\cos\theta\, z^{-1} + r^2 z^{-2}}$$

– The poles of the system are located at $\{z = r e^{\pm j\theta}\}$ (see Fig. 7(a)).
– The pipelined filter requires $(2\log_2 M + 5)$ multiplications and is described by

$$H(z) = \frac{\sum_{i=0}^{2} b_i z^{-i}}{1 - 2r^M \cos M\theta\, z^{-M} + r^{2M} z^{-2M}} \times \prod_{i=0}^{\log_2 M - 1} \left(1 + 2 r^{2^i} \cos(2^i\theta)\, z^{-2^i} + r^{2^{i+1}} z^{-2^{i+1}}\right)$$

where $\theta \neq 2\pi/M$
– The 2M poles of the transformed transfer function (shown in Fig. 7(b)) are located at

$$z = r e^{\pm j(\theta + i(2\pi/M))}, \quad i = 0, 1, 2, \ldots, (M-1)$$
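The power-of-2 transformation of (10.19) can be sketched as repeated multiplication of the denominator by a sign-flipped copy of itself. At stage s the current denominator is a polynomial in $z^{-2^s}$, and flipping the sign of every odd power of that variable doubles the spacing of the remaining terms. The helper name `power_of_two_pipeline` is illustrative, and the example reuses the denominator of (10.11) rather than the filter of Example 10.4.7.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Multiply two polynomials in x = z^-1 given as coefficient lists."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def power_of_two_pipeline(den, M):
    """Apply log2(M) transformations; return (numerator stages, new denominator)."""
    stages, step = [], 1
    while step < M:
        # flip the sign of odd powers of x^step (the D(-z)-style factor)
        flipped = [(-c if (i // step) % 2 else c) for i, c in enumerate(den)]
        stages.append(flipped)
        den = poly_mul(den, flipped)
        step *= 2
    return stages, den

den = [F(1), F(-5, 4), F(3, 8)]            # poles at 1/2 and 3/4
stages, den4 = power_of_two_pipeline(den, 4)
```

With M = 4 there are two numerator stages, and `den4` has non-zero coefficients only at $z^0$, $z^{-4}$ and $z^{-8}$; its poles are the 4th powers of the original ones, so the $z^{-4}$ coefficient is $-(1/16 + 81/256)$.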
Fig. 7: (also see Fig.10.11, p.336) Pole-zero representation for Example 10.4.7; the decomposition of the zeros is shown in (c)
Parallel Processing in IIR Filters
• Parallel processing can also be used in the design of IIR filters
• We first discuss parallel processing for a simple 1st-order IIR filter, then higher-order filters
• Example: (Example 10.5.1, p.339) Consider the transfer function of a 1st-order IIR filter given by

$$H(z) = \frac{z^{-1}}{1 - a z^{-1}}$$

– where $|a| \le 1$ for stability. This filter has only 1 pole located at $z = a$. The corresponding input-output relation can be written as y(n+1)=ay(n)+u(n)
– Consider the design of a 4-parallel architecture (L=4) for the foregoing filter. Note that in the parallel system, each delay element is referred to as a block delay, where the clock period of the block system is 4 times the sample period. Therefore, the loop update equation should update y(n+4) by using inputs and y(n).
– By iterating the recursion (or by applying the look-ahead technique), we get

$$y(n+4) = a^4 y(n) + a^3 u(n) + a^2 u(n+1) + a\,u(n+2) + u(n+3) \qquad (10.20)$$

– Substituting $n = 4k$

$$y(4k+4) = a^4 y(4k) + a^3 u(4k) + a^2 u(4k+1) + a\,u(4k+2) + u(4k+3)$$

– The corresponding architecture is shown in Fig.8.
Fig. 8: (also see Fig.10.14, p. 340) Loop update for the 4-parallel 1st-order filter
– The pole of the original filter is at $z = a$, whereas the pole for the parallel system is at $z = a^4$, which is much closer to the origin since $|a^4| \le |a|$ (as $|a| \le 1$)
– An important implication of this pole movement is the improved robustness of the system to round-off noise
– A straightforward block processing structure for L=4 obtained by substituting n=4k+4, 4k+5, 4k+6 and 4k+7 in (10.20) is shown in Fig. 9.
– Hardware complexity of this architecture: $L^2$ multiply-add operations (because L multiply-add operations are required for each output and there are L outputs in total)
– The quadratic increase in hardware complexity can be reduced by exploiting the concurrency in the computation (the decomposition property in the scattered look-ahead mode cannot be exploited in the block processing mode because one hardware delay element represents L sample delays)
Fig. 9: (also see Fig.10.15, p.341) A 4-parallel 1st-order recursive filter
• Trick:
– Instead of computing y(4k+1), y(4k+2) & y(4k+3) independently, we can use y(4k) to compute y(4k+1), use y(4k+1) to compute y(4k+2), and use y(4k+2) to compute y(4k+3). This comes at the expense of an increase in the system latency, but leads to a significant reduction in hardware complexity.
– This method is referred to as incremental block processing, and y(4k+1), y(4k+2) and y(4k+3) are computed incrementally.
• Example (Example 10.5.2, p.341) Consider the same 1st-order filter as in the last example. To derive its 4-parallel filter structure with the minimum hardware complexity, instead of simply repeating the hardware 4 times as in Fig. 9 (Fig. 10.15 in the text), the incremental computation technique can be used to reduce hardware complexity
– First, design the circuit for computing y(4k) (same as Fig. 8, i.e. Fig. 10.14 in the text)
– Then, derive y(4k+1) from y(4k), y(4k+2) from y(4k+1), y(4k+3) from y(4k+2) by using

$$\begin{cases} y(4k+1) = a\,y(4k) + u(4k) \\ y(4k+2) = a\,y(4k+1) + u(4k+1) \\ y(4k+3) = a\,y(4k+2) + u(4k+2) \end{cases}$$
– The complete architecture is shown in Fig.10
– The hardware complexity has been reduced from $L^2$ to (2L-1) at the expense of an increase in the computation time for y(4k+1), y(4k+2) and y(4k+3)

Fig.10: (also see Fig.10.16, p.342) Incremental block filter structure with L=4
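Incremental block processing can be simulated to confirm that it produces the same samples as the sequential recursion y(n+1)=ay(n)+u(n). The sketch below is illustrative (the function names are not from the text) and assumes the input length is a multiple of L.

```python
from fractions import Fraction as F

def sequential(a, u, y0=0):
    """Sample-by-sample recursion y(n+1) = a*y(n) + u(n)."""
    y = [y0]
    for un in u:
        y.append(a * y[-1] + un)
    return y

def incremental_block(a, u, L=4, y0=0):
    """One loop update per block plus L-1 incrementally computed outputs."""
    aL = a ** L
    taps = [a ** (L - 1 - i) for i in range(L)]      # a^(L-1), ..., a, 1
    y, state = [y0], y0                              # state holds y(Lk)
    for k in range(len(u) // L):
        blk = u[L * k: L * k + L]
        outs = [state]
        for i in range(L - 1):
            outs.append(a * outs[-1] + blk[i])       # y(Lk+1) ... y(Lk+L-1)
        # loop update, cf. (10.20): y(Lk+L) = a^L y(Lk) + sum of weighted inputs
        state = aL * state + sum(t * x for t, x in zip(taps, blk))
        y.extend(outs[1:] + [state])
    return y

a = F(1, 2)
u = [F(n) for n in range(8)]
y_seq = sequential(a, u)
y_blk = incremental_block(a, u, L=4)
```

Per block, the loop update uses L multiplications (by $a^L$ and by $a^{L-1}, \ldots, a$) and the incremental outputs use L-1, which matches the (2L-1) count quoted above. Only the multiplication by `aL` sits inside the feedback loop.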
• Example (Example 10.5.3, p.342) Consider a 2nd-order IIR filter described by the transfer function (10.21). Its pole-zero locations are shown in Fig.11. Derive a 3-parallel IIR filter where in every clock cycle 3 inputs are processed and 3 outputs are generated

$$H(z) = \frac{(1 + z^{-1})^2}{1 - \frac{5}{4} z^{-1} + \frac{3}{8} z^{-2}} \qquad (10.21)$$

– Since the filter order is 2, 2 outputs need to be updated independently and the 3rd output can be computed incrementally outside the feedback loop using the 2 updated outputs. Assume that y(3k) and y(3k+1) are computed using loop update operations and y(3k+2) is computed incrementally. From the transfer function, we have:

$$y(n) = \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n); \qquad f(n) = u(n) + 2u(n-1) + u(n-2) \qquad (10.22)$$

– The loop update process for the 3-parallel system is shown in Fig.12 where y(3k+3) and y(3k+4) are computed using y(3k) and y(3k+1)
Fig.11: Pole-zero plots for the transfer function
Fig.12: Loop update for block size=3
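– The computation of y(3k+3) using y(3k) & y(3k+1) can be carried out if y(n+3) can be computed using y(n) & y(n+1). Similarly y(3k+4) can be computed using y(3k) & y(3k+1) if y(n+4) can be expressed in terms of y(n) & y(n+1) (see Fig.13). These state update operations correspond to clustered look-ahead operation for M=2 and 3 cases. The 2-stage and 3-stage clustered look-ahead equations are derived as:

$$\begin{aligned}
y(n) &= \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n) \\
&= \tfrac{5}{4}\left[\tfrac{5}{4}\, y(n-2) - \tfrac{3}{8}\, y(n-3) + f(n-1)\right] - \tfrac{3}{8}\, y(n-2) + f(n) \\
&= \tfrac{19}{16}\, y(n-2) - \tfrac{15}{32}\, y(n-3) + \tfrac{5}{4}\, f(n-1) + f(n) \\
&= \tfrac{19}{16}\left[\tfrac{5}{4}\, y(n-3) - \tfrac{3}{8}\, y(n-4) + f(n-2)\right] - \tfrac{15}{32}\, y(n-3) + \tfrac{5}{4}\, f(n-1) + f(n) \\
&= \tfrac{65}{64}\, y(n-3) - \tfrac{57}{128}\, y(n-4) + \tfrac{19}{16}\, f(n-2) + \tfrac{5}{4}\, f(n-1) + f(n)
\end{aligned} \qquad (10.23)$$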
– Substituting n=3k+3 & n=3k+4 into (10.23), we have the following 2 loop update equations:

$$\begin{aligned}
y(3k+3) &= \tfrac{19}{16}\, y(3k+1) - \tfrac{15}{32}\, y(3k) + \tfrac{5}{4}\, f(3k+2) + f(3k+3) \\
y(3k+4) &= \tfrac{65}{64}\, y(3k+1) - \tfrac{57}{128}\, y(3k) + \tfrac{19}{16}\, f(3k+2) + \tfrac{5}{4}\, f(3k+3) + f(3k+4)
\end{aligned} \qquad (10.24)$$

– The output y(3k+2) can be obtained incrementally as follows:

$$y(3k+2) = \tfrac{5}{4}\, y(3k+1) - \tfrac{3}{8}\, y(3k) + f(3k+2)$$

– The block structure is shown in Fig. 14

Fig.13: Relationship of the recursive outputs
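The loop update equations (10.24) and the incremental output can be compared against the sequential recursion (10.22). The following sketch assumes zero initial conditions and seeds the two loop states y(0), y(1) with the sequential recursion; the function names are illustrative only.

```python
from fractions import Fraction as F

def f_seq(u):
    """f(n) = u(n) + 2u(n-1) + u(n-2), with u(n) = 0 for n < 0."""
    pad = [0, 0] + list(u)
    return [pad[n + 2] + 2 * pad[n + 1] + pad[n] for n in range(len(u))]

def sequential(f):
    """y(n) = 5/4 y(n-1) - 3/8 y(n-2) + f(n), zero initial state."""
    y = []
    for n, fn in enumerate(f):
        y1 = y[n - 1] if n >= 1 else 0
        y2 = y[n - 2] if n >= 2 else 0
        y.append(F(5, 4) * y1 - F(3, 8) * y2 + fn)
    return y

def block3(f):
    """3-parallel processing: two loop updates (10.24) plus one incremental output."""
    y = sequential(f[:2])                       # y(0), y(1) seed the loop states
    k = 0
    while 3 * k + 4 < len(f):
        y0, y1 = y[3 * k], y[3 * k + 1]
        f2, f3, f4 = f[3 * k + 2], f[3 * k + 3], f[3 * k + 4]
        y.append(F(5, 4) * y1 - F(3, 8) * y0 + f2)                     # y(3k+2)
        y.append(F(19, 16) * y1 - F(15, 32) * y0 + F(5, 4) * f2 + f3)  # y(3k+3)
        y.append(F(65, 64) * y1 - F(57, 128) * y0
                 + F(19, 16) * f2 + F(5, 4) * f3 + f4)                 # y(3k+4)
        k += 1
    return y

u = [1, 0, 2, -1, 3, 0, 0, 1, 4, -2, 5]         # 11 samples, i.e. 3 blocks after the seed
f = f_seq(u)
y_seq, y_blk = sequential(f), block3(f)
```

Each pass of the `while` loop reads only y(3k) and y(3k+1) from the previous block, which is the dependence pattern sketched in Fig.13.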
Fig. 14: Block structure of the 2nd-order IIR filter (L=3) (also see Fig.10.20, p.344)
• Comments
– The original sequential system has 2 poles at {1/2, 3/4}. Now consider the pole locations of the new parallel system. Rewrite the 2 state update equations in matrix form: Y(3k+3)=AY(3k)+F, i.e.

$$\begin{bmatrix} y(3k+3) \\ y(3k+4) \end{bmatrix} = \begin{bmatrix} -\frac{15}{32} & \frac{19}{16} \\ -\frac{57}{128} & \frac{65}{64} \end{bmatrix} \cdot \begin{bmatrix} y(3k) \\ y(3k+1) \end{bmatrix} + \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \qquad (10.25)$$

– The eigenvalues of system matrix A are $(1/2)^3$ and $(3/4)^3$, which are the poles of the new parallel system. Thus, the parallel system is more stable. Note: the parallel system has the same number of poles as the original system
– For a 2nd-order IIR filter (N=2), there are in total 3L+[(L-2)+(L-1)]+4+2(L-2)=7L-3 multiplications (the numerator part: 3L; the overhead of loop update: [(L-2)+(L-1)]; the loop multiplications: 4; the incremental computation: 2(L-2)). The multiplication complexity is a linear function of block size L. This multiplication complexity can be further reduced by using fast parallel filter structures and substructure sharing for the incrementally-computed outputs
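The matrix A in (10.25) and its eigenvalues can be checked with exact arithmetic. The sketch below builds A as the cube of a one-sample state matrix C, which is a modeling assumption introduced here (advancing the state [y(n), y(n+1)] by one sample, three times), and verifies the claimed eigenvalues by substituting them into the characteristic polynomial. All names are illustrative.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(A, n):
    """n-th power of a 2x2 matrix by repeated multiplication."""
    R = [[F(1), F(0)], [F(0), F(1)]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R

def char_poly(A, lam):
    """det(A - lam*I) for a 2x2 matrix; zero exactly at the eigenvalues."""
    return (A[0][0] - lam) * (A[1][1] - lam) - A[0][1] * A[1][0]

# One-sample update: [y(n+1), y(n+2)]^T = C [y(n), y(n+1)]^T + (input terms),
# using y(n+2) = 5/4 y(n+1) - 3/8 y(n) + f(n+2) from (10.22).
C = [[F(0), F(1)], [F(-3, 8), F(5, 4)]]
A = mat_pow(C, 3)                       # block update over L = 3 samples
```

Since C has eigenvalues 1/2 and 3/4, its cube has eigenvalues $(1/2)^3$ and $(3/4)^3$, both closer to the origin than the original poles.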
Combined Pipelining and Parallel Processing For IIR Filters
• Pipelining and parallel processing can also be combined for IIR filters to achieve a speedup in sample rate by a factor L×M, where L denotes the levels of block processing and M denotes stages of pipelining, or to achieve power reduction at the same speed
• Example (Example 10.6.1, p.345) Consider the 1st-order IIR with transfer function (10.26). Derive the filter structure with 4-level pipelining and 3-level block processing (i.e., M=4, L=3)

$$H(z) = \frac{1}{1 - a \cdot z^{-1}} \qquad (10.26)$$

– Because the filter order is 1, only 1 loop update operation is required. The other 2 outputs can be computed incrementally.
– Since pipelining level M=4, the loop must contain 4 delay elements (shown in Fig.15). Since the block size L=3, each delay element represents a block delay (corresponds to 3 sample delays). Therefore, y(3k+12) needs to be expressed in terms of y(3k) and inputs (see Fig. 15).

Fig.15: Loop update for the pipelined block system (also see Fig.10.21, p. 346)
$$y(n) = a\,y(n-1) + u(n) = \cdots = a^{12}\, y(n-12) + a^{11}\, u(n-11) + \cdots + u(n)$$

– Substituting n=3k+12, we get:

$$\begin{aligned}
y(3k+12) &= a^{12}\, y(3k) + a^{11}\, u(3k+1) + \cdots + u(3k+12) \\
&= a^{12}\, y(3k) + a^6 f_2(3k+6) + a^3 f_1(3k+9) + f_1(3k+12)
\end{aligned}$$

where

$$\begin{cases} f_1(3k+12) = a^2 u(3k+10) + a\,u(3k+11) + u(3k+12) \\ f_2(3k+12) = a^3 f_1(3k+9) + f_1(3k+12) \end{cases}$$

– Finally, we have:

$$\begin{cases} y(3k+12) = a^{12}\, y(3k) + a^6 f_2(3k+6) + a^3 f_1(3k+9) + f_1(3k+12) \\ y(3k+1) = a\,y(3k) + u(3k+1) \\ y(3k+2) = a\,y(3k+1) + u(3k+2) \end{cases} \qquad (10.27)$$

– The parallel-pipelined filter structure is shown in Fig. 16
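The loop update in (10.27) can be checked against the sample-by-sample recursion y(n)=ay(n-1)+u(n). The sketch below is a hedged illustration with hypothetical function names; it assumes y(0)=0 and leaves u[0] unused so that list indices match the sample indices in the equations.

```python
from fractions import Fraction as F

def sequential(a, u):
    """y(n) = a*y(n-1) + u(n) for n >= 1, with y(0) = 0 (u[0] is unused)."""
    y = [F(0)]
    for n in range(1, len(u)):
        y.append(a * y[-1] + u[n])
    return y

def f1(a, u, m):
    """f1(m) = a^2 u(m-2) + a u(m-1) + u(m): one block of 3 inputs."""
    return a**2 * u[m - 2] + a * u[m - 1] + u[m]

def f2(a, u, m):
    """f2(m) = a^3 f1(m-3) + f1(m): two adjacent blocks combined."""
    return a**3 * f1(a, u, m - 3) + f1(a, u, m)

def loop_update(a, u, y3k, k):
    """y(3k+12) from y(3k), following the first line of (10.27)."""
    m = 3 * k
    return (a**12 * y3k + a**6 * f2(a, u, m + 6)
            + a**3 * f1(a, u, m + 9) + f1(a, u, m + 12))

a = F(1, 2)
u = [F((n * n) % 7) for n in range(20)]
y = sequential(a, u)
updates = [loop_update(a, u, y[3 * k], k) for k in range(3)]
```

Only the multiplication by $a^{12}$ is inside the feedback loop, and that loop spans 4 block delays, so it can be pipelined by 4 stages. The terms `f1` and `f2` are feed-forward and can be shared between blocks.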
Fig. 16: The filter structure of the pipelined block system with L=3 & M=4 (also see Fig.10.22, p.347)
• Comments
– The parallel-pipelined filter has 4 poles: $\{a^3, -a^3, ja^3, -ja^3\}$. Since the pipelining level is 4 and the filter order is 1, there are a total of 4 poles in the new system, which are separated by the same angular distance. Since the block size is 3, the distance of the poles from the origin is $a^3$.
– Note: The decomposition method is used here in the pipelining phase.
– The multiplication complexity (assuming the pipelining level M to be a power of 2) can be calculated as (10.28), which is linear with respect to L, and logarithmic with respect to M:

$$(L-1) + (\log_2 M + 1) + (L-1) = 2L - 1 + \log_2 M \qquad (10.28)$$

• Example (Example 10.6.2, p. 347) Consider the 2nd-order filter in Example 10.5.3 again; design a pipelined-block system for L=3 and M=2

$$y(n) = \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n); \qquad f(n) = u(n) + 2u(n-1) + u(n-2) \qquad (10.29)$$
– A method similar to clustered look-ahead can be used to update y(3k+6) and y(3k+7) using y(3k) and y(3k+1). Then by index substitution, the final system of equations can be derived.
– Suppose the system update matrix is A. Since the poles of the original system are $\{1/2, 3/4\}$, the eigenvalues of A can be verified to be $\{(1/2)^6, (3/4)^6\}$
– The poles of the new parallel-pipelined second-order filter are the square roots of the eigenvalues of A, i.e., $\{(1/2)^3, -(1/2)^3, (3/4)^3, -(3/4)^3\}$
• Comments: In general, the systematic approach below can be used to compute the pole locations of the new parallel pipelined system:
– 1. Write the loop update equations using LM-level look-ahead, where M and L denote the level of pipelining and parallel processing, respectively.
– 2. Write the state space representation of the parallel pipelined filter, where state matrix A has dimension N×N and N is the filter order
– 3. Compute the eigenvalues $\lambda_i$ of matrix A, $1 \le i \le N$
– 4. The NM poles of the new parallel-pipelined system correspond to the M-th roots of the eigenvalues of A, i.e., $(\lambda_i)^{1/M}$
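Step 4 of the procedure can be sketched numerically. The code below takes a shortcut that is an assumption of this sketch rather than a step from the text: instead of forming A explicitly, it uses the fact that with LM-level look-ahead the eigenvalues of A are the original poles raised to the power LM (as in the two examples above), and then takes all M-th roots. The function names are illustrative.

```python
import cmath

def mth_roots(lam, M):
    """All M complex M-th roots of lam."""
    r = abs(lam) ** (1.0 / M)
    phi = cmath.phase(lam)
    return [r * cmath.exp(1j * (phi + 2 * cmath.pi * k) / M) for k in range(M)]

def parallel_pipelined_poles(original_poles, L, M):
    """Poles of the L-parallel, M-stage pipelined system.

    Assumes the eigenvalues of A are p^(L*M) for each original pole p.
    """
    poles = []
    for p in original_poles:
        poles.extend(mth_roots(p ** (L * M), M))
    return poles

poles_2nd = parallel_pipelined_poles([0.5, 0.75], L=3, M=2)   # Example 10.6.2
poles_1st = parallel_pipelined_poles([0.9], L=3, M=4)         # Example 10.6.1, a = 0.9
```

For Example 10.6.2 this gives four real poles at $\pm(1/2)^3$ and $\pm(3/4)^3$; for the 1st-order example with a = 0.9 (an arbitrary value chosen for illustration) it gives four poles of magnitude $a^3$ spaced 90 degrees apart.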