ASIC Lab, Yi-Ping Yeh
2012/5/22
Hardware-Efficient Filterbank Design for Fast Recursive MDST and IMDST Algorithms
Shin-Chi Lai, Yi-Ping Yeh, Sheau-Fang Lei
Department of Electrical Engineering, National Cheng Kung University
Tainan, Taiwan
2012 IEEE International Symposium on Circuits and Systems, May 22
Outline
• Introduction
• Proposed Algorithm Derivation and Its Architecture
• Discussion and Comparison
• Conclusion
Outline
• Introduction
– AC-3 Encoder
– AC-3 Decoder
• Proposed Algorithm Derivation and Its Architecture
• Discussion and Comparison
• Conclusion
Introduction
http://socialmediaunity.com/why-mobile-marketing/
http://www.htc.com/tw/
AC-3 Encoder
‧ Watkinson, John, 1994: The Art of Digital Audio.
‧ “Digital Audio Compression Standard (AC-3, E-AC-3), Revision B”, Document A/52B of Advanced Television Systems Committee (ATSC), Washington D. C., June 2005.
[Figure: analysis window blocks m and m+1]
AC-3 Decoder
‧ L. D. Fielder et al., “Introduction to Dolby Digital Plus, an enhancement to the Dolby digital coding system,” 117th AES Convention, San Francisco, CA, Oct. 2004.
[Figure: AC-3 decoder; relative computational complexity of its processing stages: 1.3 %, 1.7 %, 13.6 %, and 83.4 %]
Outline
• Introduction
• Proposed Algorithm Derivation and Its Architecture
– MDST and IMDST Algorithm
– Conversion of the Algorithm into DST-IV
– DST-IV to Modified IDST-II (1-4)
• Discussion and Comparison
• Conclusion
MDST and IMDST Algorithm
MDST
IMDST
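The MDST and IMDST equations on this slide were lost in extraction. As a reference point only, a common textbook form is shown below for a block of N input samples producing N/2 coefficients; the 2/N scale on the inverse is an assumption and may differ from the normalization used in the paper.

```latex
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\,\sin\!\Bigl[\tfrac{\pi}{2N}\bigl(2n+1+\tfrac{N}{2}\bigr)(2k+1)\Bigr],
  && k = 0,\dots,\tfrac{N}{2}-1,\\
y(n) &= \tfrac{2}{N}\sum_{k=0}^{N/2-1} X(k)\,\sin\!\Bigl[\tfrac{\pi}{2N}\bigl(2n+1+\tfrac{N}{2}\bigr)(2k+1)\Bigr],
  && n = 0,\dots,N-1.
\end{aligned}
```

Replacing the sine by a cosine in both lines gives the corresponding MDCT/IMDCT pair.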
MDCT and IMDCT Algorithm
MDCT
IMDCT
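The MDCT and MDST share the same phase term and differ only in using a cosine or a sine kernel. The direct-form sketch below makes that explicit with one routine per direction and a pluggable basis function. It assumes the kernel (π/2N)(2n + 1 + N/2)(2k + 1), a 2/N scale on the inverse, and a rectangular window, and it is an O(N²) reference rather than the recursive algorithm of the paper. The overlap-add loop illustrates time-domain aliasing cancellation: samples covered by two 50%-overlapped blocks are reconstructed exactly.

```python
import math
from typing import Callable, List

Basis = Callable[[float], float]  # math.cos -> MDCT pair, math.sin -> MDST pair


def phase(n: int, k: int, N: int) -> float:
    """Shared MDCT/MDST phase term (pi / 2N)(2n + 1 + N/2)(2k + 1)."""
    return math.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1)


def forward(x: List[float], basis: Basis) -> List[float]:
    """Direct-form MDCT/MDST: N samples in, N/2 coefficients out."""
    N = len(x)
    return [sum(x[n] * basis(phase(n, k, N)) for n in range(N)) for k in range(N // 2)]


def inverse(X: List[float], basis: Basis) -> List[float]:
    """Direct-form IMDCT/IMDST with an assumed 2/N scale: N/2 coefficients in, N samples out."""
    N = 2 * len(X)
    return [2 / N * sum(X[k] * basis(phase(n, k, N)) for k in range(N // 2)) for n in range(N)]


def overlap_add(signal: List[float], N: int, basis: Basis) -> List[float]:
    """Transform 50%-overlapped blocks (rectangular window), invert each, and overlap-add."""
    hop = N // 2
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, hop):
        block = inverse(forward(signal[start:start + N], basis), basis)
        for i, value in enumerate(block):
            out[start + i] += value
    return out


if __name__ == "__main__":
    N = 8
    sig = [math.sin(0.3 * i) + 0.1 * i for i in range(5 * N // 2)]
    for name, basis in (("MDCT", math.cos), ("MDST", math.sin)):
        rec = overlap_add(sig, N, basis)
        # Only samples covered by two blocks are fully reconstructed.
        err = max(abs(rec[i] - sig[i]) for i in range(N // 2, len(sig) - N // 2))
        print(f"{name}: max interior error = {err:.2e}")
```

Each block on its own contains aliased mirror images of its input; only the sum of two neighbouring inverse blocks cancels them, which is why the first and last half-blocks of the signal are not recovered.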
Conversion of the Algorithm into DST-IV
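The derivation on this slide did not survive extraction. A standard way to reduce an N-point MDST to an N/2-point DST-IV is an input folding step, sketched below. It assumes the direct-form kernel sin[(π/M)(n + 1/2 + M/2)(k + 1/2)] with M = N/2: splitting the block into quarters (a, b, c, d), the MDST equals the DST-IV of (c_R − d, a + b_R), where _R denotes reversal. This is the textbook folding, not necessarily the exact reordering used in the paper, and the DST-IV here is a direct O(M²) loop; the function names are hypothetical.

```python
import math
from typing import List


def mdst_direct(x: List[float]) -> List[float]:
    """O(N^2) reference MDST: N = 2M samples in, M coefficients out."""
    M = len(x) // 2
    return [sum(x[n] * math.sin(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5)) for n in range(2 * M))
            for k in range(M)]


def dst4(u: List[float]) -> List[float]:
    """Direct DST-IV of length M: Y(k) = sum_m u(m) sin(pi/M (m + 1/2)(k + 1/2))."""
    M = len(u)
    return [sum(u[m] * math.sin(math.pi / M * (m + 0.5) * (k + 0.5)) for m in range(M))
            for k in range(M)]


def fold_for_mdst(x: List[float]) -> List[float]:
    """Split x into quarters (a, b, c, d) and fold to (c_R - d, a + b_R)."""
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    first = [cr - dv for cr, dv in zip(reversed(c), d)]
    second = [av + br for av, br in zip(a, reversed(b))]
    return first + second


def mdst_via_dst4(x: List[float]) -> List[float]:
    """MDST computed as a length-N/2 DST-IV of the folded input."""
    if len(x) % 4:
        raise ValueError("block length must be a multiple of 4")
    return dst4(fold_for_mdst(x))


if __name__ == "__main__":
    x = [math.cos(0.37 * n) + 0.05 * n for n in range(16)]  # N = 16
    print(max(abs(r - f) for r, f in zip(mdst_direct(x), mdst_via_dst4(x))))
```

The folding costs only additions and subtractions, so all multiplications are concentrated in the N/2-point DST-IV core.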
DST-IV to Modified IDST-II (1)
DST-IV to Modified IDST-II (2)
• Divide the k-index into two parts
• Periodicity
• Symmetry
• Define the Modified IDST-II
• Chebyshev method
• Sum and difference formulas
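The Chebyshev method rests on the recurrence sin((m+1)θ) = 2cos θ · sin(mθ) − sin((m−1)θ), which lets a weighted sum of sines be evaluated by a second-order recursive filter. The sketch below shows only that general idea; the paper's Modified IDST-II applies its own index split and symmetries, which are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sine_sum_direct(u: Sequence[float], theta: float) -> float:
    """Reference: S = sum_j u[j] * sin((L - j) * theta), with L = len(u)."""
    L = len(u)
    return sum(u[j] * math.sin((L - j) * theta) for j in range(L))


def sine_sum_recursive(u: Sequence[float], theta: float) -> float:
    """Same S via v[n] = u[n] + 2cos(theta) v[n-1] - v[n-2] and S = sin(theta) v[L-1].

    The filter's impulse response is sin((m + 1) theta) / sin(theta), i.e. the
    Chebyshev polynomial of the second kind U_m(cos theta), so each input sample
    costs one multiplication by the constant 2cos(theta).
    """
    alpha = 2.0 * math.cos(theta)
    v1 = v2 = 0.0  # v[n-1] and v[n-2]
    for sample in u:
        v1, v2 = sample + alpha * v1 - v2, v1
    return math.sin(theta) * v1


if __name__ == "__main__":
    u = [0.5, -1.0, 2.0, 0.25, 3.0]
    print(sine_sum_direct(u, 0.7), sine_sum_recursive(u, 0.7))
```

Feeding the samples in reverse order yields sum_j u[j] sin((j+1)θ) instead. One constant coefficient per output frequency is what keeps the ROM and multiplier count small in recursive filterbank designs.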
DST-IV to Modified IDST-II (3)
DST-IV to Modified IDST-II (4)
[Figure: (1) design of the Modified IDST-II hardware accelerator; (2) MDST data flow, i.e., the complete MDST computational flow for a transform length of N = 16]
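Because the DST-IV matrix is symmetric and its square is (M/2)·I, one DST-IV core can also serve the inverse transform, which is one way to see how a single accelerator can compute both directions. The sketch assumes a direct-form IMDST with a 1/M scale (M = N/2 coefficients) and unfolds the DST-IV output z = (z1, z2) into (z2, z2_R, z1_R, −z1)/M. It illustrates core sharing only and is not the paper's recursive Modified IDST-II datapath; the function names are hypothetical.

```python
import math
from typing import List


def dst4(u: List[float]) -> List[float]:
    """Direct DST-IV of length M (its own inverse up to a factor M/2)."""
    M = len(u)
    return [sum(u[m] * math.sin(math.pi / M * (m + 0.5) * (k + 0.5)) for m in range(M))
            for k in range(M)]


def imdst_direct(X: List[float]) -> List[float]:
    """O(N^2) reference IMDST with an assumed 1/M scale: M coefficients in, 2M samples out."""
    M = len(X)
    return [sum(X[k] * math.sin(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5)) for k in range(M)) / M
            for n in range(2 * M)]


def imdst_via_dst4(X: List[float]) -> List[float]:
    """IMDST from the same DST-IV core: z = DST-IV(X), unfolded to (z2, z2_R, z1_R, -z1) / M."""
    M = len(X)
    if M % 2:
        raise ValueError("number of coefficients must be even")
    z = dst4(X)
    z1, z2 = z[:M // 2], z[M // 2:]
    y = z2 + z2[::-1] + z1[::-1] + [-v for v in z1]
    return [v / M for v in y]


if __name__ == "__main__":
    X = [math.sin(1.3 * k) - 0.2 * k for k in range(8)]  # N = 16
    print(max(abs(a - b) for a, b in zip(imdst_direct(X), imdst_via_dst4(X))))
```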
Outline
• Introduction
• Proposed Algorithm Derivation and Its Architecture
• Discussion and Comparison
– Computational Complexity
– Hardware Accelerator Cost
– FPGA implementation result
• Conclusion
Computational Complexity
Table I. Computational Complexity of Various MDST and IMDST Algorithms for a 512-Point Transform Length
Method       Item             MDST       IMDST
2003 [11]    Multiplication   131,584    131,584
             Addition         262,400    262,656
2008 [12]    Multiplication   262,144    262,144
             Addition         262,144    262,144
2009 [14]    Multiplication    33,276     65,792
             Addition         132,352     98,048
This work    Multiplication    98,432     98,432
             Addition         163,968    163,712
Reduction by this work (MDST): multiplications 62.5% and additions 37.5% relative to [12]; multiplications 25.2% and additions 37.5% relative to [11]
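The four percentages under Table I appear to be the MDST operation savings of this work relative to [12] and [11]. The snippet below recomputes them from the table entries; the data layout and function names are illustrative only.

```python
from typing import Dict, Tuple

# (multiplications, additions) for a 512-point transform, copied from Table I.
TABLE_I: Dict[str, Dict[str, Tuple[int, int]]] = {
    "2003 [11]": {"MDST": (131_584, 262_400), "IMDST": (131_584, 262_656)},
    "2008 [12]": {"MDST": (262_144, 262_144), "IMDST": (262_144, 262_144)},
    "2009 [14]": {"MDST": (33_276, 132_352), "IMDST": (65_792, 98_048)},
    "This work": {"MDST": (98_432, 163_968), "IMDST": (98_432, 163_712)},
}


def reduction_percent(baseline: int, proposed: int) -> float:
    """Percentage of operations saved relative to the baseline (negative = more operations)."""
    return 100.0 * (baseline - proposed) / baseline


def reductions_vs(method: str, transform: str = "MDST") -> Tuple[float, float]:
    """(multiplication, addition) reductions of 'This work' relative to another method."""
    base_mul, base_add = TABLE_I[method][transform]
    our_mul, our_add = TABLE_I["This work"][transform]
    return reduction_percent(base_mul, our_mul), reduction_percent(base_add, our_add)


if __name__ == "__main__":
    for method in ("2003 [11]", "2008 [12]", "2009 [14]"):
        mul, add = reductions_vs(method)
        print(f"MDST vs {method}: multiplications {mul:+.1f}%, additions {add:+.1f}%")
```

For [14] both values come out negative, because [14] reports fewer MDST multiplications and additions than this work.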
Hardware Costs
Method       Item            MDST           IMDST
2003 [11]    Coefficient     3M             6M
             Cpt. C          2M²            2M²
             DTPT            1              1
             Critical path   T_M + 2T_A     T_M + 2T_A
2008 [12]    Coefficient     3M             4M
             Cpt. C          2M²            2M²
             DTPT            1              1
             Critical path   2T_M + 2T_A    2T_M + 2T_A
2009 [14]    Coefficient     2M             3M/2
             Cpt. C          M²/4           M²/4
             DTPT            2              4
             Critical path   2T_M + 5T_A    2T_M + 4T_A
This work    Coefficient     M/2*
             Cpt. C          M²/2 - M/2*
             DTPT            2              4
             Critical path   T_M + T_A*
* Single entry given for both MDST and IMDST.
Cpt. C: computation clock cycles per frame; T_M, T_A: multiplier and adder delays.
Hardware Accelerator Cost
Table II. Requirement Comparison of Hardware Accelerators for Various MDST and IMDST Designs
Method       Item         MDST    IMDST
2003 [11]    Multiplier   3*
             Adder        3*
             Register     2*
2008 [12]    Multiplier   2*
             Adder        2*
             Register     3*
2009 [14]    Multiplier   4       4
             Adder        8       9
             Register     5       5
This work    Multiplier   3*
             Adder        5*
             Register     6*
* Single entry given for both MDST and IMDST.
FPGA implementation result
Table III. FPGA Implementation Results of the Proposed Hardware Accelerator
Device                                                                  Xilinx Virtex-4
Word length (input / coefficient / multiplier / accumulator / output)   21 / 24 / 24 / 32 / 32 bits
Slices / slice flip-flops                                               1197 / 390
4-input LUTs / bonded IOBs                                              2202 / 213
Coefficient-ROM size                                                    12824 bits
Clock rate                                                              95.612 MHz
Time cost per transform                                                 343 μs (N = 512)
PSNR of MDST / IMDST                                                    79.55 dB / 79.76 dB
PSNR Analysis
Audio application                     PSNR (dB)
Plain Old Telephone System (POTS)     40
AM radio, LP records                  50
FM radio, cassettes                   70
CD player                             90
‧ A. J. Aude, “Audio Quality Measurement Primer,” Intersil Corp., app. note AN9789, Feb. 1998.
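Table III reports PSNR values without stating the exact definition. A minimal sketch, assuming the common definition PSNR = 10·log10(peak² / MSE) with a full-scale peak of 1.0 for signals normalized to [−1, 1], is shown below for comparing a floating-point reference with a fixed-point output. The peak choice and the `quantize` helper are hypothetical.

```python
import math
from typing import List, Sequence


def psnr_db(reference: Sequence[float], test: Sequence[float], peak: float = 1.0) -> float:
    """PSNR in dB: 10*log10(peak^2 / MSE). Returns inf for identical inputs."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def quantize(values: Sequence[float], frac_bits: int) -> List[float]:
    """Round each value to a fixed-point grid with the given number of fractional bits."""
    scale = 2 ** frac_bits
    return [round(v * scale) / scale for v in values]


if __name__ == "__main__":
    ref = [math.sin(0.01 * n) for n in range(1000)]
    for bits in (8, 12, 16):
        print(bits, "fractional bits:", round(psnr_db(ref, quantize(ref, bits)), 1), "dB")
```

Under this definition each extra fractional bit adds roughly 6 dB, which gives a feel for how the word lengths in Table III relate to the reported values near 80 dB.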
Time Cost Analysis
AC-3 Standard (5.1 channels)
Sampling rate     32, 44.1, 48 kHz
Window length     256, 512
Our Implementation Analysis
Cpt. C = M²/2 - M/2 = 32512 (clock cycles per frame)
Time cost per transform = Cpt. C × (1 / clock rate) = 343 μs
Real-time requirement: Cpt. C × sampling rate_MAX × 2 (overlap & add) × number of channels = clock rate × N_MAX
Our clock rate ≈ 95 MHz > real-time clock rate ≈ 31 MHz
‧ “Digital Audio Compression Standard (AC–3, E–AC–3), Revision B”, Document A/52B of Advanced Television Systems Committee (ATSC), Washington D. C., June 2005
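The real-time relation can be rearranged to give the minimum clock an accelerator needs. The sketch below assumes one transform per hop of N_MAX/2 samples per channel (hence the overlap factor of 2). The slide does not say how many of the 5.1 channels it counted, so the channel count is left as a parameter; the function name is hypothetical.

```python
def required_clock_hz(cycles_per_transform: int, fs_max_hz: int, n_max: int,
                      channels: int, overlap_factor: int = 2) -> float:
    """Minimum clock for real-time operation, from the slide's relation
    Cpt.C * fs_max * overlap_factor * channels = clock_rate * N_max."""
    return cycles_per_transform * fs_max_hz * overlap_factor * channels / n_max


if __name__ == "__main__":
    for ch in (5, 6):
        mhz = required_clock_hz(32512, 48_000, 512, ch) / 1e6
        print(f"{ch} channels: {mhz:.2f} MHz (reported clock: 95.612 MHz)")
```

With Cpt. C = 32512, a 48 kHz maximum sampling rate and N_MAX = 512, five channels require 30.48 MHz and six require about 36.6 MHz. Both are below the 95.612 MHz clock reported in Table III.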
Outline
• Introduction
• Proposed Algorithm Derivation and Its Architecture
• Discussion and Comparison
• Conclusion
Conclusion
• The Proposed Algorithm
– has a recursive structure
– leads to a hardware-efficient filterbank design
– is derived into a common computation, which is then converted into the Modified IDST-II
• The Implementation Results
– the same hardware accelerator directly computes both the forward and the inverse MDST
– hardware costs are greatly reduced
Reference
[11] V. Nikolajevic and G. Fettweis, “New recursive algorithms for the unified forward and inverse MDCT/MDST,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Vol. 3, pp. 203–208, July 2003.
[12] R. Koenig, T. Stripf, and J. Becker, “A novel recursive algorithm for bit-efficient realization of arbitrary length inverse modified cosine transforms,” Proceedings of Design, Automation and Test in Europe (DATE 2008), Munich, Germany, March 10–14, 2008, pp. 604–609.
[14] P. Jain, B. Kumar, and S. B. Jain, “Unified recursive structure for forward and inverse modified DCT/DST/DHT,” IETE Journal of Research, Vol. 55, No. 4, pp. 180–191, Jul.–Aug. 2009.
Summary
• Proposed Algorithm and Implementation
– Uses a common structure to compute both the MDST and the IMDST
– Halves the number of multiplications needed to compute the coefficients
– Shorter critical path, higher DTPT, lower Cpt. C
[Figure: Costs, Resource Reuse, Speed]
Email : [email protected]
Lab Web : http://140.116.216.48/
Thanks
Application: AC-3, E-AC-3