M. I. Taj (1,2), O. Hammami (2) and M. Akil (1)
(1) ESIEE, Noisy Le Grand Cedex, France. {[email protected]}
(2) ENSTA, Bvd. Victor, Paris, France. {[email protected]}
December 14, 2010
SDR waveform components implementation on single FPGA Multiprocessor Platform
Talk Overview
Motivation and Context
Major Contribution
Software Defined Radio (SDR) Platforms
Related Work
Embedded Multiprocessor platform and Processor
FFT and Viterbi Parallelization Strategy on MPSoC
Performance Evaluation Results
Conclusion
Motivation and Context
Software Defined Radio (SDR) is entering the mainstream.
The challenges of SDR lie in implementing a compact embedded multiprocessor platform for wireless mobile terminals.
Performance requirements under resource constraints make SDR implementation a challenging task.
Reconfigurable embedded multiprocessor cores are increasingly being used to implement SDR solutions.
The two most important SDR waveform components are:
• FFT
• Viterbi Decoding
This work is a step towards an efficient SDR waveform implementation on the Microblaze multiprocessor environment developed at ENSTA.
Major Contribution
This work addresses the mapping of FFT and Viterbi Decoding onto an embedded multiprocessor platform.
A significant speed-up has been achieved: on 16 processors, 11.12 for FFT and 11.48 for Viterbi Decoding relative to a single processor.
This indicates that a whole waveform, including all of its DSP algorithms, can be implemented more efficiently on the addressed multiprocessor platform.
SDR Platforms-Introduction
Massively parallel SDR baseband platforms exploit:
• ILP: Instruction Level Parallelism
• DLP: Data Level Parallelism
• MP: Multiprocessor parallelism
ILP and DLP call for well-fitted algorithms: their architecture and associated compiler requirements make it difficult to map various advanced algorithms.
• A typical example is SODA.
This calls for MP parallelism; however, little effort has gone into deploying SDR on multiprocessor architectures.
• Parallel processing computing element => Multiprocessor FPGA platform developed at ENSTA
Related Work
SDR performance has been evaluated on numerous platforms, but very few efforts have been made to parallelize SDR waveforms.
Single-processor efforts
• 1999 => Reed and Cummings described the transition of FPGAs from ASICs to embedded products.
• 2008 => OSSIE Signal Processing Library functions were mapped onto a single-processor platform, the Xilinx ML-403 board based on a Virtex-4 FX FPGA, to identify the functions to be optimized.
• 2008 => G. Abgrall estimated the latency, CPU load and memory utilization of OSSIE.
Parallelizing efforts
• 2008 => M. Palkovic parallelized the baseband processing of a space division multiplexing (SDM)-orthogonal frequency division multiplexing (OFDM) system on a platform called ADRES.
• 2008 => D. Cabric addressed the feasibility of Cognitive Radios by implementing five spectrum sensing algorithms on the Berkeley Emulation Engine (BEE), which consists of five Xilinx V-2 FPGAs, each embedding a PowerPC 405.
• 2009 => We parallelized OSSIE SigProc Filter functions over the same platform.
SDR waveform design
SDR Waveform Components:
• Filter Functions
• Algebraic Functions
• Modulation/Demodulation Functions
Advanced Functionality => Massive intercommunication.
General purpose hardware.
• Strong emergence of MPSoC.
• Actual prototypes (not simulation)
SDR waveform Communication Architecture
FFT
• Multi-resolution spectrum sensing.
Viterbi Decoding
• Efficient bandwidth utilization.
NoC based MPSoC, programming and evaluation platform
Alpha-Data Board Architecture
NoC Based Multiprocessor SoC
Xilinx Virtex4FX140
4 Banks DDR2 (1 Gb)
Microblaze v. 6.00.b
Local memory 32 kB
Embedded Processing Element
• Microblaze softcore V 7.00b
• Instruction side Local Memory Bus (ILMB)
• Data side Local Memory Bus (DLMB)
• ILMB controller
• DLMB controller
• BRAM for Local Memory (32 KBytes)
• Fast Simplex Link (FSL)
• OCP-IP Adapter
• Optional: Timer, local memory, PCI Express, UART
MicroBlaze
• 32-bit RISC softcore processor
• Harvard architecture
• Highly configurable
• Interfaces: PLB, OPB, LMB, up to 16 FSL
Microblaze Core Block Diagram
Microblaze based Processing Element
Parallelization Primitives and their usage
Syn_start_work( )
• Gives the slaves the command to start calculating their respective coefficients.
Syn_work_finished( )
• Checks whether the slaves have finished calculating their share of coefficients.
Syn_wait_for_start( )
• Each slave waits for the master to give this signal; once it arrives, the slave starts its assigned calculations.
barrier( )
• Once a slave has finished its assigned filter coefficient calculation, it calls this function to tell the master that it is done. After all the slaves have executed this function, the master captures the number of clock cycles.
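The interplay of the four primitives can be sketched on a host machine with Python threads. This is only an illustration under stated assumptions: the primitive names come from the slide, but the `SyncPrimitives` class, the `run` helper, the use of `threading` and the wall-clock timer (standing in for the clock-cycle counter) are hypothetical and say nothing about how the primitives are implemented on the Microblaze platform.

```python
import threading
import time


class SyncPrimitives:
    """Hypothetical host-side stand-in for the four primitives."""

    def __init__(self, n_slaves):
        self.n_slaves = n_slaves
        self._start = threading.Event()
        self._cond = threading.Condition()
        self._arrived = 0

    def syn_start_work(self):
        # Master: tell every slave to begin its share of the work.
        self._start.set()

    def syn_wait_for_start(self):
        # Slave: block until the master gives the start signal.
        self._start.wait()

    def barrier(self):
        # Slave: report that the assigned task is finished.
        with self._cond:
            self._arrived += 1
            self._cond.notify_all()

    def syn_work_finished(self, timeout=None):
        # Master: wait until all slaves have reported; False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._arrived == self.n_slaves, timeout
            )


def run(n_slaves=4, n_items=16):
    sync = SyncPrimitives(n_slaves)
    results = [None] * n_slaves

    def slave(rank):
        sync.syn_wait_for_start()
        share = range(rank, n_items, n_slaves)  # this slave's items
        results[rank] = sum(v * v for v in share)
        sync.barrier()

    threads = [threading.Thread(target=slave, args=(r,)) for r in range(n_slaves)]
    for t in threads:
        t.start()

    t0 = time.perf_counter()  # stand-in for the clock-cycle counter
    sync.syn_start_work()
    finished = sync.syn_work_finished(timeout=5.0)
    elapsed = time.perf_counter() - t0

    for t in threads:
        t.join()
    return results, elapsed, finished
```

The master starts timing just before `syn_start_work()` and stops once `syn_work_finished()` reports that every slave has passed `barrier()`, which mirrors the measurement order described on the slide.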
FFT Parallelization Description - 1
Radix-2 DIT Algorithm.
FFT => Two phases (logarithms are base 2):
1. Step 0 to step log(n/p) - 1
2. Step log(n/p) to step log(n) - 1
• n => number of transformed points of the DFT.
• The n points => divided into p consecutive subsequences.
• No intercommunication in the first phase.
• In the second phase: p/2 PEs compute in parallel.
• One step => One data exchange.
• Each PE computes 2n/p points with its peer's help.
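The two-phase split can be illustrated with a sequential simulation in Python. This is a minimal sketch, not the code that runs on the platform: it assumes n and p are powers of two, that PE number `pe` owns the consecutive block of n/p points starting at `pe * n / p`, and that in the second phase the PE holding the lower half of each butterfly does the computation. The function names are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def parallel_fft(x, p):
    """Radix-2 DIT FFT, simulated as p PEs that each own n/p consecutive points.

    Returns the transform and, for each phase-2 step, the sorted list of
    PEs that compute in that step.
    """
    n = len(x)
    log_n = n.bit_length() - 1
    block = n // p
    log_block = block.bit_length() - 1
    assert 1 << log_n == n and 1 << log_block == block and block * p == n

    data = [complex(x[bit_reverse(i, log_n)]) for i in range(n)]
    active_per_step = []
    for s in range(log_n):
        half = 1 << s  # distance between the two inputs of a butterfly
        new = list(data)
        active = set()
        for pe in range(p):
            for i in range(pe * block, (pe + 1) * block):
                if i & half:
                    continue  # upper input: handled by the owner of the lower one
                j = i | half  # partner index; in phase 2 it lives on a peer PE
                k = i & (half - 1)  # position inside the butterfly group
                t = cmath.exp(-2j * cmath.pi * k / (2 * half)) * data[j]
                new[i], new[j] = data[i] + t, data[i] - t
                active.add(pe)
        data = new
        if s >= log_block:  # phase 2: one data exchange per step
            active_per_step.append(sorted(active))
    return data, active_per_step


def naive_dft(x):
    """Direct O(n^2) DFT, used only to check the result."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

For steps below log(n/p) both butterfly inputs fall inside one PE's block, so no exchange is needed; from step log(n/p) onward the partner index lies in a peer's block, and under the assumption above only p/2 PEs are active per step, each producing 2n/p output points.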
FFT Parallelization Description - 2
[Figure: Phase 1 of FFT]
FFT Parallelization Description - 3
[Figure: Phase 2 of FFT]
Viterbi Decoding Parallelization Description - 1
Two passes:
• Forward pass => Parallel implementation
• Backward pass
Forward pass => Likelihood calculation over all possible states. For each state I at time t (log-probabilities add where probabilities multiply):
• Most likely predecessor:
  $\mathrm{opt}_J = \arg\max_J \{ \log P(\text{state } J \text{ at } t-1) + \log P(J \to I) \}$
• Update probability of I:
  $\log P(\text{state } I \text{ at } t) = \log P(\text{state } \mathrm{opt}_J \text{ at } t-1) + \log P(\mathrm{opt}_J \to I) + \log P(\text{obs at } t \mid \text{state } I)$
• Bookkeeping.
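The forward and backward passes can be sketched in Python in the log domain. This is an illustrative, sequential version under stated assumptions: the function name `viterbi`, its argument layout (lists of log-probabilities indexed by state) and the generic hidden-state formulation are hypothetical and are not taken from the decoder evaluated here.

```python
import math


def viterbi(obs, log_start, log_trans, log_emit):
    """Most likely state sequence for the observation indices in `obs`.

    log_start[i]    : log-probability of starting in state i
    log_trans[j][i] : log-probability of the transition j -> i
    log_emit[i][o]  : log-probability that state i emits observation o
    """
    n_states = len(log_start)
    score = [log_start[i] + log_emit[i][obs[0]] for i in range(n_states)]
    back = []  # bookkeeping: best predecessor of each state at each step

    # Forward pass. The update of each state i at a given time depends only
    # on the scores from the previous time step, so the inner loop over i is
    # the part that can be distributed across processors.
    for o in obs[1:]:
        new_score, pointers = [], []
        for i in range(n_states):
            opt_j = max(range(n_states), key=lambda j: score[j] + log_trans[j][i])
            new_score.append(score[opt_j] + log_trans[opt_j][i] + log_emit[i][o])
            pointers.append(opt_j)
        score = new_score
        back.append(pointers)

    # Backward pass: follow the stored predecessors from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    best = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best
```

Working with log-probabilities turns the products of the recursion into sums and avoids underflow on long sequences; `math.log` can be used to convert a probability table before calling the function.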
Viterbi Decoding Parallelization Description - 2
[Figure: Forward pass of Viterbi decoding]
Performance Evaluation Results on MPSoC - 1
Fast Fourier Transform
Test Mode          Number of clock cycles   Speed up
Single processor   2,087,142                -
4 processors       668,956                  3.12
8 processors       348,438                  5.99
16 processors      187,693                  11.12
Speed-up for FFT using Multi-processor platform.

Viterbi Decoding
Test Mode          Number of clock cycles   Speed up
Single processor   1,667,911                -
4 processors       483,452                  3.45
8 processors       277,522                  6.01
16 processors      145,287                  11.48
Speed-up for Viterbi Decoding using Multi-processor platform.
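The speed-up column follows directly from the clock-cycle counts: it is the single-processor count divided by the count for p processors. The short Python sketch below recomputes it from the figures reported above and also derives the parallel efficiency (speed-up divided by p), a standard metric that does not appear on the slides. The names `FFT_CYCLES`, `VITERBI_CYCLES` and `speedup_table` are hypothetical.

```python
# Clock-cycle counts copied from the two result tables.
FFT_CYCLES = {1: 2_087_142, 4: 668_956, 8: 348_438, 16: 187_693}
VITERBI_CYCLES = {1: 1_667_911, 4: 483_452, 8: 277_522, 16: 145_287}


def speedup_table(cycles):
    """Return rows of (processors, speed-up, efficiency), rounded to 2 places."""
    base = cycles[1]  # single-processor reference
    rows = []
    for p in sorted(cycles):
        speedup = base / cycles[p]
        rows.append((p, round(speedup, 2), round(speedup / p, 2)))
    return rows


if __name__ == "__main__":
    for name, cycles in (("FFT", FFT_CYCLES), ("Viterbi Decoding", VITERBI_CYCLES)):
        print(name)
        for p, speedup, efficiency in speedup_table(cycles):
            print(f"  {p:2d} processors: speed-up {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

An efficiency below 1 reflects the cost of synchronization and data exchange between processors.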
Performance Evaluation Results on MPSoC - 2
[Figure: speed-up (axis 0 to 14) versus number of processors (axis 0 to 16); series: Viterbi Decoding, FFT]
Speed up is evaluated by the Master PE.
Different slave processors access the DDR via NoC after they finish their share of calculation.
Master and Slaves invoke appropriate synchronization primitives.
Speed-up for FFT and Viterbi Decoding using MPSoC
Conclusion
Multicore and embedded multiprocessor architectures are strongly emerging for SDR applications.
FFT and Viterbi Decoding are the most important components of an SDR waveform.
This work enhanced our SDR waveform functionality through an optimized parallel implementation of these two components.
The speed-up achieved answers the ITRS Roadmap prediction for SDR.
Thank You
Questions ???
References-1
1. www.wirelessinnovation.org
2. http://www.itrs.net
3. M. I. Taj, O. Hammami and K. Huggins, "Performance Evaluation of SDR on embedded platform: The case of OSSIE", The 2nd IEEE IC-4, Best Paper Award.
4. M. I. Taj, K. Huggins and O. Hammami, "OSSIE Signal Processing Functions Performance Enhancements through Parallelization in an embedded multiprocessor architecture", 2009 Software Defined Radio Technical Conference and Product Exposition, Madrid, Spain, April 2009.
5. R. J. Lackey and D. W. Upmal, "SPEAKeasy: The Military Software Radio", IEEE Commun. Mag., vol. 33, no. 5, May 1995, pp. 56-61.
6. M. Woh, Y. Lin, S. Seo, S. Mahlke, T. Mudge and C. Chakrabarti, "From SODA to Scotch: The Evolution of a Wireless Baseband Processor", 41st IEEE/ACM International Symposium on Microarchitecture, Como, Italy, 8-12 Nov. 2008, pp. 152-163.
7. J. Glossner, D. Iancu, M. Moudgill, G. Nacer, S. Jinturkar, S. Stanley and M. Schulte, "The Sandbridge SB3011 Platform", EURASIP Journal on Embedded Systems, vol. 2007, no. 1, January 2007.
8. B. Bougard, B. D. Sutter, S. Rabou, D. Novo, O. Allam, S. Dupont and L. Van der Perre, "A Coarse-Grained Array based Baseband Processor for 100 Mbps+ Software Defined Radio", Design, Automation and Test in Europe, Munich, Germany, March 10-14, 2008.
9. M. Palkovic, H. Cappelle, M. Bougard and L. Van der Perre, "Mapping of 40 MHz MIMO SDM-OFDM Baseband Processing on Multi-Processor SDR Platform", The 11th Workshop on Design and Diagnostics of Electronic Circuits and Systems, 16-18 April 2008.
10. B. Mei, S. Vernalde, D. Verkest, H. De Man and R. Lauwereins, "ADRES: an architecture with tightly coupled VLIW processor and coarse-grained configurable matrix", Proc. IEEE Conf. on Field Programmable Logic and its Applications (FPL), Lisbon, Portugal, Sep. 2003, pp. 61-70.
References-2
11. A. N. Choudhary, S. Das, N. Ahuja and H. Patel, "A Reconfigurable and Hierarchical Parallel Processing Architecture: Performance Results for Stereo Vision", 10th International Conference on Pattern Recognition, Proceedings, vol. 2, 16-21 Jun. 1990, pp. 389-393.
12. J. H. Bahn, J. Yang and N. Bagherzadeh, "Parallel FFT Algorithms on Network-on-Chips", Fifth International Conference on Information Technology: New Generations, 2008, pp. 1087-1093. ISBN: 978-0-7965-3099-4.
13. A. H. Kamalizad, C. Pan and N. Bagherzadeh, "Fast Parallel FFT on a Reconfigurable Computation Platform", 15th Symposium on Computer Architecture and High Performance Computing, Proceedings, 10-12 Nov. 2003, pp. 254-259.
14. G. Zhong, F. Xu and A. N. Wilson, "A Power-Scalable Reconfigurable FFT/IFFT IC Based on a Multi-Processor Ring", IEEE Journal of Solid State Circuits, vol. 41, no. 2, February 2006, pp. 483-495. ISSN: 0018-9200.
15. Xilinx Virtex-4, http://www.xilinx.com/products/silicon_solutions/fpgas/virtex/virtex4/index.htm
16. OCP-IP Open Core Protocol Specification 2.2, http://www.ocpip.org/home, 2008.
17. Microblaze processor reference guide, Xilinx user guide 081 (v. 7.0).
18. Arteris S.A., http://www.arteris.com/
19. J. S. Reeve and K. Amarasingh, "A parallel Viterbi decoder for Block Cyclic and Convolution codes", IEEE Signal Processing, 2005, pp. 273-278.
20. M. Monchiero, G. Palermo, C. Silvano and O. Villa, "Exploration of distributed shared memory architectures for NoC-based multiprocessors", Journal of Systems Architecture, vol. 53, no. 10, October 2007, pp. 719-732.