1
Software Defined Radio - How to implement the outer modem?
"Design of a Dynamically Reconfigurable Platform for Channel Coding in Future Mobile Communication Systems"
Timo Vogt, Norbert Wehn
Microelectronic System Design Research Group, University of Kaiserslautern
www.eit.uni-kl.de/wehn
Priority Programme 1148 "Reconfigurable Computing Systems" (Schwerpunktprogramm 1148 „Rekonfigurierbare Rechensysteme“)
Follow-up colloquium of the second funding period in Lübeck
2
Project Overview
Phase 1 (2005-2007)
  Implementation of dynamically reconfigurable decoder for trellis based decoding algorithms - FlexiTreP
  First studies on High-Throughput
Phase 2 (2007-2009)
  Optimization of FlexiTreP architecture
  Silicon Implementation of FlexiTreP (65nm technology)
  Enhancement of Platform for flexible LDPC decoding
  High-Throughput (e.g. dynamically reconfigurable Multiprocessor Architecture)
  Consideration of Reliability Issues in the platform design
3
Software Defined Radio (SDR)
Mobile communications systems
  Flexibility
    – Multi-mode (e.g. Uplink UMTS, Downlink DVB-H), Multi-channel
    – Adaptivity ("cognitive radio")
  Energy efficiency
  Flexibility/Cost Trade-off
Programmable Architectures (SDR): SIMD/Vector Engines
  Sandblaster SB3011 Platform (Sandbridge)
  Music Architecture (Infineon)
  OnDSP (NXP)
  Samira (Univ. Dresden)
  SODA (ARM, Univ. Michigan)
  ...
Dynamically reconfigurable Architectures
  Pleiades Wireless Reconfigurable Processor Architecture (Berkeley)
  ADRES Architecture (IMEC)
    – Factor 5 smaller area than Music architecture
    – Factor 10 higher energy efficiency than DSP
  ...
4
UMTS/W-CDMA Physical Layer
Filtering: suppress signals outside frequency band
Modulation: map source information onto signal waveforms
Channel Estimation: estimate channel condition for transceivers
Error Correction: correct errors induced by noisy channel
  Transmit side: channel encoder, interleaver
  Receive side: deinterleaver, channel decoder (turbo/Viterbi)
Source: Scott Mahlke / MPSoC'06
5
UMTS Physical Layer (2Mbps)
Algorithm        GPP [Mcycles/s]   SODA [Mcycles/s]   Speed-up factor
Scrambler                    240                  9              26.6
Descrambler                 2600                 23             113
Spreader                     300                  5              60
Despreader                  3600                 11             327.3
Combiner                     100                  3              33.3
FIR (Tx)                    7900                307              25.7
FIR (Rx)                    3900                182              21.4
Searcher                   26500                200             132.5
Turbo Encoder                100                  2              50
Turbo Decoder              17500                540              32.4

General Purpose Processor (GPP): superscalar architecture
Signal-processing On-Demand Architecture (SODA)
  SIMD pipeline with 32 16-bit datapaths
  Scalar pipeline (one 16-bit datapath)
  AGU (address generation unit)
Source: "SODA: A Low-power Architecture For Software Radio", T. Mudge et al., ARM, ISCA'06
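The speed-up column is the ratio of the two cycle counts. The short check below recomputes it from the GPP and SODA figures as read from this slide. The dictionary layout and function name are illustrative only, and small deviations from the printed factors are expected because the SODA values are rounded.

```python
# Workload of the UMTS physical layer (2 Mbps) in Mcycles/s: (GPP, SODA).
# Values are transcribed from the slide; the data layout is an assumption.
WORKLOAD = {
    "Scrambler": (240, 9),
    "Descrambler": (2600, 23),
    "Spreader": (300, 5),
    "Despreader": (3600, 11),
    "Combiner": (100, 3),
    "FIR (Tx)": (7900, 307),
    "FIR (Rx)": (3900, 182),
    "Searcher": (26500, 200),
    "Turbo Encoder": (100, 2),
    "Turbo Decoder": (17500, 540),
}


def speedup(gpp_mcycles, soda_mcycles):
    """Speed-up factor of SODA over the general purpose processor."""
    return gpp_mcycles / soda_mcycles


gpp_total = sum(gpp for gpp, _ in WORKLOAD.values())
soda_total = sum(soda for _, soda in WORKLOAD.values())

for name, (gpp, soda) in WORKLOAD.items():
    print(f"{name:14s} {gpp:6d} {soda:4d} {speedup(gpp, soda):7.1f}")
print(f"{'Total':14s} {gpp_total:6d} {soda_total:4d}")
```

Even on SODA the turbo decoder remains the largest single entry, which is the motivation for treating the outer modem separately.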
6
ADRES Architecture (IMEC)
VLIW processor tightly coupled with reconfigurable array
  1D VLIW processor (control flow)
  2D VLIW reconfigurable matrix (dataflow kernel)
Reconfiguration via config. RAM
7
Channel Coding Law: complexity doubles every 15 months
8
Standards/Flexibility
Standard            Codes  Rates            States  Throughput*             Block sizes
GSM                 CC     3/4...1/4        16, 64  ...12 kbps              33...876
EDGE                CC     6/7, 1/3         64      5...62 kbps             39...870
UMTS                CC     1/2, 1/3         256     ...32 kbps              1-504
                    bTC    1/3              8       ...2 Mbps               40-5114
CDMA-2k             CC     1/2...1/6        256     ...38 kbps              1-744
                    bTC    1/2...1/5        8       ...2 Mbps               378...20736
IEEE802.11 (WLAN)   CC     1/2...3/4        64      6...54 Mbps             1...4095
                    LDPC   1/2...5/6        -       ...450 Mbps             ...1944
Hiperlan            CC     1/2, 9/16, 3/4   64      6...54 Mbps             1...4095
IEEE802.16 (WiMax)  CC     1/2...7/8        64      ...54 Mbps              ...2040
                    dbTC   1/2...3/4        8       ...54 Mbps              ...648
                    LDPC   1/2...3/4        -       >100 Mbps               ...~2500
Inmarsat            bTC    1/2              16      ...64 kbps              ...2608
DVB-T/H             CC     1/4...7/8        64      ...32 Mbps (broadcast)
(unlabeled)         CC     2/3              256
HSDPA               bTC    1/2...3/4        8       ...14.4 Mbps            40-5114
* throughput/channel
9
Implementation Approach
Channel decoding algorithms
Complex iterative algorithms: control and dataflow
Calculations are not the bottleneck (log domain)
Data management (bandwidth/routing/storage) is key
SIMD/vector architectures built for the inner modem are not well suited
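The "log domain" remark can be made concrete with the standard Log-MAP kernel: in the log domain, products of probabilities become additions and sums become the max* operation (Jacobian logarithm). The sketch below is a generic textbook formulation, not code from the FlexiTreP project, and the function names are illustrative.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)), computed in a stable way."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)


exact = max_star(1.0, 2.0)     # equals log(e^1 + e^2)
approx = max_log(1.0, 2.0)     # 2.0, slightly below the exact value
print(exact, approx)
```

Each update is a compare, an add and a small correction term, so the arithmetic itself is cheap. The cost lies in moving state metrics and channel values to and from these operations.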
Basic Architecture
Application specific instruction set processors (ASIP)
Efficient support of control and dataflow
Design time e.g. Tensilica, LISATek, ARC
Processors with dynamically reconfigurable Hardware
– Loose coupling of processor with an FPGA
– Reconfigurable array as functional unit in pipeline (e.g. XiRISC)
10
FlexiTreP: Flexible Trellis Processor
Exploit programmability
Simple programming model
Decoding algorithms e.g. Log-MAP, Viterbi (control flow)
Exploit hardware reconfigurability (data management)
Fast context switching
Multi context instructions: simplify instructions & reduce program size
(similar to ADRES architecture)
Partially dynamically reconfigurable ASIP
Specific application: application knowledge is key
Full ASIP approach i.e. no predefined configurable pipeline template
„Just enough flexibility“: energy efficiency
Assembler code
11
FlexiTreP Features
Supports all trellis-based decoding techniques in current standards
Binary Turbo Decoding
  Constraint length between 3 and 5
  Arbitrary generator and feedback polynomials
  Rates down to 1/7
  Interleaver table loadable
Duobinary Turbo Decoding
  Constraint lengths 4 and 5
  Arbitrary generator and feedback polynomials
  Rates down to 1/3
Binary MAP and Viterbi decoding
  Constraint length between 5 and 9
  Arbitrary generator and feedback polynomials
  Rates down to 1/4
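The constraint length ranges above translate directly into trellis sizes. A minimal sketch, assuming the usual convention that the constraint length K counts the encoder memory plus the current input, so that the trellis has 2^(K-1) states (the function name is illustrative):

```python
def trellis_states(constraint_length):
    """Number of trellis states for constraint length K, assuming K = memory + 1."""
    if constraint_length < 1:
        raise ValueError("constraint length must be at least 1")
    return 1 << (constraint_length - 1)


turbo_states = [trellis_states(k) for k in range(3, 6)]     # K = 3..5
viterbi_states = [trellis_states(k) for k in range(5, 10)]  # K = 5..9
print(turbo_states, viterbi_states)
```

Under this convention the supported range spans 4 to 256 states, which matches the 8, 16, 64 and 256 state codes listed in the standards table.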
12
FlexiTreP configured for Turbo Decoding
Channel code structure specified in DRCCC
  NSC/RSC, constraint length, BMU/Butterfly assignment… (red boxes)
Decoding algorithm programmable in Software
13
DRCCC
Dynamic reconfigurable channel code control (DRCCC)
  LUT table
Controls dataflow in datapath pipeline
Controls address and data width of memories
Several configurations can be stored for fast context switch (shadow LUT table)
Each configuration memory contains 383 bits
Simplifies programming and reduces instruction length
  Multi context processing
  24 bits instead of 68 bits
Saves power, area and improves throughput
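The idea of a configuration table with a shadow copy can be illustrated with a toy software model. Everything below (class name, fields, string-valued entries) is hypothetical and only mirrors the behaviour described on this slide: instructions carry a short index, the table supplies the wide control information, and a second table can be filled in the background and swapped in.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLUT:
    """Toy model of a control table with a shadow copy (hypothetical names)."""
    active: dict = field(default_factory=dict)   # trellis step -> control entry
    shadow: dict = field(default_factory=dict)   # configuration prepared in advance

    def load_shadow(self, config):
        """Store the next configuration without disturbing the active one."""
        self.shadow = dict(config)

    def switch(self):
        """Fast context switch: exchange active and shadow tables."""
        self.active, self.shadow = self.shadow, self.active

    def control_for(self, trellis_step):
        """The instruction names only the step; the table provides the control entry."""
        return self.active[trellis_step]


lut = ContextLUT(active={0: "viterbi-route-0", 1: "viterbi-route-1"})
lut.load_shadow({0: "turbo-route-0"})
before = lut.control_for(1)
lut.switch()
after = lut.control_for(0)
print(before, after)
```

Because the wide control word lives in the table, the instruction only needs enough bits to select an entry, which is the mechanism behind the shorter instruction word quoted above.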
14
Multi context processing example
Operand routing for ACS recursion varies for partial parallel processing
  E.g. 64 states: 16 states processed concurrently -> 4 steps
Control for operand shuffling is stored in DRCCC for each trellis-step
  Instructions only specify trellis-step
  The proper context is loaded into the pipeline
Butterfly calculation (ACS recursion)
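Why the operand routing changes from step to step can be seen in a small software model of the add-compare-select (ACS) recursion. The sketch assumes a shift-register state numbering in which the next state is ((s << 1) | u) mod 64 and a minimising path metric. Both are assumptions for illustration, and the actual state mapping of the hardware is not given in the slides.

```python
N_STATES = 64                   # trellis states (example from the slide)
PARALLEL = 16                   # states updated concurrently
STEPS = N_STATES // PARALLEL    # 4 partial steps per trellis step


def predecessors(target):
    """Two predecessor states of `target` under the assumed state numbering."""
    low = target >> 1
    return low, low | (N_STATES // 2)


def operand_routing(step):
    """Predecessor pairs that the 16 lanes must read in one partial step."""
    first = step * PARALLEL
    return [predecessors(t) for t in range(first, first + PARALLEL)]


def acs_trellis_step(old_metrics, branch_metric):
    """One full trellis step, executed as STEPS partial-parallel ACS passes."""
    new_metrics = [0] * N_STATES
    for step in range(STEPS):
        for lane, (p0, p1) in enumerate(operand_routing(step)):
            target = step * PARALLEL + lane
            new_metrics[target] = min(
                old_metrics[p0] + branch_metric(p0, target),
                old_metrics[p1] + branch_metric(p1, target),
            )
    return new_metrics


routing_tables = [operand_routing(s) for s in range(STEPS)]
demo = acs_trellis_step(list(range(N_STATES)), lambda src, dst: 0)
print(routing_tables[0][0], routing_tables[1][0])
```

Each of the four passes reads a different slice of the old state metrics, so a per-step routing entry selected by the instruction is sufficient to steer the operands.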
15
FlexiTreP configured for Viterbi decoding
16
Assembler code examples
MAP (2 Windows, blocklength 20)
    Reconf….
    RPT ->STD_WIN #2
    ldSMR (0)
    RPT ->FW2 #10
    fwdrec 3,3,1
    fwdrec 3,3,1
    FW2:
    modCVA 57
    modRDA 19
    ldSMR (127)
    RPT ->AQ2 #10
    bwdacq -3,3
    bwdacq -3,3
    AQ2:
    bwdrecllr (19)
    RPT ->BW2 #9
    bwdrecllr -1
    bwdrecllr -1
    BW2:
    bwdrecllr (0)
    STD_WIN:

64 State Viterbi (blocklength 26)
    Reconf
    RPT ->LOOP_END #26
    ldSMR -3,1
    VA1 3,3,+4
    ldSMR -1,-1
    VA2 +2
    ldSMR -4,-1
    VA3 +5
    ldSMR -4,1
    VA4 +5
    LOOP_END:
    RPT ->TB_END #13
    VATB
    VATB
    TB_END:

Each command takes 1 clock cycle
Zero-overhead loop control
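A rough cycle estimate can be read off the 64-state Viterbi listing. The sketch assumes that the loop body shown is complete (four ldSMR and four VAx instructions per trellis step), that traceback costs one VATB per bit, and that each instruction takes one cycle as stated. Reconf, RPT and I/O cycles are ignored, and the 400 MHz clock is taken from the synthesis results slide. The function and constant names are illustrative.

```python
CLOCK_HZ = 400e6  # clock frequency reported in the synthesis results


def viterbi_cycles(block_length, acs_instr_per_step=8, traceback_instr_per_bit=1):
    """Estimated cycles for one block: ACS loop plus traceback, one cycle each."""
    return block_length * (acs_instr_per_step + traceback_instr_per_bit)


block_length = 26
cycles = viterbi_cycles(block_length)
cycles_per_bit = cycles / block_length
throughput_mbps = CLOCK_HZ / cycles_per_bit / 1e6
print(cycles, cycles_per_bit, round(throughput_mbps, 1))
```

Under these assumptions the estimate is about 9 cycles per bit, in the same range as the roughly 40 Mbps quoted for Kc=7/64 states.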
17
Synthesis and Performance Results
Synthesis with 65nm low power standard cell library
  Area: 73 Kgates (~ 0.15 mm2)
  400 MHz clock frequency
  CC-Throughput ~ 190 Mbps @ Kc=5/16 states (w/o IO)
                ~ 40 Mbps @ Kc=7/64 states (w/o IO)
                ~ 10 Mbps @ Kc=9/256 states (w/o IO)
  TC-Throughput up to 19 Mbps @ 5 iterations (w/o IO)
UMTS TC Comparison
  XiRisc: ~ 0.1 Mbps @ 100 MHz @ 130 nm
  Optimized Tensilica: ~ 0.4 Mbps @ 133 MHz @ 104 Kgates @ 180 nm
  ENST ASIP: ~ 4.4 Mbps @ 335 MHz @ 93 Kgates @ 90 nm
Comparison SODA <-> FlexiTreP
  UMTS (2 Mbps service, Turbo): 540 MIPS <-> 50 MIPS
  WLAN (24 Mbps service, Kc=7, Viterbi): 398 MIPS <-> 240 MIPS
18
Area Results/Memories
Logic: 73 Kgates
Memories (for full support W-CDMA/UMTS)
  Interleaver       :   (5120*13)   [≈ 35.4 Kgates]
  Channel values    : 2*(4096*12)   [≈ 2 * 26.1 Kgates]
  Apriori           : 4*(2048* 8)   [= 4 * 8.7 Kgates]
  LIFO              :   (128*48)    [≈ 3.9 Kgates]
  State Metric (DP) : 2*(128*96)    [= 2 * 13.1 Kgates]
  Program Mem       :   (512*24)    [= 6.5 Kgates]
  Total Memory      : 159 Kgates
Logic and memories together less than 0.5mm2 @ 65nm technology
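The memory budget can be re-added from the individual entries. The sketch below uses only the figures on this slide. The tuple layout (instances, words, bits per word, Kgates per instance) is an assumed reading of the notation, for example 2*(4096*12) as two memories of 4096 words with 12 bits each.

```python
# name: (instances, words, bits per word, Kgates per instance), from the slide
MEMORIES = {
    "Interleaver": (1, 5120, 13, 35.4),
    "Channel values": (2, 4096, 12, 26.1),
    "Apriori": (4, 2048, 8, 8.7),
    "LIFO": (1, 128, 48, 3.9),
    "State Metric (DP)": (2, 128, 96, 13.1),
    "Program Mem": (1, 512, 24, 6.5),
}
LOGIC_KGATES = 73

total_bits = sum(n * words * width for n, words, width, _ in MEMORIES.values())
memory_kgates = sum(n * kgates for n, _, _, kgates in MEMORIES.values())
overall_kgates = LOGIC_KGATES + memory_kgates

print(total_bits, round(memory_kgates, 1), round(overall_kgates, 1))
```

The per-memory figures add up to the stated 159 Kgates, so the memories account for roughly two thirds of the combined logic and memory gate count.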
19
Reconfigurable FlexiTreP Array
[Figure: array of 16 dr-ASIP nodes connected through routers (R) and interfaces (IF)]
2D mesh topology, “Dimension-Order Routing”, Input-queued router with two virtual channels
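Dimension-order routing on a 2D mesh resolves one coordinate completely before the other. A minimal sketch, assuming X is resolved before Y and nodes are addressed by (x, y) grid coordinates. Virtual channels and input queues are not modelled, and the function name is illustrative.

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
longest = xy_route((3, 3), (0, 0))   # corner to corner on a 4x4 grid
print(route, len(longest) - 1)
```

The route depends only on source and destination, so the router needs no routing tables, and the fixed dimension order rules out cyclic channel dependencies on a mesh.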
20
Standards/Flexibility
(Standards overview table repeated from slide 8; * throughput/channel)
21
LDPC Architecture Overview
[Figure: block diagrams of five LDPC decoder architectures: Two-Phase, Two-Phase + PN branch, Single Phase, Layered, Combined. Blocks shown include Permutation Network Π and Π−1, Permutation RAM, Address RAM, Channel RAM, Msg RAM, IN Msg RAM, PN Msg RAM, Sum RAM 1 and 2, FIFO, ZigZag Network, Controller, and the CNB, VNB, CNP and VNP units.]
Algorithms/implementation complexity strongly depends on
  Code structure, code rates, flexibility
Data management problem even worse than trellis-based decoders
  Factor 5 compared to TC
Memory cuts determine area and throughput
  Large complexity, e.g. 802.11n WiFi Standard ~ 500 Kgates
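For orientation, the check node update at the core of min-sum style LDPC decoding is sketched below. This is the plain textbook min-sum rule, not the 3-Min or MinSum+MSF variants named in the implementation table, and the function name is illustrative.

```python
def min_sum_check_node(incoming):
    """Plain min-sum: each output takes the sign product and the minimum
    magnitude of all other incoming messages (needs at least two inputs)."""
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


messages = min_sum_check_node([2.0, -3.0, 0.5])
print(messages)
```

Every check node exchanges one message with each connected variable node in every iteration, which is why message storage and permutation dominate the decoder rather than this arithmetic.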
22
UKL LDPC Decoder Implementations
LDPC Code           DVB-S2          DVB-S2          WiMax (802.16e)  WiFi (802.11n)   U-S LDPC
Codeword Size       64800           64800           576-2304         648, 1296, 1944  9600
Code Rate           1/4-9/10        1/4-9/10        1/2-5/6          1/2-5/6          3/4
Parallelism         90              360             24-96            27-81            80
Quantization        6 bit
Algorithm           3-Min                                                             MinSum+MSF/Lay.
Architecture        1-phase                         Combined         1-phase          Layered
Area [mm2] 65nm @ 400 MHz (U-S LDPC: @ 528 MHz)
  VNP               0.130           0.217           0.110            0.096            0
  CNP               0.328           1.200           0.470            0.395            0.212
  Network           0.046           0.270           0.206            0.065            0.027
  Memory            3.357           4.428           0.551            0.467            0.265
  Overall Area      3.861           6.115           1.337            1.023            0.504
Infobit/Cycle       0.15-1.77       0.58-6.70       0.12-0.83        0.14-0.70        3.08
Net Throughput      60-708 Mbps     0.23-2.68 Gbps  48-333 Mbps      54-281 Mbps      1.63 Gbps
Latency             270-82 µs       69-21 µs        6.0-5.7 µs       6.0-5.8 µs       4.4 µs
Max. Efficiency     183 Mbps/mm2    430 Mbps/mm2    250 Mbps/mm2     274 Mbps/mm2     3.2 Gbps/mm2

[Blank cells and the following entries could not be assigned to a column from the extracted layout: architecture label "PN branch"; Max. Iterations 50-15, 25-20, 25-20, 7.]
23
Conclusion
ASIP for Trellis based Decoding
Combining application-specific instruction set programmability with dynamic hardware reconfigurability provides a very good trade-off between flexibility and performance
Fabrication on 65nm Technology in 2007
– October 2007
– Energy measurements
Multiprocessor solution for High-Throughput CC and TC decoding
Flexible LDPC decoder implementation is the next big challenge
Efficient memory sharing
E.g. different architectures dynamically reloadable
Consideration of reliability issues in the platform design
24
Publications and Cooperations
Publications
  A Reconfigurable Outer Modem Platform for Future Communications Systems. T. Vogt, C. Neeb and N. Wehn. Dagstuhl Seminar "Dynamically Reconfigurable Architectures", Dagstuhl Seminar Proceedings 06141, April 2006, Dagstuhl, Germany.
  A Reconfigurable Multi-Processor Platform for Convolutional and Turbo Decoding. T. Vogt, C. Neeb and N. Wehn. Reconfigurable Communication-centric SoCs (ReCoSoC) 2006, Montpellier, France.
  Channel Decoding in Software Defined Radio. N. Wehn. MPSoC 2006, August 2006, Estes Park, Colorado, USA.
  A Reconfigurable Application Specific Instruction Set Processor for Viterbi and Log-MAP Decoding. T. Vogt and N. Wehn. IEEE Workshop on Signal Processing (SIPS'06), pages 142-147, October 2006, Banff, Canada.
Cooperations
  Miniworkshop "Applications and Compilers for Coarse-Grained Reconfigurable Architectures" („Applikationen und Compiler für grobgranulare rekonfigurierbare Architekturen“) (Prof. Becker, Prof. Rosenstiel)
  Prof. Dr. J. Teich (Universität Erlangen-Nürnberg)
Thanks for your Attention