GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 1
Chen Tang [email protected]
Institute of Communication and Navigation
German Aerospace Center
Preamble
• German Aerospace Center
  • National aeronautics and space research center of Germany
  • Wide range of R&D projects in national and international partnerships
  • DLR & NASA operate the flying infrared telescope SOFIA
  • DLR operates/coordinates Columbus (the European lab module on the ISS)
  • Galileo satellite navigation system
• The work presented here was developed within the NEXT (Network Coding Satellite Experiment) project, funded by the German Space Agency, which paved the way for the GEO research communication satellite H2Sat (2017)
  • H2Sat: explore and test new broadband (high data rate) satellite communication
Overview
• What Problems? • Introduction and Motivation
• How to Solve?
• Multi-User Detection (MUD) System Design • GPU-accelerated SDR Implementation of MUD
• Result and Outlook
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 5
Overview
• What Problems? • Introduction and Motivation
• How to Solve?
• Multi-User Detection (MUD) System Design • GPU-accelerated SDR Implementation of MUD
• Result and Outlook
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 6
Introduction and Motivation
• Unidirectional satellite broadcast service
• Bidirectional satellite communication
  • Forward link
  • Return link
  • e.g. internet over satellite; interactive satellite TV services
• Multi-user access issue
• Multiple-access schemes:
  • TDMA (Time Division Multiple Access): users separated in time slots
  • FDMA (Frequency Division Multiple Access): users separated in frequency bands
  • MF-TDMA (e.g. DVB-RCS): combination of both
• Scarcity and high cost of satellite frequency spectrum (millions of dollars)
  • How to improve spectrum efficiency? → Multi-User Detection (MUD)
[Figures: time-frequency grids illustrating TDMA, FDMA, and MF-TDMA channel allocation]
Multi-User Detection (MUD) System
• Multiple users transmit at the same frequency and time
• A transparent satellite return link
• Main objectives:
  • Develop a MUD receiver
  • Increase decoding throughput → real-time processing
• Multi-User Detection (MUD)
  • Increases spectrum efficiency
  • Few practical MUD implementations for satellite systems so far:
    • High complexity
    • Sensitive to synchronization and channel estimation errors
MUD System Design
• Successive Interference Cancellation (SIC)
  • Sequentially decode users & cancel their interference
  • Linear complexity in the number of users
  • Straightforward extension to support more users
[Figure: power/frequency diagram of two superimposed users; user 2 is transmitted "for free" in user 1's slot]
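The SIC principle can be sketched in a few lines. This is a toy baseband model (uncoded BPSK, ideal synchronization, illustrative power levels not taken from the talk), not the actual NEXT receiver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two users share the same time/frequency resource; user 1 is received
# with higher power (power ratio chosen for illustration only).
bits1 = rng.integers(0, 2, 100)
bits2 = rng.integers(0, 2, 100)
s1 = 2.0 * (2 * bits1 - 1)    # user 1, BPSK with amplitude 2.0
s2 = 1.0 * (2 * bits2 - 1)    # user 2, BPSK with amplitude 1.0
noise = 0.05 * rng.standard_normal(100)
y = s1 + s2 + noise           # superimposed received signal

# Stage 1: detect the stronger user, treating user 2 as noise.
b1_hat = (y > 0).astype(int)

# Stage 2: reconstruct user 1's waveform, cancel it, then detect user 2.
y_clean = y - 2.0 * (2 * b1_hat - 1)
b2_hat = (y_clean > 0).astype(int)
```

With this power separation both users decode error-free, which is the "second user for free" effect from the slide; adding further users just appends further cancel-and-decode stages, hence the linear complexity.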
MUD System Design
• SDR = Software Defined Radio
  • Components of a communication system (e.g. filter, amplifier, modulator, etc.) are implemented in software
  • Benefits vs. hardware-based devices:
    • Flexible to change
    • Lower cost
    • Shorter development time
  • Drawback vs. hardware-based devices:
    • Lower processing power
• Programmable radio devices:
  • DSP (Digital Signal Processor)
  • FPGA (Field Programmable Gate Array)
  • SoC (Programmable System on Chip)
  • GPGPU (General-Purpose GPU)
GPU-based SDR
• Restrictions of FPGA-based SDR:
  • Long development time and high complexity
  • No standardized protocols, interfaces, or architectures → less portable
• Nvidia CUDA GPU-based SDR:
  • High performance (e.g. Nvidia Tesla C2070: 448 cores, 515 GFLOPS of double-precision peak performance)
  • Less development effort
  • Unified architecture → more portable
GPU-based SDR
• GPU vs. FPGA productivity comparison (GPU: Nvidia GTX 285; HC1: 5 × Virtex-5 FPGAs)
  Ref: D. H. Jones, A. Powell, C. Bouganis, P. Y. K. Cheung, "GPU versus FPGA for high productivity computing", 2010
MUD System Design
• Real-time implementation of MUD is challenging: T_dec ≤ T_frame (decoding must finish within one frame duration)
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
[Figure: LDPC Tanner graph with check nodes C_1 … C_(n−k) and variable nodes V_1 … V_n exchanging messages C_j → V_i and V_i → C_j; U1: n = 4800, k = 3200; U2: n = 4800, k = 2400]
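The per-node parallelism that makes LDPC decoding a good GPU target can be illustrated with a toy min-sum decoder (hypothetical 3×6 parity-check matrix; the real code has n = 4800 variable nodes). Every check row and every variable column is updated independently, so each maps naturally onto one GPU thread:

```python
import numpy as np

# Toy parity-check matrix (3 checks x 6 variables). The talk's code has
# 4800 variable nodes, which is what makes per-node parallelism pay off.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def min_sum_iteration(H, llr, c2v):
    # Variable-to-check messages V_i -> C_j: channel LLR plus all incoming
    # check messages except the one on this edge (independent per column).
    v2c = np.where(H == 1, llr + c2v.sum(axis=0) - c2v, 0.0)
    # Check-to-variable messages C_j -> V_i, min-sum approximation
    # (independent per row, i.e. one GPU thread per check node).
    new_c2v = np.zeros_like(c2v)
    for j in range(H.shape[0]):
        idx = np.flatnonzero(H[j])
        for t, i in enumerate(idx):
            others = np.delete(v2c[j, idx], t)
            new_c2v[j, i] = np.prod(np.sign(others)) * np.abs(others).min()
    return new_c2v

# Channel LLRs for the all-zero codeword with one weakly received bit.
llr = np.array([2.0, -0.5, 1.5, 3.0, 1.8, 1.0])
c2v = np.zeros(H.shape)
for _ in range(10):                 # fixed iteration count for the sketch
    c2v = min_sum_iteration(H, llr, c2v)
decoded = ((llr + c2v.sum(axis=0)) < 0).astype(int)  # hard decision per bit
```

The decoder corrects the weakly received bit back to the all-zero codeword; on a GPU the two update phases become two kernel launches per iteration, one thread per node.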
GPU-based MUD
• Processing bottlenecks to be accelerated by GPU:
  • LDPC channel decoding: 4800 nodes processed iteratively
  • EM channel estimation: thousands-point FFTs, iteratively
  • Interference cancellation: resampling, thousands-point FFTs
• MUD receiver runs on the GPU
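One common way to implement the resampling step is in the frequency domain, which is also why it benefits from fast GPU FFT libraries such as cuFFT. A minimal single-block sketch (an assumed technique; the talk does not specify the exact resampler, and a real implementation would window and overlap blocks):

```python
import numpy as np

def fft_resample(x, p, q):
    """Resample x by the rational factor p/q via FFT zero-padding/truncation.
    Assumes no energy at the Nyquist bin; a production resampler would
    treat that bin and block boundaries more carefully."""
    n = len(x)
    m = n * p // q
    X = np.fft.fft(x)
    Y = np.zeros(m, dtype=complex)
    k = min(n, m) // 2
    Y[:k] = X[:k]                     # keep positive-frequency bins
    Y[-k:] = X[-k:]                   # keep negative-frequency bins
    return np.fft.ifft(Y) * (m / n)   # rescale for the length change

t = np.arange(64)
x = np.exp(2j * np.pi * 3 * t / 64)   # single complex tone in bin 3
y = fft_resample(x, 2, 1)             # upsample by 2 -> 128 samples
```

Every bin copy is independent, so on a GPU the whole step reduces to two FFT kernel calls plus a trivially parallel scatter, matching the "thousands-point FFT" bottleneck in the table above.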
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
  • Data transfer between host and device memory (144 GB/s device memory on the Nvidia Tesla vs. 8 GB/s over PCIe ×16)
• All parts of each single-user receiver and the interference cancellation run on the GPU
• Minimizes the latency of intermediate data transfers between host and device memory
[Figure: pipeline diagram of the CPU/GPU processing stages]
Simulation Setup
• GPU: Nvidia Tesla C2070 (1.15 GHz, CUDA compute capability 2.0)
• Comparison benchmark: Intel Xeon CPU E5620 (2.4 GHz)
• Channel coding: LDPC
  • Irregular Repeat Accumulate
  • Blocklength: 4800 bits
  • U1 coderate: 2/3, U2 coderate: 1/2
• Baud rate: 62500 symbols/second → real-time decoding threshold: ca. 85 ms (66 kbps)
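The stated figures are self-consistent if the 66 kbps refers to the aggregate information rate of both users decoded within the 85 ms threshold. A quick check using only the numbers from the slides:

```python
# Info bits per frame: blocklength 4800 at coderates 2/3 (U1) and 1/2 (U2).
k_u1 = 4800 * 2 // 3           # 3200 info bits for user 1
k_u2 = 4800 * 1 // 2           # 2400 info bits for user 2
frame_s = 0.085                # real-time decoding threshold, ca. 85 ms
throughput_kbps = (k_u1 + k_u2) / frame_s / 1000
print(round(throughput_kbps, 1))   # ~65.9 kbps, i.e. the quoted ~66 kbps
```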
Simulation Result
[Figures: comparison of total MUD processing time between CPU and GPU, with the real-time threshold marked]
Summary
• SDR implementation of a MUD receiver
  • High flexibility and low cost
  • Extensible to support more users
• GPU acceleration
  • 1.8x to 3.8x faster than the real-time decoding threshold
  • Still room for improvement; newer GPUs → better performance
• GPU CUDA is very promising for powerful parallel computing
  • Low learning curve
  • Heterogeneous: mixed serial-parallel programming
  • Scalable
  • Days/weeks of simulation → hours
Thank you very much! Q&A