GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 1
Chen Tang [email protected]
Institute of Communication and Navigation
German Aerospace Center
Preamble
• German Aerospace Center
  • National aeronautics and space research center of Germany
  • Wide range of R&D projects in national and international partnerships
  • DLR & NASA operate the flying infrared telescope SOFIA
  • DLR operates/coordinates Columbus (the European lab module on the ISS)
  • Galileo satellite navigation system
• The work presented here was developed within the NEXT (Network Coding Satellite Experiment) project, funded by the German Space Agency, which paved the way for the GEO research communication satellite H2Sat (2017)
  • H2Sat: explore and test new broadband (high data rate) satellite communication
Overview
• What Problems? • Introduction and Motivation
• How to Solve?
• Multi-User Detection (MUD) System Design • GPU-accelerated SDR Implementation of MUD
• Result and Outlook
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 5
Overview
• What Problems? • Introduction and Motivation
• How to Solve?
• Multi-User Detection (MUD) System Design • GPU-accelerated SDR Implementation of MUD
• Result and Outlook
> GTC 2014 > Chen Tang > 03.2014 DLR.de • Chart 6
Introduction and Motivation
• Unidirectional satellite broadcast service
• Bidirectional satellite communication
  • Forward link
  • Return link
  • e.g. internet over satellite; interactive satellite TV services
• Multi-user access issue
• Multiple-access schemes:
  • TDMA (Time Division Multiple Access): users separated in time slots
  • FDMA (Frequency Division Multiple Access): users separated in frequency bands
  • MF-TDMA (e.g. DVB-RCS): combination of both
• Scarcity and high cost of satellite frequency spectrum (millions of dollars)
  • How to improve spectrum efficiency? → Multi-User Detection (MUD)
[Figures: time-frequency grids illustrating TDMA, FDMA, and MF-TDMA channel allocation]
Multi-User Detection (MUD) System
• Multiple users transmit at the same frequency and time
• A transparent satellite return link
• Main objectives:
  • Develop a MUD receiver
  • Increase decoding throughput → real-time processing
• Multi-User Detection (MUD)
  • Increases spectrum efficiency
  • Few practical MUD implementations for satellite systems so far:
    • High complexity
    • Sensitive to synchronization and channel estimation errors
MUD System Design
• Successive Interference Cancellation (SIC)
  • Sequentially decode users & cancel their interference
  • Linear complexity in the number of users
  • Straightforward extension to support more users
[Figure: power/frequency diagram of two superimposed users; user 2 is transmitted "for free" in user 1's slot]
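The SIC principle can be sketched in a few lines. This is a toy baseband model (uncoded BPSK, ideal synchronization, illustrative power levels not taken from the talk), not the actual NEXT receiver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two users share the same time/frequency resource; user 1 is received
# with higher power (power ratio chosen for illustration only).
bits1 = rng.integers(0, 2, 100)
bits2 = rng.integers(0, 2, 100)
s1 = 2.0 * (2 * bits1 - 1)    # user 1, BPSK with amplitude 2.0
s2 = 1.0 * (2 * bits2 - 1)    # user 2, BPSK with amplitude 1.0
noise = 0.05 * rng.standard_normal(100)
y = s1 + s2 + noise           # superimposed received signal

# Stage 1: detect the stronger user, treating user 2 as noise.
b1_hat = (y > 0).astype(int)

# Stage 2: reconstruct user 1's waveform, cancel it, then detect user 2.
y_clean = y - 2.0 * (2 * b1_hat - 1)
b2_hat = (y_clean > 0).astype(int)
```

With this power separation both users decode error-free, which is the "second user for free" effect from the slide; adding further users just appends further cancel-and-decode stages, hence the linear complexity.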
MUD System Design
• SDR = Software Defined Radio
  • Components of a communication system (e.g. filter, amplifier, modulator, etc.) are implemented in software
  • Benefits vs. hardware-based devices:
    • Flexible to change
    • Lower cost
    • Shorter development time
  • Drawback vs. hardware-based devices:
    • Lower processing power
• Programmable radio devices:
  • DSP (Digital Signal Processor)
  • FPGA (Field Programmable Gate Array)
  • SoC (Programmable System on Chip)
  • GPGPU (General-Purpose GPU)
GPU-based SDR
• Restrictions of FPGA-based SDR:
  • Long development time and high complexity
  • No standardized protocols, interfaces, or architectures → less portable
• Nvidia CUDA GPU-based SDR:
  • High performance (e.g. Nvidia Tesla C2070: 448 cores, 515 GFLOPS of double-precision peak performance)
  • Less development effort
  • Unified architecture → more portable
GPU-based SDR
• GPU vs. FPGA productivity comparison (GPU: Nvidia GTX 285; HC1: 5 × Virtex-5 FPGAs)
  Ref: D. H. Jones, A. Powell, C. Bouganis, P. Y. K. Cheung, "GPU versus FPGA for high productivity computing", 2010
MUD System Design
• Real-time implementation of MUD is challenging: T_dec ≤ T_frame (decoding must finish within one frame duration)
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
[Figure: LDPC Tanner graph with check nodes C_1 … C_(n−k) and variable nodes V_1 … V_n exchanging messages C_j → V_i and V_i → C_j; U1: n = 4800, k = 3200; U2: n = 4800, k = 2400]
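The per-node parallelism that makes LDPC decoding a good GPU target can be illustrated with a toy min-sum decoder (hypothetical 3×6 parity-check matrix; the real code has n = 4800 variable nodes). Every check row and every variable column is updated independently, so each maps naturally onto one GPU thread:

```python
import numpy as np

# Toy parity-check matrix (3 checks x 6 variables). The talk's code has
# 4800 variable nodes, which is what makes per-node parallelism pay off.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def min_sum_iteration(H, llr, c2v):
    # Variable-to-check messages V_i -> C_j: channel LLR plus all incoming
    # check messages except the one on this edge (independent per column).
    v2c = np.where(H == 1, llr + c2v.sum(axis=0) - c2v, 0.0)
    # Check-to-variable messages C_j -> V_i, min-sum approximation
    # (independent per row, i.e. one GPU thread per check node).
    new_c2v = np.zeros_like(c2v)
    for j in range(H.shape[0]):
        idx = np.flatnonzero(H[j])
        for t, i in enumerate(idx):
            others = np.delete(v2c[j, idx], t)
            new_c2v[j, i] = np.prod(np.sign(others)) * np.abs(others).min()
    return new_c2v

# Channel LLRs for the all-zero codeword with one weakly received bit.
llr = np.array([2.0, -0.5, 1.5, 3.0, 1.8, 1.0])
c2v = np.zeros(H.shape)
for _ in range(10):                 # fixed iteration count for the sketch
    c2v = min_sum_iteration(H, llr, c2v)
decoded = ((llr + c2v.sum(axis=0)) < 0).astype(int)  # hard decision per bit
```

The decoder corrects the weakly received bit back to the all-zero codeword; on a GPU the two update phases become two kernel launches per iteration, one thread per node.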
GPU-based MUD
• Processing bottlenecks to be accelerated by GPU:
  • LDPC channel decoding: 4800 nodes processed iteratively
  • EM channel estimation: thousands-point FFTs, iteratively
  • Interference cancellation: resampling, thousands-point FFTs
• MUD receiver runs on the GPU
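One common way to implement the resampling step is in the frequency domain, which is also why it benefits from fast GPU FFT libraries such as cuFFT. A minimal single-block sketch (an assumed technique; the talk does not specify the exact resampler, and a real implementation would window and overlap blocks):

```python
import numpy as np

def fft_resample(x, p, q):
    """Resample x by the rational factor p/q via FFT zero-padding/truncation.
    Assumes no energy at the Nyquist bin; a production resampler would
    treat that bin and block boundaries more carefully."""
    n = len(x)
    m = n * p // q
    X = np.fft.fft(x)
    Y = np.zeros(m, dtype=complex)
    k = min(n, m) // 2
    Y[:k] = X[:k]                     # keep positive-frequency bins
    Y[-k:] = X[-k:]                   # keep negative-frequency bins
    return np.fft.ifft(Y) * (m / n)   # rescale for the length change

t = np.arange(64)
x = np.exp(2j * np.pi * 3 * t / 64)   # single complex tone in bin 3
y = fft_resample(x, 2, 1)             # upsample by 2 -> 128 samples
```

Every bin copy is independent, so on a GPU the whole step reduces to two FFT kernel calls plus a trivially parallel scatter, matching the "thousands-point FFT" bottleneck in the table above.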
• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
  • Data transfer between host and device memory (144 GB/s device memory on the Nvidia Tesla vs. 8 GB/s over PCIe ×16)
• All parts of each single-user receiver and the interference cancellation run on the GPU
• Minimizes the latency of intermediate data transfers between host and device memory
[Figure: pipeline diagram of the CPU/GPU processing stages]
Simulation Setup
• GPU: Nvidia Tesla C2070 (1.15 GHz, CUDA compute capability 2.0)
• Comparison benchmark: Intel Xeon CPU E5620 (2.4 GHz)
• Channel coding: LDPC
  • Irregular Repeat Accumulate
  • Blocklength: 4800 bits
  • U1 coderate: 2/3, U2 coderate: 1/2
• Baud rate: 62500 symbols/second → real-time decoding threshold: ca. 85 ms (66 kbps)
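The stated figures are self-consistent if the 66 kbps refers to the aggregate information rate of both users decoded within the 85 ms threshold. A quick check using only the numbers from the slides:

```python
# Info bits per frame: blocklength 4800 at coderates 2/3 (U1) and 1/2 (U2).
k_u1 = 4800 * 2 // 3           # 3200 info bits for user 1
k_u2 = 4800 * 1 // 2           # 2400 info bits for user 2
frame_s = 0.085                # real-time decoding threshold, ca. 85 ms
throughput_kbps = (k_u1 + k_u2) / frame_s / 1000
print(round(throughput_kbps, 1))   # ~65.9 kbps, i.e. the quoted ~66 kbps
```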
Simulation Result
[Figures: comparison of total MUD processing time between CPU and GPU, with the real-time threshold marked]
Summary
• SDR implementation of a MUD receiver
  • High flexibility and low cost
  • Extensible to support more users
• GPU acceleration
  • 1.8x to 3.8x faster than the real-time decoding threshold
  • Still room for improvement; newer GPUs → better performance
• GPU CUDA is very promising for powerful parallel computing
  • Low learning curve
  • Heterogeneous: mixed serial-parallel programming
  • Scalable
  • Days/weeks of simulation → hours
Thank you very much! Q&A