Embedded Electronics for Telecom DSP Aldebaro Klautau Embedded Systems Lab (LASSE) @ Federal Univ. of Pará (UFPA) V International Workshop on Trends in Optical Technologies (WTON) CPqD – Campinas – Brazil - May 19, 2016 UFPA


Page 1: Embedded Electronics for Telecom DSP

Embedded Electronics for Telecom DSP

Aldebaro Klautau

Embedded Systems Lab (LASSE) @ Federal Univ. of Pará (UFPA)

V International Workshop on Trends in Optical Technologies (WTON)

CPqD – Campinas – Brazil - May 19, 2016

UFPA

Page 2: Embedded Electronics for Telecom DSP

Goal and Agenda

Goal: discuss options for prototyping new physical layers (PHY) of DSP-based telecommunication systems

From the perspective of a digital signal processing R&D group that (furiously) targets the highest possible bit rates

No ASICs, but discrete components & development boards

Agenda

Motivation: demand for increased bit rates

Options for prototyping: emphasis on DSP processor and FPGA

Examples of prototypes making the most of available hardware

Page 3: Embedded Electronics for Telecom DSP

Bit-rate hungry applications

Optical transmission with flexible transceivers

Software-defined radios and 5G

Architecture: small cells and centralized RAN

PHY: spectrum aggregation, massive MIMO, mmWaves

Example of 4G traffic: 4 signals with BW = 20 MHz require ~3.7 Gbps

In newer versions of LTE the number of antennas can be 16 or 32, giving a bit rate of ~15 Gbps or ~30 Gbps
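The figures above can be reproduced with a short calculation. This is a minimal sketch that assumes CPRI-style I/Q transport, with a 30.72 MSa/s sampling rate for a 20 MHz carrier and 15 bits per I or Q component; neither value is stated on the slide, and control words and line coding are ignored.

```python
# Assumed (not stated on the slide): CPRI-style I/Q sampling of a 20 MHz LTE carrier.
SAMPLE_RATE_HZ = 30.72e6   # assumed sampling rate for BW = 20 MHz
BITS_PER_COMPONENT = 15    # assumed sample width for each of I and Q


def fronthaul_rate_gbps(num_antennas: int) -> float:
    """Raw I/Q payload rate in Gbps, ignoring control words and line coding."""
    bits_per_second = SAMPLE_RATE_HZ * 2 * BITS_PER_COMPONENT * num_antennas
    return bits_per_second / 1e9


for antennas in (4, 16, 32):
    print(f"{antennas:2d} antennas: {fronthaul_rate_gbps(antennas):.2f} Gbps")
```

Under these assumptions the rate grows linearly with the number of antennas, which is why 16 and 32 antennas land near 15 and 30 Gbps.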


Page 4: Embedded Electronics for Telecom DSP

Electronic components and associated development boards for prototyping

[Figure: prototype options: GPU, DSP, ASSP, ASIC, FPGA; further labels: standard cells, full custom IC]

GPU: graphics processing unit

ASSP: application-specific standard product

Page 5: Embedded Electronics for Telecom DSP

Complete DMT transceiver development

FFT-based Discrete Multi-Tone (DMT) bitloading supporting up to 10 bits per tone (1024-QAM)

[Figure: bits per tone]
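As an illustration of per-tone bit loading, the sketch below uses the common gap-approximation rule, b = floor(log2(1 + SNR/gap)), capped at the 10 bits per tone mentioned above. The rule, the 9.8 dB gap and the SNR values are assumptions for illustration only; the slides do not state which loading algorithm was used.

```python
import math

MAX_BITS_PER_TONE = 10  # from the slide: up to 1024-QAM


def bits_per_tone(snr_db, gap_db=9.8):
    """Gap-approximation bit loading (assumed rule), capped at MAX_BITS_PER_TONE."""
    loading = []
    for snr in snr_db:
        snr_over_gap = 10 ** ((snr - gap_db) / 10)      # linear SNR reduced by the gap
        bits = math.floor(math.log2(1 + snr_over_gap))  # integer bits this tone can carry
        loading.append(min(MAX_BITS_PER_TONE, bits))
    return loading


example_snr_db = [55.0, 42.0, 30.0, 18.0, 5.0]  # hypothetical per-tone SNRs in dB
print(bits_per_tone(example_snr_db))
```

Tones with very high SNR saturate at 10 bits, while tones with low SNR carry few or no bits.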

Page 6: Embedded Electronics for Telecom DSP

For the DMT task, a DSP processor (SoC) was chosen as the platform

[Figure: SoC block diagram with 4 cores, FFT coprocessors, network coprocessor and Viterbi coprocessors]

Page 7: Embedded Electronics for Telecom DSP

C language programming

Our main motivation: program in C language

Besides, free open-source routines are available. Example: Forward Error Correction (FEC)

But good performance required heavy optimization

Comparison of Reed-Solomon (RS) implementations, per codeword


Page 8: Embedded Electronics for Telecom DSP

Many routines to split among cores

Issues related to concurrency and parallelism


Page 9: Embedded Electronics for Telecom DSP

Architectural split of functionalities among DSP cores


Page 10: Embedded Electronics for Telecom DSP

Significant effort to optimize code for the platform

Level 1 - Compiler Optimizations

Level 2 - Code Organization/Refactoring

Level 3 - Architecture Optimization

Page 11: Embedded Electronics for Telecom DSP

From “programmable logic” to the “platform FPGA”

[Figure: evolution from programmable logic to the platform FPGA] [Lyke, 2015]

Page 12: Embedded Electronics for Telecom DSP

FPGA boards support several interfaces and peripherals

Several FMC (FPGA mezzanine card) boards: high-speed ADC/DAC cards, 8x SFP expansion card, general purpose

PC interface: PCIe to FPGA (up to 30 Gbps), commonly present in FPGA evaluation boards

Page 13: Embedded Electronics for Telecom DSP

Prototyping with FPGAs

HDL (VHDL, Verilog, etc.) is more difficult than C, and most engineers are exposed to “programmable” logic (digital electronics) but not to digital signal processing on FPGAs or parallel programming

Go for DSP “general-purpose” chips?

Note that multicore alternatives also require good skills in concurrent and parallel programming and often a profound knowledge of the chip architecture

Changing the DSP chip manufacturer requires studying the new architecture, while FPGAs are more “generic”

FPGAs are a more natural step towards silicon / ASIC than DSP chips


Page 14: Embedded Electronics for Telecom DSP

ADC trends

Photonic ADCs

Undersampling: signals sampled below their Nyquist rates

Compressive sampling, e.g. Bayesian approach

[Khilo, 2012]

[Figure: limits on ENOB (effective number of bits) due to jitter; legend: ADCs up to 2007, and in darker blue ADCs later than 2007]
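The jitter limit can be estimated with the textbook relation for a full-scale sinusoid, SNR = -20 log10(2π f σ), together with ENOB = (SNR - 1.76) / 6.02. The sketch below applies it to a hypothetical 10 GHz input and hypothetical RMS jitter values; none of these numbers come from the slide or from [Khilo, 2012].

```python
import math


def jitter_limited_enob(input_freq_hz, rms_jitter_s):
    """ENOB bound from aperture jitter alone, for a full-scale sinusoidal input."""
    snr_db = -20 * math.log10(2 * math.pi * input_freq_hz * rms_jitter_s)
    return (snr_db - 1.76) / 6.02


INPUT_FREQ_HZ = 10e9  # hypothetical input frequency
for jitter_fs in (1000, 100, 10):  # hypothetical RMS jitter values in femtoseconds
    enob = jitter_limited_enob(INPUT_FREQ_HZ, jitter_fs * 1e-15)
    print(f"jitter = {jitter_fs:4d} fs -> ENOB <= {enob:.1f} bits")
```

Each tenfold reduction in jitter buys 20 dB of SNR, roughly 3.3 bits, which explains the interest in photonic sampling with very low jitter.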

Page 15: Embedded Electronics for Telecom DSP

Some DAC performance numbers

Summary: DACs and AWGs (arbitrary waveform generators), together with ADCs and DSOs (digital storage oscilloscopes), are operating at ~100 GSa/s

Hence, the computing platform (DSP, FPGA, ASSP, etc.) may be the bottleneck!

DAC                bits   BW (GHz)   Fs (GSa/s)   ENOB
Micram DAC-4       6      42         100          -
Micram DAC-3       6      23.8       72           4.5
Micram DACII       6      20         34           4
[Nagatani, 2011]   6      -          60           -
[Huang, 2014]      8      10         100          5.3

Page 16: Embedded Electronics for Telecom DSP

“Design gap” does not help those aiming at bit rate records

“Gap”: the FPGA has enough capacity to accommodate most ASIC designs

But achieving symbol rates of tens of GBd is hard for a real-time transmitter implementation and often impossible for a receiver


[Trimberger, 2015]


Page 17: Embedded Electronics for Telecom DSP

Architectures for PHY testbeds and demonstrations

Offline processing: both transmitter (Tx) and receiver (Rx) processing are performed offline

Often FPGA-based

Transmitter: samples are pre-computed, stored in e.g. FPGA memory and sent to the channel via a fast DAC

Receiver: a fast digital storage oscilloscope (DSO) digitizes the received signal

Real-time receiver processing: often based on ASICs or ASSPs

Real-time transmitter processing: may use an FPGA with internal PRBS generation to avoid a “slow” interface to the PC
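Internal PRBS generation is typically done with a linear-feedback shift register (LFSR). The sketch below models a PRBS-7 generator with the polynomial x^7 + x^6 + 1 in software; the choice of PRBS-7 and the seed are assumptions for illustration, since the slides do not say which sequence the testbed used.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield PRBS-7 bits from a 7-bit LFSR with polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero, otherwise the LFSR locks up")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of register stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F      # shift in the feedback bit
        yield new_bit


bits = list(islice(prbs7(), 254))
print("repeats after 127 bits:", bits[:127] == bits[127:])
print("ones per period:", sum(bits[:127]))
```

On an FPGA the same recurrence would be unrolled so that many bits are produced per clock cycle; this model only shows the serial logic.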


Page 18: Embedded Electronics for Telecom DSP

State-of-the-art offline processing example

1.125 Tb/s 15-carrier super-channel

Two DACs at 32 GSa/s (oversampling of 4 samples/symbol)

DSO with 62.5 GSa/s using two interleaved 33 GSa/s ADCs


[Maher, 2016]

Page 19: Embedded Electronics for Telecom DSP

State-of-the-art Tx + Rx real-time processing example

[Eiselt, 2016] “First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm”

ASIC chips

Extra info: 8 x 25.78125 GBd signals, PAM-4, 100 km; λ = 1550 nm

Page 20: Embedded Electronics for Telecom DSP

Real-time transmitter processing example

Implementation by Ilan Sousa (UFPA). Joint work with CPqD

IMOC 2015 Second Best Student Paper Award

Example of reaching limit of available hardware via DSP

Real-time fractional oversampling of high order modulation signals with Nyquist pulse shaping

Issues:

Fractional sampling rate conversion: interpolate by L and decimate by M

FPGA clock is slow and parallelism is required

Need to minimize the number of multipliers

Page 21: Embedded Electronics for Telecom DSP

DAC with Fs = 25 GSa/s and FPGA with 156.25 MHz clock

Parallelism level: 160 (= 25 GSa/s / 156.25 MHz)

The hardware limitation requires parallelism: the FPGA must deliver 160 samples per clock cycle to keep the DAC fed
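The parallelism level follows directly from the two clock figures. The sketch below computes it and shows how a serial sample stream would be grouped into one word per FPGA clock cycle; the helper names are hypothetical and only model the data layout, not the actual FPGA design.

```python
import math

DAC_RATE_HZ = 25e9         # from the slide: Fs = 25 GSa/s
FPGA_CLOCK_HZ = 156.25e6   # from the slide: 156.25 MHz clock


def lanes_needed(sample_rate_hz, clock_hz):
    """Samples that must be produced per clock cycle to sustain the sample rate."""
    return math.ceil(sample_rate_hz / clock_hz)


def to_parallel_words(samples, lanes):
    """Group a serial stream into words of `lanes` samples, one word per clock cycle."""
    return [samples[i:i + lanes] for i in range(0, len(samples), lanes)]


P = lanes_needed(DAC_RATE_HZ, FPGA_CLOCK_HZ)
words = to_parallel_words(list(range(4 * P)), P)
print(f"parallelism = {P}, words = {len(words)}, samples per word = {len(words[0])}")
```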


Page 22: Embedded Electronics for Telecom DSP

Real-time Nyquist pulse shaping

Input symbols at a given rate Rsym (e.g. 12.5 GBd) must be converted to samples at Fs (e.g. 25 GSa/s) to feed the DAC

Often the oversampling factor L = Fs/Rsym is an integer (L = 2 in the example above). Then “shaping” is equivalent to interpolation: upsampling by L followed by an FIR filter h[n] (the Nyquist pulse) with N coefficients
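A minimal software sketch of integer-factor pulse shaping follows, with L = 2 samples per symbol. The three-tap triangular pulse is a toy Nyquist pulse chosen only to keep the example short (a real design would use a longer filter such as a root-raised-cosine), and the symbol values are hypothetical.

```python
def upsample(symbols, L):
    """Insert L - 1 zeros after each symbol (zero-stuffing)."""
    out = []
    for s in symbols:
        out.append(s)
        out.extend([0.0] * (L - 1))
    return out


def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


def pulse_shape(symbols, h, L):
    """Pulse shaping as interpolation: upsample by L, then filter with the pulse h."""
    return fir_filter(upsample(symbols, L), h)


L = 2
h = [0.5, 1.0, 0.5]                # toy Nyquist pulse: zero at nonzero multiples of L
symbols = [1.0, -1.0, 3.0, -3.0]   # hypothetical 4-PAM symbols
samples = pulse_shape(symbols, h, L)
print(samples)
print("symbols recovered at every L-th sample:", samples[1::L] == symbols)
```

Because the pulse satisfies the Nyquist criterion, sampling the output once per symbol (after the one-sample filter delay) returns the original symbols with no intersymbol interference.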


Page 23: Embedded Electronics for Telecom DSP

Fractional sampling rate conversion (FSRC)

Fractional oversampling factor L/M

Example 1: L=3 and M=2 implies L/M=1.5 samples/sym and Fs=1.5 Rsym

Example 2: L=10 and M=9 implies L/M≈1.11 samples/sym and Fs≈1.11 Rsym

Gives flexibility for Nyquist pulse shaping with respect to the relation between symbol rate Rsym and sampling frequency Fs

[Figure: interpolator (upsample by L, then LPF with gain L and ωc = π/L) followed by decimator (LPF with gain 1 and ωc = π/M, then downsample by M); signals x[m′], q[m], z[m], z′[m], y[n]]
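The interpolate-then-decimate chain can be sketched directly in software. For brevity this sketch uses a single low-pass filter h in place of the two cascaded filters, and a toy five-tap triangular filter with DC gain L; the filter, the input and the function name are illustrative assumptions, using L=3 and M=2 from Example 1.

```python
def resample_naive(x, h, L, M):
    """Upsample by L, filter with h at the high rate, then keep every M-th sample."""
    # Zero-stuffing: insert L - 1 zeros after each input sample.
    q = []
    for sample in x:
        q.append(sample)
        q.extend([0.0] * (L - 1))
    # FIR filtering at the intermediate rate (L times the input rate).
    z = []
    for m in range(len(q)):
        acc = 0.0
        for k in range(min(len(h), m + 1)):
            acc += h[k] * q[m - k]
        z.append(acc)
    # Decimation: discard M - 1 of every M filtered samples.
    return z[::M]


L, M = 3, 2
h = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]  # toy low-pass filter with DC gain L
x = [1.0] * 8                          # constant input
y = resample_naive(x, h, L, M)
print(len(x), "input samples ->", len(y), "output samples")
```

This direct form is wasteful: most multiplications involve stuffed zeros or produce samples that are discarded, which is what motivates a polyphase structure.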

Page 24: Embedded Electronics for Telecom DSP

Nyquist pulse shaping implementations

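Resampling = interpolation + decimation

[Figure: interpolator (upsample by L, then LPF with gain L and ωc = π/L) followed by decimator (LPF with gain 1 and ωc = π/M, then downsample by M); signals x[m′], q[m], z[m], z′[m], y[n]]

Combine the filters

[Figure: upsample by L, a single LPF with gain L and ωc = min{π/L, π/M}, then downsample by M; signals x[m′], q[m], z[m], y[n]]

Polyphase efficient implementation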
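A polyphase resampler evaluates only the outputs that survive decimation and only the taps that meet nonzero inputs, i.e. y[n] = sum_k h[nM - kL] x[k]. The sketch below is a serial software model of that idea with a hypothetical function name and toy data; it is not the parallel FPGA architecture of the talk.

```python
def resample_polyphase(x, h, L, M):
    """Rational resampling by L/M computed directly at the output rate."""
    num_outputs = (len(x) * L + M - 1) // M  # ceil(len(x) * L / M)
    y = []
    for n in range(num_outputs):
        m = n * M        # position of this output at the intermediate rate
        phase = m % L    # polyphase branch of h: taps phase, phase + L, ...
        k = m // L       # newest input sample that contributes
        acc = 0.0
        for tap in range(phase, len(h), L):
            if k < 0:
                break    # ran past the start of the input
            acc += h[tap] * x[k]
            k -= 1
        y.append(acc)
    return y


L, M = 3, 2
h = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]  # toy low-pass filter with DC gain L
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # hypothetical input samples
print(resample_polyphase(x, h, L, M))
```

Each output costs about len(h)/L multiplications, and no work is spent on samples that decimation would throw away.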

Page 25: Embedded Electronics for Telecom DSP

Minimum number of multipliers and efficient use of memory

Example: L=3, M=5, parallelism P=15, V=5 stacked FSRCs


Proposed Parallel FSRC

Page 26: Embedded Electronics for Telecom DSP

Results with parallel FSRC

Decreases computational cost by a factor of LM (for example, about 2 orders of magnitude with L=16 and M=15)
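The LM factor can be checked by counting multiplications per output sample. The count below assumes, beyond what the slide states, that the reference design runs an N-tap filter on every sample at the intermediate rate and then discards M-1 of every M results, while the polyphase design only evaluates the taps that meet nonzero inputs for the outputs that are kept:

```latex
\begin{align*}
C_{\text{direct}} &= N \cdot M, \\
C_{\text{polyphase}} &\approx \frac{N}{L}, \\
\frac{C_{\text{direct}}}{C_{\text{polyphase}}} &\approx L\,M,
\qquad L = 16,\ M = 15 \;\Rightarrow\; L\,M = 240 \approx 10^{2.4}.
\end{align*}
```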

FPGA resource usage for L=5, M=4, with filter lengths N=51 or 101 using V = 32 stacked FSRCs (XC5 and XC7 denote boards with Virtex 5 and Virtex 7, respectively)

[Figures: look-up table usage and multiplier usage]

Page 27: Embedded Electronics for Telecom DSP

Validation results

Constellations for back-to-back (B2B), first set of tests at 28.125 GBd

Sampling rate Fs = 30 GSa/s

Oversampling = 16/15 = 1.0667 samples per symbol

Symbol rate Rsym = 28.125 GBd

[Figure: constellations for X polarization and Y polarization]

Page 28: Embedded Electronics for Telecom DSP

Channelization for FDM over fiber

An example in which smart (polyphase) filtering is not enough:


Page 29: Embedded Electronics for Telecom DSP

Channelization: Digital signal processing

Page 30: Embedded Electronics for Telecom DSP

Mux signal transformations via DSP

[Figure: mux signal chain with blocks labeled resample (Ip), carrier (two oscillators) and complex-to-real conversion]

Page 31: Embedded Electronics for Telecom DSP

Demux signal transformations via DSP

[Figure: demux signal chain with blocks labeled resample (Dp), carrier (two oscillators) and complex-to-real conversion]

Strong adjacent-channel interference

Page 32: Embedded Electronics for Telecom DSP

Classical filtering result

Filter length may not be enough

Problem: the FPGA does not support real-time operation with more than 3k multipliers

[Figure: test setup with signal generator, DEMUX and analyzer]

Page 33: Embedded Electronics for Telecom DSP

Demux with improved filtering

[Figure: demux signal chain with improved filtering; blocks labeled resample (Dp), carrier (two oscillators) and complex-to-real conversion]

Page 34: Embedded Electronics for Telecom DSP

Effect of improved filtering on received signal

FIR filters with length 90, 150 and 200, with significant improvement regarding distortion, etc.

Page 35: Embedded Electronics for Telecom DSP

Conclusions

“Platform FPGAs” have been chosen for cutting-edge research testbeds due to their price and reconfigurability

There are wonderful EDA flows to simplify design for FPGAs (e.g. Matlab to VHDL to FPGA), but cutting-edge implementations often require a skilled developer with:

Capability to write custom and efficient VHDL code

Good understanding of the corresponding IPs

Training to explore parallelism

Along with microelectronics and photonics, telecom algorithms will also evolve towards parallel implementations to cope with the increase in information processing rate

Benefit of increased degrees of freedom (e.g. spatial multiplexing in wireless and optical fibers)

Virtuous cycle: We develop better algorithms when evaluating their real-time implementation on hardware


Academia needs to update DSP courses!

Page 36: Embedded Electronics for Telecom DSP

Thanks! Obrigado!

LASSE @ Espaço Inovação – Parque Ciência e Tecnologia Guamá

www.lasse.ufpa.br

Page 37: Embedded Electronics for Telecom DSP

References

[Khilo, 2012] Photonic ADC: overcoming the bottleneck of electronic jitter

[Huang, 2014] An 8-bit 100-GS/s distributed DAC in 28-nm CMOS

[Wong, 2014] Quantifying the Gap Between FPGA and Custom CMOS to Aid Microarchitectural Design

[Trimberger, 2015] Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology

[Lyke, 2015] An Introduction to Reconfigurable Systems

[Shannon, 2015] Technology Scaling in FPGAs: Trends in Applications and Architectures

[Maher, 2016] Increasing the information rates of optical communications via coded modulation: a study of transceiver performance

[Nagatani, 2011] A 60-GS/s 6-Bit DAC in 0.5-µm InP HBT Technology for Optical Communications Systems


[Eiselt, 2016] First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm

[Ilan, 2015] Parallel Polyphase Filtering for Pulse Shaping on High-Speed Optical Communication Systems

[Kuon, 2007] Measuring the Gap Between FPGAs and ASICs

[Jamieson, 2005] Mapping multiplexers onto hard multipliers in FPGAs
