Embedded Electronics for Telecom DSP
Aldebaro Klautau
Embedded Systems Lab (LASSE) @ Federal Univ. of Pará (UFPA)
V International Workshop on Trends in Optical Technologies (WTON)
CPqD – Campinas – Brazil - May 19, 2016
Goal and Agenda
Goal: discuss options for prototyping new physical layers (PHY) of DSP-based telecommunication systems
From the perspective of a digital signal processing R&D group that (furiously) targets the highest possible bit rates
No ASICs, but discrete components & development boards
Agenda
Motivation: demand for increased bit rates
Options for prototyping: emphasis on DSP processor and FPGA
Examples of prototypes making the most of the available hardware
Bit-rate hungry applications
Optical transmission with flexible transceivers
Software-defined radios and 5G
Architecture: small cells and centralized RAN
PHY: spectrum aggregation, massive MIMO, mmWaves
Example of 4G traffic: 4 signals with BW = 20 MHz, ~3.7 Gbps
In newer versions of LTE the number of antennas can be 16 or 32: bit rate = 15 Gbps or 30 Gbps
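The figures on this slide are consistent with a bit rate that grows linearly with the number of antenna signals. A minimal sketch, assuming linear scaling from the quoted 4-signal reference point (the function name and the linearity assumption are illustrative, not taken from the talk):

```python
REF_SIGNALS = 4        # reference point quoted on the slide
REF_RATE_GBPS = 3.7    # approximate rate for 4 signals of 20 MHz

def estimated_rate_gbps(num_signals: int) -> float:
    """Scale the quoted reference rate linearly with the number of signals."""
    return REF_RATE_GBPS * num_signals / REF_SIGNALS

for n in (4, 16, 32):
    print(n, "signals ->", round(estimated_rate_gbps(n), 1), "Gbps")
```

Under this assumption, 16 and 32 signals land close to the 15 Gbps and 30 Gbps quoted above.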
Electronic components and associated development boards for prototyping
[Figure: implementation options for a prototype: GPU, DSP, ASSP, ASIC, FPGA, standard cells, full-custom IC]
GPU: graphics processing unit
ASSP: application-specific standard product
Complete DMT transceiver development
FFT-based Discrete Multi-Tone (DMT) bitloading supporting up to 10 bits per tone (1024-QAM)
[Figure: bits per tone]
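A minimal sketch of per-tone bit loading with the 10-bit cap mentioned above, assuming the textbook SNR-gap approximation b = floor(log2(1 + SNR/gap)). The gap value, the SNR profile and the function name are assumptions for illustration; the talk does not state which loading rule was used.

```python
import math

MAX_BITS = 10  # 1024-QAM, the cap quoted on the slide

def bits_per_tone(snr_db, gap_db=9.8, max_bits=MAX_BITS):
    """Gap-approximation bit loading for one tone (gap_db is an assumed value)."""
    snr = 10 ** (snr_db / 10)
    gap = 10 ** (gap_db / 10)
    bits = math.floor(math.log2(1 + snr / gap))
    return max(0, min(max_bits, bits))

snr_profile_db = [45, 40, 33, 27, 20, 12, 5]   # hypothetical per-tone SNRs
loading = [bits_per_tone(s) for s in snr_profile_db]
print(loading)
```

Tones with a high SNR saturate at 10 bits, while tones with a poor SNR carry few or no bits.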
For DMT task: a DSP processor (SoC) chosen as platform
4 cores
FFT coprocessors
Network coprocessor
Viterbi coprocessors
C language programming
Our main motivation: program in C language
Besides, free open source routines available. Example: Forward Error Correction (FEC)
But good performance required heavy optimization
Comparison of Reed-Solomon (RS) implementations, per codeword
Many routines to split among cores
Issues related to concurrency and parallelism
Architectural split of functionalities among DSP cores
Significant effort to optimize code for the platform
Level 1 - Compiler Optimizations
Level 2 - Code Organization/Refactoring
Level 3 - Architecture Optimization
From “programmable logic” to the “platform FPGA”
[Figure: evolution from programmable logic to the platform FPGA, from [Lyke, 2015]]
FPGA boards support several interfaces and peripherals
Several FMC (FPGA mezzanine card) boards
PC interface: PCIe to FPGA (up to 30 Gbps), commonly present in FPGA evaluation boards
High speed ADC/DAC cards
8x SFP expansion card
General purpose
Prototyping with FPGAs
HDL (VHDL, Verilog, etc.) is more difficult than C, and most engineers are exposed to “programmable” logic (digital electronics) but not to digital signal processing on FPGAs or to parallel programming
Go for DSP “general-purpose” chips?
Note that multicore alternatives also require good skills in concurrent and parallel programming and often a profound knowledge of the chip architecture
Changing the DSP chip manufacturer requires studying the new architecture, while FPGAs are more “generic”
FPGAs are a more natural step towards silicon / ASIC than DSP chips
ADC trends
Photonic ADCs
Undersampling: signals sampled below their Nyquist rates
Compressive sampling, e.g. Bayesian approach
[Khilo, 2012]
[Figure: limits on ENOB (effective number of bits) due to jitter; markers distinguish ADCs up to 2007 from ADCs later than 2007 (darker blue)]
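The jitter limit can be illustrated with the standard relations SNR = -20 log10(2π f σ) for a full-scale sine wave and ENOB = (SNR - 1.76) / 6.02. This is a minimal sketch; the 100 fs RMS jitter and the input frequencies are hypothetical values chosen for illustration, not numbers from the talk.

```python
import math

def jitter_limited_enob(f_in_hz, jitter_rms_s):
    """ENOB ceiling for a full-scale sine of frequency f_in_hz sampled with RMS jitter."""
    snr_db = -20 * math.log10(2 * math.pi * f_in_hz * jitter_rms_s)
    return (snr_db - 1.76) / 6.02

for f_ghz in (1, 10, 40):
    enob = jitter_limited_enob(f_ghz * 1e9, 100e-15)
    print(f_ghz, "GHz:", round(enob, 2), "bits")
```

For a fixed jitter, each tenfold increase in input frequency costs about 3.3 bits, which is why photonic sampling with lower jitter is attractive at tens of GHz.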
Some DAC performance numbers
Summary: DACs and AWGs (arbitrary waveform generators), together with ADCs and DSOs (digital storage oscilloscopes), operate at ~100 GSa/s
Hence, the computing platform (DSP, FPGA, ASSP, etc.) may be the bottleneck!
DAC               Bits  BW (GHz)  Fs (GSa/s)  ENOB
Micram DAC-4      6     42        100         -
Micram DAC-3      6     23.8      72          4.5
Micram DACII      6     20        34          4
[Nagatani, 2011]  6     -         60          -
[Huang, 2014]     8     10        100         5.3
“Design gap” does not help those aiming at bit rate records
“Gap”: FPGA has enough capacity to accommodate most ASIC designs
But achieving symbol rates of tens of GBd is hard for a real-time transmitter implementation and often impossible for a receiver
[Trimberger, 2015]
Architectures for PHY testbeds and demonstrations
Offline processing
Both transmitter (Tx) and receiver (Rx) processing are performed offline
Often FPGA-based
Transmitter: samples are pre-computed, stored in e.g. FPGA memory and sent to the channel via a fast DAC
Receiver: a fast digital storage oscilloscope (DSO) digitizes the received signal
Real-time receiver processing
Often based on ASICs or ASSPs
Real-time transmitter processing
May use an FPGA with internal PRBS generation to avoid a “slow” interface to the PC
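Internal PRBS generation is typically a short linear-feedback shift register, which is why it fits easily inside the FPGA. A minimal software model, assuming the common PRBS7 polynomial x^7 + x^6 + 1 (the talk does not say which PRBS length was used, and the function name is illustrative):

```python
def prbs7(seed=0x7F):
    """Generate PRBS7 bits (polynomial x^7 + x^6 + 1) from a nonzero 7-bit seed."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        # Feedback is the XOR of the two oldest bits in the register.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit

gen = prbs7()
period = [next(gen) for _ in range(127)]
repeat = [next(gen) for _ in range(127)]
print(sum(period), "ones in one period of", len(period), "bits")
```

The sequence repeats every 127 bits. In hardware, several such bits would be produced per clock cycle to match the parallelism of the datapath.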
State-of-the-art offline processing example
1.125 Tb/s 15-carrier super-channel
Two DACs at 32 GSa/s (oversampling of 4 samples/symbol)
DSO with 62.5 GSa/s using two interleaved 33 GSa/s ADCs
[Maher, 2016]
State-of-the-art Tx + Rx real-time processing example
[Eiselt, 2016] “First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm”
ASIC chips
Extra info: 8 x 25.78125 GBd signals, PAM-4, 100 km; λ = 1550 nm
Real-time transmitter processing example
Implementation by Ilan Sousa (UFPA), joint work with CPqD
IMOC 2015 Second Best Student Paper Award
Example of reaching limit of available hardware via DSP
Real-time fractional oversampling of high order modulation signals with Nyquist pulse shaping
Issues:
Fractional sampling rate conversion: interpolate by L and decimate by M
FPGA clock is slow and parallelism is required
Need to minimize the number of multipliers
DAC with Fs = 25 GSa/s and FPGA with 156.25 MHz clock
Parallelism level: 160 (= 25 GSa/s / 156.25 MHz)
Hardware limitation required parallelism
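The parallelism level follows directly from the two clock rates: the FPGA must deliver Fs / f_clk samples on every clock cycle. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def parallel_lanes(dac_rate_sa_s: int, fpga_clock_hz: int) -> int:
    """Samples the FPGA must produce per clock cycle to keep the DAC fed."""
    # Integer ceiling division, so a non-integer ratio rounds up.
    return -(-dac_rate_sa_s // fpga_clock_hz)

print(parallel_lanes(25_000_000_000, 156_250_000))
```

With the values above this gives the 160 parallel samples per clock cycle quoted on the slide.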
Real-time Nyquist pulse shaping
Input symbols at a given rate Rsym (e.g. 12.5 GBd) must be converted to samples at Fs (e.g. 25 GSa/s) to feed the DAC
Often the oversampling factor L = Fs/Rsym is an integer
Then “shaping” is equivalent to interpolation: upsampling by L followed by an FIR filter h[n] (the Nyquist pulse) with N coefficients
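A minimal sketch of integer-factor pulse shaping, with L = Fs/Rsym samples per symbol: zero-insertion upsampling followed by an FIR Nyquist pulse. The Hann-windowed sinc, the 4-PAM symbols and the function names are assumptions for illustration; the talk does not specify the pulse.

```python
import math

def nyquist_pulse(L, span=8):
    """Hann-windowed sinc with zero crossings every L samples (span symbols per side)."""
    half = L * span
    taps = []
    for n in range(-half, half + 1):
        x = n / L
        sinc = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 * (1 + math.cos(math.pi * n / (half + 1)))
        taps.append(sinc * window)
    return taps

def shape(symbols, L, h):
    """Upsample by L (zero insertion) and filter with the FIR pulse h."""
    up = []
    for s in symbols:
        up.append(s)
        up.extend([0.0] * (L - 1))
    out = [0.0] * (len(up) + len(h) - 1)
    for i, u in enumerate(up):
        if u != 0.0:                      # inserted zeros contribute nothing
            for k, c in enumerate(h):
                out[i + k] += u * c
    return out

L = 2                                      # e.g. 25 GSa/s over 12.5 GBd
h = nyquist_pulse(L)
symbols = [1.0, -1.0, 3.0, -3.0, 1.0]      # hypothetical 4-PAM symbols
samples = shape(symbols, L, h)
delay = (len(h) - 1) // 2                  # group delay of the symmetric FIR
print([round(samples[delay + k * L], 6) for k in range(len(symbols))])
```

Because the pulse is zero at every nonzero multiple of L, the samples taken at the symbol instants reproduce the input symbols, which is the Nyquist (zero-ISI) property.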
Fractional sampling rate conversion (FSRC)
Fractional oversampling factor L/M
Example 1: L=3 and M=2 implies L/M=1.5 samples/sym and Fs=1.5 Rsym
Example 2: L=10 and M=9 implies L/M=1.11 samples/sym and Fs=1.11 Rsym
Gives flexibility for Nyquist pulse shaping with respect to relation between symbol rate Rsym and sampling frequency Fs
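Given a DAC rate and a symbol rate, the factors L and M are the numerator and denominator of Fs/Rsym in lowest terms. A minimal sketch using rates quoted elsewhere in this talk (the function name is illustrative):

```python
from fractions import Fraction

def resampling_ratio(fs_sa_s: int, rsym_bd: int):
    """Return (L, M) in lowest terms such that Fs = (L/M) * Rsym."""
    ratio = Fraction(fs_sa_s, rsym_bd)
    return ratio.numerator, ratio.denominator

print(resampling_ratio(30_000_000_000, 28_125_000_000))   # 30 GSa/s, 28.125 GBd
print(resampling_ratio(25_000_000_000, 12_500_000_000))   # 25 GSa/s, 12.5 GBd
```

The first case yields L=16 and M=15, and the second reduces to the integer oversampling case with L=2 and M=1.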
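[Figure: interpolator (upsampling by L followed by an LPF with gain L and ωc = π/L) cascaded with a decimator (LPF with gain 1 and ωc = π/M followed by downsampling by M); signal labels x[m′], q[m], z[m], z′[m], y[n]]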
Nyquist pulse shaping implementations
Resampling = interpolation + decimation
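[Figure: the interpolator (upsampling by L; LPF with gain L, ωc = π/L) and the decimator (LPF with gain 1, ωc = π/M; downsampling by M) merge into a single resampler: upsampling by L, one LPF with gain L and ωc = min{π/L, π/M}, downsampling by M]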
Combine the filters
Polyphase efficient implementation
Minimum number of multipliers and efficient use of memory
Example: L=3, M=5, parallelism P=15, V=5 stacked FSRCs
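A minimal sketch of why the polyphase form needs far fewer multiplications, assuming a Hamming-windowed sinc as the combined filter (the talk does not specify the filter design, and all function names are illustrative). It is a serial reference model, not the parallel FPGA architecture of the talk: the direct form filters every upsampled sample, while the polyphase form computes only the outputs that survive decimation and only the taps that meet nonzero inputs.

```python
import math

def lowpass(num_taps, L, M):
    """Hamming-windowed sinc with gain L and cutoff min(pi/L, pi/M)."""
    fc = 1.0 / max(L, M)
    c = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = fc * (n - c)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(L * fc * sinc * window)
    return taps

def resample_naive(x, h, L, M):
    """Upsample by L, filter every sample with h, then keep every M-th output."""
    up = [0.0] * (len(x) * L)
    up[::L] = x
    z, mults = [], 0
    for t in range(len(up)):
        acc = 0.0
        for i, c in enumerate(h):
            if t - i >= 0:
                acc += c * up[t - i]
                mults += 1
        z.append(acc)
    return z[::M], mults

def resample_polyphase(x, h, L, M):
    """Compute only the kept outputs, each with a single polyphase branch of h."""
    n_out = (len(x) * L + M - 1) // M
    y, mults = [], 0
    for n in range(n_out):
        t = n * M                      # position in the virtual upsampled stream
        phase, k0 = t % L, t // L
        acc = 0.0
        for j, c in enumerate(h[phase::L]):
            if k0 - j >= 0:
                acc += c * x[k0 - j]
                mults += 1
        y.append(acc)
    return y, mults

L, M, N = 5, 4, 51
h = lowpass(N, L, M)
x = [math.sin(0.05 * k) for k in range(200)]   # hypothetical input signal
y_naive, m_naive = resample_naive(x, h, L, M)
y_poly, m_poly = resample_polyphase(x, h, L, M)
print("multiplications:", m_naive, "vs", m_poly)
```

Both functions return the same samples, and the ratio of multiplication counts approaches L times M for long inputs.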
Proposed Parallel FSRC
Results with parallel FSRC
Decreases computational cost by a factor of LM (for example, L=16 and M=15 give LM = 240, about 2 orders of magnitude)
FPGA resource usage for L=5, M=4, with filter lengths N=51 or 101, using V = 32 stacked FSRCs (XC5 and XC7 denote boards with Virtex 5 and Virtex 7, respectively)
[Figure: look-up table and multiplier usage]
Validation results
Constellations for back-to-back (B2B), first set of tests at 28.125 GBd
Sampling rate Fs = 30 GSa/s
Oversampling = 16/15 = 1.0667 samples per symbol
Symbol rate Rsym = 28.125 GBd
[Figure: constellations for X polarization and Y polarization]
Channelization for FDM over fiber
An example in which smart (polyphase) filtering is not enough:
Channelization: Digital signal processing
Mux signal transformations via DSP
[Figure: mux chain with blocks labeled Resample (Ip) and Carrier, and signal paths labeled Complex and Real]
Demux signal transformations via DSP
[Figure: demux chain with blocks labeled Resample (Dp) and Carrier, and signal paths labeled Complex and Real]
Adjacent channel strong interference
Classical filtering result
Filter length may not be enough
Problem: the FPGA does not support real-time operation with more than 3k multipliers
[Figure: test setup with signal generator, DEMUX and analyzer]
Demux with improved filtering
[Figure: demux chain with improved filtering; blocks labeled Resample (Dp) and Carrier, signal paths labeled Complex and Real]
Effect of improved filtering on received signal
FIR filters with length 90, 150 and 200
Significant improvement regarding distortion, etc.
Conclusions
“Platform FPGAs” have been chosen for cutting-edge research testbeds due to their price and reconfigurability
There are wonderful EDA flows to simplify design for FPGAs (e.g. Matlab → VHDL → FPGA), but for cutting-edge implementations a skilled developer is often required, with:
Capability to write custom and efficient VHDL code
Good understanding of the corresponding IPs
Training to explore parallelism
Along with microelectronics and photonics, telecom algorithms will also evolve towards parallel implementations to cope with the increase in information processing rate
Benefit from increased degrees of freedom (e.g. spatial multiplexing in wireless and optical fibers)
Virtuous cycle: we develop better algorithms when evaluating their real-time implementation on hardware
Academia needs to update DSP courses!
Thanks! Obrigado!
LASSE @ Espaço Inovação – Parque Ciência e Tecnologia Guamá
[email protected] - www.lasse.ufpa.br
References
[Khilo, 2012] Photonic ADC: overcoming the bottleneck of electronic jitter
[Huang, 2014] An 8-bit 100-GS/s distributed DAC in 28-nm CMOS
[Wong, 2014] Quantifying the Gap Between FPGA and Custom CMOS to Aid Microarchitectural Design
[Trimberger, 2015] Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology
[Lyke, 2015] An Introduction to Reconfigurable Systems
[Shannon, 2015] Technology Scaling in FPGAs: Trends in Applications and Architectures
[Maher, 2016] Increasing the information rates of optical communications via coded modulation: a study of transceiver performance
[Nagatani, 2011] A 60-GS/s 6-Bit DAC in 0.5-µm InP HBT Technology for Optical Communications Systems
[Eiselt, 2016] First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm
[Ilan, 2015] Parallel Polyphase Filtering for Pulse Shaping on High-Speed Optical Communication Systems
[Kuon, 2007] Measuring the Gap Between FPGAs and ASICs
[Jamieson, 2005] Mapping multiplexers onto hard multipliers in FPGAs