Algorithms and Architectures for Future Wireless Base-Stations
Sridhar Rajagopal and Joseph Cavallaro
ECE Department, Rice University
April 19, 2000
This work is supported by Texas Instruments, Nokia, Texas Advanced Technology Program and NSF
4/19/00 TI Meeting 2
Overview
Future Base-Stations
Current DSP Implementation
Our Approach
– Make Algorithms Computationally Effective
– Task Partitioning for Pipelining, Parallelism
Processor Design for Accelerating Wireless
Evolution of Wireless Communications
First Generation
Voice
Second/Current Generation
Voice + Low-rate Data (9.6 Kbps)
Third Generation
Voice + High-rate Data (2 Mbps) + Multimedia
W-CDMA
Communication System Uplink
[Diagram: signals from User 1 and User 2 reach the Base Station over a direct path and reflected paths, corrupted by noise + MAI.]
Main Processing Blocks
Channel Estimation Detection Decoding
Baseband Layer of Base-Station Receiver
Proposed Base-Station (No Multiuser Detection)
TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)
Real-Time Requirements
Multiple Data Rates by Varying Spreading Factors
Detection needs to be done in real-time
– 1953 cycles available in a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps

Spreading Factor | Number of Bits / Frame | Data Rate Requirement
4                | 10240                  | 1024 Kbps
32               | 1280                   | 128 Kbps
256              | 160                    | 16 Kbps
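The cycle budget quoted above is just the clock rate divided by the target bit rate; a quick check of the slide's figure (250 MHz C6x, 128 Kbps target):

```python
# Cycle budget per detected bit on a 250 MHz C6x at the 128 Kbps target.
# The division is the whole calculation; figures are from the slide.
clock_hz = 250_000_000
data_rate_bps = 128_000
cycles_per_bit = clock_hz // data_rate_bps
print(cycles_per_bit)  # -> 1953
```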
Current DSP Implementation
[Figure: data rates achieved vs. number of users (9-15) for the multiuser detector and the matched filter, on a C67 at 166 MHz and projected (8x) for a C64. Targeted data rate = 128 Kbps.]
Complexity
Algorithm Choice Limited by Complexity
– Multistage detection reduces the data rate by half
Main Features
– Matrix-based operations
– High levels of parallelism
– Bit-level computations
32x32 problem size shown for the detector
Estimation and decoding assumed pipelined
Reasons
Sophisticated, Compute-Intensive Algorithms
Need more MIPS/FLOPS performance
Unable to fully exploit pipelining or parallelism
Bit-level computations / storage
Our Approach
Make algorithms computationally effective
– without sacrificing error-rate performance
Task Partitioning on Multiple Processing Elements
– DSPs: Core Operations
– FPGAs: Application-Specific / Bit-level Computations
Processor with reconfigurable support and extensions for wireless
Algorithms
Channel Estimation
– Avoid inversion by an iterative scheme
Detection
– Avoid block-based detection by pipelining
Computations Involved
Model
  r_i = A_i b_i
– b_i: the 2 bits of each of the K async. users aligned at times i and i-1 (2K bits total; successive bits b_i, b_{i+1} overlap in time because of the delays)
– r_i: received vector of spreading length N for the K users

Compute Correlation Matrices
  R_br = (1/L) sum_{i=1..L} b_i r_i^H
  R_bb = (1/L) sum_{i=1..L} b_i b_i^T
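A small numerical sketch of these correlation computations (NumPy, illustrative sizes; the noiseless model r_i = A b_i is assumed so that the channel recovered from the correlation matrices is exact):

```python
import numpy as np

# Sketch of the correlation-matrix computations. Sizes are illustrative,
# and noise is omitted so the recovered channel matches exactly.
rng = np.random.default_rng(0)
K, N, L = 4, 16, 150                      # users, spreading length, preamble
b = rng.choice([-1.0, 1.0], size=(L, 2 * K))          # 2K bits per window
A_true = (rng.standard_normal((N, 2 * K))
          + 1j * rng.standard_normal((N, 2 * K)))
r = b @ A_true.T                          # r_i = A b_i, stacked row-wise

R_br = (b.T @ r.conj()) / L               # (1/L) sum_i b_i r_i^H
R_bb = (b.T @ b) / L                      # (1/L) sum_i b_i b_i^T

# The channel estimate solves R_bb A^H = R_br (direct solve, for checking):
A_hat = np.linalg.solve(R_bb, R_br).conj().T
print(np.allclose(A_hat, A_true))         # -> True
```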
Multishot Detection
Detect D bits jointly by stacking the windows:
  r = A b,  with A in C^{ND x KD}
where A is block-banded, built from the blocks of A_i = [A_0 A_1] in C^{N x 2K}

Solve for the channel estimate A_i from
  R_bb A^H = R_br
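The block-banded stacking can be sketched as follows (illustrative sizes; the placement of A_0 on the current bits and A_1 on the previous bits is an assumption consistent with r_i = A_0 b_i + A_1 b_{i-1}):

```python
import numpy as np

# Sketch of multishot stacking: each window sees r_i = A0 b_i + A1 b_(i-1),
# so stacking D windows gives r = A b with a block-banded ND x KD matrix.
rng = np.random.default_rng(1)
K, N, D = 2, 4, 3
A0 = rng.standard_normal((N, K))
A1 = rng.standard_normal((N, K))

A = np.zeros((N * D, K * D))
for i in range(D):
    A[i * N:(i + 1) * N, i * K:(i + 1) * K] = A0       # current bits
    if i > 0:
        A[i * N:(i + 1) * N, (i - 1) * K:i * K] = A1   # previous bits
print(A.shape)  # -> (12, 6)
```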
Differencing Multistage Detection
Stage 0 (Matched Filter):
  y^0 = A^H r
  d^0 = sign(Re[y^0])

Stage 1:
  y^1 = y^0 + (S - A^H A) d^0
  d^1 = sign(Re[y^1])

Successive Stages:
  x^l = d^l - d^{l-1}
  y^{l+1} = y^l + (S - A^H A) x^l
  d^{l+1} = sign(Re[y^{l+1}])

S = diag(A^H A)
y - soft decision
d - detected bits (hard decision)
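The stages can be traced on a tiny hand-checkable example (2 users, real-valued, noise omitted; the matrix values are illustrative, not from the slides):

```python
import numpy as np

# Sketch of the differencing multistage detector on a 2-user toy example.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])           # illustrative spreading/amplitude matrix
d_true = np.array([1.0, -1.0])       # transmitted bits
r = A @ d_true                       # received vector (noise omitted)

G = A.T @ A                          # A^H A (real-valued here)
S = np.diag(np.diag(G))              # S = diag(A^H A)

y = A.T @ r                          # stage 0: matched filter, y0 = A^H r
d_prev = np.sign(y)                  # d0 = sign(Re[y0])
y = y + (S - G) @ d_prev             # stage 1: y1 = y0 + (S - A^H A) d0
d = np.sign(y)

for _ in range(3):                   # successive stages
    x = d - d_prev                   # differencing: x = d^l - d^(l-1)
    y = y + (S - G) @ x              # y^(l+1) = y^l + (S - A^H A) x^l
    d_prev, d = d, np.sign(y)

print(d)  # -> [ 1. -1.], the transmitted bits
```

Once the hard decisions stop changing, x is all zeros and the remaining stages cost nothing — that is the point of the differencing formulation.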
Iterative Scheme
Tracking
Method of Steepest Descent
Stable convergence behavior
Same Performance
Sliding-window updates of the correlation matrices:
  R_bb(new) = R_bb(old) + b_L b_L^T - b_0 b_0^T
  R_br(new) = R_br(old) + b_L r_L^H - b_0 r_0^H
Solve R_bb A^H = R_br by steepest descent:
  (A^H) <- (A^H) + mu * (R_br - R_bb (A^H))
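The steepest-descent iteration that avoids inverting R_bb can be sketched numerically (illustrative sizes, noiseless model; the step-size choice from the largest eigenvalue is one standard way to get the stable convergence the slide mentions):

```python
import numpy as np

# Sketch of the iterative channel estimate: solve R_bb A^H = R_br by
# gradient steps instead of inversion. Sizes are illustrative.
rng = np.random.default_rng(2)
K2, N, L = 8, 16, 200                 # 2K stacked bits, spreading, preamble
b = rng.choice([-1.0, 1.0], size=(L, K2))
A_true = rng.standard_normal((N, K2))
r = b @ A_true.T                      # r_i = A b_i (noise omitted)

R_bb = (b.T @ b) / L                  # (1/L) sum b_i b_i^T
R_br = (b.T @ r) / L                  # (1/L) sum b_i r_i^H (real case)

mu = 1.0 / np.linalg.eigvalsh(R_bb).max()   # step size for stability
X = np.zeros_like(R_br)               # X approximates A^H
for _ in range(500):
    X = X + mu * (R_br - R_bb @ X)    # steepest-descent update

print(np.allclose(X.T, A_true))       # -> True
```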
Simulations - AWGN Channel
Detection Window = 12, SINR = 0, Paths = 3, Preamble L = 150, Spreading N = 31, Users K = 15, 10000 bits/user
MF - Matched Filter; ML - Maximum Likelihood; Act - using inversion

[Figure: comparison of bit error rates (BER, 10^-3 to 10^-1) vs. signal-to-noise ratio (SNR, 4-12 dB) for MF, ActMF, ML, ActML; complexities O(K^2 N) and O(K^3 + K^2 N) annotated.]
Fading Channel with Tracking
[Figure: BER (10^-3 to 10^0) vs. SNR (4-12 dB) for MF and ML, static vs. tracking. Doppler = 10 Hz, 1000 bits, 15 users, 3 paths.]
Block Based Detector
[Diagram: bits 2-11 pass through the matched filter and stages 1-3 as one block, then bits 12-21 as the next block; each block must complete all stages before the next begins.]
Pipelined Detector
[Diagram: bits 1-12 pass through the matched filter and stages 1-3 in pipelined fashion, each stage working on a different bit at the same time.]
Task Decomposition [Asilomar99]
Block I - Matrix Products:
  A^H r, O(KND); A_0^H A_0, A_0^H A_1, A_1^H A_1, each O(K^2 N)
Block II - Correlation Matrices (per bit):
  R_br[R], R_br[I], O(KN) each; R_bb, O(K^2)
Block III - Channel Estimation:
  solve R_bb A^H = R_br[R] and R_bb A^H = R_br[I], O(K^2 N) each
Block IV - Multistage Detection (per window):
  O(D K^2 Me)
Pilot and data bits are multiplexed into the estimation blocks; the detected bits d are produced by Block IV.
Blocks: Channel Estimation -> Matched Filter -> Multistage Detector
Achieved Data Rates
[Figure: data rates achieved vs. number of users (9-15) for different levels of pipelining and parallelism of tasks A and B: Sequential A + B; A, B; (Parallel A), B; (Parallel A), (Pipe B); (Parallel A), (Parallel + Pipe B). Data rate requirement = 128 Kbps.]
VLSI Implementation
Channel Estimation as a Case Study
Area-Time Efficient Architecture
Real-Time Implementation
Bit-Level Computations - FPGAs
Core Operations - DSPs
Motivation for Architecture
Wireless, the next wave after Multimedia
Highly Compute-Intensive Algorithms
Real-Time Requirements
Outline
Processor Core with Reconfigurable Support
Permutation Based Interleaved Memory
Processor Architecture - EPIC
Instruction Set Extensions
Truncated Multipliers
Software Support Needed
Characteristics of Wireless Algorithms
Massive Parallelism
Bit-level Computations
Matrix Based Operations
Memory Intensive
Complex-valued Data
Approximate Computations
What’s wrong with Current Architectures for these applications?
Problems with Current Architectures
UltraSPARC, C6x, MMX, IA-64
Not enough MIPS/FLOPS
Unable to fully exploit parallelism
Bit Level Computations
Memory Bottlenecks
Specialized Instructions for Wireless Communications
Why Reconfigurable
Adapt algorithms to environment
Seamless and Continuous Data Processing during Handoffs
Home Area Wireless LAN
High Speed Office Wireless LAN
Outdoor CDMA Cellular Network
Reconfigurable Support
OSI Layers 3-7: User Interface, Translation, Synchronization, Transport, Network
OSI Layer 2: Data Link Layer (converts frames to bits)
OSI Layer 1: Physical Layer (hardware; raw bit stream)
Different Protocols
[Diagram: transmit chain of Source Coding and Channel Coding; receive chain of Channel Estimation, Multiuser Detection, Channel Decoding, and Source Decoding.]
MPEG-4, H.723 - Voice, Multimedia
Convolutional, Turbo - Channel Coding
A New Architecture
[Diagram: a processor core (GPP/DSP) with cache and prefetch queues (Q) connects through a crossbar to main memory and to reconfigurable logic; the reconfigurable logic takes the real-time I/O bit stream from the RF unit. The processor sits on an add-on PCMCIA card.]
Why Reconfigurable
Process initial bit-level computations
Optimize for fast I/O transfer
[Diagram: the reconfigurable logic sits between the RF unit's real-time bit stream and the rest of the processor.]
Reconfigurable Support
Based on the Garp architecture at UC Berkeley:
– Configuration caches
– 2 64-bit data buses, 1 64-bit address bus
– Control blocks, sequencer
– Boolean values, 64-bit datapath, fast I/O
Reconfigurable Support
Wide Path to Memory
– Data Transfer
– Minimize Load Times
Configuration Caches
– Recently Displaced Configurations (5 cycles)
– Can hold 4 full size Configurations
Independent Execution
Reconfigurable Support
Access to same Memory System as Processor
– Minimize overhead
When idle
– Load Configurations
– Transfer Data
Memory Interface
Access to Main Memory and L1 Data Cache
– Large, fast memory store
Memory Prefetch Queues for Sequential Accesses
– Read-aheads and write-behinds
[Diagram: processor core (GPP/DSP) with instruction cache and L1 data cache; prefetch queues (Q) and a crossbar connect the FPGA to the L1 data cache and main memory.]
Permutation Based Interleaved Memory (PBI)
High Memory Bandwidth Needed
Stride-Insensitive Memory System for Matrices
Multiple Banks
Sustained Peak Throughput (95%)
[Diagram: the interleaved banks sit between the L1 data cache and main memory.]
Processor Core
64-bit EPIC Architecture with Extensions (IA-64/C6x)
Statically determined parallelism; exploit ILP
Execution time predictability
[Diagram: processor core (GPP/DSP) with cache, prefetch queues (Q), crossbar, and FPGA.]
EPIC Principle
Explicitly Parallel Instruction Computing
Evolution of VLIW Computing
Compiler- Key role
Architecture to assist Compiler
Better cope with the dynamic factors that limited VLIW parallelism
Instruction Set Extensions
To accelerate bit-level computations in wireless:
Real/Complex Integer-Bit Multiplications
– Used in multiuser detection, decoding
Bit-Bit Multiplications
– Used in outer-product updates
– Correlation, channel estimation
Complex Integer-Integer Multiplications
Useful in other signal processing applications
– Speech, video, ...
Architecture Support
Support via Instruction Set Extensions
Minimal ALU Modifications necessary
Transparent to Register Files/Memory
Additional 8-bit Special Purpose Registers
Integer-Bit Multiplications
D = D + b*C (e.g., cross-correlation)
[Diagram: 64-bit registers A and C feed a row of +/- units steered by the 8-bit register b, accumulating into the 64-bit register D. Register renaming?]
8-bit to 64-bit Conversions
D = D + b*b^T (e.g., auto-correlation)
[Diagram: the 8-bit register b is expanded into 64-bit registers two ways: b1 = b(1:8), b(1:8), ..., b(1:8) replicates the byte eight times; b2 = b(1)b(1)...b(1), ..., b(8)b(8)...b(8) replicates each bit eight times.]
Bit-Bit Multiplications
D = D + b*b^T (e.g., auto-correlation)
64-bit Register A = b1; 64-bit Register B = b2
Ex-NOR: 64-bit Register C = b1*b2 (bit-bit multiplications)

b1  b2  b1*b2
0   0   1
0   1   0
1   0   0
1   1   1
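The truth table is exactly Ex-NOR: encoding +1 as 1 and -1 as 0, the product of two antipodal bits is the XNOR of their encodings. A quick sketch, including a 64-wide variant in the spirit of the slide's 64-bit register:

```python
# Bit-bit multiplication as Ex-NOR: +1 encoded as 1, -1 encoded as 0.
def bit_mul(b1: int, b2: int) -> int:
    return 1 - (b1 ^ b2)             # 1 exactly when the signs agree

# 64 bit-bit multiplications at once, as a 64-bit Ex-NOR unit would do:
MASK64 = (1 << 64) - 1
def bit_mul64(x: int, y: int) -> int:
    return ~(x ^ y) & MASK64

for b1 in (0, 1):
    for b2 in (0, 1):
        print(b1, b2, bit_mul(b1, b2))   # reproduces the truth table
```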
Increment/Decrement
D = D + b*b^T (e.g., auto-correlation)
[Diagram: the 8-bit register b1*b2 steers a row of +/- units that increment or decrement the 64-bit register D, producing D + b1*b2.]
Complex-valued Data Processing
Is it easy to add?
Is this worth additional ALU support?
Typically supported by software!
Truncated Multipliers
Many applications need only approximate computations
Adaptive algorithms: Y = Y + mu*(Y*C)
Truncate the lower bits
Truncated multipliers - half the area / half the delay
Can do 2 truncated multiplies in parallel with a regular multiplier
[Diagram: ALU multipliers - Multiplier 1, Multiplier 2, and a Truncated Multiplier.]
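A behavioural sketch of the idea (not the slide's exact circuit): an n x n truncated multiplier only forms the partial products that land in the upper n result bits, which is where the area and delay savings come from, at the cost of a small error.

```python
# Behavioural model of an n x n truncated multiply: partial-product
# columns below bit n are never formed, introducing a small error.
def truncated_mul(a: int, b: int, n: int = 16) -> int:
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            acc += (a << i) >> n     # keep only columns at weight >= n
    return acc

a, b = 40000, 50000
exact = (a * b) >> 16                # full multiply, then drop low bits
approx = truncated_mul(a, b)
print(exact - approx)                # small non-negative error (< 16)
```

Each omitted column contributes less than one unit at the kept precision, so the total error is bounded by the operand width — acceptable for adaptive updates like Y = Y + mu*(Y*C).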
Software Support
Greater Interaction between Compilers and Architectures
– EPIC
– Reconfigurable Logic
Compiler needs to find and exploit bit-level computations
Reconfigurable Logic Programming
Other Uses
Reconfigurable Logic
– For accelerating loops of general-purpose processors
Bit-Level Support
– For other voice, video and multimedia applications
Software Suggestions
Limited OS Support
Compiler Efficiency – No more Assembly!
Performance Analysis Tools
Code Composer Studio 1.2
Conclusions
DSPs to play major role in Future Base-Station
Search for Computationally Efficient Algorithms and Better Processor Designs to meet Real-Time Requirements
Reduced Complexity Algorithms designed
Processor Core with Reconfigurable Support developed
Extra Slides
PBI Scheme
N - address length
M = 2^n banks
2^(N-n) words in each bank
To access a word:
– n-bit bank number
– (N-n)-bit address (high-order bits)
Calculation of the n-bit Bank Number
Calculate Bank Number
Use all N address bits to get the n-bit bank number:
  Y = A X, where A is an n x N matrix of 0's and 1's
  Y = A_h X_h + A_l X_l, splitting X into (N-n, n) bits, with A_l of rank n
Each bit of Y comes from an N-bit parity circuit with log_k N levels of XOR gates (k = fan-in)
[Diagram: the N-bit address feeds n parity circuits, one per row of A; the n parity bits drive a decoder producing 2^n bank-select signals.]
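The bank-number computation can be sketched directly: one parity (XOR) tree per row of A, evaluated over GF(2). The rows of A below are illustrative, not the slide's actual matrix:

```python
# Sketch of the PBI bank-number computation: Y = A X over GF(2),
# one N-bit parity tree per row of the n x N matrix A.
N, n = 8, 3                        # address bits, bank bits (2^n = 8 banks)
A_rows = [0b10110101,              # illustrative 0/1 rows of A
          0b01101101,
          0b11010011]

def parity(x: int) -> int:
    return bin(x).count("1") & 1   # models the XOR tree

def bank_number(addr: int) -> int:
    # AND the address with row i of A, then reduce with parity
    return sum(parity(A_rows[i] & addr) << i for i in range(n))

print(bank_number(0b10011010))     # -> 6
```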
Interleaved Memory Model
[Diagram: an address source feeds input buffers in front of memory banks M(0), M(1), ..., M(M-1); output buffers and a data sequencer deliver results to the data sink.]
Aspects of EPIC
Designing the Plan of Execution (POE) at compile time
Permitting the compiler to play the statistics
– Conditional branches, memory references
Communicating the POE to the hardware
– Static scheduling
– Branch information
Architecture Features in EPIC
Static Scheduling
– MultiOp
– Non-Unit Assumed Latency (NUAL)
The Branch Problem
– Predicated execution
– Control speculation
– Predicated code motion
The Memory Problem
– Cache specifiers
– Data speculation
Operation of Reconfigurable Logic
Load Configuration
– If in configuration cache, minimal time
Copy initial data with coprocessor move instructions
Start execution
Issue a wait instruction that interlocks while the logic is active
Copy registers back at kernel completion