Algorithms and Architectures for Future Wireless Base-Stations

Sridhar Rajagopal and Joseph Cavallaro

ECE Department, Rice University

April 19, 2000

This work is supported by Texas Instruments, Nokia, Texas Advanced Technology Program and NSF



4/19/00 TI Meeting 2

Overview

Future Base-Stations

Current DSP Implementation

Our Approach

– Make algorithms computationally effective

– Task Partitioning for pipelining, parallelism

Processor Design for Accelerating Wireless


Evolution of Wireless Communications

First Generation

Voice

Second/Current Generation

Voice + Low-rate Data (9.6 Kbps)

Third Generation

Voice + High-rate Data (2 Mbps) + Multimedia

W-CDMA


Communication System Uplink

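[Figure: User 1 and User 2 transmit to the Base Station over a direct path and reflected paths; the received signal also contains noise and multiple-access interference (MAI).]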


Main Processing Blocks

[Figure: Baseband Layer of Base-Station Receiver, made up of Channel Estimation, Detection and Decoding]


Proposed Base-Station: No Multiuser Detection

[Figure: TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)]


Real-Time Requirements

Multiple Data Rates by Varying Spreading Factors

Detection needs to be done in real time

– 1953 cycles available in a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps

Spreading Factor    Bits / Frame    Data Rate Requirement
4                   10240           1024 Kbps
32                  1280            128 Kbps
256                 160             16 Kbps
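The cycle budget and the table rows can be cross-checked with a few lines of arithmetic. The sketch below assumes a 10 ms frame. The slide does not state this, but it reproduces every Bits/Frame and Data Rate pair. The helper names are illustrative.

```python
# Sketch: cross-check the real-time budget and the spreading-factor table.
# Assumption (not on the slides): a 10 ms frame, which reproduces every row.

FRAME_MS = 10             # assumed frame duration in milliseconds
CHIPS_PER_FRAME = 40960   # spreading factor x bits/frame is the same in every row


def cycles_per_bit(clock_hz, bit_rate_bps):
    """Whole processor cycles available to detect one bit in real time."""
    return int(clock_hz // bit_rate_bps)


def table_row(spreading_factor):
    """Return (spreading factor, bits per frame, data rate in Kbps)."""
    bits_per_frame = CHIPS_PER_FRAME // spreading_factor
    rate_kbps = bits_per_frame / FRAME_MS   # bits per ms equals Kbps
    return spreading_factor, bits_per_frame, rate_kbps


print(cycles_per_bit(250e6, 128e3))   # 1953  (C6x at 250 MHz, 128 Kbps)
for sf in (4, 32, 256):
    print(table_row(sf))              # (4, 10240, 1024.0), (32, 1280, 128.0), (256, 160, 16.0)
```

Under the same assumption, 40960 chips per 10 ms corresponds to a chip rate of 4.096 Mcps.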


Current DSP Implementation

[Figure: Data Rate Comparisons for Matched Filter and Multiuser Detector. Data Rates Achieved (axis scale $\times 10^4$, up to 18) versus Number of Users (9 to 15) for the Multiuser Detector and Matched Filter on a C67 at 166 MHz and on a C64 (* projected, 8x), against a Targeted Data Rate of 128 Kbps.]


Complexity

Algorithm Choice Limited by Complexity

– Multistage detection reduces the data rate by half

Main Features

– Matrix based operations

– High levels of parallelism

– Bit level computations

32 x 32 problem size for the detector shown

Estimation and decoding assumed pipelined


Reasons

Sophisticated, Compute-Intensive Algorithms

Need more MIPS/FLOPS performance

Unable to fully exploit pipelining or parallelism

Bit-level computations / storage


Our Approach

Make algorithms computationally effective

– without sacrificing error rate performance

Task Partitioning on Multiple Processing Elements

– DSPs: Core

– FPGAs: Application Specific / Bit-level Computations

Processor with reconfigurable support and extensions for wireless


Algorithms

Channel Estimation

– Avoid inversion by iterative scheme

Detection

– Avoid block-based detection by pipelining


Computations Involved

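Model

$r_i = A_i b_i$

Received bits of spreading length $N$ for $K$ users: $r_i \in \mathbb{C}^{N}$

2 bits of $K$ asynchronous users aligned at times $i$ and $i-1$: $b_i \in \mathbb{R}^{2K}$

Compute Correlation Matrices

$R_{br} = \frac{1}{L} \sum_{i=0}^{L-1} b_i r_i^H$

$R_{bb} = \frac{1}{L} \sum_{i=0}^{L-1} b_i b_i^T$

[Figure: bits $b_i$ and $b_{i+1}$ of asynchronous users, offset by a delay along the time axis]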
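Below is a direct, unoptimized sketch of forming the two correlation matrices from L preamble bits, written with plain Python lists so it has no dependencies. The function names and the small channel are hypothetical. The noise-free model $r_i = A_i b_i$ is used only to show that the resulting matrices satisfy $R_{bb} A_i^H = R_{br}$.

```python
# Sketch: sample correlation matrices over L preamble bits (names are illustrative).
# bits[i]: 2K real +/-1 values (K users, bits at times i and i-1)
# received[i]: N complex samples r_i


def correlation_matrices(bits, received):
    """Return (R_bb, R_br) = ((1/L) sum b_i b_i^T, (1/L) sum b_i r_i^H)."""
    L = len(bits)
    two_k, n = len(bits[0]), len(received[0])
    r_bb = [[0.0] * two_k for _ in range(two_k)]
    r_br = [[0j] * n for _ in range(two_k)]
    for b, r in zip(bits, received):
        for p in range(two_k):
            for q in range(two_k):
                r_bb[p][q] += b[p] * b[q] / L
            for t in range(n):
                r_br[p][t] += b[p] * r[t].conjugate() / L   # r^H conjugates r
    return r_bb, r_br


def received_vector(a, b):
    """Noise-free model r_i = A b_i; A is stored as N rows of 2K complex entries."""
    return [sum(a_row[k] * b[k] for k in range(len(b))) for a_row in a]


# Usage: with r_i = A b_i, the matrices satisfy R_bb A^H = R_br.
A = [[1 + 1j, 0.5], [0.2j, -1], [0.3, 0.7 - 0.1j]]   # hypothetical N=3, 2K=2 channel
bits = [[1, 1], [1, -1], [1, 1]]
R_bb, R_br = correlation_matrices(bits, [received_vector(A, b) for b in bits])
```

Because $b_i$ is real, $b_i r_i^H = b_i b_i^T A^H$ in the noise-free case. Averaging over $i$ therefore gives $R_{br} = R_{bb} A^H$ directly.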


Multishot Detection

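$r = A b$, where $r = [r_1^T, \ldots, r_D^T]^T$, $b = [b_{1,1}, \ldots, b_{1,K}, \ldots, b_{D,1}, \ldots, b_{D,K}]^T$ and

$$A = \begin{bmatrix} A_0 & 0 & \cdots & 0 \\ A_1 & A_0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & A_1 & A_0 \end{bmatrix} \in \mathbb{C}^{ND \times KD}$$

$A_i = [A_0 \; A_1]$

Solve for the channel estimate, $A_i$: $R_{bb} A_i^H = R_{br}$, with $A_i \in \mathbb{C}^{N \times 2K}$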
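The multishot system can be built mechanically from the two $N \times K$ blocks. This sketch assumes $r_i = A_0 b_i + A_1 b_{i-1}$, which places $A_1$ one block below the diagonal. The slide's exact layout is not fully recoverable, and all names and values here are hypothetical.

```python
# Sketch: stack D bit intervals into one multishot system r = A b.
# Assumes r_i = A0 b_i + A1 b_{i-1} (bits of asynchronous users spilling into the
# next interval), so A1 sits one block below the diagonal. Names are illustrative.


def multishot_matrix(a0, a1, d):
    """Return the (N*D) x (K*D) block matrix with A0 on the diagonal, A1 below it."""
    n, k = len(a0), len(a0[0])
    big = [[0j] * (k * d) for _ in range(n * d)]
    for i in range(d):
        for row in range(n):
            for col in range(k):
                big[i * n + row][i * k + col] = a0[row][col]
                if i > 0:
                    big[i * n + row][(i - 1) * k + col] = a1[row][col]
    return big


def matvec(m, x):
    """Plain matrix-vector product."""
    return [sum(m_row[j] * x[j] for j in range(len(x))) for m_row in m]


A0 = [[1, 2j], [3, 4]]          # hypothetical N=2, K=2 blocks
A1 = [[0.5, 0], [0, -1j]]
b = [1, -1, -1, 1, 1, 1]        # b_1, b_2, b_3 stacked (D = 3, K = 2)
r = matvec(multishot_matrix(A0, A1, 3), b)
```

Block row $i$ of `r` equals $A_0 b_i + A_1 b_{i-1}$, and only $A_0 b_1$ in the first row, because the bit before the window is outside the system.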


Differencing Multistage Detection

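Stage 0 - Matched Filter

$y_0 = \mathrm{Re}[A^H r], \quad d_0 = \mathrm{sign}(y_0)$

Stage 1

$y_1 = y_0 - \mathrm{Re}[A^H A - S]\, d_0, \quad d_1 = \mathrm{sign}(y_1)$

Successive Stages

$x_l = d_l - d_{l-1}$

$y_{l+1} = y_l - \mathrm{Re}[A^H A - S]\, x_l, \quad d_{l+1} = \mathrm{sign}(y_{l+1})$

$S = \mathrm{diag}(A^H A)$

y - soft decision

d - detected bits (hard decision)

Since $x_l$ is zero wherever $d_l = d_{l-1}$, each successive stage only needs the columns of $\mathrm{Re}[A^H A - S]$ for bits whose decision changed.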
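The recursion can be sketched in plain Python as follows. A few assumptions are not on the slide. sign(0) is taken as +1. Stage 1 is written as a differencing step with $d_{-1} = 0$, which gives $x_0 = d_0$ and reproduces the Stage 1 equation. The two-user channel is a made-up near-far example.

```python
# Sketch of differencing multistage detection (stage 0 = matched filter).
# A: N rows of K (complex) entries; r: N received samples. sign(0) = +1 is assumed.


def sign(v):
    return [1 if x >= 0 else -1 for x in v]


def offdiag_gram(a):
    """Re[A^H A - S] with S = diag(A^H A): the real cross-correlations between users."""
    n, k = len(a), len(a[0])
    return [[0 if p == q else sum((a[t][p].conjugate() * a[t][q]).real for t in range(n))
             for q in range(k)] for p in range(k)]


def differencing_multistage(a, r, stages):
    """Return (d, y): hard and soft decisions after the given number of stages."""
    n, k = len(a), len(a[0])
    off = offdiag_gram(a)
    y = [sum((a[t][p].conjugate() * r[t]).real for t in range(n)) for p in range(k)]
    d_prev, d = None, sign(y)            # y_0 = Re[A^H r], d_0 = sign(y_0)
    for _ in range(stages):
        # Stage 1 uses x_0 = d_0; later stages use x_l = d_l - d_{l-1}.
        x = d if d_prev is None else [dl - dp for dl, dp in zip(d, d_prev)]
        changed = [q for q in range(k) if x[q] != 0]   # unchanged bits cost nothing
        y = [y[p] - sum(off[p][q] * x[q] for q in changed) for p in range(k)]
        d_prev, d = d, sign(y)
    return d, y


# Hypothetical near-far example: a strong user 1 and a weak user 2, true bits (+1, -1).
A = [[3, 1], [3, 1], [3, 1], [3, -1]]
r = [2, 2, 2, 4]                         # r = A (+1, -1), noise-free
```

With these values the matched filter decides (+1, +1), which is wrong for the weak user. Stage 1 corrects it, and stage 2 uses only the one column whose decision changed.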


Iterative Scheme

Tracking

Method of Steepest Descent

Stable convergence behavior

Same Performance

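$R_{bb}^{*} = R_{bb} + \frac{1}{L}\left(b_L b_L^T - b_0 b_0^T\right)$

$R_{br}^{*} = R_{br} + \frac{1}{L}\left(b_L r_L^H - b_0 r_0^H\right)$

$A^{*} = A - \mu\left(A R_{bb}^{*} - (R_{br}^{*})^H\right)$

Here $R_{br} = \frac{1}{L}\sum_{i=0}^{L-1} b_i r_i^H$ and $R_{bb} = \frac{1}{L}\sum_{i=0}^{L-1} b_i b_i^T$ over the current window, $^{*}$ marks updated values and $\mu$ is the step size. The iteration drives $A$ toward the solution of $R_{bb} A^H = R_{br}$ without an inversion.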
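Below is a minimal sketch of the tracking updates and one steepest-descent step, using plain lists. The step size $\mu$ is assumed to satisfy $0 < \mu < 2/\lambda_{\max}(R_{bb})$, the standard steepest-descent convergence condition for a symmetric positive definite $R_{bb}$. The values in the usage lines are hypothetical. Each step costs $O(K^2 N)$ multiply-adds and forms no inverse.

```python
# Sketch of the iterative (inversion-free) channel estimate with window tracking.
# A: N rows of 2K complex entries; R_bb: 2K x 2K; R_br: 2K x N. Names are illustrative.


def slide_window(r_bb, r_br, b_new, r_new, b_old, r_old, L):
    """In place: R* = R + (1/L)(newest term - oldest term) for a length-L window."""
    two_k, n = len(r_bb), len(r_br[0])
    for p in range(two_k):
        for q in range(two_k):
            r_bb[p][q] += (b_new[p] * b_new[q] - b_old[p] * b_old[q]) / L
        for t in range(n):
            r_br[p][t] += (b_new[p] * r_new[t].conjugate()
                           - b_old[p] * r_old[t].conjugate()) / L


def steepest_descent_step(a, r_bb, r_br, mu):
    """Return A - mu (A R_bb - R_br^H); no matrix inverse is formed."""
    n, two_k = len(a), len(r_bb)
    return [[a[t][p] - mu * (sum(a[t][k] * r_bb[k][p] for k in range(two_k))
                             - r_br[p][t].conjugate())
             for p in range(two_k)] for t in range(n)]


# Usage with hypothetical values: R_bb from three preamble bits, R_br = R_bb A^H.
A_true = [[1 + 1j, 0.5], [0.2j, -1], [0.3, 0.7 - 0.1j]]
R_bb = [[1.0, 1 / 3], [1 / 3, 1.0]]
R_br = [[sum(R_bb[p][k] * A_true[t][k].conjugate() for k in range(2)) for t in range(3)]
        for p in range(2)]
A_est = [[0j, 0j] for _ in range(3)]
for _ in range(200):
    A_est = steepest_descent_step(A_est, R_bb, R_br, mu=0.5)   # eigenvalues 2/3, 4/3
```

The error shrinks by the factors $|1 - \mu\lambda|$ (here 1/3 and 2/3 per step). A larger $\mu$ converges faster only up to the $2/\lambda_{\max}$ limit.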


Simulations - AWGN Channel

Detection Window = 12

SINR = 0

Paths = 3

Preamble L = 150

Spreading N = 31

Users K = 15

10000 bits/user

MF – Matched Filter

ML – Maximum Likelihood

ACT – using inversion

[Figure: Comparison of Bit Error Rates (BER). BER ($10^{-3}$ to $10^{-1}$) versus Signal to Noise Ratio (SNR, 4 to 12) for MF, ActMF, ML and ActML.]

Complexity: $O(K^2N)$ without inversion versus $O(K^3 + K^2N)$ with inversion


Fading Channel with Tracking

[Figure: BER ($10^{-3}$ to $10^{0}$) versus SNR (4 to 12) for MF - Static, MF - Tracking, ML - Static and ML - Tracking.]

Doppler = 10 Hz, 1000 bits, 15 users, 3 paths


Block Based Detector

[Figure: Block-based detector. Bits 1 to 12 pass through the Matched Filter and Stages 1 to 3, and bits 2-11 are output; the next block, bits 11 to 22, passes through the same stages and outputs bits 12-21.]


Pipelined Detector

[Figure: Pipelined detector. Bits 1 to 12 flow through the Matched Filter and Stages 1 to 3 as a pipeline, producing detected bits 1 to 12.]
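The difference between the block-based and pipelined detectors can be sketched as schedules. This is a simplified model under assumptions the slides do not state. Every stage takes one time step per bit. Block-based windows overlap by one edge bit at each end, which reproduces the "Bits 2-11" and "Bits 12-21" labels. The `lag` parameter is hypothetical.

```python
# Simplified schedules (assumption: every stage takes one time step per bit).
STAGES = ["Matched Filter", "Stage 1", "Stage 2", "Stage 3"]


def block_windows(num_windows, window=12, edge=1):
    """Block-based: ((first, last) bits processed, (first, last) bits output) per window."""
    step = window - 2 * edge          # consecutive windows overlap by 2*edge bits
    return [((s, s + window - 1), (s + edge, s + window - 1 - edge))
            for s in (1 + j * step for j in range(num_windows))]


def pipelined_schedule(num_bits, stages=STAGES, lag=1):
    """Pipelined: at step t, stage s works on bit t - s*lag + 1 (bits are 1-based).

    lag=1 assumes a stage needs only the previous stage's result for the same bit;
    lag=2 lets each stage also see the previous stage's decision for the next bit.
    """
    schedule = []
    for t in range(num_bits + (len(stages) - 1) * lag):
        step = {}
        for s, name in enumerate(stages):
            bit = t - s * lag + 1
            if 1 <= bit <= num_bits:
                step[name] = bit
        schedule.append(step)
    return schedule


print(block_windows(2))           # [((1, 12), (2, 11)), ((11, 22), (12, 21))]
print(pipelined_schedule(12)[3])  # {'Matched Filter': 4, 'Stage 1': 3, 'Stage 2': 2, 'Stage 3': 1}
```

In the pipelined schedule all four stages are busy on different bits at once. In the block-based scheme, each window must pass through every stage before its bits are output.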


Task Decomposition [Asilomar99]

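[Figure: Task decomposition into Block I to Block IV, spanning Channel Estimation, Matched Filter and Multistage Detector. Labels and costs: Matrix Products; Inverse; Correlation Matrices (Per Bit): $R_{br}[R]$ and $R_{br}[I]$, $O(KN)$ each, and $R_{bb}$, $O(K^2)$, fed by Pilot bits $b$ or detected Data bits $d$ through MUXes; $R_{bb} A^H = R_{br}[R]$ and $R_{bb} A^H = R_{br}[I]$, $O(K^2N)$ each; $A_0^H A_0$, $A_0^H A_1$, $A_1^H A_1$, $O(K^2N)$ each; $A^H r$, $O(KND)$; Multistage Detection (Per Window), $O(DK^2M_e)$.]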


Achieved Data Rates

[Figure: Data Rates for Different Levels of Pipelining and Parallelism. Data Rates (axis scale $\times 10^5$, up to 3) versus Number of Users (9 to 15); legend: (Parallel A) (Parallel+Pipe B), (Parallel A) (Pipe B), (Parallel A) B, A B, Sequential A + B.]

Data Rate Requirement = 128 Kbps


VLSI Implementation

Channel Estimation as a Case Study

Area-Time Efficient Architecture

Real-Time Implementation

Bit-Level Computations - FPGAs

Core Operations - DSPs


Motivation for Architecture

Wireless, the next wave after Multimedia

Highly Compute-Intensive Algorithms

Real-Time Requirements


Outline

Processor Core with Reconfigurable Support

Permutation Based Interleaved Memory

Processor Architecture - EPIC

Instruction Set Extensions

Truncated Multipliers

Software Support Needed


Characteristics of Wireless Algorithms

Massive Parallelism

Bit-level Computations

Matrix Based Operations

Memory Intensive

Complex-valued Data

Approximate Computations


What’s wrong with Current Architectures for these applications?


Problems with Current Architectures

UltraSPARC, C6x, MMX, IA-64

Not enough MIPS/FLOPS

Unable to fully exploit parallelism

Bit Level Computations

Memory Bottlenecks

Specialized Instructions for Wireless Communications


Why Reconfigurable

Adapt algorithms to environment

Seamless and Continuous Data Processing during Handoffs

[Figure: handoffs between a Home Area Wireless LAN, a High Speed Office Wireless LAN and an Outdoor CDMA Cellular Network]


Reconfigurable Support

[Figure: protocol stack. OSI Layers 3-7: User Interface, Translation, Synchronization, Transport, Network. OSI Layer 2: Data Link Layer (converts frames to bits). OSI Layer 1: Physical Layer (hardware; raw bit stream).]


Different Protocols

[Figure: Source Coding, Channel Coding, Channel, Channel Estimation, Multiuser Detection, Channel Decoding, Source Decoding]

MPEG-4, H.723 - Voice, Multimedia

Convolutional, Turbo - Channel Coding


A New Architecture

[Figure: Processor Core (GPP/DSP), Cache, queues (Q Q), Crossbar, Reconfigurable Logic, Real-Time I/O, Bit Stream, Main Memory, RF Unit; the processor is shown on an Add-on PCMCIA Card.]


Why Reconfigurable

Process initial bit level computations

Optimize for fast I/O transfer

[Figure: Reconfigurable Logic, Real-Time I/O, Bit Stream, RF Unit]


Reconfigurable Support

[Figure: GARP Architecture at UC Berkeley: Configuration Caches, Control Blocks, Sequencer; 2 64-bit data buses and 1 64-bit address bus; Boolean values, 64-bit Datapath, Fast I/O]


Reconfigurable Support

Wide Path to Memory

– Data Transfer

– Minimize Load Times

Configuration Caches

– Recently Displaced Configurations (5 cycles)

– Can hold 4 full size Configurations

Independent Execution


Reconfigurable Support

Access to same Memory System as Processor

– Minimize overhead

When idle

– Load Configurations

– Transfer Data


Memory Interface

Access to Main Memory and L1 Data Cache

– Large, fast Memory Store

Memory Prefetch Queues for Sequential Accesses

– Read-aheads and Write-behinds

[Figure: Processor Core (GPP/DSP), Instruction Cache, L1 Data Cache, queues (Q Q), Crossbar, Main Memory, FPGA]


Permutation Based Interleaved Memory (PBI)

High Memory Bandwidth Needed

Stride-Insensitive Memory System for Matrices

Multiple Banks

Sustained Peak Throughput (95%)

[Figure: L1 Data Cache, Main Memory]


Processor Core

64-bit EPIC Architecture with Extensions (IA-64/C6x)

Statically determined Parallelism; exploit ILP

Execution Time Predictability

[Figure: Processor Core (GPP/DSP), Cache, queues (Q Q), Crossbar, FPGA]


EPIC Principle

Explicitly Parallel Instruction Computing

Evolution of VLIW Computing

Compiler - Key role

Architecture to assist Compiler

Better cope with dynamic factors

– which limited VLIW Parallelism


Instruction Set Extensions

To accelerate Bit level computations in Wireless

Real/Complex Integer - Bit Multiplications

– Used in Multiuser Detection, Decoding

Bit - Bit Multiplications

– Used in Outer Product Updates

– Correlation, Channel Estimation

Complex Integer-Integer Multiplications

Useful in other Signal Processing applications

– Speech, Video, ...


Architecture Support

Support via Instruction Set Extensions

Minimal ALU Modifications necessary

Transparent to Register Files/Memory

Additional 8-bit Special Purpose Registers


Integer - Bit Multiplications

[Figure: 64-bit Register A, 64-bit Register C, 8-bit Register b, a row of +/- units, and 64-bit Register D]

D = D + b*C

E.g.: Cross-Correlation

Register Renaming?
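Below is a bit-level model of D = D + b*C on packed registers. The lane layout is an assumption. By default there are eight 8-bit lanes, bit i of b controls lane i, bit 1 means +1 and bit 0 means -1, and arithmetic wraps within each lane. The point it illustrates is that multiplying by a ±1 bit needs only an add/subtract select, not a multiplier. The usage line switches to 16-bit lanes so the sample values do not wrap.

```python
# Sketch: D = D + b*C on packed 64-bit registers, with b a vector of +/-1 bits.
# Assumed layout: 64 // width lanes, lane i uses bit i of b (1 -> +1, 0 -> -1).


def pack(values, width=8):
    """Pack signed lane values (lane 0 in the low bits) into a 64-bit integer."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack(word, width=8):
    """Split a 64-bit integer into signed two's-complement lane values."""
    mask = (1 << width) - 1
    lanes = []
    for i in range(64 // width):
        v = (word >> (i * width)) & mask
        lanes.append(v - (1 << width) if v >> (width - 1) else v)
    return lanes


def int_bit_mac(d, c, b, width=8):
    """Lane-wise D + C where bit i of b is 1, D - C where it is 0 (wraps per lane)."""
    mask = (1 << width) - 1
    result = 0
    for i in range(64 // width):
        dv = (d >> (i * width)) & mask
        cv = (c >> (i * width)) & mask
        sv = dv + cv if (b >> i) & 1 else dv - cv   # an add/subtract select, no multiplier
        result |= (sv & mask) << (i * width)
    return result


D = pack([10, -5, 0, 7], width=16)
C = pack([3, 4, -2, 100], width=16)
print(unpack(int_bit_mac(D, C, 0b0101, width=16), width=16))   # [13, -9, -2, -93]
```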


8-bit to 64-bit conversions

D = D + b*b^T

E.g.: Auto-Correlation

b1 = b(1:8), b(1:8), ..., b(1:8) (b repeated 8 times)

b2 = b(1) ... b(1), ..., b(8) ... b(8) (each bit of b repeated 8 times)

[Figure: 8-bit Register b expanded into 64-bit Register A in the b1 and b2 patterns]


Bit-Bit Multiplications

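D = D + b*b^T

E.g.: Auto-Correlation

64-bit Register A = b1, 64-bit Register B = b2

Ex-NOR: 64-bit Register C = b1*b2 (Bit-Bit Multiplications)

B1   B2   B1*B2
0    0    1
0    1    0
1    0    0
1    1    1

Mapping bit 0 to -1 and bit 1 to +1, this Ex-NOR truth table is exactly the product of the two values.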
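The register expansion and the Ex-NOR product can be modeled with Python integers standing in for 64-bit registers. Two layout choices are assumptions. b(1) is bit 0 (least significant) of the 8-bit register. Bit $8i + j$ of the result holds the product for row $i$, column $j$ (0-based) of $b\,b^T$, with 1 meaning +1 and 0 meaning -1.

```python
# Sketch: bit-bit multiplication for b b^T using 64-bit registers (Python ints).
# Assumed bit order: b(1) is bit 0 of the 8-bit register b.

MASK64 = (1 << 64) - 1


def replicate_byte(b):
    """b1 = b(1:8), b(1:8), ..., b(1:8): byte i of the result is b itself."""
    return (b & 0xFF) * 0x0101010101010101


def spread_bits(b):
    """b2 = b(1)...b(1), ..., b(8)...b(8): byte i is 0xFF if bit i of b is 1."""
    word = 0
    for i in range(8):
        if (b >> i) & 1:
            word |= 0xFF << (8 * i)
    return word


def xnor64(x, y):
    """Ex-NOR of two 64-bit registers."""
    return ~(x ^ y) & MASK64


def outer_product_bits(b):
    """Bit 8*i + j holds b(i)*b(j) for 0-based i, j (1 -> +1, 0 -> -1)."""
    return xnor64(replicate_byte(b), spread_bits(b))


print(hex(outer_product_bits(0b00000001)))   # 0xfefefefefefefe01
```

A single Ex-NOR thus produces all 64 entries of the 8 x 8 sign matrix at once. The expansion into b1 and b2 is what lines up every pair of bits.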


Increment/Decrement

[Figure: 64-bit Register D, a row of +/- units driven by the 8-bit Register b1*b2 and the constant 1, producing 64-bit Register (D + b1*b2)]

D = D + b*b^T

E.g.: Auto-Correlation
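Putting the pieces together, one accumulation $D = D + b\,b^T$ can be modeled as eight increment/decrement operations, one per row of $D$. This sketch assumes two layouts that are illustrative, not taken from the slides. Each row of $D$ lives in its own 64-bit register as eight signed 8-bit lanes. The 8-bit select for row $i$ is the Ex-NOR of b with bit $i$ of b replicated.

```python
# Sketch: D = D + b b^T with increment/decrement on packed 8-bit lanes.
# Assumed layout: rows[i] is a 64-bit register holding row i of D as eight
# signed 8-bit lanes (lane j = column j); bit j of b is b(j), 1 -> +1, 0 -> -1.


def inc_dec(d, sel, width=8):
    """Lane j of D gets +1 if bit j of sel is 1, else -1 (wrapping per lane)."""
    mask = (1 << width) - 1
    out = 0
    for j in range(64 // width):
        v = (d >> (j * width)) & mask
        v = v + 1 if (sel >> j) & 1 else v - 1
        out |= (v & mask) << (j * width)
    return out


def autocorr_update(rows, b):
    """Apply D = D + b b^T in place, one increment/decrement per row of D."""
    for i in range(8):
        spread = 0xFF if (b >> i) & 1 else 0x00
        sel = ~(b ^ spread) & 0xFF       # Ex-NOR: bit j is b(i)*b(j)
        rows[i] = inc_dec(rows[i], sel)
    return rows


def lanes_signed(word, width=8):
    """Read back the signed lane values of a 64-bit register."""
    mask = (1 << width) - 1
    vals = []
    for j in range(64 // width):
        v = (word >> (j * width)) & mask
        vals.append(v - (1 << width) if v >> (width - 1) else v)
    return vals


rows = [0] * 8
for sample in (0b10110010, 0b01100111):     # hypothetical received bit vectors
    autocorr_update(rows, sample)
print(lanes_signed(rows[0]))                # [2, 0, 2, 0, -2, 0, 2, -2]
```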


Complex-valued Data Processing

Is it easy to add?

Is it worth additional ALU support?

Typically supported by Software!


Truncated Multipliers

Many applications need approximate computations

Adaptive Algorithms: Y = Y + mu*(Y*C)

Truncate lower bits

Truncated Multipliers - half the area/half the delay

Can do 2 truncated multiplies in parallel with regular

[Figure: ALU Multipliers: Multiplier 1, Multiplier 2, Truncated Multiplier]
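A truncated multiplier skips the partial-product bits in the least significant columns. The sketch below models an unsigned $n \times n$ array multiplier bit by bit. It assumes no error-compensation constant (published truncated-multiplier designs often add one) and models only the arithmetic, not timing. Column $c$ holds $c+1$ partial-product bits, so dropping the $k$ lowest columns (with $k \le n$) loses at most $\sum_{c=0}^{k-1} (c+1)2^c = (k-1)2^k + 1$.

```python
# Sketch: unsigned n x n-bit array multiplier, optionally truncated.


def truncated_product(x, y, n, dropped_columns):
    """Sum only partial-product bits x_i*y_j with i + j >= dropped_columns.

    dropped_columns=0 gives the exact product. Skipped bits are never formed,
    so the result can only underestimate x*y.
    """
    total = 0
    for i in range(n):
        if (x >> i) & 1:
            for j in range(n):
                if (y >> j) & 1 and i + j >= dropped_columns:
                    total += 1 << (i + j)
    return total


def error_bound(k):
    """Largest possible loss from dropping columns 0..k-1 (valid for k <= n)."""
    return (k - 1) * 2 ** k + 1


x, y = 0xB7, 0x6D                                 # hypothetical 8-bit operands
exact = truncated_product(x, y, 8, 0)
approx = truncated_product(x, y, 8, 8)            # drop the 8 low columns
print(exact == x * y, 0 <= exact - approx <= error_bound(8))   # True True
```

When only the upper half of the product is kept (as in a fixed-point update such as Y + mu*(Y*C)), the bounded loss in the low columns is what makes the approximation acceptable.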


Software Support

Greater Interaction between Compilers and Architectures

– EPIC

– Reconfigurable Logic

Compiler needs to find and exploit bit level computations

Reconfigurable Logic Programming


Other Uses

Reconfigurable Logic

– For accelerating loops of general purpose processors

Bit Level Support

– For other voice, video and multimedia applications


Software Suggestions

Limited OS Support

Compiler Efficiency – No more Assembly!

Performance Analysis Tools

Code Composer Studio 1.2


Conclusions

DSPs to play major role in Future Base-Station

Search for Computationally Efficient Algorithms and Better Processor Designs to meet Real-Time Requirements

Reduced Complexity Algorithms designed

Processor Core with Reconfigurable Support developed


Extra Slides


PBI Scheme

$N$ - address length (bits)

$M = 2^n$ Banks

$2^{N-n}$ words in each bank

To access a word,

– $n$-bit bank number

– $(N-n)$-bit address (high-order)

Calculation of the $n$-bit Bank Number


Calculate Bank Number

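Use all $N$ address bits to get the $n$-bit vector $Y = AX$, where $A$ is an $n \times N$ matrix of 0's and 1's and the arithmetic is over GF(2) (addition is XOR)

$Y = A_h X_h + A_l X_l$, splitting $X$ into $N-n$ high-order bits $X_h$ and $n$ low-order bits $X_l$, with $A_l$ of rank $n$

Each bit of $Y$ comes from an $N$-bit parity circuit with $\log_k N$ levels of XOR gates (fan-in $k$)

[Figure: the N-bit address feeds parity circuits for Row 0 to Row n-1 of A; the n parity bit signals drive a decoder that produces 2^n bank select signals]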
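The bank-number calculation maps directly onto bitwise operations. Bit $k$ of $Y$ is the parity of the address ANDed with row $k$ of $A$, and the decoder turns $Y$ into a one-hot bank select. In this sketch the 8-bit address width, the two rows of $A$ and the function names are hypothetical, and address bit $j$ is taken as column $j$ of $A$. The rows are chosen so that $A_l$ (the columns for the two low-order address bits) is the identity and therefore has rank $n = 2$.

```python
# Sketch: permutation-based bank selection Y = A X over GF(2) (XOR arithmetic).
# Address bit j is column j of A; each row of A is stored as an N-bit mask.


def parity(x):
    """1 if x has an odd number of set bits (what an XOR tree computes)."""
    return bin(x).count("1") & 1


def bank_number(address, rows):
    """Bit k of the bank number is the parity of (row k of A) AND address."""
    y = 0
    for k, row in enumerate(rows):
        y |= parity(row & address) << k
    return y


def bank_select(y):
    """Decoder: one-hot vector with 2**n positions, bit y set."""
    return 1 << y


PBI_ROWS = [0b00010101, 0b10101010]   # hypothetical A: N = 8, n = 2, A_l = identity
LOW_ORDER = [0b01, 0b10]              # plain low-order interleaving, for comparison

stride4 = [0, 4, 8, 12]
print([bank_number(a, PBI_ROWS) for a in stride4])    # [0, 1, 2, 3]
print([bank_number(a, LOW_ORDER) for a in stride4])   # [0, 0, 0, 0]
```

Because $A_l$ has full rank, the four addresses of any aligned group of four still land in four different banks. The high-order entries of $A$ additionally spread this stride-4 pattern, which plain low-order interleaving sends to a single bank.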


Interleaved Memory Model

[Figure: Address Source, Data Sequencer, Input Buffers, Memory Banks M(0), M(1), ..., M(M-1), Output Buffers, Data Sink]


Aspects of EPIC

Designing Plan of Execution (POE) at Compile Time

Permitting Compiler to play Statistics

– Conditional Branches, Memory references

Communicating POE to the hardware

– Static Scheduling

– Branch information


Architecture Features in EPIC

Static Scheduling

– MultiOP

– Non-Unit Assumed Latency (NUAL)

The Branch Problem

– Predicated Execution

– Control Speculation

– Predicated Code Motion

The Memory Problem

– Cache Specifiers

– Data Speculation


Operation of Reconfigurable Logic

Load Configuration

– If in configuration cache, minimal time

Copy initial data with coprocessor move instructions

Start execution

Issue wait that interlocks while active

Copy registers back at kernel completion