RICE UNIVERSITY Implementing the Viterbi algorithm on programmable processors Sridhar Rajagopal Elec 696 sridhar@rice.edu


RICE UNIVERSITY

Implementing the Viterbi algorithm on programmable processors

Sridhar Rajagopal

Elec 696

sridhar@rice.edu


Motivation

Viterbi decoding - One of the major bottlenecks in baseband processing [PHY]

Need for flexibility in the algorithm parameters to support different protocols (read: "programmable")

No architecture developed yet to meet real-time requirements of 3G systems.

2 - 8 Mbps range for wideband CDMA

100 Mbps range for wireless LAN


Today

Background

Advanced DSP architectures -- TI C6x [15]

Viterbi algorithm basics [10]

Viterbi on TI DSPs [10]

A programmable processor specifically designed for Viterbi [15]


TI C6x architecture

VLIW (Very Long Instruction Word) architecture:
Similar to a vector processor, but multiple instructions are issued to multiple functional units (FUs)
The FUs are not all the same

32-bit architecture

8 functional units

[Figure: 4-wide VLIW, with Inst 1 to Inst 4 issued in parallel to FU 1 to FU 4]


8 VelociTI principles

Parallel fetch, decode and execute

Pipelined deeply enough to make ADD the critical path

Instructions based on RISC

Load - Store architecture

Orthogonal - Instruction Set and Reg. File

Determinism

Conditional Instructions

Instruction Packing


2 * 4 = 8 Functional Units

.M Multiplication unit:
16-bit x 16-bit multiplies (signed/unsigned, packed/unpacked)

.L Arithmetic Logic unit:
Comparisons and logic operations
Saturation arithmetic and absolute value

.S Shifter unit:
Bit manipulation (set, get, shift, rotate)
Branching, addition and packed addition

.D Data unit:
Load/store to memory
Addition and pointer arithmetic


How powerful am I?

8 instructions per cycle

Max:
6 adds per cycle
2 multiplies per cycle
2 load/stores per cycle
2 branches per cycle

The idea is that code must use instructions in these ratios to get full FU utilization.


C6x DSP Core


C6x Datapath


C6x Resource Constraints

Instructions using the same FU:
1 instruction per FU

Cross paths:
Only 1 operand from the other reg. file to (.L, .S, .M)

Loads and stores:
2 loads and stores from 2 different reg. files

Reads and writes:
Max 4 reads from the same register
No 2 writes to the same register :)


Instruction Packing

Fetch Packet vs. Execute Packet

Avoid NOPs in the instruction code

Multi-cycle NOPs if absolutely necessary

LSB ("p" bit) of each instruction is used for packing

Instruction: A B C D E F G H
p-bit:       1 1 0 1 0 0 1 0

Execute packets: A || B || C, D || E, F, G || H

8 instructions instead of 32
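The packing rule on this slide can be sketched in a few lines of Python. This is only an illustration of how the p-bits split a fetch packet into execute packets, assuming the usual C6x convention that a p-bit of 1 means "the next instruction runs in parallel with this one". The function name `split_execute_packets` is made up for this sketch and is not a TI tool or API.

```python
def split_execute_packets(instructions, p_bits):
    """Group the instructions of a fetch packet into execute packets.

    Assumed convention: p-bit 1 -> the next instruction executes in
    parallel with this one; p-bit 0 -> this instruction closes the packet.
    """
    if len(instructions) != len(p_bits):
        raise ValueError("need exactly one p-bit per instruction")
    packets, current = [], []
    for name, p in zip(instructions, p_bits):
        current.append(name)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # A trailing p-bit of 1 leaves the last packet open; keep it anyway.
        packets.append(current)
    return packets


# The fetch packet from the slide: instructions A..H with their p-bits.
packets = split_execute_packets(list("ABCDEFGH"), [1, 1, 0, 1, 0, 0, 1, 0])
print(", ".join(" || ".join(p) for p in packets))
# prints: A || B || C, D || E, F, G || H
```

Four execute packets come out of one 8-instruction fetch packet, so no NOP padding is needed to fill the unused slots.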


Conditional Instructions

All instructions can be made conditional on the value in registers A1, A2, B0, B1, B2

Avoids branch latencies

If condition not met by end of first phase of execution, results not written back to reg. file

Conditional loads/stores squashed before data phase


C6x Pipeline

Fetch (if necessary) - 4 phases:
Address Generate
Address Send
Access Ready Wait
Fetch Packet Receive

Decode - 2 phases:
Instruction dispatch (if necessary)
Instruction decode

Execute - 10 phases:
Most instructions need only 1 phase


Some interesting instructions

Saturation

Bit-counting -- Image coding

Integer-comparison

Bit-manipulation

Seed generation for reciprocal instructions


Other details

64 KB internal program and data memory

DMA - peripherals to memory

Intrinsics in code for better programming:
Similar to using "VIS" in UltraSPARC

Software pipelining of loops

PERFORMANCE:
5-10X higher clock -- deeper pipeline (2-4X)
Additional ALUs


Additional features in C64x

SIMD support

Communication-specific instructions:
Interleaving, Galois field multiply

Bit count and rotate hardware

64 32-bit registers

Fewer resource constraints

No more NOPs needed ever [no boundaries]


C64x DSP Core


Today

Background

Advanced DSP architectures -- TI C6x [15]

Viterbi algorithm basics [10]

Viterbi on TI DSPs [10]

A programmable processor specifically designed for Viterbi [15]


Viterbi Decoding

[Figure: k bits -> Encoder -> n bits -> Decoder -> k bits, with n > k]

Rate k/n = 1/2 Convolutional Encoder
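As a concrete instance of a rate 1/2 convolutional encoder, here is a minimal Python sketch. The slide does not give the generator polynomials, so the sketch assumes the common textbook code with constraint length 3 and generators 7 and 5 (octal); the function name `conv_encode` is likewise made up for the illustration.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate 1/len(generators) convolutional encoder (hypothetical sketch).

    The K-1 flip-flops are kept in `state`, newest bit in the MSB.
    Each generator is a tap mask over [input bit, flip-flops].
    """
    state = 0
    out = []
    for b in bits:
        reg = (b << (constraint_length - 1)) | state  # input bit + flip-flops
        for g in generators:
            out.append(bin(reg & g).count("1") & 1)   # parity of the tapped bits
        state = reg >> 1                              # shift; the oldest bit drops out
    return out


message = [1, 0, 1, 1]
coded = conv_encode(message)
print(coded)  # prints: [1, 1, 1, 0, 0, 0, 0, 1]
```

Every input bit (k = 1) produces two output bits (n = 2), which is where the rate 1/2 comes from.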


Error Protection

Number of states = 2^(number of flip-flops) = 2^(Constraint Length - 1)

Not every state-to-state transition is possible
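A short Python sketch makes both points concrete. It assumes a shift-register encoder in which the new input bit enters at the MSB of the state; the helper name `trellis_transitions` is invented for this example.

```python
def trellis_transitions(constraint_length):
    """Map each state to its two possible next states (for input 0 and input 1).

    Assumed convention: the state holds the K-1 most recent input bits,
    newest bit in the MSB, and shifts right on every new input.
    """
    num_states = 1 << (constraint_length - 1)
    table = {}
    for state in range(num_states):
        table[state] = [
            ((bit << (constraint_length - 1)) | state) >> 1 for bit in (0, 1)
        ]
    return table


table = trellis_transitions(3)
print(len(table))  # prints: 4
print(table)       # prints: {0: [0, 2], 1: [0, 2], 2: [1, 3], 3: [1, 3]}
```

With constraint length 3 there are 4 states, and each state can reach only 2 of them, so half of the state-to-state transitions never occur in the trellis.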


Trellis for decoding


Trellis for an input sequence


Error detection

Branch metric = “Distance” between received symbol pair and possible symbol pairs

Path metric = Accumulated error metric
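The two definitions can be illustrated with a small Python sketch for the hard-decision case, where the "distance" is the Hamming distance. The received and expected symbol pairs below are made-up values chosen only for the illustration.

```python
from itertools import product


def hamming_distance(a, b):
    """Hard-decision branch metric: number of bit positions that differ."""
    return sum(x ^ y for x, y in zip(a, b))


# Branch metrics: distance from one received pair to every possible pair.
received = (1, 0)
branch_metrics = {
    pair: hamming_distance(received, pair) for pair in product((0, 1), repeat=2)
}
print(branch_metrics)
# prints: {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 1}

# Path metric: branch metrics accumulated along one candidate path.
received_pairs = [(1, 1), (1, 0), (0, 1)]  # hypothetical received symbols
expected_pairs = [(1, 1), (0, 0), (0, 1)]  # hypothetical path through the trellis
path_metric = sum(
    hamming_distance(r, e) for r, e in zip(received_pairs, expected_pairs)
)
print(path_metric)  # prints: 1
```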


Error-correction


Stages in Viterbi Decoding

Calculate Branch metrics for all states every stage

Update Path metrics for all states every stage

At the end, Traceback the trellis to get the decoded bits


Computations

Branch metrics:
Hamming distance: XOR and count 1's
Euclidean distance: squared distance

Path metrics:
Add branch metrics to existing path metrics
Compare for minimum and select minimum

Survivor traceback:
Linked list / pointer chasing

Memory Intensive / Sequential Operations
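The three computations fit together as in the following Python sketch of a hard-decision decoder. It is a plain software illustration, not the code of any processor discussed in this talk, and it assumes the textbook rate 1/2 code with constraint length 3 and generators 7 and 5 (octal), an encoder that starts in state 0, and Hamming-distance branch metrics. A soft-decision version would swap in a squared-distance metric.

```python
def viterbi_decode(received, generators=(0b111, 0b101), constraint_length=3):
    """Hard-decision Viterbi decoder (hypothetical sketch, Hamming metric)."""
    n = len(generators)
    num_states = 1 << (constraint_length - 1)
    inf = float("inf")

    def step(state, bit):
        # Encoder model: returns (next state, expected output bits).
        reg = (bit << (constraint_length - 1)) | state
        return reg >> 1, [bin(reg & g).count("1") & 1 for g in generators]

    path_metric = [0] + [inf] * (num_states - 1)  # encoder starts in state 0
    survivors = []  # per stage: (previous state, input bit) for every state
    for i in range(0, len(received), n):
        symbol = received[i:i + n]
        new_metric = [inf] * num_states
        decision = [None] * num_states
        for state in range(num_states):
            if path_metric[state] == inf:
                continue  # state not reachable yet
            for bit in (0, 1):
                nxt, expected = step(state, bit)
                # Branch metric: XOR and count 1's.
                branch = sum(r ^ e for r, e in zip(symbol, expected))
                # Add, compare, select.
                candidate = path_metric[state] + branch
                if candidate < new_metric[nxt]:
                    new_metric[nxt] = candidate
                    decision[nxt] = (state, bit)
        path_metric = new_metric
        survivors.append(decision)

    # Traceback: start from the best final state and chase the pointers.
    state = min(range(num_states), key=lambda s: path_metric[s])
    decoded = []
    for decision in reversed(survivors):
        state, bit = decision[state]
        decoded.append(bit)
    decoded.reverse()
    return decoded


# Message 1 0 1 1 followed by two flushing zeros, encoded with the (7, 5) code.
clean = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
noisy = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]  # one bit flipped
print(viterbi_decode(clean))  # prints: [1, 0, 1, 1, 0, 0]
print(viterbi_decode(noisy))  # prints: [1, 0, 1, 1, 0, 0]
```

The inner loops are dominated by metric loads and stores, and the traceback is a serial pointer chase, which is what makes the algorithm memory intensive and sequential.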


Today

Background

Advanced DSP architectures -- TI C6x [15]

Viterbi algorithm basics [10]

Viterbi on TI DSPs [10]

A programmable processor specifically designed for Viterbi [15]


Viterbi support in different processors

C54x:
Special hardware accelerator
ACS unit with 2 accumulators and split ALU
Viterbi butterfly (2 ACS) in 4 cycles

C62x:
Nothing special

C6416:
Viterbi coprocessor
K = 5-9, Rate = 1/2, 1/3, 1/4
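For readers unfamiliar with the term, a Viterbi butterfly is the pair of add-compare-select (ACS) operations that update two target states from the same two source states. The Python sketch below shows only the arithmetic; it does not model the C54x hardware or its instructions, and the function name and metric values are assumptions made for the illustration.

```python
def butterfly(pm_even, pm_odd, bm):
    """One Viterbi butterfly = two add-compare-select (ACS) operations.

    pm_even, pm_odd: old path metrics of the two source states.
    bm: branch metrics keyed by (source, input bit), source 0 = even, 1 = odd.
    Returns one (new path metric, chosen source) pair per input bit.
    """
    results = []
    for bit in (0, 1):
        a = pm_even + bm[(0, bit)]  # add
        b = pm_odd + bm[(1, bit)]   # add
        if a <= b:                  # compare
            results.append((a, 0))  # select the even source
        else:
            results.append((b, 1))  # select the odd source
    return tuple(results)


# Hypothetical metrics for one butterfly.
bm = {(0, 0): 0, (0, 1): 2, (1, 0): 2, (1, 1): 0}
print(butterfly(2, 1, bm))  # prints: ((2, 0), (1, 1))
```

A decoder with 2^(K-1) states runs 2^(K-2) such butterflies per trellis stage, which is why hardware that completes one in a few cycles matters.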


Viterbi Coprocessor in C6416


SM (state metric), SD (soft decision) and HD (hard decision) memories are not accessible to the DSP


Today

Background

Advanced DSP architectures -- TI C6x [15]

Viterbi algorithm basics [10]

Viterbi on TI DSPs [10]

A programmable processor specifically designed for Viterbi [15]


Need for VSP architecture

Large amount of memory access

Traceback decoding

Not efficient on a GPP

The number of program instructions on a GPP is of a higher order than the complexity of the algorithm


VSP architecture


Branch Metric Calculation


Path Metric Calculation


Traceback Unit


Traceback with survivor updates

Start Filling the Trellis

Start Traceback at 5 * Constraint Length

Symbol Decoded

Update Survivor Path for most recent symbol
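To get a feel for the numbers, the sketch below works out the traceback depth from the 5 * Constraint Length rule and a rough survivor-memory size. The memory estimate rests on an assumption that is not on the slide: one decision bit stored per state per trellis stage inside the traceback window.

```python
def traceback_depth(constraint_length, factor=5):
    """Trellis stages filled before traceback starts (5 * K rule)."""
    return factor * constraint_length


def survivor_bits(constraint_length, factor=5):
    """Survivor memory in bits, assuming 1 decision bit per state per stage."""
    num_states = 1 << (constraint_length - 1)
    return traceback_depth(constraint_length, factor) * num_states


for k in (5, 7, 9):
    print(k, traceback_depth(k), survivor_bits(k))
# prints:
# 5 25 400
# 7 35 2240
# 9 45 11520
```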


Survivor Path Updates


Circular updates


Software Programming

Small but specialized instruction set:
LOAD, ACS

Shorter execution time

All 3 subprocessors programmed independently

10 ns cycle time (100 MHz) in 1990 to get 1.5 Mbps
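The last figure implies a cycle budget per decoded bit, which the following back-of-the-envelope Python sketch works out. The final line is a naive scaling that assumes the same cycles per bit would hold at 100 Mbps; that assumption is ours, purely to show the size of the gap.

```python
clock_hz = 100e6        # 100 MHz, from the slide
throughput_bps = 1.5e6  # 1.5 Mbps, from the slide

cycle_time_ns = 1e9 / clock_hz
cycles_per_bit = clock_hz / throughput_bps
print(round(cycle_time_ns, 1))   # prints: 10.0
print(round(cycles_per_bit, 1))  # prints: 66.7

# Naive scaling (assumption): same cycles per bit at a 100 Mbps target.
target_bps = 100e6
required_clock_ghz = target_bps * cycles_per_bit / 1e9
print(round(required_clock_ghz, 2))  # prints: 6.67
```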


Conclusions

The Viterbi algorithm is an important block to implement in a programmable communication receiver

Approaches so far have been either coprocessor support for DSPs or specialized processors.

We are yet to design programmable processors that meet real-time requirements for 100 Mbps applications.