1

Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes

SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications

Miodrag Bolic

2

Outline

• FIR filter on ADSP-21xx
• DSP Requirements
  – Fast Multiply-Accumulates (Data-path)
  – Extended Precision Accumulator Register (Data-path)
  – Dual Operand Fetch (Memory)
  – Circular Buffering (Addressing)
  – Zero-Overhead Looping (Instruction set)
• Analog Devices Architectures and Programming
  – SHARC
  – Blackfin
  – Performance Optimization

3

ADSP-21xx

Copied from [Kester03]

4

CALCULATING OUTPUTS OF 4-TAP FIR FILTER USING A CIRCULAR BUFFER

y(3) = h(0) x(3) + h(1) x(2) + h(2) x(1) + h(3) x(0)

y(4) = h(0) x(4) + h(1) x(3) + h(2) x(2) + h(3) x(1)

y(5) = h(0) x(5) + h(1) x(4) + h(2) x(3) + h(3) x(2)

Memory Location   Read    Write   Read    Write   Read
0                 x(0)    x(4)    x(4)            x(4)
1                 x(1)            x(1)    x(5)    x(5)
2                 x(2)            x(2)            x(2)
3                 x(3)            x(3)            x(3)

Copied from [Kester03]
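A minimal Python sketch of the write and read pattern in the memory-location listing above, assuming a 4-location buffer and a write pointer that wraps modulo the buffer length. The names `write_sample` and `read_order` are illustrative and do not come from [Kester03].

```python
def write_sample(buf, write_idx, sample):
    """Overwrite the oldest sample and return the advanced (wrapped) write index."""
    buf[write_idx] = sample
    return (write_idx + 1) % len(buf)


def read_order(write_idx, length):
    """Memory locations visited when reading from the newest to the oldest sample."""
    newest = (write_idx - 1) % length
    return [(newest - k) % length for k in range(length)]


# Memory holds x(0)..x(3) at locations 0..3; the next write goes to location 0.
buf = ["x(0)", "x(1)", "x(2)", "x(3)"]
write_idx = 0

write_idx = write_sample(buf, write_idx, "x(4)")
after_x4 = list(buf)                         # x(4) has replaced x(0) at location 0
order_y4 = read_order(write_idx, len(buf))   # locations read to form y(4)

write_idx = write_sample(buf, write_idx, "x(5)")
after_x5 = list(buf)                         # x(5) has replaced x(1) at location 1
```

Reading locations in `order_y4` yields x(4), x(3), x(2), x(1), which is the operand order of y(4) in the equations above. No samples are ever moved in memory; only the pointer changes.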

5

FIR filter steps

1. Obtain a sample with the ADC; generate an interrupt

2. Detect and manage the interrupt

3. Move the sample into the input signal's circular buffer

4. Update the pointer for the input signal's circular buffer

5. Zero the accumulator

6. Control the loop through each of the coefficients

7. Fetch the coefficient from the coefficient's circular buffer

8. Update the pointer for the coefficient's circular buffer

9. Fetch the sample from the input signal's circular buffer

10. Update the pointer for the input signal's circular buffer

11. Multiply the coefficient by the sample

12. Add the product to the accumulator

13. Move the output sample (accumulator) to a holding buffer

14. Move the output sample from the holding buffer to the DAC

Copied from [Kester03]
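Steps 3 to 13 above can be sketched in Python as follows. This is an illustration only: the ADC, interrupt and DAC steps (1, 2 and 14) are left out, the class name `FirFilter` is hypothetical, and floating-point arithmetic stands in for the DSP's fixed-point accumulator.

```python
class FirFilter:
    """Per-sample FIR filter using a circular input buffer (illustrative sketch)."""

    def __init__(self, coeffs):
        self.h = list(coeffs)            # coefficient buffer
        self.x = [0.0] * len(self.h)     # input signal's circular buffer
        self.ptr = 0                     # next write location in the input buffer

    def process(self, sample):
        n = len(self.h)
        self.x[self.ptr] = sample        # step 3: store the new sample
        idx = self.ptr                   # location of the newest sample
        self.ptr = (self.ptr + 1) % n    # step 4: advance the write pointer
        acc = 0.0                        # step 5: zero the accumulator
        for k in range(n):               # step 6: loop over the coefficients
            coeff = self.h[k]            # steps 7-8: fetch coefficient, advance
            value = self.x[idx]          # step 9: fetch the input sample
            idx = (idx - 1) % n          # step 10: step back to the older sample
            acc += coeff * value         # steps 11-12: multiply and accumulate
        return acc                       # step 13: output sample


fir = FirFilter([1, 2, 3, 4])
impulse = [1, 0, 0, 0, 0]
response = [fir.process(s) for s in impulse]   # an impulse returns the coefficients
```

On a general-purpose processor each commented step costs at least one instruction; the DSP features discussed in the rest of the slides aim to collapse steps 6 to 12 into one cycle per tap.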

6

FIR filter steps (cont.)

ADSP21xx Example code:

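CNTR = N-1;                 { loop counter: N-1 passes }
DO convolution UNTIL CE;    { zero-overhead loop until the counter expires }
convolution: MR = MR + MX0 * MY0(SS), MX0 = DM(I0,M1), MY0 = PM(I4,M5);   { multiply-accumulate plus two operand fetches }

Single-Cycle Instruction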

Copied from [Kester03]
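The loop above runs N-1 times because each pass multiplies the operands fetched by the previous instruction while fetching the next pair. The Python sketch below emulates that behaviour. The prefetch before the loop and the final multiply-accumulate after it are assumptions, since the slide shows only the loop itself; circular wrapping of I0 and I4 is omitted, and the modifiers M1 and M5 are assumed to be 1.

```python
def fir_mac_loop(samples, coeffs):
    """Emulate the pipelined MAC loop for one output sample (illustrative sketch)."""
    n = len(coeffs)
    i0 = 0   # stands in for data-memory pointer I0
    i4 = 0   # stands in for program-memory pointer I4

    # Assumed prologue (not on the slide): clear MR and prefetch the first pair.
    mr = 0
    mx0, my0 = samples[i0], coeffs[i4]
    i0, i4 = i0 + 1, i4 + 1

    macs = 0
    for _ in range(n - 1):                    # CNTR = N-1
        mr += mx0 * my0                       # MR = MR + MX0 * MY0
        macs += 1
        mx0, my0 = samples[i0], coeffs[i4]    # MX0 = DM(I0,M1), MY0 = PM(I4,M5)
        i0, i4 = i0 + 1, i4 + 1

    # Assumed epilogue (not on the slide): final MAC on the last fetched pair.
    mr += mx0 * my0
    macs += 1
    return mr, macs


result, mac_count = fir_mac_loop([1, 2, 3, 4], [5, 6, 7, 8])
```

With four taps the loop body executes three times and the epilogue supplies the fourth product, so `mac_count` equals the number of taps.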

8

Copied from [Takala05]

9

Copied from [Takala05]

10

Motorola DSP5600X

Copied from [Takala05]

11

Copied from [Takala05]

12

Copied from [Takala05]

13

ADSP-21xx MAC

www.analog.com/dsp

14

Copied from [Takala05]

15

SHARC Architecture ADSP-2106X

Copied from [Takala05]

17

Copied from [Takala05]

18

Copied from [Takala05]

19

Copied from [Takala05]

21

Copied from [Takala05]

22

Copied from [Takala05]

23

Hardware loops

• Software loop:

      MOVE #16,B           ; Initialize loop counter B
LOOP: MAC (R0)+,(R4)+,A    ; Register-indirect addressing with post-increment
      DEC B
      JNE LOOP

• Hardware loops: no time is spent on
  – Decrementing counters
  – Checking to see if the loop is finished
  – Branching back to the top of the loop

      RPT #16
      MAC (R0)+,(R4)+,A

[Lapsley97]
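To put a number on the saving, the Python sketch below counts the instructions executed by the two loops above. It assumes that every instruction takes one cycle and that the repeat setup costs one instruction; real cycle counts depend on the processor and are not given on the slide.

```python
def software_loop_instructions(n):
    """MOVE once, then MAC + DEC + JNE on every iteration."""
    return 1 + 3 * n


def hardware_loop_instructions(n):
    """RPT once, then a single MAC on every iteration."""
    return 1 + n


taps = 16
software = software_loop_instructions(taps)
hardware = hardware_loop_instructions(taps)
overhead = software - hardware   # instructions spent only on loop control
```

Under these assumptions, two out of every three instructions in the software loop do no signal-processing work.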

24

Copied from [Kester03]

25

ADI General Purpose DSP Product Families (chart, axis: Performance)

• ADSP-218x/9x: Power Efficient, $5 - $10
  – Up to 160 MMACS
  – Wired Voice, Wireless Voice, VOIP/VON, Industrial Control
• Blackfin: Media Enabled, $5 - $30
  – Up to 3000 MMACS
  – Image compression, Digital Still/Video Camera, MMOIP, Telematics, Biometrics
• SHARC: Low-Cost Floating Point, $10 - $100
  – Up to 600 MMACS (32-bit)
  – Audio, Infotainment, Industrial
• TigerSHARC: High-Performance, $35 - $200
  – Up to 4800 MMACS (16-bit) or 1200 MMACS (32-bit)
  – 2.5G/3G Infrastructure, Medical Imaging, Industrial Imaging, Multiprocessing

www.analog.com/dsp

27

SHARC Architecture

Copied from [Smith97]

28

SHARC Architecture - Features

• The Super Harvard ARChitecture
• 100 MHz Core / 300 MFLOPS Peak
• Parallel Operation of: Multiplier, ALU, 2 Address Generators & Sequencer
  – No Arithmetic Pipeline; All Computations Are Single-Cycle
• High Precision and Extended Dynamic Range
  – 32/40-Bit IEEE Floating-Point Math
  – 32-Bit Fixed-Point MACs with 64-Bit Product & 80-Bit Accumulation
• Single-Cycle Transfers with Dual-Ported Memory Structures
  – Supported by Cache Memory and Enhanced Harvard Architecture
• Glueless Multiprocessing Features
• JTAG Test and Emulation Port
• DMA Controller, Serial Ports, Link Ports, External Bus, SDRAM Controller, Timers

www.analog.com/dsp
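The 80-bit accumulation of 64-bit products listed above leaves 16 extra (guard) bits in the accumulator. The Python sketch below works out how many products can be summed before overflow becomes possible. It assumes two's-complement values; the function names are illustrative, and the Blackfin figures assume that its 16-bit multipliers produce 32-bit products for its 40-bit accumulators.

```python
def guard_bits(acc_bits, product_bits):
    """Extra accumulator bits beyond the width of one product."""
    return acc_bits - product_bits


def safe_accumulations(acc_bits, product_bits):
    """Products of any representable value that can be summed without overflow."""
    return 2 ** guard_bits(acc_bits, product_bits)


def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


sharc_safe = safe_accumulations(80, 64)      # 16 guard bits
blackfin_safe = safe_accumulations(40, 32)   # 8 guard bits (assumed product width)

# Worst case: every product equals the most negative 64-bit value.
worst_product = -(1 << 63)
still_fits = fits_signed(worst_product * sharc_safe, 80)
overflows = not fits_signed(worst_product * (sharc_safe + 1), 80)
```

This is why an extended precision accumulator matters for long filters: without guard bits, every addition in the loop would need a scaling or saturation check.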

29

ADSP-2106x Core Architecture

[Block diagram. Blocks shown: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), cache memory (32 x 48), program sequencer, timer, flags, JTAG test & emulation, bus connect, register file (16 x 40), floating- and fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point and fixed-point ALU. Buses: PMA (24-bit), DMA (32-bit), PMD (48-bit), DMD (40-bit).]

www.analog.com/dsp

30

Example- Dot product

• C code

Copied from [Smith97]

31

Example- Dot product - Assembly

Copied from [Smith97]

32

Example- Dot product - Assembly

Copied from [Smith97]

33

C or Assembly

• How complicated is the program?
• Are you pushing the maximum speed of the DSP?
• How many programmers will be working together?
• Which is more important, product cost or development cost?
• What is your background?
• What does the DSP's manufacturer suggest you use?

Copied from [Smith97]

35

BLACKfin Processor Core

[Block diagram. Data Arithmetic Unit: data registers R0-R7, two 40-bit accumulators (Acc0, Acc1), barrel shifter. Address Arithmetic Unit: DAG0 and DAG1, registers I0-I3, L0-L3, B0-B3, M0-M3, pointer registers P0-P5, FP, SP. Sequencer.]

Data Arithmetic Unit
• Two 16-bit Multipliers
• Two 40-bit ALUs, Four 8-bit Video ALUs
• Barrel Shifter
• Sixteen 16-bit / Eight 32-bit Math Registers

Address Arithmetic Unit
• Two DAGs, byte addressing
• Eight 32-bit pointer registers
• Four Sets of 32-bit Index, Modify, Length, Base

Sequencer
• 16-bit Instructions, 32-bit Instructions
• Multi-Issue, 64-bit Instructions

• Interlocked Pipeline
• Micro Signal Architecture, developed with Intel

www.analog.com/dsp
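The Index, Modify, Length and Base register sets listed above are what a data address generator needs for circular buffering. The Python sketch below shows one common way such an address update works. It is an assumption about the general technique rather than the exact Blackfin hardware rule, and it assumes the modifier is no larger in magnitude than the buffer length.

```python
def circular_update(i, m, l, b):
    """Post-modify index I by M, wrapping inside the buffer [B, B + L)."""
    i += m
    if i >= b + l:      # ran past the end of the buffer: wrap to the start
        i -= l
    elif i < b:         # ran before the start of the buffer: wrap to the end
        i += l
    return i


base, length = 100, 4    # hypothetical buffer occupying addresses 100..103
index = base
forward = []
for _ in range(5):
    index = circular_update(index, 1, length, base)
    forward.append(index)

backward = circular_update(base, -1, length, base)   # stepping back from the base
```

Because the wrap is part of the address update, the program never spends an instruction testing for the end of the buffer.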

36

ADSP-BF535 BLACKfin Processor Architecture

Great Performance Value
• Highest Frequency (350 MHz)
• 1.0V to 1.6V
• 260 PBGA

High System Integration
• Address range 768 Mbytes
• SPORTs support 8 Channels of I2S Audio
• (532 Mbps) I/O Bandwidth, DMA Bandwidth & Memory Bandwidth
• Microcontroller features include WDT, PCI, USB 1.1, SDRAM controller

[Block diagram. BLACKfin Processor Core (to 350 MHz). Memory: 308 Kbytes On-Chip SRAM (labels also show 264 Kbytes On-Chip SRAM and 48 Kbytes On-Chip Cache). System Peripherals and User Peripherals: SDRAM and FLASH/SRAM Interfaces, Real Time Clock, Watchdog, JTAG, DMA, PLL, Dynamic Power Management, PCI, USB 1.1, SPORTs (2), SPI (2), UART (2), Timers (3, 32-bit), GPIO (16).]

www.analog.com/dsp

42-50

Seminars about Blackfin