Notes for Digital Signal Processor

8/6/2019 Notes for Digital Signal Processor


Cliff Notes for Digital Signal Processor

C54x

1. With a neat diagram explain the important features of TMS320C54x Processors?



Overview

The C54x DSP has a high degree of operational flexibility and speed. It combines an advanced modified Harvard architecture (with one program memory bus, three data memory buses, and four address buses), a CPU with application-specific hardware logic, on-chip memory, on-chip peripherals, and a highly specialized instruction set.

The C54x devices offer these advantages:

Enhanced Harvard architecture built around one program bus, three data buses, and four address buses for increased performance and versatility

Advanced CPU design with a high degree of parallelism and application-specific hardware logic for increased performance

A highly specialized instruction set for faster algorithms and for optimized high-level language operation

Modular architecture design for fast development of spinoff devices

Advanced IC processing technology for increased performance and low power consumption

Low power consumption and increased radiation hardness because of new static design techniques

Key Features

This section lists the key features of the C54x DSPs.

Key Features - CPU

Advanced multibus architecture with one program bus, three data buses, and four address buses

40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators

17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for nonpipelined single-cycle multiply/accumulate (MAC) operation

Compare, select, store unit (CSSU) for the add/compare selection of the Viterbi operator

Exponent encoder to compute the exponent of a 40-bit accumulator value in a single cycle

Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units

Multiple-CPU/core architecture on some devices

Key Features - Memory

192K words × 16-bit addressable memory space (64K words program, 64K words data, and 64K words I/O), with extended program memory in the C548, C549, C5402, C5410, and C5420

Key Features - Instruction set

Single-instruction repeat and block repeat operations

Block memory move instructions for better program and data management

Instructions with a 32-bit long operand

Instructions with 2- or 3-operand simultaneous reads

Arithmetic instructions with parallel store and parallel load

Conditional-store instructions

Fast return from interrupt

Key Features - On-chip peripherals

Software-programmable wait-state generator

Programmable bank-switching logic

On-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source

External bus-off control to disable the external data bus, address bus, and control signals

Data bus with a bus holder feature

Programmable timer

Available ports: HPI (Host Port Interface), synchronous serial ports, Buffered Serial Port, Multichannel Buffered Serial Port, TDM (Time Division Multiplexed) serial port

Speed supported: 25/20/15/12.5/10 ns execution time for a single-cycle fixed-point instruction (40 MIPS/50 MIPS/66 MIPS/80 MIPS/100 MIPS)
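The MIPS figures above follow directly from the single-cycle execution time. A minimal Python sketch of the conversion, assuming one instruction completes per cycle (the function name `mips_from_cycle_ns` is illustrative and not taken from the TI documentation):

```python
def mips_from_cycle_ns(cycle_ns: float) -> float:
    """Return millions of instructions per second for a given cycle time.

    Assumes one single-cycle instruction completes every cycle.
    """
    # 1 / (cycle_ns * 1e-9) instructions per second, divided by 1e6 for MIPS.
    return 1000.0 / cycle_ns


for ns in (25, 20, 15, 12.5, 10):
    print(f"{ns} ns -> {mips_from_cycle_ns(ns):.1f} MIPS")
```

A 15 ns cycle works out to roughly 66.7 MIPS, which the list quotes as 66 MIPS.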

Key Features - Power

Power consumption control with IDLE 1, IDLE 2, and IDLE 3 instructions for power-down modes

Control to disable the CLKOUT signal



Key Features - Emulation

IEEE Standard 1149.1 boundary scan logic interfaced to on-chip scan-based emulation logic

Reference: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1



2. With a neat diagram explain the bus-architecture of TMS320C54x processors?

The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces.



The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):

The program bus (PB) carries the instruction code and immediate operands from program memory.

Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.

o The CB and DB carry the operands that are read from data memory.

o The EB carries the data to be written to memory.

Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.

The C54x DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1).

The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction.

The C54x DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral's structure.
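The FIRS instruction mentioned above targets symmetric FIR filters, where the coefficient table is mirrored around its centre, so two data samples can be added before a single multiply by a coefficient read from program space. The sketch below illustrates only that arithmetic in Python; it does not model the instruction's encoding, accumulator usage, or timing, and the names `symmetric_fir_output` and `half_coeffs` are made up for this example.

```python
def symmetric_fir_output(samples, half_coeffs):
    """Compute one output of an even-length symmetric FIR filter.

    samples     : the most recent 2 * len(half_coeffs) input samples
    half_coeffs : the first half of the coefficient table; the second
                  half is assumed to be its mirror image
    """
    n = len(samples)
    if n != 2 * len(half_coeffs):
        raise ValueError("need exactly two samples per stored coefficient")
    acc = 0
    for k, h in enumerate(half_coeffs):
        # Two data reads (the mirrored sample pair) plus one coefficient
        # read: the three operands the text refers to.
        acc += (samples[k] + samples[n - 1 - k]) * h
    return acc


def direct_fir_output(samples, coeffs):
    """Reference: plain multiply-accumulate over the full coefficient table."""
    return sum(x * h for x, h in zip(samples, coeffs))
```

Exploiting the symmetry halves the number of multiplies while giving the same result as the direct form.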



Reference: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1



3. With a neat diagram explain the architecture of TMS320C54x processors?

The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. Also, the C54x DSP includes the control mechanisms to manage interrupts, repeated operations, and function calling.

Bus Structure:

The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):

The program bus (PB) carries the instruction code and immediate operands from program memory.

Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.

o The CB and DB carry the operands that are read from data memory.

o The EB carries the data to be written to memory.

Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.

Internal Memory Organization:

The C54x DSP memory is organized into three individually selectable spaces: program, data, and I/O space. The C54x devices can contain random access memory (RAM) and read-only memory (ROM). Among the devices, the following types of RAM are represented: dual-access RAM (DARAM), single-access RAM (SARAM), and two-way shared RAM. The DARAM or SARAM can be shared within subsystems of a multiple-CPU core device. You can configure the DARAM and SARAM as data memory or program/data memory. The C54x DSP also has 26 CPU registers plus peripheral registers that are mapped in data-memory space.

Central Processing Unit:

The CPU is common to all C54x devices. The C54x CPU contains:

40-bit arithmetic logic unit (ALU)

Two 40-bit accumulators

Barrel shifter

17 × 17-bit multiplier

40-bit adder

Compare, select, and store unit (CSSU)

Data address generation unit

Program address generation unit
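To see what the 40-bit accumulator width buys, the following Python sketch accumulates 16-bit × 16-bit signed products into a 40-bit two's-complement value. It is an arithmetic illustration only: it assumes signed 16-bit operands and wraparound on overflow (no saturation), and the helper names `to_signed` and `mac` are hypothetical.

```python
ACC_BITS = 40  # accumulator width stated in the text


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc: int, x: int, y: int) -> int:
    """Multiply two signed 16-bit operands and add the product to acc.

    The result wraps to 40 bits (assumption: overflow saturation disabled).
    """
    product = to_signed(x, 16) * to_signed(y, 16)  # fits in 32 bits
    return to_signed(acc + product, ACC_BITS)


acc = 0
for _ in range(256):
    acc = mac(acc, 0x7FFF, 0x7FFF)  # largest positive 16-bit product
print(acc, acc > 2**31 - 1)
```

The sum of 256 maximum products no longer fits in 32 bits but still fits in 40, which is the headroom the extra eight bits provide.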



Data Addressing:

The C54x DSP offers seven basic data addressing modes:

Immediate addressing uses the instruction to encode a fixed value.

Absolute addressing uses the instruction to encode a fixed address.

Accumulator addressing uses accumulator A to access a location in program memory as data.

Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.

Indirect addressing uses the auxiliary registers to access memory.

Memory-mapped register addressing uses the memory-mapped registers without modifying either the current DP value or the current SP value.

Stack addressing manages adding and removing items from the system stack.

During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.

Pipeline Operation

An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction. The C54x DSP pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an independent operation occurs. Because these operations are independent, from one to six instructions can be active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six stages. When a PC discontinuity occurs, such as during a branch, call, or return, one or more stages of the pipeline may be temporarily unused.

On-chip Peripherals

All the C54x devices have a common CPU, but different on-chip peripherals are connected to their CPUs.

The C54x devices may have these, or other, on-chip peripheral options:

General-purpose I/O pins

Software-programmable wait-state generator

Programmable bank-switching logic

Clock generator

Timer

Direct memory access (DMA) controller

Standard serial port

Time-division multiplexed (TDM) serial port

Buffered serial port (BSP)

Multichannel buffered serial port (McBSP)

Host-port interface

o 8-bit standard (HPI)

o 8-bit enhanced (HPI8)

o 16-bit enhanced (HPI16)

External Bus Interface

The interface's external ready input signal and software-generated wait states allow the processor to interface with memory and I/O devices of many different speeds. The interface's hold modes allow an external device to take control of the C54x DSP buses; in this way, an external device can access the resources in the program, data, and I/O spaces.

IEEE Standard 1149.1 Scanning Logic

The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-pin continuity as well as to perform operational tests on devices peripheral to the C54x DSP. The IEEE Standard 1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of the on-chip resources. Thus, the C54x DSP can perform on-board emulation using the IEEE Standard 1149.1 serial scan pins and the emulation-dedicated pins.



4. Explain the pipeline stages and phases of any one of the DSPs?

What is meant by pipelining? Describe briefly the pipeline operation of TMS320C54x processors?

Pipelined processors are organized internally into stages that can work semi-independently on separate jobs. The stages are linked into a 'chain', so each stage's output is fed to the next stage until the job is done. This organization allows the overall processing time to be significantly reduced.

A deeper pipeline means that there are more stages in the pipeline and therefore fewer logic gates in each stage. This generally means that the processor's frequency can be increased, because the cycle time is lowered: with fewer components in each stage, the propagation delay of each stage is decreased.

Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
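As a rough illustration of the time saving, an ideal fully pipelined processor with k stages finishes n instructions in k + n - 1 cycles, against n × k cycles when each instruction must finish before the next one starts. The sketch below assumes one cycle per stage; the function names and the `wait_cycles` parameter are illustrative only.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int, wait_cycles: int = 0) -> int:
    """Ideal pipeline: fill once, then retire one instruction per cycle.

    wait_cycles models a pipeline that is not fully pipelined by adding
    the total number of cycles lost to waits.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + wait_cycles


n, k = 100, 6
print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
```

For long instruction sequences the ratio approaches the number of stages, which is why wait cycles matter: every lost cycle is added directly to the total.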

Advantages of Pipelining

The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.

Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

Disadvantages of Pipelining

A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.

The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip-flops must be added to the data path of a pipelined processor.

A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

C54x Pipeline

The C54x DSP has a six-level deep instruction pipeline. The six stages of the pipeline are independent of

each other, which allows overlapping execution of instructions. During any given cycle, from one to six

different instructions can be active, each at a different stage of completion.



The six levels and functions of the pipeline structure are:

Program prefetch: The program address bus (PAB) is loaded with the address of the next instruction to be fetched.

Program fetch: An instruction word is fetched from the program bus (PB) and loaded into the instruction register (IR). This completes an instruction fetch sequence that consists of this and the previous cycle.

Decode: The contents of the instruction register (IR) are decoded to determine the type of memory access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.

Access: DAGEN outputs the read operand's address on the data address bus, DAB. If a second operand is required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary registers in indirect addressing mode and the stack pointer (SP) are also updated. This is considered the first stage of the two-stage operand read sequence.

Read: The read data operand(s), if any, are read from the data buses, DB and CB. This completes the two-stage operand read sequence. At the same time, the two-stage operand write sequence begins. The data address of the write operand, if any, is loaded into the data write address bus (EAB). For memory-mapped registers, the read data operand is read from memory and written into the selected memory-mapped registers using the DB.

Execute: The operand write sequence is completed by writing the data using the data write bus (EB). The instruction is executed in this phase.

The first two stages of the pipeline, prefetch and fetch, are the instruction fetch sequence. In one cycle,

the address of a new instruction is loaded. In the following cycle, an instruction word is read. In case of

multiword instructions, several such instruction fetch sequences are needed.



During the third stage of the pipeline, decode, the fetched instruction is decoded so that appropriate

control sequences are activated for proper execution of the instruction.

The next two pipeline stages, access and read, are an operand read sequence. If required by the instruction, the data addresses of one or two operands are loaded in the access phase, and the operand or operands are read in the following read phase.

Any write operation is spread over two stages of the pipeline, the read and execute stages. During the

read phase, the data address of the write operand is loaded onto EAB. In the following cycle, the

operand is written to memory using EB.

Each memory access is performed in two phases by the C54x DSP pipeline. In the first phase, an address

bus is loaded with the memory address. In the second phase, a corresponding data bus reads from or

writes to that memory address.
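The overlap of the six stages can be visualised with a small Python sketch. It assumes an idealised flow in which instruction i enters prefetch at cycle i and advances one stage per cycle, with no stalls and no PC discontinuities; the function name `pipeline_snapshot` is hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_snapshot(cycle: int, n_instructions: int) -> dict:
    """Map each active instruction index to its stage name at `cycle`.

    Assumes instruction i enters prefetch at cycle i and advances one
    stage per cycle, with no stalls or PC discontinuities.
    """
    snapshot = {}
    for i in range(n_instructions):
        stage_index = cycle - i
        if 0 <= stage_index < len(STAGES):
            snapshot[i] = STAGES[stage_index]
    return snapshot


for cycle in range(8):
    print(cycle, pipeline_snapshot(cycle, n_instructions=8))
```

From the sixth cycle onward the snapshot holds six instructions, one per stage, which matches the "from one to six instructions active" description.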



5. Explain briefly all the different addressing modes of C54x Processor?

Data addressing

The TMS320C54x DSP offers seven basic addressing modes:

Immediate addressing uses the instruction to encode a fixed value.

Absolute addressing uses the instruction to encode a fixed address.

Accumulator addressing uses an accumulator to access a location in program memory as data.

Direct addressing uses seven bits of the instruction to encode an offset relative to DP or to SP. The offset plus DP or SP determine the actual address in data memory.

Indirect addressing uses the auxiliary registers to access memory.

Memory-mapped register addressing modifies the memory-mapped registers without affecting either the current DP value or the current SP value.

Stack addressing manages adding and removing items from the system stack.

Data Addressing - Immediate Addressing

In immediate addressing, the instruction syntax contains the specific value of the operand. Two types of values can be encoded in an instruction:

Short immediate values can be 3, 5, 8, or 9 bits in length.

Long immediate values are always 16 bits in length.

Immediate values can be encoded in 1-word or 2-word instructions. The 3-, 5-, 8-, or 9-bit values are encoded into 1-word instructions; 16-bit values are encoded into 2-word instructions.

Data Addressing - Absolute Addressing

There are four types of absolute addressing:

Data-memory address (dmad) addressing

Program-memory address (pmad) addressing

Port address (PA) addressing

*(lk) addressing, which is used with all instructions that support the use of a single data-memory (Smem) operand

Data Addressing - Accumulator Addressing

Accumulator addressing uses the value in the accumulator as an address. This addressing mode is used

to address program memory as data.

Data Addressing - Direct Addressing

In direct addressing, the instruction contains the lower seven bits of the data-memory address (dma). The 7-bit dma is an address offset that is combined with a base address, taken from either the data-page pointer (DP) or the stack pointer (SP), to form a 16-bit data-memory address. Using this form of addressing, you can access any of 128 locations in random order without changing the DP or the SP.
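The address formation can be sketched as follows. This relies on behaviour described in TI's C54x CPU documentation that the notes above do not spell out, so treat it as an assumption: DP is a 9-bit field concatenated with the 7-bit offset, SP is a 16-bit base to which the offset is added, and a status bit (CPL) selects between the two. The function and parameter names are illustrative.

```python
def direct_address(dma: int, dp: int = 0, sp: int = 0, use_sp: bool = False) -> int:
    """Form a 16-bit data-memory address from a 7-bit instruction offset.

    dma    : 7-bit offset encoded in the instruction
    dp     : 9-bit data-page pointer (assumed width)
    sp     : 16-bit stack pointer
    use_sp : True selects SP-relative addressing (assumed CPL = 1)
    """
    if not 0 <= dma < 128:
        raise ValueError("dma must fit in seven bits")
    if use_sp:
        return (sp + dma) & 0xFFFF      # offset added to the stack pointer
    return ((dp & 0x1FF) << 7) | dma    # page number concatenated with offset


print(hex(direct_address(0x15, dp=3)))
print(hex(direct_address(0x15, sp=0x2000, use_sp=True)))
```

With a fixed DP, the 128 possible offsets select the 128 words of one data page, which is the "128 locations" figure quoted above.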

Data Addressing - Indirect Addressing

In indirect addressing, any location in the 64K-word data space can be accessed using the 16-bit address contained in an auxiliary register. The C54x DSP has eight 16-bit auxiliary registers (AR0-AR7). Indirect addressing is used mainly when there is a need to step through sequential locations in memory in fixed-size steps.

Data Addressing - Memory-Mapped Register Addressing

Memory-mapped register addressing is used to modify the memory-mapped registers without affecting either the current data-page pointer (DP) value or the current stack-pointer (SP) value. Because DP and SP do not need to be modified in this mode, the overhead for writing to a register is minimal. Memory-mapped register addressing works for both direct and indirect addressing.

Data Addressing - Stack Addressing

The system stack is used to automatically store the program counter during interrupts and subroutines. It can also be used at your discretion to store additional items of context or to pass data values. The stack is filled from the highest to the lowest memory address. The processor uses a 16-bit memory-mapped register, the stack pointer (SP), to address the stack. SP always points to the last element stored onto the stack.
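A toy Python model makes the direction of growth and the meaning of SP concrete. The class below is a sketch under stated assumptions (a small word-addressed memory, and an empty stack represented by SP sitting one past the highest address); the names are hypothetical and nothing here models the actual C54x push and pop instructions.

```python
class Stack:
    """Toy model of a stack that fills from high to low addresses."""

    def __init__(self, memory_size: int = 16):
        self.memory = [0] * memory_size
        # Assumption for this sketch: an empty stack has SP one past the
        # top of the reserved region, so the first push lands on the
        # highest address.
        self.sp = memory_size

    def push(self, value: int) -> None:
        if self.sp == 0:
            raise OverflowError("stack region exhausted")
        self.sp -= 1                      # pre-decrement toward lower addresses
        self.memory[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        if self.sp == len(self.memory):
            raise IndexError("pop from empty stack")
        value = self.memory[self.sp]      # SP points at the last element stored
        self.sp += 1                      # post-increment back toward higher addresses
        return value
```

After each push, `memory[sp]` holds the value just stored, mirroring the statement that SP always points to the last element stored.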

Program Memory Addressing

The following program control operations affect the value loaded in the PC:

Branches

Calls

Returns

Conditional operations

Repeats of an instruction or a block of instructions

Hardware reset

Interrupts



6. Explain with a neat diagram the architecture of 6x series of processors?

The C6000 devices execute up to eight 32-bit instructions per cycle. The C674x CPU consists of 64 general-purpose 32-bit registers and eight functional units. These eight functional units contain:

Two multipliers

Six ALUs

Features of the C6000 devices

Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units

o Executes up to eight instructions per cycle for up to ten times the performance of typical DSPs

o Allows designers to develop highly effective RISC-like code for fast development time

Instruction packing

o Gives code size equivalence for eight instructions executed serially or in parallel

o Reduces code size, program fetches, and power consumption

Conditional execution of most instructions

o Reduces costly branching

o Increases parallelism for higher sustained performance

Efficient code execution on independent functional units

8/16/32-bit data support, providing efficient memory support for a variety of applications

40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications

Saturation and normalization provide support for key arithmetic operations

Field manipulation and instruction extract, set, clear, and bit counting support common operations found in control and data manipulation applications

The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use advanced VLIW to achieve high performance through increased instruction-level parallelism. A traditional VLIW architecture consists of multiple execution units running in parallel, performing multiple instructions during a single clock cycle. Parallelism is the key to extremely high performance, taking these DSPs well beyond the performance capabilities of traditional superscalar designs. VelociTI is a highly deterministic architecture, having few restrictions on how or when instructions are fetched, executed, or stored. It is this architectural flexibility that is key to the breakthrough efficiency levels of the TMS320C6000 Optimizing compiler.

The C674x CPU contains:

Program fetch unit

16/32-bit instruction dispatch unit, advanced instruction packing

Instruction decode unit

Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)

Two load-from-memory data paths (LD1 and LD2)

Two store-to-memory data paths (ST1 and ST2)

Two data address paths (DA1 and DA2)

Two register file data cross paths (1X and 2X)

Two general-purpose register files (A and B)

Control registers

Control logic

Test, emulation, and interrupt logic

Internal DMA (IDMA) for transfers between internal memories



The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit

instructions to the functional units every CPU clock cycle. The processing of instructions occurs in each

of the two data paths (A and B), each of which contains four functional units (.L, .S, .M, and .D) and 32

32-bit general-purpose registers.

General-Purpose Register Files

There are two general-purpose register files (A and B) in the CPU data paths. Each of these files contains 32 32-bit registers (A0-A31 for file A and B0-B31 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.

Functional Units

The eight functional units in the C6000 data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. Each functional unit has its own 32-bit write port into a general-purpose register file, so all eight units can be used in parallel every cycle. All units ending in 1 (for example, .L1) write to register file A, and all units ending in 2 write to register file B. Each functional unit has two 32-bit read ports for source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads. Since each DSP multiplier can return up to a 64-bit result, an extra write port has been added from the multipliers to the register file.

Register File Cross Paths

Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to register file B. The register files are connected to the opposite-side register file's functional units via the 1X and 2X cross paths. These cross paths allow functional units from one data path to access a 32-bit operand from the opposite-side register file. The 1X cross path allows the functional units of data path A to read their source from register file B, and the 2X cross path allows the functional units of data path B to read their source from register file A.
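The routing rule in this paragraph can be written down as a small lookup. The Python below is only a sketch of the naming convention described above (unit suffix 1 means side A, suffix 2 means side B); the function names are hypothetical, and it ignores any restrictions on which units or operands may actually use a cross path.

```python
def register_file_for_unit(unit: str) -> str:
    """Return the register file ('A' or 'B') on the unit's own data path."""
    return "A" if unit.endswith("1") else "B"


def needs_cross_path(unit: str, source_register: str):
    """Return the cross path ('1X' or '2X') needed to read source_register,
    or None when the register is in the unit's own file."""
    own_file = register_file_for_unit(unit)
    if source_register[0].upper() == own_file:
        return None
    # Side A units reach file B over 1X; side B units reach file A over 2X.
    return "1X" if own_file == "A" else "2X"


print(needs_cross_path(".L1", "B5"), needs_cross_path(".M2", "A3"))
```

A unit reading a register from its own side needs no cross path, so the function returns None in that case.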

Memory, Load, and Store Paths

The DSP supports double word loads and stores. There are four 32-bit paths for loading data from

memory to the register file. For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path

for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32

MSBs. There are also four 32-bit paths for storing register values to memory from each register file. For

side A, ST1a is the write path for the 32 LSBs and ST1b is the write path for the 32 MSBs. For side B, ST2a

is the write path for the 32 LSBs and ST2b is the write path for the 32 MSBs.

Data Address Paths

The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows

data addresses generated by any one path to access data to or from any register. The DA1 and DA2

resources and their associated data paths are specified as T1 and T2, respectively. T1 consists of the DA1

address path and the LD1 and ST1 data paths. For the DSP, LD1 is comprised of LD1a and LD1b to



support 64-bit loads; ST1 is comprised of ST1a and ST1b to support 64-bit stores. Similarly, T2 consists of

the DA2 address path and the LD2 and ST2 data paths. For the DSP, LD2 is comprised of LD2a and LD2b

to support 64-bit loads; ST2 is comprised of ST2a and ST2b to support 64-bit stores.



7. What is pipelining? Explain the pipeline stages of TMS320C6x Processors?

Pipelined processors are organized internally into stages that can work semi-independently on separate jobs. The stages are linked into a 'chain', so each stage's output is fed to the next stage until the job is done. This organization allows the overall processing time to be significantly reduced.

A deeper pipeline means that there are more stages in the pipeline and therefore fewer logic gates in each stage. This generally means that the processor's frequency can be increased, because the cycle time is lowered: with fewer components in each stage, the propagation delay of each stage is decreased.

Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is

said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully

pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining

The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.

Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

Disadvantages of Pipelining

A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.

The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip-flops must be added to the data path of a pipelined processor.

A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

Highlights of C6000 Pipeline

The pipeline can dispatch eight parallel instructions every cycle.

Parallel instructions proceed simultaneously through each pipeline phase.

Serial instructions proceed through the pipeline with a fixed relative phase difference between instructions.

Load and store addresses appear on the CPU boundary during the same pipeline phase, eliminating read-after-write memory conflicts.



Pipeline Operation Overview

The pipeline phases are divided into three stages:

Fetch

Decode

Execute

All instructions in the DSP instruction set flow through the fetch, decode, and execute stages of the

pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has

two phases for all instructions. The execute stage of the pipeline requires a varying number of phases,

depending on the type of instruction.

Fetch Phase

The fetch phases of the pipeline are:

PG: Program address generate

PS: Program address send

PW: Program access ready wait

PR: Program fetch packet receive

The DSP uses a fetch packet (FP) of eight words. All eight of the words proceed through fetch processing

together, through the PG, PS, PW, and PR phases.

During the PG phase, the program address is generated in the CPU. In the PS phase, the program

address is sent to memory. In the PW phase, a memory read occurs. Finally, in the PR phase, the fetch

packet is received at the CPU.

Decode Phase

The decode phases of the pipeline are:

DP: Instruction dispatch

DC: Instruction decode

In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist

of one instruction or from two to eight parallel instructions. During the DP phase, the instructions in an

execute packet are assigned to the appropriate functional units. In the DC phase, the source registers,

destination registers, and associated paths are decoded for the execution of the instructions in the

functional units.
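The split of a fetch packet into execute packets can be sketched in Python. The notes above do not describe how parallelism is encoded, so the sketch assumes the convention documented in TI's C6000 CPU guides: bit 0 of each 32-bit instruction word (the p-bit) is 1 when the following instruction executes in parallel with it. Compact 16-bit instructions and fetch-packet headers are ignored, and the function name is hypothetical.

```python
def split_into_execute_packets(fetch_packet):
    """Group the eight instruction words of a fetch packet into execute packets.

    Assumption: bit 0 of a word is the p-bit; 1 means the following word
    belongs to the same execute packet.
    """
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds eight instruction words")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:          # p-bit clear: this word ends the packet
            packets.append(current)
            current = []
    if current:                    # last p-bit set: close the packet here
        packets.append(current)
    return packets


words = [0x11, 0x21, 0x30, 0x40, 0x51, 0x60, 0x71, 0x80]
print([len(p) for p in split_into_execute_packets(words)])
```

A fully serial fetch packet yields eight one-instruction execute packets, while a fully parallel one yields a single packet of eight.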

Execution Phase

The execute portion of the pipeline is subdivided into five phases (E1-E5). Different types of instructions

require different numbers of these phases to complete their execution.



8. With a neat diagram explain the core architecture of ADSP 21xx DSP?

The ADSP-21xx has the following architectural features:

Computation units: multiplier, ALU, shifter, and data register file

Program sequencer with related instruction cache, interval timer, and Data Address Generators (DAG1 and DAG2)

Dual-blocked SRAM

External ports for interfacing to off-chip memory, peripherals, and hosts

Input/Output (I/O) processor with integrated DMA controllers, serial ports (SPORTs), serial peripheral interface (SPI) ports, and a UART port

JTAG Test Access Port for board test and emulation

ADSP 21xx Bus

The ADSP-21xx has three on-chip buses: the PM bus, the DM bus, and the DMA bus. The PM bus provides access to either instructions or data. During a single cycle, these buses let the processor access two data operands (one from PM and one from DM) and an instruction (from the cache).



How ADSP addresses DSP requirements

Fast, flexible arithmetic computation units

o The ADSP-219x family DSPs execute all computational instructions in a single cycle. They provide both fast cycle times and a complete set of arithmetic operations.

Unconstrained data flow to and from the computation units

o The ADSP-219x has a modified Harvard architecture combined with a data register file. In every cycle, the DSP can read two values from memory or write one value to memory, complete one computation, and write up to three values back to the register file.

Extended precision and dynamic range in the computation units

o 40-Bit Extended Precision. The DSP handles 16-bit integer and fractional formats (twos-complement and unsigned). The processors carry extended precision through result registers in their computation units, limiting intermediate data truncation errors.

Dual address generators with circular buffering support

o Dual Address Generators. The DSP has two data address generators (DAGs) that provide immediate or indirect (pre- and post-modify) addressing. Modulus and bit-reverse operations are supported with memory page constraints on data buffer placement only.

Efficient program sequencing

o Efficient Program Sequencing. In addition to zero-overhead loops, the DSP supports quick setup and exit for loops. Loops are both nestable (eight levels in hardware) and interruptible. The processors support both delayed and non-delayed branches.
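Circular buffering, mentioned above under the dual address generators, amounts to an address update that wraps inside a buffer. The Python below models only that modulus arithmetic, with hypothetical parameter names (`base`, `length`, `modify`); it does not reproduce the ADSP register set or its buffer-placement constraints.

```python
def circular_next(index: int, modify: int, base: int, length: int) -> int:
    """Post-modify an address and wrap it into [base, base + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return base + (index - base + modify) % length


def walk(base: int, length: int, modify: int, steps: int):
    """Return the addresses visited by `steps` post-modified accesses."""
    addresses, index = [], base
    for _ in range(steps):
        addresses.append(index)            # access first, then modify
        index = circular_next(index, modify, base, length)
    return addresses


print(walk(base=0x100, length=5, modify=2, steps=7))
```

Doing the wrap in the address generator is what lets a filter loop step through a delay line without any explicit bounds check in the instruction stream.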
