Notes for Digital Signal Processor

  • 8/6/2019 Notes for Digital Signal Processor

    1/25

    Cliff Notes for Digital Signal Processor

    C54x

    1. With a neat diagram explain the important features of TMS320C54x Processors?

    Overview

    The C54x DSP has a high degree of operational flexibility and speed. It combines an advanced modified

    Harvard architecture (with one program memory bus, three data memory buses, and four address

    buses), a CPU with application-specific hardware logic, on-chip memory, on-chip peripherals, and a

    highly specialized instruction set.

    The C54x devices offer these advantages:

    - Enhanced Harvard architecture built around one program bus, three data buses, and four address buses for increased performance and versatility
    - Advanced CPU design with a high degree of parallelism and application-specific hardware logic for increased performance
    - A highly specialized instruction set for faster algorithms and for optimized high-level language operation
    - Modular architecture design for fast development of spinoff devices
    - Advanced IC processing technology for increased performance and low power consumption
    - Low power consumption and increased radiation hardness because of new static design techniques

    Key Features

    This section lists the key features of the C54x DSPs.

    Key Features - CPU

    - Advanced multibus architecture with one program bus, three data buses, and four address buses
    - 40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
    - 17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for nonpipelined single-cycle multiply/accumulate (MAC) operation
    - Compare, select, store unit (CSSU) for the add/compare selection of the Viterbi operator
    - Exponent encoder to compute the exponent of a 40-bit accumulator value in a single cycle
    - Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units
    - Multiple-CPU/core architecture on some devices

    Key Features - Memory

    - 192K words × 16-bit addressable memory space (64K-words program, 64K-words data, and 64K-words I/O), with extended program memory in the C548, C549, C5402, C5410, and C5420

    Key Features - Instruction set

    - Single-instruction repeat and block repeat operations
    - Block memory move instructions for better program and data management
    - Instructions with a 32-bit long operand
    - Instructions with 2- or 3-operand simultaneous reads
    - Arithmetic instructions with parallel store and parallel load
    - Conditional-store instructions
    - Fast return from interrupt

    Key Features - On-chip peripherals

    - Software-programmable wait-state generator
    - Programmable bank-switching logic
    - On-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source
    - External bus-off control to disable the external data bus, address bus, and control signals
    - Data bus with a bus holder feature
    - Programmable timer
    - Available ports: HPI (Host Port Interface), Synchronous Serial Ports, Buffered Serial Port, Multichannel Buffered Serial Port, TDM (Time Division Multiplexed Serial Port)
    - Speed supported: 25/20/15/12.5/10 ns execution time for a single-cycle fixed-point instruction (40 MIPS/50 MIPS/66 MIPS/80 MIPS/100 MIPS)
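
    The cycle times and MIPS figures in the speed entry are two views of the same number. A minimal sketch of the conversion, assuming one instruction completes per cycle (the function name is only illustrative):

```python
# Convert a single-cycle instruction time in nanoseconds to MIPS.
# Assumption: one instruction per cycle, so MIPS = 1000 / cycle_time_ns.

def cycle_ns_to_mips(cycle_ns: float) -> float:
    """Return millions of instructions per second for a given cycle time."""
    return 1000.0 / cycle_ns

for ns in (25, 20, 15, 12.5, 10):
    print(f"{ns:>5} ns -> {cycle_ns_to_mips(ns):6.1f} MIPS")
```

    The 15 ns case works out to about 66.7 MIPS, which the list quotes as 66 MIPS.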

    Key Features - Power

    - Power consumption control with IDLE 1, IDLE 2, and IDLE 3 instructions for power-down modes
    - Control to disable the CLKOUT signal

    Key Features - Emulation

    - IEEE Standard 1149.1 boundary scan logic interfaced to on-chip scan-based emulation logic

    References: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1

    2. With a neat diagram explain the bus-architecture of TMS320C54x processors?

    The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses.

    Separate program and data spaces allow simultaneous access to program instructions and data,

    providing a high degree of parallelism. For example, three reads and one write can be performed in a

    single cycle. Instructions with parallel store and application-specific instructions fully utilize this

    architecture. In addition, data can be transferred between data and program spaces.

    The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four

    address buses):

    - The program bus (PB) carries the instruction code and immediate operands from program memory.
    - Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
        o The CB and DB carry the operands that are read from data memory.
        o The EB carries the data to be written to memory.
    - Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
    - The C54x DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1).
    - The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction.
    - The C54x DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral's structure.
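
    The 3-operand capability mentioned for FIRS fits a symmetric FIR filter, where two data samples that share one coefficient are added first and then multiplied. The sketch below shows only that arithmetic in plain Python. It is not a model of the instruction's register behavior, and the function names are hypothetical.

```python
def symmetric_fir(samples, half_coeffs):
    """Compute one output of an even-length symmetric FIR filter.

    samples     -- the N most recent input samples (N even)
    half_coeffs -- the first N/2 coefficients; h[i] == h[N-1-i] is assumed
    """
    n = len(samples)
    assert n == 2 * len(half_coeffs), "expects an even-length symmetric filter"
    acc = 0
    for i, coeff in enumerate(half_coeffs):
        # Two data operands are read and added, then multiplied by one coefficient.
        acc += (samples[i] + samples[n - 1 - i]) * coeff
    return acc


def direct_fir(samples, coeffs):
    """Reference form: one multiply per tap, no folding."""
    return sum(x * h for x, h in zip(samples, coeffs))


samples = [1, 2, 3, 4, 5, 6]
half = [2, -1, 3]
full = half + half[::-1]          # mirror the half to get all six taps
print(symmetric_fir(samples, half), direct_fir(samples, full))
```

    Both forms give the same result, but the folded form needs half as many multiplies, which is why reading two data operands and one coefficient in a single cycle matters.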

    Reference: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1

    3. With a neat diagram explain the architecture of TMS320C54x processors?

    The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses.

    Separate program and data spaces allow simultaneous access to program instructions and data,

    providing a high degree of parallelism. For example, three reads and one write can be performed in a

    single cycle. Instructions with parallel store and application-specific instructions fully utilize this

    architecture. In addition, data can be transferred between data and program spaces. Such parallelism

    supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed

    in a single machine cycle. Also, the C54x DSP includes the control mechanisms to manage interrupts,

    repeated operations, and function calling.

    Bus Structure:

    The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four

    address buses):

    - The program bus (PB) carries the instruction code and immediate operands from program memory.
    - Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
        o The CB and DB carry the operands that are read from data memory.
        o The EB carries the data to be written to memory.
    - Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.

    Internal Memory Organization:

    The C54x DSP memory is organized into three individually selectable spaces: program, data, and I/O

    space. The C54x devices can contain random access memory (RAM) and read-only memory (ROM).

    Among the devices, the following types of RAM are represented: dual-access RAM (DARAM), single-

    access RAM (SARAM), and two-way shared RAM. The DARAM or SARAM can be shared within

    subsystems of a multiple-CPU core device. You can configure the DARAM and SARAM as data memory or program/data memory. The C54x DSP also has 26 CPU registers plus peripheral registers that are

    mapped in data-memory space.

    Central Processing Unit:

    The CPU is common to all C54x devices. The C54x CPU contains:

    - 40-bit arithmetic logic unit (ALU)
    - Two 40-bit accumulators
    - Barrel shifter
    - 17 × 17-bit multiplier
    - 40-bit adder
    - Compare, select, and store unit (CSSU)
    - Data address generation unit
    - Program address generation unit
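
    The multiplier, adder, and 40-bit accumulators work together in a multiply/accumulate (MAC) step. The following is a rough sketch of why the accumulator is wider than a 32-bit product. It assumes signed two's-complement values and simple wraparound at 40 bits, and the helper names are illustrative only.

```python
ACC_BITS = 40  # accumulator width listed above


def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into a signed two's-complement field of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc: int, x: int, y: int) -> int:
    """One multiply/accumulate step: acc + x*y, kept in a 40-bit accumulator."""
    return wrap_signed(acc + x * y, ACC_BITS)


acc = 0
for _ in range(4):
    acc = mac(acc, 32767, 32767)   # largest positive 16-bit operands

print(acc)
print(acc > 2**31 - 1)             # the sum no longer fits in 32 signed bits
```

    Four products of this size already exceed the signed 32-bit range, yet they fit easily within 40 bits, so a long sum of products can run without intermediate overflow.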

    Data Addressing:

    The C54x DSP offers seven basic data addressing modes:

    - Immediate addressing uses the instruction to encode a fixed value.
    - Absolute addressing uses the instruction to encode a fixed address.
    - Accumulator addressing uses accumulator A to access a location in program memory as data.
    - Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
    - Indirect addressing uses the auxiliary registers to access memory.
    - Memory-mapped register addressing uses the memory-mapped registers without modifying either the current DP value or the current SP value.
    - Stack addressing manages adding and removing items from the system stack.

    During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.

    Pipeline Operation

    An instruction pipeline consists of a sequence of operations that occur during the execution of an

    instruction. The C54x DSP pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an independent operation occurs. Because these operations are independent, from

    one to six instructions can be active in any given cycle, each instruction at a different stage of

    completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six

    stages. When a PC discontinuity occurs, such as during a branch, call, or return, one or more stages of

    the pipeline may be temporarily unused.

    On-chip Peripherals

    All the C54x devices have a common CPU, but different on-chip peripherals are connected to their CPUs.

    The C54x devices may have these, or other, on-chip peripheral options:

    - General-purpose I/O pins
    - Software-programmable wait-state generator
    - Programmable bank-switching logic
    - Clock generator
    - Timer
    - Direct memory access (DMA) controller
    - Standard serial port
    - Time-division multiplexed (TDM) serial port
    - Buffered serial port (BSP)
    - Multichannel buffered serial port (McBSP)
    - Host-port interface
        o 8-bit standard (HPI)
        o 8-bit enhanced (HPI8)
        o 16-bit enhanced (HPI16)

    External Bus Interface

    The interface's external ready input signal and software-generated wait states allow the processor to

    interface with memory and I/O devices of many different speeds. The interface's hold modes allow an

    external device to take control of the C54x DSP buses; in this way, an external device can access the

    resources in the program, data, and I/O spaces.

    IEEE Standard 1149.1 Scanning Logic

    The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This

    logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-

    pin continuity as well as to perform operational tests on devices peripheral to the C54x DSP. The IEEE

    Standard 1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of

    the on-chip resources. Thus, the C54x DSP can perform on-board emulation using the IEEE Standard

    1149.1 serial scan pins and the emulation-dedicated pins.

    4. Explain the pipeline stages and phases of any of the DSP?

    What is meant by pipelining? Describe briefly the pipeline operation of TMS320C54x

    processors?

    Processors with pipelining are organized inside into stages which can semi-independently work on

    separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another

    stage until the job is done. This organization of the processor allows overall processing time to be

    significantly reduced.

    A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in

    each stage. This generally means that the processor's frequency can be increased as the cycle time is

    lowered. This happens because there are fewer components in each stage of the pipeline, so the

    propagation delay is decreased for the overall stage.

    Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is

    said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
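
    The reduction in overall processing time can be estimated with a simple cycle count. This sketch assumes an idealized, fully pipelined machine with no wait cycles, where every stage takes one cycle. Real programs with branches or stalls will do worse.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Fill the pipeline once, then one instruction completes per cycle."""
    return n_stages + n_instructions - 1


n, k = 100, 6
print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
print(round(cycles_unpipelined(n, k) / cycles_pipelined(n, k), 2))
```

    For 100 instructions and six stages, the ideal speedup is a little under the stage count, because the first instruction still needs all six cycles to fill the pipeline.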

    Advantages of Pipelining

    - The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.
    - Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

    Disadvantages of Pipelining

    - A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
    - The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.
    - A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

    C54x Pipeline

    The C54x DSP has a six-level deep instruction pipeline. The six stages of the pipeline are independent of

    each other, which allows overlapping execution of instructions. During any given cycle, from one to six

    different instructions can be active, each at a different stage of completion.

    The six levels and functions of the pipeline structure are:

    Program prefetch: Program address bus (PAB) is loaded with the address of the next instruction to be

    fetched.

    Program fetch: An instruction word is fetched from the program bus (PB) and loaded into the

    instruction register (IR). This completes an instruction fetch sequence that consists of this and the

    previous cycle.

    Decode: The contents of the instruction register (IR) are decoded to determine the type of memory

    access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.

    Access: DAGEN outputs the read operand's address on the data address bus, DAB. If a second operand is

    required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary registers

    in indirect addressing mode and the stack pointer (SP) are also updated. This is considered the first of

    the 2-stage operand read sequence.

    Read: The read data operand(s), if any, are read from the data buses, DB and CB. This completes the two-stage operand read sequence. At the same time, the two-stage operand write sequence begins. The

    data address of the write operand, if any, is loaded into the data write address bus (EAB). For memory-

    mapped registers, the read data operand is read from memory and written into the selected memory-

    mapped registers using the DB.

    Execute: The operand write sequence is completed by writing the data using the data write bus (EB).

    The instruction is executed in this phase.

    The first two stages of the pipeline, prefetch and fetch, are the instruction fetch sequence. In one cycle,

    the address of a new instruction is loaded. In the following cycle, an instruction word is read. In case of

    multiword instructions, several such instruction fetch sequences are needed.

    During the third stage of the pipeline, decode, the fetched instruction is decoded so that appropriate

    control sequences are activated for proper execution of the instruction.

    The next two pipeline stages, access and read, are an operand read sequence. If required by the

    instruction, the data addresses of one or two operands are loaded in the access phase and the operand or

    operands are read in the following read phase.

    Any write operation is spread over two stages of the pipeline, the read and execute stages. During the

    read phase, the data address of the write operand is loaded onto EAB. In the following cycle, the

    operand is written to memory using EB.

    Each memory access is performed in two phases by the C54x DSP pipeline. In the first phase, an address

    bus is loaded with the memory address. In the second phase, a corresponding data bus reads from or

    writes to that memory address.
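
    The overlap of the six stages can be pictured as a table with one row per cycle. The sketch below prints such a table for a straight-line sequence of instructions. It assumes no PC discontinuities or stalls, and the stage names follow the list above.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_table(n_instructions: int):
    """Return, per cycle, a dict mapping stage name -> instruction index."""
    total_cycles = n_instructions + len(STAGES) - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth          # instruction occupying this stage
            if 0 <= instr < n_instructions:
                row[stage] = instr
        table.append(row)
    return table


for cycle, row in enumerate(pipeline_table(8)):
    cells = [f"I{row[s]}" if s in row else "--" for s in STAGES]
    print(f"cycle {cycle:2d}: " + " ".join(f"{c:>3}" for c in cells))
```

    From cycle 5 onward every stage is busy with a different instruction, which matches the statement that up to six instructions can be active in a given cycle.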

    5. Explain briefly all the different addressing modes of C54x Processor?

    Data addressing

    The TMS320C54x DSP offers seven basic addressing modes:

    - Immediate addressing uses the instruction to encode a fixed value.
    - Absolute addressing uses the instruction to encode a fixed address.
    - Accumulator addressing uses an accumulator to access a location in program memory as data.
    - Direct addressing uses seven bits of the instruction to encode an offset relative to DP or to SP. The offset plus DP or SP determine the actual address in data memory.
    - Indirect addressing uses the auxiliary registers to access memory.
    - Memory-mapped register addressing modifies the memory-mapped registers without affecting either the current DP value or the current SP value.
    - Stack addressing manages adding and removing items from the system stack.

    Data Addressing - Immediate Addressing

    In immediate addressing, the instruction syntax contains the specific value of the operand. Two types of

    values can be encoded in an instruction:

    - Short immediate values can be 3, 5, 8, or 9 bits in length.
    - Long immediate values are always 16 bits in length.

    Immediate values can be encoded in 1-word or 2-word instructions. The 3-, 5-, 8-, or 9-bit values are

    encoded into 1-word instructions; 16-bit values are encoded into 2-word instructions.

    Data Addressing - Absolute Addressing

    There are four types of absolute addressing:

    - Data-memory address (dmad) addressing
    - Program-memory address (pmad) addressing
    - Port address (PA) addressing
    - *(lk) addressing, which is used with all instructions that support the use of a single data-memory (Smem) operand

    Data Addressing - Accumulator Addressing

    Accumulator addressing uses the value in the accumulator as an address. This addressing mode is used

    to address program memory as data.

    Data Addressing - Direct Addressing

    In direct addressing, the instruction contains the lower seven bits of the data-memory address (dma).

    The 7-bit dma is an address offset that is combined with a base address, either the data-page pointer

    (DP) or the stack pointer (SP), to form a 16-bit data-memory address. Using this form of addressing,

    you can access any of 128 locations in random order without changing the DP or the SP.
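
    A small sketch of how the 16-bit address might be formed from the 7-bit offset. Two details here are not stated in these notes and are labeled as assumptions: that DP is a 9-bit value concatenated above the offset, and that a mode bit (called `cpl` here) selects SP-relative addressing, where the offset is added to SP.

```python
def direct_address(dma: int, dp: int = 0, sp: int = 0, cpl: int = 0) -> int:
    """Form a 16-bit data-memory address from a 7-bit offset.

    cpl == 0: DP-relative, the 9-bit DP supplies the upper bits (assumption).
    cpl == 1: SP-relative, the offset is added to the 16-bit SP (assumption).
    """
    if not 0 <= dma < 128:
        raise ValueError("dma must fit in seven bits")
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | dma
    return (sp + dma) & 0xFFFF


print(hex(direct_address(0x25, dp=3)))
print(hex(direct_address(0x25, sp=0x3FF0, cpl=1)))
```

    With a fixed DP, the offset can reach exactly 128 consecutive words, which is the 128-location window described above.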

    Data Addressing - Indirect Addressing

    In indirect addressing, any location in the 64K-word data space can be accessed using the 16-bit address

    contained in an auxiliary register. The C54x DSP has eight 16-bit auxiliary registers (AR0-AR7). Indirect

    addressing is used mainly when there is a need to step through sequential locations in memory in fixed-

    size steps.
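
    Stepping through memory in fixed-size steps can be sketched with a pointer that is modified after each access. The variable `ar` stands in for an auxiliary register, and the dictionary is a stand-in for data memory. Both are illustrative, not a model of the actual address-generation hardware.

```python
def step_through(memory, start, step, count):
    """Read `count` words, using `ar` as an auxiliary-register-style pointer."""
    ar = start & 0xFFFF            # 16-bit auxiliary register value
    values = []
    for _ in range(count):
        values.append(memory[ar])  # access the location the register points to
        ar = (ar + step) & 0xFFFF  # post-modify by a fixed step, wrapping at 64K
    return values, ar


memory = {addr: addr * 10 for addr in range(0x100, 0x110)}
values, ar = step_through(memory, 0x100, 2, 4)
print(values, hex(ar))
```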

    Data Addressing - Memory-Mapped Register Addressing

    Memory-mapped register addressing is used to modify the memory-mapped registers without affecting

    either the current data-page pointer (DP) value or the current stack-pointer (SP) value. Because DP and

    SP do not need to be modified in this mode, the overhead for writing to a register is minimal. Memory-

    mapped register addressing works for both direct and indirect addressing.

    Data Addressing - Stack Addressing

    The system stack is used to automatically store the program counter during interrupts and subroutines.

    It can also be used at your discretion to store additional items of context or to pass data values. The

    stack is filled from the highest to the lowest memory address. The processor uses a 16-bit memory-

    mapped register, the stack pointer (SP), to address the stack. SP always points to the last element stored

    onto the stack.
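
    The two properties stated above (the stack fills from high to low addresses, and SP points to the last element stored) imply that a push decrements SP before storing and a pop reads before incrementing. A minimal sketch, with a hypothetical `Stack` class and a dictionary standing in for data memory:

```python
class Stack:
    def __init__(self, top_address: int):
        self.memory = {}
        self.sp = top_address              # empty stack: SP sits above the first slot

    def push(self, value: int) -> None:
        self.sp = (self.sp - 1) & 0xFFFF   # stack grows toward lower addresses
        self.memory[self.sp] = value       # SP now points at the last element stored

    def pop(self) -> int:
        value = self.memory[self.sp]       # read the last element stored
        self.sp = (self.sp + 1) & 0xFFFF
        return value


s = Stack(0x2000)
s.push(0x1234)   # for example, a saved program counter
s.push(0x0042)
print(hex(s.sp), hex(s.pop()), hex(s.pop()), hex(s.sp))
```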

    Program Memory Addressing

    The following program control operations affect the value loaded in the PC:

    - Branches
    - Calls
    - Returns
    - Conditional operations
    - Repeats of an instruction or a block of instructions
    - Hardware reset
    - Interrupts

    6. Explain with a neat diagram the architecture of 6x series of processors?

    The C6000 devices execute up to eight 32-bit instructions per cycle. The C674x CPU consists of 64

    general-purpose 32-bit registers and eight functional units. These eight functional units contain:

    - Two multipliers
    - Six ALUs

    Features of the C6000 devices

    - Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units
        o Executes up to eight instructions per cycle for up to ten times the performance of typical DSPs
        o Allows designers to develop highly effective RISC-like code for fast development time
    - Instruction packing
        o Gives code size equivalence for eight instructions executed serially or in parallel
        o Reduces code size, program fetches, and power consumption
    - Conditional execution of most instructions
        o Reduces costly branching
        o Increases parallelism for higher sustained performance
    - Efficient code execution on independent functional units
    - 8/16/32-bit data support, providing efficient memory support for a variety of applications
    - 40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications
    - Saturation and normalization provide support for key arithmetic operations
    - Field manipulation and instruction extract, set, clear, and bit counting support common operations found in control and data manipulation applications.

    The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use

    advanced VLIW to achieve high performance through increased instruction-level parallelism. A

    traditional VLIW architecture consists of multiple execution units running in parallel, performing

    multiple instructions during a single clock cycle. Parallelism is the key to extremely high performance,

    taking these DSPs well beyond the performance capabilities of traditional superscalar designs. VelociTI is

    a highly deterministic architecture, having few restrictions on how or when instructions are fetched,

    executed, or stored. It is this architectural flexibility that is key to the breakthrough efficiency levels of

    the TMS320C6000 Optimizing compiler.

    The C674x CPU contains:

    Program fetch unit

    16/32 bit instruction dispatch unit, advanced instruction packing

    Instruction decode unit

    Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)

    Two load-from-memory data paths (LD1 and LD2)

    Two store-to-memory data paths (ST1 and ST2)

    Two data address paths (DA1 and DA2)

    Two register file data cross paths (1X and 2X)

    Two general-purpose register files (A and B)

    Control registers

    Control logic

    Test, emulation, and interrupt logic

    Internal DMA (IDMA) for transfers between internal memories

    The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit

    instructions to the functional units every CPU clock cycle. The processing of instructions occurs in each

    of the two data paths (A and B), each of which contains four functional units (.L, .S, .M, and .D) and 32

    32-bit general-purpose registers.

    General-Purpose Register Files

    There are two general-purpose register files (A and B) in the CPU data paths. Each of these files contains

    32 32-bit registers (A0-A31 for file A and B0-B31 for file B). The general-purpose registers can be used

    for data, data address pointers, or condition registers.

    Functional Units

    The eight functional units in the C6000 data paths can be divided into two groups of four; each

    functional unit in one data path is almost identical to the corresponding unit in the other data path.

    Each functional unit has its own 32-bit write port into a general-purpose register file, so all eight units

    can be used in parallel every cycle. All units ending in 1 (for example, .L1) write to register file A, and all units ending in 2 write to register file B. Each functional unit has two 32-bit read ports for source

    operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long

    writes, as well as an 8-bit input for 40-bit long reads. Since each DSP multiplier can return up to a 64-bit

    result, an extra write port has been added from the multipliers to the register file.

    Register File Cross Paths

    Each functional unit reads directly from and writes directly to the register file within its own data path.

    That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to

    register file B. The register files are connected to the opposite-side register file's functional units via the

    1X and 2X cross paths. These cross paths allow functional units from one data path to access a 32-bit operand from the opposite side register file. The 1X cross path allows the functional units of data path A

    to read their source from register file B, and the 2X cross path allows the functional units of data path B

    to read their source from register file A.
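
    The routing rule can be written as a small lookup: a unit reads its own side's register file directly and reaches the opposite file through its side's cross path. This is a simplified sketch with hypothetical function names. It ignores any limits on how many cross-path reads are allowed per cycle.

```python
def home_file(unit: str) -> str:
    """Units ending in 1 belong to data path A, units ending in 2 to path B."""
    return "A" if unit.endswith("1") else "B"


def source_route(unit: str, register: str) -> str:
    """Say how `unit` reaches source operand `register` (e.g. 'A5' or 'B12')."""
    side = home_file(unit)
    if register[0] == side:
        return "direct"
    return "1X" if side == "A" else "2X"


print(source_route(".L1", "A3"))
print(source_route(".M1", "B7"))
print(source_route(".S2", "A0"))
```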

    Memory, Load, and Store Paths

    The DSP supports double word loads and stores. There are four 32-bit paths for loading data from

    memory to the register file. For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path

    for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32

    MSBs. There are also four 32-bit paths for storing register values to memory from each register file. For

    side A, ST1a is the write path for the 32 LSBs and ST1b is the write path for the 32 MSBs. For side B, ST2a

    is the write path for the 32 LSBs and ST2b is the write path for the 32 MSBs.
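
    A double word therefore travels as two 32-bit halves, for example over LD1a (32 LSBs) and LD1b (32 MSBs) on side A. The split and rejoin can be sketched as follows. The function names are illustrative.

```python
def split_doubleword(value: int):
    """Split a 64-bit value into (32 LSBs, 32 MSBs)."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value & 0xFFFFFFFF, value >> 32


def join_doubleword(lsbs: int, msbs: int) -> int:
    """Rebuild the 64-bit value from its two 32-bit halves."""
    return ((msbs & 0xFFFFFFFF) << 32) | (lsbs & 0xFFFFFFFF)


lsbs, msbs = split_doubleword(0x1122334455667788)
print(hex(lsbs), hex(msbs))
```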

    Data Address Paths

    The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows

    data addresses generated by any one path to access data to or from any register. The DA1 and DA2

    resources and their associated data paths are specified as T1 and T2, respectively. T1 consists of the DA1

    address path and the LD1 and ST1 data paths. For the DSP, LD1 is comprised of LD1a and LD1b to

    support 64-bit loads; ST1 is comprised of ST1a and ST1b to support 64-bit stores. Similarly, T2 consists of

    the DA2 address path and the LD2 and ST2 data paths. For the DSP, LD2 is comprised of LD2a and LD2b

    to support 64-bit loads; ST2 is comprised of ST2a and ST2b to support 64-bit stores.

    7. What is pipelining? Explain the pipeline stages of TMS320C6x Processors?

    Processors with pipelining are organized inside into stages which can semi-independently work on

    separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another

    stage until the job is done. This organization of the processor allows overall processing time to be

    significantly reduced.

    A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in

    each stage. This generally means that the processor's frequency can be increased as the cycle time is

    lowered. This happens because there are fewer components in each stage of the pipeline, so the

    propagation delay is decreased for the overall stage.

    Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is

    said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully

    pipelined has wait cycles that delay the progress of the pipeline.

    Advantages of Pipelining

    - The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.
    - Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

    Disadvantages of Pipelining

    - A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
    - The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.
    - A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

    Highlights of C6000 Pipeline

    - The pipeline can dispatch eight parallel instructions every cycle.
    - Parallel instructions proceed simultaneously through each pipeline phase.
    - Serial instructions proceed through the pipeline with a fixed relative phase difference between instructions.
    - Load and store addresses appear on the CPU boundary during the same pipeline phase, eliminating read-after-write memory conflicts.

    Pipeline Operation Overview

    The pipeline phases are divided into three stages:

    Fetch

    Decode

    Execute

    All instructions in the DSP instruction set flow through the fetch, decode, and execute stages of the

    pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has

    two phases for all instructions. The execute stage of the pipeline requires a varying number of phases,

    depending on the type of instruction.

    Fetch Phase

    The fetch phases of the pipeline are:

    PG: Program address generate

    PS: Program address send

    PW: Program access ready wait

    PR: Program fetch packet receive

    The DSP uses a fetch packet (FP) of eight words. All eight of the words proceed through fetch processing

    together, through the PG, PS, PW, and PR phases.

    During the PG phase, the program address is generated in the CPU. In the PS phase, the program

    address is sent to memory. In the PW phase, a memory read occurs. Finally, in the PR phase, the fetch

    packet is received at the CPU.

    Decode Phase

    The decode phases of the pipeline are:

    DP: Instruction dispatch

    DC: Instruction decode

    In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist

    of one instruction or from two to eight parallel instructions. During the DP phase, the instructions in an

    execute packet are assigned to the appropriate functional units. In the DC phase, the source registers,

    destination registers, and associated paths are decoded for the execution of the instructions in the

    functional units.
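    The DP-phase split of a fetch packet into execute packets can be sketched as follows. This is an illustration only: the `(mnemonic, parallel_with_next)` tuple is a hypothetical stand-in for however the hardware marks parallelism, the mnemonics are placeholders, and closing a packet at the fetch-packet boundary is a simplifying assumption.

```python
from typing import List, Tuple

# Hypothetical representation: (mnemonic, runs in parallel with the next instruction)
Instruction = Tuple[str, bool]


def split_into_execute_packets(fetch_packet: List[Instruction]) -> List[List[str]]:
    """Group the eight words of a fetch packet into execute packets."""
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight instruction words")
    packets: List[List[str]] = []
    current: List[str] = []
    for mnemonic, parallel_with_next in fetch_packet:
        current.append(mnemonic)
        if not parallel_with_next:
            # This instruction ends the current execute packet.
            packets.append(current)
            current = []
    if current:
        # Assumption: an open packet is closed at the fetch-packet boundary.
        packets.append(current)
    return packets


if __name__ == "__main__":
    fp = [("MPY", True), ("ADD", True), ("LDW", False), ("SUB", False),
          ("B", False), ("NOP", True), ("STW", True), ("MV", False)]
    for packet in split_into_execute_packets(fp):
        print(packet)
```

    With all eight flags set except the last, the whole fetch packet becomes one execute packet of eight parallel instructions; with every flag clear, it becomes eight serial execute packets of one instruction each.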

    Execution Phase

    The execute portion of the pipeline is subdivided into five phases (E1-E5). Different types of instructions

    require different numbers of these phases to complete their execution.
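    The phase names above can be laid out as a cycle-by-cycle timeline. The sketch below is a simplified model, assuming no stalls and one packet entering PG per cycle; the function names and the execute-phase counts in the usage example are hypothetical and are not taken from these notes.

```python
from typing import List

FETCH = ["PG", "PS", "PW", "PR"]   # four fetch phases
DECODE = ["DP", "DC"]              # two decode phases


def phase_sequence(execute_phases: int) -> List[str]:
    """Return every phase an instruction passes through, in order."""
    if not 1 <= execute_phases <= 5:
        raise ValueError("the execute stage uses between one and five phases (E1-E5)")
    return FETCH + DECODE + ["E%d" % i for i in range(1, execute_phases + 1)]


def timeline(execute_phases_per_packet: List[int]) -> List[List[str]]:
    """Row i describes packet i, which enters PG one cycle after packet i-1."""
    rows = []
    for start, n_exec in enumerate(execute_phases_per_packet):
        rows.append(["--"] * start + phase_sequence(n_exec))
    return rows


if __name__ == "__main__":
    # Illustrative counts only: one short, one long, one medium instruction.
    for row in timeline([1, 5, 2]):
        print(" ".join(row))
```

    Reading the printed table column by column shows which phase each packet occupies in a given cycle, which is the "fixed relative phase difference" between serial instructions.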

    8. With a neat diagram explain the core architecture of ADSP 21xx DSP?

    The ADSP 21xx has the following architectural features:

    Computation units: multiplier, ALU, shifter, and data register file

    Program sequencer with related instruction cache, interval timer, and Data Address Generators (DAG1 and DAG2)

    Dual-blocked SRAM

    External ports for interfacing to off-chip memory, peripherals, and hosts

    Input/Output (I/O) processor with integrated DMA controllers, serial ports (SPORTs), serial peripheral interface (SPI) ports, and a UART port

    JTAG Test Access Port for board test and emulation

    ADSP 21xx Bus

    The ADSP-21xx has three on-chip buses: the PM bus, the DM bus, and the DMA bus. The PM bus provides access to either instructions or data. During a single cycle, these buses let the processor access two data operands (one from PM and one from DM) and access an instruction (from the cache).

    How ADSP addresses DSP requirements

    Fast, flexible arithmetic computation units

    o The ADSP-219x family DSPs execute all computational instructions in a single cycle. They provide both fast cycle times and a complete set of arithmetic operations.

    Unconstrained data flow to and from the computation units. The ADSP-219x has a modified Harvard architecture combined with a data register file. In every cycle, the DSP can:

    o Read two values from memory or write one value to memory

    o Complete one computation

    o Write up to three values back to the register file

    Extended precision and dynamic range in the computation units

    o 40-Bit Extended Precision. The DSP handles 16-bit integer and fractional formats (twos-complement and unsigned). The processors carry extended precision through result registers in their computation units, limiting intermediate data truncation errors.

    Dual address generators with circular buffering support

    o Dual Address Generators. The DSP has two data address generators (DAGs) that provide immediate or indirect (pre- and post-modify) addressing. Modulus and bit-reverse operations are supported with memory page constraints on data buffer placement only.

    Efficient program sequencing

    o Efficient Program Sequencing. In addition to zero-overhead loops, the DSP supports quick setup and exit for loops. Loops are both nestable (eight levels in hardware) and interruptible. The processors support both delayed and non-delayed branches.
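    Three of the ideas in this list (modulus addressing, bit-reverse addressing, and 40-bit extended precision) can be illustrated in software. The sketch below is a behavioural model under stated assumptions, not a description of the ADSP hardware: all function names are hypothetical, the buffer base and length are arbitrary, and the memory page constraints on buffer placement mentioned above are ignored.

```python
def circular_next(address: int, modifier: int, base: int, length: int) -> int:
    """Post-modify an address and wrap it inside [base, base + length)."""
    return base + (address - base + modifier) % length


def bit_reverse(index: int, n_bits: int) -> int:
    """Reverse the low n_bits of index, as used for FFT data reordering."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a twos-complement number of that width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


if __name__ == "__main__":
    # Walk a four-word circular buffer that starts at address 100.
    addr = 100
    for _ in range(6):
        print(addr, end=" ")
        addr = circular_next(addr, 1, base=100, length=4)
    print()

    # Bit-reversed order for an 8-point transform.
    print([bit_reverse(i, 3) for i in range(8)])

    # Four full-scale 16-bit products overflow 32 bits but fit in 40 bits.
    acc = sum([32767 * 32767] * 4)
    print(fits_signed(acc, 32), fits_signed(acc, 40))
```

    The last check shows why a 40-bit result register helps: the extra bits above a 32-bit product leave headroom for accumulating many products before any intermediate value has to be truncated or scaled.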