PowerPC 601 Stephen Tam

Page 1: PowerPC 601

PowerPC 601

Stephen Tam

Page 2: PowerPC 601

To be tackled today

Architecture
Execution Units
  Fixed-Point (Integer) Unit
  Floating-Point Unit
  Branch Processing Unit
Cache Unit
Memory Management Unit (MMU)
Pipeline Structure
Instruction buffer
Multiply-Add
Benchmark

Page 3: PowerPC 601

PowerPC Processors

The PowerPC 6xx line of microprocessors came from IBM, Motorola and Apple, who anticipated that personal computers would need to accommodate more power- and resource-intensive applications, such as those associated with multimedia.

Four implementations of the PowerPC architecture were initially announced:

  PowerPC 601 - Original PowerPC microprocessor
  PowerPC 603 - Low-cost, least powerful, and consumes the least power
  PowerPC 604 - Faster, higher performance
  PowerPC 620 - The first 64-bit implementation of the PowerPC architecture

The PowerPC 601 is a high-performance superscalar processor implementing 3 independent execution units and 2 register files.

Execution (pipeline processing) units:
  Integer Unit (IU) or Fixed-Point Unit (FXU)
  Floating-Point Unit (FPU)
  Branch Processing Unit (BPU)

Page 4: PowerPC 601

Features

Feature                          PowerPC 601
Basic architecture               Load/store
Instruction length               32 bit
Byte/halfword load and store     Yes
Condition codes                  Yes
Conditional moves                No
# of integer registers           32
Integer register size            32/64 bit
# of floating-point registers    32
Floating-point register size     64 bit
Floating-point format            IEEE 32 bit, 64 bit
Virtual address                  52-80 bit
32/64 bit mode bit               Yes
Segmentation                     Yes
Page size                        4 Kbytes
Instruction/data cache size      32 Kbytes
Clock speed                      50-100 MHz

Page 5: PowerPC 601

PowerPC 601 Architecture

The 601 has wide buses for memory, internal processor transfers, registers and on-board processing units.

Page 6: PowerPC 601

Fixed-Point (Integer) Unit & Floating-Point Unit

FXU (IU)
  Executes one instruction at a time
  Most instructions are single-cycle instructions
  Interfaces with the cache and MMU

FPU
  Contains:
    a) Single-precision multiply-add array
    b) Floating-point status and control register
    c) 32 64-bit registers
  Buffers 2 extra instructions when the FPU is busy
  Supports IEEE 754 FP data types

Page 7: PowerPC 601

Branch Processing Unit

Contains:
  a) An adder to compute the target address
  b) 3 special-purpose registers
     i) Link Register (LR)
     ii) Count Register (CTR)
     iii) Condition Register (CR)

Looks ahead into the CR to resolve conditional branches

Uses dedicated registers rather than the General Purpose Registers (GPRs)

Page 8: PowerPC 601

Branching & Branch Prediction

The 601 has special purpose registers in the BPU for holding, operating on and testing conditions

A single branch instruction may implement a loop-closing branch by decrementing the hardware counter CTR, testing its value and branching if non-zero

For unconditional branches or ones that only depend on the CTR, the branch is executed immediately and is considered a zero cycle branch.
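The loop-closing branch described above can be sketched as a toy model in Python. The function name `run_counted_loop` and the `body` callback are hypothetical and exist only for illustration; the one behaviour taken from the slide is that a single branch decrements CTR, tests it, and branches back if the result is non-zero.

```python
def run_counted_loop(ctr, body):
    """Toy model of a loop closed by a decrement-CTR-and-branch instruction.

    ctr  -- initial value loaded into the count register (assumed >= 1)
    body -- hypothetical callback standing in for the loop body
    Returns the number of times the body ran.
    """
    iterations = 0
    while True:
        body(iterations)      # the loop body executes first
        iterations += 1
        ctr -= 1              # the branch decrements CTR ...
        if ctr == 0:          # ... tests its value ...
            break             # ... and falls through once CTR reaches zero
    return iterations


visited = []
count = run_counted_loop(4, visited.append)
```

Because the count lives in CTR rather than in a general-purpose register, the loop test does not depend on a result from the integer unit, which is what lets such a branch be resolved early.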

The 601 uses static branch prediction, made by the compiler.

To protect against wrong predictions, the contents of the instruction buffer are saved for a short period of time until instructions from the taken path are delivered from memory. This allows instructions for the non-taken path to be available immediately if a wrong prediction is made.

Page 9: PowerPC 601

Cache Unit & Memory Management Unit

Cache
  32 Kbytes
  8-way associative
  Unified (instruction and data)
  Has 2 ports:
    1) Instruction fetch
    2) “Snooping” transactions on the system interface

MMU
  Supports (externally) 4 petabytes (2^52 bytes) of virtual memory and 4 gigabytes (2^32 bytes) of physical memory
  Implements demand paging for virtual memory
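The sizes on this slide are all powers of two, so they can be checked with a few lines of arithmetic. The sketch below uses only figures from the slides, with one exception: the 64-byte cache line size is an assumption that does not appear in the deck, and the number of sets depends on it.

```python
VIRTUAL_ADDRESS_BITS = 52    # from the slide: 2^52 bytes of virtual memory
PHYSICAL_ADDRESS_BITS = 32   # from the slide: 4 gigabytes of physical memory
PAGE_SIZE = 4 * 1024         # from the feature table: 4 Kbyte pages

CACHE_SIZE = 32 * 1024       # from the slide: 32 Kbytes
WAYS = 8                     # from the slide: 8-way associative
LINE_SIZE = 64               # ASSUMPTION: line size is not given in the slides

virtual_bytes = 2 ** VIRTUAL_ADDRESS_BITS
physical_bytes = 2 ** PHYSICAL_ADDRESS_BITS

virtual_pib = virtual_bytes // 2 ** 50          # size in units of 2^50 bytes
physical_gib = physical_bytes // 2 ** 30        # size in units of 2^30 bytes
virtual_pages = virtual_bytes // PAGE_SIZE      # pages in the virtual space
physical_frames = physical_bytes // PAGE_SIZE   # page frames in physical memory
cache_sets = CACHE_SIZE // (WAYS * LINE_SIZE)   # sets, under the assumed line size

print(virtual_pib, physical_gib, virtual_pages, physical_frames, cache_sets)
```

The virtual space holds 2^40 pages against 2^20 physical page frames, which is why demand paging is needed: only a small fraction of virtual pages can be resident at once.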

Page 10: PowerPC 601

Pipeline Structure

Fetch      Up to eight instructions are fetched into an instruction buffer.

Dispatch   Instructions are dispatched to either the FXU or FPU.

Decode     Instructions are decoded, with the source registers being read. Instructions to the FXU are decoded together with dispatch, in the dispatch stage.

Execute    This stage exists in the BPU as well as the FXU, where integer instructions execute and cache lookup and address processing also occur.

Execute1   FPU multiplication.

Execute2   FPU addition.

Cache      Floating-point operands are sent to the FPU and integer operands are sent to the FXU.

Write      Register file write.

Page 11: PowerPC 601

Instruction Buffer

The 601 has several buffers in the pipelines that allow storage of multiple fetched instructions and also the storage of several dispatched instructions.

  This allows out-of-order dispatching (when one pipeline is blocked, dispatching may still continue to non-blocked ones).

  The cache is unified, meaning that instructions and data share one cache, so data and instruction accesses must contend for cache access.

  The fetched-instruction buffer holds 8 instructions (even though the maximum processing rate is 3 instructions per cycle).

  Data accesses have priority; instructions are fetched and stored in the buffer whenever the cache is available.

Page 12: PowerPC 601

Hence…

Up to three 32-bit instructions may be dispatched each cycle, one each to the FXU, FPU and BPU.

The unified cache provides:
  A 32-bit interface to the FXU
  A 64-bit interface to the FPU
  A 256-bit interface to both the instruction and memory queues

The I/O has a 32-bit address bus and a 64-bit data bus.
  These buses are logically and physically decoupled from one another to support pipelined, non-pipelined, or even split bus transactions.

To reduce latency and increase performance, the 601 itself is capable of pipelining up to two outstanding operations onto the bus.
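The dispatch rule stated on this slide (at most one instruction per cycle to each of the FXU, FPU and BPU, with dispatching continuing past a blocked pipeline) can be sketched as a toy model. This is an illustration of the slide's description only, not the 601's actual dispatch logic; `dispatch_cycle`, the `blocked` parameter and the tuple layout are hypothetical, and the mnemonics are used purely as labels.

```python
UNITS = ("FXU", "FPU", "BPU")   # from the slides: one instruction each per cycle
BUFFER_SIZE = 8                 # from the slides: fetched-instruction buffer size


def dispatch_cycle(buffer, blocked=()):
    """Dispatch at most one buffered instruction to each non-blocked unit.

    buffer  -- list of (name, unit) tuples, oldest first; modified in place
    blocked -- units whose pipelines cannot accept an instruction this cycle
    Returns the instructions dispatched this cycle, oldest first.
    """
    assert len(buffer) <= BUFFER_SIZE
    dispatched = []
    unavailable = set(blocked)
    for instr in list(buffer):          # iterate over a copy, oldest first
        _name, unit = instr
        if unit in UNITS and unit not in unavailable:
            dispatched.append(instr)
            unavailable.add(unit)       # each unit accepts one per cycle
            buffer.remove(instr)
    return dispatched


buffer = [("fadd", "FPU"), ("add", "FXU"), ("bdnz", "BPU"), ("fmadd", "FPU")]
stalled_cycle = dispatch_cycle(buffer, blocked=("FPU",))  # FPU pipeline blocked
next_cycle = dispatch_cycle(buffer)                       # FPU free again
```

In the first call the oldest instruction is held back because the FPU is blocked, yet the younger integer and branch instructions still leave the buffer, which is the out-of-order dispatching the slides describe.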

Page 13: PowerPC 601

Multiply-Add

The PowerPC 601 takes in three operands and processes (A x B + C) or (A x B - C) in a single instruction.

Assuming program and data are cached, a 100-MHz 601 can sustain 100 million MACs (multiply-accumulate operations) per second on some digital filters.
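To make the MAC figure concrete, here is a minimal sketch of a FIR filter output, where every tap is one operation of the form A x B + C. The function names and the 64-tap filter length are hypothetical examples, and Python evaluates the multiply and the add as two separate operations, so this only mirrors the shape of the computation, not the hardware instruction.

```python
def fir_output(coeffs, samples):
    """One FIR output sample: the sum of coeffs[k] * samples[k].

    Each loop iteration is one multiply-accumulate of the form A x B + C.
    """
    acc = 0.0
    for a, b in zip(coeffs, samples):
        acc = a * b + acc
    return acc


def max_sample_rate(macs_per_second, taps):
    """Upper bound on output samples per second if each tap costs one MAC."""
    return macs_per_second / taps


y = fir_output([0.5, 0.25, 0.25], [4.0, 8.0, 12.0])
rate = max_sample_rate(100e6, 64)   # 100 million MACs/s, hypothetical 64 taps
```

Under these assumptions a 64-tap filter would be limited to roughly 1.56 million output samples per second, since one MAC per cycle at 100 MHz is shared among all the taps.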

Page 14: PowerPC 601

Benchmarking

Page 15: PowerPC 601

Direct Benchmark Comparison with ADSP-2106x

Page 16: PowerPC 601

More Benchmark Comparisons

Page 17: PowerPC 601

General Purpose Processor

Why use a DSP when benchmarks show GPPs like the PowerPC perform better?
  GPP performance is gained from complicated dynamic features
  Not suited for real-time applications
    Decreased real-time predictability
    Complicated code optimization

Page 18: PowerPC 601

References

Hoskins, John, “The PowerPC Initiative”, 1995, http://www.eng.uci.edu/comp.arch/processors/powerpc/PCPower.html

Smith, James; Weiss, Shlomo, “PowerPC 601 and Alpha 21064: A Tale of Two RISCs”, IEEE Computer, June 1994, Vol. 27, No. 6, pp. 46-58

Lee, Ben, “Chapter 2: A Simple SuperScalar Processor – PowerPC 601”, http://www.ece.orst.edu/~benl/Courses/ECE570_w02.html

“PowerPC Microprocessor - White paper”, http://www1.ibm.com/servers/eserver/pseries/hardware/whitepapers/power/ppc_601.html

Durisety, Chandra S.A., “PowerPC 601”, http://www.ece.msstate.edu/~cad12/PowerPC601.ppt

“Analysts Show CPU Can Handle Some Signal-Processing Tasks”, Microprocessor Report, May 8, 1995
  http://www.bdti.com/articles/info_articles.htm