Playstation2 Architecture

Architecture
Hardware Design

System Overview

The listing below is a clean view of the design behind the Playstation2 hardware

CORE CPU

General Purpose MIPS variant

128bit SIMD integer multimedia extensions

ICACHE and DCACHE

Scratch Pad RAM

Dedicated FPU coprocessor

[Diagram: Emotion Engine CPU core with SPR 16 KB, I$ 16 KB, D$ 8 KB, and FPU]

Dedicated FPU

FPU – Floating-Point Processing Unit

This unit is used to handle fast floating-point operations

Playstation2 is optimized for 32bit operations

The “double” data type and other 64bit floating-point operations are much slower and cause major bottlenecks
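The 32bit point above can be illustrated with a minimal C sketch. It assumes only a standard C compiler, and the function names are hypothetical. An unsuffixed literal such as 0.5 has type double, so it silently pulls a float expression into 64bit math; the f suffix keeps the expression in 32bit.

```c
#include <stddef.h>

/* Hypothetical helper: scales a value using single-precision math only.
   The 'f' suffix keeps the literal a 32bit float. */
static float scale_half_f32(float x)
{
    return x * 0.5f;
}

/* Same computation with an unsuffixed literal: 0.5 has type double, so x is
   promoted and the multiply is a 64bit operation before converting back. */
static float scale_half_promoted(float x)
{
    return (float)(x * 0.5);
}

/* Size in bytes of the type each expression evaluates to. */
static size_t float_expr_size(void)  { return sizeof(1.0f * 0.5f); }
static size_t double_expr_size(void) { return sizeof(1.0f * 0.5); }
```

Both helpers return the same value for simple inputs; the difference is the width of the intermediate arithmetic, which is what the slide warns about.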

SIMD

SIMD - Single Instruction Multiple Data

128bit SIMD allows for a single operation to be applied to four integers / floats

The operations that can be performed are specific to the CPU

SIMD is especially useful in games for all of its complex vector and matrix math

How SIMD Works

Given two packed data elements, the operation is performed on all of the components in each element
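The lane-by-lane behavior can be sketched in plain C. This is a scalar emulation for illustration, not the actual CPU instruction set; the type packed4f and the function packed_add are hypothetical names.

```c
/* Hypothetical 128bit packed element: four 32bit float lanes. */
typedef struct {
    float lane[4];
} packed4f;

/* Scalar emulation of one SIMD add. Real hardware would perform all four
   lane additions with a single instruction; the loop here only models
   the result. */
static packed4f packed_add(packed4f a, packed4f b)
{
    packed4f r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```

Adding {1, 2, 3, 4} to {10, 20, 30, 40} this way yields {11, 22, 33, 44}: one operation, four results.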

Typical System Layout: Cache Dependency

The cache is found on the CPU and has faster access times than system memory

CACHE

The purpose of cache is to reduce the time it takes to execute redundant operations or access data values

ICACHE – Instruction Cache

DCACHE – Data Cache

SPR – Scratch Pad RAM

How Cache Works

[Diagram: CPU fetch path through ICACHE and DCACHE to System RAM]

Priority: Cache > System Memory
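The priority rule can be modeled with a toy direct-mapped cache in C. This is a sketch under stated assumptions: the line count, the one-word line size, and all names (toy_cache, toy_read) are invented for illustration and do not describe the real ICACHE or DCACHE organization.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 8u  /* toy size, not a Playstation2 value */

typedef struct {
    bool     valid[CACHE_LINES];
    uint32_t tag[CACHE_LINES];
    uint32_t data[CACHE_LINES];
    unsigned hits;
    unsigned misses;
} toy_cache;

/* Read one word. The cache is checked first; system RAM is touched only on
   a miss, and the fetched word then replaces whatever the line held.
   The caller must keep addr inside the bounds of ram. */
static uint32_t toy_read(toy_cache *c, const uint32_t *ram, uint32_t addr)
{
    uint32_t line = addr % CACHE_LINES;
    uint32_t tag  = addr / CACHE_LINES;

    if (c->valid[line] && c->tag[line] == tag) {
        c->hits++;
        return c->data[line];
    }

    c->misses++;
    c->valid[line] = true;
    c->tag[line]   = tag;
    c->data[line]  = ram[addr];
    return c->data[line];
}
```

Reading the same address twice costs one miss and one hit. Reading two addresses that map to the same line in alternation misses every time, which is the kind of access pattern worth avoiding.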

Hardware Controllers

A controller is a device used to interface and communicate with a piece of hardware

Every major component has a controller for its interface

The user application will typically use registers or interrupt calls to access the controller devices
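Register access to a controller usually looks like reads and writes through a volatile pointer. The sketch below is hypothetical throughout: the register is an ordinary variable standing in for a hardware address, and the bit layout (CTRL_START_BIT, CTRL_BUSY_BIT) is invented, not taken from any Playstation2 manual.

```c
#include <stdint.h>

/* Stand-in for a controller register. On real hardware this would be a
   fixed address from the hardware manual; here it is an ordinary variable
   so the sketch runs anywhere. */
static volatile uint32_t fake_ctrl_reg;

#define CTRL_START_BIT (1u << 0)  /* hypothetical bit layout */
#define CTRL_BUSY_BIT  (1u << 8)

/* 'volatile' tells the compiler every access has a side effect, so reads
   and writes are not cached in a CPU register or optimized away. */
static void ctrl_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

static uint32_t ctrl_read(volatile uint32_t *reg)
{
    return *reg;
}

/* Set the start bit while preserving the other bits. */
static void ctrl_start(volatile uint32_t *reg)
{
    ctrl_write(reg, ctrl_read(reg) | CTRL_START_BIT);
}

/* Report whether the (hypothetical) busy bit is set. */
static int ctrl_is_busy(volatile uint32_t *reg)
{
    return (ctrl_read(reg) & CTRL_BUSY_BIT) != 0;
}
```

A caller would pass &fake_ctrl_reg here; on a real target the same functions would take the documented register address instead.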

DMA Controller

DMA – Direct Memory Access

DMAC is the arbiter for the main bus

Used to transfer data between processes

Allows for some parallelism

[Diagram: DMA Controller (10 CH), CPU core with I$ and D$, 128bit bus]

Vector Units

Playstation2 has two vector units that are similar but not the same

VU0 is the CPU’s alternate processing unit

VU1 is the GS’s alternate processing unit

Each unit has a direct pipeline to its alternate processor

Vector Units are designed for vectors (imagine that)

DMAC and Graphics

DMAC feeds VU1 with needed data, and does so with no CPU intervention

Data that is transferred to VU1 is resident in system RAM

CPU is now free to process any instructions that have made hits in the instruction cache

CPU can also access any information in the data cache

VU Architecture

VU0/1 each have access to 32 float registers and 16 integer registers

Float registers are not your average PC style registers; they are 128bits in size

128bits can conveniently fit 4 float values at once (very similar to SIMD architecture)

Integer registers are typically used as loop counters and address calculators
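The division of labor between the two register types can be sketched in C. This is plain C rather than VU code, and every name (vec4, vec4_scale, vec4_add, mat4_transform) is hypothetical. Each vec4 stands in for one 128bit float register, and the loop index plays the role of an integer register used as a counter and index.

```c
/* Stand-in for one 128bit float register: four 32bit floats. */
typedef struct {
    float x, y, z, w;
} vec4;

/* Multiply all four components by one scalar. */
static vec4 vec4_scale(vec4 v, float s)
{
    vec4 r = { v.x * s, v.y * s, v.z * s, v.w * s };
    return r;
}

/* Add two registers component by component. */
static vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Transform v by a 4x4 matrix stored as four column "registers".
   The result is the sum of each column scaled by the matching
   component of v. */
static vec4 mat4_transform(const vec4 col[4], vec4 v)
{
    const float comp[4] = { v.x, v.y, v.z, v.w };
    vec4 acc = { 0.0f, 0.0f, 0.0f, 0.0f };

    /* 'i' acts as the integer register: loop counter and index. */
    for (int i = 0; i < 4; ++i)
        acc = vec4_add(acc, vec4_scale(col[i], comp[i]));
    return acc;
}
```

With a matrix whose last column is (1, 2, 3, 1) and identity elsewhere, transforming the point (10, 20, 30, 1) gives (11, 22, 33, 1), a simple translation.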

VU0

VU0 has two bus lines

One bus is dedicated to the CPU

The other bus is used to communicate with all other devices

Access to shared bus lines always needs to be monitored

VU0 has 4KB of $

[Diagram: VU0 (I$ 4KB, D$ 4KB) with a dedicated bus to the CPU core and a shared bus to SYS RAM]

Shared Buses and VU0

Why do we need to monitor shared buses?

– Only one process can access shared devices at a time

– Any access operations through a shared bus will cause all other processes to wait

Using the VU registers and reducing RAM access will help prevent shared access
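The benefit of keeping work in registers can be shown with a toy model in C. It is an illustration only: toy_bus and its helpers are hypothetical, and the access counter simply counts reads and writes rather than modeling real bus timing or arbitration.

```c
/* Toy model of RAM behind a shared bus: every read or write goes through
   the helpers below, which count the accesses. */
typedef struct {
    float   *mem;
    unsigned accesses;
} toy_bus;

static float bus_read(toy_bus *b, unsigned i)
{
    b->accesses++;
    return b->mem[i];
}

static void bus_write(toy_bus *b, unsigned i, float v)
{
    b->accesses++;
    b->mem[i] = v;
}

/* Sum mem[0..n-1] while keeping the running total in RAM at index 'out'.
   Costs 1 + 3n bus accesses. */
static void sum_via_ram(toy_bus *b, unsigned n, unsigned out)
{
    bus_write(b, out, 0.0f);
    for (unsigned i = 0; i < n; ++i)
        bus_write(b, out, bus_read(b, out) + bus_read(b, i));
}

/* Same sum with the running total in a local variable, standing in for a
   register. Costs n + 1 bus accesses. */
static void sum_via_register(toy_bus *b, unsigned n, unsigned out)
{
    float acc = 0.0f;
    for (unsigned i = 0; i < n; ++i)
        acc += bus_read(b, i);
    bus_write(b, out, acc);
}
```

For four inputs the RAM-resident version touches the bus 13 times and the register version 5 times, for the same result. Every access removed is one less chance of making another process wait.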

VU1

VU1 has two bus lines

Main bus is dedicated to the GS

Has almost identical functionality to VU0

Main purpose of VU1 is to process the data before the GS

VU1 has 16KB of $

[Diagram: VU1 (I$ 16KB, D$ 16KB) with a dedicated bus to the GS core and a shared bus to SYS RAM]

Review

Playstation2 is like having 4x300 MHz processors

– CPU + VU0 + VU1 + GS

Cache utilization is the key to reaching the limits of this system

VU0 is primarily for CPU vector operations

VU1 is dedicated to geometry processing

GS manages hardware support of triangle rasterization
