Playstation2 Architecture
Architecture
Hardware Design

System Overview
The listing below is a clean view of the design behind the Playstation2 hardware

CORE CPU
General Purpose MIPS variant
128-bit SIMD integer multimedia extensions
ICACHE and DCACHE
Scratch Pad RAM
Dedicated FPU coprocessor
[Diagram: Emotion Engine containing the CPU CORE, the FPU, SPR (16 KB), I$ (16 KB), and D$ (8 KB)]

Dedicated FPU
FPU – Floating-Point Processing Unit
This unit is used to handle fast floating-point operations
Playstation2 is optimized for 32-bit operations
The "double" data type and other 64-bit floating-point operations are much slower and cause major bottlenecks
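
One practical consequence is to keep floating-point code in 32-bit `float` from end to end. The following is a minimal C sketch, not code from the slides, and the function names are hypothetical. It shows how an unsuffixed literal quietly promotes a calculation to `double`, and how the `f` suffix avoids that.

```c
#include <assert.h>

/* The sketch assumes the usual 32-bit float and 64-bit double. */
static_assert(sizeof(float) == 4, "float is expected to be 32 bits");
static_assert(sizeof(double) == 8, "double is expected to be 64 bits");

/* Stays in 32-bit math: 0.5f is a float literal. */
float scale_half_f32(float v)
{
    return v * 0.5f;
}

/* Same result, but 0.5 is a double literal, so v is promoted to double,
 * multiplied in 64 bits, and converted back to float on return. */
float scale_half_promoted(float v)
{
    return v * 0.5;
}

/* Linear interpolation written so that every operand is a float. */
float lerp_f32(float a, float b, float t)
{
    return a + (b - a) * t;
}
```

Both scaling functions return the same value; the difference is only in the width of the intermediate arithmetic, which is where the cost described above would show up.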

SIMD
SIMD - Single Instruction Multiple Data
128-bit SIMD allows a single operation to be applied to four 32-bit integers / floats
The operations that can be performed are specific to the CPU
SIMD is especially useful in games for all of its complex vector and matrix math

How SIMD Works
If given two packed data elements, the operation is performed on all of the components in each element
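
The packed operation described above can be modeled in plain C. This is only an illustrative sketch: `packed_u32x4` and `packed_add` are hypothetical names, the lanes are assumed to be unsigned so that overflow wraps, and the loop stands in for what SIMD hardware would do with a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit packed value: four 32-bit lanes. */
typedef struct {
    uint32_t lane[4];
} packed_u32x4;

static_assert(sizeof(packed_u32x4) == 16, "four 32-bit lanes fill 128 bits");

/* Models one packed add: the same operation is applied to every lane.
 * Lanes are independent, so a carry never crosses from one lane to the next. */
packed_u32x4 packed_add(packed_u32x4 a, packed_u32x4 b)
{
    packed_u32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];  /* unsigned add wraps on overflow */
    }
    return r;
}
```

Adding the packed values (1, 2, 3, 4) and (10, 20, 30, 40) yields (11, 22, 33, 44) in one call, which is the "one operation, four results" behavior the slide describes.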

Typical System Layout: Cache Dependency
The cache is found on the CPU and has faster access times than system memory

CACHE
The purpose of cache is to reduce the time it takes to execute redundant operations or access data values
ICACHE – Instruction Cache
DCACHE – Data Cache
SPR – Scratch Pad RAM

How Cache Works
[Diagram: CPU fetch path, with ICACHE and DCACHE between the CPU and System RAM]
Priority: Cache > System Memory
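
The "cache first, then system memory" priority can be sketched with a small lookup model. This is a simplified illustration, not a description of the real hardware: it assumes a direct-mapped cache with 64-byte lines, and only the 8 KB total is taken from the D$ size quoted earlier. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* assumed line size */
#define NUM_LINES  128u  /* 128 lines * 64 bytes = 8 KB */

static_assert(LINE_BYTES * NUM_LINES == 8u * 1024u, "model covers 8 KB");

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits;
    unsigned misses;
} cache_model;

/* Returns true on a hit. On a miss the line is "filled" from system RAM,
 * which the model represents by recording the new tag. */
bool cache_access(cache_model *c, uint32_t addr)
{
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;  /* which cache slot the line maps to */
    uint32_t tag   = line / NUM_LINES;  /* which line currently owns the slot */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->misses++;
    return false;
}
```

In this model, touching neighboring addresses after the first miss produces hits, while two addresses exactly 8 KB apart keep evicting each other. That is the kind of access pattern cache-aware code tries to avoid.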

Hardware Controllers
A controller is a device used to interface and communicate with a piece of hardware
Every major component has a controller for its interface
The user application will typically use registers or interrupt calls to access the controller devices
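
Register access of this kind usually looks like reads and writes through a `volatile` pointer. The sketch below is generic C and makes no claim about any real Playstation2 controller: the register layout, the bit assignments, and the function names are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block; a real controller defines its own layout. */
typedef struct {
    volatile uint32_t control;  /* written by the application */
    volatile uint32_t status;   /* read back by the application */
} controller_regs;

static_assert(sizeof(controller_regs) == 8, "two 32-bit registers");

#define CTRL_START (1u << 0)  /* assumed bit assignment */
#define STAT_BUSY  (1u << 0)  /* assumed bit assignment */

/* Ask the device to start by setting a bit in its control register. */
void controller_start(controller_regs *regs)
{
    regs->control |= CTRL_START;
}

/* Poll the status register; volatile forces a fresh read on every call. */
int controller_is_busy(const controller_regs *regs)
{
    return (regs->status & STAT_BUSY) != 0;
}
```

On real hardware the `controller_regs` pointer would refer to a fixed address published in the hardware manual; here it can point at an ordinary struct, which makes the sketch easy to try on a PC.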

DMA Controller
DMA – Direct Memory Access
DMAC is the arbiter for the main bus
Used to transfer data between processes
Allows for some parallelism
[Diagram: DMA Controller (10 CH), CPU CORE with I$ and D$, 128-bit bus]

Vector Units
Playstation2 has two vector units that are similar but not the same
VU0 is the CPU's alternate processing unit
VU1 is the GS's alternate processing unit
Each unit has a direct pipeline to its alternate processor
Vector Units are designed for vectors (imagine that)

DMAC and Graphics
DMAC feeds VU1 with needed data, and does so with no CPU intervention
Data that is transferred to VU1 is resident in system RAM
CPU is now free to process any instructions that have made hits in the instruction cache
CPU can also access any information in the data cache
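
The overlap between a DMA transfer and CPU work can be sketched as a small software model. Nothing here is the real DMAC interface: the channel structure and function names are hypothetical, one step is assumed to move one 128-bit quadword, and the interleaving loop only imitates what the hardware would do on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QWORD_BYTES 16u  /* one 128-bit transfer */

/* Hypothetical model of a single DMA channel. */
typedef struct {
    const uint8_t *src;          /* data resident in system RAM */
    uint8_t       *dst;          /* stands in for VU1 memory */
    size_t         qwords_left;  /* remaining 128-bit transfers */
} dma_channel_model;

/* The CPU's only involvement: describe the transfer and kick it off. */
void dma_start(dma_channel_model *ch, const void *src, void *dst, size_t qwords)
{
    assert(src != NULL && dst != NULL);
    ch->src = (const uint8_t *)src;
    ch->dst = (uint8_t *)dst;
    ch->qwords_left = qwords;
}

/* One bus transfer. In hardware this happens without the CPU. */
void dma_step(dma_channel_model *ch)
{
    if (ch->qwords_left == 0) {
        return;
    }
    memcpy(ch->dst, ch->src, QWORD_BYTES);
    ch->src += QWORD_BYTES;
    ch->dst += QWORD_BYTES;
    ch->qwords_left--;
}

/* Imitates the parallelism: for every quadword the channel moves, the CPU
 * completes one unit of unrelated work. Returns the work units done. */
unsigned cpu_work_during_transfer(dma_channel_model *ch)
{
    unsigned work_units = 0;
    while (ch->qwords_left > 0) {
        dma_step(ch);   /* the controller's job */
        work_units++;   /* the CPU's job, running from its caches */
    }
    return work_units;
}
```

The point of the model is that `dma_start` is the CPU's only contribution to the copy. Everything after that is time the CPU can spend on work served from its instruction and data caches.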

VU Architecture
VU0/1 each have access to 32 float registers and 16 integer registers
Float registers are not your average PC style registers; they are 128 bits in size
128 bits can conveniently fit 4 float values at once (very similar to SIMD architecture)
Integer registers are typically used as loop counters and address calculators
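
To make the register layout concrete, here is a rough C model of a 128-bit float register and a matrix-by-vector transform built from four-wide operations. It is a sketch under stated assumptions: the type and function names are hypothetical, the 16-bit width chosen for the integer registers is an assumption, and real VU code would be written in the unit's own instruction set rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit float register modeled as four 32-bit floats. */
typedef struct {
    float x, y, z, w;
} vf_reg;

static_assert(sizeof(vf_reg) == 16, "four floats fill one 128-bit register");

/* Register file sizes follow the slide; the integer width is assumed. */
typedef struct {
    vf_reg   vf[32];
    uint16_t vi[16];
} vu_register_file;

/* Multiply all four lanes by the same scalar. */
vf_reg vf_mul_scalar(vf_reg v, float s)
{
    vf_reg r = { v.x * s, v.y * s, v.z * s, v.w * s };
    return r;
}

/* Multiply-accumulate on all four lanes: acc + v * s. */
vf_reg vf_madd_scalar(vf_reg acc, vf_reg v, float s)
{
    vf_reg r = { acc.x + v.x * s, acc.y + v.y * s,
                 acc.z + v.z * s, acc.w + v.w * s };
    return r;
}

/* Transform p by a 4x4 matrix stored as four column registers:
 * result = col0 * p.x + col1 * p.y + col2 * p.z + col3 * p.w */
vf_reg transform_point(const vf_reg col[4], vf_reg p)
{
    vf_reg r = vf_mul_scalar(col[0], p.x);
    r = vf_madd_scalar(r, col[1], p.y);
    r = vf_madd_scalar(r, col[2], p.z);
    r = vf_madd_scalar(r, col[3], p.w);
    return r;
}
```

With the matrix held as columns, the whole transform is four lane-wise operations and the matrix and point never leave registers, which is why the four-float register width suits vector math so well.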

VU0
VU0 has two bus lines
One bus is dedicated to the CPU
The other bus is used to communicate with all other devices
Access to shared bus lines always needs to be monitored
VU0 has 4KB of $
[Diagram: VU0 (I$ 4KB, D$ 4KB) with a dedicated bus to the CPU CORE and a shared bus to SYS RAM]

Shared Buses and VU0
Why do we need to monitor shared buses?
– Only one process can access shared devices at a time
– Any access operations through a shared bus will cause all other processes to wait
Using the VU registers and reducing RAM access will help prevent shared access
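
The benefit of working in registers can be illustrated by counting shared-bus accesses in a toy model. The types and functions below are hypothetical and only count reads and writes; they do not model real bus timing or arbitration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: every read or write of RAM costs one bus access. */
typedef struct {
    float   *ram;
    unsigned bus_accesses;
} shared_ram_model;

float ram_read(shared_ram_model *m, size_t i)
{
    m->bus_accesses++;
    return m->ram[i];
}

void ram_write(shared_ram_model *m, size_t i, float v)
{
    m->bus_accesses++;
    m->ram[i] = v;
}

/* Running total kept in RAM: three bus accesses per element, plus one
 * to clear the total. */
void sum_in_ram(shared_ram_model *m, size_t count, size_t total_index)
{
    assert(total_index >= count);
    ram_write(m, total_index, 0.0f);
    for (size_t i = 0; i < count; ++i) {
        float total = ram_read(m, total_index);
        ram_write(m, total_index, total + ram_read(m, i));
    }
}

/* Running total kept in a local (a register): one bus access per element,
 * plus one to store the result. */
void sum_in_register(shared_ram_model *m, size_t count, size_t total_index)
{
    assert(total_index >= count);
    float total = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        total += ram_read(m, i);
    }
    ram_write(m, total_index, total);
}
```

For four elements the first version performs 13 bus accesses and the second performs 5, and every access saved is time other devices on the shared bus do not spend waiting.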

VU1
VU1 has two bus lines
Main bus is dedicated to the GS
Has almost identical functionality to VU0
Main purpose of VU1 is to process the data before it reaches the GS
VU1 has 16KB of $
[Diagram: VU1 (I$ 16KB, D$ 16KB) with a dedicated bus to the GS CORE and a shared bus to SYS RAM]

Review
Playstation2 is like having 4x300 MHz processors
– CPU + VU0 + VU1 + GS
Cache utilization is the key to reaching the limits of this system
VU0 is primarily for CPU vector operations
VU1 is dedicated to geometry processing
GS manages hardware support of triangle rasterization