ATI Stream Computing
ATI Radeon™ HD 2900 Series GPU Hardware Overview
Micah Villmow
May 30, 2008
| ATI Stream Computing Update | Confidential2 2 | ATI Stream Computing – ATI Radeon™ HD 2900 Series GPU Hardware Overview
ATI Radeon™ HD 2900 Series GPU Hardware Overview
• Graphics View
• Compute View
• ATI Radeon™ HD 2900 Series GPU Hardware
• ATI Radeon™ HD 2400/2600 Series GPU Hardware
ATI Radeon™ HD 2900 Series GPU - Graphics Overview
• Created for graphics
• Not optimal for compute
• Various functions have specific use cases
• Overhead caused by graphics pipeline
• Graphics APIs do not allow very direct control
ATI Radeon™ HD 2900 Series GPU – Compute Overview
• Hides non-compute items:
– Geometry Shader
– Tessellation Unit
– Vertex Shader
– Vertex Cache
– Z/Stencil Cache
– Etc.
• Exposes only what is required
ATI Radeon™ HD 2900 Series GPU Hardware
• ALU Hardware
– Streaming Core
– Thread processor
– Flow Control
– Thread Creation
– ALU Scheduling
• Memory Hardware
– Memory Controller
– Texture Unit
– Texture Unit Scheduling
– Tiling
– Render Backends
ALU Hardware – Thread Processors
• 5 Streaming Cores (SCs)
• Four thin SCs [X, Y, Z, W]
• One fat SC [T]
• Branch execution unit
• Single cycle dispatch
• Four cycle latency
• 16 Threads/Cycle
00 ALU: ADDR(32) CNT(5)
      0  x: MOV R1.x, 0.0f
         y: MOV R1.y, 0.0f
         z: MOV R1.z, 0.0f
         w: MOV R1.w, 0.0f
         t: MOV R0.x, 0.0f
ALU Hardware – Flow Control
• Predication to mask state updates
• Writes only occur when mask not set
01 LOOP_DX10 i0 FAIL_JUMP_ADDR(5) VALID_PIX
    02 ALU_BREAK: ADDR(37) CNT(2) KCACHE0(CB0:0-15)
          1  y: SETE_INT R0.y, R0.x, KC0[1].x
          2  x: PREDE_INT ____, R0.y, 0.0f UPDATE_EXEC_MASK UPDATE_PRED
    03 ALU: ADDR(39) CNT(5) KCACHE0(CB0:0-15)
          3  x: ADD R1.x, R1.x, KC0[0].x
             y: ADD R1.y, R1.y, KC0[0].y
             z: ADD R1.z, R1.z, KC0[0].z
             w: ADD R1.w, R1.w, KC0[0].w
             t: ADD_INT R0.x, R0.x, 1
04 ENDLOOP i0 PASS_JUMP_ADDR(2)
05 EXP_DONE: PIX0, R1
END_OF_PROGRAM
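The effect of predication on a loop like the one above can be sketched in Python. This is only an illustration under stated assumptions: the lane count, the per-lane trip counts and the name `run_masked_loop` are hypothetical, and the sketch models the idea that a lane stops writing its registers once it has been masked off, not the actual hardware mechanism.

```python
def run_masked_loop(trip_counts, increment):
    """Simulate lanes running `acc += increment` in lockstep.

    Every lane sees the same instruction stream. A lane that has reached
    its break condition is masked off, so its registers stop changing
    while the remaining lanes keep iterating.
    """
    n = len(trip_counts)
    acc = [0.0] * n       # plays the role of R1.x in the listing
    counter = [0] * n     # plays the role of R0.x in the listing
    active = [True] * n   # per-lane execution mask (True = still running)

    while any(active):
        # ALU_BREAK clause: lanes whose counter reached the limit drop out.
        for lane in range(n):
            if active[lane] and counter[lane] == trip_counts[lane]:
                active[lane] = False
        # ALU clause: issued for every lane, but only active lanes write.
        for lane in range(n):
            if active[lane]:
                acc[lane] += increment
                counter[lane] += 1
    return acc


# Four lanes with different (hypothetical) loop limits.
result = run_masked_loop([0, 1, 3, 2], increment=1.5)
print(result)
```

The loop runs as long as any lane is active, which is why divergent trip counts cost every lane in the group the time of the longest-running one.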
ALU Hardware – DPP Array
• 4 SIMD Engines (SE)
• 4 Quads/SE
• 4 Thread Processors (TP)/Quad
• 5 Streaming Cores/TP
• 320 Streaming Cores
• 2 Wavefronts/SE
• 512 Threads Concurrently processed
• 256 Registers Per SIMD
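The totals on this slide follow from multiplying the per-level counts. The short Python check below uses the figures from the slides; the values of 16 quads per wavefront and 4 threads per quad come from the Thread Creation and Wavefront Execution slides, and the variable names are illustrative only.

```python
# Per-level counts for the HD 2900 DPP array, as listed on the slides.
SIMD_ENGINES = 4
QUADS_PER_SIMD = 4
THREAD_PROCESSORS_PER_QUAD = 4
STREAMING_CORES_PER_TP = 5
WAVEFRONTS_PER_SIMD = 2
QUADS_PER_WAVEFRONT = 16   # from the Thread Creation slide
THREADS_PER_QUAD = 4       # from the Wavefront Execution slide

streaming_cores = (SIMD_ENGINES * QUADS_PER_SIMD
                   * THREAD_PROCESSORS_PER_QUAD * STREAMING_CORES_PER_TP)
threads_per_wavefront = QUADS_PER_WAVEFRONT * THREADS_PER_QUAD
threads_in_flight = SIMD_ENGINES * WAVEFRONTS_PER_SIMD * threads_per_wavefront

print(streaming_cores, threads_per_wavefront, threads_in_flight)
```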
ALU Hardware – Wavefront Execution
[Figure: per-cycle timeline, Cycle 0 through Cycle 7, with Even Wavefront and Odd Wavefront columns. Labelled IL instructions: "imul r22, r22, r10" (four entries) followed by "and r22, r22, r11" (four entries). Caption: Repeat ad nauseam for ALU.]
• 1 square represents a quad (4 sequential threads)
• 4 quads execute per cycle on a SIMD
• Two wavefronts (WFs) execute in parallel
• Even/odd WFs interleave quads every other cycle
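One possible reading of the interleaving described on this slide can be sketched in Python. The original figure is not recoverable, so the exact ordering here is an assumption: the even wavefront issues 4 quads on even cycles, the odd wavefront issues 4 quads on odd cycles, and each 16-quad wavefront therefore needs four issue cycles per instruction. The function name `schedule` and the tuple layout are hypothetical.

```python
QUADS_PER_WAVEFRONT = 16   # from the Thread Creation slide
QUADS_PER_CYCLE = 4        # "4 quads execute per cycle on a SIMD"


def schedule(instructions):
    """Return (cycle, wavefront, instruction, quad_ids) issue records.

    Assumed ordering: for each instruction, the even and odd wavefronts
    alternate cycle by cycle, each issuing one group of four quads.
    """
    issue_slots = QUADS_PER_WAVEFRONT // QUADS_PER_CYCLE
    timeline = []
    cycle = 0
    for instr in instructions:
        for slot in range(issue_slots):
            first = slot * QUADS_PER_CYCLE
            quads = list(range(first, first + QUADS_PER_CYCLE))
            for wavefront in ("even", "odd"):
                timeline.append((cycle, wavefront, instr, quads))
                cycle += 1
    return timeline


timeline = schedule(["imul r22, r22, r10", "and r22, r22, r11"])
for record in timeline:
    print(record)
```

Under this assumption each instruction occupies eight cycles for the wavefront pair, so a wavefront always has a cycle of slack between two of its own quad groups.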
ALU Hardware – Thread Creation
• Stamps out 16 quads per wavefront in preset order
• Wavefronts are dispatched to the SIMD Engines (SEs) in round-robin fashion by the Ultra-Threaded Dispatch Processor
• Affects memory access performance
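Round-robin dispatch can be illustrated with a few lines of Python. This is a sketch of the general technique only, assuming four SIMD engines as on the DPP Array slide; the function name `dispatch_round_robin` and the integer wavefront IDs are hypothetical, and the real dispatcher's behavior under stalls is not described in the slides.

```python
from itertools import cycle


def dispatch_round_robin(wavefront_ids, num_simd_engines=4):
    """Assign wavefronts to SIMD engines in strict rotating order."""
    queues = {engine: [] for engine in range(num_simd_engines)}
    for wavefront, engine in zip(wavefront_ids, cycle(range(num_simd_engines))):
        queues[engine].append(wavefront)
    return queues


queues = dispatch_round_robin(range(10))
print(queues)
```

Because consecutive wavefronts land on different engines, neighbouring regions of the output domain are processed by different SIMDs, which is one reason the creation order matters for memory access patterns.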
ALU Hardware - Wavefront Scheduling
[Figure: two wavefront scheduling timelines, one captioned "SIMD Engine is 100% busy" and the other "SIMD Engine has stalls"]
Memory Hardware – Memory Controller
• Fully distributed memory interface
• Stacked I/O pad design
• Runs independently of compute and texture units
Highlights:
• Over 100 GB/s memory bandwidth
• Achieved via last generation technology
• Eight 64-bit memory channels
• Kilobit ring bus
• Lower frequencies required
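The relationship between the channel count and the bandwidth figure can be worked through in Python. The sketch uses only the numbers on this slide and assumes 1 GB = 10^9 bytes; it derives the effective transfer rate the full bus would need to reach 100 GB/s, since the slides do not state a memory clock. Variable names are illustrative.

```python
CHANNELS = 8
CHANNEL_WIDTH_BITS = 64
TARGET_BANDWIDTH_BYTES_PER_S = 100e9   # "over 100 GB/s", assuming GB = 1e9 bytes

bus_width_bits = CHANNELS * CHANNEL_WIDTH_BITS
bytes_per_transfer = bus_width_bits // 8

# Effective transfers per second across the whole bus to hit the target.
required_transfer_rate = TARGET_BANDWIDTH_BYTES_PER_S / bytes_per_transfer

print(bus_width_bits, bytes_per_transfer, required_transfer_rate)
```

A wide bus moves 64 bytes per transfer, so the target is reachable at roughly 1.56 billion transfers per second, which is consistent with the "lower frequencies required" bullet.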
Memory Hardware – Texture Unit
• Four 32KB Four-way associative L1 caches
• L1 cache size is 4x8KB per SIMD Engine
• Data is split across all four 8KB L1 caches
• L1 cache line is 256 bytes or 4 quads of data
• 256KB Unified Cache over all SIMDs
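The cache figures on this slide are consistent with each other, as the following Python arithmetic shows. The count of four SIMD engines and four threads per quad come from earlier slides; reading the 16 bytes per thread as one four-component 32-bit value is an interpretation, not something the slides state. Variable names are illustrative.

```python
KB = 1024

SIMD_ENGINES = 4            # from the DPP Array slide
L1_CACHES_PER_SIMD = 4
L1_CACHE_BYTES = 8 * KB
L1_LINE_BYTES = 256
QUADS_PER_LINE = 4
THREADS_PER_QUAD = 4        # from the Wavefront Execution slide

l1_bytes_per_simd = L1_CACHES_PER_SIMD * L1_CACHE_BYTES
l1_bytes_total = SIMD_ENGINES * l1_bytes_per_simd
lines_per_l1_cache = L1_CACHE_BYTES // L1_LINE_BYTES

# One cache line covers 4 quads, so this is the data held per thread.
bytes_per_thread = L1_LINE_BYTES // (QUADS_PER_LINE * THREADS_PER_QUAD)

print(l1_bytes_per_simd // KB, l1_bytes_total // KB,
      lines_per_l1_cache, bytes_per_thread)
```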
Memory Hardware – TEX Scheduling
• Run independently of ALU units
• Run on core/engine clocks
• Process multiple wavefronts sequentially to hide latency
• Transfers data from cache to registers
• Latency is predictable for L1 cache hits
Memory Hardware - Tiling
• Multiple tiling formats
• Micro-tiling and macro-tiling
• CAL tiled format is micro-tiled, macro-tiled
– Quad based hierarchical Z pattern
• CAL linear format is micro-tiled, macro-linear
– Tiled quad based linear format
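To illustrate the difference between a Z pattern and a linear layout, the Python sketch below compares the generic Z-order (Morton) index with a row-major index. The slides do not give the actual tile dimensions or bit layout used by the hardware, so this shows the general technique only; `morton_index` and `linear_index` are hypothetical helper names.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (x in the even bit positions)."""
    index = 0
    for bit in range(bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def linear_index(x, y, width):
    """Plain row-major index for comparison."""
    return y * width + x


# A 2x2 quad occupies four consecutive Z-order slots, while in a
# row-major layout its two rows are `width` elements apart.
quad = [(0, 0), (1, 0), (0, 1), (1, 1)]
z_order = [morton_index(x, y) for x, y in quad]
row_major = [linear_index(x, y, width=64) for x, y in quad]
print(z_order, row_major)
```

Keeping a quad's elements adjacent in memory is the property that makes a Z-style layout attractive for 2D access patterns.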
Memory Hardware – Backends
• Also called ROPs (Raster Operator)
• Outputs data to memory via color registers
• Maximum 8 Outputs
• 4 Backend units
• 256B output width
• 32KB Write cache/unit
• 32 Pixels/Clk
[Figure: block diagram with the labels "Memory Controller" and "DPP Array"]
ATI Radeon™ HD 2400 / 2600 Series GPUs
ATI Radeon™ HD 2400 Series GPU
• 40 Stream Processors
• 2 SIMD Engines
• 4 Thread Processors/SIMD
• 1 Texture Unit
• 1 Render Backend
ATI Radeon™ HD 2600 Series GPU
• 120 Stream Processors
• 3 SIMD Engines
• 8 Thread Processors/SIMD
• 2 Texture Units
• 1 Render Backend
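The stream processor counts for the smaller parts can be checked against their SIMD configuration. The Python sketch below assumes each thread processor has five streaming cores, as stated for the HD 2900, and takes the HD 2900 figure of 16 thread processors per SIMD from its 4 quads of 4 thread processors; the dictionary and variable names are illustrative.

```python
STREAMING_CORES_PER_TP = 5   # assumed to match the HD 2900 thread processor

# (SIMD engines, thread processors per SIMD) for each GPU family.
configurations = {
    "HD 2400": (2, 4),
    "HD 2600": (3, 8),
    "HD 2900": (4, 16),   # 4 quads x 4 thread processors per SIMD
}

stream_processors = {
    name: simd_engines * tps_per_simd * STREAMING_CORES_PER_TP
    for name, (simd_engines, tps_per_simd) in configurations.items()
}
print(stream_processors)
```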
Disclaimer & Attribution
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI Logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.