ATI Stream Computing ATI Radeon™ HD 2900 Series GPU Hardware Overview Micah Villmow May 30, 2008


| ATI Stream Computing Update | Confidential | ATI Stream Computing – ATI Radeon™ HD 2900 Series GPU Hardware Overview

ATI Radeon™ HD 2900 Series GPU Hardware Overview

• Graphics View

• Compute View

• ATI Radeon™ HD 2900 Series GPU Hardware

• ATI Radeon™ HD 2400/2600 Series GPU Hardware


ATI Radeon™ HD 2900 Series GPU - Graphics Overview

• Created for graphics

• Not optimal for compute

• Various functions have specific use cases

• Overhead caused by graphics pipeline

• Graphics APIs do not allow very direct control


ATI Radeon™ HD 2900 Series GPU – Compute Overview

• Hides non-compute items:

Geometry Shader

Tessellation Unit

Vertex Shader

Vertex Cache

Z/Stencil Cache

Etc…

• Exposes only what is required


ATI Radeon™ HD 2900 Series GPU Hardware

• ALU Hardware

– Streaming Core

– Thread processor

– Flow Control

– Thread Creation

– ALU Scheduling

• Memory Hardware

– Memory Controller

– Texture Unit

– Texture Unit Scheduling

– Tiling

– Render Backends


ALU Hardware – Thread Processors

• 5 Streaming Cores

• Four thin SCs [X, Y, Z, W]

• One fat SC [T]

• Branch execution unit

• Single cycle dispatch

• Four cycle latency

• 16 Threads/Cycle

00 ALU: ADDR(32) CNT(5)
      0  x: MOV R1.x, 0.0f
         y: MOV R1.y, 0.0f
         z: MOV R1.z, 0.0f
         w: MOV R1.w, 0.0f
         t: MOV R0.x, 0.0f
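The listing above fills all five slots of one thread processor in a single ALU instruction group. A minimal Python sketch of that idea follows. The slot names come from the slide; the dictionary representation, the function name `execute_bundle`, and the read-before-write behavior are assumptions of this toy model, not a description of the actual hardware.

```python
# Toy model of one thread processor executing a single five-slot bundle.
# Slot names x, y, z, w (thin cores) and t (fat core) come from the slide.
# Everything else here is a hypothetical representation for illustration.

SLOTS = ("x", "y", "z", "w", "t")


def execute_bundle(registers, bundle):
    """Apply up to five operations, one per slot, as a single step.

    registers: dict mapping names such as "R1.x" to values
    bundle:    dict mapping slot name -> (destination, function of registers)
    """
    # Assumption of this model: all sources are read before any result
    # is written, so slots do not observe each other within one bundle.
    results = {}
    for slot in SLOTS:
        if slot in bundle:
            dest, op = bundle[slot]
            results[dest] = op(registers)
    updated = dict(registers)
    updated.update(results)
    return updated


# The bundle from the listing: five MOVs of 0.0 issued together.
clear_bundle = {
    "x": ("R1.x", lambda r: 0.0),
    "y": ("R1.y", lambda r: 0.0),
    "z": ("R1.z", lambda r: 0.0),
    "w": ("R1.w", lambda r: 0.0),
    "t": ("R0.x", lambda r: 0.0),
}

regs = {"R1.x": 1.0, "R1.y": 2.0, "R1.z": 3.0, "R1.w": 4.0, "R0.x": 5.0}
regs = execute_bundle(regs, clear_bundle)
```

A bundle with fewer than five entries leaves the unused slots idle, which is why packing independent operations into all five slots matters for throughput.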


ALU Hardware – Flow Control

• Predication to mask state updates

• Writes occur only when the mask is not set

01 LOOP_DX10 i0 FAIL_JUMP_ADDR(5) VALID_PIX
02 ALU_BREAK: ADDR(37) CNT(2) KCACHE0(CB0:0-15)
      1  y: SETE_INT R0.y, R0.x, KC0[1].x
      2  x: PREDE_INT ____, R0.y, 0.0f UPDATE_EXEC_MASK UPDATE_PRED
03 ALU: ADDR(39) CNT(5) KCACHE0(CB0:0-15)
      3  x: ADD R1.x, R1.x, KC0[0].x
         y: ADD R1.y, R1.y, KC0[0].y
         z: ADD R1.z, R1.z, KC0[0].z
         w: ADD R1.w, R1.w, KC0[0].w
         t: ADD_INT R0.x, R0.x, 1
04 ENDLOOP i0 PASS_JUMP_ADDR(2)
05 EXP_DONE: PIX0, R1
END_OF_PROGRAM
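The loop above compares a counter against a bound, updates the execution mask, and then accumulates. The sketch below imitates that pattern in Python for a handful of threads. It is an illustration only: the function name `run_masked_loop` is hypothetical, and giving each thread its own bound is an assumption made so that the mask visibly diverges (in the listing the bound comes from a constant, `KC0[1].x`).

```python
def run_masked_loop(bounds, step):
    """Emulate a predicated loop over len(bounds) threads.

    bounds: per-thread iteration counts (assumed to differ per thread here)
    step:   value added to the accumulator on every active iteration
    """
    n = len(bounds)
    counter = [0] * n       # plays the role of R0.x in the listing
    accum = [0.0] * n       # plays the role of one component of R1
    masked = [False] * n    # True means the thread's writes are suppressed

    while not all(masked):
        # Compare and update the mask (SETE_INT / PREDE_INT in the listing).
        for t in range(n):
            if not masked[t] and counter[t] == bounds[t]:
                masked[t] = True
        # All threads step through the ALU clause together, but a write
        # is committed only when the thread's mask is not set.
        for t in range(n):
            if not masked[t]:
                accum[t] += step
                counter[t] += 1
    return accum


result = run_masked_loop([0, 1, 3, 2], 0.5)
```

In this model the whole group keeps iterating until the last thread is masked off, so the cost of the loop is set by the thread with the largest bound.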


ALU Hardware – DPP Array

• 4 SIMD Engines

• 4 Quads/SE

• 4 Thread Processors (TPs)/Quad

• 5 Streaming Cores/TP

• 320 Streaming Cores

• 2 Wavefronts/SE

• 512 Threads processed concurrently

• 256 Registers Per SIMD
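The totals on this slide follow from the per-level counts. A short Python check of the arithmetic is given here; the figure of 16 quads per wavefront is taken from the Thread Creation slide, and 4 threads per quad from the Wavefront Execution slide. The variable names are illustrative only.

```python
# Per-level counts from the DPP Array slide.
SIMD_ENGINES = 4
QUADS_PER_SIMD = 4
TPS_PER_QUAD = 4
CORES_PER_TP = 5
WAVEFRONTS_PER_SIMD = 2

# From the Thread Creation and Wavefront Execution slides.
QUADS_PER_WAVEFRONT = 16
THREADS_PER_QUAD = 4

thread_processors = SIMD_ENGINES * QUADS_PER_SIMD * TPS_PER_QUAD
streaming_cores = thread_processors * CORES_PER_TP
threads_per_wavefront = QUADS_PER_WAVEFRONT * THREADS_PER_QUAD
concurrent_threads = SIMD_ENGINES * WAVEFRONTS_PER_SIMD * threads_per_wavefront
threads_per_cycle_per_simd = QUADS_PER_SIMD * TPS_PER_QUAD

print(thread_processors)           # 64
print(streaming_cores)             # 320
print(threads_per_wavefront)       # 64
print(concurrent_threads)          # 512
print(threads_per_cycle_per_simd)  # 16
```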


ALU Hardware – Wavefront Execution

[Figure: timeline of cycles 0 through 7 for an even wavefront and an odd wavefront. The cycles are labeled with the IL instructions being executed, first "imul r22, r22, r10" and then "and r22, r22, r11". The pattern repeats ad nauseam for ALU instructions.]

• 1 square represents a quad (4 sequential threads)

• 4 quads execute per cycle on a SIMD

• Two wavefronts (WFs) execute in parallel

• Even/odd WFs interleave quads every other cycle
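One way to read the interleaving described on this slide is sketched below in Python: even cycles issue four quads from the even wavefront, odd cycles issue four quads from the odd wavefront, so a 16-quad wavefront pair needs eight cycles per instruction. This is an interpretation, not a statement of the hardware's exact behavior; the function name `interleave_schedule` and the assumption that quads issue in ascending order are hypothetical.

```python
def interleave_schedule(quads_per_wavefront=16, quads_per_cycle=4):
    """Return (cycle, wavefront, quad_indices) tuples for one instruction.

    Assumes the even wavefront owns even cycles, the odd wavefront owns
    odd cycles, and quads issue in ascending order within a wavefront.
    """
    issue_slots = quads_per_wavefront // quads_per_cycle
    schedule = []
    for cycle in range(2 * issue_slots):
        wavefront = "even" if cycle % 2 == 0 else "odd"
        first = (cycle // 2) * quads_per_cycle
        quads = list(range(first, first + quads_per_cycle))
        schedule.append((cycle, wavefront, quads))
    return schedule


schedule = interleave_schedule()
for cycle, wavefront, quads in schedule:
    print(f"cycle {cycle}: {wavefront} wavefront, quads {quads}")
```

Under these assumptions the schedule spans cycles 0 through 7, and a given wavefront issues only every other cycle.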


ALU Hardware – Thread Creation

• Stamps out 16 quads per wavefront in preset order

• Dispatched to SEs in round-robin fashion by the Ultra-Threaded Dispatch Processor

• Affects memory access performance
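The round-robin dispatch described above can be sketched as follows. This is a simplified model: it assumes 64 threads per wavefront (16 quads of 4 threads) and a strict rotation across four SIMD engines, ignoring occupancy and the preset order of quads inside a wavefront, which the slide does not specify. The function name `dispatch_wavefronts` is hypothetical.

```python
import math


def dispatch_wavefronts(num_threads, num_simds=4, threads_per_wavefront=64):
    """Assign wavefront indices to SIMD engines in strict round-robin order."""
    num_wavefronts = math.ceil(num_threads / threads_per_wavefront)
    assignment = {simd: [] for simd in range(num_simds)}
    for wf in range(num_wavefronts):
        assignment[wf % num_simds].append(wf)
    return assignment


# 640 threads form 10 wavefronts, spread over 4 SIMD engines.
assignment = dispatch_wavefronts(640)
for simd, wavefronts in assignment.items():
    print(f"SIMD {simd}: wavefronts {wavefronts}")
```

Because neighboring wavefronts land on different SIMD engines, which data ends up next to which in each engine's cache depends on this ordering, which is one reason thread creation affects memory access performance.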


ALU Hardware - Wavefront Scheduling

[Figure: two wavefront scheduling timelines, one in which the SIMD Engine is 100% busy and one in which the SIMD Engine has stalls.]


Memory Hardware – Memory Controller

• Fully distributed memory interface

• Stacked I/O pad design

• Runs independently of compute and texture units

Highlights:

• Over 100 GB/s memory bandwidth

• Achieved via last generation technology

• Eight 64-bit memory channels

• Kilobit ring bus

• Lower frequencies required
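The link between the wide interface and the "lower frequencies required" point can be made concrete with a little arithmetic. The sketch below uses only the channel count and width from the slide, and assumes that 1 GB/s means 10^9 bytes per second. It does not use any actual memory clock, since none is given here.

```python
CHANNELS = 8
CHANNEL_WIDTH_BITS = 64

bus_width_bits = CHANNELS * CHANNEL_WIDTH_BITS  # total interface width
bytes_per_transfer = bus_width_bits // 8        # bytes moved per transfer


def bandwidth_gb_per_s(transfers_per_second):
    """Peak bandwidth in GB/s (assuming 1 GB = 10**9 bytes)."""
    return bytes_per_transfer * transfers_per_second / 1e9


# Transfer rate needed to reach 100 GB/s on this interface.
required_rate = 100e9 / bytes_per_transfer

print(bus_width_bits)                      # 512
print(bytes_per_transfer)                  # 64
print(required_rate / 1e9)                 # 1.5625 (billion transfers per second)
print(bandwidth_gb_per_s(required_rate))   # 100.0
```

An interface half as wide would need twice the transfer rate for the same bandwidth, which is the trade-off the slide points at.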


Memory Hardware – Texture Unit

• Four 32KB Four-way associative L1 caches

• L1 cache size is 4x8KB per SIMD Engine

• Data is split across all four 8KB L1 caches

• L1 cache line is 256 bytes or 4 quads of data

• 256KB Unified Cache over all SIMDs
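The cache figures above imply a few derived quantities that are useful when reasoning about access patterns. The Python sketch below works them out. It assumes a cache line is divided evenly among the 16 threads of its 4 quads, which the slide suggests but does not state outright.

```python
L1_BYTES_PER_SIMD = 32 * 1024   # one 32KB L1 per SIMD engine
L1_SLICES_PER_SIMD = 4          # split as 4 x 8KB
LINE_BYTES = 256                # one L1 cache line
QUADS_PER_LINE = 4
THREADS_PER_QUAD = 4

slice_bytes = L1_BYTES_PER_SIMD // L1_SLICES_PER_SIMD
lines_per_slice = slice_bytes // LINE_BYTES
lines_per_simd = L1_BYTES_PER_SIMD // LINE_BYTES

# Assumption: the line is shared evenly by the threads of its 4 quads.
threads_per_line = QUADS_PER_LINE * THREADS_PER_QUAD
bytes_per_thread = LINE_BYTES // threads_per_line

print(slice_bytes)        # 8192
print(lines_per_slice)    # 32
print(lines_per_simd)     # 128
print(bytes_per_thread)   # 16
```

Sixteen bytes per thread matches a four-component value with 32 bits per component, so one line holds one such value for each thread in four quads.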


Memory Hardware – TEX Scheduling

• Run independently of ALU units

• Run on core/engine clocks

• Process multiple wavefronts sequentially to hide latency

• Transfers data from cache to registers

• Latency is predictable for L1 cache hits


Memory Hardware - Tiling

• Multiple tiling formats

• Micro-tiling and macro-tiling

• CAL tiled format is micro-tiled, macro-tiled

• Quad based hierarchical Z pattern

• CAL linear format is micro-tiled, macro-linear

• Tiled quad based linear format
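For readers unfamiliar with Z patterns, the sketch below shows generic Z-order (Morton) indexing in Python, where the bits of the x and y coordinates are interleaved. It illustrates the general idea behind a hierarchical Z layout only. The actual tile sizes and address swizzle used by the hardware or by CAL are not given in these slides, and the function name `z_order_index` is hypothetical.

```python
def z_order_index(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order index.

    Bit b of x goes to bit 2*b of the result; bit b of y goes to bit 2*b + 1.
    """
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)
        index |= ((y >> b) & 1) << (2 * b + 1)
    return index


# The four elements of each 2x2 block receive consecutive indices,
# and each 4x4 block is covered by 16 consecutive indices.
for y in range(4):
    print([z_order_index(x, y) for x in range(4)])
```

The printed grid is `[0, 1, 4, 5]`, `[2, 3, 6, 7]`, `[8, 9, 12, 13]`, `[10, 11, 14, 15]`, showing how neighbors in two dimensions stay close together in memory.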


Memory Hardware – Backends

• Also called ROPs (Raster Operator)

• Outputs data to memory via color registers

• Maximum 8 Outputs

• 4 Backend units

• 256B output width

• 32KB Write cache/unit

• 32 Pixels/Clk

[Figure labels: Memory Controller, DPP Array]


ATI Radeon™ HD 2400 / 2600 Series GPUs

ATI Radeon™ HD 2400 Series GPU

• 40 Stream Processors

• 2 SIMD Engines

• 4 Thread Processors/SIMD

• 1 Texture Unit

• 1 Render Backend

ATI Radeon™ HD 2600 Series GPU

• 120 Stream Processors

• 3 SIMD Engines

• 8 Thread Processors/SIMD

• 2 Texture Units

• 1 Render Backend
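The stream processor counts for the smaller parts follow the same rule as the HD 2900: SIMD engines times thread processors per SIMD times five streaming cores per thread processor. The Python sketch below checks this. The `GpuConfig` class is a hypothetical helper, and the HD 2900 entry of 16 thread processors per SIMD is derived from the 4 quads of 4 thread processors on the DPP Array slide.

```python
from dataclasses import dataclass

CORES_PER_TP = 5  # five streaming cores per thread processor


@dataclass(frozen=True)
class GpuConfig:
    name: str
    simd_engines: int
    tps_per_simd: int

    def stream_processors(self) -> int:
        return self.simd_engines * self.tps_per_simd * CORES_PER_TP


configs = [
    GpuConfig("HD 2400", simd_engines=2, tps_per_simd=4),
    GpuConfig("HD 2600", simd_engines=3, tps_per_simd=8),
    GpuConfig("HD 2900", simd_engines=4, tps_per_simd=16),
]

for cfg in configs:
    print(f"{cfg.name}: {cfg.stream_processors()} stream processors")
```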


Disclaimer & Attribution

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI Logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.