shashwat-shriparv
04/12/2023
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory
Why GPU?
To provide separate, dedicated graphics resources, including a graphics processor and memory.
To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise become saturated with graphics operations and I/O requests.
Enter the GPU
What is a GPU?
A Graphics Processing Unit, or GPU (occasionally also called a Visual Processing Unit, or VPU), is a dedicated processor that is efficient at manipulating and displaying computer graphics.
Like the CPU (Central Processing Unit), it is a single-chip processor.
However, the abstract goal of a GPU is to represent a 3D world as realistically as possible. GPUs are therefore designed to provide additional computational power that is customized specifically for these 3D tasks.
GPU vs CPU
A GPU is tailored for highly parallel operation, while a CPU executes programs serially.
For this reason, GPUs have many parallel execution units, while CPUs have few.
GPUs have significantly faster and more advanced memory interfaces, as they need to move far more data than CPUs.
GPUs have much deeper pipelines (several thousand stages vs 10-20 for CPUs).
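As a rough illustration of why graphics work suits many parallel execution units, the sketch below applies the same per-pixel operation serially and then through a thread pool. The `brighten` function and the sample pixels are made up for this example, and a thread pool is only a stand-in for GPU execution units. The point is that each pixel depends only on its own input, so the order of execution does not change the result.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel, amount=40):
    """Per-pixel operation: depends only on its own input pixel."""
    return tuple(min(255, channel + amount) for channel in pixel)

pixels = [(10, 20, 30), (200, 220, 250), (0, 0, 0), (128, 128, 128)]

# Serial, CPU-style: one pixel after another.
serial = [brighten(p) for p in pixels]

# Parallel in spirit: every pixel is an independent work item,
# so the work can be spread over several execution units.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(brighten, pixels))
```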
Brief History
First-Generation GPUs
– Up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.
Second-Generation GPUs
– 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.
Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
Fifth-Generation GPUs
– GeForce 8X; DirectX10.
GPU Architecture
How many processing units?
– Lots.
How many ALUs?
– Hundreds.
Do you need a cache?
– Sort of.
What kind of memory?
– Very fast.
The difference
[Figure: side-by-side rendering comparison, without GPU vs. with GPU]
The GPU pipeline
The GPU receives geometry information from the CPU as input and provides a picture as output.
Let’s see how that happens…
host interface → vertex processing → triangle setup → pixel processing → memory interface
Details
Host Interface
The host interface is the communication bridge between the CPU and the GPU.
It receives commands from the CPU and also pulls geometry information from system memory.
It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).
Vertex Processing
The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
This may be a simple linear transformation or a complex operation involving morphing effects.
No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
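A minimal sketch of the simple linear case, assuming a combined 4x4 transform matrix, a perspective divide, and a viewport mapping in which screen y grows downward. The names `mat_vec`, `to_screen`, and `process_vertices` are illustrative only and not taken from any particular API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_screen(vertex, mvp, width, height):
    """Object space -> clip space -> normalized device coords -> screen space."""
    x, y, z, w = mat_vec(mvp, [vertex[0], vertex[1], vertex[2], 1.0])
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w      # perspective divide
    sx = (ndc_x + 1.0) * 0.5 * width               # map [-1, 1] to [0, width]
    sy = (1.0 - ndc_y) * 0.5 * height              # flip y: screen y grows downward
    return (sx, sy, ndc_z)

def process_vertices(vertices, mvp, width, height):
    """1:1 mapping: exactly one output vertex per input vertex."""
    return [to_screen(v, mvp, width, height) for v in vertices]

# Assumed inputs: an identity transform and a 640x480 screen.
IDENTITY = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
screen = process_vertices(triangle, IDENTITY, 640, 480)
```

With the identity matrix the three corners land on the bottom-left, bottom-right, and top-center of the screen, and the output list has the same length as the input.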
Triangle Setup
In this stage, geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
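The rejection step can be sketched as follows. This is a simplification under stated assumptions: counter-clockwise winding is treated as front-facing, and the frustum test is reduced to a 2D screen-bounds check, whereas real hardware typically tests against all frustum planes in clip space. All function names are hypothetical.

```python
def signed_area(a, b, c):
    """Twice the signed area of a screen-space triangle (x, y only)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_backfacing(a, b, c):
    # Assumption: positive signed area (counter-clockwise) means front-facing.
    return signed_area(a, b, c) <= 0

def outside_view(tri, width, height):
    """True if all three vertices lie beyond the same screen edge."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return (max(xs) < 0 or min(xs) > width or
            max(ys) < 0 or min(ys) > height)

def keep_triangle(tri, width, height):
    """A triangle goes on to rasterization only if it passes both tests."""
    return not is_backfacing(*tri) and not outside_view(tri, width, height)

front = ((0, 0), (10, 0), (0, 10))
back = ((0, 0), (0, 10), (10, 0))            # same triangle, opposite winding
offscreen = ((-30, 0), (-20, 0), (-30, 10))  # entirely left of the screen
```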
Triangle Setup (cont.)
A pixel is generated if and only if its center is inside the triangle.
Every pixel generated has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
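Both rules can be sketched with edge functions. The version below is a brute-force illustration, not how hardware is organized. It assumes each vertex carries its screen position, its clip-space w, and a single scalar attribute, and it counts pixel centers lying exactly on an edge as inside (real rasterizers apply a fill rule to such ties).

```python
def edge(a, b, p):
    """Signed area term: (b - a) cross (p - a), using x and y only."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Each vertex is (x, y, w, attribute); returns a list of (px, py, attribute)."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []                                # degenerate triangle covers no pixels
    fragments = []
    for py in range(height):
        for px in range(width):
            center = (px + 0.5, py + 0.5)        # sample at the pixel center
            # Barycentric weights; dividing by area makes them winding-independent.
            b0 = edge(v1, v2, center) / area
            b1 = edge(v2, v0, center) / area
            b2 = edge(v0, v1, center) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue                         # center is outside the triangle
            # Perspective-correct: interpolate attr/w and 1/w linearly, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2]
                    + b2 * v2[3] / v2[2]) / inv_w
            fragments.append((px, py, attr))
    return fragments

# Assumed input: a right triangle on a 4x4 pixel grid, all w = 1.
fragments = rasterize((0, 0, 1.0, 0.0), (4, 0, 1.0, 1.0), (0, 4, 1.0, 0.0), 4, 4)
```

When all w values are equal, the result reduces to plain linear interpolation; differing w values pull the interpolated attribute toward the nearer vertices.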
Pixel Processing
Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for that pixel.
The computations taking place here include texture mapping and math operations.
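As an illustration of this stage, the sketch below looks up a texel with nearest-neighbour sampling and modulates it by the interpolated vertex color. The fragment layout (a dict with `uv` and `color`), the function names, and the 2x2 checkerboard texture are assumptions for the example. Texture coordinates are assumed to lie in [0, 1].

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows of (r, g, b) in 0..1."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)     # clamp so u == 1.0 stays in range
    y = min(int(v * height), height - 1)
    return texture[y][x]

def shade(fragment, texture):
    """Compute the final (r, g, b) for one fragment from its attributes."""
    texel = sample_nearest(texture, *fragment["uv"])
    # Math operation: modulate the texel by the interpolated vertex color.
    return tuple(t * c for t, c in zip(texel, fragment["color"]))

CHECKER = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
lit = shade({"uv": (0.25, 0.25), "color": (1.0, 0.5, 0.25)}, CHECKER)
dark = shade({"uv": (0.75, 0.25), "color": (1.0, 0.5, 0.25)}, CHECKER)
```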
Memory Interface
Pixel colors provided by the previous stage are written to the framebuffer.
This used to be the biggest bottleneck before pixel processing took over.
Before the final write occurs, some pixels are rejected by the z-buffer.
On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
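The z-buffer rejection can be sketched in a few lines. The assumption here is that smaller z means nearer to the viewer and that the depth buffer is cleared to infinity; the buffer layout and function names are illustrative only, and compression is not modeled.

```python
def make_buffers(width, height, clear_color=(0, 0, 0)):
    """Create a color buffer and a depth buffer cleared to 'infinitely far'."""
    color = [[clear_color] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth

def write_pixel(color, depth, x, y, z, rgb):
    """Write rgb only if this fragment is nearer than what is stored.

    Returns True if the pixel was written, False if the depth test rejected it.
    """
    if z >= depth[y][x]:
        return False
    depth[y][x] = z
    color[y][x] = rgb
    return True

color, depth = make_buffers(2, 2)
wrote_far = write_pixel(color, depth, 0, 0, 0.8, (255, 0, 0))    # far red
wrote_near = write_pixel(color, depth, 0, 0, 0.3, (0, 255, 0))   # nearer green
wrote_mid = write_pixel(color, depth, 0, 0, 0.5, (0, 0, 255))    # hidden blue
```

The blue fragment is rejected because a nearer fragment already occupies that pixel, so the stored color stays green.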
Programmability in the GPU pipeline
In current state-of-the-art GPUs, vertex and pixel processing are programmable.
The programmer can write programs that are executed for every vertex as well as for every pixel.
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
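GPU Pipelined Architecture (simplified view)
[Diagram: the CPU sends a bit stream (…110010100100…) to the GPU. Inside the GPU, vertices pass through Vertex Setup, Vertex Shader, and Rasterizer; pixels pass through Pixel Shader, with Texture Storage + Filtering, to the Framebuffer.]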
GPU Pipelined Architecture (simplified view)
One unit can limit the speed of the pipeline.
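A small sketch of the bottleneck idea: the sustained throughput of a pipeline is bounded by its slowest stage. The per-stage rates below are invented for illustration and, as a simplification, are all expressed in the same unit (millions of work items per second), although real stages process different kinds of items.

```python
def pipeline_throughput(stage_rates):
    """Return the slowest stage and its rate, which bounds the whole pipeline."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]

# Hypothetical per-stage rates in millions of work items per second.
rates = {"vertex setup": 400, "vertex shader": 300, "rasterizer": 500,
         "pixel shader": 120, "framebuffer": 250}
stage, rate = pipeline_throughput(rates)
```

In this made-up configuration the pixel shader is the limiting unit, so speeding up any other stage would not raise the overall rate.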
CPU/GPU interaction
The CPU and GPU inside the PC work in parallel with each other.
There are two “threads” going on, one for the CPU and one for the GPU, which communicate through a command buffer:
[Figure: command buffer; the CPU writes commands at one end, the GPU reads commands from the other, with pending GPU commands in between]
CPU/GPU interaction (cont.)
If this command buffer is drained empty, we are CPU limited, and the GPU will spin waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
If the command buffer fills up, the CPU will spin waiting for the GPU to consume it, and we are effectively GPU limited.
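The two cases can be reproduced with a toy tick-based simulation. This is a sketch under simple assumptions: every command costs the CPU a fixed number of ticks to build and the GPU a fixed number of ticks to execute, and the buffer holds at most `capacity` commands. The function name, parameters, and costs are invented for the example.

```python
from collections import deque

def simulate(cpu_ticks_per_cmd, gpu_ticks_per_cmd, capacity, total_ticks):
    """Count ticks where the CPU stalls (buffer full) or the GPU idles (buffer empty)."""
    buffer = deque()
    cpu_left = cpu_ticks_per_cmd   # ticks until the CPU finishes building a command
    gpu_left = 0                   # ticks until the GPU finishes its current command
    cpu_stalled = gpu_idle = 0
    for _ in range(total_ticks):
        # CPU side: build a command, then try to append it.
        if cpu_left > 0:
            cpu_left -= 1
        if cpu_left == 0:
            if len(buffer) < capacity:
                buffer.append("draw")
                cpu_left = cpu_ticks_per_cmd
            else:
                cpu_stalled += 1           # buffer full: GPU limited
        # GPU side: fetch the next command when free, then execute it.
        if gpu_left == 0:
            if buffer:
                buffer.popleft()
                gpu_left = gpu_ticks_per_cmd
            else:
                gpu_idle += 1              # buffer empty: CPU limited
        if gpu_left > 0:
            gpu_left -= 1
    return cpu_stalled, gpu_idle

cpu_limited = simulate(cpu_ticks_per_cmd=4, gpu_ticks_per_cmd=1, capacity=8, total_ticks=100)
gpu_limited = simulate(cpu_ticks_per_cmd=1, gpu_ticks_per_cmd=4, capacity=8, total_ticks=100)
```

In the first run the GPU spends most ticks idle and the CPU never stalls; in the second the buffer fills up and the CPU stalls while the GPU is never idle.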
Synchronization issues
In the figure below, the CPU must not overwrite the data in the “yellow” block until the GPU is done with the “black” command, which references that data:
[Figure: command buffer in which one command (black) references a separate data block (yellow); the CPU writes commands at one end, the GPU reads them from the other]
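One common way to respect this constraint is a fence: the CPU remembers a counter value when it submits the command and does not touch the data until the GPU's completion counter has reached that value. The toy `Gpu` class below is purely hypothetical and single-threaded; a real driver would block or poll rather than run the GPU inline.

```python
class Gpu:
    """Toy GPU that executes queued commands in order and counts completions."""
    def __init__(self):
        self.queue = []
        self.submitted = 0     # number of commands handed to the GPU so far
        self.completed = 0     # number of commands the GPU has finished

    def submit(self, command):
        self.queue.append(command)
        self.submitted += 1
        return self.submitted  # fence value: done once completed >= this

    def execute_one(self):
        if self.queue:
            self.queue.pop(0)()
            self.completed += 1

    def wait_for(self, fence):
        # Stand-in for blocking: run the GPU until it has passed the fence.
        while self.completed < fence:
            self.execute_one()

gpu = Gpu()
data = [1, 2, 3]           # the block referenced by a command
seen = []
fence = gpu.submit(lambda: seen.append(sum(data)))   # this command reads data
gpu.wait_for(fence)        # without this wait, the overwrite below could come first
data[:] = [10, 20, 30]     # now safe to reuse the block
fence2 = gpu.submit(lambda: seen.append(sum(data)))
gpu.wait_for(fence2)
```

The first command sees the original data and the second sees the new data, because the overwrite happens only after the first fence has been reached.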
Inlining data
One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data:
[Figure: command buffer with the data inlined among the commands; the CPU writes at one end, the GPU reads from the other]
However, this is also bad for performance, since we may need to copy several megabytes of data instead of merely passing around a pointer.
GPU readbacks
The output of a GPU is a rendered image on the screen. What happens if the CPU tries to read it?
[Figure: command buffer with pending GPU commands between the CPU write position and the GPU read position]
The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont)
We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
Both CPU and GPU performance take a nosedive.
Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
About GPU memory
Memory Hierarchy
CPU and GPU Memory Hierarchy
CPU: Disk, CPU Main Memory, CPU Caches, CPU Registers
GPU: GPU Video Memory, GPU Caches, GPU Constant Registers, GPU Temporary Registers
Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture
[Diagram labels: Vertex Buffer, Vertex Processor, Rasterizer, Fragment Processor, Frame Buffer(s), Texture]
CPU memory vs GPU memory
              CPU                 GPU
Registers     Read/write          Read/write
Local Mem     Read/write stack    None
Global Mem    Read/write heap     Read-only during computation; write-only at end (to pre-computed address)
Disk          Read/write disk     None
It looks like…
Some applications
Computer-generated holography using a graphics processing unit.
Improving the performance of CAD tools.
Computer graphics in games.
New
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technology.