
GPU Programming

Robert Hero
rghero@soe.ucsc.edu

Quick Overview (The Old Way)

- Graphics cards process triangles
  - Quads or other polygons are broken down into triangles
- Each triangle is processed in two steps:
  - Vertex operations
  - Pixel operations

Transformation and Lighting

- Each vertex is handled separately
- First, the vertex is transformed into screen coordinates
- Next, lighting for each vertex is calculated
  - Only the ambient, diffuse, and specular properties of the vertex are calculated
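The two fixed-function steps above can be sketched on the CPU. This is only an illustration in Python, not how any particular card implements it: the function names, the row-major matrix layout, the Blinn-style half vector, and the default `shininess` and `ambient` values are all assumptions, and the final viewport mapping from normalized device coordinates to pixels is left out.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def normalize(v):
    """Return v scaled to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return list(v)
    return [x / length for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transform_and_light(position, normal, mvp, light_dir, eye_dir,
                        shininess=16.0, ambient=0.1):
    """Hypothetical per-vertex T&L: transform first, then light."""
    # Step 1: transform the vertex, then apply the perspective divide.
    clip = mat_vec(mvp, list(position) + [1.0])
    ndc = [clip[i] / clip[3] for i in range(3)]

    # Step 2: per-vertex lighting (ambient, diffuse, specular only).
    n = normalize(normal)
    l = normalize([-c for c in light_dir])   # vector pointing toward the light
    e = normalize(eye_dir)                   # vector pointing toward the eye
    h = normalize([a + b for a, b in zip(e, l)])  # half-angle vector
    diffuse = max(0.0, dot(n, l))
    specular = max(0.0, dot(n, h)) ** shininess if diffuse > 0.0 else 0.0
    return {"ndc": ndc, "ambient": ambient,
            "diffuse": diffuse, "specular": specular}
```

Because the lighting terms are computed once per vertex, everything between the vertices has to be interpolated later, which is the limitation the following slides discuss.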

Pixel Rasterization

- Each pixel in the triangle is compared to the depth buffer
- If the depth test passes, the texture for the pixel is looked up
- The texture value and the color of the pixel are blended together
  - Gouraud shading is used for the pixel color
  - Possibly with the previous color of the pixel
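A rough CPU-side sketch of these three steps for a single pixel, in Python. It assumes that smaller depth values are closer, that the texel modulates (multiplies) the Gouraud color, and that blending with the previous color is a simple alpha mix; the function names and buffer layout are made up for illustration.

```python
def gouraud_color(weights, vertex_colors):
    """Interpolate three per-vertex RGB colors with barycentric weights."""
    return [sum(w * c[i] for w, c in zip(weights, vertex_colors))
            for i in range(3)]

def shade_pixel(x, y, depth, weights, vertex_colors, texel,
                depth_buffer, color_buffer, alpha=1.0):
    """Depth-test one pixel, then texture and blend it. Returns True if drawn."""
    # Depth test: reject the pixel if something closer was already drawn.
    if depth >= depth_buffer[y][x]:
        return False
    depth_buffer[y][x] = depth

    # Gouraud shading: the pixel color is interpolated from the vertex colors.
    lit = gouraud_color(weights, vertex_colors)

    # Combine the texture value with the interpolated color.
    src = [l * t for l, t in zip(lit, texel)]

    # Optionally blend with the color already stored for this pixel.
    dst = color_buffer[y][x]
    color_buffer[y][x] = [alpha * s + (1.0 - alpha) * d
                          for s, d in zip(src, dst)]
    return True
```

Note that no lighting is evaluated here at all: the pixel only sees whatever was computed at the three vertices.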

Gouraud Shading isn’t that good

Should be a nice circular pattern

We want better (most of the time)

- Possible solution: put all lighting calculations in the pixel step
  - Expensive to compute
  - Not always important
- Better solution: programmable shaders
  - Only perform expensive calculations for objects that really need it
  - Allows programmers to come up with their own effects

Pipeline

Tri -> Fixed T&L or Vertex Shader -> Pixel Shader -> Result Image

- Fixed T&L / Vertex Shader: position, lighting, texturing
- Pixel Shader: more lighting, blending, more texturing

Take control of the GPU

- Two new types of processors to control
  - Vertex shaders
  - Pixel (or fragment) shaders
- Huge amount of power given to the programmer
  - Vertices can be manipulated before they are transformed to screen coordinates
  - Lighting can now be done on a per-pixel basis
  - We can even do pure number crunching on the processor, no graphics needed

Why not do this on the CPU?

- Graphics cards have more floating-point power than any CPU on the market
- Specialized hardware allows for highly optimized calculations
- Did I mention parallel processing? (An NVIDIA 7800 GTX has 24 pixel shaders)
- CPUs aren’t increasing in speed like GPUs are

So how do we do it?

- Learn a new language or new tool
  - Assembly
  - GLSL (OpenGL)
  - HLSL (DirectX)
  - Cg
  - ATI RenderMonkey
  - NVIDIA FX Composer
  - …

It’s not that bad, really…

- HLSL (High Level Shading Language) and GLSL (GL Shading Language) are very similar
  - Very similar to C++
  - Created to replace the need to learn assembly for each graphics card on the market
- Simple to use and learn (assuming you get a good book)
  - Most commands you will use are mul, add, dot, sub, and texture lookup

vertexOutput VS_TransformAndTexture(vertexInput IN)
{
    vertexOutput OUT;
    OUT.hPosition = mul(float4(IN.position.xyz, 1.0), worldViewProj);
    OUT.texCoordDiffuse = IN.texCoordDiffuse;

    // calculate our vectors N, E, L, and H
    float3 worldEyePos = viewInverse[3].xyz;
    float3 worldVertPos = mul(IN.position, world).xyz;
    float3 N = normalize(mul(IN.normal, worldInverseTranspose).xyz); // normal vector, unit length in world space
    float3 E = normalize(worldEyePos - worldVertPos);                // eye vector
    float3 L = normalize(-lightDir.xyz);                             // light vector
    float3 H = normalize(E + L);                                     // half angle vector

    // calculate the diffuse and specular contributions
    float diff = max(0, dot(N, L));
    float spec = pow(max(0, dot(N, H)), shininess);
    if (diff <= 0)
    {
        spec = 0;   // no highlight on surfaces facing away from the light
    }

    // output diffuse
    float4 ambColor = materialDiffuse * lightAmbient;
    float4 diffColor = materialDiffuse * diff * lightColor;
    OUT.diffAmbColor = diffColor + ambColor;

    // output specular
    float4 specColor = materialSpecular * lightColor * spec;
    OUT.specCol = specColor;

    return OUT;
}

Difference…

// Version 1: uses the lighting computed per vertex and interpolated across the triangle
float4 PS_Textured(vertexOutput IN) : COLOR
{
    float4 diffuseTexture = tex2D(TextureSampler, IN.texCoord0Diffuse);
    float4 diffuse2Texture = tex2D(TextureSampler2, IN.texCoord1Diffuse);
    return IN.diffAmbColor * diffuseTexture + IN.specCol;
}

// Version 2: reads a normal from the second texture and computes diffuse lighting per pixel
float4 PS_Textured(vertexOutput IN) : COLOR
{
    float4 diffuseTexture = tex2D(TextureSampler, IN.texCoord0Diffuse);
    float3 normTexture = (tex2D(TextureSampler2, IN.texCoord1Diffuse).xyz - 0.5) * 2.0;
    float4 N = mul(normTexture, worldInverseTranspose);
    float3 L = normalize(-lightDir.xyz); // light vector
    float diff = max(0, dot(N, L));
    return diffuseTexture * diff;
}
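The key step in the second pixel shader is turning a texel into a normal: color channels live in 0..1, so `(texel - 0.5) * 2.0` remaps them to -1..1. Here is a small Python sketch of that decode followed by the per-pixel diffuse term. It assumes the decoded normal is already in the same space as the light direction (the `worldInverseTranspose` transform is skipped), and the function names are illustrative only.

```python
import math

def normalize(v):
    """Return v scaled to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return list(v)
    return [x / length for x in v]

def decode_normal(texel_rgb):
    """Remap color channels from [0, 1] to [-1, 1], like (texel - 0.5) * 2.0."""
    return [(c - 0.5) * 2.0 for c in texel_rgb]

def per_pixel_diffuse(normal_texel, light_dir):
    """Diffuse term for one pixel, using a normal read from a texture."""
    n = normalize(decode_normal(normal_texel))
    l = normalize([-c for c in light_dir])   # vector pointing toward the light
    return max(0.0, sum(a * b for a, b in zip(n, l)))
```

A texel of (0.5, 0.5, 1.0) decodes to the normal (0, 0, 1), which is why normal maps tend to look light blue when viewed as ordinary images.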

General Requirements for writing shaders

- Hardware is optimized for graphics
  - This means you can’t create your own datatypes
  - Focused on vectors and matrices
- Vertex and pixel shaders have limited inputs and outputs
- Shaders have no knowledge of what pixel or vertex they are processing
  - Tricks must be used
    - E.g., encode additional position information in color channels
    - Set texture coordinates to give information about which pixel is being processed
- Don’t use if statements unless you have a really new card…
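One common way to follow the "no if statements" advice is to replace a branch with arithmetic on a 0/1 mask. The Python sketch below mimics the `step` and `lerp` built-ins found in HLSL (GLSL calls the latter `mix`) and rewrites a "zero the specular term when diffuse is not positive" branch without a conditional in the shading logic. The two `specular_*` function names are hypothetical, and on a real GPU `step` is a built-in rather than a Python conditional.

```python
def step(edge, x):
    """Mimics the shader built-in: 1.0 if x >= edge, otherwise 0.0."""
    return 1.0 if x >= edge else 0.0

def lerp(a, b, t):
    """Linear interpolation; with t equal to 0.0 or 1.0 it acts as a select."""
    return a + (b - a) * t

def specular_branching(diff, spec):
    """The branching form: clear the highlight when the surface is unlit."""
    if diff <= 0.0:
        spec = 0.0
    return spec

def specular_branchless(diff, spec):
    """Same result with a mask: 0.0 when diff <= 0, 1.0 otherwise."""
    mask = 1.0 - step(0.0, -diff)
    return spec * mask
```

Both versions compute the specular value unconditionally and differ only in how it is discarded, which is usually the cheaper trade on hardware without real branching.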

Cont…

- Shaders can only be so many lines of code (at least until DirectX 10)
  - Most newer graphics cards have limits around 32000 lines of code
- There are different versions with different features
  - For instance, if statements don’t exist in Shader Model 1.0
  - Allows the programmer to write effects for many types of graphics cards (FX files)

Vertex Shaders

- Input
  - Position
  - Normal
  - Color (ambient, diffuse, specular)
  - Texture coordinates
- Output
  - Position
  - Color
  - Texture coordinates
- New features that aren’t available with the fixed pipeline
  - Move vertices (bump mapping, hair, …)
  - Texture lookup (get neighbor information…)
  - If statements (not the best idea here, but can be OK)
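As an illustration of "move vertices", the sketch below pushes a vertex along its normal by a time-varying amount before any transform to screen coordinates would happen. It is a CPU-side Python stand-in for what a vertex shader would do once per vertex; the sine-wave offset and the `amplitude` and `frequency` parameters are invented for the example.

```python
import math

def displace_vertex(position, normal, time, amplitude=0.1, frequency=2.0):
    """Move a vertex along its normal by a sine wave (hypothetical effect)."""
    offset = amplitude * math.sin(frequency * position[0] + time)
    return [p + offset * n for p, n in zip(position, normal)]
```

The displaced position would then go through the usual world-view-projection multiply, so the rest of the pipeline never sees the original mesh.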

Pixel Shaders

- Input
  - Color information
  - Texture coordinates
  - Position information
- Output
  - Final color
  - Depth value
  - This is it!
- Changes from fixed pipeline
  - Dependent texture lookup (use a texture to look up into another texture)
  - If statements (really bad idea!)
  - Ability to do lighting per pixel
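A dependent texture lookup can be pictured as two chained reads: the value fetched from the first texture is used as the coordinate for the second. The Python sketch below uses nested lists as textures and nearest-neighbor sampling; the helper names and the tiny example textures in the usage note are assumptions for illustration.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup; (u, v) in [0, 1], texture is a list of rows."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def dependent_lookup(index_texture, palette_texture, u, v):
    """Read (u, v) coordinates from one texture, then use them on another."""
    du, dv = sample_nearest(index_texture, u, v)
    return sample_nearest(palette_texture, du, dv)
```

For example, with `index_texture = [[(0.0, 0.0), (0.75, 0.0)]]` and `palette_texture = [["red", "green", "blue", "white"]]`, sampling the right half of the index texture lands on the last palette entry.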

What if you don’t want to make an Image?

(General Purpose GPU programming)

- Encode all your data in a texture map
- Write your program in a pixel shader
- Do a texture lookup to get data and “render” the result to the image buffer
- Instead of displaying the image buffer, read it back out and you’re done
- Or, if you need more processing, use the results as a new texture and process again
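The recipe above can be emulated on the CPU to show the data flow. In this Python sketch a "texture" is a list of rows, a "pixel shader" is a function called once per texel, and each pass writes to a fresh buffer that becomes the input of the next pass. The three-tap averaging kernel and all names are made up for the example; a real implementation would use render targets on the GPU.

```python
def run_pass(kernel, texture):
    """Emulate one render pass: run the kernel once per texel into a new buffer."""
    height, width = len(texture), len(texture[0])
    return [[kernel(texture, x, y) for x in range(width)]
            for y in range(height)]

def blur_kernel(texture, x, y):
    """Hypothetical 'pixel shader': average a texel with its horizontal neighbors."""
    width = len(texture[0])
    left = texture[y][max(x - 1, 0)]            # clamp at the left edge
    right = texture[y][min(x + 1, width - 1)]   # clamp at the right edge
    return (left + texture[y][x] + right) / 3.0

def gpgpu(data, kernel, passes):
    """Feed each pass's output back in as the next pass's input texture."""
    src = data
    for _ in range(passes):
        src = run_pass(kernel, src)
    return src   # the "read back" step
```

The kernel only ever reads from the input texture and writes its own output texel, which is the restriction that lets the hardware run all texels in parallel.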

General Tips

- Texture maps are the keys!
  - You can store 4 different values per texel; who says they have to be an image?
  - Be careful: texture maps are generally only 8 bits per channel, and values only range from 0-255
  - You can make texture maps up to 32 bits per channel
  - Values are always clamped to 0..1, so make sure you scale your values
- Use built-in functions wherever possible
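To make the scaling advice concrete, here is a Python sketch of storing an arbitrary value in one 8-bit channel: scale it into 0..1 using a known range, quantize to 0..255, and reverse the mapping when reading back. The `encode` and `decode` names and the explicit `lo`/`hi` range are assumptions; the point is that precision is limited to 256 steps across whatever range you pick.

```python
def encode(value, lo, hi):
    """Scale value from [lo, hi] into [0, 1], clamp, and quantize to 8 bits."""
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)      # values outside the range are clamped
    return round(t * 255)

def decode(byte, lo, hi):
    """Map an 8-bit channel value back into [lo, hi]."""
    return lo + (byte / 255.0) * (hi - lo)
```

With a range of -100..100, one step is 200 / 255, so a round trip can be off by up to about 0.4; a higher bit depth per channel shrinks that error.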

Resources and Cool Things

- developer.nvidia.com
  - FX Composer
  - NVIDIA SDK (lots of code demos)
- ATI SDK and RenderMonkey
- DirectX 9 SDK (an absolute must for programming GPUs)
- GPU Gems books
- OpenGL.org
- OpenGL Orange Book
- Introduction to 3D Game Programming with DirectX 9.0 by Frank Luna

Thank you and Questions?