Upload
augustine-harrison
View
213
Download
0
Tags:
Embed Size (px)
Citation preview
Meltdown 20011Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Optimizing DirectX* Graphic Applications
using Software Vertex Processing
Ronen Zohar/Kim Pallister
Intel Corporation
Meltdown 20012Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Agenda
• Do I need SW vertex processing?
• The PSGP
• Using SW vertex processing for maximum performance: memory, batching and render-states
• SW vertex processing and DirectX*’s 8.0 new features
Meltdown 20013Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Do I need SW vertex processing?
• Your publisher wants:– Eye-candy graphics, using all the latest 3D features– Lower the “minimum system requirements”– and many more
• Problem: older systems does not support all the eye-candy features
• Solution1: Disable features for low-end systems• Solution2: Use SW vertex processing (at least for the
features that you can) and keep some features
Meltdown 20014Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Inside DirectX* Graphics
Driver
Application
API Front-end
Communication to the driver (DDI)
SW Vertex processing (PSGP)
DirectX run-time
HW vertex processing path
Meltdown 20015Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
PSGP – Processor Specific Geometry Pipeline
• Part of DirectX graphics responsible for the SW vertex processing algorithms, optimized for the client’s processor
• DirectX’s 8.0 PSGP is optimized for:– Intel® Pentium® III processor– Intel Pentium 4 processor
Meltdown 20016Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
The PSGP
VB
Map stream to registers
Execute vertex shader code
Vertex shader path
Transformation Lighting Tex Gen
Fixed function path
Format data to output-FVF
Internal temporary VB’s
Clipper
IB
To driver
Meltdown 20017Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
PSGP Principles
• Use SIMD to process multiple vertices in each iteration– Vertical processing– Data is swizzled on the fly
• Prefetch input streams to hide memory latency• Write output to temporary VB’s based on XYZRHW
FVF code– In system memory if need to read back transformed vertices– In driver memory if no read-back is required– More on this later…
Meltdown 20018Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Input Stream Memory Allocation
• Create SW processed primitives in system memory (using the D3DUSAGE_SOFTWAREPROCESSING
usage create flags).• If the same VB is processed both in SW and
HW– Try to avoid it– If you must - create multiple copies, one in system
and one in driver memory
• If the primitive is never clipped, use the D3DUSAGE_DONOTCLIP usage flag
Meltdown 20019Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Primitive Batching
• Batch all the SW processed primitives together
• SW processes the entire VB range that you submit, if multiple primitives are using the same VB – squeeze the vertices range
• As with HW, bigger primitives are always better (the PSGP have long setup)
Meltdown 200110Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Primitive Batching (Cont)
• The PSGP is batching the processed vertices before sending them to HW (to reduce HW’s VB changes)
• Primitives are batched as long as their output FVF is equal:– XYZ | NORMAL | TEX1 and XYZ | DIFFUSE | TEX1 have
the same output FVF (XYZRHW | DIFFUSE | TEX1)– In SW mode, changing the VB FVF does
not mean a slowdown (unlike HW)
Meltdown 200111Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Clipping Render-state
• When clipping is enabled, the PSGP– Stores its output to system memory buffer
• As it need to read vertices in order to clip– Driver need to copy it across the AGP
• When clipping disabled writes to driver allocated buffer– No Copy here!
– Calculates clip flags (out-codes) for each vertex• more execution cycles per vertex
– Clips
• Minimize the amount of clipping• Use bounding boxes/spheres on your objects• Don’t forget to take the guard-band into account
Meltdown 200112Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Clipping Render-state (Cont)
• Pseudo-code to minimize clipping– If (BB is outside screen)
• Don’t render primitive
– Elseif (BB is inside guard-band)• Render with clipping off
– Else• Render with clipping on
• Typical game scene should have <10% of primitives clipped– Biggest problem is front plane clipping
Meltdown 200113Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Performance Render-states
• Specular – very expensive
• LocalViewer – smaller performance impact than HW, but still costs more
• NormalizeNormals – extra work for the PSGP, use only when needed
• Fog – written as “specular alpha”, can change PSGP’s output FVF
Meltdown 200114Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
DirectX* 8.0 Graphics New Features
• Point sprites
• Tweening
• Indexed vertex blending/ Indexed palette skinning
• Vertex Shaders
Meltdown 200115Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Point Sprites
• PSGP writes in native FVF format
• If HW does not support– Each point is expanded to quad, using the
point size calculated– The quad list is submitted to the driver
• Very slow solution if no HW support for point sprites, try to avoid it
Meltdown 200116Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Tweening
• Tween the position and normal before transformation (in SIMD)
• After tweening continuous the “standard” PSGP flow
• Costs very few cycles– But, for tweening and transformation only a vertex
shader would run faster– Try to compare your exact scenario to a vertex
shader
Meltdown 200117Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Indexed Skinning
• Transforms all vertices to matrix0 space– Using scalar code, with lookup for the
needed matrix
• Than continuous the normal PSGP flow• DirectX* 7 style skinning is supported
by some HW and may run faster, but requires multiple models and DrawPrimitive calls
Meltdown 200118Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Vertex shaders
• At vertex shader creation– The shader code is compiled to equivalent IA32
code– Using all possible assembly optimizations and
instructions available on client’s CPU to achieve fastest code
• At vertex shader execution– Calling the generated code
• SW vertex shaders have excellent performance
Meltdown 200119Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
SW Vs. HW Vertex Shaders
• Calculates more than one vertex in a single iteration– Based on the processor SIMD width
• Not every shader instruction is 1 clock– But, the CPU runs with much higher
frequency than today’s 3D graphics chips
Meltdown 200120Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
SW Vs. HW Vertex Shaders (Cont)
• Simple compilation sample:– Mul r0.xyz,v0,c0
• Movaps xmm0,[v0.x]• Mulps xmm0,[c0.x]• Movaps xmm1,[v0.y]• Mulps xmm1,[c0.y]• Movaps xmm2,[v0.z]• Mulps xmm2,[c0.z]• Movaps [r0.x],xmm0• Movaps [r0.y],xmm1• Movaps [r0.z],xmm2
Meltdown 200121Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
SW Vs. HW Vertex Shaders (cont)
• Data that you write, is data that the CPU have to calculate– Write only needed data (using the vertex shader
write mask)– Use the swizzle modifiers, and don’t duplicate
written data
• Vertex shader instructions are blended to achieve maximum performance– But, keeping dependency chains squeezed will
help the compiler in physical register assignments
Meltdown 200122Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Performance Tips for SW Vertex Shaders
• m?x? macros have better performance than the un-expanded macros
• Try to minimize the use of the address register– Due to the parallelism of the SW vertex
shader– Sort the VB by values used in the address
register
Meltdown 200123Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Performance Tips for SW Vertex Shaders
• lit, expp and logp are big cycle consumers– Use the worse accuracy (i.e. expp.x) when
possible – Use either .x or .z (but not both)– exp and log are worse than expp, logp
• Don’t implicitly saturate color values– it is done automatically
Meltdown 200124Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Optimized Vertex Shaderdp4 oPos.x, v0, c2
dp4 oPos.y, v0, c3
dp4 oPos.z, v0, c4
dp4 oPos.w, v0, c5
add r1, c6,-v0
dp3 r2, r1, r1
rsq r2, r2
mov oT0, v2
mul r1,r1,r2
dp3 r3, v1, r1
max r3,r3,c8
add r3, r3, c7
min oD0,r3,c9
m4x4 oPos, v0, c[2]
add r1.xyz, c6,-v0
dp3 r2.w, r1, r1
rsq r2.w, r2.w
mul r1.xyz,r1,r2.w
dp3 r2.w, v1, r1
max r2.w,r2.w,c8
add oD0.xyz, r2.w, c7
mov oT0, v2
Meltdown 200125Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Questions??
Intel, Pentium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright © 2001 Intel Corp.
Meltdown 200126Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Backup
Meltdown 200127Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Tweening + transformation vertex shader
• Mul r0.xyz,v0,c0.x // c0.x – α
• Mad r0.xyz,v1,c0.y,r0 // c0.y – (1- α)
• M4x4 oPos,r0,c1
• Mov oD[0].xyz,v2
• Mov oT[0].xy,v3
Meltdown 200128Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Not Equal Address Value
Const register file (x4)
1.0f 1.0f 1.0f 1.0f
2.0f 2.0f 2.0f 2.0f
3.0f 3.0f 3.0f 3.0f
1 2 1 2
Address register (x4)
Need to re-arrange a combination register for the SIMD instruction to use
(costs ~20 cycles) 1.0f 2.0f 1.0f 2.0f
Instruction argument
Meltdown 200129Copyright © 2001 Intel Corporation.
*Other names and brands may be claimed as the property of others.
Equal Address Value
Const register file (x4)
1.0f 1.0f 1.0f 1.0f
2.0f 2.0f 2.0f 2.0f
3.0f 3.0f 3.0f 3.0f
2 2 2 2
Address register (x4)
Accessing directly the x4 constant register file.
No penalty for “re-arranging” vertices
Address accessing mode is selected when storing address value
Instruction argument