29
Meltdown 2001 1 Copyright © 2001 Intel Corporation. *Other names and brands may be claimed as the property of others. Optimizing DirectX* Graphic Applications using Software Vertex Processing Ronen Zohar/Kim Pallister Intel Corporation

1 Copyright © 2001 Intel Corporation. * Other names and brands may be claimed as the property of others. Meltdown 2001 Optimizing DirectX* Graphic Applications

Embed Size (px)

Citation preview

Meltdown 20011Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Optimizing DirectX* Graphic Applications

using Software Vertex Processing

Ronen Zohar/Kim Pallister

Intel Corporation

Meltdown 20012Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Agenda

• Do I need SW vertex processing?

• The PSGP

• Using SW vertex processing for maximum performance: memory, batching and render-states

• SW vertex processing and DirectX*’s 8.0 new features

Meltdown 20013Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Do I need SW vertex processing?

• Your publisher wants:– Eye-candy graphics, using all the latest 3D features– Lower the “minimum system requirements”– and many more

• Problem: older systems does not support all the eye-candy features

• Solution1: Disable features for low-end systems• Solution2: Use SW vertex processing (at least for the

features that you can) and keep some features

Meltdown 20014Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Inside DirectX* Graphics

Driver

Application

API Front-end

Communication to the driver (DDI)

SW Vertex processing (PSGP)

DirectX run-time

HW vertex processing path

Meltdown 20015Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

PSGP – Processor Specific Geometry Pipeline

• Part of DirectX graphics responsible for the SW vertex processing algorithms, optimized for the client’s processor

• DirectX’s 8.0 PSGP is optimized for:– Intel® Pentium® III processor– Intel Pentium 4 processor

Meltdown 20016Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

The PSGP

VB

Map stream to registers

Execute vertex shader code

Vertex shader path

Transformation Lighting Tex Gen

Fixed function path

Format data to output-FVF

Internal temporary VB’s

Clipper

IB

To driver

Meltdown 20017Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

PSGP Principles

• Use SIMD to process multiple vertices in each iteration– Vertical processing– Data is swizzled on the fly

• Prefetch input streams to hide memory latency• Write output to temporary VB’s based on XYZRHW

FVF code– In system memory if need to read back transformed vertices– In driver memory if no read-back is required– More on this later…

Meltdown 20018Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Input Stream Memory Allocation

• Create SW processed primitives in system memory (using the D3DUSAGE_SOFTWAREPROCESSING

usage create flags).• If the same VB is processed both in SW and

HW– Try to avoid it– If you must - create multiple copies, one in system

and one in driver memory

• If the primitive is never clipped, use the D3DUSAGE_DONOTCLIP usage flag

Meltdown 20019Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Primitive Batching

• Batch all the SW processed primitives together

• SW processes the entire VB range that you submit, if multiple primitives are using the same VB – squeeze the vertices range

• As with HW, bigger primitives are always better (the PSGP have long setup)

Meltdown 200110Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Primitive Batching (Cont)

• The PSGP is batching the processed vertices before sending them to HW (to reduce HW’s VB changes)

• Primitives are batched as long as their output FVF is equal:– XYZ | NORMAL | TEX1 and XYZ | DIFFUSE | TEX1 have

the same output FVF (XYZRHW | DIFFUSE | TEX1)– In SW mode, changing the VB FVF does

not mean a slowdown (unlike HW)

Meltdown 200111Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Clipping Render-state

• When clipping is enabled, the PSGP– Stores its output to system memory buffer

• As it need to read vertices in order to clip– Driver need to copy it across the AGP

• When clipping disabled writes to driver allocated buffer– No Copy here!

– Calculates clip flags (out-codes) for each vertex• more execution cycles per vertex

– Clips

• Minimize the amount of clipping• Use bounding boxes/spheres on your objects• Don’t forget to take the guard-band into account

Meltdown 200112Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Clipping Render-state (Cont)

• Pseudo-code to minimize clipping– If (BB is outside screen)

• Don’t render primitive

– Elseif (BB is inside guard-band)• Render with clipping off

– Else• Render with clipping on

• Typical game scene should have <10% of primitives clipped– Biggest problem is front plane clipping

Meltdown 200113Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Performance Render-states

• Specular – very expensive

• LocalViewer – smaller performance impact than HW, but still costs more

• NormalizeNormals – extra work for the PSGP, use only when needed

• Fog – written as “specular alpha”, can change PSGP’s output FVF

Meltdown 200114Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

DirectX* 8.0 Graphics New Features

• Point sprites

• Tweening

• Indexed vertex blending/ Indexed palette skinning

• Vertex Shaders

Meltdown 200115Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Point Sprites

• PSGP writes in native FVF format

• If HW does not support– Each point is expanded to quad, using the

point size calculated– The quad list is submitted to the driver

• Very slow solution if no HW support for point sprites, try to avoid it

Meltdown 200116Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Tweening

• Tween the position and normal before transformation (in SIMD)

• After tweening continuous the “standard” PSGP flow

• Costs very few cycles– But, for tweening and transformation only a vertex

shader would run faster– Try to compare your exact scenario to a vertex

shader

Meltdown 200117Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Indexed Skinning

• Transforms all vertices to matrix0 space– Using scalar code, with lookup for the

needed matrix

• Than continuous the normal PSGP flow• DirectX* 7 style skinning is supported

by some HW and may run faster, but requires multiple models and DrawPrimitive calls

Meltdown 200118Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Vertex shaders

• At vertex shader creation– The shader code is compiled to equivalent IA32

code– Using all possible assembly optimizations and

instructions available on client’s CPU to achieve fastest code

• At vertex shader execution– Calling the generated code

• SW vertex shaders have excellent performance

Meltdown 200119Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

SW Vs. HW Vertex Shaders

• Calculates more than one vertex in a single iteration– Based on the processor SIMD width

• Not every shader instruction is 1 clock– But, the CPU runs with much higher

frequency than today’s 3D graphics chips

Meltdown 200120Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

SW Vs. HW Vertex Shaders (Cont)

• Simple compilation sample:– Mul r0.xyz,v0,c0

• Movaps xmm0,[v0.x]• Mulps xmm0,[c0.x]• Movaps xmm1,[v0.y]• Mulps xmm1,[c0.y]• Movaps xmm2,[v0.z]• Mulps xmm2,[c0.z]• Movaps [r0.x],xmm0• Movaps [r0.y],xmm1• Movaps [r0.z],xmm2

Meltdown 200121Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

SW Vs. HW Vertex Shaders (cont)

• Data that you write, is data that the CPU have to calculate– Write only needed data (using the vertex shader

write mask)– Use the swizzle modifiers, and don’t duplicate

written data

• Vertex shader instructions are blended to achieve maximum performance– But, keeping dependency chains squeezed will

help the compiler in physical register assignments

Meltdown 200122Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Performance Tips for SW Vertex Shaders

• m?x? macros have better performance than the un-expanded macros

• Try to minimize the use of the address register– Due to the parallelism of the SW vertex

shader– Sort the VB by values used in the address

register

Meltdown 200123Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Performance Tips for SW Vertex Shaders

• lit, expp and logp are big cycle consumers– Use the worse accuracy (i.e. expp.x) when

possible – Use either .x or .z (but not both)– exp and log are worse than expp, logp

• Don’t implicitly saturate color values– it is done automatically

Meltdown 200124Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Optimized Vertex Shaderdp4 oPos.x, v0, c2

dp4 oPos.y, v0, c3

dp4 oPos.z, v0, c4

dp4 oPos.w, v0, c5

add r1, c6,-v0

dp3 r2, r1, r1

rsq r2, r2

mov oT0, v2

mul r1,r1,r2

dp3 r3, v1, r1

max r3,r3,c8

add r3, r3, c7

min oD0,r3,c9

m4x4 oPos, v0, c[2]

add r1.xyz, c6,-v0

dp3 r2.w, r1, r1

rsq r2.w, r2.w

mul r1.xyz,r1,r2.w

dp3 r2.w, v1, r1

max r2.w,r2.w,c8

add oD0.xyz, r2.w, c7

mov oT0, v2

Meltdown 200125Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Questions??

[email protected]

[email protected]

Intel, Pentium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright © 2001 Intel Corp.

Meltdown 200126Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Backup

Meltdown 200127Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Tweening + transformation vertex shader

• Mul r0.xyz,v0,c0.x // c0.x – α

• Mad r0.xyz,v1,c0.y,r0 // c0.y – (1- α)

• M4x4 oPos,r0,c1

• Mov oD[0].xyz,v2

• Mov oT[0].xy,v3

Meltdown 200128Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Not Equal Address Value

Const register file (x4)

1.0f 1.0f 1.0f 1.0f

2.0f 2.0f 2.0f 2.0f

3.0f 3.0f 3.0f 3.0f

1 2 1 2

Address register (x4)

Need to re-arrange a combination register for the SIMD instruction to use

(costs ~20 cycles) 1.0f 2.0f 1.0f 2.0f

Instruction argument

Meltdown 200129Copyright © 2001 Intel Corporation.

*Other names and brands may be claimed as the property of others.

Equal Address Value

Const register file (x4)

1.0f 1.0f 1.0f 1.0f

2.0f 2.0f 2.0f 2.0f

3.0f 3.0f 3.0f 3.0f

2 2 2 2

Address register (x4)

Accessing directly the x4 constant register file.

No penalty for “re-arranging” vertices

Address accessing mode is selected when storing address value

Instruction argument