
Status – Week 240

Victor Moya

Summary

- Post Geometry Pipeline.
- Rasterization.
- Triangle Setup.
- Triangle Traversal.
- Interpolation.
- Current status.

Post Geometry Pipeline

- Divide by w?
- Clipping?
  - NVidia doesn't seem to have geometric clipping.
    - Alpha kill in NV2x for user clip planes.
  - ATI seems to have geometric clipping.
    - Proper user clipping.
    - No support for transformed and lit vertex clipping.
- What do we do?

Post Geometry Pipeline

- Clipping:
  - 6 frustum clip planes.
  - At least 6 user clip planes.
  - Hardware requirements:
    - Plane-edge intersection (?).
    - Generates new vertices (for a triangle: 1 or 2).
      - Interpolate the output attributes at each new vertex.
    - Can generate new triangles (for a triangle: 1).
      - Affects primitive assembly.
- At least frustum clipping should be fast.
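The clipping requirements above can be illustrated with the classic Sutherland-Hodgman approach (a standard technique; the slide does not say which algorithm the hardware would use). This minimal sketch clips one triangle against one plane in homogeneous clip space, creates new vertices at the plane-edge intersections, interpolates their output attributes, and re-triangulates the result as a fan. The vertex layout (position, attribute tuple) and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]
Vertex = Tuple[Vec4, Tuple[float, ...]]  # (clip-space position, output attributes)


def dist(plane: Vec4, pos: Vec4) -> float:
    """Plane equation evaluated at a homogeneous point; >= 0 means inside."""
    return sum(p * c for p, c in zip(plane, pos))


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Tuple[float, ...]:
    """Linear interpolation of two vectors, component by component."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def clip_polygon(poly: List[Vertex], plane: Vec4) -> List[Vertex]:
    """Sutherland-Hodgman clip of a convex polygon against a single plane."""
    out: List[Vertex] = []
    n = len(poly)
    for i in range(n):
        cur, nxt = poly[i], poly[(i + 1) % n]
        dc, dn = dist(plane, cur[0]), dist(plane, nxt[0])
        if dc >= 0:
            out.append(cur)
        if (dc >= 0) != (dn >= 0):
            # The edge crosses the plane: create a new vertex and interpolate
            # its position and every output attribute at the same parameter t.
            t = dc / (dc - dn)
            out.append((lerp(cur[0], nxt[0], t), lerp(cur[1], nxt[1], t)))
    return out


def fan(poly: List[Vertex]) -> List[Tuple[Vertex, Vertex, Vertex]]:
    """Re-triangulate the clipped polygon; extra triangles reach primitive assembly."""
    return [(poly[0], poly[i], poly[i + 1]) for i in range(1, len(poly) - 1)]
```

With the frustum plane (-1, 0, 0, 1), i.e. x <= w, a triangle with one vertex outside becomes a quadrilateral with two new vertices, and therefore two triangles.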

Post Geometry Pipeline

- Viewport transformation:
  - Delay it to the end of rasterization (at the conversion from fixed-point to floating-point fragment attributes).
- Use fixed-point device coordinates in [-1, 1] for rasterization.
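A minimal sketch of this split, assuming a hypothetical signed fixed-point format with FRAC_BITS = 16 fractional bits (the slide does not fix a format): device coordinates are quantized once, rasterization would operate on the integers, and the viewport transform is applied only when a value is converted back to floating point. `to_fixed` and `viewport` are illustrative names.

```python
FRAC_BITS = 16          # hypothetical precision, not specified in the source
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(ndc: float) -> int:
    """Quantize a normalized device coordinate in [-1, 1] to fixed point."""
    if not -1.0 <= ndc <= 1.0:
        raise ValueError("device coordinate outside [-1, 1]")
    return round(ndc * ONE)


def viewport(fixed: int, origin: float, size: float) -> float:
    """Delayed viewport transform: fixed-point device coordinate -> window coordinate."""
    ndc = fixed / ONE                      # back to floating point
    return origin + (ndc + 1.0) * 0.5 * size
```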

Rasterization

[Figure: pipeline block diagram with the boxes MC, StF, StL, Shader, StOC, StC, PA, TS, TT and Int; the links are annotated with the values 1, 2 and A*TL+L.]

- MC: Memory Controller
- StF: Streamer Fetch
- StL: Streamer Loader
- StOC: Streamer Output Cache
- StC: Streamer Commit
- Shader: Vertex Shader
- PA: Primitive Assembly
- TS: Triangle Setup
- TT: Triangle Traversal
- Int: Interpolation

Rasterization

We can divide it into three phases:
- Setup.
  - Calculate the linear equation coefficients, start values and slopes.
  - Perform area and face culling.
- Traversal.
  - Traverse the triangle, generating the fragments inside it.
  - Clip fragments against the frustum and user clip planes.
- Interpolation.
  - Interpolate all the fragment attributes for each generated fragment.

[Figure: rasterization block diagram with Primitive Assembly (input: vertex from the Streamer Commit), Triangle Setup, Triangle Traversal, Interpolate and Fragment FIFO; the connections are labelled vertex position, vertex attributes, edge equation values, and start & offset.]

Triangle Setup

- Use 2DH rasterization setup.
- Create a matrix (the inverse, or just the adjoint matrix?) from the three vertex 2DH positions.
- Calculate the determinant.
- Cull on sign (face culling) and zero (zero area).
- Send the edge equation coefficients and/or the start and slope values to Triangle Traversal.
- Optional: send other equations (1/w, clip planes, interpolators …).

Triangle Setup

Adjoint rasterization matrix adj(M):
- First level: 18 multiplications.
- Second level: 9 additions.

$$
\begin{aligned}
a_0 &= y_1 w_2 - y_2 w_1, & b_0 &= x_2 w_1 - x_1 w_2, & c_0 &= x_1 y_2 - x_2 y_1 \\
a_1 &= y_2 w_0 - y_0 w_2, & b_1 &= x_0 w_2 - x_2 w_0, & c_1 &= x_2 y_0 - x_0 y_2 \\
a_2 &= y_0 w_1 - y_1 w_0, & b_2 &= x_1 w_0 - x_0 w_1, & c_2 &= x_0 y_1 - x_1 y_0
\end{aligned}
$$
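In matrix form, using the standard adjugate identity: taking the three 2DH vertex positions as the columns of M, the coefficients above are the rows of adj(M), so each row is an edge equation that is zero at the two vertices it is not paired with.

$$
M = \begin{pmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ w_0 & w_1 & w_2 \end{pmatrix},
\qquad
\operatorname{adj}(M) = \begin{pmatrix} a_0 & b_0 & c_0 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix},
\qquad
\operatorname{adj}(M)\,M = \det(M)\,I
$$

$$
a_i x_j + b_i y_j + c_i w_j = \det(M)\,\delta_{ij}
$$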

Triangle Setup

Matrix determinant det(M):
- 1 DP3: {w0, w1, w2} · {c0, c1, c2}, that is $\det(M) = w_0 c_0 + w_1 c_1 + w_2 c_2$.

Inverse matrix $M^{-1}$ (not needed?):
- First level: 1 reciprocal: 1/det(M).
- Second level: 9 multiplications.

Edge equations: the rows of $M^{-1}$ (equivalently, of adj(M), which differs only by the factor det(M)):

$$E_0 = [a_0, b_0, c_0], \quad E_1 = [a_1, b_1, c_1], \quad E_2 = [a_2, b_2, c_2]$$
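A minimal sketch of the setup arithmetic from the last two slides, assuming floating point for readability: the adjoint (18 multiplications, 9 subtractions), the determinant as one DP3, zero-area and sign culling, and the optional inverse (1 reciprocal, 9 multiplications). The `cull_negative` flag and the choice of which sign counts as back-facing are assumptions, since the orientation convention is not given.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def adjoint_rows(v0: Vec3, v1: Vec3, v2: Vec3) -> List[Vec3]:
    """Rows [a_i, b_i, c_i] of adj(M); M has the 2DH positions (x, y, w) as columns."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    return [  # 18 multiplications, 9 subtractions
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]


def triangle_setup(v0: Vec3, v1: Vec3, v2: Vec3,
                   cull_negative: bool = True) -> Optional[List[Vec3]]:
    """Return the edge equations (rows of M^-1), or None when the triangle is culled."""
    adj = adjoint_rows(v0, v1, v2)
    det = v0[2] * adj[0][2] + v1[2] * adj[1][2] + v2[2] * adj[2][2]  # 1 DP3
    if det == 0.0:
        return None                      # zero area
    if cull_negative and det < 0.0:
        return None                      # face culling; which sign is "back" is assumed
    inv_det = 1.0 / det                  # 1 reciprocal
    return [tuple(k * inv_det for k in row) for row in adj]  # 9 multiplications
```

Swapping two vertices flips the sign of det(M), so the same triangle with the opposite winding is culled.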

Triangle Setup

1/w equation:
- Sum of the rows (parameter vector {1, 1, 1}).
- Can be calculated as the sum of the edge equations.

Additional equations:
- Parameter vector $\{u_0, u_1, u_2\} \times M^{-1}$: 3 DP3.

Frustum/viewport clip (here $x_0, y_0$ are the viewport origin and $w, h$ its width and height):

$$D_0 = [1, 0, -x_0], \quad D_1 = [-1, 0, x_0 + w], \quad D_2 = [0, 1, -y_0], \quad D_3 = [0, -1, y_0 + h]$$

[Figure: DP3 unit built from multipliers (*) feeding adders (+).]
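The 1/w, parameter and viewport clip equations can be sketched as follows, assuming floating point and vertices with positive w. Evaluating {1, 1, 1} × M^-1 at the projected position of vertex j gives 1/w_j, and {u0, u1, u2} × M^-1 gives u_j/w_j; the D_i equations are non-negative inside the viewport. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def inverse_rows(v0: Vec3, v1: Vec3, v2: Vec3) -> List[Vec3]:
    """Rows of M^-1 for M = [v0 v1 v2], computed as adj(M) / det(M)."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    adj = [
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]
    inv_det = 1.0 / (w0 * adj[0][2] + w1 * adj[1][2] + w2 * adj[2][2])
    return [tuple(k * inv_det for k in row) for row in adj]


def param_equation(u: Sequence[float], rows: List[Vec3]) -> Vec3:
    """{u0, u1, u2} x M^-1: one DP3 per coefficient, 3 DP3 in total."""
    return tuple(sum(u[i] * rows[i][k] for i in range(3)) for k in range(3))


def one_over_w_equation(rows: List[Vec3]) -> Vec3:
    """Parameter vector {1, 1, 1}, i.e. the sum of the edge equations."""
    return tuple(rows[0][k] + rows[1][k] + rows[2][k] for k in range(3))


def viewport_clip_equations(x0: float, y0: float, w: float, h: float) -> List[Vec3]:
    """D0..D3 as on the slide; (x0, y0) is the viewport origin, w x h its size."""
    return [(1.0, 0.0, -x0), (-1.0, 0.0, x0 + w), (0.0, 1.0, -y0), (0.0, -1.0, y0 + h)]


def evaluate(eq: Vec3, x: float, y: float) -> float:
    """Evaluate a linear equation [a, b, c] at screen position (x, y)."""
    return eq[0] * x + eq[1] * y + eq[2]
```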

Triangle Traversal

- Different algorithms (I don't know which is better):
  - Scanline.
  - Centerline (PixelVision).
  - Tiled (Neon, McCormack).
  - Incremental and Hierarchical Hilbert Order (McCool).
  - Others?

Triangle Traversal

- Effects of the traversal algorithm:
  - Can improve the texture access pattern (Neon, Hilbert).
  - Can improve framebuffer memory access (Neon).
- Traversal algorithm requirements:
  - Must produce at least one 2x2 fragment quad per cycle, or multiples (two or three 2x2 quads, etc.).
  - Must be efficient and generate as few fragments outside the triangle as possible.
- Antialiasing?
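As an illustration of the 2x2-per-step requirement, here is a naive bounding-box traversal that emits one 2x2 quad per step together with its coverage mask. It is not the Neon, McCormack or Hilbert-order algorithm, only the simplest walk that respects quad granularity. It assumes edge equations normalized so that a pixel is inside when all three values are non-negative (true for the rows of M^-1 when w > 0) and samples pixel centers at (x + 0.5, y + 0.5); the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Eq = Tuple[float, float, float]
Quad = Tuple[int, int, Tuple[bool, ...]]  # (quad x, quad y, 4-entry coverage mask)


def inside(edges: List[Eq], px: float, py: float) -> bool:
    """A sample is inside when all three edge equations are non-negative."""
    return all(a * px + b * py + c >= 0.0 for a, b, c in edges)


def traverse_quads(edges: List[Eq], x_min: int, y_min: int,
                   x_max: int, y_max: int) -> Iterator[Quad]:
    """Walk the box in 2x2 quads and emit every quad with at least one covered pixel."""
    for qy in range(y_min - y_min % 2, y_max + 1, 2):      # align quads to even pixels
        for qx in range(x_min - x_min % 2, x_max + 1, 2):
            mask = tuple(inside(edges, qx + dx + 0.5, qy + dy + 0.5)
                         for dy in (0, 1) for dx in (0, 1))
            if any(mask):
                yield qx, qy, mask
```

A hardware unit would test several quads per cycle and skip empty regions instead of scanning the whole box; the quad-aligned loop only shows what one step produces.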

Triangle Traversal

- Uses the edge equation coefficients and/or the start and slope values calculated from them to walk the triangle.
- One 'step' per cycle.
- Fixed-point arithmetic: integer addition.
- Requires saving state (2 to 3 saved states) or walking back (which spends cycles).
- Tests the sign of the edge equation values at n positions per cycle.
- May test frustum and znear/zfar clipping at the same time.
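A minimal sketch of the incremental walk with one saved state: edge values are plain integers (standing in for a fixed-point format the slides leave open), stepping right adds a_i, stepping down adds b_i, and the value at the start of each row is saved so the walker never has to walk back. Sampling at integer pixel coordinates and the `walk` helper are assumptions for the sketch.

```python
from typing import List, Tuple

IntEq = Tuple[int, int, int]  # (a, b, c), assumed already scaled to fixed point


def walk(edges: List[IntEq], width: int, height: int) -> List[Tuple[int, int]]:
    """Visit pixels row by row using only additions; return the covered pixels."""
    # Start values E_i(0, 0) = c_i; the slopes are a_i (x step) and b_i (y step).
    saved = [c for _, _, c in edges]          # saved state: edge values at row start
    covered: List[Tuple[int, int]] = []
    for y in range(height):
        cur = list(saved)                     # restore instead of walking back
        for x in range(width):
            if all(v >= 0 for v in cur):      # sign test of every edge value
                covered.append((x, y))
            cur = [v + a for v, (a, _, _) in zip(cur, edges)]   # step right
        saved = [v + b for v, (_, b, _) in zip(saved, edges)]   # step down
    return covered
```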

Triangle Traversal

- Hardware requirements:
  - Multiple fixed-point adders.
  - Multiple sign testers.
  - Registers for the current state (at least 3 for each edge equation) and for the saved states.
  - Registers for the edge slopes/increments (as many as fragments generated per cycle and edge equations?).

[Figure: traversal datapath with a Traversal Algorithm block, adders (+), and TE and ST units.]

Interpolation

Using the barycentric method, use the edge equation results (McCool):

$$F_0(x,y) = E_0(x,y), \quad F_1(x,y) = E_1(x,y), \quad F_2(x,y) = E_2(x,y)$$

Calculate the sum of the edge equations at the fragment:

$$R'(x,y) = F_0(x,y) + F_1(x,y) + F_2(x,y)$$

Calculate the reciprocal:

$$r = 1/R'(x,y)$$

Interpolate the attribute at the fragment:

$$p_k(x,y) = p_{k0}\, r F_0(x,y) + p_{k1}\, r F_1(x,y) + p_{k2}\, r F_2(x,y)$$
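The McCool-style barycentric interpolation can be sketched as below, assuming floating point and vertices with positive w. The edge equations can be taken straight from adj(M): the factor det(M) appears in every F_i and in R', so it cancels and no 1/det(M) is needed. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def adjoint_rows(v0: Vec3, v1: Vec3, v2: Vec3) -> List[Vec3]:
    """Edge equations E_i = [a_i, b_i, c_i] (rows of adj(M)) from the 2DH vertices."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    return [
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]


def interpolate_barycentric(edges: List[Vec3], p: Sequence[float],
                            x: float, y: float) -> float:
    """p_k(x, y) = sum_i p_ki * r * F_i(x, y), with r = 1 / (F_0 + F_1 + F_2)."""
    f = [a * x + b * y + c for a, b, c in edges]      # F_i(x, y) = E_i(x, y)
    r = 1.0 / (f[0] + f[1] + f[2])                    # sum, then 1 reciprocal
    rf = [r * fi for fi in f]                         # 3 multiplications, shared
    return p[0] * rf[0] + p[1] * rf[1] + p[2] * rf[2]  # 1 DP3 per parameter
```

For the vertices (0, 0, 1), (4, 0, 2), (0, 6, 2) with values 10, 20, 30, the screen point (1, 0), halfway between the first two projected vertices, yields 40/3 rather than the screen-linear 15, which is the perspective correction at work.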

Interpolation

Alternative (Olano & Greer):
- At setup:
  - Use the 2DH method and calculate coefficients for all the attributes.
  - Calculate the 1/w (sum of rows) coefficients.
  - Requires a vector-matrix multiplication per attribute.
- At traversal/interpolation:
  - Interpolate 1/w and the attributes using fixed-point incremental arithmetic.
  - Calculate the reciprocal of 1/w.
  - Multiply each interpolated attribute by the reciprocal of 1/w.
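A minimal sketch of the Olano & Greer variant, using floating point instead of the fixed-point incremental arithmetic the slide proposes: setup computes the u/w and 1/w equations once per triangle, and along a row each pixel costs one addition per parameter, one addition for 1/w, one reciprocal and one multiplication per parameter. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def inverse_rows(v0: Vec3, v1: Vec3, v2: Vec3) -> List[Vec3]:
    """Rows of M^-1 for M = [v0 v1 v2], computed as adj(M) / det(M)."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    adj = [
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]
    inv_det = 1.0 / (w0 * adj[0][2] + w1 * adj[1][2] + w2 * adj[2][2])
    return [tuple(k * inv_det for k in row) for row in adj]


def setup_equation(u: Sequence[float], rows: List[Vec3]) -> Vec3:
    """{u0, u1, u2} x M^-1: done once per triangle (3 DP3)."""
    return tuple(sum(u[i] * rows[i][k] for i in range(3)) for k in range(3))


def interpolate_row(eq_u: Vec3, eq_w: Vec3, y: float,
                    x_start: float, count: int) -> List[float]:
    """Walk `count` pixels to the right, updating u/w and 1/w with one add each."""
    u_over_w = eq_u[0] * x_start + eq_u[1] * y + eq_u[2]
    one_over_w = eq_w[0] * x_start + eq_w[1] * y + eq_w[2]
    values = []
    for _ in range(count):
        w = 1.0 / one_over_w             # reciprocal of 1/w (fixed cost)
        values.append(u_over_w * w)      # 1 multiplication per parameter
        u_over_w += eq_u[0]              # 1 addition per parameter
        one_over_w += eq_w[0]            # 1 addition for 1/w (fixed cost)
    return values
```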

Interpolation

Barycentric coordinates (McCool):
- No cost at setup.
- Store the parameter values at the three triangle vertices.
- Fixed cost: 1 addition, 1 reciprocal and 3 multiplications.
- Per parameter: 1 DP3.

Interpolation using Olano & Greer:
- Vector-matrix multiplication at setup per parameter and for 1/w: 3 DP3.
- Store the current state and slope increments for all the parameters and 1/w.
- Fixed cost: 1 addition, 1 reciprocal.
- Per parameter: 1 addition, 1 multiplication.

Interpolation

- How many attributes/parameters can be interpolated per cycle?
- XBOX:
  - 5 interpolators?
    - General interpolator: diffuse color + specular color (shared).
    - Texture interpolators: 4?
  - Note: each of these interpolators handles a 4D vector.

[Figure: interpolation datapath from VERTEX ATTRIBUTES to FRAGMENT ATTRIBUTES, built from adders (+), a reciprocal unit (1/x) and multipliers (*).]

Current status

- Implemented the Primitive Assembly box (with trivial degenerate triangle rejection).
- Added the GPU_VERTEX_OUTPUT_ATTRIBUTE register.
  - Boolean vector of MAX_VERTEX_ATTRIBUTES entries that stores whether each vertex output register is written by the shader (and therefore must be transmitted).
- The vertex transmission latency between the Shader and the Streamer Commit, and between the Streamer Commit and Primitive Assembly, is now determined by the number of output attributes.
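A minimal sketch of how a simulator could derive the per-vertex transmission latency from the GPU_VERTEX_OUTPUT_ATTRIBUTE boolean vector. The cost model (a fixed base latency plus a number of cycles per transmitted attribute), the constants `BASE_LATENCY` and `CYCLES_PER_ATTRIBUTE`, the value MAX_VERTEX_ATTRIBUTES = 16 and the function name are all assumptions for illustration.

```python
from typing import Sequence

MAX_VERTEX_ATTRIBUTES = 16   # assumed value for the sketch
BASE_LATENCY = 1             # hypothetical fixed cost per vertex
CYCLES_PER_ATTRIBUTE = 1     # hypothetical cost per transmitted attribute


def transmission_latency(output_attribute: Sequence[bool]) -> int:
    """Latency grows with the number of output registers the shader writes."""
    if len(output_attribute) != MAX_VERTEX_ATTRIBUTES:
        raise ValueError("expected one flag per vertex output attribute")
    written = sum(1 for flag in output_attribute if flag)
    return BASE_LATENCY + written * CYCLES_PER_ATTRIBUTE
```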

Current Status

- Started the Triangle Setup box and its support classes.

Current Status

Comments:
- Should the Streamer Loader to Shader transmission also have a transmission latency penalty?
- Where are the vertex output attributes stored?
- How many times must we pay the vertex transmission penalty?

Current Status

- Signal Analyzer: already works with large traces.

References

- Triangle Scan Conversion using 2D Homogeneous Coordinates, Marc Olano, Trey Greer.
- Tiled Polygon Traversal Using Half-Plane Edge Functions, Joel McCormack, Robert McNamara.
- Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization, Michael D. McCool, Chris Wales, Kevin Moule.
- A Parallel Algorithm for Polygon Rasterization, Juan Pineda.
