Status – Week 239
Victor Moya
Summary
- Primitive Assembly.
- Clipping triangle rejection.
- Rasterization.
- Triangle Setup.
- Early Z.
- Current status.
Primitive Assembly
- Works as an LRU cache.
- Asks the Post T&L cache for missing vertices.
- Checks whether any of the new vertices are already in the primitive assembly cache.
- Three vertices stored (2 for triangles, 3 for quads).
- Last vertex is always bypassed directly to Triangle Setup.
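The cache behaviour listed above can be sketched as a small LRU list of vertex indices. This is only an illustration: the class name, the capacity parameter and the request counter are hypothetical, and the bypass of the last vertex to Triangle Setup is not modelled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the primitive assembly cache: a tiny LRU list of
// vertex indices. Names and sizes are illustrative assumptions.
class PrimitiveAssemblyCache {
public:
    explicit PrimitiveAssemblyCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true on a hit. On a miss the vertex would be requested from
    // the Post T&L cache; this sketch only counts the request.
    bool access(std::uint32_t index) {
        auto it = std::find(entries_.begin(), entries_.end(), index);
        if (it != entries_.end()) {
            entries_.erase(it);          // hit: move to most-recently-used slot
            entries_.push_back(index);
            return true;
        }
        ++postTnLRequests_;
        if (capacity_ == 0) return false;  // nothing can be stored
        if (entries_.size() == capacity_) {
            entries_.erase(entries_.begin());  // evict the least recently used
        }
        entries_.push_back(index);
        return false;
    }

    std::size_t postTnLRequests() const { return postTnLRequests_; }

private:
    std::size_t capacity_;
    std::vector<std::uint32_t> entries_;  // front = least recently used
    std::size_t postTnLRequests_ = 0;
};
```

With a capacity of three, re-using a vertex refreshes its position, so a later miss evicts the oldest untouched vertex instead.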
Clipping Rejection
- Check clipping per vertex.
- Apply results per primitive.
- Reject full primitives.
- DP3 clip plane equation with vertex homogeneous coordinates.
  - Signed distance between the vertex and the plane.
- Reject the primitive when all its vertices are negative for any one of the planes.
- Problem: triangles with all vertices outside the clip volume, but with a region inside.
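A minimal sketch of the per-vertex test and per-primitive decision follows. It makes two assumptions that are not on the slide: the clip volume is the usual -w <= x, y, z <= w frustum, and the distance is a full four-component dot product (the slide mentions DP3, which would use one component fewer). All names are hypothetical.

```cpp
#include <array>

struct Vec4 { float x, y, z, w; };

// Signed distance of a homogeneous vertex to a plane (positive = inside).
inline float signedDistance(const Vec4& plane, const Vec4& v) {
    return plane.x * v.x + plane.y * v.y + plane.z * v.z + plane.w * v.w;
}

// Six planes of the assumed clip volume -w <= x, y, z <= w.
inline std::array<Vec4, 6> frustumPlanes() {
    return {{ { 1.0f,  0.0f,  0.0f, 1.0f}, {-1.0f,  0.0f,  0.0f, 1.0f},
              { 0.0f,  1.0f,  0.0f, 1.0f}, { 0.0f, -1.0f,  0.0f, 1.0f},
              { 0.0f,  0.0f,  1.0f, 1.0f}, { 0.0f,  0.0f, -1.0f, 1.0f} }};
}

// Reject only when every vertex is on the negative side of the same plane.
inline bool triviallyRejected(const std::array<Vec4, 3>& tri) {
    const std::array<Vec4, 6> planes = frustumPlanes();
    for (const Vec4& plane : planes) {
        bool allOutside = true;
        for (const Vec4& v : tri) {
            if (signedDistance(plane, v) >= 0.0f) { allOutside = false; break; }
        }
        if (allOutside) return true;
    }
    return false;
}
```

A triangle whose vertices are each outside a different plane is not rejected by this test, which is the problem case noted above: it may still cover part of the clip volume.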
Rasterization
[Diagram: Primitive Assembly -> Triangle Setup -> Traversal -> Interpolation. The Triangle Setup, Traversal and Interpolation boxes call the Rasterizer Emulator through Setup(vattrib[3]), nextFragment() and Interpolate(fr).]
Rasterization
- Boxes only carry timing.
  - Latency and throughput for the setup, traversal and interpolation operations.
- Rasterizer Emulator performs the actual work:
  - Setup algorithm.
  - Traversal algorithm.
  - Interpolation algorithm.
Rasterization
- Timing and rasterization algorithm are independent.
- Rasterization boxes can simulate as many ‘stages’ as needed without worrying about functionality.
- Rasterizer emulator offers an interface for all the rasterization operations:
  - Setup(), Area(), AreaSign(), GenerateNextFragment(), GenerateNextTile(), InterpolateFragment(), InterpolateFragmentAttribute(), etc.
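The split between timing (boxes) and functionality (emulator) could look roughly like the sketch below. It assumes C++17 and uses hypothetical names throughout: `LatencySignal`, `RastEmuStub` and `SetupBoxSketch` are stand-ins, not the simulator's actual classes.

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

// Hypothetical fixed-latency signal: a value written at cycle c becomes
// readable at cycle c + latency.
template <typename T>
class LatencySignal {
public:
    explicit LatencySignal(std::uint64_t latency) : latency_(latency) {}

    void write(std::uint64_t cycle, T value) {
        inFlight_.emplace_back(cycle + latency_, std::move(value));
    }

    // Returns the oldest value once its latency has elapsed.
    std::optional<T> read(std::uint64_t cycle) {
        if (inFlight_.empty() || inFlight_.front().first > cycle) return std::nullopt;
        T value = std::move(inFlight_.front().second);
        inFlight_.pop_front();
        return value;
    }

private:
    std::uint64_t latency_;
    std::deque<std::pair<std::uint64_t, T>> inFlight_;
};

// Stand-in for the functional side; the real emulator interface is richer.
struct RastEmuStub {
    int setupCalls = 0;
    void setup(int /*triangleId*/) { ++setupCalls; }
};

// The box only moves tokens through a signal; the work happens in the emulator.
class SetupBoxSketch {
public:
    SetupBoxSketch(RastEmuStub& emu, std::uint64_t latency) : emu_(emu), setup_(latency) {}

    void accept(std::uint64_t cycle, int triangleId) { setup_.write(cycle, triangleId); }

    // Returns the triangle leaving the box this cycle, if any.
    std::optional<int> clock(std::uint64_t cycle) {
        std::optional<int> tri = setup_.read(cycle);
        if (tri) emu_.setup(*tri);
        return tri;
    }

private:
    RastEmuStub& emu_;
    LatencySignal<int> setup_;
};
```

Changing the latency passed to the box changes the timing without touching the emulator, which is the independence the slide describes.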
Rasterization
Setup Box:
- Get the triangle vertex positions and attributes.
- Send to internal signal ‘setup’ -> simulates setup latency.
- Read internal signal ‘setup’.
- RastEmu::setup(vattrib[3]).
- RastEmu::getArea().
- Check area sign and face culling method:
  - Reject if area is zero or near zero.
  - Reject if face culling enabled and wrong sign.
  - Invert coefficient signs if front face culling.
- Issue triangle to triangle traversal.
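The area and culling checks of the Setup box can be sketched as a small decision function. Several points are assumptions rather than slide content: positive area is taken to mean front facing, `epsilon` is a placeholder for "near zero", and the sketch inverts coefficients for any surviving back-facing triangle, which covers the front-face-culling case on the slide but may go further than the actual design.

```cpp
#include <cmath>

enum class CullMode { None, Front, Back };
enum class SetupResult { Rejected, Accepted, AcceptedInverted };

// Decide what to do with a triangle from its signed area.
// Assumptions: positive area = front facing; epsilon = near-zero threshold.
inline SetupResult classifyTriangle(float signedArea, CullMode cull,
                                    float epsilon = 1e-6f) {
    // Degenerate triangles produce no fragments.
    if (std::fabs(signedArea) <= epsilon) return SetupResult::Rejected;

    const bool frontFacing = signedArea > 0.0f;
    if (cull == CullMode::Back && !frontFacing) return SetupResult::Rejected;
    if (cull == CullMode::Front && frontFacing) return SetupResult::Rejected;

    // A surviving back-facing triangle has edge equations of the opposite
    // sign, so its coefficients are inverted before traversal.
    return frontFacing ? SetupResult::Accepted : SetupResult::AcceptedInverted;
}
```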
Rasterization
Traversal Box:
- Read triangles from Setup box.
- Set start point: RastEmu::setStart().
  - Optional? Algorithm dependent?
- Ask for next fragment/fragment tile: write to internal signal ‘next fragment’. Simulates fragment generation latency.
- Read generated fragment: read ‘next fragment’ signal.
  - RastEmu::nextFragment().
- Send fragment to interpolation.
Rasterization
Traversal Box:
- Other algorithms might not provide a fragment per cycle, or might have variable latency for each generated fragment.
  - RastEmu::nextFragment() could return a boolean.
  - RastEmu::nextFragment() could return the number of generated fragments (or a mask for a tile).
  - RastEmu::nextFragment() could return the ‘amount of work’.
  - Additional interface functions for fragment generation and triangle traversal.
- Is fragment culling done in the rasterizer emulator?
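One way to combine the return-value alternatives listed above is a single result record. The type below is purely hypothetical (it assumes C++14, a tile of at most eight fragments and a cycle-count notion of "amount of work") and is meant as a discussion aid, not as the emulator's interface.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical result of one traversal step, covering the boolean, the
// fragment count / tile mask and the 'amount of work' alternatives.
struct TraversalStep {
    bool triangleDone;         // no more fragments will be produced
    std::uint8_t coverage;     // one bit per fragment of the tile
    std::uint32_t workCycles;  // work the box should account for

    bool hasFragments() const { return coverage != 0; }
    std::size_t fragmentCount() const { return std::bitset<8>(coverage).count(); }
};

// The box could use the reported work to decide when to ask again;
// at least one cycle always passes between requests.
inline std::uint64_t nextRequestCycle(std::uint64_t cycle, const TraversalStep& step) {
    return cycle + std::max<std::uint32_t>(1u, step.workCycles);
}
```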
Rasterization
Interpolation box:
- Read fragments from Traversal box.
- Interpolate -> write to ‘interpolate’ signal.
  - per fragment, or
  - per attribute
- Read ‘interpolate’ signal.
- RastEmu::interpolate().
- Repeat if per attribute/group of attributes.
- Send to fragment FIFO.
Triangle Setup
- Using hardware equivalent to a vertex shader.
- Use multithreading to hide dependency latencies.
  - Same as shaders.
  - Multiple triangles at setup at the same time.
- Minimum setup latency:
  - 6 cycles (just adj(M) using McCool method).
- Minimum initialization latency:
  - 1 cycle using multithreading and enough registers.
Triangle Setup
Registers:
- rA, rB, rC -> Edge equations a, b and c coefficients (adj(M) and M^-1 matrix rows).
- rX, rY, rW -> the 3 vertices x, y and w coordinates (M columns).
- rD, rI -> matrix determinant and its reciprocal.
- rR -> 1/w equation coefficients.
- rU -> parameter values at the three vertices.
- rP -> parameter equation coefficients.
Triangle Setup
Adj(M): (at least 6 cycles + dep. lat.)
    mul rC.xyz, rX.yzx, rY.zxy
    mul rB.xyz, rX.zxy, rW.yzx
    mul rA.xyz, rY.yzx, rW.zxy
    mad rC.xyz, rX.zxy, rY.yzx, -rC
    mad rB.xyz, rX.yzx, rW.zxy, -rB
    mad rA.xyz, rY.zxy, rW.yzx, -rA
Triangle Setup
det(M): (1 cycle)
    dp3 rD.x, rC, rW
M^-1: (4 cycles + dep. lat.)
    rcc rI.x, rD.x
    mul rC, rC, rI
    mul rB, rB, rI
    mul rA, rA, rI
Triangle Setup
1/w coefficients: (2 cycles + dep. lat.)
    add rR, rA, rB
    add rR, rR, rC
Parameter coefficients: (3 cycles)
    dp3 rU.x, rP, rA
    dp3 rU.y, rP, rB
    dp3 rU.z, rP, rC
Early Z
- Could be implemented before interpolation.
  - Interpolate the triangle Z (z/w) first.
- Could save some calculations.
- Would it save time?
Current Status
- (to be done)