View
214
Download
0
Tags:
Embed Size (px)
Citation preview
Optimization of Mesh Locality forOptimization of Mesh Locality forTransparent Vertex CachingTransparent Vertex Caching
Optimization of Mesh Locality forOptimization of Mesh Locality forTransparent Vertex CachingTransparent Vertex Caching
Hugues HoppeHugues Hoppe
Microsoft ResearchMicrosoft Research
SIGGRAPH 99SIGGRAPH 99
Hugues HoppeHugues Hoppe
Microsoft ResearchMicrosoft Research
SIGGRAPH 99SIGGRAPH 99
Triangle meshesTriangle meshesTriangle meshesTriangle meshes
geometricgeometricprocessingprocessinggeometricgeometricprocessingprocessing
rasterizationrasterizationrasterizationrasterization
graphics processorgraphics processorgraphics processorgraphics processor
frame bufferframe bufferframe bufferframe buffer
busbusbusbus(e.g.(e.g. AGP) AGP)(e.g.(e.g. AGP) AGP)
texturetextureimageimagetexturetextureimageimage
............texturetextureimageimagetexturetextureimageimage
geometrygeometrygeometrygeometry
verticesverticesverticesvertices facesfacesfacesfaces
mesh in memorymesh in memorymesh in memorymesh in memory
System architectureSystem architectureSystem architectureSystem architecture
texturetexturecachecachetexturetexturecachecache
busbusbusbus
CPUCPUCPUCPU
L2 cacheL2 cacheL2 cacheL2 cache
L1 cacheL1 cacheL1 cacheL1 cache
bottleneckbottleneckbottleneckbottleneck
verticesverticesverticesvertices
????
Previous workPrevious workPrevious workPrevious work
16-entry FIFO buffer16-entry FIFO buffer [Deering95, Chow97][Deering95, Chow97]
stack bufferstack buffer [BarYehuda-Gotsman96][BarYehuda-Gotsman96]
mesh compressionmesh compression [Taubin-Rossignac98][Taubin-Rossignac98][Gumhold-Strasser98][Gumhold-Strasser98]……
16-entry FIFO buffer16-entry FIFO buffer [Deering95, Chow97][Deering95, Chow97]
stack bufferstack buffer [BarYehuda-Gotsman96][BarYehuda-Gotsman96]
mesh compressionmesh compression [Taubin-Rossignac98][Taubin-Rossignac98][Gumhold-Strasser98][Gumhold-Strasser98]……
compressedcompressedgeometry streamgeometry stream
compressedcompressedgeometry streamgeometry stream
geometricgeometricprocessingprocessinggeometricgeometricprocessingprocessing
graphics processorgraphics processorgraphics processorgraphics processor
busbusbusbus
meshmeshbufferbuffermeshmeshbufferbuffer
parsingparsinglogiclogic
parsingparsinglogiclogicv1v1 cc v2-v1v2-v1 cc
v3-v2v3-v2 cc v4-v3v4-v3 ccv5-v4v5-v4 cc cc
cc
nθ nθ
Previous workPrevious workPrevious workPrevious work
Drawbacks:Drawbacks: only static geometryonly static geometry new APInew API not backward compatiblenot backward compatible
Drawbacks:Drawbacks: only static geometryonly static geometry new APInew API not backward compatiblenot backward compatible
compressedcompressedgeometry streamgeometry stream
compressedcompressedgeometry streamgeometry stream
geometricgeometricprocessingprocessinggeometricgeometricprocessingprocessing
graphics processorgraphics processorgraphics processorgraphics processor
busbusbusbus
meshmeshbufferbuffermeshmeshbufferbuffer
parsingparsinglogiclogic
parsingparsinglogiclogicv1v1 cc v2-v1v2-v1 cc
v3-v2v3-v2 cc v4-v3v4-v3 ccv5-v4v5-v4 cc cc
cc
Our approachOur approachOur approachOur approach
texturetextureimageimagetexturetextureimageimage
......texturetexturecachecachetexturetexturecachecachetexturetexture
imageimagetexturetextureimageimage
geometricgeometricprocessingprocessinggeometricgeometricprocessingprocessing
rasterizationrasterizationrasterizationrasterization
graphics processorgraphics processorgraphics processorgraphics processor
busbusbusbus
vertexvertexcachecachevertexvertexcachecache
traditionaltraditionalmesh APImesh APItraditionaltraditionalmesh APImesh API
vertexvertexarrayarrayvertexvertexarrayarray
indexedindexedstripsstrips
indexedindexedstripsstrips
Optimize ordering!Optimize ordering!Optimize ordering!Optimize ordering!
No explicit cache managementNo explicit cache managementNo explicit cache managementNo explicit cache management
applicationapplicationapplicationapplication
Transparent vertex cachingTransparent vertex cachingTransparent vertex cachingTransparent vertex caching
Pros:Pros: animated geometryanimated geometry application program unchangedapplication program unchanged backward compatible on legacy hardwarebackward compatible on legacy hardware
Cons:Cons: less compression (but still a factor ~2)less compression (but still a factor ~2)
Pros:Pros: animated geometryanimated geometry application program unchangedapplication program unchanged backward compatible on legacy hardwarebackward compatible on legacy hardware
Cons:Cons: less compression (but still a factor ~2)less compression (but still a factor ~2)
traditional mesh APItraditional mesh APItraditional mesh APItraditional mesh API
vertexvertexarrayarrayvertexvertexarrayarray
indexedindexedstripsstrips
indexedindexedstripsstrips
graphicsgraphicssystemsystem
graphicsgraphicssystemsystem
1111
2222
3333
44 5555
6666
7777
Indexed triangle stripsIndexed triangle stripsIndexed triangle stripsIndexed triangle strips
v1v1v1v1v2v2v2v2v3v3v3v3v4v4v4v4v5v5v5v5v6v6v6v6v7v7v7v7
1111 2222 33334444 3333 55556666
2222 7777 44445555
positionpositionx y zx y z
positionpositionx y zx y z
normalnormalnx ny nznx ny nznormalnormal
nx ny nznx ny nzcolorcolorrgbargbacolorcolorrgbargba
texture1 texture1 u vu v
texture1 texture1 u vu v
texture2 texture2 u vu v
texture2 texture2 u vu v
~ 32 bytes~ 32 bytes~ 32 bytes~ 32 bytes
~ 2 bytes~ 2 bytes~ 2 bytes~ 2 bytes
Cache parametersCache parametersCache parametersCache parameters
size size ??size size ??
replacementreplacementpolicy policy ??
replacementreplacementpolicy policy ??
vertex cachevertex cachevertex cachevertex cache
16 entries16 entries16 entries16 entries
FIFOFIFOFIFOFIFO
Vertex data accessVertex data accessVertex data accessVertex data access
= cache hit= cache hit= cache hit= cache hit = cache miss= cache miss= cache miss= cache miss
traditional stripstraditional stripstraditional stripstraditional strips with cachingwith cachingwith cachingwith caching
transfer ~transfer ~0.5 0.5 vertex/trivertex/tritransfer ~transfer ~0.5 0.5 vertex/trivertex/triassume in cacheassume in cacheassume in cacheassume in cache
transfer ~transfer ~1.0 1.0 vertex/trivertex/tritransfer ~transfer ~1.0 1.0 vertex/trivertex/tri
# misses# misses# misses# misses 0000 1111 2222 3333
Vertex data accessVertex data accessVertex data accessVertex data access
traditional stripstraditional stripstraditional stripstraditional strips with cachingwith cachingwith cachingwith caching
transfer ~transfer ~0.5 0.5 vertex/trivertex/tritransfer ~transfer ~0.5 0.5 vertex/trivertex/tritransfer ~transfer ~1.0 1.0 vertex/trivertex/tritransfer ~transfer ~1.0 1.0 vertex/trivertex/tri
ExampleExampleExampleExamplebefore optimizationbefore optimizationbefore optimizationbefore optimization
# misses# misses# misses# misses 0000 1111 2222 3333
after optimizationafter optimizationafter optimizationafter optimization
47% bandwidth reduction47% bandwidth reduction47% bandwidth reduction47% bandwidth reduction
Optimization problemOptimization problemOptimization problemOptimization problem
Given mesh,Given mesh,find strips minimizing bus bandwidthfind strips minimizing bus bandwidth
( strips correspond to ordering of faces ( strips correspond to ordering of faces F F ))
Given mesh,Given mesh,find strips minimizing bus bandwidthfind strips minimizing bus bandwidth
( strips correspond to ordering of faces ( strips correspond to ordering of faces F F ))
232
FbFrFPF )ˆ(
min 232
FbFrFPF )ˆ(
min
cache miss ratecache miss ratecache miss ratecache miss rate # vertex indices# vertex indices# vertex indices# vertex indices
Two reordering techniquesTwo reordering techniquesTwo reordering techniquesTwo reordering techniques
Greedy strip-growingGreedy strip-growing fast: 40,000 faces/secfast: 40,000 faces/sec
Local optimizationLocal optimization improve initial greedy solutionimprove initial greedy solution
very slowvery slow
Greedy strip-growingGreedy strip-growing fast: 40,000 faces/secfast: 40,000 faces/sec
Local optimizationLocal optimization improve initial greedy solutionimprove initial greedy solution
very slowvery slow
Greedy strip-growingGreedy strip-growingGreedy strip-growingGreedy strip-growing
Inspired by [Chow97]Inspired by [Chow97] Inspired by [Chow97]Inspired by [Chow97]
To decide when to restart strip,To decide when to restart strip, perform lookahead cache simulations perform lookahead cache simulations
To decide when to restart strip,To decide when to restart strip, perform lookahead cache simulations perform lookahead cache simulations
11
2233
44
When to restart strip?When to restart strip?When to restart strip?When to restart strip?
good strip lengthgood strip lengthgood strip lengthgood strip length
22 1133
33 2244
11
44 33
22 11
44
33 22 11
44 33 22
11
44 33
22 11
44
33 22 11
(cache size 4)(cache size 4)(cache size 4)(cache size 4)
When to restart strip?When to restart strip?When to restart strip?When to restart strip?
good strip lengthgood strip lengthgood strip lengthgood strip length strip too longstrip too longstrip too longstrip too long
44
33 22 11
22 1133
33 2244
11
44 33
22 11
44
33 22 11
44 33 22
11
44 33
22
11
33 11
22
44
44 22
33 11
33 11
44 22
44 22
33 11
jump in miss rate!jump in miss rate! jump in miss rate!jump in miss rate!
(cache size 4)(cache size 4)(cache size 4)(cache size 4)
Lookahead simulationsLookahead simulationsLookahead simulationsLookahead simulations
Perform Perform ss simulations simulations Perform Perform ss simulations simulations
44
33 22 11
(a)(a) restart immediately, after 0 faces restart immediately, after 0 faces(a)(a) restart immediately, after 0 faces restart immediately, after 0 faces(b)(b) restart after restart after 0 < i < s0 < i < s faces faces(b)(b) restart after restart after 0 < i < s0 < i < s faces faces
If If (a)(a) is best, restart strip is best, restart strip If If (a)(a) is best, restart strip is best, restart strip
ResultResultResultResult
traditional long stripstraditional long stripstraditional long stripstraditional long strips
face orderface orderface orderface order
within stripwithin stripwithin stripwithin strip
strip restartstrip restartstrip restartstrip restart
ResultResultResultResult
traditional long stripstraditional long stripstraditional long stripstraditional long strips greedy strip-growinggreedy strip-growinggreedy strip-growinggreedy strip-growing
ResultResultResultResultbeforebeforebeforebefore afterafterafterafter
45.8 bytes/triangle45.8 bytes/triangle45.8 bytes/triangle45.8 bytes/triangle 25.5 bytes/triangle25.5 bytes/triangle25.5 bytes/triangle25.5 bytes/triangle
Local optimizationLocal optimizationLocal optimizationLocal optimization
Apply perturbations to face ordering if cost is lowered:Apply perturbations to face ordering if cost is lowered:Apply perturbations to face ordering if cost is lowered:Apply perturbations to face ordering if cost is lowered:
Initial order FInitial order FInitial order FInitial order F
FF1..x-11..x-1FF1..x-11..x-1 FFy+1..my+1..mFFy+1..my+1..mF’=ReflectF’=Reflectx,yx,y(F)(F)F’=ReflectF’=Reflectx,yx,y(F)(F)
FF1..x-11..x-1FF1..x-11..x-1 FFy+1..my+1..mFFy+1..my+1..mF’=Insert1F’=Insert1x,yx,y(F)(F)F’=Insert1F’=Insert1x,yx,y(F)(F) FFyyFFyy
FF1..x-11..x-1FF1..x-11..x-1 FFy+1..my+1..mFFy+1..my+1..mF’=Insert2F’=Insert2x,yx,y(F)(F)F’=Insert2F’=Insert2x,yx,y(F)(F) FFy-1..yy-1..yFFy-1..yy-1..y
FFyyFFyy
xxxx yyyy
FFxxFFxx
FFx..y-1x..y-1FFx..y-1x..y-1
FFy..xy..xFFy..xy..xFFy..xy..xFFy..xy..x
FFyyFFyy
FFy-1..yy-1..yFFy-1..yy-1..y FFx..y-2x..y-2FFx..y-2x..y-2
FFx..y-1x..y-1FFx..y-1x..y-1
ResultResultResultResult
greedy strip-growinggreedy strip-growinggreedy strip-growinggreedy strip-growing local optimizationlocal optimizationlocal optimizationlocal optimization
25.5 bytes/triangle25.5 bytes/triangle25.5 bytes/triangle25.5 bytes/triangle 24.2 bytes/triangle24.2 bytes/triangle24.2 bytes/triangle24.2 bytes/triangle
Bandwidth ResultsBandwidth ResultsBandwidth ResultsBandwidth Results
0
10
20
30
40
50
Byt
es /
tri
ang
le
fandisk gameguy bunny2k bunny4k bunny buddha schooner
Original Greedy strips Local optimization
Improvement by factor of 1.6 – 1.9Improvement by factor of 1.6 – 1.9Improvement by factor of 1.6 – 1.9Improvement by factor of 1.6 – 1.9
Choice of cache sizeChoice of cache sizeChoice of cache sizeChoice of cache size
0
0.5
1
1.5
2
2.5
3
0 10 20 30 40 50 60 70Cache size
Cac
he
mis
s r
ate
r
0
0.5
1
1.5
2
2.5
3
0 10 20 30 40 50 60 70Cache size
Cac
he
mis
s r
ate
r
size 16 sufficient for most gainsize 16 sufficient for most gainsize 16 sufficient for most gainsize 16 sufficient for most gain
Cache replacement policyCache replacement policyCache replacement policyCache replacement policy
22 1133
33 2244
11
44 33
22 11
44
33 22 11
44 33 22
11
all is OKall is OK all is OKall is OK
FIFOFIFOFIFOFIFO LRULRULRULRU(cache size 4)(cache size 4)(cache size 4)(cache size 4)
Cache replacement policyCache replacement policyCache replacement policyCache replacement policy
22 1133
11 4433
22
33 44
22 11
33 11
44 22
44 33
22 11
33 11 44
22
FIFOFIFOFIFOFIFO LRULRULRULRU
22 1133
33 2244
11
44 33
22 11
44
33 22 11
44 33 22
11
strips twice as longstrips twice as long strips twice as longstrips twice as long
(cache size 4)(cache size 4)(cache size 4)(cache size 4)
ComparisonComparisonComparisonComparison
FIFOFIFOFIFOFIFO LRULRULRULRU
FIFOFIFOFIFOFIFO LRULRULRULRU
ComparisonComparisonComparisonComparison
SummarySummarySummarySummary
Vertex caching reduces geometryVertex caching reduces geometrybandwidth by factor of bandwidth by factor of 1.6 to 1.9 1.6 to 1.9
Transparent to application:Transparent to application: simply pre-process the models (fast)simply pre-process the models (fast)
Still efficient on legacy hardwareStill efficient on legacy hardware
Supports dynamic geometrySupports dynamic geometry
Vertex caching reduces geometryVertex caching reduces geometrybandwidth by factor of bandwidth by factor of 1.6 to 1.9 1.6 to 1.9
Transparent to application:Transparent to application: simply pre-process the models (fast)simply pre-process the models (fast)
Still efficient on legacy hardwareStill efficient on legacy hardware
Supports dynamic geometrySupports dynamic geometry
Future workFuture workFuture workFuture work
Issue of cache sizeIssue of cache size Find face ordering good for all sizes?Find face ordering good for all sizes? Standardize on size 16?Standardize on size 16? Reprocess mesh at load timeReprocess mesh at load time
Interaction with texture cachingInteraction with texture caching
Cache efficiency during runtime LODCache efficiency during runtime LOD
Issue of cache sizeIssue of cache size Find face ordering good for all sizes?Find face ordering good for all sizes? Standardize on size 16?Standardize on size 16? Reprocess mesh at load timeReprocess mesh at load time
Interaction with texture cachingInteraction with texture caching
Cache efficiency during runtime LODCache efficiency during runtime LOD