The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Cache-Oblivious Mesh Layouts
Sung-Eui Yoon¹, Peter Lindstrom², Valerio Pascucci², Dinesh Manocha¹
1: University of North Carolina - Chapel Hill
2: Lawrence Livermore National Laboratory
http://gamma.cs.unc.edu/COL
Goal
• Compute cache-coherent layouts of polygonal meshes
  ♦ For geometric processing and visualization
  ♦ Handle any kind of polygonal model (e.g., irregular geometry)
Motivation
• High growth rate of computational power of CPUs and GPUs
Growth rate during 1993 – 2004
Courtesy: http://www.hcibook.com/e3/online/moores-law/
Memory Hierarchies and Caches
CPU or GPU
Fast memory or cache
Slow memory
Disk
Block transfer
Access time: 10^0 ns (fast memory or cache), 10^2 ns (slow memory), 10^6 ns (disk)
Cache-Coherent Layouts
• Cache-Aware
  ♦ Optimized for particular cache parameters (e.g., block size)
• Cache-Oblivious
  ♦ Minimizes data access time without any knowledge of cache parameters
  ♦ Directly applicable to various hardware and memory hierarchies
CAD Model – Double Eagle Tanker Model
82 million triangles
Irregular distribution of geometry
Isosurface and Scanned Models
Isosurface: 100M triangles
St. Matthew: 372M triangles
Main Contribution
• Algorithm to compute cache-oblivious layouts of polygonal meshes
Cache-oblivious metric
Multilevel optimization framework
Applicable to hierarchical representations
Live Demo – View-Dependent Rendering (VDR)
GeForce Go 6800 Ultra
• Based on multiresolution hierarchy
  ♦ Dynamically computes simplification
  ♦ Cache-oblivious layout is used to minimize GPU vertex cache misses
Related Work
• Cache-coherent algorithms
• Mesh layouts
Cache-Coherent Algorithms
• Cache-aware [Coleman and McKinley 95, Vitter 01, Sen et al. 02]
• Cache-oblivious [Frigo et al. 99, Arge et al. 04]
Focus on specific problems such as sorting and linear algebra computations
Mesh Layouts
• Rendering sequences
  ♦ Triangle strips
  ♦ [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02]
• Processing sequences
  ♦ [Isenburg and Gumhold 03, Isenburg and Lindstrom 04]
Assume that the access pattern globally follows the layout order!
Mesh Layouts
• Space-filling curves
  ♦ [Sagan 94, Velho and Gomes 91, Pascucci and Frank 01, Lindstrom and Pascucci 01, Gopi and Eppstein 04]
Assume geometric regularity!
Outline
• Overview
• Cache-oblivious metric
• Results
Overview
Multilevel optimization
Cache-oblivious metric
Local permutations
Input graph: vertices va, vb, vc, vd
Result 1D layout: va vb vd vc
Graph-based Representation
• Undirected graph, G = (V, E)
  ♦ Represents access patterns of applications
• Vertex
  ♦ Data element (e.g., mesh vertex or mesh triangle)
• Edge
  ♦ Connects two vertices if they are likely to be accessed sequentially
[Figure: example graph with vertices va, vb, vc, vd]
Problem Statement
• Vertex layout of G = (V, E)
  ♦ One-to-one mapping of vertices to indices in the 1D layout: $\varphi : V \to \{1, \ldots, |V|\}$
• Compute a $\varphi$ that minimizes the expected number of cache misses
[Figure: example graph laid out as va, vb, vd, vc at indices 1, 2, 3, 4]
Local Permutation
Vertex layout
Terminology
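• Edge span of (va, vb): $|\varphi(v_a) - \varphi(v_b)|$, where $\varphi$ is the layout mapping
  ♦ Example: $\varphi(v_a) = 1$ and $\varphi(v_c) = 5$, so $|\varphi(v_a) - \varphi(v_c)| = 4$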
Terminology
• $E_i$
  ♦ Set of edges having edge span i in the layout
  ♦ Example: $(v_a, v_c) \in E_4$
Terminology
• Edge span distribution
  ♦ $|E_i|$, where i is in [1, n − 1]
  ♦ Example: $|E_1| = 4$, $|E_2| = 1$, $|E_3| = 1$, $|E_4| = 1$
[Figure: histogram of the number of edges against edge span, for edge spans 1 to 4]
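The edge span and the edge span distribution defined above can be computed directly from a layout. The following C++ sketch is illustrative only: the names `Edge` and `EdgeSpanDistribution`, the 0-based vertex ids, and the `position` array (holding the layout index of each vertex) are assumptions made for this example and are not taken from the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// An edge joins two vertices, identified by 0-based vertex ids (assumed convention).
using Edge = std::pair<int, int>;

// position[v] is the 1-based layout index of vertex v.
// Returns hist with hist[i] == |E_i| for i in [1, n - 1]; hist[0] is unused.
std::vector<int> EdgeSpanDistribution(const std::vector<Edge>& edges,
                                      const std::vector<int>& position) {
    const std::size_t n = position.size();
    std::vector<int> hist(n, 0);
    for (const Edge& e : edges) {
        assert(e.first >= 0 && static_cast<std::size_t>(e.first) < n);
        assert(e.second >= 0 && static_cast<std::size_t>(e.second) < n);
        // Edge span: distance between the two endpoints in the 1D layout.
        const int span = std::abs(position[e.first] - position[e.second]);
        assert(span >= 1 && static_cast<std::size_t>(span) < n);
        ++hist[span];
    }
    return hist;
}
```

For the layout va vb vd vc (positions 1, 2, 4, 3 for va, vb, vc, vd) and a hypothetical edge set {(va, vb), (va, vc), (vb, vd), (vc, vd)}, the function returns |E_1| = 3, |E_2| = 0, |E_3| = 1.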
Cache Miss Ratio Function (CMRF), $p_i$
• Probability of a cache miss for a given edge span i
[Figure: plot of $p_i$, the cache miss ratio (probability of a cache miss, from 0 to 1), against edge span i (from 1 to n − 1)]
Number of Cache Misses at Runtime
• Estimated by multiplying two factors
  ♦ Runtime edge span distribution
  ♦ CMRF
Example: a 1D layout accessed with edge span 2, then edge span 4, then edge span 2:
$p_2 + p_4 + p_2 = (2, 1) \cdot (p_2, p_4)$
where (2, 1) is the runtime edge span distribution and $(p_2, p_4)$ is the CMRF
Expected Number of Cache Misses
♦ Approximate the runtime edge span distribution with that of the layout
$\sum_{i=1}^{n-1} |E_i| \, p_i$
where $|E_i|$ is the edge span distribution of the layout and n is the number of vertices
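The sum above is a dot product between the edge span distribution and the CMRF. A minimal sketch follows; `ExpectedCacheMisses` and `BlockCmrf` are hypothetical names, and `BlockCmrf` is only one possible monotone CMRF chosen for illustration (the chance that two elements `span` apart fall on opposite sides of a block boundary when blocks of `blockSize` elements are aligned at a random offset). It is not the CMRF used by the authors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CMRF (an assumption, not from the slides): min(1, span / blockSize).
double BlockCmrf(int span, int blockSize) {
    assert(span >= 0 && blockSize >= 1);
    return std::min(1.0, static_cast<double>(span) / blockSize);
}

// hist[i] == |E_i| and cmrf[i] == p_i; index 0 is unused in both arrays.
// Returns the sum over i of |E_i| * p_i.
double ExpectedCacheMisses(const std::vector<int>& hist,
                           const std::vector<double>& cmrf) {
    assert(hist.size() == cmrf.size());
    double total = 0.0;
    for (std::size_t i = 1; i < hist.size(); ++i) {
        total += hist[i] * cmrf[i];
    }
    return total;
}
```

With |E_1| = 3, |E_2| = 0, |E_3| = 1 and a block size of 4, the CMRF values are 0.25, 0.5, 0.75 and the estimate is 3 × 0.25 + 1 × 0.75 = 1.5.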
Outline
• Overview
• Cache-oblivious metric
• Results
Cache-Oblivious Metric
• Decides if a local permutation reduces the number of cache misses
  ♦ Probabilistic formulation
  ♦ Reduces to geometric volume computation
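Does a Local Permutation Decrease Cache Misses?
A local permutation changes each $|E_i|$ to $|E_i| + |\Delta E_i|$, where $|\Delta E_i|$ denotes the signed change in the number of edges with edge span i.
Before the permutation: $\sum_{i=1}^{n-1} |E_i| \, p_i$
After the permutation: $\sum_{i=1}^{n-1} (|E_i| + |\Delta E_i|) \, p_i$
The permutation decreases the expected number of cache misses if
$\sum_{i=1}^{n-1} |\Delta E_i| \, p_i < 0$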
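For a known CMRF, the test above only needs the change in the edge span distribution. The sketch below is a hedged illustration with hypothetical names (`SpanDelta`, `ReducesExpectedMisses`); it takes the distributions before and after a local permutation as plain arrays and assumes the signed-change reading of the slide's formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// delta[i] = (number of edges with span i after) - (number before); index 0 unused.
std::vector<int> SpanDelta(const std::vector<int>& before,
                           const std::vector<int>& after) {
    assert(before.size() == after.size());
    std::vector<int> delta(before.size(), 0);
    for (std::size_t i = 1; i < before.size(); ++i) {
        delta[i] = after[i] - before[i];
    }
    return delta;
}

// True if the sum over i of delta[i] * p_i is negative for the given CMRF.
bool ReducesExpectedMisses(const std::vector<int>& delta,
                           const std::vector<double>& cmrf) {
    assert(delta.size() == cmrf.size());
    double change = 0.0;
    for (std::size_t i = 1; i < delta.size(); ++i) {
        change += delta[i] * cmrf[i];
    }
    return change < 0.0;
}
```

The same permutation can help under one CMRF and hurt under another: with a change of (−1, +2, −1) for spans 1 to 3, the sum is −0.6 for p = (0.1, 0.2, 0.9) but +0.6 for p = (0.1, 0.8, 0.9). This dependence on the cache is what the cache-oblivious metric is meant to remove.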
Monotonicity of CMRF, $p_i$
• Assume CMRF is a monotonically increasing function of edge span
[Figure: plot of $p_i$, the cache miss ratio (from 0 to 1), increasing with edge span i (from 1 to ∞)]
Exact Cache-Oblivious Metric
$\sum_{i=1}^{n-1} |\Delta E_i| \, p_i < 0$
where
$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$
All the possible cache configurations
Monotonicity of CMRF
Geometric Formulation
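Half hyperspace: $\sum_{i=1}^{n-1} |\Delta E_i| \, p_i < 0$
where
Closed hyperspace: $0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$
[Figure: the half hyperspace and the closed hyperspace drawn in the $(p_1, p_2)$ plane]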
Geometric Volume Computation
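• Assume each CMRF to be equally likely
• Half hyperspace (blue area)
  ♦ Space of CMRFs that reduce cache misses
$\sum_{i=1}^{n-1} |\Delta E_i| \, p_i < 0$
where
$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$
[Figure: the half hyperspace (blue area) inside the closed hyperspace, drawn in the $(p_1, p_2)$ plane]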
Geometric Volume Computation
Time complexity
  ♦ Exact: $O(n^{n+1})$ [Lasserre and Zeron 01]
  ♦ Approximate: $O(n^5)$ [Kannan et al. 97]
Fast and Approximate Volume Comparison
• Define a top polytope in closed hyperspace
• Compute the centroid, C, of the top polytope
[Figure: the top polytope and its centroid, C, in the $(p_1, p_2)$ plane]
Fast and Approximate Volume Comparison
• Use the centroid for approximate volume comparison
  ♦ The volume containing the centroid is likely to be larger
[Figure: the centroid, C, in the $(p_1, p_2)$ plane]
Bound of Approximation
• 0.1% ~ 0.3% compared to the exact metric
Final Approximate Metric
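$\sum_{j=1}^{m} |\Delta E_{l(j)}| \, j < 0$
  ♦ The weights j come from the centroid
  ♦ Pack non-zero $|\Delta E_i|$ to 1, …, m; l(j) is the edge span of the j-th packed entry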
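The approximate metric needs no cache parameters at all, only the signed changes in the edge span distribution. The sketch below follows the formula as reconstructed from the slide, assuming the non-zero changes are packed in order of increasing edge span and that the j-th packed entry is weighted by j. The name `AcceptLocalPermutation` is hypothetical, and the authors' implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// delta[i] is the signed change in the number of edges with span i; delta[0] is unused.
// Returns true if the packed, centroid-weighted sum is negative, that is,
// if the local permutation is judged to reduce cache misses.
bool AcceptLocalPermutation(const std::vector<int>& delta) {
    assert(!delta.empty());
    long long score = 0;
    long long j = 0;  // index of the current entry after packing
    for (std::size_t i = 1; i < delta.size(); ++i) {
        if (delta[i] == 0) continue;  // zero entries are dropped by the packing
        ++j;
        score += j * delta[i];
    }
    return score < 0;
}
```

For example, shortening one edge from span 4 to span 2 gives packed entries (+1, −1) and a score of 1 − 2 = −1, so the permutation is accepted; the reverse change scores +1 and is rejected.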
Layout Optimization
• Find an optimal layout that minimizes our metric
  ♦ Combinatorial optimization problem
Multilevel Minimization
Step 1: Coarsening
Multilevel Minimization
Step 2: Ordering of coarsest graph
Multilevel Minimization
Step 3: Refinement and local optimization
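To give a feel for the local optimization step, here is a deliberately simplified toy in C++. It is not the multilevel algorithm from the slides: there is no coarsening or refinement, only a single greedy sweep that tries swapping each pair of layout neighbours and keeps a swap when a packed, index-weighted test on the change in edge spans is negative. All names (`Edge`, `SpanHistogram`, `Improves`, `GreedySweep`) are hypothetical, and the whole histogram is recomputed per swap for clarity rather than speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // 0-based vertex ids (assumed convention)

// hist[i] = number of edges whose endpoints are i slots apart in the layout.
static std::vector<int> SpanHistogram(const std::vector<Edge>& edges,
                                      const std::vector<int>& position) {
    std::vector<int> hist(position.size(), 0);
    for (const Edge& e : edges) {
        ++hist[std::abs(position[e.first] - position[e.second])];
    }
    return hist;
}

// Packed test: the j-th non-zero change in the histogram gets weight j.
static bool Improves(const std::vector<int>& before, const std::vector<int>& after) {
    long long score = 0;
    long long j = 0;
    for (std::size_t i = 1; i < before.size(); ++i) {
        const int delta = after[i] - before[i];
        if (delta == 0) continue;
        ++j;
        score += j * delta;
    }
    return score < 0;
}

// order[k] is the vertex stored at layout slot k. Returns the number of accepted swaps.
int GreedySweep(const std::vector<Edge>& edges, std::vector<int>& order) {
    const std::size_t n = order.size();
    std::vector<int> position(n);
    for (std::size_t k = 0; k < n; ++k) {
        assert(order[k] >= 0 && static_cast<std::size_t>(order[k]) < n);
        position[order[k]] = static_cast<int>(k);
    }
    int accepted = 0;
    for (std::size_t k = 0; k + 1 < n; ++k) {
        const std::vector<int> before = SpanHistogram(edges, position);
        std::swap(position[order[k]], position[order[k + 1]]);  // tentative swap
        const std::vector<int> after = SpanHistogram(edges, position);
        if (Improves(before, after)) {
            std::swap(order[k], order[k + 1]);  // keep the swap
            ++accepted;
        } else {
            std::swap(position[order[k]], position[order[k + 1]]);  // undo
        }
    }
    return accepted;
}
```

On the path graph 0–1–2–3 stored in the order 0, 2, 1, 3, one sweep accepts a single swap and recovers the order 0, 1, 2, 3, in which every edge has span 1.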
Outline
• Overview
• Cache-oblivious layouts
• Results
Layout Computation Time
• Processes 70 million vertices per hour
  ♦ Takes 2.6 hours to lay out St. Matthew model (372 million triangles)
  ♦ 2.4 GHz Pentium 4 PC with 1 GB main memory
Edge Span Distributions of Different Layouts
[Figure: number of edges against edge span for the cache-oblivious layout, the spectral layout, and the original layout]
Applications
• View-dependent rendering
• Collision detection
• Isocontour extraction
View-Dependent Rendering
• Lay out vertices and triangles of CHPM [Yoon et al. 04]
  ♦ Reduce misses of GPU vertex cache
View-Dependent Rendering
Models                # of Tri.   Our layout   Simplification layout [Yoon et al. 04]
St. Matthew           372M        106 M/s      23 M/s
Isosurface            100M        90 M/s       20 M/s
Double Eagle Tanker   82M         47 M/s       22 M/s
Speedups marked on the slide: 4.5X, 2.1X
Peak performance: 145 M tri / s on GeForce 6800 Ultra
Realtime Captured Video – St. Matthew Model
Comparison with Other Rendering Sequences
Our layout
Universal rendering sequences [Bogomjakov and Gotsman 2002]
Comparison with Other Rendering Sequences
Our layout: optimized for no particular cache size
[Hoppe 99]: optimized for a vertex cache size of 16 with FIFO replacement
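Vertex cache behaviour of the kind compared here can be estimated offline by replaying an index stream through a simulated cache. The sketch below is a generic FIFO simulator written for illustration; the name `CountFifoMisses` is hypothetical, and real GPU vertex caches may differ in details that this model ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Replays a stream of vertex indices through a FIFO cache holding cacheSize entries
// and returns the number of misses.
std::size_t CountFifoMisses(const std::vector<int>& indices, std::size_t cacheSize) {
    assert(cacheSize >= 1);
    std::deque<int> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (int v : indices) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end()) {
            continue;  // hit: a FIFO cache does not reorder entries on a hit
        }
        ++misses;
        if (cache.size() == cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
        cache.push_back(v);
    }
    return misses;
}
```

For two triangles sharing an edge, indexed as 0 1 2 and 2 1 3, a 16-entry cache misses 4 times (once per distinct vertex), while a 1-entry cache misses 5 times.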
Performance during View-Dependent Rendering
Our layout: optimized for various resolutions
[Hoppe 99]: optimized for full resolution
Comparison with Space Filling Curve on Power Plant Model
Our layout
Space filling curve (Z-curve)
Collision Detection
• Bounding volume hierarchies
  ♦ Widely used to accelerate the performance of collision detection
  ♦ Traversed to find contacting area
  ♦ Uses pre-computed layouts of OBB trees [Gottschalk et al. 96]
Rigid Body Simulation
Collision Detection Time
2X on average
Depth-first layout
Cache-oblivious layout
Isocontour Extraction
• Contour tree [van Kreveld et al. 97]
• Use mesh as the input graph
• Extract an isocontour that is orthogonal to the z-axis
Puget Sound, 134 M triangles
Isocontour z(x,y) = 500 m
Comparison – First Extraction of Z(x,y) = 500m
[Figure: bar chart of performance relative to the Z-axis sorted layout; values shown on the slide: 2, 21, 13, 1]
Nearly optimized for particular isocontour
Disk access time is the bottleneck
Comparison – Second Extraction of Z(x,y) = 500m
[Figure: bar chart of performance relative to the Z-axis sorted layout; values shown on the slide: 2, 21, 13, 379, 212, 10.8]
Memory and L1/L2 cache access times are the bottleneck
Limitations
• Assumptions on CMRF
  ♦ May not work well for all applications
• Does not compute global optimum
  ♦ Greedy solution
Advantages
• General
  ♦ Applicable to all kinds of polygonal models
  ♦ Works well for various applications
• Cache-oblivious
  ♦ Can benefit every level, from CPU/GPU cache to memory and disk
• No modification of runtime application
  ♦ Only layout computation
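OpenCCL: Cache-Coherent Layouts of Graphs and Meshes
• Source code for computing a cache-coherent layout
• Easy to use
[Figure: example graph with three vertices, 0, 1, and 2, connected as a triangle]
CLayoutGraph Graph (NumVertex);   // graph with NumVertex vertices
Graph.AddEdge (0, 1);             // one call per edge of the example graph
Graph.AddEdge (0, 2);
Graph.AddEdge (1, 2);
int Order [NumVertex];            // receives the computed ordering
Graph.ComputeOrdering (Order);
Google “Cache Oblivious Mesh Layout” or
http://gamma.cs.unc.edu/COL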
Conclusion
• Novel algorithm for computing cache-oblivious mesh layouts
  ♦ Cast the problem as an optimization
  ♦ Probabilistically compute the expected number of cache misses
  ♦ Achieve significant improvements (2 to 20X) without modifying runtime applications
Ongoing and Future Work
• Apply to other applications
  ♦ Simplification and approximate collision detection [Yoon et al. 04]
  ♦ Shortest path computation, etc.
• Investigate optimality
Ongoing and Future Work
• Cache-Oblivious Layouts of Bounding Volume Hierarchies [Yoon and Manocha 05]
  ♦ Tech. Report, University of North Carolina at Chapel Hill
Acknowledgements
• Anonymous donor
  ♦ Power plant model
• Digital Michelangelo Project
  ♦ St. Matthew model at Stanford University
• LLNL ASCI VIEWS
  ♦ Isosurface model
• Newport News Shipbuilding
  ♦ Double Eagle tanker
Acknowledgements
• Army Research Office
• DARPA
• Intel Corporation
• Lawrence Livermore Nat’l Lab.
• National Science Foundation
• Office of Naval Research
• RDECOM
Acknowledgements
• Martin Isenburg
• Dawoon Jung
• Brandon Lloyd
• Elise London
• Brian Salomon
• Avneesh Sud
Questions?
Project URL: http://gamma.cs.unc.edu/COL