HistoPyramid stream compaction and expansion
Christopher Dyken1* and Gernot Ziegler2
Advanced Computer Graphics / Vision Seminar, TU Graz, 23/10-2007
1 University of Oslo, 2 Max-Planck-Institut für Informatik, Saarbrücken
Page 1
GPUs are highly parallel and perfect for all data-local operations.
However, re-arranging or selectively deleting data is difficult on GPUs.
Stream compaction and expansion are such operations:
Stream compaction
    For each element in the input stream, let a predicate determine if the element should be discarded.
    Produce a compact output stream of the remaining elements.
Generalization: Stream compaction and expansion
    For each element in the input stream, let a predicate determine the element's multiplicity, i.e. how many times the element should be present in the output stream (0 =⇒ discard).
    Produce a compact output stream of all elements with a multiplicity greater than 0.
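As a point of comparison, the definition above can be written as a sequential reference in a few lines of Python. This is only a sketch of the intended input/output behaviour, not the GPU algorithm; the names compact_expand and multiplicity are illustrative assumptions.

```python
def compact_expand(stream, multiplicity):
    """Sequential reference: emit each element multiplicity(element) times.

    A multiplicity of 0 discards the element, 1 keeps it, >1 repeats it.
    """
    out = []
    for element in stream:
        out.extend([element] * multiplicity(element))
    return out


# Plain compaction: the predicate is binary (keep even numbers only).
evens = compact_expand(range(8), lambda x: 1 if x % 2 == 0 else 0)

# Compaction and expansion: 'a' is repeated, 'b' is discarded, 'c' is kept.
counts = {"a": 2, "b": 0, "c": 1}
expanded = compact_expand("abc", lambda c: counts[c])
```

The sequential loop is trivial on a CPU; the difficulty on a GPU is that each output position depends on the multiplicities of all preceding elements.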
Page 2
Why bother with stream compaction and expansion?
- Feature extraction:
    - Get a compact list of all locations satisfying a criterion.
- Point cloud generation:
    - Generate a set of points on an implicitly defined surface.
- Compaction of intermediate results:
    - Save GPU → CPU bandwidth.
- Emulate/offload the geometry shader (GS):
    - Our HP-based Marching Cubes implementation does not need a GS, and currently even outperforms GS-based approaches.
- Sparse matrix extraction.
- . . .
Data re-organization is an active field of research (see GPU Gems II and III, etc.)
Page 3
The HistoPyramid algorithm
- The input stream is laid out over an N × N grid (= tex2D).
- Each input element is subjected to a predicate function:
    - count = 0 =⇒ discard from output stream.
    - count = 1 =⇒ keep in output stream.
    - count > 1 =⇒ repeat in output stream.
- The output of the predicate function forms the HP base layer.
    - The HP is a mipmap pyramid of partial sums.
    - The HP is built in log2(N) passes.
    - Top element of the HP: number of elements in the output.
- Then, iterate over the output elements:
    - Extraction of an element is done in log2(N) texture lookups.
    - Each input element can have multiple copies in the output.
- No data transfer from GPU to CPU.
Page 4
Overview
[Figure: Input image → Predicate → Bucket count → HP-builder → HistoPyramid → Extractor → Point list]
Bucket count (4 × 4):
    0 2 2 0
    2 0 0 2
    0 1 1 0
    0 1 1 0
HistoPyramid, upper levels:
    4 4
    2 2
    12
Point list:
    (6,5) (4,6) (5,6)
    (2,6) (3,6) (7,4)
    (5,2) (0,4) (1,5)
    (2,1) (2,2) (5,1)
Page 5
Predicate function
- For each input element, determine its output stream count.
    - The count is often binary (1/0).
- For different predicates, several base layers can be built:
    - Per base layer, one HP will be built, in parallel.
    - Predicates may overlap if needed!
    - NV40: 4×RGBA = 16 predicates, G80: 8×RGBA = 32 predicates.
- Example: extract a list of edge pixels (Laplace + threshold):
[Figure: Input data → Laplace + threshold (the predicate) → HP base level]
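A predicate of the "Laplace + threshold" kind might look like the following sketch. The 4-neighbour Laplacian stencil, the handling of border pixels (left at count 0) and the threshold value are assumptions made for illustration; the slides do not specify them.

```python
def laplace_predicate(image, threshold):
    """Return a base layer of 0/1 counts: 1 where |Laplacian| > threshold.

    Assumptions: 4-neighbour Laplacian stencil; border pixels get count 0.
    """
    height, width = len(image), len(image[0])
    base = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
            base[y][x] = 1 if abs(laplacian) > threshold else 0
    return base


# A 4 x 4 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 9, 9] for _ in range(4)]
base = laplace_predicate(image, threshold=4)
```

Here the interior pixels on both sides of the step are marked as edge pixels, so the base layer holds four nonzero counts.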
Page 6
HistoPyramid builder
- Mipmap generation, but with sum instead of average:
    - In effect, each cell counts the elements in its sub-pyramid:
        Base level, 4 × 4:
            1 1 0 1
            1 0 1 0
            0 2 0 1
            1 0 0 0
        Level 1, 2 × 2:
            3 2
            3 1
        Level 2, 1 × 1:
            9
- Top element: total number of elements in base layer =⇒ output size.
- Example: HP of the Lena edge pixels (red = nonzero count): [figure]
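The reduction can be sketched on the CPU as follows, using the 4 × 4 base level from this slide. On the GPU each level would be one render pass into the next mipmap level; the function name and the nested-list representation here are assumptions for illustration.

```python
def build_histopyramid(base):
    """Return [L0, L1, ..., top]; each cell is the sum of the 2x2 block below it.

    Assumes a square base layer whose side length is a power of two.
    """
    n = len(base)
    assert n > 0 and n & (n - 1) == 0 and all(len(row) == n for row in base)
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels


base = [[1, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 2, 0, 1],
        [1, 0, 0, 0]]
levels = build_histopyramid(base)
output_size = levels[-1][0][0]  # top element = number of output elements
```

For N = 4 the loop runs log2(4) = 2 times, matching the two reduction levels shown on the slide.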
Page 7
Pointlist extractor
Input: the output index is used as key index.
Input: key indices
    0 1 2
    3 4 5
    6 7 8
Output: texcoords & clone index
    [0,0],0  [1,0],0  [0,1],0
    [3,0],0  [2,1],0  [1,2],0
    [1,2],1  [0,3],0  [3,2],0
Notice: multiplicity from base layer:
    = 1 =⇒ copy once.
    > 1 =⇒ copy multiple times.
[Figure: HP traversal]
    L2:
        9
    L1:
        3 2
        3 1
        intervals: [0,3) [3,5) [5,8) [8,9)
    L0:
        1 1 0 1
        1 0 1 0
        0 2 0 1
        1 0 0 0
        intervals (top-right 2 × 2 block): ∅ [0,1) [1,2) ∅
        intervals (bottom-left 2 × 2 block): ∅ [0,2) [2,3) ∅
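The traversal can be sketched in Python as below. The child visiting order (top-left, top-right, bottom-left, bottom-right) and the [x, y] coordinate convention are inferred from the intervals and outputs on this slide; the function name and the list-of-levels layout are assumptions.

```python
def extract(levels, key):
    """Map an output index to ((x, y), clone_index) by descending the pyramid.

    levels is [L0, L1, ..., top]. At each level the key is compared against
    the counts of the four children, subtracting the counts that are skipped.
    """
    total = levels[-1][0][0]
    if not 0 <= key < total:
        raise IndexError("key index outside output stream")
    x = y = 0
    for level in reversed(levels[:-1]):  # from just below the top down to L0
        x, y = 2 * x, 2 * y
        for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
            count = level[y + dy][x + dx]
            if key < count:  # key falls into this child's interval
                x, y = x + dx, y + dy
                break
            key -= count  # skip this child's elements
    return (x, y), key  # the remaining key is the clone index


# HistoPyramid from the slide: L0 (4 x 4), L1 (2 x 2), L2 (1 x 1).
levels = [
    [[1, 1, 0, 1], [1, 0, 1, 0], [0, 2, 0, 1], [1, 0, 0, 0]],
    [[3, 2], [3, 1]],
    [[9]],
]
points = [extract(levels, key) for key in range(9)]
```

Key indices 5 and 6 both land on cell [1,2], whose count is 2, and are told apart by the clone index.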
Page 8
The Marching Cubes Algorithm
The input to the algorithm is an M^3 grid of scalar values.
Examine groups of 2 × 2 × 2 voxels (MC cell).
Check if the MC cell's corners are inside/outside the iso-level.
8 corners, inside/outside =⇒ 256 classes.
Each MC class: a combination of edges that pierce the iso-surface.
Use a table with geometry for the MC classes, with all possible triangulations of the edge intersections (figure).
Determine the exact edge-surface intersections and emit the corresponding triangles.
Notice: Effectively a stream compaction/expansion process!
Page 9
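The 256 classes arise from packing one inside/outside bit per corner. A minimal sketch follows; the corner ordering and the convention that "inside" means a value below the iso-level are assumptions, since the slides do not fix either.

```python
def mc_class(corner_values, iso_level):
    """Pack the inside/outside state of the 8 cell corners into a class 0..255.

    Assumptions: corner i maps to bit i; a corner is 'inside' when its
    scalar value is below the iso-level.
    """
    assert len(corner_values) == 8
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_level:
            index |= 1 << bit
    return index


all_outside = mc_class([1.0] * 8, iso_level=0.5)
all_inside = mc_class([0.0] * 8, iso_level=0.5)
mixed = mc_class([0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0], iso_level=0.5)
```

Classes 0 and 255 have no edge that pierces the iso-surface, so such cells get multiplicity 0 and are discarded by the compaction.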
HistoPyramid Marching Cubes
[Figure: HPMC pipeline. Steps: Start new frame → Update scalar field → Build HP base → HP reduce (for each level) → Vertex count readback → Render geometry. Data: scalar field texture, vertex count texture, HistoPyramid texture, triangulation table texture, enumeration VBO. Parameter: iso-level.]
Input: A stream of (M − 1)^3 MC cells (2 × 2 × 2 voxels grouped).
Predicate: Samples the MC cell corners, determines the MC class from their inside/outside state, then writes the number of vertices required for the MC geometry to the base layer.
HistoPyramid: Top element gives the total number of vertices in the iso-surface (3× the number of triangles).
Extraction: Use the output index to traverse the HP and determine the corresponding input element (i.e. which MC cell). The remainder tells which edge intersection this vertex corresponds to. Determine the edge intersection and emit its position.
Page 10
Datasets used in the performance analysis
[Figure: renderings of the six datasets: Bunny, CThead, MRbrain, Bonsai, Aneurism, Cayley]
Page 11
Performance of HistoPyramid Marching Cubes
                                          6600GT         7800GT         8800GTX         8800GTX
Model     Size         MC cells  Density  HP-VS          HP-VS          HP-VS           NV-SDK10
Bunny     255x255x255  16581375  3.2%     –              –              538.6 (32.5)    –
          127x127x127  2048383   5.6%     5.4 (2.6)      11.8 (5.8)     309.5 (151.1)   –
          63x63x63     250047    9.1%     4.0 (16.1)     8.5 (34.1)     163.4 (653.5)   28.3 (113.2)
          31x31x31     29791     13.6%    2.5 (82.8)     5.0 (167.9)    25.5 (857.0)    21.9 (734.0)
CThead    255x255x128  8323200   3.7%     –              16.3 (2.0)     437.6 (53.0)    –
          127x127x63   1016127   6.3%     5.4 (5.3)      11.6 (11.5)    288.1 (283.6)   –
          63x63x31     123039    9.6%     3.7 (29.9)     7.7 (62.2)     97.3 (791.0)    25.3 (205.9)
          31x31x15     14415     14.5%    2.3 (161.3)    4.5 (311.5)    12.9 (896.4)    17.1 (1187.0)
MRbrain   255x255x128  8323200   5.8%     –              10.5 (1.3)     309.0 (37.4)    –
          127x127x63   1016127   7.4%     4.6 (4.5)      9.9 (9.7)      257.7 (263.6)   –
          63x63x31     123039    10.0%    3.5 (28.6)     7.4 (60.0)     96.8 (786.5)    26.4 (214.9)
          31x31x15     14415     14.9%    2.2 (155.0)    4.3 (300.9)    12.7 (879.7)    18.2 (1257.4)
Bonsai    255x255x255  16581375  3.0%     –              –              560.8 (33.8)    –
          127x127x127  2048383   5.1%     5.9 (2.9)      13.0 (6.3)     329.8 (161.0)   –
          63x63x63     250047    6.7%     5.4 (21.5)     11.4 (45.5)    186.5 (745.9)   28.9 (115.6)
          31x31x31     29791     8.2%     4.1 (136.8)    8.0 (268.8)    25.1 (843.0)    24.0 (804.6)
Aneurism  255x255x255  16581375  1.6%     –              –              892.5 (53.8)    –
          127x127x127  2048383   2.1%     12.6 (6.1)     29.1 (14.2)    557.6 (272.2)   –
          63x63x63     250047    3.7%     9.1 (36.2)     19.2 (76.7)    190.5 (761.9)   32.9 (131.5)
          31x31x31     29791     6.8%     4.5 (149.7)    8.6 (289.1)    25.0 (839.3)    25.5 (856.6)
Cayley    255x255x255  16581375  0.9%     –              –              1112.3 (67.1)   –
          127x127x127  2048383   1.9%     13.5 (6.6)     31.2 (15.2)    581.3 (283.8)   –
          63x63x63     250047    3.9%     8.5 (33.9)     17.9 (71.6)    198.0 (791.9)   32.1 (128.5)
          31x31x31     29791     8.1%     3.7 (123.8)    7.3 (245.8)    25.8 (866.2)    24.7 (827.9)
Numbers are in millions of voxels processed per second; values in parentheses are MC runs per second (i.e. the framerate).
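The two numbers in each table cell appear to be related through the MC cell count: throughput ≈ MC cells × runs per second. The short check below illustrates this for the 255x255x255 Bunny entry on the 8800GTX; the helper name is hypothetical, and the small discrepancy is expected because the tabulated framerate is rounded.

```python
def throughput_millions(dims, runs_per_second):
    """Cells processed per second, in millions, for a grid of MC cells."""
    nx, ny, nz = dims
    return nx * ny * nz * runs_per_second / 1e6


# Bunny, 255x255x255 MC cells, 8800GTX HP-VS: table lists 538.6 (32.5).
rate = throughput_millions((255, 255, 255), 32.5)
relative_error = abs(rate - 538.6) / 538.6
```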
Page 12
References:
- C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel, "GPU Marching Cubes on Shader Model 3.0 and 4.0", MPI-I-2007-4-006, Max-Planck-Institut für Informatik, 2007.
- C. Dyken, J. Seland, M. Reimers, "Real-Time GPU Silhouette Refinement using adaptively blended Bézier patches", to appear in Graphics Forum, 2007.
- I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel, "Eikonal Rendering: Efficient Light Transport in Refractive Objects", to appear in ACM Trans. on Graphics (Siggraph'07), 2007.
- G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel, "GPU Point List Generation through Histogram Pyramids", MPI-I-2006-4-002, Max-Planck-Institut für Informatik, 2006.
Page 13