
REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By

TIANYUN NI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008


© 2008 Tianyun Ni


To my family, especially my father, and to all who have lent encouragement and support during the time spent on this research


ACKNOWLEDGMENTS

I wish to express my sincerest thanks to the chair of my dissertation committee, Dr. Jörg Peters, for working with me throughout this long enterprise.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 Motivation
   1.2 Problem Statement
   1.3 Modern GPU Pipeline and Current Trends
   1.4 Representations in Surface Modeling
      1.4.1 Subdivision Surfaces
      1.4.2 Parametric Patches
         1.4.2.1 Bezier technique
         1.4.2.2 Related work

2 A NEW SCHEME FOR SURFACE CONSTRUCTION
   2.1 Contribution
   2.2 The Conversion Algorithm
      2.2.1 The Conversion Rules for a Type-1 Quad
      2.2.2 The Conversion Rules for a Type-2 or Type-3 Quad
   2.3 Derivation of the coefficients of a c-patch
      2.3.1 Derivation of $\lambda_0$ and $\lambda_1$
      2.3.2 Derivation of $b_{211}$ and $b_{121}$
      2.3.3 Derivation of $b_{112}$
   2.4 Smoothness Verification
   2.5 Complexity Analysis
      2.5.1 Number of Patches
      2.5.2 Cost of Patch Construction
      2.5.3 Cost of Surface Evaluation
   2.6 Approximation Catmull-Clark Subdivision Surface
   2.7 Water-Tight Surface Verification
   2.8 Discussion

3 GPU IMPLEMENTATION
   3.1 Overview
   3.2 2-pass Approach
   3.3 1-pass Approach
   3.4 Coordinate System Transformation
   3.5 Water-Tight Evaluation
   3.6 Conclusion

4 RESULTS
   4.1 Shape Quality
   4.2 Performance
   4.3 Displacement Mapping
   4.4 Morphing and Animation
   4.5 Conclusion

5 PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

6 DISCUSSION AND FUTURE WORK
   6.1 Future GPU API
   6.2 Volume Preservation
   6.3 Adaptive Tessellation

REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

Table

4-1 ALU operations for evaluation at (u, v)
4-2 Performance results
4-3 Performance of the 1-pass implementation


LIST OF FIGURES

Figure

1-1 Polygonal modeling
1-2 Problem statement
1-3 DirectX 10 pipeline stages
1-4 DirectX 10 pipeline
1-5 The primitives
1-6 The notations of input mesh
1-7 The three possible configurations
1-8 The Catmull-Clark stencils
1-9 The subdivision schemes
1-10 The suggested rendering passes
1-11 Future GPU architecture
1-12 The subdivision schemes
2-1 Derivation of c-patch
2-2 Vertex computation
2-3 Surface conversion
2-4 Computing control points v, e, f and t, the projection of e
2-5 Patch-based computation
2-6 Patch computation
2-7 The re-parameterization of $\lambda$ to meet $G^1$ at the vertex
2-8 Coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch
2-9 The choice of middle point in c-patch
2-10 The center of a bi-cubic patch can be evaluated by the linear combination of the boundary coefficients
2-11 $C^1$ transition between a triangular and a bicubic patch
2-12 $G^1$ transition between two triangular patches
3-1 2-Pass implementation
3-2 2-Pass conversion
3-3 1-Pass conversion
3-4 1-Pass implementation
3-5 (u, v) on an irregular quad
3-6 Water-tight Evaluation
4-1 Shape quality comparison
4-2 Catmull-Clark approximation comparison
4-3 Ordinary patches and extraordinary patches
4-4 GPU smoothed quad surfaces with displacement mapping
4-5 Close-up of the frog. The refined mesh is water-tight
4-6 Displacement mapping on the frog model
4-7 Shape comparison
4-8 Shape comparison
4-9 Real time animation on the Sword model
4-10 Real time animation on the Frog model
4-11 Asynchronous animation of nine Frogs
5-1 The reasons for using Tri/Quad/Pent Meshes
5-2 A quad/tri/pent model
5-3 Patch representations
5-4 Triangular representation


Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By

Tianyun Ni

August 2008

Chair: Jörg Peters
Major: Computer Engineering

Increased realism in interactive graphics and gaming requires complex smooth surfaces to be rendered at ever higher frame rates. In particular, representations used to model surfaces offline, such as spline and subdivision surfaces, have to be modified or reorganized to allow for efficient usage of the graphics processing unit and its SIMD (Single Instruction, Multiple Data) parallelism. This dissertation presents a novel algorithm for converting quad meshes on the GPU to smooth, water-tight surfaces at the highest speed documented so far. The conversion reproduces bi-cubic splines wherever possible and closely mimics the shape of the Catmull-Clark subdivision surface by c-patches where a vertex has a valence different from 4. The smooth surface is piecewise polynomial and has well-defined normals everywhere.


CHAPTER 1
INTRODUCTION

This chapter introduces the challenges that motivate the dissertation, gives an overview of the modern GPU pipeline, reviews the literature in detail, and positions the research relative to the current state of the art.

1.1 Motivation

In graphics, 3D objects are approximated by polyhedral meshes of great complexity. For example, a game character can consist of tens of thousands of polygons (Figure 1-1). Increased realism in interactive gaming demands that such meshes be animated and rendered in real time. Two major approaches in the literature serve this purpose: Polygonal Modeling and higher-order Surface Modeling.

Two kinds of animation are common: morphing and skinning. Morphing changes one image into another through a seamless transition. Skinning is a common technique for deforming characters [20, 23, 24, 32]: the animated mesh, referred to as a "skin", is deformed based on the pose of an underlying skeleton. In Polygonal Modeling (Figure 1-1), skinning and morphing are applied to a high-detail mesh created by an artist. Most games currently use this approach. It involves redundant work because there is minimal sharing in the Polygonal Modeling representation. In addition, the large number of vertices in a complex mesh must be fed into the graphics pipeline via the GPU's memory bus, which is a potential bottleneck.

Figure 1-1. Polygonal Modeling: currently the popular animation approach in games.


The alternative approach, Surface Modeling, animates a coarse mesh (Figure 1-2). Subdivision surfaces and parametric patches, two popular higher-order surface representations, both support level-of-detail rendering (see Section 1.4). Highly detailed 3D models are produced by displacement mapping [11], which adds fine detail in the form of a scalar field on the smooth surface defined by the coarse mesh. As a specific instance, Lee [27] proposes Displaced Subdivision Surfaces, which represent a detailed surface model as a scalar-valued displacement over a smooth surface domain. This approach reduces the number of vertices that must be read and animated in each frame because complex geometric details are generated on the GPU. The runtime cost now includes the conversion from the coarse input mesh to the final complex mesh. The conversion involves surface construction, evaluation and displacement mapping.

Figure 1-2. Each high-detail mesh in Surface Modeling is represented by a coarse control mesh with a displacement map. The coarse control mesh is first converted to a smooth surface. Then the surface is tessellated and the vertices are perturbed in the normal directions based on the corresponding values in the displacement map. Last, the normal at each vertex of the refined, highly detailed mesh is updated.
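The displacement step described in the caption of Figure 1-2 amounts to moving each tessellated surface point along its unit normal by a scalar read from the displacement map. The following is a minimal sketch of that single step, not the dissertation's shader code; the function names and the sample values are assumptions made for illustration.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def displace(position, normal, height):
    """Move a surface point along its unit normal by a scalar height."""
    n = normalize(normal)
    return tuple(p + height * c for p, c in zip(position, n))

# A point on the smooth surface, its normal, and one displacement sample.
surface_point = (1.0, 2.0, 3.0)
surface_normal = (0.0, 0.0, 2.0)   # need not be unit length on input
height_sample = 0.25               # hypothetical value read from the map

displaced = displace(surface_point, surface_normal, height_sample)
```

Because only one scalar per sample is stored, the map is one-dimensional data, which is the source of the memory saving listed among the advantages of Surface Modeling.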

In summary, the advantages of Surface Modeling are

1. lower computation cost of animation because skinning is done on the coarse mesh, not the final dense mesh;

2. memory and bandwidth savings by encoding most detail as one-dimensional displacements rather than three-dimensional vectors;

3. support of refinement level on the fly;

4. customization of archetypes: we can model different 3D models with the same coarse mesh, changing only the displacement map;

5. support of adaptive tessellation: evaluation does not have to be on a uniform grid.

The disadvantage of Surface Modeling is that modern GPUs cannot render such surfaces directly. The surface must be converted into triangles or quads through a process of tessellation and evaluation. Therefore, Surface Modeling becomes attractive as a real-time technique only if the conversion is cheaper than reading and animating a high-polygon mesh. Our goal is to design such a scheme on the GPU.

1.2 Problem Statement

Meshes consisting purely of quadrilateral facets are common in modeling for animation. Any polyhedral mesh can be converted into such a quad mesh by one step of mesh refinement. But a good designer creates meshes with the quad restriction in mind so that no global refinement is necessary. We therefore focus on quadrilateral meshes and aim to derive a set of efficient rules directly on the GPU (Figure 1-2, the red dotted rectangle) that produce surfaces with good visual quality. Specifically, the resulting surfaces should

1. consist of a small number of low-degree polynomial pieces;

2. possess smooth geometry (no extra cost for smooth shading);

3. closely approximate Catmull-Clark surfaces (a standard modeling tool);

4. be water-tight (no pixel dropout);

5. map well to the graphics pipeline and leverage the strengths of GPU computation.
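The remark at the start of this section, that one refinement step turns any polyhedral mesh into a quad mesh, can be illustrated by splitting every n-sided facet into n quads, one per corner. The sketch below is an assumption-laden illustration: it places new points at plain edge midpoints and facet centroids (the connectivity of a Catmull-Clark step without its smoothing weights), and the function name and test mesh are hypothetical.

```python
def refine_to_quads(vertices, faces):
    """Split every n-sided face into n quads (one per corner)."""
    new_vertices = list(vertices)
    edge_mid = {}  # maps an undirected edge to the index of its midpoint

    def midpoint_index(a, b):
        key = (min(a, b), max(a, b))
        if key not in edge_mid:
            pa, pb = vertices[a], vertices[b]
            new_vertices.append(tuple((x + y) / 2 for x, y in zip(pa, pb)))
            edge_mid[key] = len(new_vertices) - 1
        return edge_mid[key]

    quads = []
    for face in faces:
        n = len(face)
        centroid = tuple(sum(vertices[i][c] for i in face) / n for c in range(3))
        new_vertices.append(centroid)
        center = len(new_vertices) - 1
        for k in range(n):
            prev_mid = midpoint_index(face[k - 1], face[k])
            next_mid = midpoint_index(face[k], face[(k + 1) % n])
            # Quad: corner, midpoint of next edge, face center, midpoint of previous edge.
            quads.append((face[k], next_mid, center, prev_mid))
    return new_vertices, quads

# A triangle and a pentagon sharing the edge (0, 1).
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, -1.0, 0.0),
         (1.5, 1.0, 0.0), (0.5, 2.0, 0.0), (-0.5, 1.0, 0.0)]
faces = [(0, 2, 1), (0, 1, 3, 4, 5)]
new_verts, quads = refine_to_quads(verts, faces)
```

The triangle yields three quads and the pentagon five, so the output is a pure quad mesh, at the price of the global growth in facet count that a quad-aware designer avoids.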


1.3 Modern GPU Pipeline and Current Trends

A graphics processing unit (GPU) is a dedicated graphics rendering device. Its SIMD architecture has evolved substantially over the last decade. This highly parallel structure makes it more effective than general-purpose CPUs for a range of algorithms. Modern GPUs expose a programmable parallel stream processing pipeline as a series of short programs called shaders. Over the last five years, major graphics software libraries such as OpenGL and DirectX have come to program the GPU via shaders on a programmable pipeline, which has mostly superseded the older "fixed-function pipeline". The two most popular graphics software libraries, DirectX and OpenGL, currently both specify APIs for three types of shaders: vertex, geometry, and pixel shaders. The shaders in the DirectX 10 system [4] (Figure 1-4) share a common core that accesses up to 128 memory buffers and 16 parameter (constant) buffers. Vertex and pixel shaders use a "one-in, one-out" data processing model. In contrast, the geometry shader has a limited ability to amplify or reduce the primitive count and is thus able to change meshes. Figure 1-3 shows the input and output of each pipeline stage.

Figure 1-3. The input and output of each pipeline stage in the DirectX 10 system.

Figure 1-4. DirectX 10 Pipeline

Each stage is explained in more detail as follows:

1. The Input Assembler (IA) gathers vertex data to set up vertex and index buffers. Vertex buffers contain per-vertex data while index buffers define geometry primitives as integer indices into vertex buffers. Indexing helps avoid redundant computations of the same vertex.

2. The vertex shader (VS) typically processes vertex-based operations such as changing the position and normal of a single vertex. The computations in this stage are local. Each vertex only has its own information and does not communicate with other vertices. The VS is most commonly used to transform vertices from object space to clip space.

3. The geometry shader (GS) processes the vertices of a single primitive. A primitive can be a point, a line segment, a triangle, a point with adjacency, a line segment with adjacency, or a triangle with adjacency (Figure 1-5). Due to the availability of the primitive vertices (up to 6 vertices for a triangle with adjacency), the computations in this stage are less local than those on the VS and PS. The GS can emit additional primitives. This new amplification feature, introduced in DirectX 10, adds more flexibility and makes it possible to implement a number of algorithms [1] on the GPU, such as mesh refinement, shadow volumes, and dynamic particle systems. The geometry shader output may be fed to the rasterizer stage and/or to a vertex buffer in memory via the stream output stage.

4. The rasterizer (TR) is a fixed-function stage generating fragments by filling in the polygons sent through the graphics pipeline. Clipping, culling, perspective divide, viewport transform, primitive set-up, scissoring, and depth offset also happen in this stage.

5. The pixel shader (PS) operates on one fragment at a time. Usually scene lighting and pixel-related effects such as bump mapping and color tone mapping occur in the PS.

6. The output merger (OM) takes a fragment from the PS and performs traditional stencil and depth testing operations as well as render target blending to generate a final pixel on the screen.
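The interplay of the first two stages in the list above, indexed primitive assembly followed by strictly per-vertex shading, can be mimicked on the CPU. This is only a sketch under simplifying assumptions: the `vertex_shader` here is a hypothetical uniform scale plus translation rather than an object-to-clip-space matrix transform, and `draw_indexed` is an illustrative name, not a DirectX call.

```python
def vertex_shader(position, scale, offset):
    """Hypothetical per-vertex transform: uniform scale, then translation."""
    return tuple(scale * p + o for p, o in zip(position, offset))

def draw_indexed(vertex_buffer, index_buffer, scale, offset):
    """Shade each vertex once, then assemble triangles from index triples."""
    shaded = [vertex_shader(v, scale, offset) for v in vertex_buffer]
    triangles = []
    for k in range(0, len(index_buffer), 3):
        a, b, c = index_buffer[k:k + 3]
        triangles.append((shaded[a], shaded[b], shaded[c]))
    return shaded, triangles

# A unit square drawn as two triangles that share vertices 0 and 2.
vertex_buffer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
index_buffer = [0, 1, 2, 0, 2, 3]
shaded, triangles = draw_indexed(vertex_buffer, index_buffer, 2.0, (1.0, 0.0, 0.0))
```

Six indices reference only four shaded vertices, which is the redundancy that indexing avoids; the shader itself sees one vertex at a time and nothing about its neighbors.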

Figure 1-5. The six primitives used in the GS

The future GPU pipeline [29, 48] is expected to provide a tessellation unit, combined with new shader stages for patch conversion and for evaluation of tessellated higher-order surfaces. The tessellator provides a solution to adaptive refinement on the graphics hardware. Based on user-provided tessellation factors per edge, the tessellator adaptively creates a sampling pattern of the underlying parametric domain and automatically generates a set of parametric domains. In addition, two special shaders are introduced to the next-generation GPU pipeline. The patch shader converts an input mesh to a set of patches. The evaluation shader takes the (u, v) output of the tessellator and evaluates the patch at (u, v). This future GPU architecture also allows the GPU to exploit more parallelism because multiple arithmetic units can run the same evaluation shader. Moreover, tessellation occurs on the GPU and so overcomes the bus-bandwidth bottleneck caused by model complexity. The new GPU design indicates that Surface Modeling is the trend for real-time graphics.
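The division of labor between tessellator and evaluation shader described above can be sketched as two functions: one that only produces (u, v) samples and connectivity, and one that maps each sample to a surface point. The sketch makes strong simplifying assumptions: a single uniform tessellation factor stands in for the per-edge factors, a bilinear blend stands in for a real patch evaluation, and all names are hypothetical.

```python
def tessellate_unit_square(factor):
    """Return (u, v) samples and triangles for a uniform factor-by-factor grid."""
    samples = [(i / factor, j / factor)
               for j in range(factor + 1) for i in range(factor + 1)]
    triangles = []
    row = factor + 1
    for j in range(factor):
        for i in range(factor):
            a = j * row + i
            b, c, d = a + 1, a + row, a + row + 1
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return samples, triangles

def evaluation_shader(corners, u, v):
    """Stand-in patch evaluation: bilinear blend of four corner points."""
    p00, p10, p01, p11 = corners
    return tuple((1 - u) * (1 - v) * a + u * (1 - v) * b
                 + (1 - u) * v * c + u * v * d
                 for a, b, c, d in zip(p00, p10, p01, p11))

corners = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 4.0))
samples, triangles = tessellate_unit_square(4)
points = [evaluation_shader(corners, u, v) for u, v in samples]
```

Each call to `evaluation_shader` depends only on its own (u, v), so all samples could be evaluated in parallel, which is the source of the extra parallelism mentioned in the text.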


1.4 Representations in Surface Modeling

In computer graphics, surfaces are represented by polyhedral meshes. A polyhedral mesh is a collection of vertices, edges and facets. The valence of a vertex is the number of its incident edges. Each facet is an n-sided polygon. In a triangular (or quadrilateral) mesh, n equals 3 (or 4, respectively). An arbitrary mesh has n-sided polygons where the value of n is arbitrary. The difference between regular and irregular vertices is explained in Figure 1-6. Figure 1-7 illustrates the three possible types of facet.

Figure 1-6. Tri- and Quadrilateral meshes and facet types 1,2,3.

Figure 1-7. The three possible configurations. Type-1 Quad is regular. Type-2 or 3 is irregular.
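Valence, as defined above, can be computed directly from the facet list. The sketch below counts the distinct edges at each vertex of a quad mesh and flags vertices whose valence differs from 4, the case the abstract singles out for c-patches. The function name and the cube test mesh are assumptions for illustration, and boundary vertices are not treated specially.

```python
from collections import defaultdict

def vertex_valences(quads):
    """Count the distinct edges incident to each vertex of a quad mesh."""
    neighbors = defaultdict(set)
    for quad in quads:
        for k in range(4):
            a, b = quad[k], quad[(k + 1) % 4]
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {v: len(ns) for v, ns in neighbors.items()}

# The six faces of a cube: every vertex has valence 3, so none has valence 4.
cube = [(0, 1, 2, 3), (4, 7, 6, 5), (0, 4, 5, 1),
        (1, 5, 6, 2), (2, 6, 7, 3), (3, 7, 4, 0)]
valence = vertex_valences(cube)
irregular = sorted(v for v, n in valence.items() if n != 4)
```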

Parametric patches and subdivision surfaces are major tools for modeling freeform surfaces with arbitrary topology. A more intuitive way for inexperienced users to create shapes, by drawing curves or sketching, is also available [22, 36].

1.4.1 Subdivision Surfaces

Subdivision surfaces, as part of standard modeling packages (e.g., 3DMax, Maya, Softimage, Mirai, Lightwave, etc.), have proven to be a useful modeling tool. Subdivision schemes were first introduced in [10, 12, 31]. They generate a smooth surface through a mesh refinement process. The method begins with a coarse mesh that approximates a 3D model, known as a control mesh. Each vertex in the control mesh is called a control point. Control points influence the shape of the limit surface. Each subdivision step refines the mesh by inserting new vertices, updating existing point positions, and updating the connectivity. The positions of the new vertices are computed by averaging rules applied to the positions of nearby old vertices. The averaging rules differ from scheme to scheme (see the comparison in Figure 1-9), and it is these rules that determine the properties of the surface. The graphs that illustrate the rules are called stencils. Binary subdivision splits each edge into two, while ternary subdivision splits each edge into three. Usually each subdivision scheme has at most three types of rules: vertex stencil, edge stencil, and face stencil. For example, the stencils of Catmull-Clark subdivision are shown in Figure 1-8. The refinement rules include stencils for smooth surfaces as well as special rules for creating sharp or semi-sharp features. Each refinement step produces a denser mesh than the previous one. The limit subdivision surface is the surface produced by this process after infinitely many refinement steps. In practical use, however, the algorithm is applied only a limited number of times, usually four.

Figure 1-8. The stencils used in Catmull-Clark subdivision. These stencils define the rules to derive the new vertices that lie on the old vertices, edges, and facets.
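As a concrete instance of an averaging rule, the standard Catmull-Clark vertex stencil for an interior vertex of valence n in a quad mesh moves the vertex P to (F + 2R + (n - 3)P) / n, where F is the average of the adjacent face points and R the average of the adjacent edge midpoints. The sketch below implements only this smooth interior rule (no crease or boundary rules); the function names are hypothetical, and the ring ordering, alternating direct and diagonal neighbors, is an assumption chosen to match the indexing of Figure 2-2.

```python
def average(points):
    """Component-wise mean of a list of 3D points."""
    count = len(points)
    return tuple(sum(p[c] for p in points) / count for c in range(3))

def catmull_clark_vertex(center, ring):
    """New position of `center` from its ring of 2n neighbors.

    ring[2j] is a direct (edge) neighbor, ring[2j + 1] a diagonal neighbor.
    """
    n = len(ring) // 2
    face_points = [average([center, ring[2 * j], ring[2 * j + 1],
                            ring[(2 * j + 2) % (2 * n)]]) for j in range(n)]
    edge_midpoints = [average([center, ring[2 * j]]) for j in range(n)]
    f = average(face_points)
    r = average(edge_midpoints)
    return tuple((f[c] + 2 * r[c] + (n - 3) * center[c]) / n for c in range(3))

# Regular vertex (valence 4) lifted to height 1 above a flat unit grid.
center = (0.0, 0.0, 1.0)
ring = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0),
        (-1.0, 0.0, 0.0), (-1.0, -1.0, 0.0), (0.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
new_center = catmull_clark_vertex(center, ring)
```

In this regular case the old vertex keeps a weight of 9/16, so the lifted point drops from height 1 to 0.5625 while staying above the same grid position.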

A realization of tessellation-on-the-fly for Loop subdivision surfaces was proposed in [33]. Pulli [44] implemented Loop's subdivision scheme with additions by Hoppe et al. [19]. Bischoff [3] proposed a forward-differencing method that requires only a constant amount of memory regardless of the subdivision step. DeRose [13] generalized the infinitely sharp creases of [19] to obtain semi-sharp creases. Hoppe [19] extended Loop's scheme by introducing subdivision rules that lead to a piecewise smooth surface with features such as creases, corners, darts, and conical vertices.

Figure 1-9. Classification of common Subdivision Schemes.

Adaptive subdivision can dramatically improve performance because the level of detail (LOD) is updated based on the dynamic distance to the camera as well as the complexity of each part of the model. Adaptive refinement was previously implemented using a quad-tree data structure [50]. Each level of the tree represents one refinement level of the mesh. However, it is difficult to map the recursive non-uniform tree structure to parallel computation. Bunnell [9] provides code for adaptive refinement. Even though this code was optimized for an earlier generation of GPUs, the implementation adaptively renders subdivision surfaces in real time on current hardware. Lai and Cheng [26] implemented adaptive Catmull-Clark subdivision. Hardware architecture support for adaptive refinement is proposed in [5].

The implementations of subdivision surfaces on the GPU can be roughly categorized into three groups: (I) recursive evaluation [9, 13, 28, 44, 46]; (II) direct evaluation [45, 47]; (III) pre-tabulated basis function composition [6, 7]. Recursive evaluation is the most intuitive approach, but not the most efficient. Stam [47] directly evaluates subdivision surfaces at arbitrary parameter values. However, Stam's method cannot evaluate a mesh that contains Type-3 quads. Moreover, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. The methods of [6, 7, 9, 46] cannot convert a mesh with Type-3 quads either. Getting rid of those quads usually means applying at least one Catmull-Clark subdivision step on the CPU and a four-fold data transfer to the GPU. In more detail, Shiue implements recursive Catmull-Clark subdivision using several passes via the pixel shader, using textures for storage and spiral-enumerated mesh fragments for maximizing parallelism [46]. Bolz tabulates the subdivision nodal functions up to a given density and linearly combines them on the GPU [6, 7]. The number of nodal functions equals the number of vertices of the input mesh.

One of the obvious advantages of subdivision surfaces is that they can model surfaces of arbitrary topological type. Also, because each scheme has a static refinement rule, subdivision surfaces are easy to implement. Although subdivision surfaces have been known for nearly twenty years, their use has been hindered in real-time applications such as games because recursive refinement is efficient in neither memory nor performance. Multiple passes are required to render a visually smooth surface. Moreover, the approximately four-fold increase in geometry after each subdivision step causes heavy memory traffic on the bus between the CPU and the GPU.

1.4.2 Parametric Patches

Since current and impending GPU configurations favor short explicit surface definitions over recursively defined surfaces, the alternative, patch-based refinement, has been advocated for fast rendering. Parametric patches (PP for short) are rendered directly in terms of their polynomial representations, as opposed to a collection of approximating facets. Generally speaking, a PP scheme converts control meshes to a set of patches that are parametric piecewise polynomials. PP schemes fit conveniently into a 2-pass implementation on the current graphics pipeline (Figure 1-10). The two rendering passes are combined into one pass in a future GPU pipeline (Figure 1-11) [48].


Figure 1-10. Animation and Displacement Mapping (DM) take place in the VS of the first pass and of the second pass, respectively. The first pass converts the deformed control mesh to its parametric patch representation. In the following pass, the details are added using DM after the evaluation of the patches produced by the previous pass.

The overall speed of a PP scheme is influenced by both the complexity of the patches and the number of patches. In terms of shape, a desirable PP scheme ensures at least $G^1$ continuity across adjacent patches and closely approximates subdivision surfaces. One of the biggest challenges is to ensure smoothness everywhere over the patches. Peters explained how to solve the vertex enclosure problem and geometric continuity in [39, 41].

GPU-based evaluation of trimmed NURBS surfaces is proposed in [16, 25]. Peters [40] used an approximation to the limit surface of Doo-Sabin subdivision to get a quickly convergent series of approximations to the volume enclosed by the subdivision surface. The difficult problem of filling n-sided holes was recently solved in [21, 42]. Bajaj et al. [2] introduced A-patches in tri-variate BB form with few free parameters to adjust the shape both locally and globally. In [15], the free-form surface is represented either in NURBS form or as cubic triangular Bezier patches, an explicit spline representation of smooth free-form surfaces that is to form the basis of an interactive sculpting environment. In the spirit of the tessellator, Boubekeur [8] describes a generic refinement pattern for Surface Modeling (tessellation + displacement) on any programmable GPU.

Figure 1-11. One possible pass on the future graphics rendering pipeline.

1.4.2.1 Bezier technique

The Bezier form is a parametric surface representation and was first developed in 1972 by the French engineer Pierre Bezier. A comprehensive overview of the Bezier form can be found in [43]. A Bezier patch is defined by its control points. A Bezier surface, as a set of Bezier patches, is piecewise polynomial. Bezier patches are visually intuitive and mathematically convenient due to the following properties:

1. Affine invariance: Applying an affine transformation to a control mesh applies it to the corresponding Bezier patch as well.

2. The convex hull property: A Bezier patch lies completely within the convex hull of its control points, and therefore also completely within the bounding box of its control points in any given Cartesian coordinate system.

There are two types of Bezier patch.

A tensor-product patch in Bezier form of degree m by n is defined as

\[
g(u, v) := \sum_{i=0}^{m} \sum_{j=0}^{n} g_{ij} \binom{m}{i} u^{i} (1 - u)^{m-i} \binom{n}{j} v^{j} (1 - v)^{n-j},
\]

where (u, v) is a parameter pair in the domain [0, 1] × [0, 1].
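The tensor-product definition above translates directly into a double loop over Bernstein polynomials. The following is a minimal sketch, assuming control points are stored as a nested list `g[i][j]` of 3D tuples; the function names and the example control net are illustrative only, and no attempt is made at the efficiency a GPU implementation would need.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein polynomial B_i^n(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def eval_tensor_patch(g, u, v):
    """Evaluate a degree m-by-n patch; g[i][j] is a 3D control point."""
    m, n = len(g) - 1, len(g[0]) - 1
    point = [0.0, 0.0, 0.0]
    for i in range(m + 1):
        bu = bernstein(m, i, u)
        for j in range(n + 1):
            w = bu * bernstein(n, j, v)
            for c in range(3):
                point[c] += w * g[i][j][c]
    return tuple(point)

# Bicubic example: control points on a grid with height z = i * j.
g = [[(float(i), float(j), float(i * j)) for j in range(4)] for i in range(4)]
corner = eval_tensor_patch(g, 1.0, 1.0)
middle = eval_tensor_patch(g, 0.5, 0.5)
```

The patch interpolates its corner control points, so `corner` equals `g[3][3]`, while interior control points only attract the surface.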

A triangular Bezier patch of degree n is defined as

\[
b(s, t, w) := \sum_{\substack{i+j+k=n \\ i,j,k \ge 0}} b_{ijk} \frac{n!}{i!\,j!\,k!} s^{i} t^{j} w^{k},
\]

where (s, t, w) are the barycentric coordinates on a triangle domain.
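The triangular form can be evaluated the same way, by summing over all index triples with i + j + k = n. The sketch below assumes the coefficients are stored in a dictionary keyed by (i, j, k); that storage choice, the function name, and the flat degree-4 example net are assumptions made for illustration.

```python
from math import factorial

def eval_triangular_patch(b, n, s, t, w):
    """Evaluate a degree-n triangular patch; b maps (i, j, k) to a 3D point."""
    point = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
            weight = coeff * s**i * t**j * w**k
            for c in range(3):
                point[c] += weight * b[(i, j, k)][c]
    return tuple(point)

# Degree-4 example: a flat control net placed at the barycentric
# positions of a triangle with corners A, B, C.
n = 4
A, B, C = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)
b = {(i, j, n - i - j): tuple((i * A[c] + j * B[c] + (n - i - j) * C[c]) / n
                              for c in range(3))
     for i in range(n + 1) for j in range(n + 1 - i)}
vertex = eval_triangular_patch(b, n, 1.0, 0.0, 0.0)
inside = eval_triangular_patch(b, n, 0.25, 0.25, 0.5)
```

With this flat net the patch reproduces the triangle itself, so `inside` is the point with barycentric coordinates (0.25, 0.25, 0.5), which makes the example easy to check by hand.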

1.4.2.2 Related work

For quadrilateral input meshes, it is well known that Type-1 quads can be converted into degree 3 by 3 patches in tensor-product Bezier form by the standard B-spline to Bezier conversion rules [14]. Therefore, any two adjacent patches derived from ordinary quads will join $C^2$. The interesting aspect is the conversion of Type-2 and Type-3 quads. A number of techniques (see the comparison in Figure 1-12) exist to smooth out quad meshes. Peters [38] generates NURBS output that could be rendered, for example, by the GPU algorithm of [17], but this has not been implemented. The method of [30] generates one bicubic patch per quad following the shape of Catmull-Clark surfaces. Since these bicubic patches typically do not join smoothly, Loop and Schaefer compute two additional patches whose cross product approximates the normal of the bicubic patch. As pointed out in [49], this trompe l'oeil represents a simple solution when true smoothness is not needed. Comparing the number of operations in construction and evaluation, the method of [30] should run at speeds comparable to our GPU quad mesh smoothing. Our method [37] designs a c-patch for converting an irregular quad. The resulting c-patches form a $G^1$ surface. The alternative algorithm proposed in [35] uses a bi-5 Bezier patch for each irregular quad.
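For the regular case mentioned at the start of this section, the uniform cubic B-spline to Bezier conversion has a well-known closed form: in one variable, four consecutive B-spline coefficients P0..P3 give the Bezier coefficients (P0 + 4P1 + P2)/6, (4P1 + 2P2)/6, (2P1 + 4P2)/6 and (P1 + 4P2 + P3)/6, and the bicubic case applies this along both grid directions. The sketch below works on scalar coefficients (one coordinate at a time) with exact fractions; the function names and the 4 by 4 test grid are assumptions, and it is not claimed to be the exact formulation used in [14].

```python
from fractions import Fraction as F

# Rows give each Bezier coefficient as a combination of four B-spline coefficients.
M = [[F(1, 6), F(4, 6), F(1, 6), F(0)],
     [F(0),    F(4, 6), F(2, 6), F(0)],
     [F(0),    F(2, 6), F(4, 6), F(0)],
     [F(0),    F(1, 6), F(4, 6), F(1, 6)]]

def bspline_to_bezier_1d(p):
    """Convert 4 uniform cubic B-spline coefficients to 4 Bezier coefficients."""
    return [sum(M[a][i] * p[i] for i in range(4)) for a in range(4)]

def bspline_to_bezier_2d(grid):
    """Apply the 1D conversion along both directions of a 4x4 grid."""
    rows = [bspline_to_bezier_1d(row) for row in grid]
    cols = [bspline_to_bezier_1d([rows[i][b] for i in range(4)]) for b in range(4)]
    return [[cols[b][a] for b in range(4)] for a in range(4)]

curve = bspline_to_bezier_1d([F(0), F(1), F(2), F(3)])
grid = [[F(i + 10 * j) for j in range(4)] for i in range(4)]
patch = bspline_to_bezier_2d(grid)
```

Each output coefficient depends only on the 4 by 4 neighborhood of one quad, so the conversion is local and could run independently per quad.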

Figure 1-12. This figure compares existing PP schemes in terms of how well they meet the performance and shape measurements. geom = geometry patches, tan = tangent patches.

CHAPTER 2
A NEW SCHEME FOR SURFACE CONSTRUCTION

2.1 Contribution

This thesis proposes a set of rules for converting a quadrilateral mesh to a surface consisting of bi-cubic splines wherever possible. Each irregular quad (Figure 1-7) is converted to a novel $C^1$ surface patch (c-patch for short). The surface closely mimics the shape of the Catmull-Clark subdivision surface and is constructed entirely by local parallel operations on the GPU. The resulting surface is piecewise polynomial and has well-defined normals everywhere. The evaluation avoids pixel dropout.

A c-patch is a $C^1$ piecewise polynomial patch with cubic boundary. It is defined by 24 coefficients whose instantiation for a smooth surface is given in Section xxx below and indicated in Figure 2-1. A c-patch has an alternative representation as four triangular, total degree 4 patches in Bernstein-Bezier form (Figure 2-5 right).

Figure 2-1. The c-patch coefficients. For $i = 0, 1, 2, 3$, the boundary coefficients $v^i$ and $e^i_j$ are defined by vertex neighborhoods (Figure 2-4 specifies the formulas). The interior coefficients are $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Figure 2-6), where $i = 0..3$, $j = 0..n_i$, and $n_i$ is the valence of $v^i$.

2.2 The Conversion Algorithm

Here we give the detailed algorithm for converting the quad mesh into coefficients that

define a smooth surface of low degree. Essentially, the conversion from a mesh to a patch


Figure 2-2. Smoothing the vertex neighborhood according to Figure 2-4. The center point $p^*$, its direct neighbors $p_{2j}$ and diagonal neighbors $p_{2j+1}$ form a vertex neighborhood, $j = 0..n-1$.

Figure 2-3. a) A quad neighborhood defining a surface piece. b) A bicubic patch with 4 × 4 control points. This patch is the output if the quad is regular, and is used to determine the shape of a c-patch c) if the quad is irregular. A c-patch is defined by 4 × 6 control points displayed as • and can alternatively, for analysis, be represented as four $C^1$-connected triangular pieces of degree 4 with degree 3 outer boundaries identical to the bicubic patch boundaries.


consists of computing new points near a vertex using the knowledge of the vertex neighborhood. A vertex neighborhood consists of a mesh point $p^*$ and the mesh points $p_k$, $k = 0, \ldots, 2N-1$, of all quads surrounding $p^*$ (Figure 2-2). The union of the four vertex neighborhoods is the quad neighborhood (Figure 2-3, a) that defines a patch. In our scheme, the patch is either a tensor-product bi-cubic Bezier patch or a c-patch.

2.2.1 The Conversion Rules for a Type-1 Quad

Recall that a quad is Type-1 if all four vertices have 4 neighbors. Type-1 quads are considered regular in the literature. Such a facet is converted into a degree 3 by 3 patch in tensor-product Bezier form by the standard B-spline to Bezier conversion rules [14]. Therefore, any two adjacent patches derived from Type-1 quads join $C^2$. Figure 2-3 illustrates the derivation process from a quad to a bi-cubic Bezier patch. The conversion rules are shown in Figure 2-4.

Figure 2-4. Computing control points $v$, $e$, $f$ and $t$, the projection of $e$, at a vertex of valence $N$ from the mesh points $p_j$ of a vertex neighborhood; the subscripts are modulo $2N$. By default, $\sigma_N := \bigl(c_N + 5 + \sqrt{(c_N + 9)(c_N + 1)}\bigr)/16$, the subdominant eigenvalue of Catmull-Clark subdivision.

A vertex $v$ computed according to Figure 2-4 is the limit point of Catmull-Clark subdivision as explained, for example, in [18]. The rules for $e_j$ and $f_j$ are the standard rules for converting a uniform bicubic tensor-product B-spline to its Bezier representation. The points $t_j$ are a projection of $e_j$ into a common tangent plane (see e.g. [15]). The default scale factor $\sigma$ is the subdominant eigenvalue of Catmull-Clark subdivision. We note that for $N = 4$, $e_{j+2} = 2v - e_j$ and $\sigma = 1/2$, so that the projection leaves the tangent control points invariant as $t_j = e_j$:

\[
\text{for } N = 4, \quad t_j = v + \tfrac{2}{4}(e_j - e_{j+2}) = v + (e_j - v) = e_j. \tag{2–1}
\]
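The default scale factor and the regular case (2–1) can be checked numerically. This is a small sketch under the assumption that $c_N = \cos(2\pi/N)$, in line with the definition of $c^i$ in Figure 2-6; the function names are hypothetical.

```python
from math import cos, pi, sqrt

def subdominant_eigenvalue(valence):
    """sigma_N from Figure 2-4, assuming c_N = cos(2*pi/N)."""
    c = cos(2.0 * pi / valence)
    return (c + 5.0 + sqrt((c + 9.0) * (c + 1.0))) / 16.0

def project_regular(v, e_j, e_j_plus_2):
    """Equation (2-1) for N = 4: t_j = v + (2/4)(e_j - e_{j+2})."""
    return tuple(vi + 0.5 * (a - b) for vi, a, b in zip(v, e_j, e_j_plus_2))

sigma4 = subdominant_eigenvalue(4)

# For N = 4 the opposite edge point is the reflection e_{j+2} = 2v - e_j.
v = (0.0, 0.0, 0.0)
e0 = (1.0, 2.0, 0.0)
e2 = tuple(2.0 * a - b for a, b in zip(v, e0))
t0 = project_regular(v, e0, e2)
```

With these inputs the projection returns e0 unchanged, as stated in the text.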

In the next stage, we combine information from four vertex neighborhoods, as shown in Figure 2-5, to populate a tensor-product patch $g$ of degree 3 by 3 in Bezier form [14]:

\[
g(u, v) := \sum_{k=0}^{3} \sum_{\ell=0}^{3} g_{k\ell} \binom{3}{k} u^k (1-u)^{3-k} \binom{3}{\ell} v^\ell (1-v)^{3-\ell}.
\]

The patch is defined by its 16 control points $g_{k\ell}$. The formulas of Figure 2-4 make this patch the Bezier representation of a bicubic spline in B-spline form. For example, in the notation of Figure 2-5, $(g_{k0})_{k=0,\ldots,3} = (v^0, t^0_0, t^1_1, v^1)$.
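The tensor-product formula above translates directly into a double sum over Bernstein polynomials. The sketch below is illustrative only; the nested-list layout `g[k][l]` is an assumption.

```python
from math import comb

def bernstein3(k, x):
    """Cubic Bernstein polynomial binom(3,k) x^k (1-x)^(3-k)."""
    return comb(3, k) * x**k * (1.0 - x)**(3 - k)

def eval_bicubic(g, u, v):
    """g[k][l] holds control point g_kl as a tuple; returns g(u, v)."""
    dim = len(g[0][0])
    point = [0.0] * dim
    for k in range(4):
        for l in range(4):
            basis = bernstein3(k, u) * bernstein3(l, v)
            for d in range(dim):
                point[d] += basis * g[k][l][d]
    return tuple(point)

# Control net on a planar grid: the patch reproduces the plane.
grid = [[(k / 3.0, l / 3.0, 0.0) for l in range(4)] for k in range(4)]
center = eval_bicubic(grid, 0.5, 0.5)
corner = eval_bicubic(grid, 0.0, 0.0)
```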

Figure 2-5. Patch construction. On the left, four vertex neighborhoods with vertices $v^i$ each contribute one sector to assemble the 4 × 4 coefficients of the Bezier patch $g$, for example $g_{00} = v^0$, $g_{10} = e^0_0$, $g_{11} = f^0$, $g_{30} = v^1$, $g_{31} = e^1_0$ (we use superscripts to indicate vertices). On the right, the same four sectors are used to determine a c-patch if the underlying quad is extraordinary. The indices of the control points of $g$ and $b^i$ are shown. Note that only a subset of the coefficients of the four triangular pieces $b^i$ is actually computed to define the c-patch. The full set of coefficients displayed here is only used to analyze the construction. The indexing of the 15 coefficients of a quartic triangular patch is shown on the right. We use this labeling throughout the dissertation.


2.2.2 The Conversion Rules for a Type-2, or Type-3 Quad

Type-2 and Type-3 quads are known as irregular. The irregular quads have at least one and

possibly up to four vertices with valence other than 4. For each irregular quad, the conversion

involves two steps:

1. Apply the regular rules defined in Figure 2-4 to generate $v^i$ and $e^i_j$ shown in Figure 2-1 left.

2. Then apply the rules in Figure 2-6 to yield $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ shown in Figure 2-1 right.

We use the bicubic patch to outline the shape as we replace it by a c-patch (Figure 2-3, c). A c-patch has the right degrees of freedom to cheaply and locally construct a smooth surface. We introduce the c-patch in terms of a well-known Bezier form of a polynomial piece $b^i$ of total degree 4 [14]:

\[
b^i(u_1, u_2) := \sum_{\substack{k+\ell+m=4 \\ k,\ell,m \ge 0}} b^i_{k\ell m}\, \frac{4!}{k!\,\ell!\,m!}\, u_1^k u_2^\ell (1 - u_1 - u_2)^m. \tag{2–2}
\]

The c-patch is equivalent to the union of four $b^i$, $i = 0, 1, 2, 3$, of total degree 4, but defined by only 4 × 6 c-coefficients constructed in Figures 2-4 and 2-6:

\[
v^i,\ t^i_0,\ t^i_1,\ b^i_{211},\ b^i_{121},\ b^i_{112}, \qquad i = 0, 1, 2, 3.
\]

These 24 c-coefficients imply the missing interior control points of the representation (2–2) by $C^1$ continuity between the triangular pieces: for $j = 0, 1, 2, 3$ and $i = 0, 1, 2, 3$,

\[
b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \bigl(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\bigr)/2; \tag{2–3}
\]

and the boundary control points $b^i_{k\ell 0}$ are implied by degree-raising [14]:

\[
b^i_{400} := v^i, \quad b^i_{310} := (v^i + 3t^i_0)/4, \quad b^i_{220} := (t^i_0 + t^{i+1}_1)/2, \quad b^i_{130} := (v^{i+1} + 3t^{i+1}_1)/4, \quad b^i_{040} := v^{i+1}. \tag{2–4}
\]
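The degree-raising step (2–4) can be checked by confirming that the cubic boundary curve and its quartic representation trace the same points. A minimal sketch with hypothetical helper names:

```python
def raise_cubic_to_quartic(v0, t0, t1, v1):
    """Apply (2-4): quartic control points of the cubic (v0, t0, t1, v1)."""
    def mix(a, wa, b, wb):
        return tuple(wa * x + wb * y for x, y in zip(a, b))
    return [v0,
            mix(v0, 0.25, t0, 0.75),   # b_310 = (v^i + 3 t^i_0) / 4
            mix(t0, 0.5, t1, 0.5),     # b_220 = (t^i_0 + t^{i+1}_1) / 2
            mix(v1, 0.25, t1, 0.75),   # b_130 = (v^{i+1} + 3 t^{i+1}_1) / 4
            v1]

def eval_bezier_curve(ctrl, u):
    """de Casteljau evaluation of a Bezier curve of any degree."""
    pts = [tuple(p) for p in ctrl]
    while len(pts) > 1:
        pts = [tuple((1.0 - u) * a + u * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

cubic = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
quartic = raise_cubic_to_quartic(*cubic)
p3 = eval_bezier_curve(cubic, 0.3)
p4 = eval_bezier_curve(quartic, 0.3)
```

Both evaluations agree up to round-off, which is why the bicubic and the c-patch can share one cubic boundary.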

For all objects with boundaries, the boundary rules are simply the derivation of cubic Bezier curves defined by $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. Basis functions corresponding to the 24 c-coefficients of the


Figure 2-6. Formulas for the 4 × 3 interior control points that, together with the vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch. See also Figures 2-11 and 2-12. Here $c^i := \cos\frac{2\pi}{N_i}$, $s^i := \sin\frac{2\pi}{N_i}$, and superscripts are modulo 4. By default, $g^* := \sum_{i=0}^{3} \bigl(v^i + 3(e^i_0 + e^i_1) + 9f^i\bigr)/64$, the central point of the ordinary patch.

c-patch can be read off by setting one c-coefficient to one and all others to zero and then applying (2–3) and (2–4).

2.3 Derivation of the coefficients of a c-patch

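When a c-patch sector $b$ meets a c-patch sector $a$ (Figure 2-12), the following equation must hold to preserve $G^1$ continuity across the boundary between $b$ and $a$:

\[
\lambda(u)\,\partial_1 b(u, 0) = \partial_2 b(u, 0) + \partial_1 a(0, u), \tag{2–5}
\]

where · denotes the scalar product between a vector of coefficients and a vector of basis functions:

\[
\begin{aligned}
\lambda(u) &:= (\lambda_0, \lambda_1) \cdot (u,\ 1-u), \\
\partial_1 b(u, 0) &:= 3\,(U_0,\ 2U_1,\ U_2) \cdot \bigl(u^2,\ u(1-u),\ (1-u)^2\bigr), \\
\partial_2 b(u, 0) &:= 4\,(v_0,\ 3v_1,\ 3v_2,\ v_3) \cdot \bigl(u^3,\ u^2(1-u),\ u(1-u)^2,\ (1-u)^3\bigr), \\
\partial_1 a(0, u) &:= 4\,(w_0,\ 3w_1,\ 3w_2,\ w_3) \cdot \bigl(u^3,\ u^2(1-u),\ u(1-u)^2,\ (1-u)^3\bigr).
\end{aligned}
\]

Comparing coefficients, Equation (2–5) can be rewritten as the following collection of simplified forms in terms of $U_i$, $v_i$, $w_i$:

\[
\begin{aligned}
3\lambda_0 U_0 &= 4v_0 + 4w_0 && \text{(2–6)} \\
6\lambda_0 U_1 + 3\lambda_1 U_0 &= 12(v_1 + w_1) && \text{(2–7)} \\
3\lambda_0 U_2 + 6\lambda_1 U_1 &= 12(v_2 + w_2) && \text{(2–8)} \\
3\lambda_1 U_2 &= 4v_3 + 4w_3 && \text{(2–9)}
\end{aligned}
\]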

2.3.1 Derivation of λ0 and λ1

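The scalar $\lambda_0$ is derived from (2–6); (2–9) sets the constraint for $\lambda_1$.

Let $U_0 := (1, 0)$, $V_0 := (\cos\frac{2\pi}{n_0}, \sin\frac{2\pi}{n_0})$, and $W_0 := (\cos\frac{2\pi}{n_0}, -\sin\frac{2\pi}{n_0})$ (Figure 2-7). We know $u_0 = \frac{3}{4}U_0$, $u_3 = \frac{3}{4}U_2$ from degree raising.

\[
\begin{aligned}
v_0 + w_0 &= \tfrac{1}{2}\bigl(\tfrac{3}{4}V_0 + \tfrac{3}{4}U_0\bigr) + \tfrac{1}{2}\bigl(\tfrac{3}{4}W_0 + \tfrac{3}{4}U_0\bigr) \\
&= \tfrac{3}{4}\Bigl(\frac{1 + \cos\frac{2\pi}{n_0}}{2},\ \frac{\sin\frac{2\pi}{n_0}}{2}\Bigr) + \tfrac{3}{4}\Bigl(\frac{1 + \cos\frac{2\pi}{n_0}}{2},\ -\frac{\sin\frac{2\pi}{n_0}}{2}\Bigr) \\
&= \tfrac{3}{4}\bigl(1 + \cos\tfrac{2\pi}{n_0},\ 0\bigr) \\
&= \tfrac{3}{4}\bigl(1 + \cos\tfrac{2\pi}{n_0}\bigr)U_0. \qquad \text{(2–10)}
\end{aligned}
\]

Hence $4(v_0 + w_0) = 3(1 + \cos\frac{2\pi}{n_0})U_0$ and, by (2–6),

\[
\lambda_0 = 1 + \cos\tfrac{2\pi}{n_0}.
\]

Similarly, because $V_3 = (1 - \cos\frac{2\pi}{n_1}, \sin\frac{2\pi}{n_1})$ and $W_3 = (1 - \cos\frac{2\pi}{n_1}, -\sin\frac{2\pi}{n_1})$,

\[
4(v_3 + w_3) = 3\bigl(1 - \cos\tfrac{2\pi}{n_1}\bigr)U_2. \tag{2–11}
\]

Hence $\lambda_1 = 1 - \cos\frac{2\pi}{n_1}$.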
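The computation (2–10) is easy to replay numerically. The sketch below rebuilds $v_0 + w_0$ from $U_0$, $V_0$, $W_0$ and reads $\lambda_0$ off Equation (2–6); the function name is hypothetical and the check covers only the first coefficient.

```python
from math import cos, sin, pi

def lambda0_from_geometry(n0):
    """Return (lambda_0, residual) obtained from (2-6) and (2-10)."""
    U0 = (1.0, 0.0)
    V0 = (cos(2 * pi / n0), sin(2 * pi / n0))
    W0 = (cos(2 * pi / n0), -sin(2 * pi / n0))
    # v0 + w0 as in the first line of (2-10)
    total = tuple(0.5 * (0.75 * V + 0.75 * U) + 0.5 * (0.75 * W + 0.75 * U)
                  for U, V, W in zip(U0, V0, W0))
    # (2-6): 3 * lambda_0 * U0 = 4 (v0 + w0); U0 = (1, 0), so the
    # x component gives lambda_0 and the y component must vanish.
    return 4.0 * total[0] / 3.0, total[1]

results = {n: lambda0_from_geometry(n) for n in (3, 4, 5, 6, 7)}
```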

2.3.2 Derivation of b211 and b121

To derive the formulas for $b^i_{211}$ and its symmetric counterpart $b^i_{121}$, note that the formulas must guarantee a smooth transition between $b^i$ and its neighbor patch on an adjacent quad, regardless of whether the adjacent quad is regular or irregular. That is, the formulas are derived to satisfy simultaneously two types of smoothness constraints (see Section 2.4). From Equation (2–7), we obtain

\[
b_{211} + a_{211} = \tfrac{1}{2}\lambda_0 U_1 + \tfrac{1}{4}\lambda_1 U_0 + 2b_{310}. \tag{2–12}
\]

To get a second constraint and determine $b_{211}$ uniquely, we consider the values $b^*_{211}$ and $a^*_{211}$ of each ghost patch in terms of sin averages (Figure 2-8):

\[
4s_0(b_{211} - b_{310}) + 4s_1(b_{211} - b_{220}) = 3(b_{11} - b_{10})
\]

yields

\[
b_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_0 - t^0_0)}{4(s_0 + s_1)}. \tag{2–13}
\]

Figure 2-7. The re-parameterization of λ to meet $G^1$ at the vertex.

Figure 2-8. The coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch. (Figure labels: ghost patch, triangular patches.)

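Similarly,

\[
a_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_{n_0-1} - t^0_0)}{4(s_0 + s_1)}. \tag{2–14}
\]

Therefore, using that $e^0_0$ is the average of the adjacent points $f^0_0$ and $f^0_{n_0-1}$,

\[
b_{211} - a_{211} = \frac{3(f^0_0 - e^0_0)}{2(s_0 + s_1)}. \tag{2–15}
\]

Together with Equation (2–12),

\[
b_{211} = b_{310} + \tfrac{1}{4}\lambda_0(t^1_1 - t^0_0) + \tfrac{1}{8}\lambda_1(t^0_0 - v^0) + \frac{3(f^0_0 - e^0_0)}{4(s_0 + s_1)}. \tag{2–16}
\]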

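Equation (2–8) implies

\[
b_{121} + a_{121} = \tfrac{1}{4}\lambda_0 U_2 + \tfrac{1}{2}\lambda_1 U_1 + 2b_{130}. \tag{2–17}
\]

Using the same approach as for deriving $b_{211}$, the constraint $4s_0(b_{121} - b_{220}) + 4s_1(b_{121} - b_{130}) = 3(b_{21} - b_{20})$ yields

\[
b_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_0 - t^1_1)}{4(s_0 + s_1)}. \tag{2–18}
\]

Similarly,

\[
a_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_1 - t^1_1)}{4(s_0 + s_1)}. \tag{2–19}
\]

(2–18) and (2–19) ⇒

\[
b_{121} - a_{121} = \frac{3(f^1_0 - e^1_1)}{2(s_0 + s_1)}. \tag{2–20}
\]

(2–17) and (2–20) ⇒

\[
b_{121} = b_{130} + \tfrac{1}{8}\lambda_0(v^1 - t^1_1) + \tfrac{1}{4}\lambda_1(t^1_1 - t^0_0) + \frac{3(f^1_0 - e^1_1)}{4(s_0 + s_1)}. \tag{2–21}
\]

The formulas (2–16) and (2–21) are the same as shown in Figure 2-6.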
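As an illustration, formula (2–16) can be written as a small routine. This is a sketch under stated assumptions: $b_{310}$ is taken from (2–4), $\lambda_0$ and $\lambda_1$ from Section 2.3.1, and $s_0$, $s_1$ are assumed to be $\sin(2\pi/n_0)$ and $\sin(2\pi/n_1)$ as in Figure 2-6; the function name and argument order are hypothetical.

```python
from math import cos, sin, pi

def interior_b211(v0, t00, t11, f00, e00, n0, n1):
    """Evaluate (2-16) componentwise for points given as tuples."""
    lam0 = 1.0 + cos(2 * pi / n0)
    lam1 = 1.0 - cos(2 * pi / n1)
    s0 = sin(2 * pi / n0)
    s1 = sin(2 * pi / n1)
    out = []
    for v, ta, tb, f, e in zip(v0, t00, t11, f00, e00):
        b310 = (v + 3.0 * ta) / 4.0               # degree raising (2-4)
        out.append(b310
                   + lam0 * (tb - ta) / 4.0
                   + lam1 * (ta - v) / 8.0
                   + 3.0 * (f - e) / (4.0 * (s0 + s1)))
    return tuple(out)

# Regular case n0 = n1 = 4: lambda_0 = lambda_1 = 1 and s0 = s1 = 1.
b = interior_b211((0.0,), (1.0,), (2.0,), (1.5,), (1.0,), 4, 4)
```

For valence 4 at both ends the result reduces to $b_{310} + (t^1_1 - t^0_0)/4 + (t^0_0 - v^0)/8 + 3(f^0_0 - e^0_0)/8$, the expression used in the smoothness verification of Section 2.4.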

2.3.3 Derivation of b112

By contrast, $b^i_{112}$ is not pinned down by continuity constraints. We could choose each $b^i_{112}$ arbitrarily without changing the formal smoothness of the resulting surface. However, we opt for increased smoothness at the center of the c-patch and additionally use the freedom to closely mimic the shape of Catmull-Clark subdivision surfaces, as we did earlier for vertices. First, we approximately satisfy four $C^2$ constraints across the diagonal boundaries at the central point $b_{004}$ (Figure 2-9) by enforcing

\[
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ -1 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b^0_{112} \\ b^1_{112} \\ b^2_{112} \\ b^3_{112} \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} b^0_{211} - b^1_{121} - q \\ b^1_{211} - b^2_{121} - q \\ b^2_{211} - b^3_{121} - q \\ b^3_{211} - b^0_{121} - q \end{bmatrix}, \tag{2–22}
\]

where $q := \frac{1}{4}\sum_{i=0}^{3}(b^i_{211} - b^i_{121})$. The perturbation by $q$ is necessary, since the coefficient matrix of the $C^2$ constraints is rank deficient. After perturbation, the system can be solved with the last equation implied by the first three. We add the constraint that the average of the $b^i_{112}$ matches $g^* := g(\frac{1}{2}, \frac{1}{2})$, the center position of the bicubic patch:

\[
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} b^0_{112} \\ b^1_{112} \\ b^2_{112} \\ b^3_{112} \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} b^0_{211} - b^1_{121} - q \\ b^1_{211} - b^2_{121} - q \\ b^2_{211} - b^3_{121} - q \\ 8g^* \end{bmatrix}.
\]

Figure 2-9. Dark lines cover the control points involved in the $C^2$ constraints (2–22). The points on dashed lines are implied by averaging.
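The 4 × 4 system with the averaging row can be solved by back-substitution. The sketch below is one possible way to do so and is not taken from the dissertation's shader code; the function name and the list-of-tuples layout are assumptions. It uses the first three rows to express $b^1_{112}$, $b^2_{112}$, $b^3_{112}$ through $b^0_{112}$ and the last row to fix $b^0_{112}$.

```python
def solve_b112(b211, b121, g_star):
    """b211, b121: lists of four points (tuples); returns the four b^i_112."""
    dim = len(g_star)
    # q = (1/4) * sum_i (b^i_211 - b^i_121)
    q = [sum(b211[i][c] - b121[i][c] for i in range(4)) / 4.0
         for c in range(dim)]
    # r_i = (1/2) (b^i_211 - b^{i+1}_121 - q), superscripts modulo 4
    r = [[0.5 * (b211[i][c] - b121[(i + 1) % 4][c] - q[c])
          for c in range(dim)] for i in range(4)]
    # Rows 1-3: b0 - b1 = r0, b1 - b2 = r1, b2 - b3 = r2.
    # Row 4:    b0 + b1 + b2 + b3 = 4 g*.
    b0 = [g_star[c] + (3.0 * r[0][c] + 2.0 * r[1][c] + r[2][c]) / 4.0
          for c in range(dim)]
    b1 = [b0[c] - r[0][c] for c in range(dim)]
    b2 = [b1[c] - r[1][c] for c in range(dim)]
    b3 = [b2[c] - r[2][c] for c in range(dim)]
    return [tuple(b0), tuple(b1), tuple(b2), tuple(b3)], r

b211 = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.5), (0.3, -1.0)]
b121 = [(0.5, 0.1), (0.2, 1.5), (-0.7, 0.0), (0.0, -0.4)]
g_star = (0.1, 0.2)
b112, r = solve_b112(b211, b121, g_star)
```

Because the right-hand sides of (2–22) sum to zero after the perturbation by $q$, the solution also satisfies the fourth cyclic constraint that was dropped in favor of the averaging row.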

The point $g^*$ lies on the bicubic patch at $u = 0.5$ and $v = 0.5$. All control points of the bicubic patch on the boundaries have already been calculated, so only the four interior points remain to be determined. For these we can use a mask for determining Bezier control points from a uniform bicubic B-spline surface. Figure 2-10(a) is a mask for $b_{11}$. For the other interior points, we can use a symmetric mask.

Figure 2-10. The center of a bi-cubic patch can be evaluated by the linear combination of the boundary coefficients.

Figure 2-10(b) shows a mask for the evaluation of the bicubic patch at (0.5, 0.5):

\[
\begin{aligned}
g^* = \tfrac{1}{64}\bigl(&b_{00} + 3b_{01} + 3b_{02} + b_{03} + 3b_{10} + 9b_{11} + 9b_{12} + 3b_{13} \\
&+ 3b_{20} + 9b_{21} + 9b_{22} + 3b_{23} + b_{30} + 3b_{31} + 3b_{32} + b_{33}\bigr).
\end{aligned}
\]

Now we can solve for the $b^i_{112}$, $i = 0, 1, 2, 3$, and obtain the formula of Figure 2-6.

2.4 Smoothness Verification

In this section we formally verify the following lemma. For the purpose of the proof, we view the c-patch in its equivalent representation as four Bézier patches of total degree 4.

Lemma 1. Two adjacent polynomial pieces $a$ and $b$ defined by the rules of Section 2.2 (Figure 2-4, Figure 2-6, (2–3), (2–4)) meet at least

(i) $C^2$ if $a$ and $b$ correspond to two regular quads;

(ii) $C^1$ if $a$ and $b$ are adjacent pieces of a c-patch;

(iii) $C^1$ if $a$ and $b$ correspond to two quads, exactly one of which is regular;

(iv) with tangent continuity if $a$ and $b$ correspond to two different irregular quads.

Proof. (i) If $a$ and $b$ are bicubic patches corresponding to regular quads, they are part of a bicubic spline with uniform knots and therefore meet $C^2$. (ii) If $a$ and $b$ are adjacent pieces of a c-patch, then Equations (2–3) enforce $C^1$ continuity.

For the remaining cases, let $b$ be a triangular piece. Let $u$ be the parameter corresponding to the quad edge between $b_{400} = v^0$, where $u = 0$ and the valence is $N_0$, and $b_{040} = v^1$, where $u = 1$ and the valence is $N_1$ (Figure 2-11 for case (iii) and Figure 2-12 for case (iv)). By construction, the common boundary $b(u, 0) = a(0, u)$ is a curve of degree 3 with Bezier control points $(v^0, t^0_0, t^1_1, v^1)$, so that bicubic patches on regular quads and triangular patches on irregular quads match up exactly.

Denote by $\partial_1 b$ the partial derivative of $b$ along the common boundary and by $\partial_2 b$ the partial derivative in the other variable. Since $b(u, 0) = a(0, u)$, we have $\partial_1 b(u, 0) = \partial_2 a(0, u)$. The partial derivative of $a$ in the other variable is $\partial_1 a$. We will verify that the following conditions, which imply tangent continuity, hold:

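if one quad is ordinary (case (iii)),

\[
\partial_1 b(u, 0) = 2\partial_2 b(u, 0) + \partial_1 a(0, u); \tag{2–23}
\]

if both quads are extraordinary (case (iv)),

\[
\bigl((1-u)\lambda_0 + u\lambda_1\bigr)\,\partial_1 b(u, 0) = \partial_2 b(u, 0) + \partial_1 a(0, u), \tag{2–24}
\]

where $\lambda_0 := 1 + c^0$, $\lambda_1 := 1 - c^1$, and $c^i := \cos\bigl(\frac{2\pi}{N_i}\bigr)$.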

Both equations, (2–23) and (2–24), equate vector-valued polynomials of degree 3 (we write $\partial_1 b(u, 0)$ in degree-raised form [14]). The equations hold if and only if all Bezier coefficients are equal. Offhand, this means checking four vector-valued equations for each of (2–23) and (2–24). However, in both cases, the setup is symmetric with respect to reversal of the direction in which the boundary $b(u, 0)$ is traversed. That means we need only check the first two equations (2–23′) and (2–23″) of (2–23) and the first two equations (2–24′) and (2–24″) of (2–24). We verify these equations by inserting the formulas of Figures 2-4 and 2-6.

To verify (2–23), the key observation is that $N_0 = N_1 = 4$ if one quad is ordinary. Hence $c^0 = c^1 = 0$ and $s^0 = s^1 = 1$ (cf. Figure 2-6) and $t^i_j = e^i_j$. Therefore, for example (cf. Figure 2-11),

\[
2\partial_2 b(0, 0) = 2 \cdot 4(b_{301} - v^0) = 8 \cdot \tfrac{3}{4}\Bigl(\frac{e^0_0 + e^0_1}{2} - v^0\Bigr) = 3(e^0_0 + e^0_1) - 6v^0,
\]

where the factor $\frac{3}{4}$ stems from raising the degree from 3 to 4; and the second Bezier coefficients of $\partial_1 b(u, 0)$ (in degree-raised form) and of $2\partial_2 b(u, 0)$ are respectively (cf. Figure 2-11)

\[
3\,\frac{(e^0_0 - v^0) + 2(e^1_1 - e^0_0)}{3}
\quad\text{and}\quad
2 \cdot 4(b_{211} - b_{310}) = 8\Bigl(\frac{e^1_1 - e^0_0}{4} + \frac{e^0_0 - v^0}{8} + 3\,\frac{f^0 - e^0_0}{8}\Bigr).
\]

Then, comparing the first two Bezier coefficients of $\partial_1 b(u, 0)$ and $2\partial_2 b(u, 0) + \partial_1 a(0, u)$ yields equality and establishes $C^1$ continuity:

\[
\underbrace{3(e^0_0 - v^0)}_{\partial_1 b(0,0)} = \underbrace{3(e^0_0 + e^0_1) - 6v^0}_{2\partial_2 b(0,0)} \underbrace{{} - 3(e^0_1 - v^0)}_{\partial_1 a(0,0)} \tag{$'$}
\]

\[
(e^0_0 - v^0) + 2(e^1_1 - e^0_0) = 2(e^1_1 - e^0_0) + (e^0_0 - v^0) + 3(f^0 - e^0_0) - 3(f^0 - e^0_0). \tag{$''$}
\]

Figure 2-11. $C^1$ transition between a triangular and a bicubic patch.

The equations for (2–24) are similar, except that we need to replace $e_j$ by $t_j$ and keep in mind that, by definition,

\[
(t^0_{n_0-1} - v^0) + (t^0_1 - v^0) = 2c^0(t^0_0 - v^0).
\]

Hence, for example,

\[
\partial_2 b(0, 0) + \partial_1 a(0, 0) = 4(b_{301} - v^0 + a_{301} - v^0) = 3(1 + c^0)(t^0_0 - v^0).
\]

In detail, the first of the four coefficient equations of (2–24) simplifies to

\[
\begin{aligned}
3(1 + c^0)(t^0_0 - v^0) &= 4(b_{301} + a_{301} - 2v^0) \\
&= 3\Bigl(\frac{t^0_1 + t^0_0}{2} - v^0 + \frac{t^0_{N_0-1} + t^0_0}{2} - v^0\Bigr) \\
&= 3 \cdot \tfrac{1}{2}\bigl(2c^0(t^0_0 - v^0) + 2(t^0_0 - v^0)\bigr).
\end{aligned} \tag{$'$}
\]

Noting that the terms $(f^0 - e^0_0)/(8(s^0 + s^1))$ in the expansions of $b_{211}$ and $a_{211}$ cancel, the second coefficient equation is

\[
\begin{aligned}
6\lambda_0(t^1_1 - t^0_0) + 3\lambda_1(t^0_0 - v^0) &= 12(b_{211} + a_{211} - 2b_{310}) \\
&= \frac{12 \cdot 2(1 + c^0)}{4}(t^1_1 - t^0_0) + \frac{12 \cdot 2(1 - c^1)}{8}(t^0_0 - v^0).
\end{aligned} \tag{$''$}
\]

It is easy to read off that the equalities hold. So the claim of smoothness is verified.

Figure 2-12. $G^1$ transition between two triangular patches.
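The first coefficient equation of case (iv) can be replayed numerically. The sketch below assumes, for illustration only, that the tangent control points $t^0_j$ of a vertex of valence $n_0$ lie on a regular $n_0$-gon around $v^0$ in the tangent plane, which is one configuration satisfying the stated relation between $t^0_{n_0-1}$, $t^0_1$ and $t^0_0$; all names are hypothetical.

```python
from math import cos, sin, pi

def first_coefficient_residual(n0, radius=1.0):
    """Largest componentwise gap between both sides of (2-24')."""
    c0 = cos(2 * pi / n0)
    v0 = (0.0, 0.0)

    def t(j):
        # Assumed layout: tangent points on a regular n0-gon around v0.
        return (radius * cos(2 * pi * j / n0), radius * sin(2 * pi * j / n0))

    def b301(t_edge, t_other):
        # (2-3) with j = 0 averages two degree-raised points of (2-4).
        return tuple((0.25 * v + 0.75 * a + 0.25 * v + 0.75 * b) / 2.0
                     for v, a, b in zip(v0, t_edge, t_other))

    b = b301(t(0), t(1))         # piece b on one side of the edge
    a = b301(t(0), t(n0 - 1))    # piece a on the other side
    lhs = tuple(3.0 * (1.0 + c0) * (x - v) for x, v in zip(t(0), v0))
    rhs = tuple(4.0 * (p + q - 2.0 * v) for p, q, v in zip(b, a, v0))
    return max(abs(x - y) for x, y in zip(lhs, rhs))

residuals = [first_coefficient_residual(n) for n in range(3, 9)]
```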


2.5 Complexity Analysis

2.5.1 Number of Patches

The conversion scheme yields the minimum set of patches because (1) no initial refinement of the input coarse mesh is needed; (2) each quadrilateral facet of the coarse mesh corresponds to only one patch. Namely, the total number of patches equals the number of facets in the mesh. The patch complexity of various schemes is compared in Figure 1-12.

The low cost of construction and evaluation makes c-patches an attractive representation, not just on the GPU.

2.5.2 Cost of Patch Construction

The separation into vertex and patch construction means that the number of scaled vertex additions (adds) per patch is independent of the valence. The cost of computing the control points per patch, i.e., with the cost of vertex computations distributed, is 4 × (4 + 1 + 1 + 2) = 32 adds per bicubic construction. Computing $t_j$ from $t_0$ and $t_1$ and determining $b^i_{211}$, $b^i_{121}$ and $b^i_{112}$ according to Figure 2-6 amounts to an additional 4 × (2 + 6 + 6 + 12) = 104 adds per c-patch. Each c-patch has 24 coefficients. This compares favorably to, say, [30], where 16 + 12 + 12 coefficients are generated.

2.5.3 Cost of Surface Evaluation

The patch can be evaluated at any parameter value (u, v) using de Casteljau's algorithm. A tensor-product bi-cubic Bezier patch is defined by 16 control points. The evaluation at (u, v) needs 42 vector-vector additions, 42 scalar-vector multiplications, and 42 scalar-scalar operations. Similarly, the evaluation of a c-patch at (u, v) requires 40 vector-vector additions and 60 scalar-vector multiplications. In terms of evaluation cost, a c-patch has roughly the same cost as a bicubic patch.
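For reference, de Casteljau evaluation of a bicubic patch can be sketched as repeated linear interpolation, first along each row of the control net and then across the resulting four points. This is a generic illustration and makes no attempt to reproduce the operation counts quoted above; the names and the `g[k][l]` layout are assumptions.

```python
def de_casteljau(points, x):
    """Evaluate a Bezier curve at x by repeated linear interpolation."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [tuple((1.0 - x) * a + x * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def eval_bicubic_de_casteljau(g, u, v):
    """g[k][l] holds control point g_kl; returns g(u, v)."""
    rows = [de_casteljau(g[k], v) for k in range(4)]  # collapse index l at v
    return de_casteljau(rows, u)                      # collapse index k at u

# Planar control grid: the patch reproduces (u, v, 0).
grid = [[(k / 3.0, l / 3.0, 0.0) for l in range(4)] for k in range(4)]
point = eval_bicubic_de_casteljau(grid, 0.25, 0.75)
```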


2.6 Approximation of Catmull-Clark Subdivision Surfaces

Since Catmull-Clark subdivision is a standard modeling tool, our scheme is designed to approximate Catmull-Clark subdivision surfaces. In fact, the resulting bi-cubic patches completely agree with the Catmull-Clark subdivision surface except in the immediate neighborhood of irregular mesh vertices. In such a neighborhood they join at least with tangent continuity and interpolate the limit of the irregular mesh vertex. Furthermore, the center of a c-patch interpolates the center point of the corresponding Catmull-Clark limit surface due to the choice of the c-patch coefficient $b_{112}$.

2.7 Water-Tight Surface Verification

Patches are evaluated independently. If the vertices generated along a shared boundary by the two adjacent patches do not match exactly, the refined mesh will have a hole in it. There are three configurations for adjacent patches: (1) both are bi-3 patches, (2) one of them is a bi-3 patch and the other is a c-patch, (3) both are c-patches.

The coefficients defining the shared boundary curve are derived by the averaging rules defined in Figure 2-4. Since additions are commutative, the generation of all boundary coefficients is independent of the choice of patch. In other words, no round-off error and no cracking are possible in the first case. The boundary coefficients of a c-patch are computed by the same rules of Figure 2-4; therefore water-tightness is also achieved in the latter two cases. Note that the computation of the cubic boundaries shared by a bicubic patch and a c-patch is mathematically identical.

2.8 Discussion

The introduction of triangular patches to model quad patches is somewhat unconventional, but has been used in an I3D paper before [15]. Also [49] is based on triangular patches. Evaluation and normal computation of degree 4 triangular patches are comparable in cost to those of tensor-product bicubic patches: in the triangular case we have to average 15 control points, in the tensor-product case 16. Triangular patches may deserve more attention in OpenGL.


CHAPTER 3
GPU IMPLEMENTATION

3.1 Overview

We implemented the conversion scheme using C++ on the DirectX 10 pipeline. We compute vertex neighborhoods according to Figure 2-4 in the vertex shader and use the geometry shader primitive "triangle with adjacency" to accumulate the coefficients of the bicubic patch or compute a c-patch according to Figure 2-6. We implemented conversion plus rendering in two variants: a 1-pass and a 2-pass scheme.

3.2 2-pass Approach

Figure 3-1. 2-pass implementation detailed in Figure 3-2. The first pass converts, the second renders. Note that the geometry shader only computes at most 24 coefficients per patch and does not create (amplify to) evaluation point primitives.


Figure 3-2. 2-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. VS Out of Pass 1 outputs $N$ points $f_j$ for one vertex (hence the subscript) and GS In of Pass 1 retrieves four points $f^i$, each generated by a different vertex of the quad (hence the superscript).

The 2-pass implementation constructs the patches in the first pass using the vertex shader and the geometry shader and evaluates positions and normals in the second pass. Pass 1 streams out only the 4 × 6 coefficients of a c-patch and not the $4 \times \binom{4+2}{2} = 60$ Bezier control points of the equivalent triangular pieces. The data amplification necessary for evaluation takes place by instancing a (u, v)-grid on the vertex shader in the second pass. That is, we do not stream back large data sets after amplification. Position and normal are computed on the (u, v) domain $[0..1]^2$ of the bicubic patch or of the c-patch (not on any triangular domains). We pre-tessellate the quad domain and store the results in a set of textures with different resolutions. If the tessellation factor is chosen to be $m$, the texture with $(m + 1)$ by $(m + 1)$ parametric values is sent to the vertex shader in the subsequent evaluation pass. Given the pre-tessellated domain with a patch identifier, the vertex shader loads the appropriate control points and evaluates the patch. Figure 3-2 lists the input, output and computations of each pipeline stage. Figure 3-1 illustrates this association of computations and resources. In order to avoid costly branching in HLSL (High Level Shader Language) and to optimize performance, specialized shaders are written for patch construction and evaluation based on the patch type.

3.3 1-pass Approach

In the 1-pass implementation, the evaluation immediately follows conversion in the geometry shader, using the geometry shader's ability to amplify, i.e., to output multiple point primitives for each facet (Figure 3-4). While a 1-pass implementation sounds more efficient than a 2-pass implementation, DX10 limits data amplification in the geometry shader so that the maximal evaluation density is 8 × 8 per quad. Moreover, maximal amplification in the geometry shader slows the performance. We observed that the 2-pass implementation performs at least 25% better. Figure 3-3 lists the data flow on the graphics pipeline.

3.4 Coordinate System Transformation

When we evaluate the normal and position of an irregular quad at (u, v), we first need to transform the tessellated domain value from a Cartesian coordinate (u, v) to a barycentric coordinate (s, t, w). Figure 3-5 illustrates how to determine in which of the four triangles (u, v) lies. In this way, we minimize the number of comparisons and take care of the shared vertices. We make (0.0, 0.0), (1.0, 0.0), (0.5, 0.5) belong only to T1, (1.0, 1.0) belong only to T2, and (0.0, 1.0) belong only to T4.
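The following sketch shows one way to realize this classification with two comparisons and then convert (u, v) to barycentric coordinates. It is an illustration under assumptions, not the HLSL used in the implementation: the placement of T1 to T4 (bottom, right, top, left) is inferred from the vertex ownership stated above, and the vertex order inside each triangle, with the center (0.5, 0.5) last, is hypothetical.

```python
def locate_triangle(u, v):
    """Return the index 1..4 of the triangle owning (u, v)."""
    below_main = v <= u          # on or below the diagonal v = u
    below_anti = v <= 1.0 - u    # on or below the anti-diagonal v = 1 - u
    if below_main and below_anti:
        return 1                 # bottom triangle, owns (0,0), (1,0), center
    if below_main:
        return 2                 # right triangle, owns (1,1)
    if below_anti:
        return 4                 # left triangle, owns (0,1)
    return 3                     # top triangle

# Assumed vertex order per triangle; the last vertex is the quad center.
TRIANGLES = {
    1: ((0.0, 0.0), (1.0, 0.0), (0.5, 0.5)),
    2: ((1.0, 0.0), (1.0, 1.0), (0.5, 0.5)),
    3: ((1.0, 1.0), (0.0, 1.0), (0.5, 0.5)),
    4: ((0.0, 1.0), (0.0, 0.0), (0.5, 0.5)),
}

def to_barycentric(u, v):
    """Return (triangle index, (s, t, w)) with (u, v) = s*A + t*B + w*C."""
    idx = locate_triangle(u, v)
    (ax, ay), (bx, by), (cx, cy) = TRIANGLES[idx]
    det = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    t = ((u - ax) * (cy - ay) - (cx - ax) * (v - ay)) / det
    w = ((bx - ax) * (v - ay) - (u - ax) * (by - ay)) / det
    return idx, (1.0 - t - w, t, w)

idx, (s, t, w) = to_barycentric(0.5, 0.25)
```

Using non-strict comparisons on one side of each diagonal assigns every shared vertex and edge point to exactly one triangle, which keeps the evaluation deterministic.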


Figure 3-3. 1-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. GS amplifies the geometry and evaluates the patches.

Figure 3-4. At present, the 1-pass conversion-and-rendering must place patch assembly and evaluation on the geometry shader. This is not efficient.


Figure 3-5. (u, v) on an irregular quad. The unit square with corners (0.0, 0.0), (1.0, 0.0), (1.0, 1.0) and (0.0, 1.0) is split into the four triangles T1, T2, T3, T4 that meet at (0.5, 0.5).

3.5 Water-Tight Evaluation

The HLSL code in Figure 3-6 shows that the same cubic curve is evaluated along the boundary. An explicit if-statement in the evaluation guarantees the exact same ordering of computations, since boundary coefficients are only computed once.

Figure 3-6. Water-tight Evaluation


3.6 Conclusion

The presented approach fits well into a GPU pipeline. In both approaches, we compute $v$, $e$, $f$ and $t$ for each vertex using its vertex neighborhood and the rules of Figure 2-4 in the vertex shader. Each vertex has $2n + 1$ vertices in its vertex neighborhood, where $n$ is the valence. This information is stored in a texture. With a vertex ID and its valence, all vertices in its neighborhood can be retrieved in counter-clockwise order. In the geometry shader, the patch is finalized and assembled. Overall, the 2-pass implementation has better performance because of its small stream-out, short geometry shader code and minimal amplification on the geometry shader.


CHAPTER 4
RESULTS

4.1 Shape Quality

Our algorithm produces $C^1$ surfaces that closely approximate Catmull-Clark subdivision surfaces. We compare our algorithm with [30] with respect to closeness to Catmull-Clark surfaces. We measure how close a surface is to the Catmull-Clark surface by comparing both the geometric difference and the normal angle difference. Figure 4-1 compares the smoothed quad mesh surfaces with densely refined Catmull-Clark subdivision surfaces based on the same mesh. Both geometric distance, as a percentage of the local quad size, and normal distance, in degrees of variation, are compared. Especially after displacement, large models rendered by subdivision and by quad smoothing appear visually indistinguishable. The relatively small examples without displacement shown in Figure 4-1 and the close-up in Figure 4-5 are also important to support our observation that c-patches do not create shape problems compared to a single bicubic patch: despite the lower degree and internal $C^1$ join, their visual appearance is remarkably similar to that of bicubic patches. The comparison with ACC-patches [30] is shown in Figure 4-2. Figures 4-3 and 4-4 show the smooth surface generated by our algorithm and the surface after applying displacement mapping, respectively.

Figure 4-1. Comparison between the Catmull-Clark (CC) subdivision limit surface and the smoothed quad mesh surface for the same input.


Figure 4-2. Comparison of ACC-patch and c-patch in terms of approximation of Catmull-Clark subdivision surfaces for the same input.

Figure 4-3. GPU smoothed quad surfaces: orange patches correspond to ordinary quads, blue patches to extraordinary quads.

Figure 4-4. GPU smoothed quad surfaces with displacement mapping.


4.2 Performance

We compiled and executed the implementation on the latest graphics cards of both major vendors under DirectX 10 and tested the performance for several industry-sized models. Two surface models and models with displacement mapping are shown in Figures 4-3 and 4-4, respectively. Table 4-2 summarizes the performance of the 2-pass algorithm for different granularities of evaluation. The frog model, in particular, provides a challenge due to the large number of extraordinary patches. The Frog Party shown in Figure 4-11 currently renders at 50 fps for uniform evaluation for N = 9, i.e., on a 9 × 9 grid. That is, the implementation converts 1292 × 9 quads, of which 59% are extraordinary, and renders about 1 million polygons 50 times per second. On the same hardware, we measured Bunnell's efficient implementation (distribution accompanying [9]) featuring the single frog model, i.e., 1/9th of the work of the Frog Party, running at 44 fps with three subdivisions (equivalent to tessellation factor N = 9). That is,

Table 4-1. A total degree 4 patch and a bicubic patch have the same evaluation cost at (u, v) in terms of ALU operations.

ALU vector ops    c-patch    bicubic patch
position          55         56
normal            3          3
other             1          0
total             59         59

Table 4-2. Frames per second for some standard test meshes with each patch evaluated on a grid of size N × N; eqs = percentage of extraordinary quads. Sword and Frog are shown in Figure 4-3, Head in Figure 4-1.

                              Frames per second
Mesh (verts, quads, eqs)      N = 5    9      17     33
Sword (140, 138, 38%)         965      965    965    703
Head (602, 600, 100%)         637      557    376    165
Frog (1308, 1292, 59%)        483      392    226    87


Figure 4-5. Close-up of the frog. The refined mesh is water-tight.

Table 4-3. Performance of the 1-pass implementation.

         Slower 1-pass implementation
Mesh     N = 2    5     8
Sword    389      96    43
Head     108      34    15
Frog     44       10    4

GPU smoothing of quad meshes is an order of magnitude faster. Compared to [46], the speedup is even more dramatic. While the comparison is not among equals, since both [46] and [9] implement recursive Catmull-Clark subdivision, it is nevertheless fair to observe that the speedup is at least partially due to our avoiding stream back after amplification (data explosion due to refinement). We expect that more careful storage of vertex neighborhoods, in retrieval order, will further improve our use of the texture cache and thereby improve the frames per second (fps) count.

4.3 Displacement Mapping

Displacement mapping is a technique for adding geometric detail to a mesh with a height map. It differs from bump mapping or normal mapping in that it changes the geometry by moving vertices, often along their normal directions, according to the value in the height map. Changing the real geometry, and not just the normal as in bump mapping, permits self-occlusion. Figure 4-6 shows displacement mapping on the frog model, which consists of 330k facets. The size of the height map is 1024 by 1024.

Figure 4-6. Displacement mapping on the frog model

In order to perturb normals after displacement mapping, we need the bump mapping values $D_u$ and $D_v$. The displaced surface is

\[
S = P + D\,n, \tag{4–1}
\]

where $S$ is the displaced position of the point $P$, $D$ is the displacement and $n$ is the normal at $P$. Then the new normal is calculated as the cross product of $S_u$ and $S_v$:

\[
S_u = P_u + D_u\,n + D\,n_u \tag{4–2}
\]

\[
S_v = P_v + D_v\,n + D\,n_v \tag{4–3}
\]

Note that $n_u$ and $n_v$ are the derivatives of the normalized normal $n$:

\[
n_u = \frac{n'_u - n\,(n'_u \cdot n)}{\|n'\|}, \tag{4–4}
\]

where $n' := P_u \times P_v$ is the unnormalized normal and $n'_u = P_{uu} \times P_v + P_u \times P_{uv}$.
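Equations (4–1) to (4–4) can be combined into a short routine that returns the unnormalized normal of the displaced surface. This is a sketch under assumptions: the unnormalized normal is taken to be $P_u \times P_v$, the $v$-derivative of the normal is obtained by the symmetric formula $P_{uv} \times P_v + P_u \times P_{vv}$ (not spelled out in the text), and all function names are hypothetical.

```python
from math import sqrt

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit_normal_derivative(m, m_d):
    """Given m = Pu x Pv and its derivative m_d, return (n, n_d) as in (4-4)."""
    length = sqrt(dot(m, m))
    n = tuple(x / length for x in m)
    proj = dot(m_d, n)
    n_d = tuple((md - nc * proj) / length for md, nc in zip(m_d, n))
    return n, n_d

def displaced_normal(Pu, Pv, Puu, Puv, Pvv, D, Du, Dv):
    """Unnormalized normal Su x Sv of S = P + D n, following (4-2), (4-3)."""
    m = cross(Pu, Pv)
    m_u = tuple(a + b for a, b in zip(cross(Puu, Pv), cross(Pu, Puv)))
    m_v = tuple(a + b for a, b in zip(cross(Puv, Pv), cross(Pu, Pvv)))
    n, n_u = unit_normal_derivative(m, m_u)
    _, n_v = unit_normal_derivative(m, m_v)
    Su = tuple(p + Du * nc + D * nd for p, nc, nd in zip(Pu, n, n_u))
    Sv = tuple(p + Dv * nc + D * nd for p, nc, nd in zip(Pv, n, n_v))
    return cross(Su, Sv)

# Flat patch P(u, v) = (u, v, 0) with a height map of slope 0.5 in u.
zero = (0.0, 0.0, 0.0)
normal = displaced_normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                          zero, zero, zero, D=2.0, Du=0.5, Dv=0.0)
```

For the flat patch the result is (-0.5, 0, 1), the normal of the plane z = 0.5 u up to the constant offset, as expected.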

4.4 Morphing and Animation

We implement morphing using the 2-pass approach. The animated sequence of input meshes, in the form of textures, is fed into the Input Assembler of the first pass each frame. The morphed patches are constructed during the first pass. Fine details are added in the second pass. The screen shots in Figures 4-9, 4-10 and 4-11 illustrate real-time displacement and animation.


Figure 4-7. Comparison of the c-patch scheme with PN-Triangles (also called N-patch), ACC-patch, and Catmull-Clark subdivision.

Figure 4-8. Comparison of the c-patch scheme with PN-Triangles (also called N-patch), ACC-patches, and Catmull-Clark subdivision.


Figure 4-9. Real time animation on the Sword model.

Figure 4-10. Real time animation on the Frog model.

Figure 4-11. Asynchronous animation of nine Frogs.


4.5 Conclusion

Smoothing quad meshes on the GPU offers an alternative to highly refined facet representations transmitted to the GPU and is preferable for interactive graphics and integration with complex morphing and displacement.

We advocated a 2-pass scheme since, as we argued, the DX10 geometry shader is not well suited for the data amplification needed for evaluation after conversion. The 1-pass scheme outlined in Section 3 may become more valuable with the availability of a dedicated hardware tessellator [29, 48]. Such a tessellator will make amplification more efficient and support adaptive tessellation (which is why we only discussed uniform tessellation in Section 3). Such hardware amplification will also benefit the 2-pass approach in that the (u, v) domain tessellation fed into the second pass will be replaced by the amplification unit.

CHAPTER 5
PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

Our conversion algorithm can be generalized to work for arbitrary meshes. The generalized
algorithm [34] provides an elegant solution for meshes with Tri/Quad/Pent facets. Removing
restrictions on vertex valences and allowing meshes with triangles, quadrilaterals, and pentagons
vastly simplifies a designer’s task and enriches the design space of meshes for smooth surfaces:
while quads naturally model the flow of (parallel) feature lines and are therefore the main facet
type in models, triangular facets allow merging lines and pentagonal facets allow starting
new lines (Figure 5-1), without creating T-corners or forcing refinement of intermediate models
to satisfy connectivity or quad-layout constraints. Essentially, designers can re-use the whole
range of polyhedral models they are used to. We modified the algorithm for converting quad
meshes into a generalized method for meshes with Tri/Quad/Pent facets. The generalized scheme
converts such a polyhedral model to a surface with an everywhere well-defined normal that is $C^2$ in
‘regular’ mesh regions with quad-grid connectivity. Figure 5-2 shows an example of the resulting
surfaces. Note that the facets are limited to triangles, quads and pentagons due to current GPU
constraints and to avoid unnecessary notational, technical and shape complexity.

Figure 5-1. (a) Retaining the density of feature lines while varying their number. (b),(c) Axe handle detail using a triangle and a pentagon to transition between detailed and coarser areas.

An irregular facet with k sides is converted into a k-patch, a generalization
of the c-patch. It is a piecewise degree-4 $C^1$ spline patch with k cubic boundaries. A k-patch
is defined by 6k + 1 control points, indicated as ◦ in Figure 5-3(b),(c). That is, the k-patch
corresponding to a triangular, quadrilateral or pentagonal facet is defined by a total of 19, 25 or
31 points, respectively.

Figure 5-2. The generalized scheme converts a mesh with Tri/Quad/Pent facets to a smooth surface consisting of bi-cubic patches (yellow) and k-patches with k = 3 (green), k = 4 (red), and k = 5 (gray).

Figure 5-3. (a) An ordinary facet is converted to a bi-cubic patch with 16 control points $g_{ij}$. (b),(c) An extraordinary facet with k sides is converted to a k-patch defined by 6k + 1 control points shown as ◦. The k-patch can be viewed as k $C^1$-connected degree-4 triangular patches, one per sector i = 0, …, k−1, with cubic outer boundaries.

Figure 5-4. The triangular sectors are listed in counter-clockwise order with a modulo-k superscript. (a) 14 control points from three consecutive sectors of a k-patch define (b) a single patch in triangular Bezier-form.

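For evaluation, we can recover the polynomial representation of the ith sector in triangular
Bezier form of total degree 4 (Figure 5-3(b) and (c)),

$$S(u, v) := \sum_{i+j+k=4} \mathbf{b}_{ijk}\, \frac{4!}{i!\,j!\,k!}\, u^i v^j (1-u-v)^k, \tag{5–1}$$

where the $\binom{4+2}{2}$ BB-coefficients $\mathbf{b}_{ijk} \in \mathbb{R}^3$ are indexed as in Figure 5-4. Specifically, we compute
the $\binom{4+2}{2}$ coefficients $\mathbf{b}_{ijk}$ (Figure 5-4(b)) from the 14 coefficients labeled in Figure 5-4(a) by simple
averaging: degree-raising the coefficients $\mathbf{b}^i_{3-\ell,\ell,0}$, $\ell = 0, \dots, 3$, to $\mathbf{b}^i_{4-\ell,\ell,0}$, $\ell = 0, \dots, 4$,

$$[\mathbf{b}^i_{400}, \mathbf{b}^i_{310}, \mathbf{b}^i_{220}, \mathbf{b}^i_{130}, \mathbf{b}^i_{040}] = \left[\mathbf{b}^i_{300},\ \frac{\mathbf{b}^i_{300} + 3\mathbf{b}^i_{210}}{4},\ \frac{\mathbf{b}^i_{210} + \mathbf{b}^i_{120}}{2},\ \frac{3\mathbf{b}^i_{120} + \mathbf{b}^i_{030}}{4},\ \mathbf{b}^i_{030}\right]$$

and computing the shared BB-coefficients on the sector boundaries, $\mathbf{b}^i_{3-\ell,0,1+\ell} = \mathbf{b}^{i-1}_{0,3-\ell,1+\ell}$,
$\ell = 0, 1, 2, 3$ (i.e., indices 301, 202, 103 and 004 in Figure 5-4(b)), from the $C^1$ constraints.

See [34] for a thorough explanation of the algorithm and its GPU implementation,
smoothness verification, etc.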
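To make the degree-raising step and the evaluation of Equation (5–1) concrete, here is a small Python sketch. It is only an illustration under assumptions and not the GPU implementation described in [34]: the function names are hypothetical, coefficients are stored in a dictionary keyed by the index triple (i, j, k), and the $C^1$ construction of the shared sector-boundary coefficients is not reproduced.

```python
from math import comb, factorial


def blend(a, b, weight_a, weight_b):
    """Weighted average of two 3D points."""
    total = weight_a + weight_b
    return tuple((weight_a * x + weight_b * y) / total for x, y in zip(a, b))


def degree_raise_cubic(c):
    """Raise 4 cubic Bezier coefficients to the 5 quartic ones of the same curve."""
    c0, c1, c2, c3 = c
    return [c0,
            blend(c0, c1, 1, 3),   # (c0 + 3 c1) / 4
            blend(c1, c2, 1, 1),   # (c1 + c2) / 2
            blend(c2, c3, 3, 1),   # (3 c2 + c3) / 4
            c3]


def eval_quartic_triangle(coeffs, u, v):
    """Evaluate Eq. (5-1); coeffs maps (i, j, k) with i + j + k == 4 to a 3D point."""
    w = 1.0 - u - v
    point = [0.0, 0.0, 0.0]
    for (i, j, k), b in coeffs.items():
        multinomial = factorial(4) // (factorial(i) * factorial(j) * factorial(k))
        weight = multinomial * u ** i * v ** j * w ** k
        for axis in range(3):
            point[axis] += weight * b[axis]
    return tuple(point)


def eval_cubic_curve(c, t):
    """Reference evaluation of a cubic Bezier curve, used only for checking."""
    return tuple(
        sum(comb(3, l) * (1 - t) ** (3 - l) * t ** l * c[l][axis] for l in range(4))
        for axis in range(3)
    )
```

Because degree raising does not change the curve, evaluating the quartic sector along its outer edge (where 1 − u − v = 0) should reproduce the original cubic boundary, which gives a simple consistency check for an implementation.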

CHAPTER 6
DISCUSSION AND FUTURE WORK

6.1 Future GPU API

Our conversion scheme not only fits the current graphics hardware pipeline well,
but also matches the architecture of upcoming graphics hardware [29, 48]. The
work load currently in the geometry shader will be assigned to the patch shader. An ideal GPU
pipeline would expose more parallelism in the geometry shader stage, where the 24 coefficients of a
c-patch can be computed independently given the vertex neighborhood. With maximal parallelism,
the cost of constructing a whole patch is roughly equal to the cost of deriving one coefficient.
Currently we precompute the tessellated domain and store these static values in a set
of textures. In the future, this part of the computation will be replaced by the tessellation unit.
Animation using our conversion scheme will then be achieved in a single pass, without geometry
transmission between passes.
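The precomputed tessellated domain mentioned above can be pictured with the following Python sketch. It is a hypothetical CPU-side illustration: the function name `uniform_domain`, the row-major sample ordering, and the way each cell is split into two triangles are assumptions, and the actual texture layout used in this work is not reproduced here.

```python
def uniform_domain(n):
    """Return (samples, triangles) for an n-by-n uniform split of [0, 1]^2.

    samples   : list of (u, v) parameter pairs in row-major order
    triangles : list of index triples into samples, two per grid cell
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    samples = [(col / n, row / n)
               for row in range(n + 1)
               for col in range(n + 1)]
    triangles = []
    for row in range(n):
        for col in range(n):
            a = row * (n + 1) + col   # lower-left corner of the cell
            b = a + 1                 # lower-right corner
            c = a + (n + 1)           # upper-left corner
            d = c + 1                 # upper-right corner
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return samples, triangles
```

Since these values depend only on the tessellation density n and not on the mesh, they can be computed once and reused for every patch, which is why a hardware tessellation unit can take over this step.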

6.2 Volume Preservation

Preserving volume under constraints makes the animation of deformable objects more realistic.
The well-known divergence theorem can be used to reduce a volume integral to an
integral over the surface. Given a closed object, the volume is matched to a prescribed value by
inflating or deflating the deformable object uniformly. To enhance realism, this method can
be further extended to fix parts of the object and attach different material properties to surface
pieces. This exact, localized volume preservation method works for all surfaces that consist
of Bezier patches. Therefore, we will combine this method with our new surface conversion
algorithm to achieve real-time volume preservation.
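The idea of reducing the volume integral to a surface integral can be sketched in Python as follows. This is a simplified illustration under explicit assumptions and not the exact method for Bezier patches referred to above: the surface is taken to be a closed, outward-oriented triangle mesh (for example a tessellation of the patches), "uniform inflation" is interpreted as scaling about the vertex centroid, and both function names are hypothetical.

```python
def signed_volume(vertices, triangles):
    """Volume enclosed by a closed, outward-oriented triangle mesh.

    By the divergence theorem, the volume is the sum over all triangles of
    the signed volume of the tetrahedron spanned by the origin and the triangle.
    """
    total = 0.0
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        # a . (b x c) equals six times the signed tetrahedron volume
        total += (a[0] * (b[1] * c[2] - b[2] * c[1])
                  + a[1] * (b[2] * c[0] - b[0] * c[2])
                  + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return total / 6.0


def rescale_to_volume(vertices, triangles, target):
    """Scale the mesh about its vertex centroid so that it encloses `target`."""
    current = signed_volume(vertices, triangles)
    if current <= 0.0 or target <= 0.0:
        raise ValueError("expected a positively oriented mesh and a positive target")
    factor = (target / current) ** (1.0 / 3.0)   # volume scales with the cube
    count = len(vertices)
    centroid = tuple(sum(p[axis] for p in vertices) / count for axis in range(3))
    return [tuple(centroid[axis] + factor * (p[axis] - centroid[axis])
                  for axis in range(3))
            for p in vertices]
```

For instance, rescaling the unit corner tetrahedron (volume 1/6) to a target of 4/3 uses a scale factor of 2, because volume grows with the cube of the factor.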

6.3 Adaptive Tessellation

Adaptive tessellation samples each surface patch more densely in regions of high
curvature and less densely in regions of low curvature. Moreover, it adjusts the level of detail
according to how close the geometry is to the camera. The surface is finely tessellated only where and when
it is necessary. Therefore, adaptively tessellated surfaces will greatly improve performance. The
tessellation factor can be generated by using the flat test [9]. With the tessellation unit in the
GPU, tessellating the domain is almost free.
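One way to picture a flatness-driven tessellation factor is the Python sketch below. It is a generic heuristic written for illustration and is not the flat test of [9]: the function names, the `tolerance` and `max_factor` parameters, and the square-root rule (which assumes that the chordal error shrinks roughly quadratically with the number of segments) are all assumptions.

```python
import math


def distance_to_line(p, a, b):
    """Distance from point p to the line through a and b (3D tuples)."""
    ab = tuple(y - x for x, y in zip(a, b))
    ap = tuple(y - x for x, y in zip(a, p))
    length = math.sqrt(sum(c * c for c in ab))
    if length == 0.0:
        return math.sqrt(sum(c * c for c in ap))
    cx = ap[1] * ab[2] - ap[2] * ab[1]
    cy = ap[2] * ab[0] - ap[0] * ab[2]
    cz = ap[0] * ab[1] - ap[1] * ab[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz) / length


def tessellation_factor(edge, eye, tolerance=0.01, max_factor=64):
    """Pick a segment count for a cubic boundary curve given 4 control points.

    The deviation of the inner control points from the chord measures how far
    the edge is from flat; dividing by the viewing distance lowers the factor
    for geometry that is far from the camera.
    """
    start, end = edge[0], edge[3]
    deviation = max(distance_to_line(edge[1], start, end),
                    distance_to_line(edge[2], start, end))
    midpoint = tuple(0.5 * (x + y) for x, y in zip(start, end))
    view_distance = max(math.dist(midpoint, eye), 1e-9)
    segments = math.ceil(math.sqrt(deviation / (tolerance * view_distance)))
    return max(1, min(max_factor, segments))
```

With this rule a straight edge always receives a factor of 1, while a curved edge receives more segments the closer it is to the eye point.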

REFERENCES

[1] Microsoft DirectX10 SDK. 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6-263A-4424-A7FE-69CFF1A5B180displaylang=en.

[2] C. Bajaj, J. Chen, and G. Xu. Free form surface design with a-patches. In Proceedings of Graphics Interface 94, pages 174–181, Banff, Alberta, Canada, 1994.

[3] S. Bischoff, L. P. Kobbelt, and H. Seidel. Towards hardware implementation of loop subdivision. In HWWS ’00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, pages 41–50, New York, NY, USA, 2000. ACM Press.

[4] D. Blythe. The Direct3D 10 System. In Proceedings of ACM SIGGRAPH 2006, pages 724–734, 2006. http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/Direct3D10web.pdf.

[5] M. Bóo, M. Amor, M. Doggett, J. Hirche, and W. Strasser. Hardware support for adaptive subdivision surface rendering, 2001. citeseer.ist.psu.edu/article/boo01hardware.html.

[6] J. Bolz and P. Schroder. Rapid evaluation of Catmull-Clark subdivision surfaces. In Web3D ’02: Proceeding of the seventh international conference on 3D Web technology, pages 11–17, New York, NY, USA, 2002. ACM Press.

[7] J. Bolz and P. Schroder. Evaluation of subdivision surfaces on programmable graphics hardware. 2007. http://www.multires.caltech.edu/pubs/GPUSubD.pdf.

[8] T. Boubekeur and C. Schlick. Generic mesh refinement on GPU. In HWWS ’05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 99–104, New York, NY, USA, 2005. ACM.

[9] M. Bunnell. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chapter 7. Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping. Addison-Wesley, Reading, MA, 2005.

[10] E. Catmull and J. Clark. Recursively generated B-spline surfaces on arbitrary topological meshes. Computer Aided Design, 10:350–355, 1978.

[11] R. L. Cook. Shade trees. ACM, New York, NY, USA, 1998.

[12] D. Doo and M. Sabin. Behaviour of recursive division surfaces near extraordinary points. Computer Aided Design, 10:356–360, 1978.

[13] T. DeRose, M. Kass, and T. Truong. Subdivision surfaces in character animation. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 85–94, New York, NY, USA, 1998. ACM Press.

[14] G. Farin. Curves and surfaces for computer aided geometric design: a practical guide. Academic Press Professional, Inc., San Diego, CA, USA, 1988.

[15] C. Gonzalez and J. Peters. Localized hierarchy surface splines. In S. S. J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, pages 7–15, 1999.

[16] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Transactions on Graphics, 24(3):1016–1023, 2005.

[17] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Trans. Graph., 24(3):1016–1023, 2005.

[18] M. Halstead, M. Kass, and T. DeRose. Efficient, fair interpolation using Catmull-Clark surfaces. Proceedings of SIGGRAPH 93, pages 35–44, Aug 1993.

[19] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. Computer Graphics, 28(Annual Conference Series):295–302, 1994.

[20] D. L. James and C. D. Twigg. Skinning mesh animations. In SIGGRAPH ’05: ACM SIGGRAPH 2005 Papers, pages 399–407, New York, NY, USA, 2005. ACM.

[21] K. Karciauskas and J. Peters. Guided subdivision, 2005. http://www.cise.ufl.edu/research/SurfLab/papers.shtml.

[22] O. A. Karpenko and J. F. Hughes. Smoothsketch: 3d free-form shapes from complex sketches. ACM Transactions on Graphics, 25/3:589–598, 2006.

[23] L. Kavan, C. O’Sullivan, and J. Zara. Efficient collision detection for spherical blend skinning. In Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia, Kuala Lumpur, Malaysia, pages 147–156, 2006.

[24] L. Kavan and J. Zara. Spherical blend skinning: a real-time deformation of articulated models. In I3D ’05: Proceedings of the 2005 symposium on Interactive 3D graphics and games, pages 9–16, New York, NY, USA, 2005. ACM.

[25] A. Krishnamurthy, R. Khardekar, and S. McMains. Direct evaluation of nurbs curves and surfaces on the GPU. In SPM ’07: Proceedings of the 2007 ACM symposium on Solid and physical modeling, pages 329–334, New York, NY, USA, 2007. ACM.

[26] S. Lai and F. F. Cheng. Adaptive rendering of catmull-clark subdivision surfaces. In CAD-CG ’05: Proceedings of the Ninth International Conference on Computer Aided Design and Computer Graphics, pages 125–132, Washington, DC, USA, 2005. IEEE Computer Society.

[27] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000. citeseer.ist.psu.edu/lee00displaced.html.

[28] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[29] M. Lee. Next-generation graphics programming on xbox 360, 2006. http://download.microsoft.com/download/d/3/0/d30d58cd-87a2-41d5-bb53-baf560aa2373/Next GenerationGraphicsProgrammingon Xbox 360.ppt.

[30] C. Loop and S. Schaefer. Approximating Catmull-Clark subdivision surfaces with bicubic patches. Technical report, Microsoft Research, MSR-TR-2007-44, 2007.

[31] C. T. Loop. Smooth subdivision surfaces based on triangles, 1987. Master’s Thesis, Department of Mathematics, University of Utah.

[32] A. Mohr, L. Tokheim, and M. Gleicher. Direct manipulation of interactive character skins. In I3D ’03: Proceedings of the 2003 symposium on Interactive 3D graphics, pages 27–30, New York, NY, USA, 2003. ACM.

[33] K. Muller and S. Havemann. Subdivision surface tesselation on the fly using a versatile mesh data structure, 2000. citeseer.ist.psu.edu/muller00subdivision.html.

[34] A. Myles, T. Ni, and J. Peters. GPU-friendly smooth surfaces from meshes with tri/quad/pent facets. In Symposium on Geometry Processing, July 2 - 4, 2008, Copenhagen, Denmark, pages 1–8. Blackwell, 2008.

[35] A. Myles, Y. Yeo, and J. Peters. GPU conversion of quad meshes to smooth surfaces. In D. Manocha, B. Levy, and H. Suzuki, editors, ACM Solid and Physical Modeling Symposium, June 2 - 4, 2008, Stony Brook University, Stony Brook, New York, USA, pages 321–326. ACM Press, 2008.

[36] A. Nealen, T. Igarashi, O. Sorkine, and M. Alexa. Fibermesh: designing freeform surfaces with 3d curves. ACM Trans. Graph., 26(3), 2007.

[37] T. Ni, Y. Yeo, A. Myles, V. Goel, and J. Peters. GPU smoothing of quad meshes. In M. Spagnuolo, D. Cohen-Or, and X. Gu, editors, IEEE International Conference on Shape Modeling and Applications, June 4 - 6, 2008, Stony Brook University, Stony Brook, New York, USA, pages 3–10. ACM Press, 2008.

[38] J. Peters. Patching Catmull-Clark meshes. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 255–258. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[39] J. Peters. Geometric continuity. In Handbook of Computer Aided Geometric Design, pages 193–229. Elsevier, 2002.

[40] J. Peters and A. Nasri. Computing volumes of solids enclosed by recursive subdivision surfaces. Computer Graphics Forum, 16(3):C89–C94, 1997.

[41] J. Peters and U. Reif. Analysis of generalized B-spline subdivision algorithms. SIAM Journal on Numerical Analysis, 35(2):728–748, Apr. 1998.

[42] H. Prautzsch. Freeform splines. Computer Aided Geometric Design, 14(3):201–206, 1997.

[43] H. Prautzsch, W. Boehm, and M. Paluzny. Bezier and B-Spline Techniques. Springer Verlag, 2002.

[44] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In SIGGRAPH ’96: ACM SIGGRAPH 96 Visual Proceedings: The art and interdisciplinary programs of SIGGRAPH ’96, page 144, New York, NY, USA, 1996. ACM.

[45] S. Schaefer and J. Warren. Exact evaluation of non-polynomial subdivision schemes at rational parameter values. In PG ’07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 321–330, Washington, DC, USA, 2007. IEEE Computer Society.

[46] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. In M. Gross, editor, Siggraph 2005, Computer Graphics Proceedings, Annual Conference Series, pages 1010–1015. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2005.

[47] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In SIGGRAPH, pages 395–404, 1998.

[48] A. Tatarinov. Instanced tessellation in directx10, 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6-263A-4424-A7FE-69CFF1A5B180displaylang=en.

[49] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In 2001 Symposium on Interactive 3D Graphics, Bi-Annual Conference Series, pages 159–166. ACM Press, 2001.

[50] D. Zorin. Subdivision for modeling and animation. ACM SIGGRAPH Course Notes, 2000.

BIOGRAPHICAL SKETCH

Tianyun Ni was born in Nanjing, China. She was awarded her BS in computer science with a
mathematics minor from Texas State University in 2000 and her ME in computer engineering
from the University of Florida in 2002. She earned her doctoral degree in the field of computer graphics in
2008.
