
EUROGRAPHICS 2006 / E. Gröller and L. Szirmay-Kalos (Guest Editors)

Volume 25 (2006), Number 3

Real-Time Weighted Pose-Space Deformation on the GPU

Taehyun Rhee†1, J.P. Lewis‡2, and Ulrich Neumann§1

1 University of Southern California, U.S.A.    2 Stanford University, U.S.A.

Abstract

WPSD (Weighted Pose Space Deformation) is an example based skinning method for articulated body animation. The per-vertex computation required in WPSD can be parallelized in a SIMD (Single Instruction Multiple Data) manner and implemented on a GPU. While such vertex-parallel computation is often done on the GPU vertex processors, further parallelism can potentially be obtained by using the fragment processors. In this paper, we develop a parallel deformation method using the GPU fragment processors. Joint weights for each vertex are automatically calculated from sample poses, thereby reducing manual effort and enhancing the quality of WPSD as well as SSD (Skeletal Subspace Deformation). We show sufficient speed-up of SSD, PSD (Pose Space Deformation) and WPSD to make them suitable for real-time applications.

Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]: Hardware Architecture - Parallel processing, I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - Curve, surface, solid and object modeling, I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation.

1. Introduction

Skinning is an important part of realistic articulated body animation and is an important topic of computer graphics and animation. Generally, skinning can be categorized into algorithmic, physically-based, and example-based methods. Although widely used, simple algorithmic skinning schemes cannot capture the complexity and subtlety of real skin deformation, and revised approaches will be required to increase character animation realism. Physically-based skinning is based on the biomechanics of skin deformation arising from the motions of muscles and tendons. Although this approach can generate physically accurate simulations of each layer, it is not at present suitable for real-time applications such as gaming due to the large computation required. Example-based methods capture some of the complexity of real skin deformation by interpolating scanned or sculpted examples of the desired skin shape in various poses. Although this requires gathering a sufficient number of samples and some pre-calculation, example-based methods can potentially be used in real-time applications due to their relatively simple real-time computation.

† [email protected]   ‡ [email protected]   § [email protected]

Weighted pose space deformation (WPSD) is an example based skinning method that generates high quality skinning with a limited number of sample poses [KM04]. Although it can generate accurate skinning, it requires more computation than the original pose space deformation (PSD) [LCF00], since joint distances are computed independently for each vertex. As such, this method has not been suitable for real-time applications.

Furthermore, both WPSD and SSD require joint weights for each vertex, and accurate joint weights are required to achieve good results. However, the weights are usually manually generated by artists, which requires effort and great skill in the case of a complex skeletal system such as the human hand.

In this paper, we present a parallel WPSD algorithm (including automatic determination of joint weights) suitable for SIMD architectures such as current GPUs. The joint weights for each vertex are automatically computed from the sample poses. This can enhance the skinning quality not only of SSD but also of WPSD, since both methods require accurate joint weight values.

The deformation required in WPSD and SSD is independent for each vertex, and this per-vertex computation can be parallelized on a SIMD architecture. The GPU is a general SIMD architecture having one-sided (unidirectional) communication to texture memory. We demonstrate our parallel WPSD method using GPU fragment processors. In our experiments, we speed up SSD, PSD, as well as WPSD to around 20 times faster than on the CPU (from 1.2 FPS to 25 FPS for WPSD on a detailed model having 22836 triangles with 11574 vertices) using a modern graphics card, thus making WPSD a feasible real-time skinning solution for various applications including games, virtual reality, and other real-time simulations.

2. Related work

Many commercial software packages generate skin deformation arising from joint movement using a method known as (linear blend) skinning, Skeletal Subspace Deformation (SSD), enveloping, etc., based in part on work published by Thalmann et al. [MTLT88]. SSD is based on the weighted blending of affine transformations of each joint and is used in many real-time applications due to its simple and fast computation. However, it also exhibits some well known artifacts, such as skin that collapses around the joints at increasing bend angles, and a variety of solutions for these problems have been published [Web00, WP02, MTG03, KZ05].

Recently, example-based methods [LCF00, SRC01, ACP02, KJP02, KM04] have permitted more complex skinning effects such as muscle bulges and major wrinkles, while also addressing the artifacts of simple algorithmic schemes. In these methods, a number of provided (scanned or sculpted) samples of the desired skin shape are simply interpolated based on the creature's pose (and possibly additional abstract control "dimensions"). These example-based methods can also be considered as a non-parametric approach to skin deformation. In common with non-parametric sampling methods in texture synthesis (and more generally in statistical regression), the amount of memory for these methods grows with the number of training samples, but arbitrary distributions can be approximated.

Some of the most impressive example-based results to date are those of Kurihara and Miyata's hand model derived from medical images [KM04]. Since acquiring 3D medical images is relatively expensive, they developed weighted pose space deformation (WPSD) to generate proper skinning from a limited number of pose samples. They modify the distance between poses using the joint weights of each vertex to provide a more appropriate distance measure for skinning.

Although the joint weights for each vertex are important data for SSD and WPSD calculations, they have traditionally been manually generated by skilled artists. Least-squares based vertex weight estimation was shown in the skinning methods of [WP02, MTG03]. James et al. describe mesh-based skinning including estimation of bone parameters and vertex weights for each bone [JT05]. In their paper, the vertex weights of each joint are calculated by NNLS (non-negative least squares), and we derive a similar approach to calculate weights for SSD and WPSD.

In recent years, since the performance of GPUs has been improving more rapidly than that of CPUs, and GPUs have many processing units serving as a SIMD parallel architecture, many algorithms have been accelerated by GPU programming [LHK∗04, PF05, GPG]. Deformation and skinning algorithms can also be enhanced by GPUs, and several papers have profited from this [JP02, KJP02, BK05, JT05].

However, in previous research, since vertex information cannot be accessed in the fragment program, GPU-based vertex deformation is usually performed by vertex programs. In this paper, we develop a parallel WPSD method using the fragment processors to gain greater parallelism and performance.

Person-specific data modeling and its deformation is also an interesting topic in realistic articulated body simulation. Rhee et al. described human hand modeling from the surface anatomy of the person [RNL06]. Anguelov et al. developed shape completion and animation of people, derived from a set of range scan data and example based deformation in pose and shape space [ASK∗05].

Physically inspired skinning should also be recognized as another important area of articulated body animation. However, we entrust the review of this subject to the recent related papers [AHS03, CBC∗05, PCLS05, SNF05].

3. Skin deformation

Example-based skinning problems can be described by the following general equation:

v(p_a) = S(v_0 + D(p_a))    (1)

where p_a is an arbitrary pose, v(p_a) is a vertex of the deformed target surface at the arbitrary pose, v_0 is an undeformed (rest pose) vertex, S is the SSD function, and D(p_a) is a displacement as a function of the arbitrary pose.

In skeletal subspace deformation the displacement D(p_a) is omitted and the target surface is calculated by SSD as a blend of affine transforms of v_0 [section 3.1]. Skinning methods related to PSD use the displacement of an arbitrary pose D(p_a), calculated by interpolation in pose space [section 3.2].



Figure 1: Skinning results of each algorithm: (a) SSD, (b) PSD, (c) WPSD, (d) difference between SSD and PSD (blue dotted area), (e) difference between SSD and WPSD (blue dotted area), (f) difference between PSD and WPSD (blue dotted area); areas around the blue and red arrows represent unexpected results of SSD and PSD respectively.

3.1. Skeletal subspace deformation (SSD)

SSD [MTLT88] is based on the weighted blending of an affine transformation of each joint, as given by equation 2:

v_a = S(v_0) = \left( \sum_{j=1}^{n_{joint}} w_j T_j \right) v_0    (2)

where n_joint is the number of joints, v_a is a vertex in an arbitrary pose p_a, v_0 is a vertex in the rest pose, and w_j is a joint weight that defines the contribution of joint j's transformations to the current vertex. The weight w_j can be assigned by the artist to control deformation, and usually \sum_{j=1}^{n_{joint}} w_j = 1.0. This simple algorithm is used in many commercial graphics packages and real-time rendering applications but shows several limitations, because the deformation of this method is restricted to the subspace of the affine transformation of the joints [LCF00].
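As a CPU-side sketch of equation 2 (not the paper's GPU implementation), the following C++ fragment blends the transformed copies of the rest-pose vertex with the per-vertex joint weights; the 3x4 affine type and the function names are illustrative assumptions.

#include <vector>

// Minimal 3-vector and 3x4 affine transform (rotation/scale part plus translation).
struct Vec3 { float x, y, z; };
struct Affine3x4 { float m[3][4]; };   // row-major: [A | t]

static Vec3 transform(const Affine3x4& T, const Vec3& v) {
    return { T.m[0][0]*v.x + T.m[0][1]*v.y + T.m[0][2]*v.z + T.m[0][3],
             T.m[1][0]*v.x + T.m[1][1]*v.y + T.m[1][2]*v.z + T.m[1][3],
             T.m[2][0]*v.x + T.m[2][1]*v.y + T.m[2][2]*v.z + T.m[2][3] };
}

// Equation 2: v_a = (sum_j w_j T_j) v_0, evaluated here by blending the
// transformed copies of v_0 with the per-vertex joint weights w_j.
Vec3 ssdVertex(const Vec3& v0,
               const std::vector<Affine3x4>& T,   // one transform per joint
               const std::vector<float>& w)       // per-vertex joint weights, summing to 1
{
    Vec3 va{0.0f, 0.0f, 0.0f};
    for (size_t j = 0; j < T.size(); ++j) {
        Vec3 tj = transform(T[j], v0);
        va.x += w[j] * tj.x;
        va.y += w[j] * tj.y;
        va.z += w[j] * tj.z;
    }
    return va;
}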

3.2. Pose space deformation (PSD)

If we have a sufficient set of examples to describe the movement of an articulated object, we can interpolate displacement in "pose space" [LCF00]. Each sample pose consists of sample skin geometry and the related joint skeleton, and a vector containing the joint angles represents the pose.

If we translate each skinning sample k to the rest coordinate frame using inverse SSD, the displacement between the sample vertex v_k and the rest pose vertex can be calculated:

d_k = \left( \sum_{j=1}^{n_{joint}} w_j T_j \right)^{-1} v_k - v_0    (3)

where v_k is a vertex in sample pose p_k and d_k is the displacement of this vertex relative to v_0 in the sample pose p_k; the other variables are defined as in equation 2. Note that the inverse here is of the weighted sum of affine transforms.
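As a CPU-side illustration of equation 3, the sketch below blends the joint transforms with the vertex's weights, inverts the resulting affine transform (a general 3x3 inverse via cofactors, since the blended matrix need not be a rotation), and subtracts the rest-pose position; the types and helper names are assumptions of this sketch, not the paper's code.

#include <vector>

struct Vec3 { float x, y, z; };
struct Affine3x4 { float m[3][4]; };   // [A | t], not necessarily a rigid transform

// Blend the joint transforms with the per-vertex weights: M = sum_j w_j T_j.
static Affine3x4 blend(const std::vector<Affine3x4>& T, const std::vector<float>& w) {
    Affine3x4 M{};
    for (size_t j = 0; j < T.size(); ++j)
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 4; ++c)
                M.m[r][c] += w[j] * T[j].m[r][c];
    return M;
}

// Equation 3: d_k = M^{-1} v_k - v_0, with M^{-1} applied as A^{-1} (v_k - t).
Vec3 sampleDisplacement(const Vec3& vk, const Vec3& v0,
                        const std::vector<Affine3x4>& T, const std::vector<float>& w)
{
    Affine3x4 M = blend(T, w);
    const float (*a)[4] = M.m;
    float det = a[0][0]*(a[1][1]*a[2][2] - a[1][2]*a[2][1])
              - a[0][1]*(a[1][0]*a[2][2] - a[1][2]*a[2][0])
              + a[0][2]*(a[1][0]*a[2][1] - a[1][1]*a[2][0]);
    float inv[3][3] = {
        { (a[1][1]*a[2][2] - a[1][2]*a[2][1])/det, (a[0][2]*a[2][1] - a[0][1]*a[2][2])/det, (a[0][1]*a[1][2] - a[0][2]*a[1][1])/det },
        { (a[1][2]*a[2][0] - a[1][0]*a[2][2])/det, (a[0][0]*a[2][2] - a[0][2]*a[2][0])/det, (a[0][2]*a[1][0] - a[0][0]*a[1][2])/det },
        { (a[1][0]*a[2][1] - a[1][1]*a[2][0])/det, (a[0][1]*a[2][0] - a[0][0]*a[2][1])/det, (a[0][0]*a[1][1] - a[0][1]*a[1][0])/det } };
    Vec3 p{ vk.x - a[0][3], vk.y - a[1][3], vk.z - a[2][3] };   // remove the blended translation
    Vec3 r{ inv[0][0]*p.x + inv[0][1]*p.y + inv[0][2]*p.z,
            inv[1][0]*p.x + inv[1][1]*p.y + inv[1][2]*p.z,
            inv[2][0]*p.x + inv[2][1]*p.y + inv[2][2]*p.z };
    return { r.x - v0.x, r.y - v0.y, r.z - v0.z };
}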

After defining the displacement of each pose, the displacement at an arbitrary pose can be calculated by RBF (Radial Basis Function) [LCF00] or normalized radial basis function [KM04] interpolation of the example poses' displacements.

The displacement d_a of a vertex in an arbitrary pose p_a can be calculated as

d_a = D(p_a) = \sum_{k=1}^{n_{pose}} r_k(p_a) d_k    (4)

where n_pose is the number of sample poses, d_a is the displacement of the vertex in an arbitrary pose p_a, and the weight r_k(p_a) defines the contribution of each sample pose.

Normalized Radial Basis Functions can smoothly interpolate pose space using:

f_t(p_a) = \sum_{k=1}^{n_{pose}} \lambda_{t,k} \phi_k(\gamma_k)    (5)

where f_t(p_a) is the radial basis weight function for example t evaluated at an arbitrary pose p_a, n_pose is the number of sample poses, \lambda_{t,k} are real valued weights between poses t and k, \phi_k are the radial basis functions, and \gamma_k is the distance between the pose p_k and the arbitrary pose p_a (defined as the Euclidean distance between the joint vectors of each pose).

The weight r_k(p_a) is calculated using normalized RBFs and is used in equation 4 to calculate the displacement d_a of a vertex in an arbitrary pose p_a:

r_k(p_a) = \frac{f_k(p_a)}{\sum_{t=1}^{n_{pose}} f_t(p_a)}    (6)

Gaussian radial basis functions \phi_k(\gamma_k) = \exp(-\gamma_k^2 / 2\sigma^2) are one possible choice of basis, and the constant \sigma can be specified experimentally [LCF00].
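The following CPU-side sketch evaluates equations 5, 6, and 4 in that order: Gaussian basis values from the pose distances, normalized RBF weights r_k(p_a), and the blended displacement for one vertex. The data layout (lambda[t][k]) and the function names are assumptions for illustration only.

#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Equations 5 and 6: f_t(p_a) = sum_k lambda_{t,k} * phi_k(gamma_k), and
// r_k(p_a) = f_k(p_a) / sum_t f_t(p_a), with Gaussian basis phi_k(g) = exp(-g^2 / (2 sigma^2)).
std::vector<float> normalizedRbfWeights(const std::vector<float>& gamma,                 // distance to each sample pose
                                        const std::vector<std::vector<float>>& lambda,   // lambda[t][k]
                                        float sigma)
{
    const size_t n = gamma.size();
    std::vector<float> phi(n), f(n), r(n);
    for (size_t k = 0; k < n; ++k)
        phi[k] = std::exp(-gamma[k] * gamma[k] / (2.0f * sigma * sigma));

    float sum = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        f[t] = 0.0f;
        for (size_t k = 0; k < n; ++k)
            f[t] += lambda[t][k] * phi[k];
        sum += f[t];
    }
    for (size_t k = 0; k < n; ++k)
        r[k] = (sum != 0.0f) ? f[k] / sum : 0.0f;
    return r;
}

// Equation 4: d_a = sum_k r_k(p_a) d_k for one vertex.
Vec3 psdDisplacement(const std::vector<float>& r, const std::vector<Vec3>& d)
{
    Vec3 da{0.0f, 0.0f, 0.0f};
    for (size_t k = 0; k < r.size(); ++k) {
        da.x += r[k] * d[k].x;
        da.y += r[k] * d[k].y;
        da.z += r[k] * d[k].z;
    }
    return da;
}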

3.3. Weighted pose space deformation (WPSD)

WPSD was developed by Kurihara et al. [KM04] to deform their example-based human hand model derived from medical images. In the original PSD, the distance between two poses p_a and p_k, each having n_joint joints, is defined as

\gamma_k(p_a, p_k) = \sqrt{ \sum_{j=1}^{n_{joint}} (p_{a,j} - p_{k,j})^2 }    (7)

In equation 7, since \gamma_k is the difference of the n_joint-dimensional joint vectors of the related poses, every vertex in the pose p_k has the same distance \gamma_k, resulting in the same weight r_k(p_a) for every vertex of the pose p_k. Furthermore, because each element of the joint vector contributes equally to the distance calculation, two vectors having the same values but in a different order generate the same pose distance. For example, the three different joint vectors p_1 = (\theta, 0, 0), p_2 = (0, \theta, 0), p_3 = (0, 0, \theta) all have the same distance between them, and this can cause unexpected results in PSD.

In WPSD [KM04], Kurihara et al. modify the distance definition between poses using the joint weights of each vertex i to give a proper weight to each element of the joint vector:

\gamma_{i,k}(p_a, p_k) = \sqrt{ \sum_{j=1}^{n_{joint}} w_{i,j} (p_{a,j} - p_{k,j})^2 }    (8)

where \gamma_{i,k}(p_a, p_k) is the distance between poses p_a and p_k for vertex i, and w_{i,j} is the weight of joint j for vertex i as used in equation 2. From this definition, a more accurate pose distance is obtained, and it generates better skinning in arbitrary poses, especially when the poses are far from the examples.
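A minimal sketch of equation 8: for vertex i, each joint's contribution to the pose distance is scaled by that vertex's joint weight w_{i,j}; the resulting per-vertex distance gamma_{i,k} then replaces gamma_k in the RBF weights of equations 5 and 6. Function and variable names are assumptions.

#include <cmath>
#include <vector>

// Equation 8: gamma_{i,k} = sqrt( sum_j w_{i,j} * (p_{a,j} - p_{k,j})^2 ).
// pa and pk are the joint vectors of the arbitrary and sample pose, and
// w holds the joint weights of the current vertex (as used in equation 2).
float weightedPoseDistance(const std::vector<float>& pa,
                           const std::vector<float>& pk,
                           const std::vector<float>& w)
{
    float sum = 0.0f;
    for (size_t j = 0; j < pa.size(); ++j) {
        float diff = pa[j] - pk[j];
        sum += w[j] * diff * diff;
    }
    return std::sqrt(sum);
}

With this definition, the three joint vectors p_1, p_2, p_3 above are no longer equidistant for a vertex that is weighted toward only one of the joints.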

Figure 1 shows the results of the three different skinning methods, but we entrust the detailed comparison of the quality of each algorithm to the original papers [MTLT88, LCF00, KM04].

4. Computing joint weights from samples

The joint weights of each vertex are important for generating accurate skinning in SSD (equation 2) as well as in WPSD (equation 8). In many applications, the weights are manually generated by skilled artists, and it is hard to generate accurate values when a number of joints are involved in deforming a region. In this paper, we automatically calculate the joint weights of each vertex from the sample poses to enhance the accuracy of the weight values. This results in better skinning and reduces the elaborate manual work required to create weight maps.

In each sample pose p_k, we have the following equation based on SSD:

v_k - e_k = \left( \sum_{j=1}^{n_{joint}} w_j T_j \right) v_0    (9)

where v_k is a particular vertex from skin sample k, the right hand side is the SSD deformation of vertex v_0 from the rest pose, e_k is the displacement between the SSD deformation and v_k, and the other variables are as in equation 2.

If we have sufficient examples involving the same set of n_joint joints, we have n_pose equations of the form:

v_k - e_k = \sum_{j=1}^{n_{joint}} v_j w_j    (10)

where v_j is v_0 transformed by T_j. Although e_k is unknown, we can solve for weights that minimize the e_k in a least squares sense by stacking the equations 10 (with e_k omitted) into a linear matrix system

\| v - A w \|^2    (11)

where w is an n_joint-dimensional weight vector, v is a 3n_pose-dimensional vector containing the vertex v_k from every sample, and A is a 3n_pose × n_joint matrix.

From equation 11, we can calculate w from the given values of v and A to reduce the error of this equation. We use the non-negative least squares (NNLS) method to solve this problem, and it determines positive weight values minimizing the error in equation 10. The calculated weight vector w is normalized to satisfy \sum_{j=1}^{n_{joint}} w_j = 1.0.

In order to avoid a singular matrix A, the number of poses should be greater than or equal to the overall DOF (Degrees Of Freedom) of the joint vector (each joint has 3 DOF), and the sample poses should be sufficiently different.
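The sketch below assembles the stacked system of equations 10 and 11 for one vertex: each sample pose contributes three rows to A (whose column j holds v_j = T_j v_0 under that pose's joint transforms) and three entries to v. Solving min ||v - A w||^2 subject to w >= 0 is delegated to an NNLS routine (e.g. Lawson-Hanson); solveNNLS is a placeholder declaration, not part of the paper's code, and the row-major layout is an assumption of this sketch.

#include <vector>

struct Vec3 { float x, y, z; };
struct Affine3x4 { float m[3][4]; };

// Placeholder for a non-negative least squares solver (declaration only; link
// against an NNLS implementation such as Lawson-Hanson). Returns w >= 0
// minimizing ||v - A w||^2, with A row-major of size rows x cols.
std::vector<float> solveNNLS(const std::vector<float>& A,
                             const std::vector<float>& v,
                             int rows, int cols);

// Build A (3*npose x njoint) and v (3*npose) for one vertex, solve, and
// normalize so the weights sum to 1 (equations 10 and 11).
std::vector<float> jointWeightsForVertex(
    const Vec3& v0,                                   // rest-pose position
    const std::vector<Vec3>& vk,                      // the vertex in each sample pose
    const std::vector<std::vector<Affine3x4>>& T)     // T[k][j]: joint j in sample pose k
{
    const int npose  = static_cast<int>(vk.size());
    const int njoint = static_cast<int>(T[0].size());
    std::vector<float> A(3 * npose * njoint), v(3 * npose);

    for (int k = 0; k < npose; ++k) {
        for (int j = 0; j < njoint; ++j) {
            const Affine3x4& M = T[k][j];
            float vj[3] = {   // v_j = T_j v_0 for sample pose k
                M.m[0][0]*v0.x + M.m[0][1]*v0.y + M.m[0][2]*v0.z + M.m[0][3],
                M.m[1][0]*v0.x + M.m[1][1]*v0.y + M.m[1][2]*v0.z + M.m[1][3],
                M.m[2][0]*v0.x + M.m[2][1]*v0.y + M.m[2][2]*v0.z + M.m[2][3] };
            for (int r = 0; r < 3; ++r)
                A[(3*k + r) * njoint + j] = vj[r];
        }
        v[3*k + 0] = vk[k].x;  v[3*k + 1] = vk[k].y;  v[3*k + 2] = vk[k].z;
    }

    std::vector<float> w = solveNNLS(A, v, 3 * npose, njoint);
    float sum = 0.0f;
    for (float wj : w) sum += wj;
    if (sum > 0.0f)
        for (float& wj : w) wj /= sum;
    return w;
}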

James et al. used a similar approach to estimate the vertex weights of each joint [JT05], and we apply a similar approach in our skinning method.

5. Parallel deformation on GPU

Skinning deformations vary across vertices. In SSD and WPSD, this per-vertex computation is independent for each vertex and can be parallelized on a SIMD parallel architecture. We developed a parallel skinning algorithm for SSD and WPSD that is suited to GPUs, which have a SIMD architecture with one-sided communication to texture memory.


5.1. Parallel WPSD

The computation cost of the SSD skinning algorithm is O(n_vertex × n_joint) from equations 1 and 2, PSD is O(n_vertex × n_joint × n_pose) from equations 1, 2, and 4, and WPSD is O(n_vertex × n_joint × n_pose × n_pose × n_pose) from equations 1, 2, 4, 5, and 6. Here the computation cost of the original PSD is defined by equations 1, 2, and 4 only, since r_k is the same at all vertices and d_k can be pre-calculated.

The number of joints n_joint and poses n_pose can be reduced to smaller numbers using the method developed by Kry et al. [KJP02], as will be discussed in section 5.2.1 along with efforts to reduce texture memory space.

In previous research, the EigenSkin method based on PSD was developed using GPU vertex programming [KJP02]. The vertex program uses a relatively small number of slower processing units compared with the fragment processors, and the per-vertex computation cost of the original PSD is O(n_joint × n_pose). Therefore WPSD, having the higher per-vertex computation cost O(n_joint × n_pose × n_pose × n_pose), can clearly benefit from parallel computation on the fragment processors.

5.2. Parallel WPSD on GPU

We developed parallel skinning using the GPU fragment processors and demonstrate our method using three rendering passes. In order to minimize real-time computation, we separate out the possible pre-calculation steps and save the results into texture memory using texture maps. Because these values do not change during the successive deformations, they can be pre-computed and stored in read-only texture memory.

In the first and second passes, per-vertex deformation is calculated in the fragment program, and the results are stored in texture maps using the FBO (Frame Buffer Object) extension [Gre05]. These texture maps are bound to the geometry of the rest pose with their texture coordinates. In the third pass, each vertex in the rest pose is replaced by the deformed vertex stored in the output textures generated in the first and second passes, using vertex texture fetch.

5.2.1. Packing data into textures

The fragment processors cannot access vertex information. Instead, we can use texture memory to send data to the fragment program. Information needed in the fragment program is packed into texture maps and stored in texture memory.

Geometry information from the rest pose is stored in two RGB texture maps, a vertex texture T_v and a normal texture T_n; each has size n_vertex × 3. These textures represent the parameter v_0 in equation 2, and each 3D element (x, y, z) is stored in the (r, g, b) value of a texel [Figure 2].
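As an illustration of this packing (a sketch, not the paper's code), the following uploads the vertex positions into a texWidth × texHeight RGB float texture with one vertex per texel in row-major order; the GL_RGB32F_ARB internal format (from the ARB_texture_float extension available on that hardware generation), the use of GLEW, and the assumption of a current GL context are all assumptions of this sketch.

#include <vector>
#include <GL/glew.h>   // assumes an extension loader exposing ARB_texture_float

struct Vec3 { float x, y, z; };

// Pack n_vertex positions into a texWidth x texHeight RGB float texture
// (one vertex per texel, row-major), padding unused texels with zeros.
// Requires verts.size() <= texWidth * texHeight.
GLuint packVertexTexture(const std::vector<Vec3>& verts, int texWidth, int texHeight)
{
    std::vector<float> texels(static_cast<size_t>(texWidth) * texHeight * 3, 0.0f);
    for (size_t i = 0; i < verts.size(); ++i) {
        texels[3*i + 0] = verts[i].x;   // r = x
        texels[3*i + 1] = verts[i].y;   // g = y
        texels[3*i + 2] = verts[i].z;   // b = z
    }

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F_ARB, texWidth, texHeight,
                 0, GL_RGB, GL_FLOAT, texels.data());
    return tex;
}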

Figure 2: Packing data into textures: texture map T_i can be T_v, T_n, T_w1, T_w2, or T_dj. V (vertex), N (normal), W (weight), J (joint index), and D (displacement) represent the texel (RGB(A)) values of the related textures. T_d consists of eight T_dj storing the displacements of each pose j.

The joint weights calculated in section 4 are also stored in texture maps. In general, the distribution of skinning effects in an articulated body is local to several joints [MMT97, KJP02], even in a region as complicated as a hand. For example, deformations arising from the PIP (Proximal Interphalangeal) joint of the index finger do not propagate to the other fingers, and deformation on the middle phalanx of the index finger is affected only by the movement of the PIP and DIP (Distal Interphalangeal) joints. From this observation, we can reduce joint-weight storage from the actual number of joints n_joint to a smaller number of "principal joints" n̂_joint, selected by sorting on the weight value. We threshold n̂_joint at four in our tests, with an additional four elements to hold the related joint indices. As a result, we can store the joint weights of the entire geometry in two RGBA textures T_w1, T_w2, each of size n_vertex × 4 (rgba), and store all the information required for SSD [equation 2] in four textures T_v, T_n, T_w1, and T_w2.
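One possible way to build the per-vertex contents of T_w1 and T_w2 is sketched below: the four largest joint weights are selected by a partial sort and stored together with their joint indices. Whether the truncated weights are renormalized is not stated in the paper, so the sketch leaves them unchanged; the names are illustrative assumptions.

#include <algorithm>
#include <array>
#include <numeric>
#include <vector>

// For one vertex, pick the 4 "principal joints" with the largest weights and
// return (weights, joint indices) as they would be packed into one RGBA texel
// of T_w1 and T_w2 respectively. Assumes w.size() >= 4.
void principalJoints(const std::vector<float>& w,        // full per-vertex joint weights
                     std::array<float, 4>& texelTw1,     // 4 largest weights
                     std::array<float, 4>& texelTw2)     // matching joint indices
{
    std::vector<int> order(w.size());
    std::iota(order.begin(), order.end(), 0);
    std::partial_sort(order.begin(), order.begin() + 4, order.end(),
                      [&](int a, int b) { return w[a] > w[b]; });

    for (int s = 0; s < 4; ++s) {
        texelTw1[s] = w[order[s]];
        texelTw2[s] = static_cast<float>(order[s]);   // joint index stored as a float channel
    }
}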

The displacement values calculated by equation 3 can be stored in n_pose displacement textures, where n_pose is the number of sample poses. In the case of complex joint structures and a large-DOF model, we need many sample poses to calculate accurate joint weights and PSD deformation. However, since the joint weights can be pre-calculated, we can reduce the number of sample poses needed in the real-time PSD computation. PCA (Principal Component Analysis) of the pose space can yield an orthogonal basis called "Eigendisplacements" [KJP02]. If we reduce the size of the pose space from n_pose to n̂_pose "principal poses" (n̂_pose < n_pose), we can reduce the number of displacement textures. In our experiments we set n̂_pose to eight and save the displacements of all poses into an RGB texture T_d of size n_vertex × 8 (n̂_pose) × 3 (rgb).

Therefore, from the two important observations of "principal joints" and "principal poses", the original computation cost for SSD, PSD, and WPSD discussed in section 5.1 can be reduced by using n̂_joint and n̂_pose rather than n_joint and n_pose.

In the original PSD, since the weight r_k in equation 4 is the same at every vertex, we do not need to calculate this value on the GPU. Since the size of this data is just n_pose, we can simply pass it to the GPU as parameters without generating a texture map. Therefore, we store all the information needed to calculate the original PSD at this point.

In order to reduce real-time computation, we pre-calculate T_j in equation 2 and \lambda in equation 5 and store them in another one-channel texture T_x of size n_pose × (n_pose + n_joint × 3 (x, y, z)).

As a result, we store all the variables required to calculate WPSD, PSD, and SSD in six texture maps: T_v, T_n, T_w1, T_w2, T_d, and T_x. The values in the texture maps are stored in texture memory at setup time, since they are not changed during the deformation process.

In current graphics card architectures, data transfer from CPU to GPU is slow compared with memory access within the GPU. Since the only data changed for each deformation and passed from CPU to GPU is a joint vector p_a (of size n_joint) representing the current arbitrary pose, the memory access in our method is very efficient; in the original PSD method, an additional r_k vector (of size n_pose) is required.

5.2.2. Configurations for fragment program

Variables: T_out = output texture, T_v = vertex texture

/* Set an orthographic camera with the same size as the quad */
gluOrtho2D(-1, 1, -1, 1);
bind(FBO);
/* Bind T_out and set it as the FBO draw buffer */
bind(T_out); SetFBOdrawbuffer(T_out);
bind(T_v);
enable(fragment program);
/* Set the viewport to the resolution of the texture */
glViewport(0, 0, texWidth, texHeight);
/* Render a quad into T_out using the FBO */
glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex3f(-1, -1, -0.5f);
    glTexCoord2f(1, 0); glVertex3f( 1, -1, -0.5f);
    glTexCoord2f(1, 1); glVertex3f( 1,  1, -0.5f);
    glTexCoord2f(0, 1); glVertex3f(-1,  1, -0.5f);
glEnd();
disable(fragment program);

Algorithm 1: Configuration of the fragment program for vertex referencing and direct rendering into a texture.

Since vertex information cannot be accessed by the fragment program, vertex deformation on a GPU is usually performed by a vertex program [KJP02, BK05]. Although we cannot access vertex data in the fragment program, the efficiency of parallel computation in a fragment program is higher, since the fragment processor has more processing units and each of them has more computation power than a vertex processor. The fragment processing system is a general SIMD architecture using fragment streams as input data; each fragment is assigned to a fragment processor to calculate its final color value independently and in parallel.

We developed a parallel WPSD algorithm using the fragment processors to enhance the extent of parallel computation. Geometry information such as vertex positions and normals is stored in the texture maps T_v and T_n as described in section 5.2.1, and the vertex information is referenced in the fragment processors to calculate the final color values.

In order to assign each vertex value stored in a texture map to a fragment, we bind the geometry texture T_v or T_n to a quad and render it using an orthographic camera having the same width and height as the quad. Furthermore, since the viewport is set to the same resolution as the textures, each fragment is exactly matched with the texel holding its vertex information, and we can access each vertex using the texture coordinates of the fragment; vertex weights and displacements stored in the texture maps can be accessed by similar methods.
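For reference, the correspondence between a vertex index and the texture coordinate of its texel can be written as below (a sketch; the row-major layout and the half-texel offset are assumptions consistent with the packing sketch above).

// Row-major packing: vertex i lives in texel (i % texWidth, i / texWidth).
// With a viewport of texWidth x texHeight and a unit quad with texture
// coordinates 0..1, the fragment at that texel receives exactly this
// texture coordinate (sampled at the texel center).
struct TexCoord { float s, t; };

TexCoord vertexToTexCoord(int i, int texWidth, int texHeight)
{
    int x = i % texWidth;
    int y = i / texWidth;
    return { (x + 0.5f) / texWidth, (y + 0.5f) / texHeight };
}

int texelToVertexIndex(int x, int y, int texWidth)
{
    return y * texWidth + x;
}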

A similar idea was developed in [PBMH02] to perform ray tracing in a fragment program and is used in GPGPU (General-Purpose computation on GPUs) applications [GPG, LHK∗04, PF05].

The FBO (Frame Buffer Object) extension [Gre05] supports rendering into an attached texture. This saves memory and time, since there is no copy operation from the frame buffer to a texture buffer. We implemented our WPSD algorithm using the fragment program with the FBO extension to store the result directly into the texture maps accessed by the vertex program in the next pass. A summary of this method is shown in Algorithm 1.

5.3. GPU implementation

We implemented GPU deformation using three rendering passes, and the basic architecture is described in Figure 3.

In the first pass, we parallelize per-vertex deformation using the GPU fragment processors. The data required to calculate this deformation is stored in the textures as described in section 5.2.1, and the deformation for each vertex is calculated in a fragment processor. For a given arbitrary pose defined by a joint vector, SSD is computed by equation 2 using the texture maps T_v, T_w1, T_w2 and T_x (refer to the texture map notation in section 5.2.1). PSD is computed by equation 4 using T_d and T_x, after calculating r_k(p_a) by equation 6. In the case of WPSD, the weighted distance is computed by equation 8 using T_w1, T_w2, and T_x.

Figure 3: Overview of WPSD on the GPU: each T_i is a texture map storing the required data for the calculation (refer to section 5.2.1 for their values), the T_i' are the texture maps storing the output of the 1st and 2nd passes, P_0 is the geometry in the rest pose, and P_a is the deformed model in an arbitrary pose.

In the first pass, the result of the deformation is rendered into a texture map T_v', using the FBO, and passed to the third pass. In the second pass we calculate and store the normal deformation with a method similar to the first pass, and the results are stored in the texture map T_n'.

In the third pass, using a vertex program, each vertex of the rest pose is transformed to its final deformed position using the information from the texture maps computed in the previous two passes. In order to access the related texture information for each vertex, we created texture coordinates for each texel in pre-processing and used them in the vertex program. Specifically, the two texture maps T_v' and T_n' that are generated in the first and second passes are accessed in the vertex program using the texture coordinates of the current vertex.

Alternatively, multiple render targets (MRTs) could combine the first and second passes, and vertex buffer objects (VBOs) could be used to render the deformed results back to the vertex array [OPE, GPG, LHK∗04].

6. Results

We tested our methods using upper arm models consisting of four joints (collar, shoulder, elbow, and wrist). Each has three DOF, and the wrist is the end joint having no DOF. Three different resolution meshes are used to test the performance of GPU parallel computation: the high-resolution model has 91460 triangles with 46036 vertices, the mid-resolution model has 22836 triangles with 11574 vertices, and the low-resolution model has 5762 triangles with 2972 vertices [Figure 4]. Note that these models are considerably more detailed than those used in current games, so the reported frame rates would be much higher if typical game-resolution models were used. On the other hand, with the expected growth of GPU processing power, models such as these will be in wide use in a few years, and algorithms such as WPSD will be required to produce realistic deformations at this level of resolution.

Figure 4: Meshes of the test data: the top row is the low-resolution mesh, the second row is the mid-resolution mesh, and the bottom row is the high-resolution mesh.

Eight sample poses were created with Poser [Cur], and the joint weights and displacements of each sample were derived from these models [Figure 5].

Our parallel algorithm is based on three-pass GPU computation. The fragment programs for the 1st and 2nd passes and the vertex program for the 3rd pass are implemented in the Cg language [FK03]. For accuracy, the GPU computation is performed with 32-bit floating point operations on 32-bit floating point texture maps. Table 2 shows the total memory space needed to store the texture maps required by the fragment program. Note that the maximum required memory space for the highest resolution model is just 6.8 Mbytes; the sizes of the output textures T_v' and T_n' are the same as the sizes of T_v and T_n.

The results of GPU-based deformation for SSD, PSD, and WPSD are shown in Figures 1 and 6, and the experiment was performed on a GeForce 6800 Ultra GPU and a 3.4 GHz Pentium 4 CPU. The timing results of each algorithm on the CPU and GPU are summarized in Table 1.


On average, our GPU-based deformation shows around a 20 times speed-up compared with CPU-based deformation. GPU-based WPSD has roughly the same speed as CPU-based SSD. Therefore, real-time applications using SSD can substitute WPSD running on the GPU without losing their real-time performance. Since our algorithm shows speed-ups for SSD and PSD as well as WPSD, applications can choose the most appropriate skinning method according to the required deformation and detail.

Method   Mesh     CPU (FPS)   GPU (FPS)
SSD      low      150         1425
         middle   39          630
         high     5           164
PSD      low      98          1230
         middle   23          530
         high     4.5         140
WPSD     low      5           85
         middle   1.2         25
         high     0.29        7

Table 1: Timing results (in FPS) of each algorithm: the low-resolution mesh has 5762 triangles with 2972 vertices, the mid-resolution mesh has 22836 triangles with 11574 vertices, and the high-resolution mesh has 91460 triangles with 46036 vertices.

Vertices       Tv & Tn   Tw1 & Tw2   Td     Tx   Total
2972 (low)     35 × 2    46 × 2      278    1    441
11574 (mid)    135 × 2   180 × 2     1080   1    1711
46036 (high)   539 × 2   719 × 2     4315   1    6832

Table 2: Texture memory used to store the data required in the fragment program (in Kbytes); refer to section 5.2.1 for the texture notation.

7. Conclusions

In this paper, we present a parallel skinning algorithm suitable for SIMD architectures such as GPUs. The joint weights of each vertex are automatically computed by NNLS and used in the skinning computation for SSD and WPSD.

Independent per-vertex deformation is parallelized on the GPU using three rendering passes. In the first and second passes, per-vertex deformation is calculated by the fragment processors and the results are stored in texture maps using the FBO. In the third pass, using the vertex processors, each vertex of the rest pose is replaced by the deformed vertex stored in the textures generated by the first and second passes.

Figure 6: Arbitrary poses deformed by WPSD on the GPU.

Articulated body skinning using SSD, PSD, and WPSD is efficiently parallelized by our GPU-based method, and on a detailed model we obtain around a 20 times speed-up compared with CPU-based computation.

Principal component compression of the examples and careful analysis of the joint distributions can reduce the domain of computation [KJP02], and other algorithms based on SSD, PSD, and shape interpolation may be parallelized on the GPU using our approach.

Acknowledgments

This research has been funded by the Integrated Media System Center/USC and Samsung Electronics. We wish to thank KyungKook Park, Changki Min, and Tim Foley for discussions about GPUs, and the anonymous reviewers for their sincere comments.

Figure 5: Sample poses and displacements: the first row shows each sample pose, the second row shows the displacement of each sample pose, and the rest pose is in the first column of the third row. Please enlarge to see details.

References

[ACP02] ALLEN B., CURLESS B., POPOVIĆ Z.: Articulated body deformation from range scan data. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 2002), ACM Press, pp. 612–619.

[AHS03] ALBRECHT I., HABER J., SEIDEL H. P.: Construction and animation of anatomically based human hand models. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA-03) (2003), pp. 98–109.

[ASK∗05] ANGUELOV D., SRINIVASAN P., KOLLER D., THRUN S., RODGERS J., DAVIS J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. 24, 3 (2005), 408–416.

[BK05] BOTSCH M., KOBBELT L.: Real-time shape editing using radial basis functions. Computer Graphics Forum 24, 3 (2005), 611–621. (Proceedings of Eurographics 2005).

[CBC∗05] CAPELL S., BURKHART M., CURLESS B., DUCHAMP T., POPOVIĆ Z.: Physically based rigging for deformable characters. In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation (New York, NY, USA, 2005), ACM Press, pp. 301–310.

[Cur] CURIOUS LABS: Poser 6. http://www.curiouslabs.com.

[FK03] FERNANDO R., KILGARD M. J.: The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Addison-Wesley, 2003.

[GPG] GPGPU: General-purpose computation using graphics hardware. http://gpgpu.org.

[Gre05] GREEN S.: The OpenGL Framebuffer Object Extension. Game Developers Conference, 2005. http://developer.nvidia.com/object/gdc_2005_presentations.html.

[JP02] JAMES D. L., PAI D. K.: DyRT: dynamic response textures for real time deformation simulation with graphics hardware. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 2002), ACM Press, pp. 582–585.

[JT05] JAMES D. L., TWIGG C. D.: Skinning mesh animations. ACM Trans. Graph. 24, 3 (2005), 399–407.

[KJP02] KRY P. G., JAMES D. L., PAI D. K.: EigenSkin: Real time large deformation character skinning in hardware. In Proceedings of the 2002 ACM SIGGRAPH Symposium on Computer Animation (SCA-02) (2002), pp. 153–160.

[KM04] KURIHARA T., MIYATA N.: Modeling deformable human hands from medical images. In Proceedings of the 2004 ACM SIGGRAPH Symposium on Computer Animation (SCA-04) (2004), pp. 357–366.

[KZ05] KAVAN L., ZARA J.: Spherical blend skinning: A real-time deformation of articulated models. In 2005 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (April 2005), ACM Press, pp. 9–16.

[LCF00] LEWIS J. P., CORDNER M., FONG N.: Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In SIGGRAPH '00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 2000), ACM Press/Addison-Wesley Publishing Co., pp. 165–172.

[LHK∗04] LUEBKE D., HARRIS M., KRÜGER J., PURCELL T., GOVINDARAJU N., BUCK I., WOOLLEY C., LEFOHN A.: GPGPU: general purpose computation on graphics hardware. In GRAPH '04: Proceedings of the conference on SIGGRAPH 2004 course notes (New York, NY, USA, 2004), ACM Press, p. 33.

[MMT97] MOCCOZET L., MAGNENAT-THALMANN N.: Dirichlet free-form deformations and their application to hand simulation. In Computer Animation (1997).

[MTG03] MOHR A., TOKHEIM L., GLEICHER M.: Direct manipulation of interactive character skins. In SI3D '03: Proceedings of the 2003 symposium on Interactive 3D graphics (New York, NY, USA, 2003), ACM Press, pp. 27–30.


[MTLT88] MAGNENAT-THALMANN N., LAPERRIÈRE R., THALMANN D.: Joint-dependent local deformations for hand animation and object grasping. In Graphics Interface '88 (June 1988), pp. 26–33.

[OPE] OPENGL: OpenGL extension registry. http://oss.sgi.com/projects/ogl-sample/registry/.

[PBMH02] PURCELL T. J., BUCK I., MARK W. R., HANRAHAN P.: Ray tracing on programmable graphics hardware. ACM Transactions on Graphics 21, 3 (July 2002), 703–712. ISSN 0730-0301 (Proceedings of ACM SIGGRAPH 2002).

[PCLS05] PRATSCHER M., COLEMAN P., LASZLO J., SINGH K.: Outside-in anatomy based character rigging. In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation (New York, NY, USA, 2005), ACM Press, pp. 329–338.

[PF05] PHARR M., FERNANDO R.: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, 2005.

[RNL06] RHEE T., NEUMANN U., LEWIS J.: Human hand modeling from surface anatomy. In I3DG '06: Proc. of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (2006).

[SNF05] SIFAKIS E., NEVEROV I., FEDKIW R.: Automatic determination of facial muscle activations from sparse motion capture marker data. ACM Trans. Graph. 24, 3 (2005), 417–425.

[SRC01] SLOAN P.-P. J., ROSE C. F., COHEN M. F.: Shape by example. In SI3D '01: Proceedings of the 2001 symposium on Interactive 3D graphics (New York, NY, USA, 2001), ACM Press, pp. 135–143.

[Web00] WEBER J.: Run-time skin deformation. In Proceedings of the Game Developers Conference (2000).

[WP02] WANG X. C., PHILLIPS C.: Multi-weight enveloping: least-squares approximation techniques for skin animation. In SCA '02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation (New York, NY, USA, 2002), ACM Press, pp. 129–138.

c© The Eurographics Association and Blackwell Publishing 2006.