3D MODEL COMPRESSION USING IMAGE
COMPRESSION BASED METHODS
a thesis
submitted to the department of electrical and
electronics engineering
and the institute of engineering and sciences
of bilkent university
in partial fulfillment of the requirements
for the degree of
master of science
By
Kıvanç Köse
January 2007
I certify that I have read this thesis and that in my opinion it is fully adequate,
in scope and in quality, as a thesis for the degree of Master of Science.
Prof. Dr. Enis Çetin (Supervisor)
I certify that I have read this thesis and that in my opinion it is fully adequate,
in scope and in quality, as a thesis for the degree of Master of Science.
Prof. Dr. Levent Onural
I certify that I have read this thesis and that in my opinion it is fully adequate,
in scope and in quality, as a thesis for the degree of Master of Science.
Assoc. Prof. Dr. Uğur Güdükbay
Approved for the Institute of Engineering and Sciences:
Prof. Dr. Mehmet Baray
Director of the Institute of Engineering and Sciences
ABSTRACT
3D MODEL COMPRESSION USING IMAGE
COMPRESSION BASED METHODS
Kıvanç Köse
M.S. in Electrical and Electronics Engineering
Supervisor: Prof. Dr. Enis Çetin
January 2007
A connectivity-guided adaptive wavelet transform based mesh compression algorithm is proposed. In contrast to previous work, which processes mesh models as 3D signals, the proposed method uses 2D image processing tools to compress them. The 3D mesh is first transformed into 2D images on a regular grid structure by performing orthogonal projections onto the image plane. This operation is computationally simpler than parameterization. The neighborhood concept in the projection images differs from that in ordinary 2D images, because two connected vertices can be projected onto isolated pixels; the connectivity data of the 3D mesh defines the interpixel correlations in the projection image. Thus, the wavelet transforms used in image processing do not give good results on this representation. The Connectivity-Guided Adaptive Wavelet Transform is defined to take advantage of the interpixel correlations in the image-like representation. Using the proposed transform, the pixels in the detail (high-pass) subbands are predicted from their connected neighbors in the low-pass subbands of the wavelet transform. The resulting wavelet data is encoded using either “Set Partitioning In Hierarchical Trees” (SPIHT) or JPEG2000. The SPIHT approach is progressive, because different resolutions of the mesh can be reconstructed from different portions of the SPIHT bitstream. The JPEG2000 approach, on the other hand, is a single-rate coder in which the quantization of the wavelet coefficients determines the quality of the reconstructed model. Simulations using different basis functions show that the lazy wavelet basis gives the best results, which are further improved by using the Connectivity-Guided Adaptive Wavelet Transform with lazy wavelet filterbanks. The SPIHT-based algorithm is observed to be superior to JPEG2000 and MPEG in rate-distortion performance. Better rate-distortion can be achieved by using a better projection scheme.
Keywords: 3D Model Compression, Image-like mesh representation, Connectivity-
guided Adaptive Wavelet Transform
ÖZET
ÜÇ BOYUTLU MODELLERİN İMGE SIKIŞTIRMA
YÖNTEMLERİYLE SIKIŞTIRILMASI
Kıvanç Köse
Elektrik ve Elektronik Mühendisliği Bölümü, Yüksek Lisans
Tez Yöneticisi: Prof. Dr. Enis Çetin
Ocak 2007
Üç boyutlu modellerin imge sıkıştırma yöntemleri kullanılarak sıkıştırılması için bir yöntem önerilmektedir. Önerilen yöntem, literatürdeki birçok algoritmanın tersine, modelleri 3-Boyutlu veriler yerine 2-Boyutlu veriler olarak ele almaktadır. 3-Boyutlu modeller ilk olarak düzenli ızgara yapıları üzerinde 2-Boyutlu imgelere dönüştürülmektedir. Önerilen yöntem, diğer yöntemlerde kullanılan parametrizasyon tekniğine göre hesaplama açısından daha basittir. Elde edilen imge benzeri temsilin sıradan imgelerden tek farkı, pikseller arası ilintinin yan yanalık ile değil, 3-Boyutlu modelin bağlanılırlık verisi kullanılarak sağlanmasıdır. Bu nedenle yaygın kullanılan dalgacık dönüşümü teknikleri bu temsil üzerinde çok iyi sonuçlar vermemektedir. Burada önerilen Bağlanılırlık Bazlı Uyarlamalı Dalgacık Dönüşümü sayesinde, dalgacık dönüşümü sıradüzensel yapısının detay katmanlarında bulunan piksel değerleri, alçak frekans katmanlarında bulunan komşularından öngörülebilmektedir. Böylece oluşturulan dalgacık dönüşümü verileri, Sıradüzensel Ağaç Yapılarının Kümelere Bölüntülenmesi (Set Partitioning In Hierarchical Trees - SPIHT) ya da JPEG2000 tekniklerinden biri kullanılarak kodlanmaktadır. SPIHT tekniği sayesinde elde edilen veri dizgisi aşamalı gösterime uygundur; çünkü dizginin farklı uzunluktaki bölümlerinden farklı çözünürlüklerde modeller geri çatılabilmektedir. JPEG2000 yönteminin burada önerilen şekli ise tek çözünürlüklü geri çatıma olanak sağlamaktadır. Önerilen yöntemde dalgacık dönüşümü katsayılarının nicemlenme şekli, geri çatılan modelin çözünürlüğünü belirlemektedir. Farklı dalgacık dönüşümü taban vektörleri kullanılarak yapılan deneyler sonucunda, lazy dalgacık dönüşümünün en iyi sonuçları verdiği gözlemlenmiştir. Bağlanılırlık bazlı uyarlamalı dalgacık dönüşümü kullanılarak yapılan deneylerin sonuçlarında, bir önceki yönteme göre gelişme gözlemlenmiştir. Dalgacık dönüşümü verilerinin SPIHT ile kodlanmasıyla elde edilen sonuç, JPEG2000 ile yapılan kodlamanın ve 3B modellerin MPEG ile kodlanmasının sonuçlarından daha başarılı olmuştur. Daha iyi sonuçlar elde etmek için daha iyi bir izdüşüm metodu denenmelidir.
Anahtar Kelimeler: 3 Boyutlu Modellerin Sıkıştırılması, 3B Modellerin İmge Benzeri Temsili, Bağlanılırlık Bazlı Uyarlamalı Dalgacık Dönüşümü
ACKNOWLEDGEMENTS
I gratefully thank my supervisor Prof. Dr. Enis Çetin for his supervision, guidance, and suggestions throughout the development of this thesis.
I would also like to thank Prof. Dr. Uğur Güdükbay and Prof. Dr. Levent Onural for reading and commenting on the thesis.
This work was supported by the European Commission Sixth Framework Program with Grant No: 511568 (3DTV Network of Excellence Project).
Contents
1 Introduction 2
1.1 Mesh Representation . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Compression and Redundancy . . . . . . . . . . . . . . . . 8
1.3 Related Work on Mesh Compression . . . . . . . . . . . . . . . . 10
1.3.1 Single Rate Compression . . . . . . . . . . . . . . . . . . . 12
1.3.2 Progressive Compression . . . . . . . . . . . . . . . . . . . 26
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.5 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 34
2 Mesh Compression based on Connectivity-Guided Adaptive
Wavelet Transform 35
2.1 3D Mesh Representation and Projection onto 2D . . . . . . . . . 36
2.1.1 The 3D Mesh Representation . . . . . . . . . . . . . . . . 36
2.1.2 Projection and Image-like Mesh Representation . . . . . . 37
2.2 Wavelet Based Image Coders . . . . . . . . . . . . . . . . . . . . . 41
2.2.1 SPIHT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.2.2 JPEG2000 Image Coding Standard . . . . . . . . . . . . . 46
2.2.3 Adaptive Approach in Wavelet Based Image Coders . . . . 47
2.3 Connectivity-Guided Adaptive Wavelet Transform . . . . . . . . . 49
2.4 Connectivity-Guided Adaptive Wavelet Transform Based Com-
pression Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.1 Coding of the Map Image. . . . . . . . . . . . . . . . . . . 53
2.4.2 Encoding Parameters vs. Mesh Quality . . . . . . . . . . . 54
2.5 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3 Results from the Literature and Image-like Mesh Coding Re-
sults 57
3.1 Mesh Coding Results in the Literature . . . . . . . . . . . . . . . 57
3.2 Mesh Coding Results of the Connectivity-Guided Adaptive
Wavelet Transform Algorithm . . . . . . . . . . . . . . . . . . . . 59
3.3 Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Conclusions and Future Work 95
Bibliography 99
List of Figures
1.1 Sphere (a) and torus (b) are manifold surfaces. The connection of
two rectangles edge to edge, as seen in (c), creates a non-manifold
surface. Sphere has no hole so it is a genus-0 surface, where torus
has one hole so it is a genus-1 surface. All the objects have one
shell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 A triangle strip created from a mesh. . . . . . . . . . . . . . . . . 13
1.3 Spanning trees in the triangular mesh strip method. . . . . . . . . 14
1.4 Opcodes defined in the EdgeBreaker. . . . . . . . . . . . . . . . . 15
1.5 Encoding example for EdgeBreaker. . . . . . . . . . . . . . . . . . 15
1.6 Decoding example for EdgeBreaker. . . . . . . . . . . . . . . . . . 16
1.7 TG Encoding of the same mesh as Edgebreaker. The resulting
codeword will be [8,6,6,4,4,4,4,4,6,4,5,Dummy 13,3,3]. . . . . . . . 18
1.8 The delta-coordinates quantization to 5 bits/coordinate (left) in-
troduces low-frequency errors to the geometry, whereas Cartesian
coordinates quantization to 11 bits/coordinate (right) introduces
noticeable high-frequency errors. The upper row shows the quan-
tized model and the bottom row uses color to visualize correspond-
ing quantization errors. (Reprinted from [32]) . . . . . . . . . . . 20
1.9 Linear prediction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.10 Parallelogram prediction. . . . . . . . . . . . . . . . . . . . . . . . 22
1.11 (a) A simple mesh matrix; (b) its valence matrix; and (c) its ad-
jacency matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.12 Geometry images: (a) The original model; (b) Cut on the mesh
model; (c) Parameterized mesh model; (d) Geometry image of the
original model. (Reprinted from [5]; data courtesy of Hugues Hoppe) 25
1.13 Edge collapse and vertex split operations are inverse of each other. 27
1.14 Progressive representation of the cow mesh model. The model re-
constructed using (a) 296, (b) 586, (c) 1738, (d) 2924, and (e) 5804
vertices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.15 Progressive forest split. (a) initial mesh; (b) group of splits; (c)
remeshed region and the final mesh. . . . . . . . . . . . . . . . . . 29
1.16 Valence based progressive compression approach. . . . . . . . . . 30
2.1 (a) 2D rectangular sampling lattice and 2D rectangular sampling
matrix (b) 2D quincunx sampling lattice and 2D quincunx sam-
pling matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 The illustration of the projection operation and the resulting
image-like representation. . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Projected vertex positions of the mesh; (a) projected on XY plane;
(b) projected on XZ plane. . . . . . . . . . . . . . . . . . . . . . . 40
2.4 (a) Original image (b) 4-level wavelet transformed image. . . . . 43
2.5 Lazy filter bank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.6 (a) The relation between wavelet coefficients in EZW; (b) the tree
structure of EZW; (c) the scan structure of wavelet coefficients in
EZW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7 1D lifting scheme without update stage. . . . . . . . . . . . . . . 48
2.8 Lazy filter bank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.9 The block diagram of the proposed algorithm. . . . . . . . . . . . 52
3.1 Reconstructed sandal meshes using parameters (a) lazy wavelet, 60%
of bitstream, detail level=3; (b) lazy wavelet, 60% of bitstream, de-
tail level=4.5; (c) Haar wavelet, 60% of bitstream, detail level=3;
(d) Daubechies-10, 60% of bitstream, detail level=3.(Sandal model
data is courtesy of Viewpoint Data Laboratories) . . . . . . . . . 62
3.2 Meshes reconstructed with a detail level of 3 and 0.6 of the bit-
stream. (a) Lazy, (b) Haar, (c) Daubechies-4, (d) Biorthogonal4.4,
and (e) Daubechies-10 wavelet bases are used. . . . . . . . . . . . 63
3.3 Homer Simpson model compressed with JPEG2000 to 6.58 KB
(10.7 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Homer Simpson model compressed with JPEG2000 to 6.27 KB
(10.1 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Homer Simpson model compressed with JPEG2000 to 9.28 KB
(15 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.6 Homer Simpson model compressed with JPEG2000 to 12.7 KB
(20.6 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.7 Homer Simpson model compressed with SPIHT to 4.07 KB
(6.6 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.8 Homer Simpson model compressed with SPIHT to 4.37 KB
(7.1 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.9 Homer Simpson model compressed with SPIHT to 4.67 KB
(7.6 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.10 Homer Simpson model compressed with SPIHT to 4.96 KB (8 bpv). 74
3.11 Homer Simpson model compressed with SPIHT to 5.55 KB (9 bpv). 76
3.12 Homer Simpson model compressed with SPIHT to 6.76 KB (11 bpv). 77
3.13 Homer Simpson model compressed with SPIHT to 7.92 KB (12.8 bpv). 78
3.14 The qualitative comparison of the meshes reconstructed without
prediction (a and c) and with adaptive prediction (b and d). Lazy
wavelet basis is used. The meshes are reconstructed using 60% of
the bitstream with detail level=5 in the Lamp model and 60% of
the bitstream with detail level=5 in the Dragon model. (Lamp
and Dragon models are courtesy of Viewpoint Data Laboratories) 79
3.15 Distortion measure between original (images at left side of the fig-
ures) and reconstructed Homer Simpson mesh models using Mesh-
Tool [69] software. (a) SPIHT at 6.5 bpv; (b) SPIHT at 11 bpv;
(c) JPEG2000 at 10.5 bpv. The grayscale colors on the original
image show the distortion level of the reconstructed model. Darker
colors mean more distortion. . . . . . . . . . . . . . . . . . . . . . 80
3.16 Comparison of our reconstruction method with Garland’s simpli-
fication algorithm [48] (a) Original mesh; (b) simplified mesh us-
ing [48] (the simplified mesh contains 25% of the faces in the orig-
inal mesh); (c) mesh reconstructed by using our algorithm using
60% of the bitstream. . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.17 (a) Base mesh of the Bunny model composed by the PGC algorithm
(230 faces); (b) model reconstructed from 5% of the compressed
stream (69,967 faces); (c) from 15% (84,889 faces); (d) from 50%
(117,880 faces); (e) from 95% (209,220 faces); (f) original Bunny
mesh model (235,520 faces). The original model has a size of 6 MB
and the compressed full stream has a size of 37.7 KB. 82
3.18 Homer and 9 Handle Torus models compressed using MPEG mesh
coder. The compressed data sizes are 41.8 KB and 82.8 KB respec-
tively. Figures on the left side show the error of the reconstructed
model with respect to the original one. Reconstructed models are
shown on the right side. . . . . . . . . . . . . . . . . . . . . . . . 83
3.19 The error between the original dancing human model and the
reconstructed models compressed using SPIHT at 13.7 bpv (a) and
9.7 bpv (c); (b) and (d) show the reconstructed models. (e) The
error for the model compressed using MPEG at 63 bpv; (f) the
reconstructed model. 84
3.20 (a) Dragon (5213 vertices) and (c) Sandal (2636 vertices) models
compressed using the MPEG mesh coder. (b) Dragon and (d) Sandal
models compressed using the proposed SPIHT coder. Compressed
data sizes are 43.1 KB and 10.4 KB, respectively, for the Dragon
model, and 22.7 KB and 2.77 KB, respectively, for the Sandal model. 85
3.21 9 Handle Torus model compressed with JPEG2000 to 14.6 KB
(12.4 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.22 9 Handle Torus model compressed with JPEG2000 to 13.6 KB
(11.6 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.23 9 Handle Torus model compressed with JPEG2000 to 14 KB
(11.9 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.24 9 Handle Torus model compressed with JPEG2000 to 16.7 KB
(14.2 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.25 9 Handle Torus model compressed with SPIHT to 7.84 KB (6.7 bpv). 90
3.26 9 Handle Torus model compressed with SPIHT to 8.18 KB (7 bpv). 91
3.27 9 Handle Torus model compressed with SPIHT to 8.96 KB
(7.63 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.28 9 Handle Torus model compressed with SPIHT to 11.9 KB
(10.1 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.29 9 Handle Torus model compressed with SPIHT to 12.7 KB
(10.8 bpv). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
List of Tables
3.1 Compression results for the single rate mesh connectivity coders
in literature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Compression results for the single rate mesh geometry coders in
literature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3 Compression results for the progressive mesh coder in literature
(Geometry + Connectivity). . . . . . . . . . . . . . . . . . . . . . 60
3.4 Compression results for the Sandal model. . . . . . . . . . . . . . 61
3.5 Comparative compression results for the Cow model compressed
without prediction and with adaptive prediction. . . . . . . . . . . 61
3.6 Compression results for the Lamp model using lazy wavelet filter-
bank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.7 Compression results for the Homer Simpson model using SPIHT
and JPEG2000. Hausdorff distances are measured between the
original and reconstructed meshes. . . . . . . . . . . . . . . . . . 65
3.8 Compression results for the 9 Handle Torus mesh model using
SPIHT and JPEG2000. Hausdorff distances are measured between
the original and reconstructed meshes. . . . . . . . . . . . . . . . 65
3.9 Comparative results for the Homer, 9 Handle Torus, Sandal,
Dragon, Dance mesh models compressed using MPEG and SPIHT
mesh coders. Hausdorff distances are measured between the orig-
inal and reconstructed meshes. . . . . . . . . . . . . . . . . . . . . 75
To My Family and most beloved. . .
Chapter 1
Introduction
The demand to visualize real-world scenes in digital environments and to run simulations on such data has increased in recent years. Three-dimensional (3D) meshes are used for representing 3D objects. The mesh representations of 3D objects are created either manually or by using 3D scanning and acquisition techniques [1]. Meshes with arbitrary topology can be created using these methods.
The 3D geometric data is generally represented using two tables: a vertex list
storing the 3D coordinates of vertices and a polygon list storing the connectivity
of the vertices. The polygon list contains pointers into the vertex list.
Multiresolution representations can be defined for 3D meshes [2]. In a fine resolution version of an object, more vertices and polygons are used as compared to a coarse representation of the object. It would be desirable to obtain the coarse representation from the fine representation using computationally efficient algorithms. Wavelet-based approaches are applied to meshes to realize a multiresolution representation of a given 3D object [2].
As scenes and the objects composing them become more complex and detailed, the size of the data also grows, so transmitting this data from one place to another becomes a more difficult and important task. The transmission can be over a band-limited channel, either from one system to another or from a storage device to a processing unit, e.g., from main memory to the graphics card [3].
“The fundamental problem in communication is that of reproducing at one point either exactly or approximately a message selected at another point” [4]. However, the choice of the message data is not unique: several possible sets of messages can be used to describe the transmitted information. The problem is creating the description that expresses the data best with the smallest size.
There exist several mesh compression approaches in the literature. Most of them treat meshes as 3D graphs in space and compress the geometry and the connectivity data separately. The geometry images technique explained in [5] compresses meshes using image compression methods: the meshes are parameterized onto two-dimensional (2D) planes, and those parameterizations are treated as images and compressed using a wavelet-based image coder. Parameterization of a surface mesh is a complex task for an arbitrary object, because many linear equations need to be solved; as objects get more complex, the parameterization operation becomes nearly impossible. Surfaces are therefore cut to reduce the complexity of parameterization [6]. The adaptation of signal processing algorithms [7] to surface meshes is also a challenging task, although it is easier than parameterization. It is much easier to transform the data and apply whatever algorithm is needed than to adapt signal processing algorithms to 3D graphs.
These drawbacks of [5] gave us the idea of finding easier ways of mapping meshes to images and using image compression tools directly on those images. Since image processing is a well-established branch of signal processing, a wide spectrum of algorithms is available. Thus, understanding the fundamentals behind compression, especially image compression, is an essential part of our work.
1.1 Mesh Representation
3D meshes visualize 3D objects using vertices (geometry), edges, faces, attributes such as surface normals, texture, and color, and connectivity. The 3D points {v1, ..., vn} ∈ V in R3 are called the vertices of the mesh. The convex hull of two vertices in R3, conv{vn, vm}, is called an edge; an edge therefore maps to a line segment in R3 with endpoints vn and vm. A face of a triangular mesh is the surface conv{vn, vm, vk}; thus, a face maps to a surface in R3 enclosed by the edges incident to the vertices vn, vm, vk. A face may have no direction, or its direction can be determined using the surface normal data. The additional attributes of a mesh are mostly carried by the vertices. That information can be extended along the edges and faces using linear interpolation or other techniques (e.g., linear or Phong shading of a surface).
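The extension of per-vertex attributes across a face by linear interpolation can be sketched as follows; the barycentric weights and RGB values are illustrative assumptions:

```python
# Extend per-vertex attributes (here RGB colors) across a triangle
# by linear interpolation with barycentric weights w0 + w1 + w2 = 1.
# Attribute values and weights are illustrative.

def interpolate(a0, a1, a2, w0, w1, w2):
    """Weighted average of three per-vertex attribute tuples."""
    return tuple(w0 * x + w1 * y + w2 * z for x, y, z in zip(a0, a1, a2))

red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)
blue = (0.0, 0.0, 1.0)

# A corner reproduces its own attribute; the centroid blends all three.
corner = interpolate(red, green, blue, 1.0, 0.0, 0.0)
center = interpolate(red, green, blue, 1/3, 1/3, 1/3)
```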
The connectivity information summarizes which mesh elements are connected to each other. Each edge in {e1, ..., en} ∈ E is incident to its two end vertices. Each face in {f1, ..., fn} ∈ F is surrounded by its composing edges and incident to all the vertices of those edges. The edges have no direction. Two types of mesh connectivity are common in mesh representations: edge connectivity, the list of edges in the mesh, and face connectivity, the list of faces in the mesh. In a triangular mesh, since the three vertices incident to a face lie in a plane, the face also lies in a plane. In polygonal meshes, the number of vertices incident to a face is four or more, so a face of a polygonal mesh does not necessarily lie in a plane.

Vertices of a mesh can be incident to any number of edges. The number of edges incident to a vertex is called the valence of the vertex [8]. The number of edges incident to a face is called the degree of the face [8].
The number of faces incident to an edge and the number of face loops incident to a vertex are important concepts in determining whether a mesh is manifold or non-manifold. “A 2-manifold is a topological surface where every point on the surface has a neighborhood topologically equivalent to an open disk of R2” [9]. If the neighborhood of a point on the surface is equivalent to a half disk, then the mesh is a manifold with boundary [9]. Figure 1.1 shows examples of manifold and non-manifold surfaces.
Figure 1.1: Sphere (a) and torus (b) are manifold surfaces. The connection of two rectangles edge to edge, as seen in (c), creates a non-manifold surface. The sphere has no hole, so it is a genus-0 surface, whereas the torus has one hole, so it is a genus-1 surface. All the objects have one shell.
Two other important concepts about meshes are shell and genus. A shell is a part of the mesh that is edge-connected. The genus of a mesh is an integer derived from the number of closed curves that can be drawn on the mesh without dividing it into two or more separate pieces; it is equal to the number of handles on the mesh object [8]. As seen in Figure 1.1, the torus (b) is genus-1 since it has one hole, and the sphere (a) is genus-0 since it has no hole [8].
The connectivity of a mesh is a quadruple (V, E, F, Q), where Q is the incidence relation [9]. Relationships between mesh elements can be derived from the connectivity information through the Euler equation [8]. The Euler characteristic κ of a mesh can be calculated as

κ = v − e + f, (1.1)

where v, e, and f are the numbers of vertices, edges, and faces of the mesh, respectively. The Euler characteristic of a mesh depends on the number of shells, the genus, and the boundaries of the mesh. For closed manifold meshes, the Euler characteristic is given by

κ = 2(s − g), (1.2)

where s is the number of shells and g is the genus.
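Equations 1.1 and 1.2 can be checked numerically; the icosahedron and minimal-torus element counts below are standard values used here only for illustration:

```python
# Numerical check of Eq. 1.1 (kappa = v - e + f) and Eq. 1.2
# (kappa = 2(s - g)) for closed manifold meshes with one shell.

def euler_characteristic(v, e, f):
    """Eq. 1.1."""
    return v - e + f

def genus(v, e, f, shells=1):
    """Solve Eq. 1.2 for the genus: g = s - kappa / 2."""
    return shells - euler_characteristic(v, e, f) // 2

# Icosahedron (a triangulated sphere): genus 0.
assert euler_characteristic(12, 30, 20) == 2
assert genus(12, 30, 20) == 0

# Minimal triangulation of the torus (7 vertices, 21 edges, 14 faces): genus 1.
assert euler_characteristic(7, 21, 14) == 0
assert genus(7, 21, 14) == 1
```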
Simple meshes are meshes that are homeomorphic to a sphere, which means they are topologically the same. A homeomorphism is a function between two spaces that is a bijection, is continuous, and has a continuous inverse [10]; in other words, it is a continuous stretching and bending of the mesh into a new shape.

Each triangle of a simple mesh has exactly three edges, and each edge is adjacent to exactly two triangles. This leads to

2e = 3f. (1.3)
Substituting Equation 1.3 into Equation 1.1, we obtain

v − e + f = 2, (1.4)

v − f/2 = 2, (1.5)

v/f = 2/f + 1/2. (1.6)
For large meshes, the approximations in Equation 1.7 can be made. Considering that each edge contributes to the valence of its two end vertices, the average valence of a large, simple mesh can be calculated by averaging the valences val_i of the individual vertices v_i, as in Equation 1.8:

f ≈ 2v and e ≈ 3v, (1.7)

(1/v) Σ_{i=1}^{v} val_i ≈ 2e/v = 6. (1.8)
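The convergence of the average valence toward 6 can be illustrated numerically. The midpoint 1-to-4 triangle subdivision used to grow the mesh is an assumption for this sketch, not a method from the thesis:

```python
# Average valence 2e / v (Eq. 1.8) approaches 6 as a closed triangular
# mesh grows. One midpoint 1-to-4 subdivision step maps the element
# counts (v, e, f) to (v + e, 2e + 3f, 4f); we start from an icosahedron.

def subdivide_counts(v, e, f):
    """Element counts after one 1-to-4 triangle subdivision."""
    return v + e, 2 * e + 3 * f, 4 * f

def average_valence(v, e):
    """Each edge contributes to the valence of its two end vertices."""
    return 2 * e / v

v, e, f = 12, 30, 20          # icosahedron
valences = []
for _ in range(4):
    valences.append(average_valence(v, e))
    v, e, f = subdivide_counts(v, e, f)

# valences: 5.0, 5.71..., 5.93..., 5.98... -> tends toward 6.
# Each subdivision step preserves the Euler characteristic v - e + f = 2.
```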
Basically, a mesh is a pair M = (C, G), where C represents the mesh connectivity information and G represents the geometry (3D coordinates). Connectivity information is closely related to the mesh elements whose adjacency and incidence information is important for navigating the mesh. Different mesh manipulation and compression algorithms depend on different properties of the mesh. Therefore, representing the mesh in a form that can be easily used by those algorithms is also essential.

For example, the connectivity compression algorithm EdgeBreaker [11] traverses the faces of the mesh to code its connectivity data. A data structure that enables easy access to the adjacency information of the mesh therefore increases the speed of the algorithm. Thus, choosing an appropriate data structure to be used with the algorithms is also an important performance issue. Edge-based data structures such as the halfedge data structure [12] and the winged-edge data structure [13] are discussed in [14].
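A minimal halfedge record can be sketched as below; the class and field names are illustrative, not taken from [12] or [14]:

```python
# Sketch of a halfedge data structure. Each directed half of an edge
# stores the vertex it points to, its twin (the opposite half), the
# next halfedge around the same face, and the face it borders.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Halfedge:
    target: int                        # index of the vertex pointed to
    twin: Optional["Halfedge"] = None  # opposite halfedge
    next: Optional["Halfedge"] = None  # next halfedge around the face
    face: Optional[int] = None

def face_vertices(start):
    """Walk the next-pointers once around a face. Traversal-based
    coders such as EdgeBreaker rely on this kind of O(degree) query."""
    out, cur = [], start
    while True:
        out.append(cur.target)
        cur = cur.next
        if cur is start:
            break
    return out

# One triangular face with the vertex cycle 0 -> 1 -> 2 -> 0:
a, b, c = Halfedge(1, face=0), Halfedge(2, face=0), Halfedge(0, face=0)
a.next, b.next, c.next = b, c, a
```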
1.2 Compression
Compression is the selection of a more convenient description for transmitting the information using the smallest data size. For lossless transmission, the best that can be done is to approach the entropy bit-rate. For lossy transmission the situation is more complex: for different distortion levels of the data, different entropies can be found, so different lower bounds for data compression exist. The idea is to transmit the information using the smallest data size from which the original data can be reconstructed within the predefined distortion bounds.
Data and information are often confused with each other. Data is the means by which information is conveyed, but various data sizes can be used to transmit the same amount of information. It is just like two people describing the same scene in two different ways: one may call it a “forest” while the other says “several trees standing near each other”. Data compression is the process of reducing the data size required to represent a given quantity of information [15]. For example, a 256-color-level 1024 x 1024 image with independent, uniformly distributed pixel color values has a data size of 1 MB (8 Mbits). However, if the image has only one color, it can be coded using 8 bits for its color information and 20 bits for its size information, a total of 28 bits, i.e., under 4 bytes. For this image, the remaining data restates already known information, which is called data redundancy.
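The arithmetic behind this example can be spelled out (allocating 10 bits per dimension for the size information is an illustrative convention):

```python
# Sizes for the constant-color image example: a 1024 x 1024 image with
# 256 color levels (8 bits/pixel) versus a description holding 8 bits
# of color plus 10 bits each for width and height.

width = height = 1024
bits_per_pixel = 8            # 256 color levels

raw_bits = width * height * bits_per_pixel
compact_bits = bits_per_pixel + 10 + 10

raw_megabytes = raw_bits / 8 / 2**20   # 1.0 MB (8 Mbits)
# compact_bits is 28 bits, i.e. under 4 bytes; everything beyond that
# in the raw representation is data redundancy.
```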
1.2.1 Compression and Redundancy
Several types of redundancy exist for different types of data. For example, in digital image compression, three basic types of redundancy are coding, interpixel, and psychovisual redundancy. For the proposed 2D representations of meshes, all of these redundancies are valid except interpixel redundancy, whose place is taken by redundancies between vertex positions (geometry) and connectivity.

The explanation of coding redundancy comes from probability theory. If a symbol is more probable than another, it should be represented using fewer bits. Histograms are very useful for this purpose, since they show how many times each value occurs in the data. With the ergodicity assumption, normalized histograms converge to probability density functions as the number of samples goes to infinity, so histograms with a finite number of samples give very good insight into the probability of a specific symbol. The probability of each level in an L-color image with n pixels is
p(k) = n_k / n,  k = 0, 1, ..., L − 1, (1.9)

where n_k is the number of times the kth gray level appears in the image, and n is the number of image pixels. The Average Code Length (ACL) can be calculated as

ACL = Σ_{k=0}^{L−1} l(k) p(k), (1.10)

where l(k) is the code length of level k.
To minimize the total code length, the average code length must be minimized. Using variable symbol code lengths and giving shorter codes to the symbols with higher probability minimizes the average code length.
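Equations 1.9 and 1.10 can be illustrated on a toy four-level histogram; the pixel counts and the prefix code below are assumptions for this sketch:

```python
# Eq. 1.9 and Eq. 1.10 on a toy 4-level histogram: variable-length
# codes that favor probable levels beat a fixed-length code.

from collections import Counter

pixels = [0] * 70 + [1] * 15 + [2] * 10 + [3] * 5
n = len(pixels)
hist = Counter(pixels)
p = {k: hist[k] / n for k in hist}        # Eq. 1.9: p(k) = n_k / n

fixed_lengths = {0: 2, 1: 2, 2: 2, 3: 2}  # 2 bits suffice for 4 levels
var_lengths = {0: 1, 1: 2, 2: 3, 3: 3}    # prefix code 0, 10, 110, 111

def acl(lengths):
    """Eq. 1.10: ACL = sum over k of l(k) * p(k)."""
    return sum(lengths[k] * p[k] for k in p)

# acl(fixed_lengths) is 2.0 bits/pixel; acl(var_lengths) is 1.45,
# because the most probable level gets the shortest code.
```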
Lower variance in the histogram leads to better compression, as the data is more predictable. The histogram of a single-colored image has only one value at one specific color level, so the whole data can be predicted from one pixel. Another example is coding sentences in English: the letter 'e' is the most frequently used letter in English, so assigning it a shorter symbol than the other letters gives a smaller coded data size.
Psychovisual redundancy comes from the imperfection of the human visual system. The difference between the positions of two points cannot be perceived beyond some distance, so quantizing the coordinates of points in 3D space does not change the visualization of the data.

In images, neighboring pixels are mostly not independent of each other; there is a correlation between them, and in the low-frequency parts of the image this correlation is even stronger. This is also true for meshes, but the situation is slightly different: the neighborhood concept depends not only on being near each other but also on being connected to each other. Connected pixels can be predicted from each other more reliably. As in images, in the low-pass (smooth) parts of the object this correlation is stronger.
1.3 Related Work on Mesh Compression
There are several mesh compression algorithms in the literature [16, 8, 17]. They
can be classified as single-rate compression and progressive mesh compression.
This classification can be further divided into subgroups called connectivity
compression and geometry compression. In most algorithms, the connectivity
information is exploited to compress the geometry information of the mesh more
efficiently.
Compression is not the only way of reducing the size of a mesh. Methods
like simplification and remeshing are also used for this purpose, and they are
mentioned in this section in the context of compression. The aim of compression
here is to transmit a mesh from one place to another using as little information
as possible. To achieve this, connectivity is encoded in a lossless manner, while
geometry is encoded in either a lossless or a lossy manner. Due to the quantization
of the 3D positions of the vertices, most geometry encoders are lossy.
Remeshing is a popular method for converting an irregular mesh into a semi-
regular or regular mesh. The idea is to represent the same object with more
correlated connectivity and vertex position information while causing the least
distortion. This makes the new mesh more compressible, since many of the
vertices can be predicted from their neighbors; it can be thought of as
regularizing the mesh data. However, remeshing is a complex task.
Single-rate encoders compress the whole mesh into a single bit stream. The
encoded connectivity and geometry streams are meaningless unless the whole
stream is received by the decoder. Several single-rate algorithms exist, such as
triangle strip coding [3], topological surgery [18], EdgeBreaker [11], the TG
coder [19], the valence-based coder of Alliez and Desbrun [16], spectral
coding [20], and geometry images [5]. They are also used for compressing the
base meshes of progressive representations. Single-rate mesh compression is
dealt with in more detail in Section 1.3.1.
Progressive compression has the advantage of representing the mesh at multiple
levels of detail. A simplified version of the original model can be reconstructed
from the initial chunks of the coded stream, and the remaining chunks add
refinements to the reconstructed model. This is an important opportunity for
environments like the Internet, or for scenes with multiple meshes, where the
impression of the existence of an object is more important than its details.
Progressive mesh representations [21] store a 3D object as a coarse mesh and
a set of vertex split operations that can be used to obtain the original mesh.
They use simplification techniques such as edge collapse and vertex removal to
obtain the coarser versions of the original mesh. The subdivision technique can
also be regarded as a progressive mesh representation [22, 46]. In [24, 2],
multiresolution approaches for mesh compression using wavelets are developed.
Progressive mesh compression is dealt with in more detail in Section 1.3.2.
Besides static meshes, there exist animations of meshes, called dynamic
meshes. Some of the static and progressive mesh compression algorithms are
used for encoding them. Besides intra-mesh redundancies, these mesh sequences
have inter-mesh redundancies: each mesh can be thought of as a frame of an
animation, and the redundancies between frames should be exploited to compress
dynamic meshes more efficiently. Several algorithms, like Dynapack [25],
D3DMC [26, 27], and RD-D3DMC [28, 27], exist for encoding dynamic meshes.
Uncompressed mesh data is large in size and contains redundant information.
Each triangle is specified by three vertices, which are points in R3, and some
attributes like surface normals, color, etc. If the vertex coordinates are stored
as 4-byte floating point values, a vertex needs 12 bytes for its geometry
information alone. From Equation 1.8 it is known that each vertex is, on average,
connected to six triangles. If a triangle is represented by the data of its three
vertices, each vertex would be transmitted six times, which means a data flow of
72 bytes per vertex. To decrease this data flow, more efficient representations of
the mesh should be used; encoding the mesh is one of the best ways of reducing it.
1.3.1 Single Rate Compression
A 3D mesh model is basically composed of two types of data, namely connectivity
and geometry. Accordingly, two types of compression are described: connectivity
compression and geometry compression.
Connectivity Compression
In the connectivity information, each vertex is repeated six times on average.
One simple way to reduce the number of vertex transmissions is creating triangle
strips from the mesh (Figure 1.2). In this method, the initial triangle is described
using its three vertices. The next triangle, which is a neighbor of the current one,
is formed using the two vertices of the shared edge and one new vertex. So for
each subsequent triangle of the strip, one vertex is transmitted. From Equation 1.7
it is known that large simple meshes have twice as many faces as vertices. So if
the triangle strip method is used, each vertex is transmitted twice on average.
Figure 1.2: A triangle strip created from a mesh.
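The one-vertex-per-triangle behavior of a strip can be seen in a small decoder sketch (the vertex indices are hypothetical):

```python
def decode_strip(strip):
    """Expand a triangle strip into explicit triangles.

    The first triangle uses three indices; every following index forms
    a new triangle with the two most recent vertices, so each extra
    triangle costs a single transmitted vertex.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        # Alternate the winding so all triangles keep the same orientation.
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles

strip = [0, 1, 2, 3, 4, 5]    # 6 transmitted indices ...
print(decode_strip(strip))    # ... describe 4 triangles
```

Six transmitted indices describe four triangles; with a plain triangle list the same four triangles would need twelve indices.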
Deering [3] proposed the idea of using a vertex buffer to store some of the
transmitted vertices in order not to transmit each vertex twice. Each new triangle
in the strip either introduces a new vertex, which is pushed into the vertex
buffer, or a reference into the vertex buffer. This gives the algorithm the advantage
of re-using the transmitted vertices.

To achieve this, Deering proposed a new mesh representation called the
generalized triangle mesh. The mesh is converted into an efficient linear form
from which the geometry can be extracted by a single monotonic scan over the
vertex array. However, references to already-used vertices are inevitable;
buffering solves the problem to some extent. Deering used a vertex buffer of size 16.
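The effect of such a vertex buffer can be simulated with a small FIFO cache: indices already in the buffer are sent as cheap references instead of full vertices (the index stream below is hypothetical):

```python
from collections import deque

def simulate_buffer(index_stream, size=16):
    """Count full vertex transmissions vs. buffer references for a
    FIFO vertex buffer of the given size."""
    buffer, full, refs = deque(maxlen=size), 0, 0
    for v in index_stream:
        if v in buffer:
            refs += 1          # cheap reference into the buffer
        else:
            full += 1          # full vertex data must be sent
            buffer.append(v)   # push; the oldest entry falls out
    return full, refs

# Hypothetical stream in which vertices repeat soon after first use.
stream = [0, 1, 2, 1, 2, 3, 2, 3, 4]
print(simulate_buffer(stream))  # (5, 4): 5 full vertices, 4 references
```

Only five of the nine stream entries require full vertex data; the rest are short references, which is where the savings come from.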
A triangle strip is a part of the triangle spanning tree (Figure 1.3(a)), whose nodes
are the faces of the mesh and whose edges connect adjacent faces. This is the
dual of the connectivity graph, whose nodes are the vertices (Figure 1.3(b))
and whose edges are the edges of the mesh. Taubin proposed the Topological
Surgery algorithm [18] to encode the connectivity of a mesh using these two trees.
Both the triangle and the vertex spanning trees are encoded and sent to the
receiver. The decoder at the receiver first decompresses the vertex spanning tree
and doubles each of its edges (Figure 1.3(c)). After decoding the triangle spanning
tree, the resulting triangles are inserted between pairs of doubled edges of the
vertex spanning tree.
Figure 1.3: Spanning trees in the triangular mesh strip method.
Region growing is another approach for connectivity coding; Rossignac's
EdgeBreaker [11] is one example. In [11], region growing starting from a single
triangle is performed. The grown region contains the processed triangles. The
set of edges between the processed region and the yet-to-be-processed region
is called the cut-border. A selected edge of the cut-border, which defines the next
triangle to be processed, is called the gate. These elements can be seen
in Figure 1.4.

The next triangle can have one of five possible orientations with respect to
the gate and cut-border, and each orientation has its own opcode. In this way,
each processed triangle is associated with an opcode. This iterative process
continues until no unprocessed triangles are left in the mesh (Figure 1.5).
Figure 1.4: Opcodes defined in the EdgeBreaker.
Figure 1.5: Encoding example for EdgeBreaker.
Since the decoder knows how the encoder coded the mesh connectivity, it can
reconstruct the data by inverting the operations done by the encoder. Figure 1.6
shows the decoding of the mesh in Figure 1.5. The split operation is a special
case while decoding: each split operation has an associated end operation which
closes the cut-border introduced by the split. The length of each run starting
with a split is needed while decoding the symbol array; therefore EdgeBreaker
needs two passes over the symbol array.
This problem is solved by Wrap&Zip [29] and Spirale Reversi [30], which are
extensions of EdgeBreaker. Wrap&Zip solves the problem by creating dummy
vertices for split operations while encoding the mesh (wrap) and identifying them
in the decoding phase (zip). Spirale Reversi decodes the symbol stream starting
from the end; this way the decoder first finds the end opcodes and then the
associated split opcodes.
Figure 1.6: Decoding example for EdgeBreaker.
In [19], Touma and Gotsman also introduced a region growing algorithm.
But instead of coding the relation between gates and triangles, they code the
valences of the processed vertices. If the variance of the valences in the mesh is
small, it can be compressed efficiently. From Equation 1.8 we know that vertices
of large meshes have an average valence of six. Moreover, by remeshing, irregular
meshes can be converted into semi-regular or regular meshes whose valences are
mostly six. So this method is very efficient on regular and semi-regular meshes.
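The valence distribution such a coder exploits is easy to extract from the face list. A sketch on a hypothetical triangle list:

```python
from collections import Counter

def valences(faces):
    """Count, for every vertex, the number of distinct incident edges."""
    neighbors = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    return Counter(len(n) for n in neighbors.values())

# A tetrahedron: every vertex is connected to the other three.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(valences(faces))  # Counter({3: 4}): all four vertices have valence 3
```

A distribution concentrated on a single valence, as in this toy example, is exactly the low-entropy situation in which a valence coder performs best.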
The TG coder first connects all the boundary vertices of the mesh to a
dummy vertex, as seen in Figure 1.7. Instead of selecting a starting triangle like
EdgeBreaker, the TG coder selects a starting vertex and outputs its valence
together with the two other vertices of a triangle incident to the starting vertex.
The triangle is marked as conquered and its edges are added to the cut-border.
The conquest of the incident triangles iterates counter-clockwise around the focus
vertex. When all the triangles incident to the focus vertex are conquered, the
focus moves to the next vertex along the cut-border. This iterates until all the
triangles are conquered. If the dummy vertex becomes the focus vertex, the
special symbol for dummy is output together with the valence of the dummy
vertex.
Figure 1.7: TG encoding of the same mesh as EdgeBreaker. The resulting codeword will be [8,6,6,4,4,4,4,4,6,4,5,Dummy 13,3,3].
A split situation can occur in this algorithm, too. It arises when the cut-border
is split into two parts; in this situation one of the cut-borders is pushed onto a
stack, and the number of edges along the cut-border must also be encoded.

The split situation is an unwanted event since it complicates the encoding
process. To reduce the number of split situations, Alliez and Desbrun proposed
an improved version of the TG coder in [16]. They approach the problem with
the assumption that split situations tend to arise in convex regions, so it is more
reasonable to select focus vertices from concave regions instead. While choosing
the focus vertices, the algorithm pays attention to how many free edges each
vertex neighbors; the vertex with the minimum number of free edges is chosen.
If there is a tie between two vertices, the number of free edges of their neighbors
is taken into account.
Geometry Compression
The geometry data has a floating point representation. Due to the limitations
of human visual perception, this representation can be restricted to some
precision, which is called quantization. For nearly all mesh geometry coders,
quantization is the first step of compression. Early works uniformly quantize
each coordinate value separately in Cartesian space. Vector quantization of
the mesh geometry is proposed in [31]. Karni and Gotsman also demonstrated
the need for applying quantization to spectral coefficients [20].
Sorkine et al. proposed a solution to the problem of minimizing the visual
distortion due to quantization [32]. They quantize the vertices on the smooth
parts of the mesh more aggressively. The basis of this idea is the fact that the
human visual system is more sensitive to distortion in normals than to geometric
distortion, so they try to preserve the normal variations over the surfaces.
Figure 1.8 shows an illustrative comparison between quantization in Cartesian
coordinates and in delta-coordinates.
Figure 1.8: The delta-coordinates quantization to 5 bits/coordinate (left) introduces low-frequency errors to the geometry, whereas Cartesian coordinates quantization to 11 bits/coordinate (right) introduces noticeable high-frequency errors. The upper row shows the quantized model and the bottom row uses color to visualize corresponding quantization errors. (Reprinted from [32])
The prediction of the quantized vertices is a commonly used technique in
mesh compression. Predicting pixels from their neighbors is a well-known
technique in image processing, and in some sense the same approach is used in
mesh compression. While traversing the mesh, the positions of the vertices are
predicted from the formerly processed vertices. The prediction error, which is
the difference between the predicted position and the real position, is coded; it
tends to be smaller in magnitude than the real position.
There exist several schemes to predict vertex positions. The best known
are linear prediction using a weighted sum of the previous vertices (in the order
given by the connectivity coder), shown in Figure 1.9; parallelogram prediction,
shown in Figure 1.10; and multi-way prediction [33].
Figure 1.9: Linear prediction.
Touma and Gotsman introduced parallelogram prediction together with
their TG coder [19]. This type of prediction is currently one of the most popular
ones. The basic idea is that each edge has two incident triangles, so the
position of a vertex v4 can be predicted from the vertices v1, v2, v3 of the
neighboring triangle across the edge opposite v4; vertices v2 and v3 form the
edge shared by the two triangles. Figure 1.10 illustrates the parallelogram
prediction algorithm. The vertex v4 is predicted as:

v̂4 = v2 + v3 − v1,   (1.11)
and the error can simply be found by subtracting the predicted value from the
actual position:

e = v4 − v̂4.   (1.12)
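Equations 1.11 and 1.12 in code, using plain Python tuples and hypothetical coordinates:

```python
def parallelogram_predict(v1, v2, v3):
    """Predict v4 as the fourth corner of the parallelogram spanned by
    the neighboring triangle (v1, v2, v3): v4_hat = v2 + v3 - v1."""
    return tuple(b + c - a for a, b, c in zip(v1, v2, v3))

def residual(actual, predicted):
    """Prediction error e = v4 - v4_hat (Equation 1.12)."""
    return tuple(x - p for x, p in zip(actual, predicted))

v1, v2, v3 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
v4 = (1.1, 0.9, 0.05)                  # actual position, nearly planar
v4_hat = parallelogram_predict(v1, v2, v3)
print(v4_hat)                          # (1.0, 1.0, 0.0)
print(residual(v4, v4_hat))            # small residual, cheap to encode
```

On smooth surfaces the residual stays close to zero, which is what makes the scheme effective.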
Figure 1.10: Parallelogram prediction.
In the linear and parallelogram prediction schemes, the vertex is predicted from
one direction only. This kind of prediction is not efficient for meshes with creases
and corners. In [33], a prediction scheme is proposed which uses the neighborhood
information of the vertices to predict vertex positions: a vertex is predicted from
all of its connected neighbors, which can be found using the connectivity
information of the mesh.
Another approach in geometry coding is adapting the idea of spectral
coding [20] to 3D signals so that it can be used in the compression of meshes.
The idea is to find the best representatives of the mesh and code them. It is just
like transforming an image into another domain and keeping the parts that carry
the most information; well-known examples of this kind of coding are DCT and
wavelet coding of images. More attention is paid to the low-frequency parts,
since they represent the image much better than the high-frequency parts. Using
the same convention, basis functions that represent the signal best are found.
For geometry, the eigenvectors of the Laplacian matrix correspond to the
basis functions. The Laplacian L of a mesh is defined using the diagonal valence
matrix VL and the adjacency matrix A:

L = VL − A,   (1.13)

L = U D U^T,   (1.14)

Ṽ = U^T V.   (1.15)
Also Figure 1.11 shows L, VL and A for a simple mesh.
Figure 1.11: (a) A simple mesh; (b) its valence matrix; and (c) its adjacency matrix.
Using Equation 1.14, the eigenvectors of L can be evaluated. U contains the
eigenvectors of L, sorted with respect to their corresponding eigenvalues. The
geometry is represented using a v × 3 matrix V, where v is the number of vertices
of the mesh. V is projected onto the new basis by Equation 1.15. The rows of Ṽ
corresponding to low eigenvalues carry little information; thus, they can be
skipped and the remaining coefficients can be encoded efficiently.
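Equation 1.13 can be written out directly for a small mesh (pure Python; the connectivity below is a hypothetical example, not the mesh of Figure 1.11; the eigendecomposition of Equation 1.14 would then be performed on L):

```python
def laplacian(num_vertices, edges):
    """Build L = V_L - A: diagonal valence matrix minus adjacency matrix."""
    L = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        L[u][v] -= 1          # -A off the diagonal
        L[v][u] -= 1
        L[u][u] += 1          # +V_L on the diagonal (one per incident edge)
        L[v][v] += 1
    return L

# Hypothetical 5-vertex mesh: a 4-cycle with a center vertex 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4), (2, 4), (3, 4)]
L = laplacian(5, edges)
for row in L:
    print(row)    # each diagonal entry is the valence; every row sums to zero
```

The zero row sums are the defining property of this Laplacian: the constant vector is always an eigenvector with eigenvalue zero.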
This representation also allows progressive transmission. The most important
coefficients are encoded and sent first; then the rows corresponding to smaller
eigenvalues are encoded and sent. A low-resolution mesh can be reconstructed
from the first part of the stream, and the mesh is then refined as the lower-priority
rows arrive.
However, spectral coding has a disadvantage: for larger matrices, the
decomposition in Equation 1.14 runs into numerical instabilities because many
eigenvalues tend to have similar values. Also, in terms of computation time,
calculating Equation 1.14 becomes expensive as the mesh grows.
Not only spectral coding but also the other mesh coding algorithms sometimes
cannot handle massive datasets like the model of the David of Michelangelo.
Such datasets should be partitioned into small enough (in-core) meshes in order
to be compressed efficiently. In [35], Ho et al. proposed an algorithm that
partitions massive datasets into small pieces and encodes them using EdgeBreaker
and the TG coder. There is also an overhead for the stitching information: a 25%
increase with respect to compressing the small meshes with the same tools. In [36],
Isenburg and Gumhold made several improvements over [35], such as avoiding
breaking the mesh, decoding the entire mesh in a single pass, and streaming the
entire mesh through main memory with a small memory footprint. This is
achieved by building new external-memory data structures (the out-of-core mesh)
dedicated to clusters of faces or active traversal fronts that can fit in core.
Besides cutting and stitching, some algorithms transform the meshes into
other data structures and encode those instead. The formerly mentioned spectral
coding and geometry images [5] are examples of this type of algorithm. The
basic idea in [5] is finding a parameterization of the 3D mesh over a 2D image
and using image coding tools to code this image. To parameterize the mesh, it is
cut along some of its edges so that it becomes topologically equivalent to a disk.
This is the most challenging task of the algorithm; finding those cuts is not a
straightforward thing to do.
Figure 1.12: Geometry images: (a) the original model; (b) cut on the mesh model; (c) parameterized mesh model; (d) geometry image of the original model. (Reprinted from [5]. Data courtesy of Hugues Hoppe.)
The parameterization domain is the unit square. The pixel values of the
parameterized mesh correspond to points on the original mesh; those points may
either be vertex locations or surface points (points inside triangular faces). The
X, Y, Z coordinates of the points are written to the RGB channels of the image
(Figure 1.12). A normal-map image, which defines the normals for the interiors
of the triangles, is also stored. A geometry image is decoded simply by drawing
two triangles for each unit square of the pixel grid and taking the RGB values as
the 3D coordinates. Using the normal map, the new mesh can be rendered.
However, the original mesh cannot be reconstructed exactly: the decoded mesh
is like a remeshed version of the original, not exactly the original.
Remeshing is the process of approximating the original mesh using a more
regular structure. There are three types of meshes in this sense: irregular, semi-
regular, and regular meshes. Most of the algorithms in the literature adapt to
the uniformity and regularity of the meshes [37], so converting meshes into
regular structures brings the opportunity of compressing them more efficiently.
For example, the valence-based compression approach in [16] codes regular
meshes very efficiently, since in a regular mesh most of the vertices have a
valence of six.
But remeshing is a hard task to accomplish. In [38], Szymczak introduced
the idea of partitioning the mesh and resampling the partitioned surface. The
resampled surface is retriangulated with reference to the normals of the original
mesh surface. Here both partitioning and retriangulation are computationally
expensive and non-trivial tasks.
1.3.2 Progressive Compression
Lossless Compression Techniques
The basic idea behind Progressive Mesh (PM) compression is to simplify a mesh
and record the simplification steps, so that the simplification can be inverted
using the recorded information. PM coding was first introduced by Hoppe in [21].
He defined two operations called edge collapse and vertex split; Figure 1.13
illustrates these operations.
An edge collapse operation merges the two vertices incident to the chosen edge
and connects all the edges of those vertices to the merged vertex. The vertex
split operation is the inverse of edge collapse.

Figure 1.13: Edge collapse and vertex split operations are inverse of each other.

After a sequence of edge
collapses, a simplified version of the original mesh is established (Figure 1.14(c)).
One of the single-rate coders can be used to encode this simplified mesh. The
receiver side first decodes the simplified mesh and then uses the incoming
information to invert the edge collapses via vertex split operations: merged
vertices are separated and the new vertices are connected to their neighbors.
The selection of the edge to be collapsed next is an important issue, since it
affects the distortion of the simplified mesh. Hoppe [21] uses an energy function
that takes into account the distortion that collapsing the edge would create. An
energy value is assigned to each vertex, and an edge queue is built incrementally
by sorting these energy values; the next edge to be collapsed is selected according
to this queue.
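A bare-bones edge collapse on an indexed face list can be sketched as follows (no energy function; the collapsed edge and the face list are hypothetical choices for illustration):

```python
def edge_collapse(faces, u, v):
    """Merge vertex v into vertex u and drop degenerate triangles.

    All references to v are rewritten to u; triangles that contained
    the edge (u, v) collapse to a line and are removed.
    """
    merged = []
    for face in faces:
        face = tuple(u if w == v else w for w in face)
        if len(set(face)) == 3:       # keep only non-degenerate faces
            merged.append(face)
    return merged

faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
print(edge_collapse(faces, 2, 3))  # [(0, 1, 2)]: both faces on edge (2, 3) vanish
```

Recording which vertex was merged and where it sat is what makes the operation invertible by a vertex split.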
The progressive simplicial complexes approach in [39] extends the PM
algorithm [21] to non-manifold meshes. Popovic and Hoppe [39] observed that
allowing edge collapses that result in non-manifold points gives a lower
approximation error for the coarse mesh. So they generalized the edge collapse
method to a vertex unification operation. Contrary to edge collapse, in vertex
unification the two unified vertices need not be connected by an edge; any two
vertices of
Figure 1.14: Progressive representation of the cow mesh model. The model is reconstructed using (a) 296, (b) 586, (c) 1738, (d) 2924, and (e) 5804 vertices.
the mesh can be unified. The inverse operation is called generalized vertex split.
The method can simplify meshes of arbitrary topology but has a higher bitrate
than the PM algorithm, since it has to record the connectivity of the collapsed
region while unifying two vertices.
Another approach based on the PM method, called progressive forest split
(PFS), is introduced in [40]. Contrary to the PM method, in PFS there exists a
group of vertex splits between two successive levels of detail. Due to the splitting
of multiple vertices in the mesh, cavities may occur; such cavities are filled by
triangulation. The positions of the newly triangulated vertices are corrected
using translations. So each PFS operation encodes the forest structure, the
triangulation information, and the vertex position translations. The PFS method
is part of the MPEG-4 version 2 standard.
The compressed progressive meshes (CPM) method, introduced by Pajarola
and Rossignac, is similar to the PFS method in that it also refines the mesh
using groups of vertex splits called batches. But contrary to the PFS method,
in the CPM method cavities do not occur, since the vertices that will result in
new triangles after a vertex split are put in the batch. Connectivity compression
is also easier in the CPM method because, in the absence of cavities, no
triangulation information is needed. The translation information of the PFS
method is replaced with prediction information: the positions of the new vertices
are predicted from their neighbors using the butterfly scheme.
Figure 1.15: Progressive forest split: (a) initial mesh; (b) group of splits; (c) remeshed region and the final mesh.
The techniques mentioned so far are all based on the PM method. Alliez
and Desbrun proposed in [41] a method which uses the valence information of
the vertices of the mesh. From Equation 1.8 it is known that the average valence
of a mesh is six, and they observe that the entropy of the mesh is closely related
to the distribution of the valences. Their proposed algorithm has two parts: the
decimation conquest and the cleaning conquest.
Figure 1.16: Valence based progressive compression approach.
The decimation conquest first subdivides the mesh into patches. Each patch
consists of the triangles incident to a vertex, as shown in Figure 1.16(a). The
encoder then enters the patches one by one, removes the common vertex, and
re-triangulates the patch, as shown in Figure 1.16(b-c). The valence of the
removed vertex is output. This is applied to all the patches in the mesh. Then
the cleaning conquest decimates the remaining vertices with valence 3. This
algorithm preserves the average valence around six. The mesh geometry is encoded
using barycentric prediction, and the prediction error is coded.
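Barycentric prediction of a removed patch vertex can be sketched as follows (the ring coordinates are a hypothetical example):

```python
def barycentric_predict(ring):
    """Predict the removed center vertex of a patch as the barycenter
    (coordinate-wise average) of the surrounding ring vertices."""
    n = len(ring)
    return tuple(sum(p[i] for p in ring) / n for i in range(3))

# Hypothetical ring of a valence-4 patch; the true center sits slightly
# above the ring plane, so only a small residual must be encoded.
ring = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
center = (0.0, 0.0, 0.2)
prediction = barycentric_predict(ring)
residual = tuple(c - p for c, p in zip(center, prediction))
print(prediction)  # (0.0, 0.0, 0.0)
print(residual)    # (0.0, 0.0, 0.2)
```

Only the small residual needs to be entropy coded; the ring vertices are already known to the decoder.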
Lossy Compression Techniques
The basic idea behind lossy mesh compression is the existence of another surface
which is a good approximation of the original one and better suited to
compression. In lossy compression, the original geometry and connectivity
information is lost. The distortion between the original model and its
representative is measured as the geometric distance between their surfaces.
Multiresolution analysis and wavelets are the key methods in lossy mesh
compression algorithms. They allow a complex surface to be decomposed into a
coarse representation together with refinements stored in the wavelet coefficients.
Surface approximations at different distortion levels can be obtained by
discarding or quantizing wavelet coefficients at some level. Since the mesh is
decomposed into a more energetic low-frequency part (the multiresolution mesh
levels) and a low-variance high-frequency part (the wavelet coefficients), a more
compact compression can be achieved.
Wavelets are known as signal processing tools on Cartesian grids for signals
like audio, video, and images. Lounsbery et al. [2] extended wavelets to 2-
manifold surfaces of arbitrary type. Due to their non-regularity, it is impossible
to adapt the wavelet transform to irregular meshes [42]. However, by creating
semi-regular meshes through a remeshing process, the mesh can be made regular,
and on this regular structure an extension of the wavelet transform can be used.
Here the remeshing process is implemented by subdivision. The algorithm
starts with a coarse representation of the original model called the base mesh.
Each triangle of the base mesh is subdivided into four triangles by creating
vertices on the edges of the face. The positions of those vertices again represent
surface samples of the original model, so the subdivided coarse mesh becomes
more and more similar to the original model. The new representation, however,
is a semi-regular mesh whose connectivity is regular.
After obtaining the semi-regular structure, the model is ready for progressive
compression. The idea in multiresolution mesh analysis is to use a wavelet
transform between two resolution levels to code the prediction errors in the
vertex locations. While going from the fine to the coarse resolution, the positions
of the deleted vertices are predicted using subdivision. The difference between
the predicted position vp and the decimated vertex position v is stored as a
wavelet coefficient:

ve = vp − v,   (1.16)

where v, ve, vp ∈ R3.
The process is similar to high-pass filtering, since the wavelet coefficients
store the details of the mesh vertex positions. In the coarse-to-fine transition,
the vertex positions are first predicted using subdivision and then repositioned
using the wavelet coefficients. Discarding some of the wavelet coefficients to
achieve better compression rates is possible; however, this obviously introduces
distortion into the reconstructed model.
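One coarse-to-fine step of Equation 1.16 can be sketched for a single edge vertex. A simple midpoint rule stands in here for the actual subdivision scheme, and the coordinates are hypothetical:

```python
def midpoint_predict(a, b):
    """Subdivision prediction of the vertex inserted on edge (a, b)."""
    return tuple((x + y) / 2 for x, y in zip(a, b))

def refine(a, b, wavelet):
    """Coarse-to-fine step: v = v_p - v_e (inverting Equation 1.16)."""
    vp = midpoint_predict(a, b)
    return tuple(p - e for p, e in zip(vp, wavelet))

a, b = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
wavelet = (0.0, -0.1, 0.0)            # stored detail for this vertex
print(refine(a, b, wavelet))          # (1.0, 0.1, 0.0)
```

Setting the wavelet coefficient to zero reproduces the pure subdivision surface, which is exactly what discarding coefficients does to the reconstruction.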
The third important part of progressive mesh compression is the coding of the
base mesh and the wavelet coefficients. As mentioned, base mesh coding can be
done using one of the single-rate mesh coders. The wavelet coefficients are coded
using entropy coders; zero-tree coders can code this type of data efficiently, since
the coefficients tend to decrease from coarse to fine levels.
Khodakovsky et al. [43] proposed a lossy progressive mesh compression approach
that uses the MAPS algorithm [45] to remesh the original model into a semi-regular
mesh. In [43], the Loop scheme [34] is used to predict the vertex positions; in this
algorithm the wavelet coefficients are 3D vectors, as seen in Equation 1.16.
In [44], Khodakovsky and Guskov used normal meshes [23] and the NMC wavelet
coder to encode meshes. The NMC coder uses normal meshes to encode the
wavelet coefficients: the coefficients are expressed as scalar offsets (normals) in
the direction perpendicular to the face, so they become scalar values instead of
3D vectors. They used the unlifted butterfly scheme [47] as the predictor, as it is
also used during normal remeshing.
Maron and Garcia proposed a method that generates the base mesh using
Garland's quadric mesh simplification [48]. They make predictions using the
butterfly scheme of [47], and the prediction errors are coded as wavelet
coefficients. Since the normal component of a wavelet coefficient carries more
information than the tangential components, wavelet coefficients are finely
quantized in the normal direction and coarsely quantized in the tangential
directions. A bit-wise interleaving of the three detail components gives a scalar
value, so a stream of scalar values is generated; this stream is coded using the
SPIHT algorithm [49].
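The bit-wise interleaving of three detail components into one scalar can be sketched as follows (the component bit depth is a hypothetical parameter):

```python
def interleave3(x, y, z, bits=10):
    """Interleave the bits of three non-negative integers (Morton order)
    into a single scalar suitable for a 1D coder such as SPIHT."""
    out = 0
    for i in range(bits):
        out |= ((x >> i) & 1) << (3 * i)
        out |= ((y >> i) & 1) << (3 * i + 1)
        out |= ((z >> i) & 1) << (3 * i + 2)
    return out

print(interleave3(1, 0, 0))  # bit 0 of x lands at position 0 -> 1
print(interleave3(0, 1, 0))  # bit 0 of y lands at position 1 -> 2
print(interleave3(3, 0, 0))  # bits of x land at positions 0 and 3 -> 9
```

Interleaving keeps significant bits of all three components adjacent, so a bit-plane coder refines x, y, and z together.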
1.4 Contributions
This thesis proposes a framework to compress 3D models using image compression
based methods. Two methods are presented:

• a projection method to transform meshes into image objects,

• a connectivity-based adaptive wavelet transform (WT) scheme that can be
embedded into a known image coder and is used in the coding of the image
objects.
The proposed projection method is computationally easier than cutting and
parameterizing the mesh as proposed in [5]. The proposed connectivity-guided
adaptive wavelet transform redefines the pixel neighborhoods; thus, it is more
efficient than using ordinary wavelet transform schemes on the projection
images.
1.5 Outline of the Thesis
In the second chapter of this thesis, the proposed algorithm is explained in detail.
The components of the algorithm, including the projection operation, the
connectivity-guided adaptive wavelet transform, Set Partitioning in Hierarchical
Trees (SPIHT), JPEG2000, map coding, and reconstruction, are explained. The
main idea is to find a way to use image coders for mesh coding and to embed
adaptiveness into the image coder so that better bit rates are achieved. In this
thesis, this aim is achieved by using adaptive versions of the SPIHT and
JPEG2000 image coders.
In the third chapter, the mesh coding results of our algorithm and comparisons
with the algorithms in the literature are presented. Chapter 4 presents the
conclusions of this thesis.
Chapter 2
Mesh Compression based on
Connectivity-Guided Adaptive
Wavelet Transform
This chapter is composed of several sections dealing with different parts of
the mesh coding algorithm. These sections include mesh-to-image projection, the
connectivity-guided adaptive wavelet transform, etc. The combination of all of
these parts results in a complete mesh coder that can encode 3D models using
any wavelet-based image coder with minor modifications.
The proposed mesh coding algorithm introduces an easy way of converting
meshes to image-like representations so that 2D signal processing methods become
directly applicable to a given mesh. After the projection, the mesh data
becomes similar to an image whose pixel values are related to the positions of the
vertex points relative to the projection plane.
The mesh data on a regular grid is transformed into the wavelet domain using an
adaptive wavelet transform. The idea of adaptive wavelets [50] is well known
and has proved to be a successful tool in image coding. By exploiting the directional
neighborhood information between pixels, the adaptive wavelet transform outperforms
its non-adaptive counterparts. Thus, instead of using a non-adaptive wavelet
transform, we define an adaptive scheme so that the neighborhood information of the
vertices can be better exploited.
2.1 3D Mesh Representation and Projection
onto 2D
A mesh can be considered an irregularly sampled discrete signal: the signal is
only defined at the vertex positions. In the proposed approach, the mesh is
converted into a regularly sampled signal by placing a grid structure in space,
which corresponds to resampling the space with a known sampling matrix and
quantizing.
2.1.1 The 3D Mesh Representation
The 3D mesh data is formed by geometry and connectivity information. The
geometry information of the mesh is constituted by vertices in R3. The 3D
space, where the geometry information of the original mesh model is defined, is
X′ = (x′, y′, z′)T ∈ R3. The vertices of the original 3D model are represented
by v′i = (x′i, y′i, z′i)T, i = 1, . . . , v, where v is the number of vertices in the
mesh.
First, the space is normalized into [−0.5, 0.5]3 by

X = (x, y, z)T = αX′, α = (αx, αy, αz), αx, αy, αz ∈ (0, 1). (2.1)

This results in a normalization of the coordinates of the mesh vertices. The
normalized mesh vertices are represented as
vi = (xi, yi, zi)T = (αx x′i, αy y′i, αz z′i), i = 1, . . . , v. (2.2)
Connectivity information is represented as triplets of vertex indices. Each
triplet corresponds to a face of the mesh. The faces of the mesh can be represented
as

Fi = (va, vb, vc)T, i = 1, . . . , f, a, b, c ∈ {1, . . . , v}, a ≠ b ≠ c, (2.3)

where f is the number of faces and v is the number of vertices in the mesh. So a mesh
in R3 can be represented as

M = (V, F), (2.4)

where V is the set of vertices of the mesh and F is the set of faces of the mesh.
2.1.2 Projection and Image-like Mesh Representation
The 3D mesh representation is transformed into a 2D image-like representation
by projecting the mesh data onto a plane. In this thesis, a projection is defined
as the transformation of points and lines in one plane onto another plane by
connecting corresponding points on the two planes with parallel lines [53]; this
is called orthographic projection. The projected points in our algorithm are the
vertices of the mesh. The selected projection plane is defined as P(u,w). The
projection plane P is discretized using the sampling matrix S. Different sampling
matrices S, shown in Figure 2.1, can be defined as:
Srect = [ T 0 ; 0 T ], (2.5)

for rectangular sampling, and

Squinc = [ T T/2 ; 0 T/2 ], (2.6)

for quincunx sampling.
Figure 2.1: (a) 2D rectangular sampling lattice and 2D rectangular sampling matrix; (b) 2D quincunx sampling lattice and 2D quincunx sampling matrix.
Due to its shape, the quincunx sampling lattice gives better approximations of
the projected mesh vertices, but the implementation of the rectangular lattice is
more straightforward and computationally cheaper. After the projection operation,
grid points sampled using the quincunx lattice must be transformed into a
rectangular arrangement, since the pixels of the image-like representation are lined
up in a rectangular manner. The projection of a 3D mesh structure mainly depends
on two parameters of each vertex: the coordinates of the vertex and the perpendicular
distance of the vertex to the selected projection plane.
Let v̄i, a 2D vector, be the projection of vi onto the plane P. Furthermore,
let di be the perpendicular distance of vi to the plane P. The projection operation
is illustrated in Figure 2.2, where a simple 3D mesh of a rectangular prism with
vertices at its corners is projected onto a plane.
In the projection image, the pixel positions are determined by the positions
of the vertices relative to the projection plane. The values of the pixels
are determined by the perpendicular distances of the vertices to the projection
plane. Thus, using different projection planes creates different image-like
representations, as seen in Figure 2.3.
Figure 2.2: The illustration of the projection operation and the resulting image-like representation.
The determination of the vertex-grid point correspondence is the most crucial
task in the projection operation. The vertices that can be assigned to a grid
point n = [n1, n2]T form a set of indices J defined by:

J = { i : |v̄i − S n| < T/2 }, for each n = [n1, n2]T, (2.7)
Figure 2.3: Projected vertex positions of the mesh: (a) projected onto the XY plane; (b) projected onto the XZ plane.
where S is a sampling matrix and n = [n1, n2]T represents the indices of the discrete
image as shown in Figure 2.2. The sampling matrix S determines the distance
between neighboring grid points and can be defined as a quincunx or a rectangular
sampling matrix [1].
The 3D mesh is thus transformed into two 2D image-like representations. The
first image stores the perpendicular distances of the vertices to the respective
grid points on the selected plane:

I1[n1, n2] = { di, if i ∈ J;  0, otherwise. (2.8)

The second image holds the indices of the vertices:

I2[n1, n2] = { i, if di = I1[n1, n2];  0, otherwise. (2.9)
The first-channel image is then wavelet transformed using the newly defined
connectivity-guided adaptive wavelet transform, and the transformed image is
encoded using SPIHT [49] or JPEG2000 [52]. The second-channel image is
converted to a list of indices, which is differentially coded and sent to the
receiver.
Using Equations 2.8 and 2.9, we try to find a pixel-vertex correspondence
for each vertex. However, as seen from Equation 2.7, more than one
vertex may have the same projection in the image-like representation. In that case,
one of the vertices is chosen for the calculation of the pixel value and the others
are discarded. By increasing the number of projection planes or by decreasing the
sampling period through the sampling matrix S, we can handle more vertices.
Both methods increase the reconstruction quality, since they decrease the
number of lost vertices; however, they lead to a decrease in the compression
ratio. In our approach, we use one densely sampled plane, and the recovery of the
lost vertices is handled by connectivity-based interpolation.
2.2 Wavelet Based Image Coders
Multiresolution is a very important concept in image processing. The main
idea of multiresolution methods is to analyze an image at different resolutions.
The wavelet transform is an efficient tool for multiresolution signal analysis. Using a
filterbank containing a low-pass and a high-pass filter, an approximation and a
residual part of an image are calculated, respectively. The approximation part of
the image has nearly the same pixel value distribution as the original image,
but the pixel values of the residual part are concentrated around a mean
value with a small variance. This kind of signal is more suitable for encoding,
since most of the information is now concentrated in a smaller pixel value range.
As the wavelet transform proceeds, new high-pass parts are created; thus, the pixel
value distribution changes and the signal becomes more suitable for encoding.
The choice of the wavelet basis, the number of decomposition levels, and the design
of the quantizer are very important issues in wavelet-based image coders. Using
complex wavelet functions slows down the process, but the resulting coefficients are
less correlated.
Increasing the number of multiresolution decomposition levels slows down the
process, since more wavelet transform stages have to be computed during
compression. But it leads to better compression results, since as the decomposition
level increases, the number of coefficients that can be coarsely quantized
increases, too.
Quantizer design is the issue that most critically affects the compression ratio
and the reconstruction error. Instead of using uniform quantizers, the nature
of the signal must be examined carefully, and a non-uniform quantizer that
preserves the important coefficients should be used. Adapting the size of
the quantization intervals from scale to scale also results in better rate-distortion
behavior.
In this thesis, among several wavelet-based image coders, two well-known ones,
namely SPIHT and JPEG2000, are used. The zero-tree coding approach is very
suitable for the compression of the proposed image-like representation. SPIHT is
chosen because it is one of the most successful algorithms that use zero-tree
coding. JPEG2000 is chosen because it is a state-of-the-art image coder and
gives very good results on images.
2.2.1 SPIHT
In Discrete Wavelet Transform (DWT), a 1-D discrete signal is processed by a
filterbank having a complementary half-band low-pass and high-pass filter pair.
Outputs of the two filters are downsampled and two subsignals are obtained. The
high-band subsignal contains the first level wavelet coefficients of the original
signal corresponding to the normalized frequency range of [π/2, π]. The low-
band subsignal is processed using the same filterbank once again; the second
level wavelet coefficients covering the frequency range of [π/4, π/2] and the low-
low band subsignal covering [0, π/4] are obtained. In a typical application, this
process is repeated several times. Extension of the 1-D DWT to the 2-D signals
can be carried out in a separable and non-separable manner. In the separable
case, the 2-D signal is first processed horizontally row by row by the 1-D filterbank
and two subimages are obtained. These two subimages are then filtered column
by column by the same filterbank and, as a result, four subimages, low-low,
high-low, low-high and high-high subimages are obtained (the order of filtering
is immaterial). High-band subimages contain the first-level wavelet coefficients
of the original 2-D signal. The low-low subimage can be processed by the 2-D
filterbank recursively.
Figure 2.4: (a) Original image (b) 4-level wavelet transformed image.
A 1-D DWT can be extended into a 2-D case in a non-separable manner and 2-
D signals in quincunx grids can be analyzed using the fast wavelet transform [54].
Figure 2.4 shows an image (a) and its bands of the 4-level wavelet transform
(b). The Embedded Zero-Tree Wavelet Coding (EZW) takes advantage of the
zeros in the multi-scale wavelet coefficients of a 2-D signal [51]. Wavelet coefficients
corresponding to smooth portions of a given signal are either zero or close to
zero; the coefficients only differ from zero around the edges of a given image.
Based on this assumption, zeroing some of the small-valued wavelet coefficients
would not cause much distortion while reconstructing the image. EZW also takes
advantage of the relation between multiresolution wavelet coefficients obtained
using the 2-D wavelet tree.
A typical natural image is low-pass in nature. As a result, most of the wavelet
coefficients are zero or close to zero, except the coefficients corresponding to the
edges and textures of the image. On the other hand, the data in the image-like
signals that we extract from meshes is sparse: most of the grid points have zero
values, as there are very few vertices around smooth parts of the mesh. Filtering
the mesh using a regular low-pass or high-pass filter of a wavelet transform spreads
an isolated mesh value across the wavelet subimages. One can instead use the lazy
filterbank of the lifting wavelet transform shown in Figure 2.5, in which the filter
impulse responses are trivial: hl[n] = δ[n] and hh[n] = δ[n − 1]. Therefore, using
the lazy filterbank has advantages over regular filterbanks for mesh images. The
lazy filterbank rearranges the mesh image data into a form that EZW or SPIHT
can exploit: as in the case of a natural image, most of the high-band wavelet
coefficients turn out to be zero and there is a high correlation between the
high-band subimages.
Figure 2.5: Lazy filter bank.
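The lazy filterbank above is nothing more than an even/odd polyphase split, which can be sketched in a few lines (the function names are ours):

```python
import numpy as np

def lazy_split(x):
    """One level of the lazy filterbank: h_l[n] = delta[n],
    h_h[n] = delta[n-1], followed by downsampling by two --
    i.e. a plain even/odd polyphase split."""
    x = np.asarray(x)
    return x[0::2], x[1::2]   # (low band = even samples, high band = odd samples)

def lazy_merge(even, odd):
    """Inverse lazy filterbank: interleave the two phases again."""
    out = np.empty(len(even) + len(odd), dtype=np.result_type(even, odd))
    out[0::2] = even
    out[1::2] = odd
    return out
```

Unlike a real low-pass/high-pass pair, the lazy filterbank never mixes neighboring samples, so an isolated vertex value stays isolated in the subbands; this is exactly the property exploited above.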
EZW, the predecessor of SPIHT, is also an algorithm based on the wavelet
transform. An embedded code is a list of binary decisions by which the original
signal is obtained from an initial signal. The crucial aspect of embedded coding
is that the decisions are ordered according to their importance. This makes EZW
a coder that produces a progressively decodable bitstream.
In EZW, the bitstream can be truncated anywhere according to the user's
needs. The more of the bitstream that is kept, the closer the reconstructed
signal gets to the original signal.
A dependency exists between the wavelet coefficients, as shown in Figure
2.6 (a). This is the root-node hierarchy. The roots correspond to lower bands
and the nodes correspond to higher bands. In most cases, the coefficients in the
lower-level nodes are smaller than the coefficients in their roots. Figure 2.6 (b)
shows the tree structure of the same hierarchy.
Figure 2.6: (a) The relation between wavelet coefficients in EZW; (b) the tree structure of EZW; (c) the scan structure of wavelet coefficients in EZW.
When a coefficient in the root is smaller than a threshold, its nodes usually contain
even smaller values. Thus, this branch of the tree, which is called a zero-tree,
can be pruned and coded easily. EZW checks for zero-trees in the transformed
image and codes them with a special symbol.
In principle, SPIHT carries out the same procedure. It takes the wavelet
transform of the image until a user-defined scale is reached. It creates three
lists by scanning the wavelet domain image in the order shown in Figure 2.6 (c).
These lists are List of Significant Pixels (LSP), List of Insignificant Pixels (LIP),
and List of Insignificant Sets (LIS). First, the coefficients in LIP are compared
to a chosen threshold. If the value of the coefficient is bigger than the threshold
it is put into LSP. The same procedure is carried out for the coefficients in LIS.
This is called the sorting pass.
In the refinement pass, the nth most significant bit of each entry in LSP is
transmitted except those included in the last sorting pass [55]. The details of
EZW and SPIHT can be found in [51, 56, 49].
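To make the sorting/refinement idea concrete, the following toy sketch runs bit-plane passes over a flat list of coefficients. This is a deliberate simplification of ours: real SPIHT maintains the LIP, LIS and LSP over the zero-tree hierarchy of Figure 2.6, whereas here the "LSP" is just a list and no tree sets are used.

```python
def bitplane_passes(coeffs, n_passes):
    """Toy illustration of SPIHT-style sorting and refinement passes
    on a flat coefficient list (no zero-tree sets).  Emits, per
    bit-plane, significance/sign bits for new coefficients and
    refinement bits for already-significant ones."""
    T = 1
    while 2 * T <= max(abs(c) for c in coeffs):
        T *= 2                                  # initial threshold 2^n
    significant = []                            # plays the role of the LSP
    stream = []
    for _ in range(n_passes):
        if T == 0:
            break
        # sorting pass: move newly significant coefficients to the "LSP"
        newly = []
        for i, c in enumerate(coeffs):
            if i not in significant:
                sig = abs(c) >= T
                stream.append(int(sig))
                if sig:
                    stream.append(int(c < 0))   # sign bit
                    newly.append(i)
        # refinement pass: next bit of previously significant coefficients
        for i in significant:
            stream.append((abs(coeffs[i]) // T) % 2)
        significant += newly
        T //= 2
    return stream
```

Truncating `stream` anywhere still yields a decodable (coarser) approximation, which is the embedded-coding property described above.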
2.2.2 JPEG2000 Image Coding Standard
Unlike its predecessor JPEG, JPEG2000 is not a DCT-based compression
approach. By using wavelets, JPEG2000 provides increased flexibility both in the
compression of images and in access to the compressed data: any portion of
the image can be extracted without extracting the whole compressed
stream. The quantization of the transform coefficients is adapted to the scale and
subband of the transformed image. Finally, arithmetic coding is applied to
the quantized coefficients.
The compression process starts with DC level shifting of the samples so that
the mean of the image becomes zero. Then the color channels are decorrelated
using an RGB to YCbCr color transformation. After that, the image is partitioned
into tiles: rectangular parts of the image that are coded individually.
This gives the algorithm the ability to extract individual parts
of the image independently of the other parts.
Each tile is transformed into subbands using the Discrete Wavelet
Transform (DWT). Either the 5/3 biorthogonal or the 9/7 biorthogonal [57],[58]
wavelet-scaling bases are used for the DWT. The transform is computed either by
the fast wavelet transform or by the lifting structure. The number of transform
coefficients in a tile is equal to the number of pixels in that tile, but after the
transformation most of the information is concentrated in a few coefficients for most
natural images. Using a quantizer, the coefficients that carry little information can
be suppressed (large quantization steps), while the ones with more information
content stay untouched or are disturbed as little as possible (small quantization
steps). The quantization process is applied to each tile separately.
The final step is creating a bitstream from the quantized coefficients. First, the
DWT subbands are partitioned into code-blocks. Each code-block is
encoded one bit-plane at a time, starting from the most
significant bit-plane. Layers are created from the coded bit-planes, and the
layers are then partitioned into packets.
The decoder of JPEG2000 simply reverses the operations done by the encoder.
The number of encoded and decoded bit-plane levels may change according
to the user's requirements; any non-decoded bits are set to zero. This gives
the algorithm a progressive nature: the image can be reconstructed without
having all the bit-planes.
After inverting the quantization operation, the tiles are transformed by the IDWT
using the appropriate wavelet and scaling bases. The resulting image is inverse
color transformed from YCbCr to RGB if necessary. More detailed information
on JPEG2000 can be found in [59, 15]. Furthermore, an implementation of the
JPEG2000 algorithm is briefly explained in [60].
2.2.3 Adaptive Approach in Wavelet Based Image Coders
The lifting structure used in wavelets gives the opportunity of predicting the high
subband from the low subband, so transmitting only the residuals of the high
subband is enough. The block diagram of the lifting scheme without the update
stage is shown in Figure 2.7.
Figure 2.7: 1D lifting scheme without update stage.
The lifting structure also allows implementing the wavelet transform in a
computationally efficient and separable way. Computational efficiency is achieved
by performing the filtering operation after the downsampling block. The separability
of the lifting wavelet transform comes from the separability of the wavelet and
scaling functions used. Using the lifting structure, an N-dimensional signal can be
transformed along each of its dimensions one by one. This can be a drawback for
exploiting inter-sample correlation, since only the correlations along the transform
direction are taken into account. 2D signals such as images are a good example of
this problem. An image can be defined as X[n,m], where n and m are the indices
of the pixels in the horizontal and vertical directions, respectively. Then
XL = X[n, 2m] and XH = X[n, 2m+1] are the low and high subbands in the
horizontal direction, respectively. The pixels in XH can be predicted from XL by
X̂H[n, 2m + 1] = (X[n, 2m] + X[n, 2m + 2])/2. (2.10)

Thus, the residual of the high subband can be calculated as

XHe[n, 2m + 1] = XH[n, 2m + 1] − X̂H[n, 2m + 1]. (2.11)
The problem here is that the predictions are always made in one direction
(horizontal or vertical). Thus, when a diagonal edge is encountered, it cannot
be predicted efficiently by this scheme. Following the edge direction is a
good idea when making predictions. What the adaptive wavelet transform does
is add flexibility to the prediction structure of the lifting wavelet transform,
so that predictions in diagonal directions can also be made.
The predictor first calculates the differences along the horizontal and diagonal
directions as:

XH[n, 2m + 1] = (X[n, 2m] − X[n, 2m + 2])/2,
XH135[n, 2m + 1] = (X[n − 1, 2m] − X[n + 1, 2m + 2])/2,
XH225[n, 2m + 1] = (X[n + 1, 2m] − X[n − 1, 2m + 2])/2. (2.12)
The minimum magnitude among XH, XH135 and XH225 in Equation 2.12 gives the
direction of the best prediction. The same approach can be used in the decoding
stage, so perfect reconstruction is also possible with the adaptive inverse wavelet
transform.
2.3 Connectivity-Guided Adaptive Wavelet Trans-
form
The image-like representation of a 3D mesh structure can be composed using
the operations in Section 2.1. At first glance, the new data structure is no
different from an ordinary image, so it can be coded using any image coder.
But unlike natural images, there may be no correlation between neighboring
pixels in our image-like mesh representation, since neighboring pixels may
not come from neighboring vertices of the mesh.
Thus, instead of predicting non-zero pixels in our representation from their
neighboring non-zero pixels, we introduce an adaptive wavelet transform
scheme that makes predictions using connectivity information. Ordinary
wavelet transforms do not take the connectivity relationship into account:
side-by-side pixels are predicted from each other and then encoded. Our
connectivity-guided adaptive wavelet transform uses the connectivity information
of the mesh and predicts each pixel from its connected neighbors. In our approach,
we use lazy wavelet filterbanks (Figure 2.5) and add a connectivity-guided adaptive
prediction stage to them (Figure 2.8).
Figure 2.8: Lazy filter bank with the connectivity-guided adaptive prediction stage.
I1[n1, n2] is the first-channel image that stores the perpendicular distance of
each corresponding vertex-pixel pair, and I2[n1, n2] stores the vertex
indices. The lifting wavelet transform is implemented in a separable manner.
Thus, both I1 and I2 are split into polyphase components in the horizontal
direction as

Ia[n1, n2] = [Ia1 | Ia2],
Ia1[n1, n2] = Ia[n1, 2n2],
Ia2[n1, n2] = Ia[n1, 2n2 + 1], a = 1, 2. (2.13)
Thus, I22[n1, n2] = i, i ∈ {1, . . . , v}. Using the connectivity information,
we build a neighbor list nlist(j), j = 1, . . . , v, that holds the indices of
the vertices connected to the vertex with index j. The predictions for the
I12[n1, n2] values are made using nlistvalid, which is defined as
nlistvalid(j) = nlist(j) ∩ {I21[n1, n2]}, (2.14)

i.e., the neighbors of j that appear in the low-phase index image.
The list of vertex indices in the image I22 is list22. For each element k of
list22, a prediction is computed from the I11 image. The valid neighbors of k are
found using Equation 2.14, and the prediction for vertex k is defined as

Ik,pred = (1/m) Σ I11[n1, n2], (2.15)

where the sum runs over the pixels with I21[n1, n2] ∈ nlistvalid(k) and m is the
number of elements of nlistvalid(k). The residual is then

Inew12[n1, n2] = I12[n1, n2] − Ik,pred, (2.16)

where I22[n1, n2] = k.
If no valid neighbors exist for a vertex, no estimation is carried out; otherwise,
a prediction is made for the value of the pixel. The same procedure is
carried out between the low and high subbands of the vertically transformed I11
image. The high-pass part is only polyphase-split using the lazy filters, so we have
four subimages at the end of each level. The inverse of the adaptive transform also
exists, so perfect reconstruction of the images is possible.
2.4 Connectivity-Guided Adaptive Wavelet Trans-
form Based Compression Algorithm
An overview of our coding approach is shown in Figure 2.9. A 3D mesh is
represented by two image-like signals obtained by applying the proposed
mesh-to-image transform of Section 2.1. After transforming the projection image
using the connectivity-guided adaptive wavelet transform, the data is quantized
using a non-uniform
Figure 2.9: The block diagram of the proposed algorithm.
quantizer. After the non-uniform quantizer, the image is ready for SPIHT and
JPEG2000 coding.
As mentioned in Section 2.1, the geometry information of a mesh can be
represented by two image-like signals using the proposed mesh-to-image
transform. The correlations between the projected mesh vertices are exploited
using the connectivity-guided adaptive wavelet transform, which yields a
hierarchical representation of the mesh vertices. The low subbands of the image
contain the more important vertices, since the predictions of the higher subbands
are made from those vertices. Before being sent to the SPIHT or JPEG2000
coder, the image is quantized. The histograms of the images show that most of the
pixel values are concentrated around zero. Thus, an 8-bit non-uniform quantization
whose steps are smaller around zero and larger at the extremes is performed.
The quantized image is then fed into the SPIHT or JPEG2000 coder.
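A non-uniform quantizer with fine steps around zero can be sketched with companding. The thesis does not name a specific companding law, so the mu-law used here is our assumption, chosen only to illustrate the "small steps around zero, large steps at the extremes" behavior:

```python
import numpy as np

def nonuniform_quantize(img, n_bits=8, mu=255.0):
    """8-bit non-uniform quantization sketched with mu-law companding:
    quantization steps are fine near zero and coarse near the peaks."""
    peak = np.abs(img).max()
    if peak == 0:
        return np.zeros_like(img, dtype=int), peak
    x = img / peak                                    # normalize to [-1, 1]
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    levels = 2 ** (n_bits - 1) - 1
    return np.round(compressed * levels).astype(int), peak

def nonuniform_dequantize(q, peak, n_bits=8, mu=255.0):
    """Inverse of nonuniform_quantize: expand and rescale."""
    levels = 2 ** (n_bits - 1) - 1
    compressed = q / levels
    x = np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu
    return x * peak
```

Because the wavelet residuals cluster around zero, the logarithmic compression spends most of the 256 levels where the data actually lives.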
The SPIHT coder codes the image hierarchically: the lower subbands are at the
initial parts of the code-stream and the higher subbands follow them. So
pruning the leading bits of the SPIHT stream causes a higher distortion in the
reconstructed model than pruning the trailing bits. It is therefore possible to first
reconstruct the model using the leading bits and then refine it as new parts of the
stream arrive. The decoding process is explained in Section 2.5.
When the quantized image is fed into the JPEG2000 coder, it further quantizes
the subbands of the tiles to user-specified levels. What makes JPEG2000
progressive is the coding of the tiles: the tiles corresponding to the lower subbands
of the transformed image can be transmitted first, and the mesh can be
reconstructed from them, with zeros padded in place of the other tiles. As
the tiles corresponding to the higher subbands are received, the reconstructed mesh
can be refined. This approach is not implemented in this thesis, so our
JPEG2000 coder is not progressive.
After the bitstream is obtained by the SPIHT encoder, it should be entropy
coded. We used the gzip software [61], an implementation of the Lempel-Ziv
coding algorithm [62], as the entropy coder. For comparison purposes, both the
original vertex list and the SPIHT bitstream are compressed using gzip.
2.4.1 Coding of the Map Image
The map images store the projected vertex indices as their pixel
values (Equation 2.9). The most important requirement in the compression of these
images is that it must be lossless, so no quantization can be applied. The
pixel values have a wide range and are nearly equiprobable; thus, a coding scheme
like Huffman or Lempel-Ziv is not appropriate for this kind of data.
For the compression of these map images, a new algorithm based on the
principles of differential coding is proposed. The basic assumption of the proposed
algorithm is that nearby pixels represent nearby vertices. In the perfect case, all v
vertices of the mesh would be projected. A list of vertex locations on the image-like
representation is created, whose ith entry LCoor(i) is defined as follows:
LCoor(i) = { [n1, n2], if I2[n1, n2] = i;  0, otherwise, (2.17)
where i = 1, . . . , v. The LCoor list is then differentially coded. First, the non-zero
entries of LCoor are found as

q(j) = { i | LCoor(i) ≠ [0, 0] }, j = 1, . . . , G, (2.18)

where G is the number of non-zero entries of the LCoor list and q(j) is the index
of the jth non-zero entry. LCoor is then updated as

LCoor(q(j + 1)) = LCoor(q(j + 1)) − LCoor(q(j)). (2.19)
In this way, a differentially coded version of LCoor is created whose values are
concentrated around zero with a small variance, and which can therefore be
compressed more efficiently.
The LCoor list is then converted to a bitstream and sent to the receiver, which
reverses the procedure: it finds the first non-zero entry and its neighbors, inverts
the predictions of the neighbors, and adds those neighbors to a buffer. Processing
the elements in the buffer one by one reverses the encoding process.
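The differential coding of Equations 2.17 to 2.19 can be sketched as follows. The list layout is an assumption of ours: each entry is a `[n1, n2]` pair, with `None` standing in for the zero entries of vertices that were not projected.

```python
def diff_encode(lcoor):
    """Differential coding of the vertex-location list (cf. Eqs. 2.17-2.19).
    Each non-zero entry is replaced by its difference to the previous
    non-zero entry, so the coded values concentrate around zero."""
    out = [e[:] if e is not None else None for e in lcoor]
    prev = None
    for j, entry in enumerate(lcoor):
        if entry is None:
            continue
        if prev is not None:
            out[j] = [entry[0] - prev[0], entry[1] - prev[1]]
        prev = entry
    return out

def diff_decode(coded):
    """Inverse of diff_encode: accumulate the differences back
    into absolute grid positions."""
    out = [e[:] if e is not None else None for e in coded]
    prev = None
    for j, entry in enumerate(coded):
        if entry is None:
            continue
        if prev is not None:
            out[j] = [entry[0] + prev[0], entry[1] + prev[1]]
        prev = out[j]
    return out
```

Since vertices with nearby indices tend to project to nearby pixels, the differences are small integers, which compress well in the final entropy coding stage.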
2.4.2 Encoding Parameters vs. Mesh Quality
Two issues define the mesh quality: (i) the length of the used bitstream (for
SPIHT) and the quantization level (for JPEG2000), and (ii) the number of wavelet
decomposition levels. Decreasing the length of the bitstream leads to more
compression at the expense of higher distortion. Increasing the number of wavelet
decomposition levels usually leads to higher compression ratios at the expense of
more computational cost.
The distortion level of the reconstructed 3D mesh is measured visually or
using tools like METRO [63]. The Mean Square Error (MSE) and the Hausdorff
distance between the original and the reconstructed object are the most commonly
used error measures in the literature.
How much of the SPIHT stream should be kept, and what the quantization
levels of JPEG2000 should be, are closely related to the detail level
parameter used in the orthogonal projection operation. If the detail level is
low, the percentage of the used bitstream must be increased to reconstruct the
3D mesh without much distortion.
Estimating the number of wavelet decomposition levels is easier than
determining how much of the bitstream should be kept without introducing
visual distortion. It is better to increase the number of scales in the wavelet
decomposition as much as possible, because the data contains mostly
insignificant pixels, and at higher scales the chance of having larger
zero-trees is higher. Larger zero-trees lead to better compression.
However, increasing the decomposition level also increases the computational
cost and thus adversely affects the performance.
2.5 Decoding
The 3D mesh is reconstructed using the SPIHT or JPEG2000 bitstream and
some side information, such as the vertex indices (the second channel) and the
detail level used in the image-like representation. First, the bitstream is decoded
back into the image-like representation; the decoding processes of SPIHT
and JPEG2000 are explained in [49] and [59], respectively. Then, using the
connectivity information of the mesh, the inverse of the connectivity-guided
adaptive wavelet transform is applied to the image.
Finally, using the projected vertex coordinates v̄i and the vertex indices, the
image is back-projected into 3D space. Since the only exact data available
(up to quantization) is the orthogonal component di of each 3D mesh vertex,
the mesh cannot be perfectly reconstructed: the in-plane coordinates of the mesh
vertices are quantized to the grid point locations. The normalized 3D coordinates
vi of the mesh can thus be reconstructed without using any extra data.
From Equation 2.7, it is known that for each grid point the projected vertex
is chosen from a set of vertices; the number of elements of this set can be
zero, one, or more than one. Thus, some of the vertices may coincide when
they are projected onto the projection plane. In the projection operation, we
choose one of the vertices from the set and discard the coinciding ones. Two
methods are used to recover these vertices: one is to use a second plane to send
the data of the coinciding vertices; the second is to estimate the value of the
vertex from its neighbors as in Equation 2.20. The connectivity list is used to
find the neighbors of a lost vertex, so the lost vertices can be predicted
from their connected neighbors by

vi = (1/k) Σ_{j ∈ nlist(i)} vj, (2.20)

where k is the number of elements of nlist(i).
Chapter 3
Results from the Literature and
Image-like Mesh Coding Results
This chapter mainly consists of three parts: Section 3.1 gives the mesh coding
results of the existing approaches, Section 3.2 gives the coding results of our
algorithm, and Section 3.3 compares our approach with the existing
approaches.
3.1 Mesh Coding Results in the Literature
In Table 3.1, the compression rates of the connectivity coders reviewed earlier
are given. The results in Table 3.1 show that valence-based approaches [19], [16]
give the best results [44]. Remeshing, in particular, decreases the bit rate of the
valence-based approaches drastically, since most of the vertices of a regular mesh
have a valence of six.
Since the method proposed in this thesis compresses only the geometry of the
mesh, a connectivity compression algorithm is needed to compress the connectivity
information of the mesh. Among the algorithms in Table 3.1, Edgebreaker [11]
is used as the connectivity coder of the proposed method because it is one of the
most widely used connectivity coding algorithms and has publicly available
source code.

Author                     Name                        Type              Bitrate (bpv)
Deering [3]                Generalized Triangle Mesh   Triangle Strips   ∼ 8-11
Taubin & Rossignac [18]    Topological Surgery         Dual Trees        ∼ 4
Gumhold & Strasser [64]    Cut-Border Machine          Face Based        4.36
Rossignac [11]             Edgebreaker                 Face Based        ∼ 3 (3.55 guaranteed)
Isenburg & Snoeyink [30]   FaceFixer                   Edge Based        ∼ 2.5
Touma & Gotsman [19]       The TG Coder                Valence Based     ∼ 2 (∼ 0 when regular)
Alliez & Desbrun [16]      Adaptive Valence            Valence Based     ∼ 1.89 (∼ 0 when regular)

Table 3.1: Compression results for the single-rate mesh connectivity coders in the literature.
In Table 3.2, the compression rates of existing single-rate geometry coders
are given. The corresponding quantization levels of the mesh vertices are given
in the fourth column of the table. The quantization level is an important concept
when compressing the geometry information of the mesh. Table 3.2 shows that
lowering the number of quantization levels (decreasing the number of bits)
decreases the bit rates. On the other hand, coarser quantization results in more
distortion in the reconstructed model.
In Table 3.3, the compression rates of existing progressive geometry coders
are given. The best result is achieved by Khodakovsky et al., whose algorithm
is based on the wavelet transform of remeshed models. As in connectivity coding,
models that are remeshed to semi-regular or regular structures have lower data
rates: due to the regular structure of the remeshed models, more accurate
predictions can be made on those models, so the predictions of their vertices
leave smaller residues.

Author                    Name                      Bitrate (bpv)   Quantization Levels
Touma & Gotsman [19]      Parallelogram             ∼ 20            12 bits
Touma & Gotsman [19]      Parallelogram             ∼ 26            14 bits
Cohen-Or [33]             Multiway                  ∼ 18            12 bits
Cohen-Or [33]             Multiway                  ∼ 23            14 bits
Isenburg & Alliez [65]    Polygonal Parallelogram   ∼ 16            12 bits
Isenburg & Alliez [65]    Polygonal Parallelogram   ∼ 20            14 bits

Table 3.2: Compression results for the single-rate mesh geometry coders in the literature.
3.2 Mesh Coding Results of the Connectivity-Guided Adaptive Wavelet Transform Algorithm
Image coding results are significantly affected by the choice of wavelet basis.
Therefore, the wavelet basis used to decompose the image-like meshes should
be investigated. Among several wavelet bases, such as the lazy wavelet,
Daubechies-4, and biorthogonal-4.4, the lazy wavelet basis gave the best results.
The reason is that most of the neighboring pixels in the image-like representation
are not neighbors of each other in the original mesh representation; because the
image-like meshes contain isolated points, a wavelet basis with a small support
(the lazy wavelet) gives better compression rates than those with larger support
regions.
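The lazy wavelet mentioned above simply splits a signal into its even- and odd-indexed polyphase components without any filtering, which is why isolated pixels do not contaminate their neighbors. A one-level, one-dimensional sketch (illustrative only):

```python
def lazy_wavelet_split(signal):
    """One level of the lazy wavelet transform: separate the samples into
    even-indexed (approximation) and odd-indexed (detail) components.
    No filtering is applied, so isolated samples never leak into neighbors."""
    return signal[0::2], signal[1::2]

def lazy_wavelet_merge(even, odd):
    """Perfectly reconstruct the signal by interleaving the two components."""
    signal = [0] * (len(even) + len(odd))
    signal[0::2] = even
    signal[1::2] = odd
    return signal
```

For a 2D image-like mesh the same split would be applied along rows and columns; the transform is trivially invertible, which is consistent with its use before an embedded coder such as SPIHT.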
Tables 3.4 and 3.5 present the data sizes and distortions of the sandal and
cow meshes compressed with SPIHT using wavelet transforms with various bases.
Some of the corresponding reconstructed meshes are shown in Figures 3.1 and 3.2.

Author                       Name                   Bitrate (bpv)
Popovic & Hoppe [39]         PSC                    over 35
Hoppe [21]                   PM                     about 35
Taubin [18]                  PFS                    slightly below 30
Pajarola & Rossignac [66]    CPM                    about 22
Alliez & Desbrun [41]        VDC                    14-20

Author                       Name                   Reconstruction Quality
Khodakovsky et al. [43]      PGC                    12 dB better than CPM at the same bitrate
Khodakovsky & Guskov [44]    NMC                    2-5 dB better than PGC at the same bitrate
Moran & Garcia [17]          MG                     between PGC and NMC
Gu et al. [5]                Geometry Images (GI)   3 dB worse than PGC at the same bitrate
Praun & Hoppe [67]           Spherical GI           better than PGC and GI, worse than NMC

Table 3.3: Compression results for the progressive mesh coders in the literature (geometry + connectivity).
The Hausdorff distance metric is used for measuring the distortion between
the original and the reconstructed models. The Hausdorff distance between two
point sets A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_m} is

d(A, B) = max{ max_{a∈A} min_{b∈B} |a − b|, max_{b∈B} min_{a∈A} |a − b| },    (3.1)

where |a − b| is the Euclidean distance between the points a and b. The Hausdorff
distance is thus the maximum distance from a set to the nearest element of the
other set [68].
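Equation 3.1 can be computed directly for small point sets; the following Python sketch is a naive O(nm) implementation (measurement tools such as MeshTool presumably use accelerated variants):

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (Equation 3.1):
    the maximum distance from a point of one set to the nearest point of
    the other set."""
    def directed(X, Y):
        # max over x in X of the distance to its nearest neighbor in Y
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```

Note that the two directed distances generally differ, which is why both are taken and the maximum is reported.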
The factors affecting the Hausdorff distance between the original and the
reconstructed model are the detail level and the used-bitstream percentage. The
detail level parameter determines the size of the projection image and the closeness
of the grid samples on it: as the detail level increases, the projection image
grows and the grid samples become closer in distance. The used-bitstream
parameter indicates how much of the encoded bitstream is used in the reconstruction
of the original model. Both the detail level and the used-bitstream parameters
are linearly proportional to the size of the data used for reconstruction.

Filter          Bitstream Used (%)   Detail Level   Size (KB)   Org.-Quan. Hausdorff Dist.   Org.-Recons. Hausdorff Dist.
Lazy            40                   3.0            6.39        0.022813                     0.022830
Lazy            60                   3.0            7.01        0.022813                     0.0228224
Lazy            80                   3.0            7.71        0.022813                     0.022493
Lazy            60                   4.5            7.84        0.021336                     0.021347
Haar            40                   3.0            8.51        0.022813                     0.023225
Haar            60                   3.0            10.2        0.022813                     0.022827
Haar            80                   3.0            12.1        0.022813                     0.022825
Daubechies-10   60                   3.0            13.0        0.022813                     0.023813

Table 3.4: Compression results for the Sandal model.

Filter             Bitstream Used (%)   Detail Level   Size (KB)   Org.-Recons. Hausdorff Dist.

Without Prediction
Lazy               40                   3.0            9.51        0.014547
Lazy               60                   3.0            10.4        0.014219
Haar               40                   3.0            11.5        0.016085
Haar               60                   3.0            13.8        0.014102
Haar               80                   3.0            16.0        0.014011
Haar               100                  3.0            18.0        0.014.28
Daubechies-4       40                   3.0            12.0        0.018778
Daubechies-4       60                   3.0            15.1        0.014661
Daubechies-4       80                   3.0            18.2        0.014611
Daubechies-10      40                   3.0            12.1        0.014194
Daubechies-10      60                   3.0            15.2        0.045020
Biorthogonal-4.4   60                   4.0            16.0        0.014549
Biorthogonal-4.4   80                   3.0            18.3        0.023555

With Adaptive Prediction
Lazy               20                   3.0            9.64        0.025806
Lazy               30                   3.0            10.6        0.013936
Lazy               60                   3.0            12.6        0.014053
Lazy               80                   3.0            13.6        0.014059
Lazy               40                   5.0            19.0        0.007805
Lazy               60                   5.0            21.2        0.007805
Lazy               80                   5.0            23.4        0.007805
Lazy               60                   7.0            22.1        0.007240
Lazy               80                   7.0            23.9        0.007240

Table 3.5: Comparative compression results for the Cow model compressed without prediction and with adaptive prediction.

Figure 3.1: Reconstructed sandal meshes using parameters (a) lazy wavelet, 60% of bitstream, detail level=3; (b) lazy wavelet, 60% of bitstream, detail level=4.5; (c) Haar wavelet, 60% of bitstream, detail level=3; (d) Daubechies-10, 60% of bitstream, detail level=3. (Sandal model data is courtesy of Viewpoint Data Laboratories)
Using a standard wavelet filter basis, including the lazy filter, the interpixel
correlations cannot be exploited. Since the neighborhood between mesh vertices
is determined by the connectivity of the mesh, a wavelet transform that can
adapt itself according to this connectivity is needed. This idea leads to the
implementation of the connectivity-guided adaptive wavelet transform. In this
way, pixels of the image-like representation are predicted from each other and
small prediction errors are obtained. Comparative results of SPIHT compression
of the cow and lamp models, using non-adaptive and adaptive wavelet
transforms, are given in Tables 3.5 and 3.6.

Figure 3.2: Meshes reconstructed with a detail level of 3 and 60% of the bitstream. (a) Lazy, (b) Haar, (c) Daubechies-4, (d) Biorthogonal-4.4, and (e) Daubechies-10 wavelet bases are used.
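The idea of predicting pixels from mesh-connected neighbors can be sketched as follows. This is an illustrative simplification of connectivity-guided prediction, not the exact transform used in the thesis: each pixel is predicted from the average of the previously visited pixels whose vertices share an edge with it, and only the residual would be passed to the wavelet coder.

```python
import numpy as np

def adaptive_predict(depth, index, edges):
    """Connectivity-guided prediction sketch.

    `depth`  : the image-like representation (pixel = orthogonal component).
    `index`  : per-pixel vertex index (-1 for empty pixels).
    `edges`  : set of frozensets {i, j} of connected vertex indices.
    Pixels are visited in raster order; each is predicted from the mean of
    already-visited, mesh-connected vertices, and the residual is stored.
    """
    residual = depth.copy()
    seen = {}                          # vertex index -> depth value
    for u, v in np.ndindex(index.shape):
        i = index[u, v]
        if i < 0:
            continue                   # empty pixel, nothing to predict
        preds = [val for j, val in seen.items() if frozenset((i, j)) in edges]
        if preds:
            residual[u, v] = depth[u, v] - sum(preds) / len(preds)
        seen[i] = depth[u, v]
    return residual
```

Because connected vertices have similar depths, the residuals are small, which is exactly what an embedded coder such as SPIHT compresses well.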
Tables 3.5 and 3.6 show that the adaptive wavelet structure produces better
results than the non-adaptive lazy filterbank: for the same data size, the adaptive
structure causes less distortion in the reconstructed mesh, which means better
compression for the same distortion level. However, the computational power
needed for the adaptive structure is higher, since multiple searches in the
connectivity data must be performed.
Bitstream Used (%)   Detail Level   Size Non-adaptive (KB)   Size Adaptive (KB)   Hausdorff Dist. (Non-adaptive)   Hausdorff Dist. (Adaptive)
60                   3.0            9.05                     10                   0.069620                         0.023367
80                   3.0            9.64                     10.6                 0.050859                         0.019023
40                   5.0            14.7                     16.5                 0.027902                         0.018638
60                   5.0            16.4                     18.6                 0.029870                         0.018551
80                   5.0            17.8                     20.5                 0.025991                         0.018425
60                   7.0            17.9                     19.8                 0.033990                         0.011436
80                   7.0            19.1                     21.6                 0.041133                         0.012037

Table 3.6: Compression results for the Lamp model using the lazy wavelet filterbank.
Figure 3.14 gives meshes reconstructed with and without adaptive prediction
for a qualitative comparison. The quality difference of the reconstructed meshes
is especially noticeable at sharp features, such as the paws of the dragon mesh
and the base of the lamp mesh.
After embedding the connectivity-guided adaptivity in the algorithm,
quantization was added. Quantization improved the rate-distortion values
because (1) SPIHT and JPEG2000 adapt themselves better to the quantized
data, and (2) with irregular quantization the residual part of the data is
better represented. Figure 3.15 gives a qualitative comparison between the
original and the reconstructed meshes compressed by SPIHT (a-b)
and JPEG2000 (c). The meshes compressed with SPIHT are superior to the
JPEG2000-compressed meshes. This can also be seen in Tables 3.7 and 3.8,
which give rate-distortion values for the compressed Homer Simpson and
9 Handle Torus mesh models.
Captures from the MeshTool [69] program, shown in Figures 3.3-3.13 and
3.21-3.29, visualize the distortion in the reconstructed models. The MeshTool
program also uses the Hausdorff distance as the distortion metric. The images
labeled (c) show the reconstructed model, and the images labeled (b) show
the distortion levels as color levels on the original model. The graphs in those
figures show the histogram of the error on the original model. Blue is the lowest
distortion level; as the distortion increases, the color index goes from blue to
green and then to red, the highest distortion level.

SPIHT       Haus. Dist.   JPEG2000    Haus. Dist.
Size (KB)   (SPIHT)       Size (KB)   (JPEG2000)
4.07        0.060922      6.58        0.076215
4.37        0.033648      6.27        0.076107
4.67        0.019715      9.64        0.076488
4.96        0.013422      9.28        0.076374
5.55        0.015236      12.7        0.075922
6.76        0.005503      12.2        0.075699
7.92        0.005216      11.4        0.075680

Table 3.7: Compression results for the Homer Simpson model using SPIHT and JPEG2000. Hausdorff distances are measured between the original and reconstructed meshes.

SPIHT       Haus. Dist.   JPEG2000    Haus. Dist.
Size (KB)   (SPIHT)       Size (KB)   (JPEG2000)
7.84        0.009638      13.6        0.036387
8.18        0.010951      14.0        0.036183
8.96        0.010699      14.1        0.036468
11.9        0.008904      16.7        0.036266
12.7        0.007685      16.9        0.036459

Table 3.8: Compression results for the 9 Handle Torus mesh model using SPIHT and JPEG2000. Hausdorff distances are measured between the original and reconstructed meshes.
Figures 3.7-3.13 show the Homer model and Figures 3.25-3.29 show the
9-Torus mesh model reconstructed from different lengths of the SPIHT code
stream. As the length of the bitstream gets shorter, the distortion increases;
in particular, the sharp features of the reconstructed model are distorted as
the bitstream shortens.
Figures 3.3-3.6 show the Homer model and Figures 3.21-3.24 show the
9-Torus model reconstructed with different subband quantization levels. As
the higher subbands are quantized more coarsely, the bit rate decreases but
the distortion of the reconstructed model increases. The lower subbands of the
image carry more information than the higher subbands because they are used
for the prediction of the higher subbands. Therefore, fine quantization is applied
to the low subbands and coarse quantization to the higher subbands.
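The fine-to-coarse subband quantization strategy described above can be sketched as follows; the step sizes and the doubling rule are illustrative assumptions, not the thesis's actual quantizer:

```python
import numpy as np

def quantize_subbands(subbands, base_step=0.25, growth=2.0):
    """Irregular quantization sketch: the low subband, which drives the
    prediction of the higher subbands, is quantized finely; each higher
    subband gets a coarser step (here the step doubles per level)."""
    out = []
    for level, band in enumerate(subbands):
        step = base_step * growth ** level        # coarser at higher subbands
        out.append(np.round(np.asarray(band) / step) * step)
    return out
```

With this scheme, small high-subband coefficients collapse to zero (lower bit rate) while the low subband keeps its precision for prediction.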
The sandal and cow models were obtained from Viewpoint Datalabs
Inc. and Matthias Muller (ETH Zurich), respectively. The original sandal
model has a compressed data size of 31.7 KB and the cow model has a data
size of 27.2 KB. The 9 Handle Torus model was obtained from
“http://www.ics.uci.edu/∼pablo/files/data/genus-non-0/9HandleTorus.ply” and
is composed of 9392 vertices with a 165 KB compressed data size. The Homer
Simpson model was obtained from the “INRIA Gamma Team Research Database
Website Collections” and is composed of 4930 vertices with a 98 KB data size.
Figure 3.3: Homer Simpson model compressed with JPEG2000 to 6.58 KB (10.7 bpv).
3.3 Comparisons
In single-rate compression schemes, the mesh data is compressed before
transmission and then all the data is sent. In progressive compression schemes,
the mesh data is compressed during transmission and sent progressively: first,
low-resolution data is decoded, and then the decoded model is updated to a
higher resolution using newly arriving data. Our approach combines the prior-compression
property of single-rate compression with the updatable decoded
model of progressive compression.
Figure 3.4: Homer Simpson model compressed with JPEG2000 to 6.27 KB (10.1 bpv).
The bitstream created by SPIHT coding has a hierarchical structure. Thus,
after receiving the first l coefficients of the bitstream, where l is smaller than
the total length of the bitstream L, the decoder reconstructs a lower-resolution
representation of the mesh. Once a mesh is formed using the first l coefficients,
the decoder can update it using the remaining bitstream elements in a
progressive manner. The mesh can be perfectly reconstructed once the whole
bitstream is received. This property makes the proposed SPIHT approach a
progressive mesh compression technique.
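The progressive property of an embedded bitstream can be illustrated with a toy bit-plane coder in the spirit of SPIHT (a didactic sketch, not the actual SPIHT algorithm): decoding any prefix of the stream yields a coarse reconstruction that later bits refine.

```python
def encode_bitplanes(values, nbits=8):
    """Toy embedded code: emit all values' bits one bit plane at a time,
    most significant plane first."""
    return [(v >> b) & 1 for b in range(nbits - 1, -1, -1) for v in values]

def decode_prefix(bits, n_values, nbits=8):
    """Decode from any prefix of the stream: planes not yet received are
    treated as zero, so a short prefix yields a coarse reconstruction that
    later bits progressively refine."""
    vals = [0] * n_values
    for pos, bit in enumerate(bits):
        plane, idx = divmod(pos, n_values)
        vals[idx] |= bit << (nbits - 1 - plane)
    return vals
```

Decoding the full stream recovers the values exactly; truncating it simply drops the least significant refinements, mirroring how a prefix of the SPIHT bitstream yields a coarser mesh.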
Figure 3.5: Homer Simpson model compressed with JPEG2000 to 9.28 KB (15 bpv).
Most progressive mesh compression algorithms use edge collapses at the
encoder side and vertex splits at the decoder side. A typical low-resolution
version of a given mesh has fewer vertices and larger triangles than the original
mesh [48]. In the proposed algorithm, one cannot retrieve such a lower-resolution
mesh structure from the bitstream, because the SPIHT encoder compresses the
wavelet-domain data by taking advantage of the multiscale correlation between
scales. In this respect, the proposed compression technique is similar to single-rate
coding schemes: all the vertices can be recovered from a part of the bitstream,
and the degradation in quality comes from the distortion due to the inexact
positioning of the vertices. These features of the proposed algorithm are
demonstrated in Figure 3.16 for various resolution degradation levels.

Figure 3.6: Homer Simpson model compressed with JPEG2000 to 12.7 KB (20.6 bpv).
The difference between two resolutions in progressive mesh compression is
represented by a sequence of edge collapses or vertex splits [66]. In the proposed
SPIHT algorithm, the difference between two resolutions is represented by the
parts of the bitstream that fine-tune the positions of the mesh vertices.
On the other hand, the JPEG2000-based algorithm is not a progressive
encoder as implemented here. However, different resolutions of the mesh model
can be reconstructed by using different quantization levels in JPEG2000: in the
wavelet subband decomposition, the high bands are suppressed by coarse
quantization, so that lower-resolution mesh models are established. Using the
scheme explained in Subsection 2.4, this approach can be converted into a
progressive mesh coding algorithm.

Figure 3.7: Homer Simpson model compressed with SPIHT to 4.07 KB (6.6 bpv).
Another image-like coding technique for mesh models is presented in [35],
where the images are created by parameterization of the mesh. Parameterizing
a mesh is computationally very hard; moreover, before parameterization the
mesh model must be cut and opened, which is also difficult for complex models.
In contrast to parameterization, our projection operation is easy to implement
and computationally cheap, and it does not require the mesh to be cut and
opened. The drawback of the projection operation is that it does not guarantee
that all vertices are projected onto the image-like representation. However,
using connectivity-based interpolation, the unprojected vertices can be
interpolated from the known vertex locations.

Figure 3.8: Homer Simpson model compressed with SPIHT to 4.37 KB (7.1 bpv).
Other wavelet-based mesh coders, such as PGC [44], also exist in the
literature. PGC is one of the most successful mesh coders. Figure 3.17 shows
that using only 15% of a 37.7 KB stream (b) yields a visually very good result,
and the quality of the model improves as the percentage of the used stream
increases (Figure 3.17 (a)-(f)). The problem with PGC, however, is that it
works only on semi-regular or regular meshes. Thus, before the wavelet
transform can be applied directly to a mesh, the model must be remeshed into
a semi-regular or regular structure. Remeshing is also a computationally hard
task, especially for large models. Our scheme, on the other hand, can be applied
to any mesh regardless of its regularity.

Figure 3.9: Homer Simpson model compressed with SPIHT to 4.67 KB (7.6 bpv).
In Figures 3.18, 3.19, and 3.20, a visual comparison between the MPEG mesh
coder and the proposed SPIHT-based mesh coder is given. As shown in Table 3.9,
the proposed algorithm gives results comparable to those of the MPEG mesh
coder. When the same mean distance between the original and reconstructed
models is taken into account, the data size of the proposed coder is superior to
that of the MPEG coder. The error in the high-pass parts of the models is due to
the vertices lost in the projection operation. If a better projection operator can
be defined, those errors can be corrected and much better results can be obtained.

Figure 3.10: Homer Simpson model compressed with SPIHT to 4.96 KB (8 bpv).
Model          Compression   Data        Max Distance   Mean Distance
               Algorithm     Size (KB)   (Hausdorff)
Homer          MPEG          41.8        0.002645       0.00066
Homer          SPIHT         9.41        0.003704       0.00060
Homer          SPIHT         7.92        0.005216       0.00093
Inek           MPEG          26.1        0.001780       0.00068
Inek           SPIHT         7.25        0.005631       0.00041
Lamp           MPEG          36          0.43           0.1
Lamp           SPIHT         4.77        0.01468        0.00170
9 Han. Torus   MPEG          82.8        0.001563       0.0005976
9 Han. Torus   SPIHT         12.7        0.009797       0.000927
Sandal         MPEG          22.7        0.001904       0.000743
Sandal         SPIHT         5.91        0.007705       0.000273
Sandal         SPIHT         4.2         0.020076       0.000788
Dance          MPEG          55.4        0.002007       0.000673
Dance          SPIHT         17.3        0.003393       0.000326
Dance          SPIHT         12.1        0.009140       0.00106
Dragon         MPEG          43.1        0.001473       0.000557
Dragon         SPIHT         7.18        0.05672        0.00192

Table 3.9: Comparative results for the Homer, 9 Handle Torus, Sandal, Dragon, and Dance mesh models compressed using the MPEG and SPIHT mesh coders. Hausdorff distances are measured between the original and reconstructed meshes.
Figure 3.11: Homer Simpson model compressed with SPIHT to 5.55 KB (9 bpv).

Figure 3.12: Homer Simpson model compressed with SPIHT to 6.76 KB (11 bpv).

Figure 3.13: Homer Simpson model compressed with SPIHT to 7.92 KB (12.8 bpv).

Figure 3.14: Qualitative comparison of the meshes reconstructed without prediction (a and c) and with adaptive prediction (b and d). The lazy wavelet basis is used. The meshes are reconstructed using 60% of the bitstream with detail level=5 for the Lamp model and 60% of the bitstream with detail level=5 for the Dragon model. (Lamp and Dragon models are courtesy of Viewpoint Data Laboratories)

Figure 3.15: Distortion measure between the original (images at the left side of the figures) and reconstructed Homer Simpson mesh models using the MeshTool [69] software. (a) SPIHT at 6.5 bpv; (b) SPIHT at 11 bpv; (c) JPEG2000 at 10.5 bpv. The grayscale colors on the original image show the distortion level of the reconstructed model; darker colors mean more distortion.

Figure 3.16: Comparison of our reconstruction method with Garland's simplification algorithm [48]: (a) original mesh; (b) mesh simplified using [48] (the simplified mesh contains 25% of the faces of the original mesh); (c) mesh reconstructed by our algorithm using 60% of the bitstream.

Figure 3.17: (a) Base mesh of the Bunny model composed by the PGC algorithm (230 faces); (b) model reconstructed from 5% of the compressed stream (69,967 faces); (c) model reconstructed from 15% of the compressed stream (84,889 faces); (d) model reconstructed from 50% of the compressed stream (117,880 faces); (e) model reconstructed from 5% of the compressed stream (209,220 faces); (f) original Bunny mesh model (235,520 faces). The original model has a size of 6 MB and the compressed full stream has a size of 37.7 KB.

Figure 3.18: Homer and 9 Handle Torus models compressed using the MPEG mesh coder. The compressed data sizes are 41.8 KB and 82.8 KB, respectively. Figures on the left side show the error of the reconstructed model with respect to the original one; reconstructed models are shown on the right side.

Figure 3.19: The error between the original dancing human model and the reconstructed dancing human models compressed using SPIHT at 13.7 bpv (a) and 9.7 bpv (c), respectively; (b) and (d) show the reconstructed models. (e) The error between the original dancing human model and the reconstructed model compressed using MPEG at 63 bpv; (f) the reconstructed model.

Figure 3.20: (a) Dragon (5213 vertices) and (c) Sandal (2636 vertices) models compressed using the MPEG mesh coder; (b) Dragon and (d) Sandal models compressed using the proposed SPIHT coder. The compressed data sizes are 43.1 KB and 10.4 KB for the Dragon model, and 22.7 KB and 2.77 KB for the Sandal model, respectively.

Figure 3.21: 9 Handle Torus model compressed with JPEG2000 to 14.6 KB (12.4 bpv).

Figure 3.22: 9 Handle Torus model compressed with JPEG2000 to 13.6 KB (11.6 bpv).

Figure 3.23: 9 Handle Torus model compressed with JPEG2000 to 14 KB (11.9 bpv).

Figure 3.24: 9 Handle Torus model compressed with JPEG2000 to 16.7 KB (14.2 bpv).

Figure 3.25: 9 Handle Torus model compressed with SPIHT to 7.84 KB (6.7 bpv).

Figure 3.26: 9 Handle Torus model compressed with SPIHT to 8.18 KB (7 bpv).

Figure 3.27: 9 Handle Torus model compressed with SPIHT to 8.96 KB (7.63 bpv).

Figure 3.28: 9 Handle Torus model compressed with SPIHT to 11.9 KB (10.1 bpv).

Figure 3.29: 9 Handle Torus model compressed with SPIHT to 12.7 KB (10.8 bpv).
Chapter 4
Conclusions and Future Work
In this thesis, a new mesh compression framework that uses connectivity-guided
adaptive wavelet transform based image coders is introduced. Two newly defined
concepts are the projection operation and the connectivity-based wavelet
transform. It is shown that a mesh can be compressed using ready-made
wavelet-based image compression tools, and that both single-rate and progressive
encoding can be achieved with these tools.
The results in this thesis show that image processing tools can be used on
meshes without any parameterization [5] of the mesh or manipulation of the
image processing tools themselves [7]. Furthermore, the results given in
Chapter 3 show that the newly defined connectivity-guided adaptive wavelet
transform increases the encoding efficiency compared with the ordinary
wavelet transform.
The projection operation defined here is simple to implement and needs less
computation than the parameterization introduced in [5]. The parameterization
approach requires the mesh to be cut and opened so that it is homeomorphic to
a disc; then, by solving several linear equations, the mesh is transformed into an
image. In contrast, the projection operation requires solving only two
computationally simple linear equations: one for determining the pixel position
of a mesh vertex on the image and one for finding the values of the projected
pixels.
Although the introduced approach is simple, it has some drawbacks. Since
3D mesh models are generally not homeomorphic to a disk, most of the time it
is not possible to find a correspondence between each vertex of the mesh and an
image pixel, so some of the vertices get lost during the projection operation.
To handle these situations, a neighborhood-based interpolation scheme is
defined, which predicts the values of the lost vertices from their projected
neighbors.
However, it is observed that if many vertices are lost, this algorithm does
not work well; this can especially be the case for complex models. For such
models, therefore, more than one projection of the mesh should be taken. This
increases the bit rates but decreases the distortion level of the reconstructed
mesh. A way to find the best projection, onto which the maximum number of
mesh vertices is projected, must be found to improve the distortion rates.
Using the projection operation, an image-like representation of the 3D mesh
is created. This new image-like structure can be encoded using any image coding
tool. It is shown here that using the wavelet transform gives the opportunity of
creating a progressive representation of the mesh. Since the correlation between
the pixels in the image-like representation is different from that of ordinary
images, an algorithm like the defined connectivity-guided adaptive wavelet
transform is needed to better exploit the correlations.
Comparing Tables 3.7 and 3.8 with the results in [1], it can be seen that, for
the same distortion level, the adaptive scheme has lower bit rates than the
non-adaptive scheme. Figure 3.15 also shows visually that, for nearly the same
rate, the SPIHT coder has lower distortion than the JPEG2000 coder. Thus,
the adaptive approach is superior to the non-adaptive approach. This is due to
the better exploitation of the correlation between connected vertex positions.
The connectivity-guided adaptive wavelet transform is based on the
prediction of grid points from their neighbors. For this reason, both the SPIHT
and JPEG2000 encoders provide better results when the values of the signal
samples are more highly correlated with each other. Hence, a mesh-to-image
transform providing high correlation between the grid values will lead to higher
compression ratios.
The best compression algorithms in the literature use remeshing as a
preliminary step, since they are not applicable to irregular meshes. Much smaller
errors remain when making predictions on the vertices of such models; thus,
their compression bit rates are very low compared with those of algorithms that
compress irregular meshes. However, they cannot reconstruct the original mesh,
since the model has been remeshed. Wavelet-based compression algorithms in
the literature, such as PGC [44], are likewise only applicable to remeshed models.
The advantage of our algorithm is that it is applicable to any mesh structure:
irregular, semi-regular, or regular.
Future work will include finding the best projection plane, so that the
maximum number of vertices is projected onto the image-like representation and
more correlation between neighboring pixels arises. The adaptive approach can
also be applied to the image-like representation using other wavelet bases. It is
known that, due to their large support, other wavelet bases do not give good
results in a non-adaptive setting. However, adaptivity may change this situation,
since it redefines the neighborhood concept in every subband of the image-like
representation.
Future work will also include the compression of dynamic meshes using video
compression methods. In dynamic meshes, there exists an inter-vertex
correlation between the vertices of consecutive mesh frames. Instead of
compressing each mesh frame separately, groups of mesh frames should be coded
together, so that the inter-vertex correlation between mesh frames can be
exploited. Video compression algorithms are suitable for this purpose.
Bibliography
[1] K. Kose, A. E. Cetin, U. Gudukbay, L. Onural, “Nonrectangu-
lar wavelets for multiresolution mesh analysis and compression”,
Proc. of SPIE Defense and Security Symposium, Independent
Component Analyses, Wavelets, Unsupervised Smart Sensors,
and Neural Networks IV, Vol. 6247, pp. 19-30 2006.
[2] M. Lounsbery, T. D. DeRose, J. Warren, “Multiresolution Anal-
ysis for Surfaces of Arbitrary Topological Type”, ACM Transac-
tions on Graphics, Vol. 16-1, pp. 34-73, 1997.
[3] M. Deering, “Geometry Compression”, Proc. of ACM SIG-
GRAPH, pp. 13-20, 1995.
[4] C. E. Shannon, “A mathematical theory of communication”, Bell
Syst. Tech. Journal, Vol. 27, pp. 379-423, 623-656, 1948.
[5] X. Gu, S. Gortler, H. Hoppe, “Geometry Images”, Proc. of ACM
SIGGRAPH, pp. 355-361, 2002.
[6] A. Sheffer, E. Praun, K. Rose, Mesh Parameterization Methods
and their Applications, Foundations and Trends in Computer
Graphics and Vision, Publishers Inc. 2006.
[7] I. Guskov, W. Sweldens and P. Schroder, “Multiresolution signal
processing for meshes”, Proc. of ACM SIGGRAPH, pp. 325-334,
1999.
[8] J. Peng, C.-S. Kim, and C.-C. J. Kuo, “Technologies for 3D
triangular mesh compression: a survey”, Journal of Visual
Communication and Image Representation, Vol. 16, No. 6, pp. 688-733,
2005.
[9] S. Gumhold, Mesh Compression, PhD Dissertation, 2000.
[10] E. W. Weisstein, “Homeomorphic”, from MathWorld - A Wolfram
Web Resource, http://mathworld.wolfram.com/Homeomorphic.html
[11] J. Rossignac, “Edgebreaker: connectivity compression for tri-
angle meshes”, IEEE Trans. on Visualization and Computer
Graphics, Vol. 5, No. 1, pp. 47-61, 1999.
[12] M. Mantyla, “An Introduction to Solid Modeling”, Computer
Science Press 1988.
[13] B. G. Baumgart, “A Polyhedron representation for computer
vision”, Proc. of the National Computer Conference, pp. 589-596,
1975.
[14] K. Weiler, “Edge-based data structures for solid modeling in
curved-surface environments”, IEEE Computer Graphics and
Applications Vol. 5, No. 1, pp. 21-40, 1985.
[15] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Prentice
Hall, 2002.
[16] P. Alliez, M. Desbrun, “Valence-driven connectivity encoding of
3D meshes”, Computer Graphics Forum, Vol. 20, pp. 480-489,
2001.
[17] F. Moran and N. Garcia, “Comparison of wavelet-based three-
dimensional model coding techniques”, IEEE Transactions on
Circuits and Systems for Video Technology, Vol. 14, No. 7,
pp. 937-949, 2004.
[18] G. Taubin, J. Rossignac, “Geometry compression through
topological surgery”, ACM Transactions on Graphics, Vol. 17, No. 2,
pp. 84-115, 1998.
[19] C. Touma, C. Gotsman, “Triangle mesh compression”, Proc.
Graphics Interface, pp. 26-34, 1998.
[20] Z. Karni and C. Gotsman, “Spectral compression of mesh geom-
etry”, Proc. of ACM SIGGRAPH, pp. 279-286, 2000.
[21] H. Hoppe, “Progressive meshes”, Proc. of ACM SIGGRAPH,
pp. 99-108, 1996.
[22] E. Catmull and J. Clark, “Recursively generated B-spline sur-
faces on arbitrary topological meshes”, Computer Aided Design,
Vol. 10, pp. 350-355, 1978.
[23] I. Guskov, K. Vidimce, W. Sweldens, P. Schroder, “Normal meshes”, Proc. of ACM SIGGRAPH, 2000.
[24] R. Ansari and C. Guillemot, “Exact reconstruction filter banks using diamond FIR filters”, Proc. of Bilcon, pp. 1412-1424, 1990.
[25] L. Ibarria, J. Rossignac, “Dynapack: space-time compression of the 3D animations of triangular meshes with fixed connectivity”, Technical Report TR-No. 03-08, 2003.
[26] J. Zhang, C. B. Owen, “Octree-based Animated Geometry Compression”, Proc. of Data Compression Conference, pp. 508-517, 2004.
[27] K. Muller, A. Smolic, M. Kautzner, P. Eisert, T. Wiegand, “Predictive compression of dynamic 3D meshes”, Proc. of International Conference on Image Processing, pp. 621-624, 2005.
[28] J. Zhang, C. B. Owen, “Hybrid Coding for Animated Polygonal Meshes: Combining Delta and Octree”, International Conference on Information Technology: Coding and Computing, Vol. 1, pp. 68-73, 2005.
[29] J. Rossignac, A. Szymczak, “Wrap and Zip: Linear decoding of planar triangle graphs”, IEEE Trans. on Visualization and Computer Graphics, Vol. 5, No. 1, pp. 47-61, 1999.
[30] M. Isenburg, J. Snoeyink, “Compressing the property mapping
of polygon meshes”, Proc. of Pacific Graphics, pp. 4-11, October
2001.
[31] E. Lee and H. Ko, “Vertex data compression for triangular
meshes”, In Proc. of Pacific Graphics, pp. 225-234, 2000.
[32] O. Sorkine, D. Cohen-Or, and S. Toledo, “High-pass quantization for mesh encoding”, Proc. of Eurographics Symposium on Geometry Processing, pp. 42-51, 2003.
[33] R. Cohen, D. Cohen-Or, and T. Ironi, “Multi-way Geometry Encoding”, Technical Report, 2002.
[34] C. Loop, Smooth Subdivision Surfaces Based on Triangles, Master's Thesis, University of Utah, 1987.
[35] J. Ho, K.-C. Lee, and D. Kriegman, “Compressing Large Polygonal Models”, Proc. of IEEE Visualization, pp. 357-362, 2001.
[36] M. Isenburg, S. Gumhold, “Out-of-Core Compression for Gigan-
tic Polygon Meshes”, Proc. of ACM SIGGRAPH, pp. 935-942,
2003.
[37] P. Alliez, C. Gotsman, “Recent advances in compression of 3D meshes”, Proc. of Symposium on Multiresolution in Geometric Modeling, 2003.
[38] A. Szymczak, J. Rossignac, D. King, “Piecewise Regular Meshes: Construction and Compression”, Graphical Models (Special Issue on Processing of Large Polygonal Meshes), 2003.
[39] J. Popovic, H. Hoppe, “Progressive simplicial complexes”, Computer Graphics (Annual Conference Series), Vol. 31, pp. 217-224, 1997.
[40] G. Taubin, A. Gueziec, W. Horn, F. Lazarus, “Progressive forest split compression”, Computer Graphics, Vol. 32, pp. 123-132, 1998.
[41] P. Alliez, M. Desbrun, “Progressive compression for lossless
transmission of triangle meshes”, Proc. of ACM SIGGRAPH,
pp. 198-205, 2001.
[42] P. Schroder, W. Sweldens, “Digital geometry processing”,
Course Notes, ACM SIGGRAPH, 2001.
[43] A. Khodakovsky, P. Schroder, W. Sweldens, “Progressive geometry compression”, Proc. of ACM SIGGRAPH, pp. 271-278, 2000.
[44] A. Khodakovsky, I. Guskov, “Normal mesh compression”, Proc. of ACM SIGGRAPH, pp. 95-102, 2000.
[45] A. W. F. Lee, W. Sweldens, P. Schroder, L. Cowsar, and D. Dobkin, “MAPS: Multiresolution adaptive parameterization of surfaces”, Computer Graphics (Annual Conference Series), Vol. 32, pp. 95-104, 1998.
[46] C. Loop, Smooth Subdivision Surfaces Based on Triangles, Master's Thesis, University of Utah, Department of Mathematics, 1987.
[47] N. Dyn, D. Levin, J. A. Gregory, “A butterfly subdivision scheme for surface interpolation with tension control”, ACM Transactions on Graphics, Vol. 9, No. 2, pp. 160-169, 1990.
[48] M. Garland and P. Heckbert, “Surface simplification using
quadric error metrics”, Proc. of ACM SIGGRAPH, pp. 209-216,
1997.
[49] A. Said and W. A. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, pp. 243-250, 1996.
[50] O. N. Gerek, A. E. Cetin, “Adaptive polyphase subband decomposition structures for image compression”, IEEE Transactions on Image Processing, Vol. 9, No. 10, pp. 1649-1660, 2000.
[51] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”, IEEE Trans. on Signal Processing, Vol. 41, pp. 3445-3462, 1993.
[52] JPEG 2000 Part I Final Committee Draft, Version 1.0, 2000.
[53] E. W. Weisstein, “Projection”, From MathWorld–A Wolfram Web Resource, http://mathworld.wolfram.com/Projection.html
[54] C. Guillemot, A. E. Cetin, and R. Ansari, “M-channel nonrectangular wavelet representation for 2-D signals: basis for quincunx sampled signals”, Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 4, pp. 2813-2816, 1991.
[55] S. Arivazhagan, D. Gnanadurai, J. R. Antony Vance, K. M. Sarojini, and L. Ganesan, “Evaluation of zero tree wavelet coders”, Proc. of International Conference on Information Technology: Computers and Communications, p. 507, 2003.
[56] C. Valens, “EZW Encoding”, available at http://perso.wanadoo.fr/polyvalens/clemens/ezw/ezw.html.
[57] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[58] G. Strang, T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1996.
[59] M. D. Adams, “The JPEG-2000 Still Image Compression Standard”.
[60] D. Novosel, M. Kovac, “JPEG2000 software implementation”, Proc. of 46th International Symposium Electronics in Marine (ELMAR-2004), pp. 573-578, 2004.
[61] J.-L. Gailly, gzip Compression Software.
[62] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression”, IEEE Trans. on Information Theory, Vol. 23, No. 3, pp. 337-343, 1977.
[63] P. Cignoni, C. Rocchini, and R. Scopigno, “Metro: measuring er-
ror on simplified surfaces”, Computer Graphics Forum, Vol. 17,
No. 2, pp. 167-174, 1998.
[64] S. Gumhold, W. Strasser, “Real-time compression of triangular mesh connectivity”, Proc. of ACM SIGGRAPH, pp. 133-140, 1998.
[65] M. Isenburg, P. Alliez, “Compressing Polygon Mesh Geometry with Parallelogram Prediction”, Proc. of IEEE Visualization, pp. 141-146, 2002.
[66] R. Pajarola and J. Rossignac, “Compressed progressive meshes”,
IEEE Trans. on Visualization and Computer Graphics, Vol. 6,
No. 1, pp. 79-93, 2000.
[67] H. Hoppe, E. Praun, “Shape compression using spherical geometry images”, in N. Dodgson, M. Floater, M. Sabin (Eds.), Advances in Multiresolution for Geometric Modelling, Springer-Verlag, pp. 27-46, 2005.
[68] G. Rote, “Computing the Minimum Hausdorff Distance Between
Two Point Sets on a Line Under Translation”, Information Pro-
cessing Letters, Vol. 38, No. 3, pp. 123-127, 1991.
[69] N. Aspert, D. Santa-Cruz, and T. Ebrahimi, “MESH: Measuring Error between Surfaces using the Hausdorff distance”, Proc. of the IEEE International Conference on Multimedia and Expo (ICME), Vol. 1, pp. 705-708, 2002.