

Computers & Graphics 28 (2004) 15–24

ARTICLE IN PRESS

*Corresponding author. Tel.: +49-6151-155-645; fax: +49-6151-155-139. E-mail address: [email protected] (J. Sahm).

0097-8493/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.cag.2003.10.014

Efficient representation and streaming of 3D scenes

J. Sahm*, I. Soetebier, H. Birthelmer

Fraunhofer Institut für Graphische Datenverarbeitung, Abteilung für Animation und Bildkommunikation, Fraunhoferstr. 5, Darmstadt 64283, Germany

Abstract

In the last few years several approaches (Proceedings of the Second International Workshop on Distributed Interactive Simulation and Real-Time Application, 1998, pp. 88–91; EUROGRAPHICS 2001, vol. 20(3), 2001) have been presented which address the transmission and visualization of 3D scenes on distributed devices with different capabilities. These techniques are of interest to a wide range of applications such as virtual chatrooms, product presentations, CAVE visualizations, 3D simulations, or 3D online games. In order to adapt the immense amount of data to the capabilities of the devices and the networks, two basic problems have to be solved: the selection of the information according to the user's interest and the reduction of the selected data. Usually the first problem is reduced to visual aspects, which can be determined by visibility culling algorithms. This paper concentrates on the second problem and introduces a system to stream the data of even large 3D scenes to remote devices.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Curve, surface, solid, and object representations; Distributed data structures; Distributed/network graphics; Distributed virtual environment

1. Introduction

Due to advancements in 3D visualization as a consequence of more powerful hardware, distributed 3D graphics has become more and more interesting for a wide field of industrial activities. Consequently, applications such as virtual chatrooms, product presentations, CAVE visualizations, 3D simulations, or 3D online games have become an important economic factor.

Unfortunately, most of these techniques have their drawbacks: virtual chatrooms [1] typically lack graphics quality and detail, product presentations concentrate on very simple 3D scenes with only a few elements [2,3], CAVE visualizations require very expensive hardware, and 3D simulations and 3D online games [4] do not make use of the transmission of graphical information. Although most of these applications provide level of detail (LOD) concepts, they do not support an efficient and exact adaptation of the amount of data and the data itself to the capabilities of the devices. For that reason mobile devices such as laptops or palmtops are still not considered by many software systems. It is equally unsatisfying if the capabilities of the user's PC or workstation are not exploited.

In order to adapt the data to capabilities such as memory size, computing power, graphics support, and bandwidth, two basic approaches can be identified. The first approach is to select for transmission or visualization only those small parts of the scene which are currently in the user's interest [5]. Usually this problem is restricted to visual aspects, so visibility culling algorithms [6] can be used for the computation of these areas. The second approach is to reduce the amount of data by modifying the elements of a scene, for example with the help of multi-resolution techniques [7]. This paper concentrates on the second approach and introduces a client–server system for the representation, transmission, and visualization of distributed 3D scenes even on mobile devices.

1.1. Content

The rest of the paper is structured as follows: First, an overview of related work is given. This is followed by the concept, which starts with the requirements of the system. Furthermore, the concept explains how the system fulfills these requirements. After that, the implementation is discussed, followed by the results in comparison to other approaches. Finally, the paper ends with a conclusion.

1.2. Related work

There are already several approaches to distributing 3D graphics over networks. Such 3D environments depend heavily on the network connection, so it is an important design criterion to represent scene information efficiently in order to keep network traffic as low as possible. Furthermore, the representation has to adapt to the capabilities of the client devices.

Some virtual environments use a peer-to-peer-based network topology. Because there is no central server, the management of the scene is distributed over all devices involved in the virtual environment. One example of such an environment is NPSNET [8]. It was developed for a large number of users in military training and is based on the DIS protocol, which uses multicast IP. It uses a paradigm called players and ghosts, where each participant controls its own 'player', which is replicated on all other participants as a 'ghost'. An advancement of NPSNET is Bamboo [9]. Another peer-to-peer-based environment is presented by Broll [10]. The ASCII-based virtual reality modeling language (VRML) is used as scene description. The scene description of the distributed interactive virtual environment (DIVE) [11] is stored in a database, which is replicated on each client. The scalable platform for large interactive network environments (Spline) [12] is a toolkit for creating large-scale multi-user environments. Its world model is stored in an object-oriented database. A remarkable application built with Spline is DiamondPark [12]. The virtual environment operating shell (VEOS) [13] is a multi-user environment which uses a peer-to-peer architecture without the multicast option. It is a system for rapid prototyping of virtual environments.

Distributed 3D graphics environments which are based on client–server communication are able to use a central scene description. One example is the AVIARY VR system presented by Snowdon et al. [14]. The virtual society system from Lea et al. [15] is another client–server-based virtual environment. Its 3D worlds are designed using VRML, and they are distributed using its own communication protocol called virtual society client protocol (VSCP). A programming toolkit for creating multi-user environments is the MR ToolKit [16]. For distributing the 3D world information it uses a client–server-based shared memory abstraction. MacIntyre et al. [17] proposed COTERIE. It distributes shared objects by replication using a client–server topology. Another virtual environment for a large number of users using client–server communication is Ring, presented by Funkhouser [18]. The network graphics framework (NGF) [19] is an adaptive framework for transmitting 3D graphics over networks. It considers several properties, such as the capabilities of network, server, and client or user preferences, to choose the appropriate method for transmission. A client–server-based approach with a central scene description is also presented by Teler et al. [20]. Different from other virtual environments is its approach of transferring an impostor-based representation of the objects of the scene in order to respond quickly to the user's navigation. HOUCOM [21] is a framework for creating cooperative applications and groupware applications. For this, it provides basic services that can be extended by an application developer with application-specific functionality. A possible application could be a cooperative virtual environment, but in contrast to the presented system, HOUCOM is not specialized in distributing 3D information to several user devices with different capabilities.

Commercial frameworks for virtual environments are, for example, the WorldToolKit [22] by Sense8 and OpenGL Vizserver by SGI [23]. Together with the World2World and the WorldUp extensions, the WorldToolKit provides a development environment for client–server-based virtual environments. A different approach is used by OpenGL Vizserver. Although it is a client–server solution, all rendering is done on the server, which is an SGI supercomputer. The images are then transferred to the client. This offers high-quality graphics, but this approach needs a very powerful and therefore expensive server, especially if the number of concurrent clients is very high.

Some general research on virtual environments is presented by Benford et al. [24]; a survey of existing virtual environments was presented by Meehan [25].

2. Concept

This section introduces the concept of the client–server system.

2.1. Requirements

The basic requirement is the transmission of a 3D scene from a server system to multiple clients. The complexity of the 3D scenes should range from only a few elements up to several thousand. Elements can represent simple primitives such as lines, triangles, or cubes, but also more complex structures such as houses, cars, or robots. In contrast to the approaches of transferring server-rendered frames or detail levels (see [20,23]), the system should transmit the elements' data containing vertices, normals, textures, colors, etc. There are several reasons for this requirement: The rendering of complete frames or even the elements' lowest level of detail demands excessive computing power if applied for multiple clients. With each registered client the load of the server increases dramatically, so a very expensive graphics-capable server is necessary. Furthermore, the transmitted frames or bitmaps represent only 2D information and lack interaction and transformation possibilities.

Another important requirement is the ability to adapt the scene's amount of data to the capabilities of the clients precisely. For that reason the server should not only provide some precalculated levels of detail for each element, but a scene representation which allows a fast and accurate on-the-fly determination of the best fitting data set for a specific client. As a consequence of the adaptation requirement, the clients usually hold only a subset of the server's scene information. This subset changes dynamically with the user's view position and view direction.

Since even the complete transmission of the optimal data set may require several seconds, it should be possible to visualize the received data on the client side before the transmission has finished. So the scene and the element representation have to be refineable and renderable simultaneously on the client side in real time.

2.2. Element representation

An element of a 3D scene can contain multiple pieces of information representing different media types such as vertices, normals, texture coordinates, textures, or colors. While vertices, normals, and texture coordinates belong to the geometrical information, textures and color arrays describe image information. Another important piece of information is the connectivity of the vertices, which is denoted as the topological information of an element in the following.

The visual appearance of a single element is described with the help of an Element Graph. The idea behind the Element Graph is similar to other scene graph APIs such as Open Inventor [26] or IRIS Performer [27]. Each node of the Element Graph represents a specific data type, e.g. the Vertex Array Node in Fig. 1 points to the vertices of the element. Properties such as normals or texture coordinates can be defined per vertex or per facet, which is indicated by different node types. While Array Nodes reference per vertex information, Facet Nodes point to per facet data (e.g. the Color Facet Node in Fig. 1). In order to render an Element Graph to file, network, or frame buffer, the Element Graph is traversed in DFS order from left to right. Each time a node is visited, its state is set and remains valid until it is redefined by another visited node of the same type. This principle resembles state machines such as OpenGL.

Fig. 1. An example of an Element Graph. Since the Vertex Array Node and the Normal Array Node represent geometrical information, they have the same color. While the Indexed Triangle Set Node references the topological information, the Transform Node contains a transformation matrix. The Group Node works similar to the group concept of Open Inventor.

2.3. Scene representation

Since a 3D scene may contain thousands of elements, the scene representation has to manage a corresponding number of Element Graphs. In order to avoid data redundancy between the elements of a 3D scene, the nodes of an Element Graph do not contain the data directly (with the exception of Group Nodes and Transform Nodes). Instead, the information is stored in several pools, and the nodes only reference the entries of these pools. For example, multiple Vertex Array Nodes of different Element Graphs can share identical vertex data by pointing to the same pool entry, as illustrated in Fig. 2. According to the node types of the Element Graph, the system provides appropriate pool types, e.g. a Geometry Pool for vertices, normals, or texture coordinates, a Texture Pool for textures, a Color Pool for colors, and a Topology Pool for the topological information. With regard to the basic requirement of Section 2.1, the pools provide a very helpful feature, because they encapsulate the complete visual information of a 3D scene in a compact and memory efficient manner. So if the system is able to transmit the content of the pools efficiently, then the requirements are




Fig. 2. The two Element Graphs are identical with the exception that the left Element Graph is colored per facet and the right one is textured. So these Element Graphs can share the same geometry and topology information. Multiple elements are allowed to reference the same Element Graph.


almost fulfilled. Important information outside of the pools is the element-specific data such as the element's identification number (ID) and the Element Graph. Since the clients typically hold only a subset of the 3D scene (see Section 2.1), it is not necessary to transmit the elements' spatial arrangement inside of the server's scene representation (e.g. a k-D-tree or an octree [28]). In fact, the clients manage their own spatial arrangement.

2.4. Server scene preparation

Because of the adaptation requirement in Section 2.1, the server's scene representation has to be prepared in order to determine the optimal data set for each registered client fast and accurately. This preparation is done by a progressive simplification algorithm, which traverses all Element Graphs of the given 3D scene. The algorithm memorizes all pool entries referenced by the visited nodes of the current Element Graph. Each time a node with topological information is reached, the simplification is started. It is possible to apply the preparation not only as a precalculation process but also during runtime. As a consequence, the simplification algorithm has to be very fast (see (13)). In return, the system is able to add new Element Graphs or to modify existing Element Graphs and their information on the fly. A restriction of the preparation is that the elements have to be represented by triangle meshes, i.e. the nodes with the topological information inside of the Element Graphs have to be Indexed Triangle Set Nodes.

The simplification is based on an edge collapsing operation, which removes within each simplification step one vertex and two triangles from the mesh represented by the current Vertex Array Node and the current Indexed Triangle Set Node of the Element Graph. If the Element Graph references additional data such as normals or colors, then the per vertex information is treated in the same way as the vertices and the per facet data in the same way as the triangles. In order to determine the removal sequence (priority queue) of the vertices, an error value is calculated for each vertex with the help of an error metric. The vertices with the lowest error values are removed first, followed by a recalculation of the error values of the remaining vertices affected by this operation. For performance reasons, the calculation of the error metric has to be very fast. Fig. 3 outlines how the error metric calculates an error value for each vertex. First, the normals of the triangles are calculated. They are not normalized; the length of a normal vector is proportional to the area of its triangle. Then, the normals are moved to one point, and an axis-aligned bounding box containing all


Fig. 3. The left side illustrates the metric used for calculating an error value per vertex. The right side shows the basic operation of the simplification process, the edge collapsing.

Fig. 4. All data structures that are not colored are created temporarily by the simplification. The colored structures represent the pool entries. The Pool Triangle Array is a pool entry inside of the Topology Pool and contains indices to the vertices inside of the Pool Vertex Array.

normals is determined. For this bounding box, the diagonal and the volume are calculated. The error value results from the diagonal, the volume, and the number of triangles that surround the vertex:

e_vertex = (V_BoundingBox^2 · L_Diagonal^2) / (number of triangles)

In addition to the information inside of the pool entries referenced by the Vertex Array Node and the Indexed Triangle Set Node, the simplification needs some further temporary data structures, which are illustrated in Fig. 4. For each vertex v inside of the Pool Vertex Array, the Vertex Pointer Array contains a reference to a structure with additional information about v. This structure stores the index of v inside of the Pool Vertex Array, an occurrence list indicating the triangles containing v, and references to the per vertex data (normal, color, etc.). If a vertex v2 (see Fig. 3) is removed from the mesh, then its representation inside of the Pool Vertex Array is exchanged with the last vertex of the Pool Vertex Array. Although the new last vertex v2 is not deleted from the Pool Vertex Array, the size of the Pool Vertex Array is decreased by one. The size minus one indicates the currently last entry of the array. The v2 entry inside of the Vertex Pointer Array is deleted. In contrast to the Pool Vertex Array, the position of a vertex inside of the Vertex Pointer Array never changes. For that reason the priority queue references the entries of the Vertex Pointer Array in order to determine the next vertex to be removed. With the help of the occurrence list of the removed vertex v2, the simplification identifies the affected triangles. The processing of these triangles is analogous to the vertices: The triangle representations of the two removed triangles T1 and T2 inside of the Pool Triangle Array are exchanged with the last representations of the Pool Triangle Array. The size of the Pool Triangle Array is decreased by two. The according entries of T1 and T2 inside of the Triangle Pointer Array are deleted. Due to the edge collapsing, the v2 indices of all other affected triangles inside of the Pool Triangle Array are replaced by the index of v1. The position of the per vertex data inside of the appropriate pools (e.g. the Normal Array Pool) is modified analogously to the Pool Vertex Array; the position of the per facet data is changed analogously to the Pool Triangle Array.

After the simplification, the sizes of the Pool Vertex Array, the Pool Triangle Array, and all other affected pool entries indicate the base mesh M0 (notation of Hoppe [29]), i.e. all data below the size limits represent M0. The sequence of the data above the size limits, which is called the progressive data in the following, corresponds to the order in which the vertices or triangles, respectively, were removed. In order to differentiate between the data of the base mesh M0 and the progressive data, the server creates a new pool entry inside of each affected pool, which references the progressive data. Furthermore, each node of the simplified Element Graph gets a progressive partner node (see Fig. 5). The original nodes reference the pool entries containing the data of the base mesh, and the progressive nodes point to the pool entries with the progressive data.

2.5. Transmission and client refinement

Because the progressive data reflects the removal sequence, the original mesh can be restored on the client side by applying the reverse operations of the simplification process. The vertices and triangles are inserted into the base mesh in the reverse order of the removal sequence, i.e. the last removed vertex is inserted first. To do so, the vertex is appended to the end of the Pool Vertex Array and exchanged with the same vertex with which it was exchanged during the simplification. In this way the position changes performed by the simplification inside of the Pool Arrays are reversed as well. The same procedure is applied to the Pool Triangle Array and all other affected pool entries. The insert and the exchange operations are fast and simple, so they are applicable even on weak devices.

Fig. 5. The simplification algorithm modifies the scene representation on the server side. So the Element Graphs contain progressive entries, which store the data in a sequential order. The appropriate nodes in front of their progressive partners contain the data of the model M0.

Since the data are not inserted arbitrarily but in the reverse order of the simplification, it is possible to refine a mesh (i.e. the visual appearance of an element) vertex by vertex or triangle by triangle, respectively. For that reason the server is able to adapt the amount of data to the clients' capabilities fast and accurately: If the server has to transfer an element to a client, the server identifies the according Element Graph and pool entries. Taking the client's capabilities into account, the server determines the number of triangles n_t which should be transmitted to the client. The number of triangles n_t implies the number of vertices n_v. Since the server always transmits the base mesh M0, the numbers n_t and n_v have to be greater than the according sizes n_t^0 and n_v^0 of the base mesh. Usually M0 only contains a few vertices and triangles, so it is renderable even on weak devices. After the transmission of the M0 data, the server transmits the first n_v − n_v^0 entries of the Pool Vertex Array and of all pool entries with per vertex data to the client. Furthermore, the server transmits the first n_t − n_t^0 entries of the Pool Triangle Array and of all pool entries with per facet data. Because of the progressive data, the server does not need to send the information in one data package. Instead, the information can be transferred step by step. After receiving M0, the client is able to refine the mesh with each incoming package (Fig. 6).

A remarkable property of the Pool Vertex Array, the Pool Triangle Array, and all other affected pool entries is that, in comparison to the pointer arrays, these data structures never contain any holes or deleted items. So the pool entries always provide a consistent representation of the mesh, even during the simplification and the refinement process. Thinking of output render pipelines such as OpenGL and its interleaved array implementation, the pool entries are always renderable, because the system only has to pass the references to the pool entries' data to the pipeline.

2.6. Client scene reconstruction

As mentioned in Section 2.1, not only single elements should be transmitted but complete 3D scenes. Fig. 2 illustrates the scene representation on the server side, which avoids data redundancies by using the pool concept. Furthermore, multiple elements can share the same Element Graph. This representation has to be restored on the client side.

Fig. 6. The simplified Stanford bunny with texture, texture coordinates, and normals. The first model represents the original with ca. 68 000 triangles, the second model has ca. 34 000 triangles, the third model ca. 7000 triangles, and the fourth model ca. 1300 triangles.

Fig. 7. The server creates a dependency graph in order to avoid unnecessary transmission of information.

According to the view positions and view directions of the clients, the server selects the elements which have to be transmitted to a specific client. If an element already exists on the client, then it is ignored. Each selected element gets a priority, which depends on the element's distance to the client's view point. As mentioned before, the pools contain the complete visual information of the 3D scene. With the help of the selected elements' Element Graphs, the server is able to identify the pool entries which are shared by multiple elements. These pool entries have to be transferred only once. For that reason the server creates a dependency graph, which is illustrated in Fig. 7. This dependency graph is processed by the server from left to right: At first, the elements' specific data such as the ID is transmitted to the appropriate clients. After that, the modified Element Graphs including the progressive nodes are sent. Since some elements may share the same Element Graph, the Element Graph is marked as SENT for a specific client after its first transmission. In the next step, the pool entries referenced by the Element Graphs are transmitted as explained in Section 2.5. Analogously to the Element Graphs, shared pool entries are marked as SENT. If a client does not support the content of a pool entry (e.g. a mobile device may not support textures), then this pool entry is ignored by the server. In addition,



the corresponding nodes inside the Element Graph are

skipped during the transmission of the Element Graph.
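As a minimal sketch, the per-client SENT marking described above could look as follows. All type and function names are hypothetical; the paper does not give an API, only the behavior that shared Element Graphs and pool entries are transmitted to each client at most once and unsupported media types are skipped:

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Illustrative data model: an element references an Element Graph,
// which in turn references pool entries of various media types.
struct PoolEntry { uint64_t id; std::string mediaType; };
struct ElementGraph { uint64_t id; std::vector<PoolEntry*> entries; };
struct Element { uint64_t id; ElementGraph* graph; };

struct ClientState {
    std::set<uint64_t> sentGraphs;    // Element Graphs marked SENT
    std::set<uint64_t> sentEntries;   // pool entries marked SENT
    std::set<std::string> supported;  // media types the device can handle
};

// Processes one element of the dependency graph for one client.
// Returns the number of pool entries newly queued for transmission.
int transmitElement(const Element& e, ClientState& c,
                    std::vector<uint64_t>& outQueue) {
    int sent = 0;
    // 1. element-specific data (e.g. the ID) is always sent
    outQueue.push_back(e.id);
    // 2. the Element Graph, unless this client already received it
    if (c.sentGraphs.insert(e.graph->id).second) {
        // 3. referenced pool entries: shared entries only once,
        //    unsupported media types (e.g. textures on a mobile
        //    device) skipped entirely
        for (PoolEntry* p : e.graph->entries) {
            if (!c.supported.count(p->mediaType)) continue;
            if (c.sentEntries.insert(p->id).second) {
                outQueue.push_back(p->id);
                ++sent;
            }
        }
    }
    return sent;
}
```

With two elements sharing one Element Graph, the second call queues neither the graph nor its pool entries again, which is exactly the saving the dependency graph is built for.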

On receiving the incoming element information (e.g. the

ID), the client creates a similar dependency graph. If an

Element Graph is received, then the client traverses the

Element Graph and creates a pool entry inside the

appropriate pool for each base mesh node. With the help

of the progressive nodes the client identifies the pool

entries, which are expecting progressive data. After

traversing the Element Graph all progressive nodes are

removed from the Element Graph, in order to improve

the rendering efficiency of the Element Graph.
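The client-side integration step above can be sketched as follows. The node types, the flat graph representation, and all names are illustrative assumptions, not the paper's actual data structures:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical flat representation of a received Element Graph.
enum class NodeType { BaseMesh, Progressive, Other };
struct GraphNode { NodeType type; uint64_t poolId; };

struct ClientPools {
    // poolId -> true if the entry still expects progressive data
    std::map<uint64_t, bool> expectingProgressive;
};

// Traverse the received Element Graph: create a pool entry for each
// base mesh node, mark the entries that await progressive data, and
// strip the progressive nodes afterwards so that later rendering
// traversals of the graph stay lean.
void integrateGraph(std::vector<GraphNode>& graph, ClientPools& pools) {
    for (const GraphNode& n : graph) {
        if (n.type == NodeType::BaseMesh)
            pools.expectingProgressive[n.poolId] = false;
        else if (n.type == NodeType::Progressive)
            pools.expectingProgressive[n.poolId] = true;
    }
    // remove progressive nodes to improve rendering efficiency
    graph.erase(std::remove_if(graph.begin(), graph.end(),
                               [](const GraphNode& n) {
                                   return n.type == NodeType::Progressive;
                               }),
                graph.end());
}
```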

3. Implementation

The complete software is implemented in C++ using

OpenGL for the graphical output and the Adaptive

Communication Environment (ACE) library for all

multi-threading and network aspects. Because the pools

are implemented as key-value-maps, each pool entry

needs a key or an ID, respectively. This ID is generated

by a 64-bit CRC checksum algorithm, which processes

the data of a pool entry before the simplification. If an

entry’s ID is identical to the ID of another entry, then

the data of these entries is supposed to be equal. For

that reason the server can check for data redundancy

very fast. Besides the ID, each pool entry contains a list

of codec identifications, which are set by the designer of

the element. Codecs are responsible for encoding and

decoding the data of a specific media type and are used

for the transmission of the data. The codecs are

implemented as dynamic link libraries (DLL), which

can be loaded by the system dynamically. The topology

entries are processed by a codec, which is implemented

with the help of the zlib [30]. Because the zlib does not

work well with floating-point values, all entries that

contain floating-point values are handled by a codec

with a slightly modified zlib algorithm. Typical pool

entries of this category are all geometry and color

entries. Textures are encoded and decoded with a

wavelet transformation, provided by a modified DjVu

[31] library (Fig. 8).

Fig. 8. The left image is the original texture with 650 × 300 × 24 BPP and ca. 572 KB (TGA format). The right image represents the texture's lowest LOD with ca. 1 KB (modified DjVu format).
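The checksum-based redundancy check could be sketched as follows. The CRC-64 variant shown (the ECMA-182 polynomial, bit-by-bit for clarity) and all names are assumptions; the paper does not state which 64-bit CRC algorithm was used:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Pool entries are keyed by a 64-bit CRC of their raw
// (pre-simplification) data: equal data yields equal keys, so the
// server can detect redundancy with a single map lookup.
uint64_t crc64(const unsigned char* data, std::size_t len) {
    const uint64_t poly = 0x42F0E1EBA9EA3693ULL; // CRC-64/ECMA-182
    uint64_t crc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint64_t>(data[i]) << 56;
        for (int b = 0; b < 8; ++b)
            crc = (crc & 0x8000000000000000ULL) ? (crc << 1) ^ poly
                                                : (crc << 1);
    }
    return crc;
}

// Inserts data into the pool (a key-value map keyed by the CRC).
// Returns true if the entry was new, false if an entry with the same
// checksum (assumed to be identical data) already existed.
bool insertPoolEntry(std::map<uint64_t, std::vector<unsigned char>>& pool,
                     const std::vector<unsigned char>& data) {
    uint64_t id = crc64(data.data(), data.size());
    return pool.emplace(id, data).second;
}
```

Note the assumption stated in the paper: entries with identical IDs are supposed to contain equal data, i.e. CRC collisions are accepted as a practical risk in exchange for very fast redundancy checks.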

4. Results

Software systems concerning distributed 3D graphics

can be divided into two basic approaches: applications

of the first approach render the scene on the server side

and transmit the resulting frames via video streaming to

the clients. Systems of the second approach transfer the

3D scene’s data including geometry, colors, and textures

to the clients, where the data is visualized. Another

possibility is to combine these two basic approaches into

a hybrid solution. Since the presented technology

belongs to the second approach, it is compared to

systems of the video streaming and the hybrid solution.

The video streaming approaches are represented by

SGI’s OpenGL Vizserver and the hybrid systems by the

solution of Teler et al. [20].

4.1. SGI Vizserver

Using SGI’s Vizserver the application’s processing is

transferred from the client side to the server. For that

reason the user is able to start applications that

exceed the capabilities of his device. Since the client is

only required to decode and visualize the video stream,

the resulting feedback is of high graphics quality even on

weak computer systems such as mobile devices. As

another advantage, the application's data does not have to be

adapted to the client's capabilities, only the quality

and resolution of the video stream. Furthermore, this

principle is not restricted to the visualization of 3D

scenes. It is applicable to almost all kinds of software

systems. Because the server has to compute, render, and

visualize the application’s results for several clients

simultaneously, SGI offers a super computer (e.g. the

Origin-3900-Server, 4.5 million Euro) in combination

with the software. One of the Vizserver’s biggest

drawbacks is that the server's load increases

dramatically with each registered client. Why should the

clients not process tasks of the application

within their constraints? It is unsatisfying if a

high-end client is used in the same manner as a palmtop,

namely for the rendering of 2D bitmaps. Another

issue is the restricted interaction possibilities of



these 2D bitmaps in comparison to transferred

3D models.

4.2. Teler et al.

In contrast to the Vizserver software, Teler et al. only

render the lowest LOD of an element (a so-called

impostor) on the server and transmit the resulting image

to the client, while the other LODs are transferred as 3D

models. The idea is that the rendering time of the lowest

LOD plus the streaming time of the rasterized image

results in faster response times than transmitting a 3D

model. Similar to the video streaming approach this

solution requires a graphics-capable server. Since the

visualization of even the lowest LOD can become very

expensive if applied for several elements and multiple

clients simultaneously, the server’s work can hardly be

done by a standard personal computer. Another

problem is that the images of the lowest LOD look

rather inhomogeneous in comparison to the elements’

3D model visualizations. Because Teler et al. concen-

trate on the precalculation and the determination of the

user’s visual area of interest, they do not mention any

scene representation.

4.3. Presented approach

As an example of the second basic approach, the

presented client–server-system has to adapt the data

effort of the 3D scenes to the client’s capabilities.

Consequently, the visualization’s quality on devices with

less performance and graphics support is significantly

lower than in the Vizserver scenario (e.g. the

visualization on palmtops is almost restricted to wire-

frame models). On the other hand, the server of the

presented approach does not need to render images

for multiple clients. So, in contrast to the solutions

above, the server does not have to be graphics-capable. Because

the transmitted elements are represented by 3D models

and not by 2D images, the clients can handle user

interactions such as transformation requests themselves.

Table 1
An overview of the approaches in table form

                  SGI Vizserver                  Teler et al.                              Presented approach
Technique         Server rendered frames         Geometry and impostors                    Geometry
Costs             SGI super computer             Graphics-capable server                   Personal computer
Accuracy          No adaption                    Progressive meshes                        Accurate to a triangle
Compression       Image compression              Not supported but integrable              Media specific codecs
Image quality     High quality on all devices    Inhomogeneous impostors                   Low quality on weak devices
Load balancing    Server based                   Server and clients                        Server and clients
Load increasing   Proportional to client number  Not proportional (except for impostors)   Not proportional
Response time     Depends on frame               Depends on impostors                      Depends on elements' M0
Interaction       2D images                      Impostors and 3D models                   3D models

Additionally, the presented approach provides the following novelties and benefits:

* A memory efficient progressive representation not only of single elements but of complete 3D scenes, which supports adapting the scene's data to the clients' capabilities with an accuracy of a single triangle.

* The server does not provide a few precalculated levels of detail, but a continuous resolution of the scene's elements depending on their number of vertices and triangles. Since the simplification process is very fast thanks to the new metric (the Stanford bunny requires ca. 1 s on an AMD Athlon 1333 MHz), this resolution can be generated as a precalculation step or on-the-fly. For that reason it is possible to add new Element Graphs to the scene or to modify even the existing meshes.

* The simplification does not generate one progressive stream for a single element including vertices, colors, normals, etc., but preserves the separation of the pool entries and their corresponding media types. So it is possible to process each media type with a specific codec, which results in a better encoding and decoding efficiency. If a client does not support a media type, the server simply ignores the corresponding pool entries. In consequence there is no unnecessary transmission of unsupported data. Finally, the separation allows the restoration of the memory efficient representation on the client side.

* The system always provides a consistent representation of the scene on server and client, even during the simplification and the refinement process. So the elements can be modified and rendered simultaneously [32]. It is not necessary to convert the elements' representations between progressive simplification data structures and render data structures as in other algorithms. Furthermore, the refinement process is very fast (ca. 100 000 triangles/s on an AMD Athlon 1333 MHz), so it is applicable even on weak devices (Table 1).



5. Conclusion

In this paper, a new 3D scene representation was

introduced in order to transmit the visual information

from a server to a client. The scene representation

provides a memory efficient data management and

includes a progressive data format, which is generated

by a progressive simplification algorithm. This algo-

rithm separates the information of a scene element into

several streams according to the element’s data types. As

explained in Section 4 these streams can be handled in a

very flexible way. So it is possible to adapt the data

effort to the clients' capabilities precisely. In contrast

to the Vizserver software and the approach of

Teler et al., the presented system does not make use of

server-based image rendering and video streaming. For

that reason the server does not have to be graphics-capable and

can be a standard personal computer.

Acknowledgements

This work was funded by the Heinz-Nixdorf-Founda-

tion.

References

[1] Active worlds, http://www.activeworlds.com.

[2] Kaon, http://www.kaon.com.

[3] O2c, http://www.o2c.de.

[4] Ryzom, http://www.ryzom.com/.

[5] Hesina G, Schmalstieg D. A network architecture for

remote rendering. Proceedings of Second International

Workshop on Distributed Interactive Simulation and

Real-Time Applications; Montreal, Canada; 1998.

p. 88–91.

[6] Cohen-Or D, Chrysanthou Y, Silva C. A survey of

visibility for walkthrough applications. EURO-

GRAPHICS 2000, Course Notes; 2000.

[7] Klein R. Multiresolution representations for surface

meshes. Technical report, Wilhelm-Schickard-Institut,

GRIS, Universität Tübingen, Germany; 1997.

[8] Macedonia M, Zyda M, Pratt D, Barham P, Zesswitz S.

NPSNET: a network software architecture for large scale

virtual environments. Presence 1994;3(4):265–87.

[9] Watsen K, Zyda M. Bamboo—supporting dynamic

protocols for virtual environments. Image Conference;

Scottsdale, Arizona, USA; 1998.

[10] Broll W. Distributed virtual reality for everyone—a

framework for networked VR on the internet. IEEE

Virtual Reality Annual International Symposium 1997

(VRAIS’97), Albuquerque, NM, USA; 1997.

[11] Carlsson C, Hagsand O. DIVE—a Multi-user virtual

reality system. IEEE Virtual Reality Annual Symposium;

Seattle, USA, 1993.

[12] Waters R, Anderson D, Barrus J, Brogan D, Casey M,

McKeown S, Nitta T, Sterns I, Yerazunis W. Diamond-

Park and spline: a social virtual reality system with 3D

animation, spoken interaction and runtime modificability.

Technical Report TR-96-02a. Mitsubishi Electronic Re-

search Laboratory; 1996.

[13] Bricken W, Coco G. The VEOS Project. Presence 1994;

1(2):111–29.

[14] Snowdon D, West A. The AVIARY VR-system. A

prototype implementation. Sixth ERCIM Workshop,

Stockholm, Sweden; 1994.

[15] Lea R, Honda Y, Matsuda K, Hagsand O, Stenius M.

Issues in the design of a scalable shared virtual environ-

ment for the internet. Proceedings of the HICSS’97;

Hawaii; 1997.

[16] Shaw C, Green M, Liang J, Sun Y. Decoupled simulation

in virtual reality with the MR toolkit. ACM Transactions

on Information Systems 1993;11(3):287–317.

[17] MacIntyre B, Feiner S. Language level support for

exploratory programming of distributed virtual environ-

ments. Symposium on User Interface Software and

Technology, ACM UIST’96; Seattle, WA, USA; 1996.

[18] Funkhouser T. RING: a client-server system for multi-user

virtual environments. ACM Symposium on 3D Graphics;

Monterey, CA, USA; 1995. 85–92.

[19] Schneider B, Martin I. An adaptive framework for 3D

graphics over networks. Computers and Graphics

1999;23(6):867–74.

[20] Teler E, Lischinski D. Streaming of complex 3D scenes for

remote walkthroughs. EUROGRAPHICS 2001; Manche-

ster, UK; 2001;20(3).

[21] Schiffner N, Ruehl C. HOUCOM framework for colla-

borative environments. SPIE International Symposium on

Voice, Video and Data Communications, Hynes Conven-

tion Center Boston, USA; 1999.

[22] Sense8, ‘‘WorldToolKit’’. http://www.sense8.com, 1997.

[23] Silicon Graphics Inc., ‘‘OpenGL Vizserver 3.1’’, http://

www.sgi.com/software/vizserver, 2003.

[24] Benford S, Greenhalgh C, Rodden T, Pycock J. Colla-

borative virtual environments. Communications of the

ACM 2001;44(7):79–85.

[25] Meehan M. Survey of multi-user distributed virtual

environments. Course Notes: Developing Shared

Virtual Environments, SIGGRAPH’99; Los Angeles,

CA, USA; 1999.

[26] Strauss PS, Carey R. An object-oriented 3D graphics

toolkit. SIGGRAPH’92; Chicago, IL, USA; 1992.

341–9.

[27] IRIS Performer, http://futuretech.mirror.vuurwerk.net/

performer.html.

[28] Chang AY. A survey of geometric data structures for ray

tracing. Technical Report TR-CIS-2001-06, CIS Depart-

ment, Polytechnic University; 2001.

[29] Hoppe H. Progressive meshes. SIGGRAPH 1996. New

York: ACM; 99–108.

[30] zlib, http://www.gzip.org/zlib.

[31] DjVu, http://www.djvuzone.org/.

[32] Birthelmer H, Soetebier I, Sahm J. Efficient representation

of triangle meshes for simultaneous modification and

rendering. Proceedings of International Conference of

Computational Science 2003 (ICCS 2003), Springer,

Berlin, Heidelberg; 2003. 925–34.