
*Corresponding author. Tel.: +49-381-4024-110; fax: +49-381-4024-199.

E-mail addresses: obimber@egd.igd.fhg.de (O. Bimber), mencarna@crcg.edu (L.M. Encarnação), stork@igd.fhg.de (A. Stork).

Computers & Graphics 24 (2000) 851-867

Calligraphic Interfaces

A multi-layered architecture for sketch-based interaction within virtual environments

Oliver Bimber a,*, L. Miguel Encarnação b, André Stork c

a Fraunhofer Institute for Computer Graphics, Joachim-Jungius-Strasse 11, 18059 Rostock, Germany
b Fraunhofer Center for Research in Computer Graphics (CRCG), 321 S. Main St., Providence, RI 02903, USA

c Fraunhofer Institute for Computer Graphics, Rundeturmstrasse 6, 64283 Darmstadt, Germany

Abstract

In this article, we describe a multi-layered architecture for sketch-based interaction within virtual environments. Our architecture consists of eight hierarchically arranged layers that are described by giving examples of how they are implemented and how they interact. Focusing on table-like projection systems (such as Virtual Tables or Responsive Workbenches) as human-centered output devices, we show examples of how to integrate parts or all of the architecture into existing domain-specific applications - rather than realizing new general sketch applications - to make sketching an integral part of the next-generation human-computer interface. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Sketch-recognition; Sketch-interpretation; Virtual reality; Human-computer interface; Software architecture

1. Introduction

Leonardo da Vinci's drawings of machines and other objects illustrate one of the most fundamental purposes of sketches: the ability to communicate design and functionality to others. Nowadays, it is widely accepted that sketching is a form of critical, reflective dialog that handles communication on one or more different levels of abstraction simultaneously [1]. Various approaches have been taken to support this kind of dialog between humans and computers, and to build human-computer interfaces that are able to interpret such freehand sketches for different purposes.

In this context, the creation or reconstruction of 3D objects from 2D sketches is of major concern in many application areas. This so-called 'pencil-and-paper' approach is used for rapidly designing approximate three-dimensional scenes. While some systems analyze the orthographic or perspective projections to reconstruct 3D shapes that, based on psychological assumptions, are most plausible to the human observer, others interpret 2D gestures while the objects are sketched.

Within the last decade, the conceptual design phase has been increasingly supported by sketch systems that allow the expression of ideas on a computer-aided, but still human-centered basis. However, putting an emphasis on sketching, most of these systems are sealed off from real-world applications rather than being generally applicable as components.

To prevent a separation between sketch systems and real-world applications, we propose the integration of current results into an architectural pattern that offers existing applications an individual utilization of sketching within their user interface. Multi-layered architectural patterns are widely employed in many areas of soft- and hardware engineering. They offer multiple levels of abstraction, component reuse, exchangeability and encapsulated reengineering of single components, the individual combination of components, and extendibility through pre-defined interfaces.

With this article, we want to introduce a multi-layeredarchitecture for sketch-based interaction within virtual


environments to benefit from the inherent advantages mentioned above. We describe each layer of the architecture using examples illustrating its implementation. Furthermore, we present a variety of domain-specific applications of sketching within virtual environments based on our architecture instead of implementing yet another application for sketching.

2. Previous work on sketching

Brown University's Sketch system [2] is an example of an early development that processes 2D strokes while they are sketched on the film plane to create predefined 3D primitives.

Since Sketch supports the creation of simple CSG-like primitives, its concept has been extended towards freeform modeling. Teddy [3], for instance, is another desktop-based system that allows the creation of 3D polygonal freeform surfaces from sketched 2D silhouettes.

STILTON [4] is yet another desktop system that allows the construction of three-dimensional geometry from 2D perspective (orthographic) straight lines. The system can also be used to create approximated 3D scenes on top of a photograph of a real environment (by drawing over it) that contains minimal geometric information (such as the ground plane) or an existing VRML model.

Sketching three-dimensional scenes on a two-dimensional basis forces the user to artificially mediate the correct view and perspective in terms of giving an impression of the missing third dimension. This causes ambiguity while interpreting geometric properties of the sketched objects, such as type, position, alignment, size, etc.

In Sketch, this problem is solved by using a default parallel projection and by constraining the user. For instance, various aspects of most predefined gestural primitive representations can only be sketched axis-aligned with respect to existing objects and to the current view of the 3D scene. Aligning new objects with existing ones throughout the sketching process is supported whenever possible.

Similar to Sketch, STILTON forces the user to sketch over existing objects in terms of using the alignment information for sketch interpretation. STILTON, however, does not create predefined primitives (as Sketch does) but uses a set of straight-line strokes to construct any geometry. This is achieved by making heuristic assumptions that are represented by a linear combination of objective functions (e.g. face planarity, minimal standard deviation of angles, face alignment, etc.). The combined objective functions are then minimized using genetic algorithms (GAs).

Teddy [3] only allows users to create and modify single objects, which must be topological equivalents to a sphere. The drawn 2D silhouette of the object is automatically inflated in both the negative and the positive Z-direction to a size that depends on the distance between the neighboring regions on the silhouette. Thus, wide areas become fat and narrow areas become thin. The object can then be modified by using the supported geometry operations (see below).

Besides object creation, freehand sketches are also used to perform other tasks, such as object selection, interaction and manipulation, and system control. Sketch, for instance, gives users direct interaction with existing objects, indicated by click-and-drag actions. In addition, the transformations can be restricted by sketching corresponding constraints in advance. Teddy allows the application of specific geometry operations (such as cutting, extrusion, smoothing, and mesh transformations) to the freeform objects that are indicated by sketches; scribbled strokes are used to erase objects. STILTON offers the possibility of automatically mapping texture, extracted from the underlying photograph, onto the corresponding geometry.

While the systems described above support strictly two-dimensional desktop environments (e.g. screen, mouse or pen/pad-like devices), large and immersive projection systems - virtual workbenches, virtual walls, surround screen projection systems (e.g. CAVEs or RAVEs), augmented environments, etc. - offer three-dimensional interaction.

Sachs' 3-Draw system [5] is one of the pioneering works that offered sketching directly within the 3D free-space. Supporting two-handed interaction, Sachs used a tracked pad and a tracked stylus to outline object contours with three-dimensional curves (either freeform, constrained, or reflected). The virtual object was attached to the pad, thus the pad served as an interactive sketch- and object palette. In addition to the drawing of curves, the editing and fitting of linked curves was supported. In contrast to later approaches (see below), the 3-Draw system consisted of a non-immersive desktop workspace (a CRT screen) that neither supported head-tracking nor stereoscopic projection.

With ErgoSketch [6], Brown University adapted their Sketch system to an ActiveDesk, a variant of the Responsive Workbench. It supports two-handed interaction tasks, such as 3D modeling with 2D gesture lines (supported by the Sketch interface), non-dominant hand camera control, the object-in-hand metaphor, tool glasses [7] and magic-lens interaction [7,31]. Sketching is still performed in a non-stereoscopic (monoscopic 2D) mode on the desk's surface, using a lightpen. However, to interact with the scene in 3D, a stereoscopic (non-head-tracked) mode is automatically activated, triggered by the type of tool being used. One interaction tool is a spatially tracked physical prop that serves as a proxy for virtual objects (i.e. the virtual objects are attached to it and follow its six-degrees-of-freedom (6DOF) motions).


Fig. 1. Multi-layered architecture for sketch-based interaction within virtual environments.

In addition, a trackball is used (e.g. to support camera control), and speech recognition is applied to fire simple commands (e.g. to activate a color-picker).

Surface Drawing is an approach by Schkolne et al. [8] that supports the creation of three-dimensional freeform geometry by hand gestures at a table-like rear-projection display. The user, wearing a data glove, can sketch surfaces freely in 3D space. While doing that, the system samples the position data and generates a mesh surface using a fast triangulation scheme.

Forsberg et al. [9] also address 3D curve creation at a table-like display device, supported by sketching. Although there is a variety of related work that has not been mentioned, the referenced systems represent a good state-of-the-art cross-section and will be used for comparison throughout this article.

3. The architecture

Our architecture (cf. Fig. 1) consists of eight hierarchically arranged layers, which are described below by giving examples of how they are implemented. Each layer can interact with its direct upper or lower neighbor, whereby every layer can be deactivated, making the next activated layer a direct neighbor. This modularization concept offers applications the opportunity for individual utilization of the required functionality.
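Purely as an illustration of this modularization (not the original implementation; the class name, event structure and pass-through behavior are our assumptions), the neighbor-to-neighbor flow with deactivatable layers could be sketched in C++ as follows:

// Hypothetical sketch of the layer chaining described above (not the original code).
// A deactivated layer simply forwards data unchanged, so its neighbors become adjacent.
#include <string>
#include <vector>

struct Event {                        // generic data passed between layers (assumption)
    std::string type;
    std::vector<float> samples;
};

class Layer {
public:
    virtual ~Layer() = default;
    void setUpperNeighbor(Layer* upper) { upper_ = upper; }
    void setActive(bool active) { active_ = active; }

    // Pass an event up the hierarchy; inactive layers are transparent.
    void handle(const Event& e) {
        if (!active_) {
            if (upper_) upper_->handle(e);        // skip deactivated layer
            return;
        }
        Event processed = process(e);
        if (upper_) upper_->handle(processed);
    }

protected:
    virtual Event process(const Event& e) = 0;    // layer-specific work

private:
    Layer* upper_ = nullptr;
    bool active_ = true;
};

An application would then chain only the layers it actually requires, which mirrors the deactivation behavior described above.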

The core layers (emphasized in Fig. 1) contain the intelligent parts of the architecture, which are largely implemented by applying methods of artificial intelligence.

Each layer (or single component within the layers) can be updated by alternative and improved versions; thus, an adaptation of the architecture to an ongoing evolution of the components (e.g. caused by technological developments) can be supported.

The limited dependencies between layers and components also allow for distributed processing. In contrast to a centralized approach, distributed modality processing, for instance, offers an extensive speedup of the application, as well as the utilization of heterogeneous software and hardware. In our example, both applied modalities (speech and gestures) are independently analyzed on different processors (as described in Sections 3.3 and 3.5), and are merged on a central node (as described in Section 3.6).

For our discussion, we will assume the architecture on top of the following technology: a workbench-like projection system (a BARCO Baron Virtual Table [10]) serves as output device in our current setup. In addition, we make use of stereoscopic viewing supported by shutter glasses and head tracking using an electromagnetic tracking device. Our aim is to make sketching functionality usable for interaction within virtual environments.

3.1. Interaction device

The interaction device represents the lowest level that sits right on top of the hardware technology. It strongly influences the 'interaction techniques' layer above it. However, it is also related to the applied hardware underneath. Featuring two-handed interaction, we use a transparent Plexiglas pad and a pen as interaction devices. The 3D graphics that are projected on the Virtual Table's display are used to augment the pad [11,12]. Fig. 2 illustrates how to use our setup as a translucent sketchpad [13] to draw two-dimensional freehand sketches on it.

Both devices are tracked within their 6DOF and can be used separately or in combination. In combination, the pad can provide tactile feedback, while simultaneously taking advantage of the user's innate (kinesthetic) ability of knowing precisely where both hands are relative to each other. A similar setup was applied by Sachs [5] for the same reasons. However, due to a different projection technology (non-stereo desktop display), he used an opaque pad.


Fig. 2. Translucent sketchpad.

Fig. 3. 3D freehand sketch attached to the pen's tip.

1 IMGR is available for non-commercial usage. Please contact the authors for further information.

Other opaque pad and pen combinations can also be found in some immersive or see-through head-mounted-display (HMD) based systems. The Virtual Notepad [14] is an example of a system that offers handwriting in immersive virtual environments. The personal interaction panel (PIP) [15] is yet another example of an opaque pad and pen that are used with an augmented reality application, called Studierstube [16].

3.2. Low-level interaction

Based on the utilized interaction devices, this layer implements several interaction techniques that support the sketching process within the 3D free-space. Since two-dimensional or three-dimensional virtual objects and virtual sketches should coexist, we aim at providing equivalent interaction techniques for all of them. Note that the term 'low-level interaction' does not mean 'simple-to-realize interaction', but refers to the position of this layer within the architecture.

Holding the pen with the dominant hand and the pad with the non-dominant hand offers the user two-handed interaction. In contrast to the approach of the ErgoDesk [6] to support a seamless transition between 2D and 3D, we offer an embedding of 2D in 3D, providing the possibility of constraining the sketching process to the two-dimensional pad area. This has also been realized in other systems, such as the Virtual Notepad [14].

On the one hand, the 2D freehand sketches are attached to the pad, which, therefore, serves as an interactive sketch palette [13]. On the other hand, sketches can be drawn directly within the 3D free-space (similar to Sachs' 3-Draw [5]). 3D freehand sketches can either be attached to the pad or directly transformed with the pen in the 3D space (Fig. 3).

Objects and sketches that are attached to the local coordinate system of the pad can be intuitively placed at any position within the global coordinate system of the table via 3D drag-and-drop functionality. For more precise sketching, we apply a stroke-snapping constraint, enabling the virtual representation of the pen (or its tip) to snap to any point on any previously drawn stroke. Visualizing virtual representations of the interaction devices (i.e. the pen and the pad) is important to avoid conflicts caused by tracking misalignments (distortion and calibration inaccuracy). In contrast to their real counterparts, the virtual representations do reflect misalignments; thus, they can be taken into account while sketching and interacting.
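As an illustration of such a snapping constraint (our own sketch; the function name, types and snap radius are assumptions, not the system's actual code), the pen tip could simply be snapped to the closest sample point of all previously drawn strokes:

// Illustrative stroke-snapping sketch (not the original implementation).
#include <cmath>
#include <optional>
#include <vector>

struct Point3 { float x, y, z; };
using Stroke = std::vector<Point3>;

// Returns the closest previously drawn sample point within snapRadius, if any.
std::optional<Point3> snapToStrokes(const Point3& penTip,
                                    const std::vector<Stroke>& strokes,
                                    float snapRadius)
{
    std::optional<Point3> best;
    float bestDist = snapRadius;
    for (const Stroke& s : strokes) {
        for (const Point3& p : s) {
            const float dx = p.x - penTip.x, dy = p.y - penTip.y, dz = p.z - penTip.z;
            const float d = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (d <= bestDist) { bestDist = d; best = p; }
        }
    }
    return best;    // the caller keeps the real pen position if no snap target is found
}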

3.3. Dynamic gesture recognition

The layer 'dynamic gesture recognition' is implemented by our interface for motion-based gesture recognition (IMGR).1

IMGR is implemented as a generic C++ template library and offers the possibility of recognizing single multidimensional motions [17]. (Fig. 4 illustrates the hierarchically layered IMGR concept that is embedded within our dynamic gesture recognition layer.) While providing the necessary set of core functionality, IMGR was designed to be completely configurable and extensible.

Multidimensional stroke samples (by default 2D, 3D or 6DOF) are the basis for the recognition of dynamic gestures. Constrained sketching on the sketchpad, for instance, generates two-dimensional stroke samples, while unconstrained 3D sketches pass 3D samples to IMGR. Although the recognition of 6DOF motions is supported, we have not yet implemented an application for it.

The stroke enhancement module offers a palette of smoothing filters (such as one-dimensional Gaussian or average sliding-window kernels) to eliminate peaks and high-frequency sub-bands that are caused by the tracking distortion.


Fig. 4. Multi-layered IMGR concept.

This not only enhances and improves the sketching process, but also simplifies classification.
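For illustration only, a one-dimensional sliding-average window over the stroke samples could look like the following sketch; the data layout and window size are our assumptions, not the IMGR defaults:

// Illustrative sliding-average smoothing of a sampled stroke (assumed data layout).
#include <algorithm>
#include <vector>

struct Sample3 { float x, y, z; };

std::vector<Sample3> smoothStroke(const std::vector<Sample3>& in, int halfWindow)
{
    const int n = static_cast<int>(in.size());
    std::vector<Sample3> out(in.size());
    for (int i = 0; i < n; ++i) {
        const int lo = std::max(0, i - halfWindow);
        const int hi = std::min(n - 1, i + halfWindow);
        float sx = 0.f, sy = 0.f, sz = 0.f;
        for (int j = lo; j <= hi; ++j) { sx += in[j].x; sy += in[j].y; sz += in[j].z; }
        const float k = static_cast<float>(hi - lo + 1);
        out[i] = { sx / k, sy / k, sz / k };      // average inside the window
    }
    return out;
}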

The "ltered samples are then used to compute ann-dimensional property space (a list of properties thatdescribe geometric or dynamic characteristics ofa stroke). The property space itself can be outlined byde"ning computation functions for each single property(e.g. geometric or dynamic properties) in advance. InIMGR, however, appropriate property spaces for 2D, 3Dand 6DOF gesture recognition have been de"ned bydefault and can be extended or re"ned by de"ning newcomputation functions through a standard interface.Examples for geometric properties are center of gravity,total length (see Eq. (1)), axial expansion and axialmotions, start and end points, etc. (see [17] for acomplete list of implemented properties). Examplesfor dynamic properties are average velocity or averageacceleration.

l"n~1+i/1

J(xi!x

i~1)2#(y

i!y

i~1)2#(z

i!z

i~1)2 (1)

(total length of a gesture stroke with n sample points).Note, that 38 computation functions of properties

have been de"ned for 3D dynamic gesture recognition inour current implementation.

Since the geometric or dynamic properties are not well suited for classification, an m-dimensional feature space (a list of features that describes the geometric or dynamic behaviour of a stroke on a comparable level) is generated. To achieve good confidence values and to offer an easy way of describing the classification features, the feature space is built from fuzzy sets. To do so, we apply the same principle as for the property space, and outline the feature space by defining membership functions for the fuzzy sets through a standard interface. As for the property space, appropriate 2D, 3D and 6DOF feature spaces are predefined and can be extended and refined. An example of a feature is the relative straightness of a stroke (see Eq. (2)). Since a stroke's length is at least as long as the diagonal of its bounding box, Eq. (2) describes the stroke's straightness on a relative (i.e. comparable) basis. Values around 1, for instance, indicate a relatively straight progression of the stroke, while values close to 0 indicate a more intricate and complex gesture.

rs"Jw2#h2#d2

l(relative straightness of a gesture stroke).

(2)

where w, h, d are the width, height, and depth of thestroke's bounding box.

Further examples for features are bounding-box ratios, relative (to the bounding box) start and end points, relative start and end directions, average flexion and overall straightness, etc. (see [17] for a complete list of implemented features). Note that about 27 computation functions of features have been defined for 3D dynamic gesture recognition in our current implementation.
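As a concrete reading of Eqs. (1) and (2), the following sketch computes the total-length property and the relative-straightness feature from a sampled 3D stroke (our illustration; the names and data layout are assumptions):

// Illustrative computation of Eq. (1) (total length) and Eq. (2) (relative straightness).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct P3 { float x, y, z; };

float totalLength(const std::vector<P3>& s)               // Eq. (1)
{
    float l = 0.f;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const float dx = s[i].x - s[i - 1].x;
        const float dy = s[i].y - s[i - 1].y;
        const float dz = s[i].z - s[i - 1].z;
        l += std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    return l;
}

float relativeStraightness(const std::vector<P3>& s)       // Eq. (2)
{
    float minX = s[0].x, maxX = s[0].x;
    float minY = s[0].y, maxY = s[0].y;
    float minZ = s[0].z, maxZ = s[0].z;
    for (const P3& p : s) {
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
        minZ = std::min(minZ, p.z); maxZ = std::max(maxZ, p.z);
    }
    const float w = maxX - minX, h = maxY - minY, d = maxZ - minZ;   // bounding box
    const float diag = std::sqrt(w * w + h * h + d * d);
    const float l = totalLength(s);
    return (l > 0.f) ? diag / l : 1.f;    // close to 1 for straight strokes
}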

The reason for differentiating between properties and features is that the features are much better suited for classification, since they can be compared with the features of other gestures in terms of determining whether they are similar or not. Features are defined in a relative (normalized) value range and have no further use beyond classification, whereas the (mostly geometric) properties express absolute values and can be accessed afterwards by layers above the dynamic gesture recognition (e.g. to support sketch interpretation or high-level interaction).

We generate our knowledge base from a set of feature spaces, i.e. a list of comparable, gesture-specific characterization criteria.


2 Please contact the authors for the complete grammar.

We offer different classification methods, such as closest neighbor match (CNM) [17] or linear separation implemented by a feed-forward perceptron neural network [18]; these can be extended or exchanged in order to satisfy different recognition demands. We furthermore support different knowledge-base representations corresponding to the classification methods: accumulated gestures [17] or updated class-specific feature vectors [18].
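In a simplified reading (our sketch, not the actual CNM implementation of [17]), a closest-neighbor match compares the feature vector of an unknown stroke against stored class prototypes and returns the class with the smallest total feature deviation:

// Simplified closest-neighbor-match classification over feature vectors
// (illustrative only; the actual method of [17] may differ in detail).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct GestureClass {
    int id;                          // gesture identification number
    std::vector<float> prototype;    // stored feature vector (e.g. accumulated average)
};

// Returns the id of the closest class, or -1 if the best deviation exceeds maxDeviation.
int classify(const std::vector<float>& features,
             const std::vector<GestureClass>& knowledgeBase,
             float maxDeviation)
{
    int bestId = -1;
    float bestDev = std::numeric_limits<float>::max();
    for (const GestureClass& c : knowledgeBase) {
        float dev = 0.f;
        const std::size_t n = std::min(features.size(), c.prototype.size());
        for (std::size_t i = 0; i < n; ++i)
            dev += std::fabs(features[i] - c.prototype[i]);   // total feature deviation
        if (dev < bestDev) { bestDev = dev; bestId = c.id; }
    }
    return (bestDev <= maxDeviation) ? bestId : -1;           // reject unfamiliar strokes
}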

Online and offline adaptive machine learning is supported for all offered knowledge-base representations. The online training possibility enables users to train the system on demand (e.g. during runtime) and to specify their own gesture sets within the application.

The remaining component contains techniques to maintain and improve the knowledge bases, such as principal component analysis (PCA), which is implemented by a neural network built from linear associators [19,20] to reduce the contained redundancy.

3.4. Sketch recognition and interpretation

In contrast to most of the mentioned systems (Sketch [2], ErgoSketch [6], STILTON [4], Teddy [3], and 3-Draw [5]), we assume that sketches consist of a sequence of dynamic gestures (i.e. freehand strokes) which can be recognized and trained on an adaptive level by the user (i.e. no explicit programming for changing or extending the gestures is necessary) while the application is running.

Since every single gesture (i.e. stroke) is associated with an identification number (generated by the dynamic gesture recognition layer) after classification, we recognize representative multistroke sketches by parsing a sequence of strokes [21]. The sketch representations are defined within a context-free grammar (see Table 1, for example) that is used to generate a parser by applying a standard parser generator (such as Yacc or Bison [22]). The parser that is specific to the sketch language (i.e. the set of supported representative sketches) is automatically integrated into IMGR and is applied to perform sketch recognition on a hierarchical basis (i.e. single strokes are recognized and interpreted by IMGR and the extracted information - such as properties and identification numbers - is passed to the parser).

In addition, we make use of the semantic actions that can be defined for every single production within the grammar to interpret the sketches. Since the generated parser is embedded into IMGR, the semantic actions can access the property spaces that have been generated by the previous layer to perform sketch interpretation.

Every parsed production generates a node of a parse tree that contains the interpreted information (mostly accumulated geometric information) of a sketch component, and passes it (using the built-in parser stack) to the productions that it is derived from, to combine it with others. Thus, the parsing process assembles a parse tree whose node information becomes more complete the higher it travels up the tree. Once a representative sketch is recognized (i.e. a root production could be reached), it is hierarchically interpreted and the sketch-specific information is stored in the top node of the parse tree (interpreted information of the sketch components is stored in lower-level nodes).

Note that the sketch recognition and interpretation is not adaptive, and appropriate grammars have to be defined in advance. However, they can be dynamically exchanged by IMGR to simultaneously support multiple sketch languages, and since the dynamic gestures at the lowest level are adaptive, single strokes can also be exchanged by the user during runtime.

To support look-ahead, bottom-up parsers as well as users who must learn the representations, we defined sketch languages that assemble higher-level sketches from lower-level ones. (Fig. 5 illustrates the 3D sketch language for creating primitives that is shown in listing 1 (see footnote 2). The dashed lines represent the composition possibilities of the more complex primitives from less compound ones.) The lowest level represents a set of basic gestures, such as points, lines, arcs, circles and freeform strokes.

Elementary strokes (written in capital letters in Table 1) and their pre-computed property spaces are passed from the dynamic gesture recognition layer to the parser. After the stroke-specific property spaces are copied onto the parser stack, the arrival of an elementary stroke triggers the parsing process. Objects are recognized from a sequence of elementary strokes by parsing them through the production rules (lines 4-48 in Table 1) until the root production (line 4) can be derived. During the parsing process, each production determines the object-specific properties within its semantic actions (to maintain the simplicity of Table 1, the semantic actions are denoted as comments between the curly brackets), copies them onto the parser stack, assembles a new node (that contains the property information) into the parse tree, and re-triggers the parsing process by returning the derived object type. The semantic action of the root production (line 8 in Table 1) finally passes the recognition results as well as the interpreted property information (i.e. the parse tree and object properties) to the next layer (via the parser stack).

Note that some production rules represent more than one object (e.g. lines 21-23 in Table 1); thus they have to determine the object type in addition. A single freeform stroke, for instance, can outline the contour of a planar freeform face or the silhouette of a body of revolution. To determine whether a freeform face or a body of revolution has been sketched, the semantic action (indicated in lines 22 and 23) computes the distance between the start point and end point of the freeform stroke.


Fig. 5. 3D-sketch language for primitive creation.

Based on this distance, the following heuristic assumption is made: if it is below some threshold, the stroke is assumed to be closed, and a closed freeform face is reconstructed. If the distance between the two points is above the threshold, it is assumed that the stroke outlines the silhouette of a body of revolution, which can then be reconstructed. These generalizations help to keep the grammar small.
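As a hypothetical rendering of that semantic action (not the grammar's actual code), the decision could be expressed as:

// Illustrative decision between a closed freeform face and the silhouette of a body
// of revolution, based on the distance between the stroke's start and end points.
#include <cmath>
#include <vector>

struct Pt { float x, y, z; };

enum class FreeformType { FreeformFace, BodyOfRevolution };

// Assumes a non-empty stroke; closeThreshold is an application-chosen constant.
FreeformType classifyFreeformStroke(const std::vector<Pt>& stroke, float closeThreshold)
{
    const Pt& a = stroke.front();
    const Pt& b = stroke.back();
    const float d = std::sqrt((a.x - b.x) * (a.x - b.x) +
                              (a.y - b.y) * (a.y - b.y) +
                              (a.z - b.z) * (a.z - b.z));
    // Below the threshold the stroke is taken as closed -> planar freeform face;
    // otherwise it is interpreted as the silhouette of a body of revolution.
    return (d < closeThreshold) ? FreeformType::FreeformFace
                                : FreeformType::BodyOfRevolution;
}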

The parser stack (lines 1-3 in Table 1) contains the property space of the current stroke, the current property space of the derived object, and the current parse tree. The single-stroke properties contain geometric or dynamic gesture information, as discussed in Section 3.3, while the object properties contain object-specific parameters (such as type, position, orientation, size, etc.) that are needed for reconstruction. Additionally, the parse tree stores history information (i.e. the hierarchically derived properties of sub-objects).

3.5. Other modalities

Speech is used in our setup as a secondary modality, mainly to complement the gestural information. For this, we apply an off-the-shelf software package for continuous speech recognition (IBM's ViaVoice). As for sketch recognition, a grammar has to be defined in advance to outline the recognizable speech language (see Table 2, for example). To avoid confusion with non-speech commands (e.g. caused by conversations with other users that are received by the speech engine), we offer speech input over a regular telephone (the audio data is transmitted to the speech engine via the phone line). This enables us to put the receiver away any time we do not want to input speech, and, by taking advantage of distributed modality processing, it supports a completely remote recognition of verbal commands.

Another solution to this problem is to define keywords at the beginning of the grammar (e.g. 'listen') that explicitly trigger the speech recognition process (see line 1 in Table 2). Although this is not very intuitive, it allows us to use a more ergonomic headset instead of picking up the telephone every time we want to give a speech command.

With respect to the defined grammar, speech commands can be complete sentences, phrases or single words. The simple example grammar that is shown in Table 2 defines commands for selecting material and texture information from predefined categories and for browsing within a selected category, commands to switch between handwriting recognition and sketch recognition, and a command for initiating the measuring of created objects. Note that the free-form speech command that is passed to the sketch parser (see line 21 in Table 1) has also been defined here (see line 3 in Table 2).

3.6. Modality-merge and context knowledge

This level merges the different information extracted from the single modalities, as well as the provided context knowledge. With respect to the active-tool context, ErgoSketch [6], for instance, applies speech recognition and sketching independently. Therefore, spoken commands or corresponding sketches are used to switch between modes, to select items or to input commands that are usually typed with the keyboard. In addition to this, we merge the verbal and gestural information in a complementary manner. To merge the different modalities, we pass recognized speech information as tokens and information nodes (compatible to the sketch parse tree) into the sketch parser (see line 21 in Table 1 and line 3 in Table 2 for example) and process them as described in Section 3.4.


Table 1
Simple BNF grammar to define the sketch language, illustrated in Figs. 5 and 6

1:  parser_stack ::= { STROKE_PROPERTIES stroke_properties
2:                     OBJECT_PROPERTIES object_properties
3:                     PARSE_TREE parse_tree }

4:  <sketched_object> ::= <point> | <line> | <circle> | <two_line_face> | <three_line_body> | <cylinder>
5:      | <two_circle_body> | <cone> | <two_line_point_body> | <four_line_body>
6:      | <single_freeform_stroke_shape> | <extruded_freeform_face>
7:      | <freeform_extruded_std_face> | <freeform_extruded_freeform_face>
8:      { return object_type, object_properties and parse_tree }

9:  <freeform_extruded_freeform_face> ::= <single_freeform_stroke_shape> <single_freeform_stroke_shape>
10:     { return type (freeform extruded freeform face) and compute
11:       object_properties, assemble parse_tree }

12: <freeform_extruded_std_face> ::= <two_line_face> <single_freeform_stroke_shape>
13:     { determine and return object type (freeform extruded triangle or freeform
14:       extruded rectangle) and compute object_properties, assemble parse_tree }
15:     | <circle> <single_freeform_stroke_shape>
16:     { return object type (freeform extruded circle) and compute object_properties, assemble
17:       parse_tree }

18: <extruded_freeform_face> ::= <single_freeform_stroke_shape> <line>
19:     { return object type (extruded freeform face) and compute object_properties, assemble
20:       parse_tree }

21: <single_freeform_stroke_shape> ::= FREEFORM
22:     { determine and return object type (freeform face or body of revolution) and compute
23:       object_properties, assemble parse_tree }

24: <four_line_body> ::= <two_line_face> <two_line_face>
25:     { determine and return object type (truncated pyramid or truncated tri-pyramid)
26:       and compute object_properties, assemble parse_tree }

27: <two_line_point_body> ::= <two_line_face> <point>
28:     { determine and return object type (pyramid or tri-pyramid) and compute
29:       object_properties, assemble parse_tree }

30: <cone> ::= <circle> <point>
31:     { return object type (cone) and compute object_properties, assemble parse_tree }

32: <two_circle_body> ::= <circle> <circle>
33:     { determine and return object type (sphere or truncated cone) and compute
34:       object_properties, assemble parse_tree }

35: <cylinder> ::= <circle> <circle>
36:     { return object type (cylinder) and compute object_properties, assemble parse_tree }

37: <three_line_body> ::= <two_line_face> <line>
38:     { determine and return object type (cube, prism or tri-prism) and compute
39:       object_properties, assemble parse_tree }

40: <two_line_face> ::= <line> <line>
41:     { determine and return object type (rectangle or triangle) and compute object_properties,
42:       assemble parse_tree }

43: <circle> ::= CIRCLE
44:     { return object type (circle) and compute properties, assemble parse_tree }

45: <line> ::= LINE
46:     { return object type (line) and compute object_properties, assemble parse_tree }

47: <point> ::= POINT
48:     { return object type (point) and compute object_properties, assemble parse_tree }


Table 2
Simple BNF grammar for a speech language

1: <STATEMENT> ::= listen <COMMAND>.
2: <COMMAND>   ::= new texture <TEXTURE_CATEGORY> | new material <MATERIAL_CATEGORY>
3:              | texture off | free form | measure | draw | next | previous | <WRITE>.
4: <WRITE>     ::= write numbers | write characters.
5: <TEXTURE_CATEGORY>  ::= assignable textures | miscellaneous | stones | surfaces |
6:              swirls | textiles | reflect | gallery.
7: <MATERIAL_CATEGORY> ::= assignable materials | autumn | rococo | sheen | glass |
8:              metal | neon | silky | spring | summer | tropical | winter.

Fig. 6. Simple freeform objects created from representative 3D sketches.

(The only difference here is that the information is forwarded from the speech recognition layer rather than from the dynamic gesture recognition layer.) However, the verbal input does not necessarily have to consist of single words. If it is a complete phrase, we extract the predefined keywords and pass their corresponding tokens to the IMGR sketch parser. This allows us to provide further information to the sketch recognition and interpretation process that, for instance, cannot be sketched or can only be sketched with difficulty (e.g. material information, such as color, texture, etc.).
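A highly simplified sketch of this token hand-over (our assumption of how the merge could look; the real system feeds a Yacc/Bison-generated parser and its interface differs):

// Illustrative merge of speech and gesture information into one token stream
// (hypothetical interface, for explanation only).
#include <queue>
#include <string>
#include <vector>

struct Token {
    std::string symbol;             // e.g. "LINE", "CIRCLE", "FREEFORM"
    std::vector<float> properties;  // property space attached as the information node
};

class SketchParserInput {
public:
    // Classified elementary strokes arrive from the dynamic gesture recognition layer.
    void pushGestureToken(const Token& t) { tokens_.push(t); }

    // A recognized speech keyword (e.g. "free form") is mapped to a parser token
    // and queued like any elementary stroke.
    void pushSpeechKeyword(const std::string& keyword) {
        if (keyword == "free form") tokens_.push({"FREEFORM", {}});
        // further keyword-to-token mappings would go here
    }

    bool hasToken() const { return !tokens_.empty(); }
    Token nextToken() { Token t = tokens_.front(); tokens_.pop(); return t; }

private:
    std::queue<Token> tokens_;
};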

Sketching freeform strokes is a good example for using both modalities: it is possible to differentiate freeform strokes from the regular elementary strokes within the dynamic gesture recognition layer if their total feature deviation to the best match is above some threshold. However, it is difficult to classify the strokes if they are similar (i.e. if their total feature deviation is below the threshold). To avoid classification conflicts and to support the sketching of any type of freeform stroke, we require an explicit notification in the form of a speech command before the freeform stroke is performed. While this verbal token is used for sketch classification, the parameters of the freeform stroke are passed to the gesture parser in the form of an information node to support sketch identification.

Fig. 6 illustrates some simple freeform objects that can be created with a mix of gestural input for elementary strokes (e.g. lines, points, circles, arcs, etc.) and verbal input for freeform strokes (see line 21 in Table 1 and line 3 in Table 2). Note that object creation follows the same principles as described in Section 3.4 and is derived from the same sketch language that is illustrated in Fig. 5.

Although our architecture supports a co-verbal gesture interpretation, it does not offer an interpretation of time-stamped verbal and gestural input that occur at the same time, as described in Bolt's pioneering 'Put-That-There' study [23] or other works, such as [24,25]. Even though we think that alternating sequential multimodal information streams are well suited for object creation and interaction within virtual environments, we are interested in extending this level to support parallel multimodal information streams and to evaluate the human factors that are related to this. Predefined or dynamically growing context knowledge is accessible within this layer and by the sketch parser. It can be used within the context-free grammar that defines the sketch language, as well as after the sketch recognition and interpretation process, to derive the intended action.

The context knowledge (implemented as a rule-based system) represents application-specific information that has to be known in order to interact with or to create objects using sketches. It can be generated dynamically (e.g. by sketches themselves, such as the geometric constraints that can be established by sketching them with the Sketch system [2]), or it can be predefined. An example of partially predefined context knowledge is the set of construction rules for pipes that we modeled as a rule-based system: a piping demonstrator allows pipe components to be connected by sketching single assembly steps. How to connect pipes according to the sketched information (the result from the sketch recognition and interpretation layer), and whether they can be connected at all, is determined by inferring the construction rules. A geometric constraint solver finally animates the outlined assembly step and establishes new constraints (i.e. dynamically generated context knowledge) if possible.

Fig. 7 illustrates a sketched assembly step, the result after inferring the construction rules and solving the geometric constraints, and a constrained interaction outlined by sketches.


Fig. 7. Sketched assembly step, resulting construction, and constrained interaction.


3.7. High-level interaction

As stated for the low-level interaction layer, the name of this layer refers to its position within our architecture. High-level interaction techniques are sketch-based techniques that are implemented within the scope of the lower layers described above.

In the following, we want to introduce some examples for high-level interaction techniques and discuss how they are related to the other layers. Object Creation is supported by 2D sketches [13] or by 3D sketches [21]. Sketching on a two-dimensional basis requires a reference plane that can be provided by our transparent pad-like input device [11,12]. The outlined representative sketches are recognized and interpreted before they are used to create standard object primitives or application-specific objects. The same principle is applied for unconstrained three-dimensional strokes that are sketched directly within the 3D free-space. The advantage of 3D sketches is that their higher information content (in contrast to 2D sketches, 3D sketches provide depth information) allows us to fully interpret them and completely reconstruct the outlined objects [21]. This is not possible when 2D sketches are used to create 3D objects. Direct sketch manipulation and stroke snapping are the supported low-level interaction possibilities in combination with 3D sketching. However, three-dimensional sketches can also be attached to the local coordinate system of the translucent pad, introducing a three-dimensional sketchpad.

To offer sketch-based Object Interaction, 3D sketches can be used to outline basic transformations (e.g. translations, rotations, and scaling). The interpreted sketches reveal information such as the type of transformation, the selected object, the target position for a translation, the rotation angle, etc. In addition to basic transformations, context-sensitive transformations (i.e. transformations that require context knowledge) can also be sketched, as the assembly example in Section 3.6 shows. Single objects or groups of objects can be selected by circling them and deleted by scribbling them out, with both 2D and 3D sketches.

System Control is also supported on a two-dimensional or three-dimensional basis. In combination with the sketchpad, 2D sketches can be used to switch between application-specific modes, such as context-sensitive menus, coloring mode, window tools, fish-net selection with the pad, or object creation. An example of system control with 3D sketches is that the virtual scene can be illuminated by drawing appropriate light sources within the 3D free-space. Type, position, orientation and opening apex can be fully recognized and interpreted from the sketches, while speech information can be used in addition to define color and brightness.

Since speech recognition is still too unreliable to input text without being constrained to a predefined grammar (e.g. object names or filenames, measurements, etc.), we support Text Input by recognizing handwriting, similar to what is used with the Virtual Notepad [14]. Handwriting is a strictly two-dimensional task and requires a reference plane, which the translucent pad can offer. Since there is no difference between recognizing strokes that belong to characters and strokes that belong to sketches, we can apply IMGR to manage this. For our system, we decided to train the uni-stroke code of Graffiti [26] (implemented in many of the palm-sized devices, such as the Palm-Pilot, WordPad, etc.) to IMGR and to realize the Graffiti finite state machine. We have chosen to do this for two reasons: Graffiti's increasing level of use and the simplicity of the uni-stroke characters. Uni-stroke characters do not require users to define a context-free grammar, and the characters can simply be handled by the dynamic gesture recognition layer. Thus, fully adaptive handwriting recognition can be realized. In contrast to standard Graffiti, we analyze the user's writing behavior during runtime to support an automatic adaptation to it. Thus, starting with a pre-trained version of the character set, an implicit and seamless training is facilitated. Note that multiple users have to activate their individual profiles before they use the system. This can be triggered via speech commands.
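One possible reading of this implicit adaptation (a hypothetical sketch, not the system's actual training code) is a running-average update of the class-specific feature vector of the recognized character:

// Illustrative running-average update of a class-specific feature vector.
#include <cstddef>
#include <vector>

void adaptPrototype(std::vector<float>& prototype,
                    const std::vector<float>& observedFeatures,
                    float learningRate /* e.g. 0.1f, an assumed constant */)
{
    const std::size_t n = prototype.size() < observedFeatures.size()
                              ? prototype.size() : observedFeatures.size();
    for (std::size_t i = 0; i < n; ++i)
        prototype[i] += learningRate * (observedFeatures[i] - prototype[i]);
}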


Fig. 8. Freeform sketching: creating a Coons-patch.

Sketching also allows users to generate freeform surfaces from hand gestures in free space. Conventionally, freeform surfaces require deep knowledge about their underlying concepts on the user's side (e.g., the user has to know the meaning of control points, knot vectors, u- and v-directions, and the impact of changing any of them). Sketching curves and surfaces into free space provides a much more intuitive interface than control-point manipulation. For this reason we explored the possibilities of sketching for freeform surface generation. The aim was to support a seamless creation of well-known freeform surface concepts, such as Coons patches [27], skinned surfaces [28], and net surfaces [29]. Therefore, sketching techniques that do not place any prerequisites on the user's side were developed. The mapping of the input data onto the requirements of different freeform surface types is done by the system by performing some reasoning where necessary (see [30] for details). By using the transparent pad as a virtual mirror, one can generate symmetric surfaces (Fig. 8).

3.8. Applications

The applications that are described in this section offer sketching by embedding our architecture partially or as a whole. Since the applications were developed independently on top of different hard- and software platforms (implementing their existing, non-sketch-based user interfaces), they each apply their own interaction device layer (making use of different interaction devices, implementing specific device drivers, etc.). However, since the interface to the low-level interaction layer (i.e. a sequence of stroke samples) is given, the sketch-based interface to the application layer can be provided by the architecture (cf. Fig. 1). Even though the low-level interaction layers were application specific, the integration proceeded smoothly, within a few working days. Each of the applications has an individual and domain-specific purpose that is supported by the possibility of using sketches as an additional, human-centered interaction method. Although the applications do not focus on sketching, the latter enlarges the users' interaction capability - especially within immersive virtual environments.

3.8.1. CADesk - using 2D sketching for generation and manipulation of solid geometries

The Virtual Table presents stereoscopic graphics to a user in a workbench-like setting. We have developed a user interface and new interaction techniques for this device based on the transparent props described above - a tracked hand-held pen and a pad [11,12]. These props, particularly the pad, are augmented with 3D graphics from the Virtual Table's display and can serve as a palette for tools and controls, as well as a window-like see-through interface, a plane-shaped and through-the-plane tool, supporting a variety of new interaction techniques. This section describes an extension of this user-interface design space, which uses the described gestural input to create and control solid geometries for CAD and conceptual design [13]. We have anecdotal evidence that this new interaction paradigm greatly increases the Virtual Table's suitability for design tasks, especially since traditional CAD dialogue can be combined with intuitive rapid sketching of geometry on the pad. Additionally, the resulting events and objects can be associated with scene details below the translucent tablet. For creative CAD applications, informal user studies employing the talk-aloud protocol with students from the Rhode Island School of Design confirm this notion.

The 2D sketches that are used for object creation were developed to be as intuitive as possible, to facilitate easy memorization. In addition, since the user looks through the transparent pad onto the scene that is displayed on the Virtual Table, the representative sketches have been designed to follow the contours of the top-down projection of the corresponding solid geometries as closely as possible.


Fig. 9. Creation of solid objects using 2D sketching on a tablet: left, a truncated cone; right, a torus.

Fig. 10. Supported sketches for the creation of solid objects.


The currently implemented uni-strokes are conceptually structured in a hierarchical order. A stroke's beginning and end are defined by pressing and releasing a button on the transparent pen. Sketches may consist of several pen strokes that are performed close to the pad. These tools are used much like pen and paper, except that instead of actually drawing a shape, the computer scans the strokes made on the pad. The strokes' proximity to the pad determines whether or not they contribute to the gesture to be recognized.

The sketches support the generation of 3D objects (cf. Fig. 9) with circular base surfaces, contours (e.g., sphere, cone, truncated cone, cylinder, torus), or rectangular shapes (e.g., cube, pyramid, truncated pyramid). In general, the objects are created by first defining their base surfaces or contours (cf. Fig. 10). Afterwards, a stroke defines either the depth (e.g. for a cube) or the height (e.g. for a cone). Sketches for truncated solids resemble their non-truncated equivalents in that the height stroke is merely extended by a horizontally cutting finishing stroke (cf. Fig. 10). Obviously, special solutions must be developed for cylinder, sphere, and torus generation, since these objects would otherwise be created using ambiguous sketches.

Our solutions for creating these shapes are defined as follows: the cylinder by two parallel lines that indicate its side view, the torus by two circular strokes, and the sphere by a circular stroke and an arc stroke that indicates the sphere's curvature in all dimensions. Although some sketches show close correspondences, the recognition rate is generally between 95 and 100% (cf. Fig. 10 for the corresponding trained gesture set).

The de"ned sketches for object manipulation and con-trol are currently limited to the selection and deletion ofobjects (cf. Fig. 11). Additional control sketches are avail-able that perform mode changes, thus relieving the userinterface apparent on the pad from unnecessary dialoguebuttons. Fig. 12 shows the di!erent sketches for objectcontrol and mode changes. Although several sketches inthis group also show close correspondences, the recogni-tion rate is once again between 95 and 100% (cf. Fig. 12for the corresponding trained gesture set).

To facilitate intuitive interaction and support easy recollection, objects are selected by circling their projected images that are viewed through the pad. (Note: this functionality was actually already supported by the system described in [11], without using the motion-based gesture recognition presented here.) In a similar way, objects are deleted by 'crossing them out' on the pad.


Fig. 11. Sketching for object control. From left to right: object selection, object deletion and undo operation.

Fig. 12. 2D Gestures for object control and mode changes.

Undo is represented by 'scribbling' on the pad, thus resembling the erasure of mistakes on common paper.

3.8.2. Virtual mission planning
This application takes exactly the same approach as

the CADesk, in that it uses 2D sketches on a translucent pad to sketch unit symbols of the services (cf. Fig. 13). The difference in this application is that symbols, not certain geometries, are retrieved and conceptually organized in a hierarchical sketch language. These symbols are differentiated by visual properties (e.g., texture map or pattern) rather than by geometrical ones. The application allows users inexperienced with VR applications to intuitively use the system for strategic planning and Virtual Diplomacy purposes.

This system has so far been demonstrated to divisions of the Navy that are involved in developing new technologies to ease the control of the littoral battlespace, as well as to Air Force research labs that focus on applying virtual reality to command and control applications. In both cases, the rapidness of interaction and the speed of getting familiar with the system were major criteria that helped in acquiring research projects for the further development of the system. It was often noted that the advent of PDA technology and the similarity of the sketching and handwriting recognition provided by the presented system greatly helped to make the right associations between everyday tools and VR interaction technology.


Fig. 13. Virtual mission planning - sketching is used to generate and position service symbols on the fly.

Fig. 14. ARCADE: freeform-sketched bathtub and car design.


3.8.3. ARCADE - advanced realism CAD environment
ARCADE is a 3D modeling system and testbed to

explore new possibilities in human-computer interaction, such as 3D input, sketching and gesture recognition in the context of 3D modeling tasks. It supports modeling operations, such as 2D/3D primitives, freeform surfaces, sweeping and Boolean operations, in combination with 3D input devices (pad and pen) and immersive output (primarily a Virtual Table). In addition to a menu-based preselection, 3D freehand sketching is offered for object creation (primitives and freeform surfaces). While preselection forces the user to alternate between design and menu space, sketching within the design space allows the user to focus on the design task (Fig. 14).

Although we are just at the beginning of exploring the possibilities of sketching in 3D space, the techniques developed so far (especially for sketching freeform models) cause stunning reactions and interest, especially from the car industry, e.g. Opel and Porsche. After demonstrating the system, we made it a habit to hand over the interaction devices to our audience. Even persons who have never seen or used a CAD system before are able to sketch objects within minutes. The natural behaviour of the objects, the immediate visual feedback, and the correspondence between hand movements and visual information are reported as the most important advantages over traditional approaches.

4. Conclusion and discussion

In this article, we have described a multi-layered architecture for sketch-based interaction within three-dimensional virtual environments. We have demonstrated that a broad palette of high-level interaction techniques, such as object creation, object interaction, freeform modeling, text input, and environment control, can be realized by offering sketching. Rather than developing general sketching applications, these techniques were integrated into existing domain-specific applications to extend their interaction functionality (while maintaining their existing user interfaces) and to evaluate the techniques' applicability.

In contrast to non-immersive desktop approaches, such as Sketch, STILTON, Teddy, and even ErgoSketch (since sketching is supported in monoscopic mode only), immersive or semi-immersive three-dimensional environments offer less constrained sketching (mainly due to the possibility of one- or two-handed 3D interaction, 3D navigation, and stereoscopic visual perception). Workbench-like output systems together with pen-and-pad combinations turned out to be well-suited tools, since they support both a constrained 2D and an unconstrained 3D sketching process on an intuitive basis, while representing common and well-known everyday items, such as drafting boards, sketch pads, clipboards, and real pens. In the application area of 3D modeling, sketching within free space is a consistent next step towards supporting the user's behavioural and perceptual capabilities. For thousands of years, mankind was forced either to physically build a 3D object or to map its shape into a two-dimensional drawing; 3D sketching relieves users of this constraint.

In terms of being flexible enough to be integrated with different domain-specific applications, we modularised our architecture, making it possible to use only the required components, to exchange them (while maintaining the interfaces between components, as illustrated, e.g., in Figs. 1 and 4), or to distribute them. By not offering a modularised architecture, current sketch applications (such as Sketch, ErgoSketch, STILTON, Teddy, 3-Draw, etc.) lack the possibility to exchange components. Being able to exchange single components or sub-components allows adaptation to a continuously evolving base technology. This includes, for instance, property and feature descriptions to support different degrees of freedom, classification and training methods to satisfy different recognition demands (e.g., sketch recognition or handwriting recognition), grammars to support multiple sketch and speech languages, or, on a higher level of abstraction, complete architectural layers (such as speech or gesture recognition).
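As an illustration of this exchangeability, the following C++ sketch hides the recognition component behind an abstract interface. The interface, the nearest-neighbour stand-in, and all names are hypothetical; they merely indicate how, e.g., a neural-network classifier could be replaced by a fuzzy-logic one without affecting the layers above.

#include <iostream>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical feature vector handed up from the lower layers
// (2D strokes, 3D paths or 6DOF motions reduced to such descriptions).
using FeatureVector = std::vector<float>;

// Abstract recognition component: higher layers depend only on this
// interface, so the concrete classifier can be exchanged.
class GestureRecognizer {
public:
    virtual ~GestureRecognizer() = default;
    virtual void train(const FeatureVector& sample, const std::string& label) = 0;
    virtual std::string classify(const FeatureVector& sample) const = 0;
};

// One possible implementation: a trivial nearest-neighbour classifier,
// standing in for, e.g., a neural-network or fuzzy-logic recognizer.
class NearestNeighbourRecognizer : public GestureRecognizer {
public:
    void train(const FeatureVector& sample, const std::string& label) override {
        examples_.emplace_back(sample, label);
    }
    std::string classify(const FeatureVector& sample) const override {
        std::string best = "unknown";
        float bestDist = std::numeric_limits<float>::max();
        for (const auto& [features, label] : examples_) {
            float d = 0.0f;
            for (size_t i = 0; i < features.size() && i < sample.size(); ++i)
                d += (features[i] - sample[i]) * (features[i] - sample[i]);
            if (d < bestDist) { bestDist = d; best = label; }
        }
        return best;
    }
private:
    std::vector<std::pair<FeatureVector, std::string>> examples_;
};

int main() {
    std::unique_ptr<GestureRecognizer> recognizer =
        std::make_unique<NearestNeighbourRecognizer>();
    recognizer->train({1.0f, 0.0f}, "circle");
    recognizer->train({0.0f, 1.0f}, "cross-out");
    std::cout << recognizer->classify({0.9f, 0.1f}) << "\n";   // prints "circle"
}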

Another major difference from the mentioned related work is that an order-free online or offline adaptive training of single dynamic gestures is supported. New gestures can be trained at runtime, without changing the application's source code. This allows applications or their users to define their own gesture sets in 2D, 3D, 6DOF, or any other order (e.g., if gloves are used as input devices), without requiring deep knowledge of gesture recognition. In the case of handwriting recognition, for instance, this enables the system to adapt automatically to the individual writing behavior of users. Gesture-bases or sketch languages can also be exchanged dynamically at runtime (automatically by the application or interactively by the user) to support task-specific interaction. Other approaches (such as the mentioned related work) that employ hard-coded numerical evaluation of gestures lack this flexibility. In such systems it will be difficult to make the sketch functionality available to a variety of different domains and their associated software applications.
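The idea of runtime-trainable and exchangeable gesture-bases can be sketched as follows. The C++ example below is hypothetical (the names GestureBase and SketchInterface are invented); it only illustrates that the active set of trained gestures is data that can be extended or swapped while the application is running.

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical gesture-base: a named collection of trained gesture templates
// that can be loaded, extended at runtime, and exchanged as a whole.
struct GestureBase {
    std::string name;                                      // e.g., "CADesk", "mission-planning"
    std::map<std::string, std::vector<float>> templates;   // label -> feature template

    // Online training: add or refine a gesture without touching source code.
    void train(const std::string& label, std::vector<float> features) {
        templates[label] = std::move(features);
    }
};

class SketchInterface {
public:
    // The application (or the user) exchanges the active gesture-base to
    // switch to a task-specific sketch language.
    void activate(std::shared_ptr<GestureBase> base) { active_ = std::move(base); }

    void report() const {
        if (active_)
            std::cout << active_->name << ": " << active_->templates.size()
                      << " trained gesture(s)\n";
    }
private:
    std::shared_ptr<GestureBase> active_;
};

int main() {
    auto design  = std::make_shared<GestureBase>(GestureBase{"CADesk", {}});
    auto mission = std::make_shared<GestureBase>(GestureBase{"mission-planning", {}});
    design->train("cube", {0.1f, 0.9f});
    mission->train("destroyer", {0.7f, 0.2f});

    SketchInterface ui;
    ui.activate(design);    ui.report();   // CADesk: 1 trained gesture(s)
    ui.activate(mission);   ui.report();   // mission-planning: 1 trained gesture(s)
}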

Using grammars as an abstraction for defining speech languages has proven worthwhile in many of the major speech-recognition packages (such as IBM's ViaVoice, Microsoft's Speech SDK, and others). It turned out that this approach is also efficient for gesture recognition (primarily for sketch recognition). Using tools that generate parsers from predefined domain-specific grammars helps to largely decouple the sketch-based interaction metaphors from the application. Not supporting this kind of encapsulation is yet another drawback of hard-coded methods. Furthermore, grammars allow a systematic fusion of multiple modalities and context knowledge, especially if grammars are already used to define rules for the single modalities (e.g., speech languages and gesture languages).
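To illustrate how a sketch language can be kept as exchangeable data rather than hard-coded logic, the following C++ sketch stores a few hypothetical, EBNF-like productions as token sequences and matches recognized gesture tokens against them; a real system would feed such a grammar to a parser generator (cf. [22]). All production names and tokens are invented for illustration.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch-language fragment kept as data so that it can be
// exchanged without recompiling the application. Each production is a name
// plus the sequence of recognized gesture tokens it expects.
const std::map<std::string, std::vector<std::string>> kSketchGrammar = {
    {"create_box",    {"rectangle", "extrude_stroke"}},
    {"create_sphere", {"circle"}},
    {"delete_object", {"cross-out"}},
};

// Minimal matcher: does the recognized token stream form a known production?
std::string interpret(const std::vector<std::string>& tokens) {
    for (const auto& [production, expected] : kSketchGrammar)
        if (tokens == expected) return production;
    return "no match";
}

int main() {
    std::cout << interpret({"rectangle", "extrude_stroke"}) << "\n";   // create_box
    std::cout << interpret({"scribble"}) << "\n";                      // no match
}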

Sketch-based interaction has not yet reached the required maturity for many application domains (as is the case for other new technologies, such as virtual reality). In the future, however, these natural I/O techniques will become more prevalent for use with software applications to solve domain-specific problems. Sketching will mainly be used for virtual reality-aided design (VRAD) tasks, but also to enhance other domains. Together with other input possibilities, such as speech, sketching will become an important part of multimodal interaction. Although we have not carried out formal user studies so far, we received positive feedback from domain experts who are familiar with the interaction possibilities of traditional applications and who experimented with the introduced applications as well as with the featured sketch-based interaction methods (see Sections 3.8.1-3.8.3). We believe that this can be attributed to the fact that a human-centred and natural human-computer interaction (supported by the application of everyday tools and by the execution of habitual everyday tasks) is preferred over the traditional machine-centred human-computer interaction, if (at least) the same results can be achieved.

Overall, by dissociating from WIMP (Windows, Icons, Menus, Pointers) interfaces and moving towards human-centered next-generation user interfaces, sketch recognition and interpretation will improve, leave the laboratories, and be introduced to the workspaces of architects, designers, analysts, managers, and artists. As is the case in other engineering disciplines, the development of sketch-based interaction interfaces can, among other things, benefit from architectural patterns such as the ones described in Section 3.

Besides the evaluation of the proposed architecture, a main aspect of our future work will be the design and development of a multimodal agent platform that offers more efficient distribution possibilities and a smooth integration of our architecture into existing and new applications.

References

[1] Cross N. Natural intelligence in design. Elsevier Design Studies: The International Journal for Design Research in Engineering, Architecture, Products and Systems 1999;20(1):25-39.

[2] Zeleznik RC, Herndon KP, Hughes JF. Sketch: an interface for sketching 3D scenes. Computer Graphics (Proceedings of SIGGRAPH'96, Annual Conference Series) 1996;30:163-70.

[3] Igarashi T, Matsuoka S, Tanaka H. Teddy: a sketching interface for 3D freeform design. Computer Graphics (Proceedings, Annual Conference Series, ACM SIGGRAPH) 1999;409-16.

[4] Turner A, Chapman D, Penn A. Sketching a virtual environment: modeling using line-drawing interpretation. Proceedings of the ACM Symposium on Virtual Reality Software and Technology (VRST'99), 1999. p. 155-61.

[5] Sachs E, Roberts A, Stoops D. 3-Draw: a tool for designing 3D shapes. IEEE Computer Graphics and Applications 1991;6:18-26.

[6] Forsberg AS, LaViola JJ, Zeleznik RC. ErgoDesk: a framework for two- and three-dimensional interaction at the ActiveDesk. Proceedings of the Second International Immersive Projection Technology Workshop, Ames, IA, May 11-12, 1998.

[7] Bier E, Stone M, Pier K, Buxton W, DeRose T. Toolglasses and magic lenses: the see-through interface. Proceedings of SIGGRAPH'93, 1993. p. 73-80.

[8] Schkolne S, Schroeder P. Surface drawing. Technical report CS-TR-99-03, Caltech Department of Computer Science.

[9] Forsberg A, LaViola J, Markosian L, Zeleznik R. Seamless interaction in virtual reality. IEEE Computer Graphics & Applications 1997;17(6):6-9.

[10] Barco, Inc., BARON, URL: http://www.barco.com/projecti/products/bsp/baron.htm, 1997.

[11] Schmalstieg D, Encarnação LM, Szalavári ZS. Using transparent props for interacting with the virtual table. Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics (I3DG'99), 1999. p. 147-53.

[12] Coquillart S, Wesche G. The virtual palette and the virtual remote control panel: a device and an interaction paradigm for the Responsive Workbench (TM). In: Rosenbaum L, Astheimer P, Teichmann D, editors. Proceedings of the 1999 IEEE Virtual Reality Conference, March 13-17, Houston, TX. Silver Spring, MD: IEEE Computer Society Press, 1999. p. 213-6.

[13] Encarnação LM, Bimber O, Schmalstieg D, Chandler SD. A translucent sketchpad for the virtual table exploring motion-based gesture recognition. Computer Graphics Forum (Proceedings of EUROGRAPHICS'99) 1999;19(3):277-85.

[14] Poupyrev I, Tomokazu N, Weghorst S. Virtual Notepad: handwriting in immersive VR. Proceedings of IEEE VRAIS'98, 1998. p. 126-32.

[15] Szalavári ZS, Gervautz M. The personal interaction panel: a two-handed interface for augmented reality. Computer Graphics Forum (Proceedings of EUROGRAPHICS'97) 1997;16(3):335-46.

[16] Schmalstieg D, Fuhrmann A, Szalavári ZS, Gervautz M. Studierstube: an environment for collaboration in augmented reality. Proceedings of Collaborative Virtual Environments '96, and Virtual Reality Systems - Development and Applications, vol. 3 (1), 1996. p. 37-49.

[17] Bimber O. Continuous 6D gesture recognition: a fuzzy-logic approach. Proceedings of the Seventh International Conference in Central Europe on Computer Graphics, Visualization and Interactive Digital Media (WSCG'99), vol. 1, 1999. p. 24-30.

[18] Rumelhart D, Zipser D. Feature discovery by competitive learning. Parallel distributed processing. Cambridge, MA: MIT Press, 1986.

[19] Oja E. A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 1982;15:267-73.

[20] Sanger T. Optimal unsupervised learning in a single-layer feedforward neural network. Neural Networks 1989;2:459-73.

[21] Bimber O. Rudiments of a 3D freehand sketch based human-computer interface for immersive virtual environments. Proceedings of Virtual Reality Software and Technology (VRST'99), 1999. p. 182-3.

[22] Aho AV, Sethi R, Ullman JD. Compilers: principles, techniques, and tools. Reading, MA: Addison-Wesley, 1986. ISBN 0-201-10194-7.

[23] Bolt RA. Put-that-there: voice and gesture at the graphics interface. Computer Graphics (Proceedings of SIGGRAPH'80) 1980;14(3):262-70.

[24] Bolt RA, Herranz E. Two-handed gesture in multi-modal natural dialog. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST'92), 1992. p. 7-14.

[25] Sparrell CJ, Koons DB. Interpretation of coverbal depictive gestures. AAAI Spring Symposium on Intelligent Multi-Modal Multi-Media Interface Systems, 1994.

[26] Blickenstorfer CH. Graffiti. Pen Computing 1995;30-1.

[27] Coons SA. Surfaces for computer-aided design of space forms. Technical report, MIT, 1967.

[28] Piegl L, Tiller W. The NURBS book. Berlin: Springer, 1997.

[29] ACIS geometric modeler application guide. CO: Spatial Technology Inc., 1996.

[30] Stork A, Schimpke O, de Amicis R. Sketching freeforms in semi-immersive environments. 2000 ASME Design Engineering Technical Conferences & Computers and Information in Engineering Conference, DETC 2000, 2000 Sept 10-13; Baltimore, MD. Electronic Proceedings on CD-ROM, ASME International, 2000.

[31] Viega J, Conway M, Williams G, Pausch R. 3D magic lenses. Proceedings of ACM UIST'96. New York: ACM Press, 1996. p. 51-8.
