
* Corresponding author. E-mail addresses: klinker@in.tum.de (G. Klinker), stricker@igd.fhg.de (D. Stricker), reiners@igd.fhg.de (D. Reiners).

Computers & Graphics 23 (1999) 827–830

Augmented Reality

Optically based direct manipulation for augmented reality

G. Klinkerᵃ,*, D. Strickerᵇ, D. Reinersᵇ

ᵃMoos 2, 85614 Kirchseeon, Germany
ᵇFraunhofer Institute for Computer Graphics (FhG-IGD), Rundeturmstr. 6, 64283 Darmstadt, Germany

Abstract

Augmented reality (AR) constitutes a very powerful three-dimensional user interface for many "hands-on" application scenarios. To fully exploit the AR paradigm, the computer must not only augment the real world, but also accept feedback from it. In this paper, we present an optical approach for collecting such feedback by analyzing video sequences to track users and the objects they work with. Our system can be set up in any room after quickly placing a few known optical targets in the scene. We present two demonstration scenarios to illustrate the overall concept and potential of our approach and then discuss the research issues involved. © 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Direct manipulation; Augmented reality; Video sequences

1. Introduction

Augmented reality (AR) constitutes a very powerful three-dimensional user interface for many "hands-on" application scenarios in which users cannot sit at a conventional desktop computer. To fully exploit the AR paradigm, the computer must not only augment the real world but also accept feedback from it. Actions or instructions issued by the computer cause the user to perform actions that change the real world, which in turn prompt the computer to generate new, different augmentations. Several prototypes of such two-way human-computer interaction have been demonstrated. In the space frame construction system of Feiner et al., selected new struts are recognized via a bar code reader, triggering the computer to update its visualizations [1]. In a mechanical repair demonstration system, Breen et al. use a magnetically tracked pointing device to ask for specific augmentations regarding information on specific components of a motor [2]. Klinker et al. use speech input to control stepping through a sequence of illustrations in a door-lock assembly task [3]. Ishii's metaDESK system uses graspable objects to manipulate virtual objects like B-splines and digital maps of the MIT campus [4].

In this paper, we present an approach which uses computer vision-based techniques to analyze and track users or real objects. Our demonstrations can be arranged in any room after quickly placing a few known optical targets in the scene, requiring only moderate computing equipment, a miniaturized camera, and a head-mounted display.

2. Demonstrations

The subsequent two scenarios illustrate the overall concept and potential of optically based direct manipulation interfaces for AR applications.

2.1. Mixed virtual/real mockups

Many industries (e.g., architecture, automotive design) use miniature models of a designed object. AR provides the opportunity to use mixed mockups, combining physical mockups with virtual models for new components.



Fig. 1. (a) Manipulation of virtual and real objects. (b) Manipulation of a model of St. Paul's Cathedral via a piece of cardboard.

Fig. 2. Augmented Tic Tac Toe. (a) Placement of a new stone. (b) End of user action.

The first demonstration shows a real toy house and two virtual buildings. Each virtual house is represented by a special marker in the scene, a black square with an identification label. By moving the markers, users can control the position and orientation of individual virtual objects. A similar marker is attached to the toy house. The system can thus track the location of real objects as well. Fig. 1a shows an interactively created arrangement of real and virtual houses. Fig. 1b shows a VRML model of St. Paul's Cathedral being manipulated similarly via a piece of cardboard with two markers.
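The paper gives no code for this coupling, but the mechanism can be sketched in a few lines: once the tracker has recovered a marker's pose (Section 4), the associated virtual object simply inherits it every frame. The following Python sketch is an illustrative assumption, not the authors' implementation; the scene structure and the names objects_by_marker and marker_offset are hypothetical.

    import numpy as np

    # Hypothetical per-frame update: each virtual object inherits the
    # pose of its marker, so moving the physical square moves the
    # virtual house. detected_markers maps marker IDs to 4x4 world-space
    # pose matrices produced by the tracker.
    def update_mixed_mockup(detected_markers, scene):
        for marker_id, pose in detected_markers.items():
            obj = scene.objects_by_marker.get(marker_id)
            if obj is not None:
                # A fixed offset places the model relative to its marker.
                obj.transform = pose @ obj.marker_offset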

The system provides users with intuitive, physical means to manipulate both virtual and real objects without leaving the context of the physical setup. The system keeps track of all virtual and real objects and maintains their occlusion relationships.

2.2. Augmented Tic Tac Toe

More elaborate interaction schemes can be shown in the context of a Tic Tac Toe game (Fig. 2). Users sit in front of a Tic Tac Toe board and some play chips. A camera on their head-mounted display records the scene, allowing the AR-system to track head motions while also maintaining an understanding of the current state of the game, as discussed in Sections 4 and 5. Users and computer alternately place real and virtual stones on the board (Fig. 2a). After finishing a move, users wave their hands past a 3D "Go" button (Fig. 2b) to inform the computer that they have decided on their next move. The computer then scans the image. If it finds a new stone, it plans its own move and places a virtual cross on the board. If it cannot find a new stone, or if it finds more than one, it asks the user to correct the placement of stones.
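The move-checking logic just described reduces to counting the newly detected stones and branching on the result. A minimal Python sketch of that control flow follows; every helper name (count_new_stones, plan_computer_move, show_message) is a hypothetical placeholder rather than code from the paper, with count_new_stones standing for the tile-based detector of Section 5.

    # Sketch of the turn handling described above; all helpers are
    # hypothetical placeholders, not APIs from the paper.
    def handle_go_button(image, board):
        new_stones = count_new_stones(image, board)
        if len(new_stones) == 1:
            board.add_user_stone(new_stones[0])     # accept the user's move
            move = plan_computer_move(board)        # e.g., standard game AI
            board.add_virtual_cross(move)           # drawn as an augmentation
        else:
            # Zero or several new stones: ask the user to fix the board.
            show_message("Please place exactly one stone, then wave past Go.")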

3. The system

The AR-system works both in a monitor-based and in an HMD see-through setup. It runs on a low-end graphics workstation (SGI O2). It receives images at video rate, either from a mini camera that is attached to a head-mounted display (Virtual IO Glasses, see Fig. 1b) or from a user-independent camera installed on a tripod. The system has been run successfully with a range of cameras, including high-quality Sony 3CCD color video cameras, color and black-and-white mini cameras, and low-end cameras that are typically used for video conferencing applications (e.g., an SGI IndyCam). The resulting augmentations are shown on a workstation monitor, embedded in the video image, and/or on a head-mounted display (HMD). In the HMD, the graphical augmentations can be seen in stereo without the inclusion of the video signal ("see-through mode").

At interactive rates, our system receives images and submits them to several processing steps. Beginning with a camera calibration and tracking step, the system determines the current camera position from special targets and other features in the scene. Next, the image is scanned for moving or new objects, which are recognized according to predefined object models or special markers. Third, the system checks whether virtual 3D buttons have been activated, initiating the appropriate callbacks to modify the representation or display of virtual information. Finally, visualizations and potential animations of the virtual objects are generated and integrated into the scene as relevant to the current interactive context of the application (details in [3]).
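Expressed as a per-frame loop, those four steps might look like the following skeleton. This is an illustrative Python sketch, not the authors' implementation; the stage functions are hypothetical callables supplied by the application.

    # Illustrative per-frame skeleton of the four processing steps above.
    # The stage functions are hypothetical callables, not APIs from the
    # paper; `state` bundles the known targets, objects, and buttons.
    def process_frame(grab_frame, calibrate_camera, detect_scene_changes,
                      check_virtual_buttons, render_augmentations, state):
        image = grab_frame()                                  # video input

        # Step 1: camera calibration and tracking from the known targets.
        camera_pose = calibrate_camera(image, state.targets)

        # Step 2: scan for moving or new objects (models or markers).
        changes = detect_scene_changes(image, camera_pose, state.objects)

        # Step 3: check the virtual 3D buttons and fire their callbacks.
        check_virtual_buttons(image, camera_pose, state.buttons)

        # Step 4: render the virtual objects and integrate them into the
        # scene as relevant to the current interaction context.
        render_augmentations(camera_pose, state, changes)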

4. Live optical tracking of user motions

The optical tracker operates on live monocular video input. To achieve robust real-time performance, we use simplified scenes, placing black rectangular markers with a white border at precisely measured 3D locations (see Figs. 1a and 2a, b). In order to uniquely identify each square, the squares contain a labeling region with a binary code. Any subset of two targets typically suffices for the system to find and track the squares in order to calibrate the moving camera at approximately 25 Hz [3].
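Each identified square contributes four correspondences between surveyed 3D corner positions and their image projections, so the pose of the moving camera can be recovered with a standard perspective-n-point solver. The sketch below uses OpenCV's cv2.solvePnP as a stand-in for the paper's own calibration routine, under the assumption that the camera intrinsics K have been determined beforehand.

    import cv2
    import numpy as np

    # Recover the camera pose from the corners of the detected squares.
    # object_corners: (N, 3) surveyed 3D corner positions (four per
    #                 marker; two squares give N = 8).
    # image_corners:  (N, 2) matching corner positions in the image.
    # K:              3x3 intrinsic matrix, assumed precalibrated.
    def estimate_camera_pose(object_corners, image_corners, K):
        obj = np.asarray(object_corners, dtype=np.float64)
        img = np.asarray(image_corners, dtype=np.float64)
        ok, rvec, tvec = cv2.solvePnP(obj, img, K, None)
        if not ok:
            raise RuntimeError("camera pose estimation failed")
        R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 matrix
        return R, tvec               # x_cam = R @ x_world + tvec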

5. Detection of scene changes

To search the image for mobile objects, we either search for objects with special markers or we use model-based object recognition principles.

1. When unique black squares are attached to all mobile real and virtual objects, and if we assume that the markers are manipulated on a set of known surfaces, we can automatically identify the markers and determine their 3D position and orientation by intersecting the rays defined by the positions of the squares in the image with the three-dimensional surfaces on which they lie (see the sketch after this list).

2. Real objects can also be tracked with a model-based object recognition approach, e.g., to find new pieces on the Tic Tac Toe board. From the image calibration, the locations of the game board and of the already placed pieces are known. The system can then check very quickly and robustly which tiles of the board are covered with a new red stone, contrasting well against the white board. Error handling can consider cases in which users have placed no new stone or more than one new stone, or whether they have placed their stones on top of one of the computer's virtual stones.
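For case 1, determining a marker's 3D position reduces to a ray-plane intersection: the viewing ray through the marker's image position is cut with the known surface on which the marker moves. A minimal NumPy sketch, assuming a pinhole camera with intrinsics K and pose (R, t) such that x_cam = R x_world + t, and a surface plane given by a point p0 and unit normal n in world coordinates:

    import numpy as np

    def marker_position_on_plane(u, v, K, R, t, p0, n):
        # Back-project pixel (u, v) into a ray direction (camera frame).
        d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
        # Express the ray in world coordinates; the camera center
        # serves as the ray origin.
        origin = -R.T @ t
        d_world = R.T @ d_cam
        # Solve origin + s * d_world for the point on the plane
        # defined by n . (x - p0) = 0.
        s = np.dot(n, p0 - origin) / np.dot(n, d_world)
        return origin + s * d_world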

Both approaches have their merits and problems. Attaching markers to a few real objects is an elegant way of keeping track of objects even when both the camera and the objects move. The objects can have arbitrary textures that do not even have to contrast well against the background, as long as the markers can be easily detected. Yet, the markers take up space in the scene; they must not be occluded by other objects unless the attached object becomes invisible as well. Furthermore, this approach requires a planned modification of the scene, which generally cannot be arranged for arbitrarily many objects. Thus it works best when only a few well-defined objects are expected to move. In a sense, the approach is in an equivalence class with other tracking modalities for mobile objects that require special modifications, such as magnetic trackers or barcode readers.

Model-based object recognition is a more general approach since it does not require scene modifications. Yet, the detection of sophisticated objects with complex shape and texture has been a long-standing research problem in computer vision, consuming significant amounts of processing power. Real-time solutions for arbitrarily complex scenes still need to be developed.

Thus, the appropriate choice of algorithm depends on the application requirements. Hybrid approaches that include further information sources, such as stationary overhead surveillance cameras that track mobile objects, are most likely to succeed.

6. Virtual GUIs in the real world

Rather than replicating a 2D interface on a wearable monitor, we embed GUI widgets into the 3D world. Such an approach has a tremendous amount of virtual screen space at its disposal: by turning their heads, users can shift their attention to different sets of menus. Furthermore, the interface can be provided in the three-dimensional context of the tasks to be performed. Users may thus remember their location more easily than by pulling down several levels of 2D menus.

As a first step, we demonstrate the use of 3D buttons and message boards. When virtual 3D buttons become visible in an image, the associated image area becomes sensitive to user interaction. By comparison with a reference image, the system determines whether major pixel changes have occurred in that area because a user has waved a hand across it. Such an approach works best for stationary cameras or small amounts of camera motion, and when the button is displayed in a relatively homogeneous image area.
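A hedged sketch of such a change-detection test follows, using OpenCV image differencing; the thresholds and the rectangle representation are illustrative assumptions, not values from the paper.

    import cv2
    import numpy as np

    # Decide whether a hand was waved across a virtual button.
    # frame, reference: grayscale images of identical size.
    # roi: (x, y, w, h) image rectangle onto which the button projects.
    def button_activated(frame, reference, roi, diff_thresh=30, frac=0.2):
        x, y, w, h = roi
        diff = cv2.absdiff(frame[y:y+h, x:x+w],
                           reference[y:y+h, x:x+w])
        changed = np.count_nonzero(diff > diff_thresh)
        # Activated if enough pixels in the region changed noticeably.
        return changed > frac * w * h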


3D GUIs are complementary to other input modalities such as spoken commands and gestures. Sophisticated user interfaces will offer combinations of all user input schemes.

7. Scene augmentation accounting for occlusions due to dynamic user hand motions

To integrate the virtual objects correctly into the scene, occlusions between real and virtual objects must be considered. We use a 3D model of the real objects in the scene to initialize the z-buffer. During user interactions, the hands and arms of a user are often visible in the images, covering up part of the scene. Such foreground objects must be recognized because some virtual objects could be located behind them and are thus occluded by them. We currently use a simple change-detection approach to determine foreground objects, comparing the current image to a reference frame while the camera does not move. Z-buffer entries of foreground pixels are then set to a fixed foreground value. In the Tic Tac Toe game, this algorithm allows users to occlude the virtual "Go" button during a hand-waving gesture (Fig. 2b).
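The foreground step can be sketched as follows; the image differencing, the threshold, and the depth constant are illustrative assumptions, and a real implementation would write the depth values through the graphics API rather than into a NumPy array.

    import cv2
    import numpy as np

    NEAR_DEPTH = 0.0   # illustrative "fixed foreground value"

    # zbuffer: depth image already initialized from the 3D model of the
    # real scene (not shown). Pixels where the current frame differs
    # from the reference frame (e.g., the user's hand) receive a fixed
    # near depth, so virtual objects behind them fail the depth test
    # and appear occluded.
    def occlude_foreground(zbuffer, frame, reference, diff_thresh=30):
        diff = cv2.absdiff(frame, reference)   # simple change detection
        zbuffer[diff > diff_thresh] = NEAR_DEPTH
        return zbuffer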

8. Summary

How will AR actually be used in real applications once the most basic technological issues regarding high-precision tracking, fast rendering and mobile computing have been solved? In this paper, we have presented two demonstrations illustrating the need for a new set of three-dimensional user interface concepts which require that computers be able to track changes in the real world and react appropriately to them. We have presented computer vision-based approaches that address the problems of tracking mobile real objects in the scene, of providing three-dimensional means for users to manipulate virtual objects, and of presenting three-dimensional sets of GUIs. Furthermore, we have discussed the need to detect foreground objects such as a user's hands. Our demonstrations illustrate the overall 3D human-computer interaction issues that need to be addressed. Building upon these approaches towards more complete solutions will generate the basis for exciting AR applications.

Acknowledgements

This research is financially supported by the European CICC project (ACTS-017). The model of St. Paul's Cathedral is from Platinum Pictures (http://www.3dcafe.com).

References

[1] Webster A, Feiner S, MacIntyre B, Massie W, Krueger T. Augmented reality in architectural construction, inspection, and renovation. Proceedings of the Third ASCE Congress on Computing in Civil Engineering, Anaheim, CA, 1996. p. 913–9.

[2] Rose E, Breen D, Ahlers KH, Crampton C, Tuceryan M, Whitaker R, Greer D. Annotating real-world objects using augmented reality. In: Computer graphics: developments in virtual environments. New York: Academic Press, 1995.

[3] Klinker G, Stricker D, Reiners D. Augmented reality: a balancing act between high quality and real-time constraints. In: Ohta Y, Tamura H, editors. Mixed reality: merging real and virtual worlds. Proceedings of the 1st International Symposium on Mixed Reality (ISMR '99), March 9–11, 1999.

[4] Ullmer B, Ishii H. The metaDESK: models and prototypes for tangible user interfaces. UIST '97, Banff, Alberta, Canada, 1997. p. 223–32.
