
0097-8493/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S0097-8493(99)00108-9

Computers & Graphics 23 (1999) 821–825

Augmented Reality

A distributed device diagnostics system utilizing augmented reality and 3D audio

Reinhold Behringer*, Steven Chen, Venkataraman Sundareswaran, Kenneth Wang, Marius Vassiliou

Rockwell Science Center, 1049 Camino Dos Rios, Thousand Oaks, CA 91360, USA

Abstract

Augmented reality (AR), combining virtual environments with the perception of the real world, can be used to provide instructions for routine maintenance and error diagnostics of technical devices. The Rockwell Science Center (RSC) is developing a system that utilizes AR techniques to provide "X-ray vision" into real objects. The system can overlay 3D rendered objects, animations, and text annotations onto the video image of a known object. An automated speech recognition system allows the user to query the status of device components. The response is given as an animated rendition of a CAD model and/or as auditory cues using 3D audio. This diagnostics system also allows the user to leave spoken annotations attached to device modules as ASCII text. The position of the user/camera relative to the device is tracked by a computer-vision-based tracking system using fiducial markers. The system is implemented on a distributed network of PCs, utilizing standard commercial off-the-shelf (COTS) components. © 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Augmented reality; Device diagnostics; 3D audio

1. Context and related work

During the past few years, rapid progress in several key areas has enabled the development of information systems using augmented reality (AR) technology. AR provides a means of intuitive information presentation for enhancing situational awareness and perception by exploiting the natural and familiar human modalities of interaction with the environment. The concept of AR has been demonstrated in many applications [1] such as airplane manufacturing [2] and guided assembly [3]. Critical for the success of an AR application is solving the registration problem, i.e. aligning the user's perception of the real world with the information display.

Compressed version of a talk presented at the 5th Eurographics Workshop on Virtual Environments (Vienna, June 1999).

*Corresponding author. Tel.: +1-805-373-4435; fax: +1-805-373-4862.

E-mail address: rwbehringer@rsc.rockwell.com (R. Behringer)

Vision-based tracking can potentially provide very high tracking accuracy, although the general problem of reliable 3D motion estimation from image features remains largely unsolved in computer vision. However, by restricting attention to the sub-problem of easily identifiable landmarks, the motion estimation problem can be solved (e.g., [4]).

The Rockwell Science Center (RSC) is developing and integrating components into a system that uses AR techniques for visualization and auralization during maintenance and error diagnostics procedures. It can overlay textual information and animations, registered to real objects, onto live video of the actual device being diagnosed; ultimately, this display will be overlaid directly into the user's field of view. 3D audio rendering techniques are used to indicate the location of objects which are currently not in the user's field of view. To provide a non-tethered human–computer interface, the system is operated by speaker-independent speech recognition. We have developed a visual fiducial marker tracking module for real-time registration which is based on the mathematical formalism of visual servoing.

Fig. 1. Schematic representations of the fiducial ring markers used for visual feature tracking.

One of the novel features of our concept is the integration of computer vision, speech recognition, AR visualization, and 3D audio in a distributed networked PC environment.

2. System components

2.1. Head tracking: visual servoing with fiducial markers

The "ducial markers (each with its unique ID) used bythe visual tracking system (Fig. 1) have been designed foreasy detectability in clutter and under a wide range ofviewing angles. Circular markers with concentric ringshave a high degree of symmetry, which allows the ap-plication of a simple viewpoint-invariant detection algo-rithm. Similar markers are used by Neumann [5], whodeveloped a color-code scheme for ring marker identi-"cation, and Sharma [3] who uses the con"gurationpattern of a set of markers for detecting ID and orienta-tion of the marker set. The method which we are using fordetecting motion of the observing camera, is visual servo-ing. It is a well-developed theory for robotic vision [6}8].Our application of the visual servoing approach has beenoutlined in [9] and basically tries to minimize the errorbetween actual position of image features and the pre-dicted positions.

2.2. The automatic speech recognition (ASR) server

Rockwell Science Center's automatic speech recognition (ASR) server software provides an easy way to rapidly prototype speech-enabled applications, regardless of the computing platform(s) on which they execute. The speech recognition and text-to-speech synthesis (TTS) capabilities are obtained through abstraction of a commercially available off-the-shelf speaker-independent speech recognition technology, IBM ViaVoice™. A client application connects to the ASR server over an IP network using TCP sockets. Speech audio data are currently acquired locally through a sound card installed in the PC on which the ASR server executes, although an add-on option for uncompressed streaming audio over IP networks has been developed to allow the microphone to reside on another computer. Recognition results are reported to the client application immediately (per word) and/or upon the completion of a whole utterance (sentence). In the device diagnostics system described in this paper, speech is used both to control the operation (initialization, mode switching) and to ask questions about device modules, such as the query "Where is the power supply?". Moreover, the dictation recognition mode is exploited to attach textual "virtual notes" to selected objects in the virtual environment. Recognition is started by a trigger button pressed by the user.
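As an illustration of how a client might talk to such a server, the sketch below opens a TCP connection and reads results line by line. The paper states only that clients connect via TCP sockets over IP; the host name, port, commands, and line-based reply protocol here are purely hypothetical.

    import socket

    def recognize_utterance(host="asr-server", port=5000):
        # Connect to the ASR server and stream recognition results.
        # "START_RECOGNITION" and the "WORD:"/"UTTERANCE:" markers are
        # illustrative; the actual RSC protocol is not published.
        with socket.create_connection((host, port)) as sock:
            sock.sendall(b"START_RECOGNITION\n")
            buf = b""
            while True:
                data = sock.recv(1024)
                if not data:
                    return None
                buf += data
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    if line.startswith(b"WORD:"):
                        # Per-word results arrive immediately.
                        print("word:", line[5:].decode())
                    elif line.startswith(b"UTTERANCE:"):
                        # Final result for the whole sentence, e.g.
                        # "Where is the power supply?"
                        return line[10:].decode()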

2.3. AR visualization

The AR visualization provides visual rendering (performed by the Sense8/EAI WorldToolkit library) of the device which is to be diagnosed, and of its components. The following items can be overlaid in real time on the live video image: a CAD wireframe model of the outer device shape, CAD models of the interior components of the device, and textual annotations attached to components of the device. The CAD models are overlaid onto the video image of the device and provide a kind of "X-ray vision" into the interior of the device. They are animated to blink between shaded and wireframe rendering in order to highlight the corresponding module of the device (see Fig. 2).
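The blinking highlight can be realized as a time-driven toggle in the render loop, as in the minimal sketch below. The renderer callbacks stand in for the WorldToolkit calls the system actually uses; the callback names and the blink period are our assumptions.

    import time

    BLINK_PERIOD = 0.5  # seconds per rendering style (assumed value)

    def draw_overlay(modules, highlighted, render_shaded, render_wireframe):
        # Draw all CAD modules over the current video frame; the module
        # being highlighted alternates between shaded and wireframe
        # rendering, all others stay wireframe ("X-ray vision").
        shaded_phase = int(time.time() / BLINK_PERIOD) % 2 == 0
        for module in modules:
            if module is highlighted and shaded_phase:
                render_shaded(module)
            else:
                render_wireframe(module)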

2.4. 3D audio auralization

A three-dimensional (3D) audio system provides an auditory experience in which sounds appear to emanate from locations in 3D space. Head-related transfer functions (HRTFs) [10] are basically filters incorporating the effects of the sound signal reflecting off the listener's head, shoulders, and pinnae (outer ears). In our device diagnostics system, we use the commercial off-the-shelf (COTS) Aureal Semiconductor 3D audio system. It comprises a software application programming interface (API) based on the Microsoft DirectSound/DirectSound3D standard as well as a PC sound card including a chip designed and manufactured by Aureal. The 3D audio is output over a pair of headphones or two speakers. In order to provide 3D audio capability to non-platform-specific applications, a TCP/IP sockets server (the RSC 3DA server) was developed. The sound source signals are stored as wave files on the 3DA server host PC. The 3DA server operates at 30 fps, and current COTS 3D audio sound cards support up to three 44.1 kHz-sampled sound sources.

Fig. 2. Left: Concept of the current implementation of the device diagnostics system. Right: Registered wireframe and shaded overlay onto the video image.
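A client of the 3DA server described above could be as simple as the sketch below, which asks the server to spatialize a stored wave file at a head-relative position. The one-line text command is an assumption; the paper specifies only that the server is reached via TCP/IP sockets and plays wave files stored on its host.

    import socket

    def place_sound(name, x, y, z, host="3da-server", port=6000):
        # Ask the 3DA server to render the named wave file as a sound
        # source at (x, y, z) in head-relative coordinates (meters).
        # The "PLAY" command format is hypothetical.
        with socket.create_connection((host, port)) as sock:
            sock.sendall(f"PLAY {name} {x:.2f} {y:.2f} {z:.2f}\n".encode())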

3. System integration

We used a standard tower PC as the example device which was to be "diagnosed". A CAD model was hand-coded to describe the outer geometry and a few inner components of the PC: CD ROM drive, network card, power supply, and structural components. The AR visualization is implemented on a 200 MHz PC running under Windows NT. This PC is equipped with an Imaging Technology color framegrabber, connected to a Cohu 2200 CCD camera. The following device errors in the PC being "diagnosed" were simulated: CD ROM failure, power supply overheating, and network card failure. The speech recognition server (ASR server) and the 3D audio server both run on another PC (166 MHz) under Windows 95. The user can query by voice command the location and status of the CD ROM drive, network card, and power supply. A flashing animation, overlaid on the video, visualizes the location. The user can also query the locations of the printer and an uninterruptible power supply (UPS). These locations are indicated by 3D audio cues: spatialized sounds "move" in the direction of the location and guide the user.
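One plausible way to produce such a moving cue, reusing the hypothetical place_sound client from the sketch in Section 2.4, is to interpolate the source position from the listener's head toward the target at the server's 30 fps update rate; the one-second sweep duration is an assumed parameter.

    import time

    def guide_toward(target, steps=30, rate_hz=30.0):
        # Sweep a spatialized sound from the head origin toward `target`
        # (x, y, z in head-relative coordinates) so the cue appears to
        # "move" in the direction of the off-screen object.
        for i in range(1, steps + 1):
            t = i / steps
            place_sound("cue.wav",
                        t * target[0], t * target[1], t * target[2])
            time.sleep(1.0 / rate_hz)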

4. Experimental results

The visual AR overlay is rendered at a frame rate between 6 and 10 fps, depending on the system load. The frame rate is limited by the image transfer implementation in Windows, which is currently not optimized for speed. Figs. 3 and 4 show various display modalities of the visual overlay on the video image. Fig. 3 shows the wireframe overlay from two different directions.


Fig. 3. Overlay of wireframe onto video stream from different directions.

Fig. 4. Overlay of error indication onto video image: CD ROM and power supply.

The overlay matches the real object very well over a wide range of viewing angles. Not surprisingly, the registration error depends on the number (and the location) of the recognized fiducial markers. The alignment/registration is slightly off for more extreme viewing angles where only markers in one plane are clearly visible; the markers in the other plane are too slanted for robust recognition. However, the range of acceptable registration is quite large, and the error is acceptable. In Fig. 4, the device status is shown by a flashing volumetric rendering of the relevant PC components.

5. Summary and conclusion

We have demonstrated the capability of a distributed, networked system for device diagnostics as a novel, tetherless interface for human–computer interaction. The distributed architecture of the system ultimately allows for scalability and enables the user to be equipped with only a light, wearable computer system with wearable display capabilities. One day, such a system could replace service manuals by providing direct overlay of maintenance instructions or error diagnoses onto the view of a real object.

References

[1] Azuma RT. A survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 1997;6(4):355–85.

[2] Mizell D. Virtual reality and Augmented Reality in aircraft design and manufacturing. In: Proceedings of the Wescon Conference, Anaheim, CA, September 1994. p. 91ff.


[3] Sharma R, Molineros J. Computer vision-based Augmented Reality for guiding manual assembly. Presence: Teleoperators and Virtual Environments 1997;6(3):292–317.

[4] Koller D, Klinker G, Rose E, Breen D, Whitaker R, Tuceryan M. Real-time vision-based camera tracking for Augmented Reality applications. In: Proceedings of VRST '97, Lausanne, Switzerland, September 1997.

[5] Neumann U, Cho Y. Multi-ring fiducial systems for scalable fiducial Augmented Reality. In: Proceedings of VRAIS '98, Atlanta, March 1998.

[6] Espiau B, Chaumette F, Rives P. A new approach to visual servoing in robotics. IEEE Transactions on Robotics and Automation 1992;8(3):313–26.

[7] Feddema J, Mitchell O. Vision-guided servoing with feature-based trajectory generation. IEEE Transactions on Robotics and Automation 1989;5(5):691–700.

[8] Sundareswaran V, Bouthemy P, Chaumette F. Exploiting image motion for active vision in a visual servoing framework. International Journal of Robotics Research 1996;15(6):629–45.

[9] Sundareswaran V, Behringer R. Visual servoing-based Augmented Reality. In: Proceedings of the First International Workshop on Augmented Reality (IWAR '98), San Francisco, CA, November 1998.

[10] Blauert J. Spatial hearing: the psychophysics of human sound localization. Cambridge, MA: MIT Press, 1997.
