
Interactive Multi-Marker Calibration for Augmented Reality Applications

Gregory Baratoff, Alexander Neubeck, Holger Regenbrecht
DaimlerChrysler Research & Technology

Virtual and Augmented Environments
P.O. Box 2360, 89013 Ulm, Germany

E-mail: {Gregory.Baratoff|Alexander.Neubeck|Holger.Regenbrecht}@dcx.com

Abstract

Industrial Augmented Reality (AR) applications require fast, robust, and precise tracking. In environments where conventional high-end tracking systems cannot be applied for certain reasons, marker-based tracking can be used with success as a substitute if care is taken about (1) calibration and (2) run-time tracking fidelity. In out-of-the-laboratory environments multi-marker tracking is needed because the pose estimated from a single marker is not stable enough. The overall pose estimation can be dramatically improved by fusing information from several markers fixed relative to each other, compared to using a single marker only. To achieve results applicable in an industrial context, the relative marker poses need to be properly calibrated. We propose a semi-automatic image-based calibration method requiring only minimal interaction within the workflow. Our method can be used off-line, or preferably incrementally online. When used online, our method shows reasonably good accuracy and convergence with a workflow interruption of less than one second per incremental step. Thus, it can be used interactively. We illustrate our method with an industrial application scenario.

1 Introduction

Augmented Reality (AR) is becoming the most promising technique to support three-dimensional tasks within today's working environments [18, 19, 15, 16]. Much research in this field is going on around the world to provide productive AR systems in the near future. For this reason researchers and developers are focusing on the main challenges of AR at this time to get them solved within a very short time frame, because significant benefits and profits are expected [14]. In this context, the importance of accurate and robust tracking of the user's head and of additional interaction devices is widely acknowledged [3, 4]. There are several approaches to tracking, and they can be roughly divided into computer-vision based inside-out tracking and (mechanical, ultrasonic, magnetic, optical) outside-in tracking methods. Outside-in tracking in particular is very well known from Virtual Reality (VR) systems and is used in many AR setups. In either case one main aspect of the applicability of a particular tracking system is the amount and kind of instrumentation of the real world required. Sensors, emitters, cameras, or artificial features have to be placed in the environment to enable tracking. This instrumentation is a main concern especially with applications intended to be used in industrial environments. Although there are some research approaches that use the appearance or specific properties of the real environment to allow tracking without the need for instrumentation (e.g. marker-less tracking using pictures of the real environment as natural features [20, 11]), no actually usable solution exists at this point in time.

The research presented here addresses a problem which arose in a specific industrial scenario. This scenario comes from the airplane industry, but it is highly representative of the overall problem of instrumentation, tracking fidelity, and calibration effort. We are working with the airplane manufacturer Airbus [1] on a system which allows the interpretation of Computational Fluid Dynamics (CFD) data in the passenger seating area of a real airplane cabin. The data is provided by a special CFD simulation software and needs to be displayed spatially correctly, aligned with the real world (cabin). The engineer interprets the data (e.g. air stream velocities surrounding the seats) by simply looking at the augmented data using a video-see-through head-mounted display with head tracking. Some interaction with the system (mainly system control) is needed, too, but it is not the focus of the research presented here. We have two target environments for our scenario:

1. a cabin substitute (mock-up) especially used for data visualization and interpretation of 3D CAD or CFD data, and



2. the real airplane environment with a partial or complete interior built for the customer.

Figure 1. Airbus cabin with volume data set.

We have to provide solutions for both environments. For the first environment (mock-up) we have solved the tracking problem by using a commercially available outside-in optical tracking system [2] with high accuracy and robustness. The system uses three external cameras rigidly mounted in the environment, and tracks the user's head via a retro-reflective rigid body mounted to a helmet. The helmet also holds the head-mounted display and a mini camera for video-see-through AR. With this setup we achieve very good tracking results, allowing the engineer's task within this environment to be successfully performed (see figure 1). However, since we cannot instrument an airplane cabin with cameras, it was not possible to transfer the AR application to the second environment (real airplane). We had to apply a different tracking method that would allow the task to be completed by the engineer. The possibilities of instrumenting an airplane in any manner are very limited, for safety and for aesthetic reasons. It is almost impossible to permanently place extra elements within an airplane cabin for tasks like the one described here. After evaluating different technologies we decided on the use of markers which are temporarily placed by the engineer in the cabin. The markers are attached to walls, seats, the floor, head racks, etc., using double-sided tape on the back, and can be removed at any time without damaging the cabin interior. The resulting workflow for the engineer now consists of the following steps:

1. Bringing the (portable) AR system to the cabin,

2. attaching markers in the space to be evaluated,

3. calibrating the markers with respect to each other to define one mutual reference coordinate system,

4. calibrating the transformation to fit/match the virtual world coordinate system with the real world,

5. displaying and interpreting the CFD data, and

6. removing the markers and leaving the cabin.

The most crucial part of this procedure is step (3): multi-marker calibration. This is the case for most applications using multiple markers. Often these applications use other tracking technologies because of the difficulties in properly calibrating the markers. While keeping the whole process in view, this paper focuses on a solution for the fast and reliable calibration of multiple markers, especially for head tracking.

We first motivate the need for automatic calibration from a user's point of view and discuss relevant related work. In section 3, we introduce the pose estimation methods used for tracking, a fast linear one for initialization and an iterative refinement strategy for enforcing nonlinear constraints. Then, in section 4 we present our marker calibration method, which is used to compute a consistent global marker-in-world model. In section 5 we present the results of several experiments that show the accuracy and the efficiency of our calibration scheme. Finally, in section 6 we discuss the merits of our approach and outline further extensions.

2 Motivation

As already indicated, Augmented Reality applications require accurate and robust tracking in real-time in order to seamlessly integrate virtual with real content. Marker-based visual tracking, as exemplified by the AR-toolkit software [12], has become one of the most popular tracking techniques because it features real-time operation, potentially high accuracy, and low system cost (requiring only a camera, a frame-grabber, and some markers in addition to a current-day PC). To use this technology effectively and efficiently we need to calibrate the whole system with care. These calibration tasks include:

1. Camera calibration (intrinsic camera parameters),

2. Single marker calibration in size, appearance (here: pattern), and visual robustness (marker material, lighting),

3. Calibration of multiple markers to form a single coordinate system for head tracking, and

4. Matching of marker and real world coordinates.

The multi-marker calibration which we focus on in this paper addresses point 3 above, and can be approached with different techniques. It consists in estimating the relative poses (translation plus rotation) between all markers.



The first and most obvious method is measurement by hand. Using a ruler and a protractor or similar utensils one can measure the poses. This method is neither fast and comfortable nor accurate. Especially angles in space are difficult to measure accurately. A second, very accurate, method is the use of some kind of digitizer: a device which can measure positions in 3D space, such as a conventional tracking system or a laser scanner. Unfortunately, we cannot apply this method because of the heavy instrumentation of the real world and because the amount of time one has to spend for one setup is unacceptable (see scenario above). Another way is to take pictures of the scene and feed them into some (commercially available) photogrammetry software for post-processing. This yields quite accurate results, but the workflow of the engineer is interrupted because of the very time-consuming task of photogrammetric computation (which is usually done in a semi-automatic way on a computer outside the AR environment). The fourth and most comfortable way for the engineer is the online application of photogrammetric computation as part of the workflow. The same camera already used for video-see-through augmentation and for marker detection is used to capture on-the-fly the pictures needed to compute the relative marker poses. By inserting the task of taking appropriate snapshots of the scene with markers we allow the engineer to remain within his workflow. In our system, the estimation of the marker pose model is started when the first snapshot is taken and proceeds incrementally, refining the current poses by integrating additional snapshots as soon as they become available. In this way, the system has a (partial) estimate of the marker poses at any given time. This interactivity and online availability is invaluable for the engineer, since it enables him to judge the quality of the calibration on the spot. If a particular CFD visualization and interpretation task is to be performed again later, it is also possible to further optimize the calibration off-line, where time constraints are not as relevant. This kind of procedure is of general importance for a majority of industrially oriented applications using multiple markers and is not limited to our scenario.

3 Model-based Pose Estimation

In our AR system, we employ square markers of known size for tracking. These markers are used for tracking different kinds of objects: small interaction devices with a single marker attached, larger objects with several markers attached, and finally head tracking by means of markers rigidly attached in the environment. For tracking of individual markers we solve the pose estimation problem for each quadruple of corner points associated with the same marker. We rely on the AR-toolkit software for marker detection and for the extraction of the image corners. The redundancy provided by using multiple markers per object can improve tracking performance in both qualitative and quantitative ways. Firstly, objects are tracked even if some of the markers are occluded, as long as one marker per object is visible. Secondly, the accuracy and stability are potentially increased by the fusion of information from several markers. For this purpose, the pose of each marker needs to be known in some object (or, in the case of head tracking, in the world) coordinate system. The individual 3D model features of each marker are first transformed to the object/world coordinate system:

$$P_{4m+i} = R_m P_{m,i} + t_m, \quad i = 1,\dots,4 \qquad (1)$$

where $P_{m,i}$ is the $i$th model point of marker $m$, $(R_m, t_m)$ is the marker's pose, and $P_{4m+i}$ is the model point in object/world coordinates. Then, the pose of this compound model, consisting of $4M$ points, where $M$ is the number of detected markers, is estimated as a whole. This feature-level fusion is much more accurate than computing the pose from each marker individually followed by a fusion of the poses. Marker calibration, i.e. the computation of the marker poses $(R_m, t_m)$ in the object/world coordinate system, is the main focus of this paper. Our multi-view method used for doing so is described in the next section. However, since it is a more general case of the single-view pose estimation problem, we first describe the latter.
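To make the construction of the compound model concrete, the following Python/NumPy sketch applies equation (1) to a set of calibrated marker poses. The data layout (a list of pose tuples and a list of per-marker corner arrays) is an assumption made for illustration, not the data structure of the original system.

```python
import numpy as np

def compound_model(marker_poses, marker_corners):
    """Transform the four corner points of every marker into the
    object/world coordinate system, eq. (1): P_{4m+i} = R_m P_{m,i} + t_m.

    marker_poses   : list of (R_m, t_m), with R_m a 3x3 rotation and t_m a 3-vector
    marker_corners : list of (4, 3) arrays, the corners of marker m in its
                     own marker coordinate system
    returns        : (4*M, 3) array of compound model points
    """
    points = []
    for (R_m, t_m), P_m in zip(marker_poses, marker_corners):
        points.append(P_m @ R_m.T + t_m)  # row-wise R_m P_{m,i} + t_m
    return np.vstack(points)
```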

The model-based pose estimation problem for a single image consists in finding the pose (translation $t$ and rotation $R$) that maps a set of known 3D model points $P_i \in \mathbb{R}^3$ to the set of observed image points $p_i \in \mathbb{R}^2$, $i = 1,\dots,N$:

$$p_i = \Pi(R P_i + t), \quad i = 1,\dots,N \qquad (2)$$

where $\Pi(x) = (x)_{1..2} / (x)_3$ is the projection operator. In practice the image measurements are noisy, and one usually minimizes some form of error, e.g. the image reprojection error:

$$E(R, t) = \sum_{i=1}^{N} \left\| p_i - \Pi(R P_i + t) \right\|^2 \qquad (3)$$
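The error of equation (3) can be evaluated directly; the short Python/NumPy sketch below does so, assuming the image points are given in normalized (intrinsically calibrated) camera coordinates. The function name and data layout are illustrative only.

```python
import numpy as np

def reprojection_error(R, t, P, p):
    """Squared image reprojection error of eq. (3) for a candidate pose (R, t).

    P : (N, 3) array of known 3D model points
    p : (N, 2) array of observed image points (normalized coordinates)
    """
    X = P @ R.T + t                 # model points transformed into the camera frame
    proj = X[:, :2] / X[:, 2:3]     # projection Pi(x) = (x)_{1..2} / (x)_3
    return float(np.sum((p - proj) ** 2))
```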

Pose estimation is an old problem in the photogrammetry and computer vision fields, and many methods have been proposed to solve it. Minimal methods [8, 10] give closed-form solutions for the minimal case of $N = 3$, resp. $N = 4$, points. These methods have the disadvantage that they cannot easily be extended to a larger number of points, except through computationally expensive hypothesis-and-test strategies [7]. Linear methods solve the pose estimation problem using fast linear algebra computations. This is made possible by relaxing some constraints (e.g. that the recovered transformation should be a rigid motion) and by minimizing an algebraic error instead of the more meaningful geometric error [9]. Also, linear methods usually require more than the minimum number of points.



For example, the direct linear transform (DLT) method requires a minimum of 6 points, making it unsuitable for pose estimation from square markers. However, in contrast to the minimal methods, linear methods can easily incorporate any number of additional points above that limit. Finally, nonlinear iterative methods allow the pose to be determined from any number of points, while minimizing the meaningful geometric error. Nonlinear methods require an initial solution, which is obtained by applying either a minimal or a linear method. They yield the most accurate results but are also the most computationally expensive.

Pose estimation is used in two places in our system: during on-line tracking and during off- or on-line calibration. During tracking, the camera pose with respect to the markers has to be estimated in real-time, while during calibration the relative marker poses have to be determined. Marker calibration has to be as accurate as possible, since it leads to higher accuracy and robustness during tracking. Thus, we need both a fast method for use during tracking and as an initial solution for calibration, and a more accurate method for calibration. In fact, as we will show in section 5, the calibration method we propose is fast enough to allow interactive use.

For the initial solution we use Fiore's linear method [6]. It is more efficient than the DLT, and only requires 4 points in the planar case (as opposed to 6 for the DLT), making it suitable for tracking square markers. In order to improve the accuracy and to enforce the rigid motion constraint, we subsequently apply an iterative method to minimize the (geometric) image error. Let the current pose estimate be denoted by $(R^k, t^k)$, with the initial pose (obtained by the linear method) corresponding to $(R^0, t^0)$. One step of the method consists in finding an improved estimate

$$R^{k+1} = R^k \Delta R^k \qquad (4)$$
$$t^{k+1} = t^k + \Delta t^k \qquad (5)$$

where $(\Delta R^k, \Delta t^k)$ is the pose increment with respect to the previous estimate. For notational convenience we drop the superscript $k$ for the increment. However, it should be kept in mind that at each iteration a different increment is computed. If we replace $(R, t)$ by $(R^{k+1}, t^{k+1})$ in (3), we obtain a nonlinear error function in $(\Delta R, \Delta t)$. In order to obtain a linear solution, we apply two approximations. The first one is the small angle approximation (as in [13]) $\Delta R = I + [v]_\times$, where

$$[v]_\times = \begin{pmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{pmatrix} \qquad (6)$$

is the cross-product matrix, i.e. $[v]_\times u = v \times u$. Next, we expand the expression for the transformed point:

$$P'_i(v, \Delta t) = R^{k+1} P_i + t^{k+1} \qquad (7)$$
$$= R^k (P_i + v \times P_i) + t^k + \Delta t \qquad (8)$$
$$= (R^k P_i + t^k) - R^k [P_i]_\times v + \Delta t \qquad (9)$$

which reveals its linear dependence on the pose increment $(v, \Delta t)$. We now substitute the above linearized expression in (3). The second approximation consists in multiplying each term by $P'_i(v,\Delta t)_3 / P'_i(0,0)_3$, which is acceptable for small increments. This gives us the following approximate image error expression:

$$\sum_{i=1}^{N} \left\| \frac{1}{P'_i(0,0)_3} \left( P'_i(v,\Delta t)_3\, p_i - (P'_i(v,\Delta t))_{1..2} \right) \right\|^2 \qquad (10)$$

The advantage of this formulation is that the error is quadratic in $(v, \Delta t)$, and thus its derivative is linear. We obtain the pose increment by solving a linear equation and finish the iteration by updating the pose estimate. This procedure is repeated until convergence. The required number of iterations depends on the goodness of the initial solution and on the tightness of the convergence criterion. For on-line tracking we have found that one or two iterations are sufficient.
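A minimal sketch of one such refinement iteration is given below, following the linearization of equations (6)-(10): it sets up the linear least-squares problem in the increment $(v, \Delta t)$, solves it, and updates the pose. The re-orthonormalization of the small-angle update via an SVD projection is an implementation choice added here to keep the estimate a valid rotation; it is not prescribed by the paper, and the variable names are illustrative.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x of eq. (6), so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def refine_pose_step(R, t, P, p):
    """One linearized pose refinement step (eqs. (4)-(10)).

    P : (N, 3) model points, p : (N, 2) observed image points
    (normalized camera coordinates). Returns the updated pose (R, t).
    """
    A = np.zeros((2 * len(P), 6))
    b = np.zeros(2 * len(P))
    e3 = np.array([0.0, 0.0, 1.0])
    for i, (Pi, pi) in enumerate(zip(P, p)):
        c = R @ Pi + t            # current transformed point P'_i(0, 0)
        B = -R @ skew(Pi)         # derivative of P'_i with respect to v, from eq. (9)
        w = c[2]                  # depth used for the normalization in eq. (10)
        for k in range(2):        # one residual per image coordinate
            row = 2 * i + k
            A[row, :3] = (pi[k] * B[2] - B[k]) / w
            A[row, 3:] = (pi[k] * e3 - np.eye(3)[k]) / w
            b[row] = -(pi[k] * c[2] - c[k]) / w
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    v, dt = x[:3], x[3:]
    # Small-angle update dR = I + [v]_x, projected back onto the rotation group.
    U, _, Vt = np.linalg.svd(np.eye(3) + skew(v))
    return R @ (U @ Vt), t + dt
```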

4 Marker Calibration

In the previous section we described how object- or head-tracking from multiple markers is performed. Here, we describe a method for determining the marker poses in the object/world coordinate system from multiple views. Theoretically, a single view would suffice if all markers were seen in it. However, in practice it is usually hard to find a position from which all markers are seen. Also, the accuracy of the pose is much higher if a marker is seen from several views. As in the single-view case, we first compute an initial solution in closed form, and then refine this estimate using an iterative method that minimizes the image reprojection error over all marker points in all views. For the refinement step we try to keep the simplicity of the linearized method used for the single-image case.

Let $M_j$ be the set of markers seen in view $j$. The marker detection and corner extraction step yields the image points $\{p^j_{m,i}\}_{i=1..4}$ for each visible marker $m \in M_j$, with corresponding 3D model points $\{P_{m,i}\}$. From this, we compute the marker-in-camera poses $(R^j_m, t^j_m)$ using the linear method from section 3.

If we want to compute the marker-in-world poses, which we denote by $(R_m, t_m)$, at least one of them needs to be known, since from the images we can only determine the marker-in-camera poses and, by chaining two together, the relative marker-to-marker poses. We resolve this indeterminacy by designating one marker the "world marker". For simplicity, this will be the marker with index 0. Its pose is given by $R_0 = I$, $t_0 = 0$ and need not be computed.



In addition to the marker poses, our marker calibration approach also requires us to compute the camera-in-world poses, one for each view. These poses are of course not of interest as a final result, but are so-called "nuisance" parameters that need to be estimated. In fact, we will estimate not the camera-in-world poses, but their inverses. This simplifies the notation in the error expression given below. We denote the world-in-camera poses by $(\bar{R}^j, \bar{t}^j)$, $j = 1,\dots,N$, with a bar to distinguish them from the marker poses.

The total (squared) image reprojection error to be minimized is:

$$E = \sum_{j=1}^{N} \sum_{m \in M_j} \sum_{i=1}^{4} E^j_{m,i} \qquad (11)$$

$$E^j_{m,i} = \left\| p^j_{m,i} - \Pi\!\left( \bar{R}^j (R_m P_{m,i} + t_m) + \bar{t}^j \right) \right\|^2 \qquad (12)$$

This error function consists of the $4\sum_{j=1}^{N} |M_j|$ individual error terms $E^j_{m,i}$, and it depends on $M$ marker poses and $N$ camera poses, for a total of $6(M + N)$ parameters. For the evaluation of the accuracy the total error is not very useful, since the number of data items and the number of unknowns grow with each new image taken. It is more meaningful to define a normalized error which gives the average error per image point. This normalized image reprojection error is given by

$$\bar{E} = \sqrt{\frac{2E}{8\sum_{j=1}^{N} |M_j| - 6(M + N)}} \qquad (13)$$

i.e. by the total error divided by the number of data terms minus the number of unknowns. In section 5 we use this error to measure the accuracy of the calibration.
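The sketch below evaluates the total error (11) and the normalized error (13) for a given set of marker and camera poses. The data layout (tuples and dictionaries) is an assumption for this example, and the observations are assumed to be in normalized camera coordinates.

```python
import numpy as np

def calibration_error(views, marker_poses):
    """Total (11) and normalized (13) reprojection error.

    views        : list of views; each view is a tuple (pose, markers) with
                   pose    = (R_bar, t_bar), the world-in-camera pose, and
                   markers = {m: (P_m, p_m)}, the (4, 3) marker-frame corners
                             and (4, 2) observed image points of marker m
    marker_poses : {m: (R_m, t_m)}, the marker-in-world poses
    """
    E, n_instances = 0.0, 0
    for (R_bar, t_bar), markers in views:
        for m, (P_m, p_m) in markers.items():
            R_m, t_m = marker_poses[m]
            X_world = P_m @ R_m.T + t_m          # marker corners in world coordinates
            X_cam = X_world @ R_bar.T + t_bar    # ... in the camera frame
            proj = X_cam[:, :2] / X_cam[:, 2:3]  # perspective projection
            E += np.sum((p_m - proj) ** 2)       # eq. (12), summed as in eq. (11)
            n_instances += 1
    dof = 8 * n_instances - 6 * (len(marker_poses) + len(views))
    return E, np.sqrt(2.0 * E / dof)             # eq. (13)
```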

4.1 Initial Solution

The computation of the initial solution can best be described by considering the graph structure defined by the set of markers and cameras. The markers and the cameras are the nodes of this graph, and its edges are given by the marker-in-camera poses $\{(R^j_m, t^j_m)\}$, $j = 1,\dots,N$, $m \in M_j$, that are directly measured in the images. Initially, only the world marker is marked as having a valid pose, namely $(R_0 = I, t_0 = 0)$. The poses for the other markers and for the cameras are determined by propagating the information along the edges, first from the valid markers to the cameras in which these markers are seen, and then from the valid cameras to the markers which are seen by them. Thus, for all cameras that see a valid marker, the world-in-camera poses can be determined by concatenating the inverse of the marker-in-camera pose of a seen marker with that marker's marker-in-world pose:

$$\bar{R}^j = (R^j_m)^T R_m$$
$$\bar{t}^j = (R^j_m)^T t_m - (R^j_m)^T t^j_m$$

Then, for all markers that are seen by a valid camera, the marker-in-world pose can be computed by concatenating the marker-in-camera pose and the camera-in-world pose:

$$R_m = R^j_m \bar{R}^j$$
$$t_m = R^j_m \bar{t}^j + t^j_m$$

By alternating these two steps, and by marking the visited nodes as valid, all markers and cameras are eventually visited and assigned a world pose.

During expansion there will often be several paths to a node. The simplest strategy is a greedy one where only the first path to a node is expanded and where valid nodes are not revisited. More sophisticated strategies that take the "best" path or fuse paths are of course possible. However, we have found that the greedy strategy produces a good enough initial solution. This is especially so in the interactive calibration mode, where the iterative refinement method described in the next section is used to optimally fuse the information from all currently available views, and where the greedy strategy is only used to propagate the world pose information to the new camera node and to any markers seen for the first time.
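The greedy propagation can be written compactly as an alternating sweep over the marker/camera graph. The sketch below follows the two update equations above; the nested-dictionary input format is an assumption made for this example, not the interface of the original system.

```python
import numpy as np

def propagate_initial_poses(observations, world_marker=0):
    """Greedy initial solution by propagating poses over the marker/camera graph.

    observations : {j: {m: (R_jm, t_jm)}}, the marker-in-camera poses measured
                   in each view j
    returns      : ({m: (R_m, t_m)} marker-in-world poses,
                    {j: (R_bar_j, t_bar_j)} world-in-camera poses)
    """
    markers = {world_marker: (np.eye(3), np.zeros(3))}  # R_0 = I, t_0 = 0
    cameras = {}
    changed = True
    while changed:
        changed = False
        # Valid markers -> cameras: any view seeing a valid marker gets a world pose.
        for j, seen in observations.items():
            if j in cameras:
                continue
            for m, (R_jm, t_jm) in seen.items():
                if m in markers:
                    R_m, t_m = markers[m]
                    cameras[j] = (R_jm.T @ R_m, R_jm.T @ (t_m - t_jm))
                    changed = True
                    break
        # Valid cameras -> markers: any marker seen in a valid view gets a world pose.
        for j, (R_bar, t_bar) in cameras.items():
            for m, (R_jm, t_jm) in observations[j].items():
                if m not in markers:
                    markers[m] = (R_jm @ R_bar, R_jm @ t_bar + t_jm)
                    changed = True
    return markers, cameras
```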

4.2 Iterative Refinement

In order to minimize the multi-image reprojection error (11), we refine the current estimate iteratively in a similar way to the single-view case. Since we want to keep the simplicity of a linearized solution as obtained in the single-view case, we have to split each iteration into two steps. In the first step we update the cameras, and in the second step we update the markers. In both steps, we apply the same two approximations as in the single-view case and end up with linear solutions for the pose increments. We alternate between updating the camera and the marker poses and repeat this until convergence.

The asymptotic computational complexity of one iteration step is $O(\sum_{j=1}^{N} |M_j|)$, which corresponds to the number of instances of detected markers over all images (i.e. each marker is counted as many times as it appears in the set of images). This is basically the cost of computing the coefficients of the $N$ (resp. $M$) linear systems of equations in the camera (resp. marker) update step. There are $N + M$ of these systems of equations, and since the cost of solving each of them is constant, the contribution of this computation is dominated by the cost of setting up the systems of equations.

5 Experiments

In this section we show the results of applying our calibration method to the Airbus cabin mockup scenario.



Figure 2. Calibrated Airbus mockup. Shown are the first eight views of the calibration sequence. Result obtained using 23 images and 50 refinement iterations per image.



Figure 3. Convergence of marker calibration. The three graphs show the evolution over 50 refinement iterations of the normalized error (in pixels) for the 10th, 15th, and 20th image, respectively.

The mockup consists of two rows of two seats and the adjoining cabin wall. We attached 12 markers to the seats and walls and took 40 images of the scenario from different viewpoints. Of the 40 images we used 23 for the marker calibration. The remaining 17 were used for testing purposes (see below). Figure 2 shows the first eight of the 23 views used for calibration. The result was obtained using the incremental method described in section 4. We performed 50 refinement iterations per additional image, a number we experimentally found to be sufficient to guarantee convergence. Figure 3 shows the normalized error (in pixels) as a function of the number of iterations at different points of the image sequence. Several results can be read from this graph. First of all, it clearly shows that the method converges fairly quickly to an average image error slightly above one pixel. Furthermore, one observes that for the 10th image the convergence is gradual, whereas for the 15th and 20th images the convergence is very quick, basically within two to three iterations. This is an extremely important fact, since with each additional image the complexity of each iteration step increases, as computed in section 4.2. Recall that the complexity is dominated by the number of marker instances detected in the images. In Figure 4 we have plotted the evolution of this number for two different sequences of our image set. One can see that it rises monotonically, roughly linearly with the number of images. Thus, the two factors determining the response time of the system during calibration behave reciprocally. Since our goal is to use the calibration in interactive mode (i.e. with fixed response times in the order of one second), this could be a very important result. More experiments are however necessary to substantiate it.

One potentially problematic aspect of the above argument is that the number of iterations depends heavily on the particular sequence in which the images are seen.

Figure 4. Evolution of the complexity of one iteration step, given by $\sum_{j=1}^{N} |M_j|$, the number of marker instances detected.

In fact, in the example above we used a "well-behaved" sequence. Nevertheless, we will show that there exists a very simple heuristic that allows the engineer to always generate such "well-behaved" sequences. To illustrate the worst case, we will now contrast this first sequence with a "bad" sequence of the same image set.

Figure 5 shows the evolution of the normalized error for these two sequences. In the set of images there were several overall views (such as the ones in the top half of Figure 2). All other images had between two and four markers each. In the well-behaved sequence (called "far to near") the overall views were at the front of the sequence, whereas in the other sequence (called "near to far") they were at the end. One can see that in the "far-to-near" sequence the error rises until all markers are seen at least once, and then decreases. This behavior is pretty much the same whether 15, 20, 30, or 50 iterations are used. Figure 6 shows the number of markers detected up to a given image. We see that all 12 markers have been seen by the 9th, respectively 17th, image. For the "near-to-far" sequence the convergence behavior is much different, because overall information is only provided later in the sequence, when cycles in the graph are closed. At those points (e.g. at images 10 and 13) the error increases substantially, requiring more iterations for the subsequent images. The "far-to-near" sequence therefore represents a much better behavior from the point of view of an interactive system, since it converges in fewer iterations than the "near-to-far" sequence. Thus, we can formulate a simple heuristic for the engineer to follow when taking snapshots of the scene, namely to first take overview shots and to then zoom in on sets of two to four markers in order to obtain better precision.

5.1 Testing

Using about 20 images turned out to produce sufficient accuracy for the Airbus setup with 12 markers, as the error plot in Figure 5 shows.



Figure 5. Normalized error for two sequences of the same image set. Top: "far-to-near" sequence with two overall views as first two images. Bottom: "near-to-far" sequence with two overall views as last two images. The five graphs correspond from top to bottom to 10, 15, 20, 30, and 50 iterations per image.

Figure 6. Evolution of the number of markers detected for the two image sequences. All markers are found by the 9th, resp. 17th, image.

However, this does not prove that the error will be small in views taken from different perspectives than those of the 23 calibration images. In order to test whether the accuracy extends to other views, we evaluated the error for 17 additional images that were not used in the calibration phase. Figure 7 shows four such images with the markers overlaid. The average normalized corner point error was 0.75 pixels. Three of the test images had errors of 2 pixels, the others well below one pixel. The errors can be traced back to two causes: (a) errors in the marker detection stage, and (b) errors due to an insufficient number of views of a set of at least three markers. The latter was the case for the "checked" marker, which is seen in the lower right image of Figure 7. In our setup it corresponds to a "bridge" point connecting the markers on the back row of seats and the ones on the front row of seats.

6 Discussion and Conclusions

We have presented an AR-based system which supports a whole application task using marker-based tracking. The problem of on-the-fly calibration of multiple markers is solved using a novel approach. The main advantage lies in the integration of the calibration phase into the workflow of the engineer using the AR technology. While previous systems and approaches rely on time-consuming calibration methods external to the actual task, our method is suitable for integration into almost any multi-marker application. Due to the fast computation of the relative marker poses one could call it an interactive real-time calibration method. The results can be summarized as follows:

1. We have achieved a quality of marker calibration which is sufficient for our task, as well as for the majority of tasks in similar applications.

2. The marker calibration is fast enough to be performed on-the-fly (computation time about one second).

3. We have identified the simple heuristic "first overview, then detail" to be used by the engineer, which ensures convergence at interactive rates.

4. We use the same equipment for calibration as in the actual AR application. In particular, using the same camera avoids potential errors in the transformation between different camera models or parameters.

5. The calibration result can be further optimized off-line for applications requiring more precision.

One main lesson learned is the importance of a very good camera calibration. For the example setup we achieve an average image error of somewhat less than a pixel. Although this is quite satisfactory, we think that even better results can be obtained with improved camera calibration.



Figure 7. Test images. Shown are four of the 17 test images with markers overlaid.



The second crucial part is the calibration of the single markers used in the multi-marker setup. Small errors here can, in the worst case, lead to an inconsistent and therefore unusable overall calibration. Although our solution is already usable in an industrial context, there is room for improvement. In the future we are going to:

• Comprehensively integrate the approach into our in-house VR software package DBView [17], including documentation, tutorial, and GUI development,

• Make our optimization more robust and tune it for faster convergence,

• Add checks for outlier detection and removal, so that the system can be used interactively by really anybody,

• Provide a more comfortable tool and method to match the real world (coordinate system) to the marker coordinate system using natural features within the environment [5],

• Extend the system to natural feature tracking when computational power and algorithms have reached a real-time level.

References

[1] http://www.airbus.com.

[2] advanced real-time tracking (ART) GmbH. http://www.ar-tracking.com.

[3] R. Azuma. A survey of augmented reality. Presence, 6(4):355–385, 1997.

[4] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6):34–47, 2001.

[5] P. A. Bayerl and G. Baratoff. An interactive vision-based tool for model-based scene calibration of augmented reality environments. In Proc. WSCG'2002, pages 55–62, February 2002.

[6] P. Fiore. Efficient linear solution of exterior orientation. Trans. on Pattern Analysis and Machine Intelligence, 23(2):140–148, 2001.

[7] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[8] R. Haralick, H. Joo, C. Lee, X. Zhuang, V. G. Vaidya, and M. B. Kim. Pose estimation from corresponding point data. IEEE Transactions on Systems, Man and Cybernetics, 19(6):1426–1446, 1989.

[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, United Kingdom, 2000.

[10] R. Horaud, B. Conio, O. Leboulleux, and B. Lacolle. An analytic solution for the perspective 4-point problem. Computer Vision, Graphics, and Image Processing, 47:33–44, 1989.

[11] B. Jiang and U. Neumann. Extendible tracking by line auto-calibration. In Proc. ISAR'2001, pages 97–103, 2001.

[12] H. Kato and M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proc. 2nd Int. Workshop on Augmented Reality (IWAR'99), pages 85–94, 1999.

[13] R. Kumar and A. R. Hanson. Robust methods for estimating pose and a sensitivity analysis. CVGIP: Image Understanding, 60(3):313–342, 1994.

[14] The ARVIKA project. http://www.arvika.de.

[15] H. Regenbrecht, G. Baratoff, and M. Wagner. A tangible AR desktop environment. Computers & Graphics, 25(5):755–763, 2001.

[16] H. Regenbrecht, M. Wagner, and G. Baratoff. MagicMeeting - a collaborative tangible augmented reality system. Virtual Reality - Systems, Development and Applications, 6(3), in press, 2002.

[17] J. Sauer. Virtual Reality in der Produktentwicklung [Virtual reality in product development]. VDI-Berichte [Reports] Nr. 1614, 2001.

[18] D. Schmalstieg, A. Fuhrmann, Z. Szalavari, and M. Gervautz. Studierstube - collaborative augmented reality. In Proc. Collaborative Virtual Environments '96, 1996.

[19] D. Schmalstieg, A. Fuhrmann, Z. Szalavari, and M. Gervautz. Bridging multiple user interface dimensions with augmented reality systems. In Proc. ISAR'2000, pages 20–29, 2000.

[20] G. Simon, A. Fitzgibbon, and A. Zisserman. Markerless tracking using planar structures in the scene. In Proc. ISAR'2000, pages 137–146, 2000.
