
Photo-based Industrial Augmented Reality Application Using a Single Keyframe Registration Procedure

Pierre Georgel∗
CAMP, TU München

Selim Benhimane†
METAIO

Jürgen Sotke
CAMP, TU München

Nassir Navab
CAMP, TU München

ABSTRACT

In recent years, many Industrial Augmented Reality (IAR) applications have been shifting from video to still images to create a mixed view. This new type of application is called Photo-based Augmented Reality. In order to guarantee the success of these applications, a simple and efficient registration method is required. We present a new method to register an image to a CAD model using a single keyframe. This registration is based on sparse 3D information from the model linked to the keyframe during its offline registration. We demonstrate this method in our in-house IAR software for Visual Inspection and Documentation: VID.

1 INTRODUCTION

Typical AR applications focus on augmenting reality with virtual data. The link to reality can be made using head-mounted displays, head-up displays, or camera-mounted tablet PCs. These systems, however, do not always integrate well into industrial workflows. Therefore, a new trend has appeared where the augmentation is not continuous but rather based on still images acquired with a high-resolution digital camera. These methods are referred to as Photo-based AR.

Augmentation of still images has been investigated in the past; for example, Stricker and Navab [6] augment industrial pictures with CAD information to help architectural planning. For the registration, the user manually selects corresponding points between the image and the model. Appel et al. [1] use photographs of plants to re-engineer a CAD model and improve the documentation. They use correspondences between technical drawings and images to perform the registration. Pentenrieder et al. [4] use photo-based AR for factory planning. Georgel et al. [3] use natural landmarks to perform the registration and use AR to find discrepancies between the CAD model and the actual plant.

In this paper, we present a new method to automatically register an image to a CAD model using a single keyframe, because pose estimation using a CAD model is rarely automatic and handles missing or wrong data poorly. We use state-of-the-art computer vision techniques to estimate a relative pose from a keyframe to the image to register. Unfortunately, this relative pose lacks one degree of freedom, the length of the baseline, preventing it from being used for augmentation. We estimate the length of this baseline automatically by using template matching to propagate 2D-3D correspondences established within the keyframe. Our method automatically estimates a full pose which can be used for direct augmentation.

Keyframes are commonly used in AR applications to increase efficiency and reliability. Keyframes are still images that have been pre-registered to the model. Vacchetti et al. [7], for example, use several keyframes and local bundle adjustment to obtain a full pose for the current frame. Stricker and Navab [6] use their previously registered image to estimate the pose and the change in

∗e-mail: [email protected]
†e-mail: [email protected]

focal length/zoom of the camera. They solve the scale problem by interactively selecting corresponding points between the image and the model. Platonov et al. [5] use a set of pre-registered keyframes to estimate a relative pose for the current frame. The extension to a full pose is then automatically computed using the 3D model. The real scale is estimated based on the depth of triangulated feature points projected onto the model. Georgel et al. [2] also use a single keyframe to estimate the relative pose of the target image and extend it to a full pose by matching planes reconstructed from the feature points to planar structures in the model.

The novelty of our approach lies in the fact that it only uses a single keyframe and limited 3D information: a sparse set of points.

2 THEORETICAL BACKGROUND

The motion between two calibrated cameras (i.e. cameras with known internal parameters) is described by the essential matrix E, which relates a point p from the keyframe to a point q on the epipolar line l in the target image: $q^\top K_t^{-\top} E K_s^{-1} p = 0$, where $K_s$ (resp. $K_t$) is the matrix of intrinsic parameters for the keyframe (resp. target). This matrix can be decomposed into the product of a skew-symmetric matrix and a rotation matrix as follows: $E = [t]_\times R$.

Each matrix E leads to four possible decompositions. This ambiguity is solved using the feature correspondences, because only the physically correct pair of rotation and translation triangulates the image points in front of both cameras. Hence, we will from now on suppose that we have access to the correct decomposition. From the decomposition formula, we can already grasp the problem we solve within this paper: t can only be determined up to scale. Therefore, we suppose that t is a unit vector for which we have to find the correct scaling. The initial essential matrix is usually estimated using the 8-point algorithm and is then refined using a nonlinear scheme which minimizes a quadratic geometric cost function, the so-called re-projection error. Note that this method does not estimate the true scale of the observed structure because this cost is invariant to changes in scale. Standard methods recovering the scale s are based on manual specification of a known 3D point or a known 3D distance.
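
The four-fold decomposition and the cheirality test described above can be sketched as follows. This is an illustrative NumPy example on synthetic, noise-free data (function names are ours, not the authors'); it assumes normalized image coordinates, i.e. points already multiplied by the inverse intrinsic matrices:

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs with E ~ [t]x R (t a unit vector)."""
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations: det(U) = det(Vt) = +1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # unit baseline direction, sign ambiguous
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def triangulate(R, t, x1, x2):
    """Linear (DLT) triangulation of one normalized correspondence (x, y)."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]  # null vector of A
    return X[:3] / X[3]

def pick_decomposition(E, x1, x2):
    """Select the (R, t) that triangulates the point in front of both cameras."""
    for R, t in decompose_essential(E):
        X = triangulate(R, t, x1, x2)
        if X[2] > 0 and (R @ X + t)[2] > 0:  # positive depth in both views
            return R, t
    raise ValueError("no cheirality-consistent decomposition found")
```

With exact data this recovers the rotation and the baseline direction, but only the direction: the length of t is exactly the missing degree of freedom addressed in Section 3.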

3 OUR METHOD

We henceforth assume that we have access to a number of correspondences in the keyframe with the model, expressed as (c, C). These pairs of a 2D point c and a 3D point C are usually estimated during the registration of the keyframe. Let l be the epipolar line induced by the point c in the target image. All points c′ on l correspond to a unique scale s. This bijective relation is deduced using $RC + st \propto K_t^{-1} c' = m'$ as follows:

$$\forall c' \in l,\; [m']_\times (RC + st) = 0 \;\Rightarrow\; s = -\frac{\left([m']_\times t\right)^\top [m']_\times RC}{\left\|[m']_\times t\right\|^2}, \qquad (1)$$

this relation is true for all points that satisfy $[m']_\times t \neq 0$.

Furthermore, if we suppose that C is locally planar and that n ($\|n\| = 1$) is a normal vector to this plane (which can be obtained


IEEE International Symposium on Mixed and Augmented Reality 2009, Science and Technology Proceedings, 19-22 October, Orlando, Florida, USA. 978-1-4244-5419-8/09/$25.00 ©2009 IEEE



Figure 1: Scale from one propagated 2D-3D correspondence: the 3D point C projects onto c in the keyframe, and c maps to the epipolar line l in the target image. The template matching is performed between the template around c and warped templates on l. The warp is parametrized using the plane $\pi_C$ and the scale samples.

from a CAD model), each point C induces a set of homographies

$$H(s, \pi_C) = R - \frac{s\, t\, n^\top}{d}, \qquad (2)$$

between the keyframe and the image to register, with $\pi_C = [n^\top, d]^\top$ the plane around C and d being the distance between the point C and the camera center of the keyframe. For each homography, we have a one-to-one mapping between the neighbors of c and the neighbors of c′. Therefore, it is possible to define an intensity-based criterion to match c to the right c′. Our template matching score f(s) is defined as follows:

$$f(s) = \mathrm{NCC}\left(S, H^{-1}(s, \pi_C)(T)\right), \qquad (3)$$

with S and T being two image patches and NCC being the Normalized Cross-Correlation. This is made possible because (1) guarantees a unique s for each point of l. So finding the scale s can be summarized as computing f for each c′ ∈ l and looking for the maximum of the function. A schematic of the search is shown in Figure 1.
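
The building blocks of Equations (1)-(3) can be sketched as follows: the closed-form scale from one propagated correspondence, the NCC score, and the selection of the best scale sample. This is an illustrative NumPy sketch under our own naming, not the paper's implementation; the homographic warping of the templates is omitted:

```python
import numpy as np

def scale_from_point(m_prime, R, C, t):
    """Eq. (1): solve [m']x (R C + s t) = 0 for s; requires [m']x t != 0.
    m_prime stands for K_t^{-1} c', the target point in normalized coordinates."""
    a = np.cross(m_prime, t)      # [m']x t
    b = np.cross(m_prime, R @ C)  # [m']x R C
    denom = float(a @ a)          # ||[m']x t||^2
    if denom < 1e-12:
        raise ValueError("degenerate configuration: [m']x t = 0")
    return -float(a @ b) / denom

def ncc(S, T):
    """Normalized Cross-Correlation of two equally sized image patches."""
    s = (S - S.mean()) / (S.std() + 1e-12)
    t = (T - T.mean()) / (T.std() + 1e-12)
    return float((s * t).mean())

def best_scale(samples):
    """Pick the scale maximizing f(s) over (s, f(s)) samples along l."""
    return max(samples, key=lambda p: p[1])[0]
```

Note that m′ enters both the numerator and the denominator of Eq. (1) quadratically, so the recovered s is invariant to the projective scale of c′, as the bijectivity argument requires.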

4 INDUSTRIAL APPLICATION

We implemented the presented method within our Industrial Augmented Reality software. The goal of the software is to provide a better documentation of the CAD model of power plants and to help performing a discrepancy check (i.e. verify the correctness of the built item compared to the planned CAD model). Giving access to images of the built plant offers insights about undocumented features (for example electrical wires) and misplaced or modified components. This visualization can be achieved in any CAD software having texture capabilities to display the images, but such software requires well designed methods to register images to the coordinate system of the CAD model. This was one of the main motivations to develop an automatic method to register images using a single keyframe.

The keyframes were registered using anchor-plates [3]. Anchor-plates are rectangular metallic plates embedded in the concrete structures (walls, ceilings and floors) and used to fix different components. The corners of these anchor-plates and their corresponding image points are used as 2D-3D correspondences for the method. The relative pose is estimated using SIFT features, RANSAC and the Gold Standard algorithm in order to obtain a geometrically optimal solution. In order to register an image, the user has to select a keyframe; it should have enough overlap to obtain a relative pose. Some results are visible in Figure 2.

Figure 2: Industrial application: (top) matching and propagation results: the propagated 2D-3D correspondences in pink (left: keyframe, right: target), matched features in green; (bottom) augmentation using the obtained full pose.

5 CONCLUSION AND FUTURE WORK

In this paper, we present an automatic method to extend a relative pose to a full pose. The relative pose is sufficient for many Computer Vision applications. However, in Augmented Reality the full pose is needed to correctly superimpose the virtual object onto the real view of the world; in such applications, a relative pose is of limited use. The method introduces a homographic warp that is parametrized by the translation length. We have demonstrated its applicability in the context of an industrial photo-based augmented reality application. Not requiring multiple pre-registered images or multiple 2D-3D correspondences greatly broadens the applicability of keyframes for Photo-based Augmented Reality.

Future work should include intensive testing of the behavior of the approach in the presence of different errors coming from the pose of the keyframe or from the keypoint detection process. The development of a cost function with no gauge freedom for the full pose estimation should also be considered in order to refine the current estimate.

ACKNOWLEDGEMENTS

We would like to thank our industry partners, Mirko Appel from Siemens CT, and Ralf Keller and Stefan Schroeter from Areva NP, for their continuous support.

REFERENCES

[1] M. Appel and N. Navab. Registration of Technical Drawings and Calibrated Images for Industrial Augmented Reality. MVA, 2002.

[2] P. Georgel, P. Schroeder, S. Benhimane, M. Appel, and N. Navab. How to Augment the Second Image? Recovery of the Translation Scale in Image to Image Registration. ISMAR, 2008.

[3] P. Georgel, P. Schroeder, S. Benhimane, S. Hinterstoisser, M. Appel, and N. Navab. An Industrial Augmented Reality Solution For Discrepancy Check. ISMAR, 2007.

[4] K. Pentenrieder, C. Bade, F. Doil, and P. Meier. Augmented Reality-based Factory Planning - an Application Tailored to Industrial Needs. ISMAR, 2007.

[5] J. Platonov, H. Heibel, P. Meier, and B. Grollmann. A mobile markerless AR system for maintenance and repair. ISMAR, 2006.

[6] D. Stricker and N. Navab. Calibration Propagation for Image Augmentation. IWAR, 1999.

[7] L. Vacchetti, V. Lepetit, and P. Fua. Stable Real-time 3D Tracking using Online and Offline Information. IEEE Trans. PAMI, 26(10), 2004.
