
Project report
Augmented reality with ARToolKit

FMA175 Image Analysis, Project
Mathematical Sciences, Lund Institute of Technology

    Supervisor: Petter Strandmark

    Fredrik Larsson ([email protected])

    December 5, 2011


    1 Introduction

Augmented reality (AR) is the concept of enhancing the real physical world with an extra layer of information. Additionally, this should be done in real time and also provide some means of interaction. In a computer application this can be achieved by analyzing a video capture feed using image analysis and computer vision algorithms and then rendering some object on top of the video image. Determining where and how to render the objects can be done in numerous ways. It is possible to use positioning systems such as GPS, gyroscopic sensors or different image analysis and computer vision algorithms to detect markers in the video feed. The latter is the approach discussed in this report. The main problem, common to all approaches, is to determine where the viewer is positioned and how the viewer is oriented in the real physical world.

The goal of this project is to explore the capabilities and limitations of a software library called ARToolKit. Using this library, a demo application has also been produced. The demo application is written in the C programming language, with GNU/Linux as the target platform.

    2 ARToolKit

ARToolKit is a software library that aids in the development of AR applications. It is written in C, and is free for non-commercial use under the GNU General Public License. A more production-ready and better supported version is also available for non-free use. The software was originally developed by Dr. Hirokazu Kato but is currently maintained by the Human Interface Technology Laboratory at the University of Washington [1]. Since its initial release in the late 1990s it has undergone a rewrite, and the current incarnation of the toolkit was released in 2004. After that, a few sporadic releases have occurred up until the most recent version (2.72.1), released in 2007. At this time, not much seems to be going on in terms of further development of the library, at least judging by the project's official web site.

The software library aims to be cross-platform and runs on most common operating systems, including Microsoft Windows, GNU/Linux and Mac OS X. Several ports and bindings exist for other languages and platforms, such as Java and Android [2].

    2.1 Detection algorithm

The primary functionality of the ARToolKit library is to detect markers in a captured video frame. These markers typically consist of a black and white pattern with a thick frame. A number of sample patterns are bundled with the library, but it is also possible to create custom patterns. An example pattern is displayed in figure 1. This pattern is also used by the demo application developed during this project. The toolkit supports detecting multiple markers in the same image frame.

The algorithm used to detect the pattern uses a few basic concepts of image analysis. As a first step, the captured image is filtered through a thresholding filter, yielding a binary image. The threshold value is one of the few parameters that can be set by the user of the library. The binary image is then passed through a connected-component labeling algorithm. The result of this pass is a labeling of the different regions of the image, and the goal is to find large regions, such as the wide black border shown in figure 1. From the information acquired from the labeling, the algorithm proceeds by detecting the contours of the pattern, from which the edges and corners of the pattern can be extracted. This finalizes the detection algorithm, and the obtained information can be used in the next step, which computes the camera transform [3].
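To make the first two stages concrete, the following C sketch thresholds a grayscale frame and labels 4-connected dark regions with a simple flood fill. This is an illustration only, not ARToolKit's actual implementation (which additionally fits lines to the region contours to obtain the corners); the buffer layout and the in-parameters are assumptions.

```c
#include <stdlib.h>

/* Sketch of thresholding and connected-component labeling on a w*h
 * grayscale buffer, one byte per pixel (assumption). */
static void threshold_image(const unsigned char *gray, unsigned char *bin,
                            int w, int h, unsigned char thresh)
{
    for (int i = 0; i < w * h; i++)
        bin[i] = (gray[i] < thresh) ? 1 : 0;   /* 1 = "black" (marker) pixel */
}

/* Label 4-connected black regions; labels[] receives 0 for background and
 * 1, 2, 3, ... for each region.  Returns the number of regions found. */
static int label_regions(const unsigned char *bin, int *labels, int w, int h)
{
    int next_label = 0;
    int *stack = malloc(sizeof(int) * w * h);

    for (int i = 0; i < w * h; i++)
        labels[i] = 0;

    for (int i = 0; i < w * h; i++) {
        if (!bin[i] || labels[i])
            continue;
        int top = 0;
        stack[top++] = i;
        labels[i] = ++next_label;
        while (top > 0) {                      /* flood fill one region */
            int p = stack[--top];
            int x = p % w, y = p / w;
            int nbrs[4] = { p - 1, p + 1, p - w, p + w };
            int ok[4]   = { x > 0, x < w - 1, y > 0, y < h - 1 };
            for (int k = 0; k < 4; k++) {
                if (ok[k] && bin[nbrs[k]] && !labels[nbrs[k]]) {
                    labels[nbrs[k]] = next_label;
                    stack[top++] = nbrs[k];
                }
            }
        }
    }
    free(stack);
    return next_label;
}
```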


    Figure 1: An example of a pattern that ARToolKit can detect.


    2.2 Computer vision

After detecting a pattern in the video, a number of transformations are performed in order to be able to render a three-dimensional object on top of the frame. The mathematical model provided by the pinhole camera is simple and convenient, but does not correspond fully with the physical camera used to capture the image. It is however possible to idealize the camera using an affine transformation. This transformation is the 3×3 matrix

K = \begin{pmatrix} \alpha & -\alpha\cot\theta & u_0 \\ 0 & \beta/\sin\theta & v_0 \\ 0 & 0 & 1 \end{pmatrix},

which contains what is called the camera's intrinsic parameters [4]. α and β are the magnification factors in the x and y directions respectively, expressed in pixel units. The parameter θ is the skew factor, or the angle between the axes, which should ideally be equal to 90° but may not be. Finally, u0 and v0 give the location of the principal point, in pixel units, which is the point where the optical axis intersects the image plane.
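To make the role of these parameters concrete, the sketch below (purely illustrative, not ARToolKit code, and with made-up parameter values) builds K and maps a point given in normalized camera coordinates to pixel coordinates.

```c
#include <math.h>
#include <stdio.h>

/* Apply the intrinsic matrix K to a point given in normalized camera
 * coordinates (x, y) = (X/Z, Y/Z).  All values are example assumptions. */
int main(void)
{
    const double pi    = 3.14159265358979323846;
    double       alpha = 700.0, beta = 700.0;  /* magnification, pixel units */
    double       theta = pi / 2.0;             /* skew angle, ideally 90 deg */
    double       u0    = 320.0, v0 = 240.0;    /* principal point            */

    double K[3][3] = {
        { alpha, -alpha / tan(theta), u0  },
        { 0.0,    beta / sin(theta),  v0  },
        { 0.0,    0.0,                1.0 }
    };

    /* Pixel coordinates: (u, v, 1)^T = K (x, y, 1)^T. */
    double x = 0.10, y = -0.05;
    double u = K[0][0] * x + K[0][1] * y + K[0][2];
    double v = K[1][0] * x + K[1][1] * y + K[1][2];

    printf("pixel coordinates: (%.1f, %.1f)\n", u, v);
    return 0;
}
```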

After the normalization, the detected pattern can be matched against a number of templates to determine which pattern has been detected.

Next, using the lines and corners from the detection algorithm, a projective transformation is computed. The projective transformation maps the image plane onto itself with the perspective taken into account. An important property of this transformation is that a line maps to a line, with cross-ratios preserved. At this point the camera transform can finally be computed, which is a mapping between the camera's coordinate system and the world's.

These computations need to be done every frame because the transformations depend on the real-world positions of both the markers and the camera. The intrinsic parameters, however, only change if the focal length of the camera changes, e.g. when zooming.


    2.3 Computer graphics

ARToolKit is tightly integrated with the OpenGL graphics pipeline, which is used for the actual rendering. OpenGL has, put simply, three different spaces between which transformations are done. An object that is to be rendered to the screen first has its coordinates defined in its own model space. In order to place this object into a scene, the world transformation is applied, and thus the coordinates are now in world space. Finally, the object is transformed into view space, which is defined by a camera model. These transformations operate on points in three dimensions given in homogeneous coordinates, and are thus matrices of size 4×4. They can be combined into one single transformation by multiplying the matrices together, which is often referred to as the model-view transform.

The results of the detection and computer vision algorithms described in the previous section can be used to set up these matrices in order for us to render graphics that appear in the captured video frame.

The rendering of a frame with ARToolKit normally starts with grabbing a frame from the video capture device and rendering it to a frame buffer. The previously described algorithms are then applied to the image in order to detect a pattern. If no marker is detected, the frame buffer is displayed on the screen and the rendering is complete. If a marker is detected, however, the model-view transformation matrix is computed and passed down to the OpenGL pipeline. Next, using the standard OpenGL draw commands, whatever geometry is desired can be rendered to the frame buffer. When the rendering is complete, the frame buffer is displayed on the screen and the next video frame can be grabbed from the camera.
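A condensed sketch of one iteration of this loop, modeled on the structure of ARToolKit's bundled sample programs, is shown below. The exact function signatures may differ between ARToolKit versions, and patt_id, patt_width, patt_center and thresh are assumed to come from initialization code that is not shown.

```c
#include <AR/ar.h>
#include <AR/video.h>
#include <AR/gsub.h>
#include <GL/gl.h>

/* One iteration of the rendering loop (error handling omitted). */
static void render_frame(int patt_id, double patt_width,
                         double patt_center[2], int thresh)
{
    ARUint8      *frame;
    ARMarkerInfo *markers;
    int           marker_num, best, k;
    double        patt_trans[3][4];
    double        gl_para[16];

    if ((frame = arVideoGetImage()) == NULL)
        return;                               /* no new frame available yet */

    argDrawMode2D();
    argDispImage(frame, 0, 0);                /* video frame as background  */

    if (arDetectMarker(frame, thresh, &markers, &marker_num) < 0)
        return;
    arVideoCapNext();                         /* start grabbing next frame  */

    /* Pick the detection of our pattern with the highest confidence. */
    best = -1;
    for (k = 0; k < marker_num; k++) {
        if (markers[k].id != patt_id)
            continue;
        if (best < 0 || markers[k].cf > markers[best].cf)
            best = k;
    }

    if (best >= 0) {
        /* Compute the camera transform and hand it to OpenGL. */
        arGetTransMat(&markers[best], patt_center, patt_width, patt_trans);
        argConvGlpara(patt_trans, gl_para);

        argDrawMode3D();
        argDraw3dCamera(0, 0);
        glMatrixMode(GL_MODELVIEW);
        glLoadMatrixd(gl_para);

        /* ... standard OpenGL draw calls for the desired geometry ... */
    }

    argSwapBuffers();                         /* display the finished frame */
}
```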

    3 Demo application

In order to test and analyze ARToolKit, a simple demonstration application was implemented. This application renders a four-vertex polygon, i.e. a quad, textured with an image, e.g. a photo. Additionally, in order for it to appear more realistic in the video frame, a few adjustments are made. The demo application uses OpenGL shaders to apply these adjustments in an efficient manner, and the adjustments are described in the following sections. A screen capture of the application is displayed in figure 2.
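Once the model-view matrix is in place, drawing the quad itself only takes a few OpenGL calls. The sketch below uses the fixed-function pipeline for brevity (the demo applies its adjustments in shaders); the texture handle and the quad size are assumptions.

```c
#include <GL/gl.h>

/* Draw a textured quad centred on the marker, in the marker's plane.
 * `size` is the side length in the same units as the marker width and
 * `tex_id` is assumed to be a texture object created at initialization. */
static void draw_photo_quad(GLuint tex_id, double size)
{
    double h = size / 2.0;

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, tex_id);

    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex3d(-h, -h, 0.0);
    glTexCoord2f(1.0f, 0.0f); glVertex3d( h, -h, 0.0);
    glTexCoord2f(1.0f, 1.0f); glVertex3d( h,  h, 0.0);
    glTexCoord2f(0.0f, 1.0f); glVertex3d(-h,  h, 0.0);
    glEnd();

    glDisable(GL_TEXTURE_2D);
}
```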

    3.1 White balance adjustment

In most cases, the white balance of the captured image and the rendered image do not match. In an attempt to overcome this discrepancy, a simple method of manual white balance calibration was implemented. The user of the program can, using the mouse, select a color w = (Rw, Gw, Bw) from the captured video frame, which is then used as the white point. In this case the colors are 8-bit RGB values, i.e. each color component is in the range [0, 255].

In order to apply the white balance adjustment, a pixel's color (R, G, B) is scaled into the resulting color (R', G', B') with the transformation

\begin{pmatrix} R' \\ G' \\ B' \end{pmatrix} = \begin{pmatrix} 255/R_w & 0 & 0 \\ 0 & 255/G_w & 0 \\ 0 & 0 & 255/B_w \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.

This adjustment makes the rendered image take on the same tint as the background video frame.
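In the demo this scaling runs in a fragment shader; an equivalent CPU-side C version of the same per-pixel operation (illustrative only, assuming an interleaved 8-bit RGB buffer) could look like this:

```c
/* Scale every pixel so that the selected white point (rw, gw, bw) maps to
 * (255, 255, 255).  The white point is assumed to have no zero components. */
static void apply_white_balance(unsigned char *rgb, int num_pixels,
                                unsigned char rw, unsigned char gw,
                                unsigned char bw)
{
    double sr = 255.0 / rw, sg = 255.0 / gw, sb = 255.0 / bw;
    int    i;

    for (i = 0; i < num_pixels; i++) {
        double r = rgb[3 * i + 0] * sr;
        double g = rgb[3 * i + 1] * sg;
        double b = rgb[3 * i + 2] * sb;
        /* Clamp, since components brighter than the white point overflow. */
        rgb[3 * i + 0] = (unsigned char)(r > 255.0 ? 255.0 : r);
        rgb[3 * i + 1] = (unsigned char)(g > 255.0 ? 255.0 : g);
        rgb[3 * i + 2] = (unsigned char)(b > 255.0 ? 255.0 : b);
    }
}
```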


    Figure 2: The demo application in action.

    3.2 Anti-aliasing

The discrete nature of a computer screen leads to jagged edges (aliasing) when objects are drawn to it, causing a disturbing transition from the background to the rendered object. This is a common problem in computer graphics that has to be dealt with if decent image quality is desired. There are many solutions to this problem, one of which is supported natively by OpenGL and by recent graphics hardware. This method is based on multisampling and requires the objects to be rendered in the correct order to work. There are also other methods of anti-aliasing. For instance, it is possible to use edge-detection algorithms in a post-processing step to find edges and then remove the jagged edges.

Due to the way ARToolKit renders the video feed by default, a way to incorporate the native multisample anti-aliasing described above was not found. However, a very simple anti-aliasing filter based on alpha blending was applied so that the edge of the rendered photo blends better with the background. This method simply makes the rendered image slightly transparent at the edges. The method is not particularly good and will only work for rectangle-shaped objects, but for the purpose of this project it does the job and slightly improves the rendering quality. The results of the anti-aliasing are displayed in figure 3.
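One simple way to realize such an edge fade, sketched below under assumptions not spelled out in the report (a linear falloff over a fade width given in texture coordinates), is to compute the alpha value from the distance of a point's texture coordinate to the nearest edge of the quad.

```c
/* Alpha value for a point with texture coordinates (u, v) in [0, 1]:
 * fully opaque in the interior, fading linearly to transparent over a
 * border of width `fade` near the quad's edges.  A sketch in the spirit
 * of the demo's filter; the fade width and falloff are assumptions. */
static double edge_alpha(double u, double v, double fade)
{
    double du = (u < 0.5) ? u : 1.0 - u;  /* distance to nearest vertical edge   */
    double dv = (v < 0.5) ? v : 1.0 - v;  /* distance to nearest horizontal edge */
    double d  = (du < dv) ? du : dv;      /* distance to nearest edge            */

    return (d >= fade) ? 1.0 : d / fade;
}
```

For the transparency to have any effect, standard alpha blending has to be enabled in OpenGL, e.g. glEnable(GL_BLEND) together with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).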

    4 Results

Augmented reality is a concept with many potential uses in many different areas. The method utilized by ARToolKit, using markers, is a simple and easy-to-grasp way of achieving nice effects and interactivity. However, there are many drawbacks to this method. For one thing, the pattern must be positioned so that all of it is visible in the video frame. If even the slightest part of it is covered or creased, for just a few frames, the detection will fail. There is of course a possibility of using additional algorithms to approximate the pattern location and orientation, but this is not supported by ARToolKit.


Figure 3: Aliasing between the background image and the rendered image is evident in the left figure. On the right, the result of an attempt to remove these artifacts.

Also, the observation angle of a pattern is of course limited to the hemisphere above it. The image quality produced by the video capture device, along with the lighting conditions, is yet another factor that needs to be taken into account.

More recent research in the area has produced new and more involved methods for augmented reality. One such method is Parallel Tracking and Mapping (PTAM), which needs no markers or precomputed maps [5] and therefore offers more flexibility.

ARToolKit is a quite dated and poorly documented piece of software. For making a simple demo application it does the job, but in order to do more advanced rendering a more powerful library is needed. In fact, even during the writing of the simple demo application in this project, its limitations were inhibiting. If one wishes to get involved in the underlying algorithms, digging around in the source code is pretty much the only option. On the other hand, there is a production-grade version of the library that is supposedly better supported and more stable.

Many techniques other than the ones experimented with during this project could be applied to improve the appearance of the final image, although the library is rather limiting when it comes to accessing more modern features of the OpenGL pipeline. One idea for further improvement is to approximate the noise that is present in the video frame and then apply it to the rendered image as well. The white balance calibration could also be done automatically by using a known white region in the video frame instead of manual selection of a white point.

Another big issue that should be addressed in further work is the jittery appearance of the rendered image. This is caused by approximation errors that differ from one frame to the next. Very often, these differences are large enough that the position of the rendered object changes even though the camera is stationary. One possible solution would be to use previous computations and interpolate between them to get smoother movement.
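One way such smoothing could be realized (a suggestion, not something implemented in the project) is an exponential moving average of the 3×4 camera transform before it is handed to OpenGL:

```c
/* Exponentially smooth the 3x4 camera transform over time to reduce
 * frame-to-frame jitter.  `smooth` holds the running estimate, `current`
 * is this frame's pose; alpha in (0, 1] controls responsiveness
 * (1.0 = no smoothing).  Sketch of a possible improvement only. */
static void smooth_transform(double smooth[3][4], const double current[3][4],
                             double alpha)
{
    int i, j;

    for (i = 0; i < 3; i++)
        for (j = 0; j < 4; j++)
            smooth[i][j] = alpha * current[i][j] + (1.0 - alpha) * smooth[i][j];
}
```

Averaging the rotation part element-wise is only an approximation, since the result is no longer exactly orthonormal; a more careful implementation would interpolate the rotation separately, for instance using quaternions.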

    Bibliography

    [1] HIT Lab. ARToolKit Home Page. [online] Available at: http://www.hitl.washington.edu/artoolkit/ [Accessed 30 November 2011]


[2] nyatla.jp. FrontPage.en - NyARToolKit. [online] Available at: http://nyatla.jp/nyartoolkit/wiki/index.php?FrontPage.en [Accessed 30 November 2011]

[3] HIT Lab. ARToolKit Documentation (Computer Vision Algorithm). [online] Available at: http://www.hitl.washington.edu/artoolkit/documentation/vision.htm [Accessed 30 November 2011]

[4] Forsyth, D.A. and Ponce, J., 2003. Computer Vision: A Modern Approach. Upper Saddle River, NJ: Pearson Education.

[5] Klein, G. Parallel Tracking and Mapping for Small AR Workspaces (PTAM). [online] Available at: http://www.robots.ox.ac.uk/~gk/PTAM/ [Accessed 30 November 2011]
