
Real-Time Imaging 6, 433–448 (2000), doi:10.1006/rtim.1999.0199, available online at http://www.idealibrary.com

Real-Time Camera Calibration for Virtual Studio

In this paper, we present an overall algorithm for real-time camera parameter extraction, which is one of the key elements in implementing virtual studio, and we also present a new method for calculating the lens distortion parameter in real time. In a virtual studio, the motion of a virtual camera generating a graphic studio must follow the motion of the real camera in order to generate a realistic video product. This requires the calculation of camera parameters in real-time by analyzing the positions of feature points in the input video. Towards this goal, we first design a special calibration pattern utilizing the concept of cross-ratio, which makes it easy to extract and identify feature points, so that we can calculate the camera parameters from the visible portion of the pattern in real-time. It is important to consider the lens distortion when zoom lenses are used because it causes nonnegligible errors in the computation of the camera parameters. However, the Tsai algorithm, adopted for camera calibration, calculates the lens distortion through nonlinear optimization in triple parameter space, which is inappropriate for our real-time system. Thus, we propose a new linear method by calculating the lens distortion parameter independently, which can be computed fast enough for our real-time application. We implement the whole algorithm using a Pentium PC and Matrox Genesis boards with five processing nodes in order to obtain the processing rate of 30 frames per second, which is the minimum requirement for TV broadcasting. Experimental results show this system can be used practically for realizing a virtual studio.

© 2000 Academic Press

Seong-Woo Park, Yongduek Seo and Ki-Sang Hong¹

Dept. of E.E., POSTECH, San 31, Hyojadong, Namku, Pohang, Kyungbuk, 790-784, Korea

¹Corresponding author. E-mail: [email protected], tel. +82-54-279-2216

Introduction

In the broadcasting area, composing real objects with a graphic background has long been realized using Chromakeying [1]. Chromakeying systems, despite their usability, impose a very strong constraint upon the system: the camera cannot move. This is because the graphic background cannot be changed according to the camera's movement, due to the difficulties in computing the parameters of the real camera.

In recent years, the improvement of computing power and graphics technology has stimulated people to try and decorate the real scene with a graphic virtual background — the virtual studio [3–9]. They mix real scenes from a real camera with virtual scenes generated from a graphics machine, for which the extrinsic motion parameters — rotation and translation — of the real camera and intrinsic parameters like focal length are necessary to make the virtual views follow exactly the same motion as the real camera. Thus, a camera tracking system is needed to get the intrinsic and extrinsic parameters of the real camera. The camera tracking systems used in the virtual studio can be divided into two main categories: electromechanical systems and optical systems [10]. Electromechanical systems were developed first and are still more commonly used. However, even though they can yield highly accurate parameters from the usage of electromechanical sensors, they have several disadvantages:

• Calibration — accurate registration requires intimate knowledge of lens characteristics. For example, a mathematical function of zoom and focal length must be given to determine the horizontal field of view for the camera lens [10].

• Alignment — the initial state of the camera must be accurately aligned.

• Number of systems — every camera needs its own system.

• Operability of a camera — operators cannot move the camera freely in three-dimensional space. In order to do that, a huge mechanical setup is required.

The optical system — the alternative to electromechanical tracking — is based on pattern recognition. Despite the fact that this method can overcome the constraints of electromechanical tracking, it is not widely used because of the difficulties in achieving camera parameters in real time with the accuracy needed for the virtual studio. In this paper, we address our optical camera tracking system.

Figure 1. Total image flow: (a) input image, (b) extracted pattern from an input image we designed, (c) graphics background, (d) real objects extracted, and (e) resulting composed image.

Figure 1 shows the overall image flow of our method from the input image (a) to the resultant composed image (e). In this flow, the input image (a) is passed to the camera tracking system where the camera parameters are calculated from the extracted special pattern we designed (see (b)). Then, using this information, the graphics background is rendered (see (c)). Meanwhile, the input video is delayed to compensate for the delay of the camera tracking system, and then it is composed into the graphics background. Figure 1(d) shows the objects in the foreground scene.

The first thing we have to do to get camera parameters in real time is to obtain the 2D–3D pattern matches in order to measure the camera parameters from the pattern in the image. Thus, we design a planar pattern so that the feature extraction, image-model matching and identification are fast and reliable for the real-time implementation. The pattern is designed to have a series of cross-ratios, which are invariant under 2D projective transformation [11,12], to automatically identify the feature points wherever the pattern is seen and even when only a part of it appears. In this paper we will introduce how to design the pattern by applying the concept of cross-ratio and how to identify the pattern automatically.

After this process of identifying the image and pattern, camera calibration is performed using the identified feature points. The calibration algorithm we adopted is based on Tsai's method, in which camera parameters are calculated through nonlinear optimization [13]. Although exact camera parameters can be obtained by this method, it requires a large number of calculations due to the optimization and thus it is not appropriate for a real-time implementation. To reduce the computational burden, the lens distortion parameter must be calculated independently so that the other parameters involved in the nonlinear optimization may be obtained with a linear method. Accordingly, we need a method for the calculation of the lens distortion. Camera calibration techniques considering the lens distortion have long been studied [13–22]; what has been utilized is the known motion of the camera [14,18] or the feature correspondences of a few images [17,19,20]. Recently, the direct image mosaic method has been applied instead of obtaining correspondences [22]. According to the spatial distribution of the features, calibration techniques may be classified into two categories: planar [13,16,21] and nonplanar [15,17,20]. Basically, the methods of Willson [16] and Batista [21] are grounded on the calibration technique of Tsai [13]. However, in order to compute the lens distortion parameters, they rely on iterative optimization methods with respect to all the camera parameters, which are not appropriate for a real-time system. In this paper, we propose two practical methods for calculating lens distortion independently of other calibration parameters, which will give a linear calibration algorithm for our real-time implementation. One method uses a focal length to lens distortion look-up table (LUT) that can be constructed in the initialization process. The other method finds lens distortion in real time without any initialization process using the relationship between feature points in an image. The performance of our methods for computing lens distortion parameters is then evaluated through the comparison of our results with the results of Tsai's optimization method.

Figure 2. Total flowchart for generating virtual studio.

Figure 2 depicts the overall flow for generating virtual studio. The processes of the camera tracking system will be explained in the following sections. In the next section we will review Tsai's camera calibration model, explain why lens distortion has to be calculated independently of other parameters, and describe the coordinate transformation between the various coordinate systems involved in generating a graphics studio. The following sections explain the procedure for making the pattern by applying cross-ratio, give our real-time algorithm for automatically finding feature points in an image and identifying them, present our new methods for calculating lens distortion, explain temporal filtering and describe our system from the implementation point of view. Experimental results, including the calculation of lens distortion, and conclusions are then presented.

Camera Calibration Model

Tsai’s calibration model

The calibration model of this paper is based on Tsai's model for a set of coplanar points [13]. Figure 3 illustrates Tsai's camera model.

The transformation from the 3D world coordinate (xw, yw, zw) to the frame coordinate (Xf, Yf) consists of the following four steps.

Step 1: Rigid body transformation from the object world coordinate system (xw, yw, zw) to the camera 3D coordinate system (x, y, z):

$$
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T, \qquad (1)
$$

where R is a 3×3 rotation matrix about the world coordinate axes (xw, yw, zw),

$$
R = \mathrm{Rot}(R_x)\,\mathrm{Rot}(R_y)\,\mathrm{Rot}(R_z) =
\begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}, \qquad (2)
$$

and T is a translation vector,

$$
T = \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}. \qquad (3)
$$

Figure 3. Tsai's camera model.

Step 2: Projection of the 3D camera coordinate to the undistorted image coordinate (Xu, Yu):

$$
X_u = f\,\frac{x}{z}, \qquad Y_u = f\,\frac{y}{z}, \qquad (4)
$$

where f is the effective focal length.

Step 3: Calculating the distorted image coordinate (Xd, Yd) with a lens distortion coefficient, k1:

$$
X_d\,(1 + k_1 r^2) = X_u, \qquad Y_d\,(1 + k_1 r^2) = Y_u, \qquad (5)
$$

$$
r^2 = X_d^2 + Y_d^2. \qquad (6)
$$

Step 4: Transformation from the distorted image coordinate (Xd, Yd) to the frame coordinate (Xf, Yf):

$$
X_f = X_d\, s_x^{-1} + C_x, \qquad Y_f = Y_d\, s_y^{-1} + C_y, \qquad (7)
$$

where the scale factors sx, sy and the image center (Cx, Cy) are presumed to be known. In the later experiment we use the center of expansion [24] as a constant image center, which can be found in the initialization process.

In Step 3, we considered only the first radial distortion coefficient, k1, which is the most significant factor, and neglected the higher-order terms.
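As a concrete illustration of the four steps above, the following sketch (our own reconstruction in NumPy, not the authors' code; all parameter values are placeholders) projects a world point to frame coordinates. Since Eqn (5) gives the undistorted coordinate in terms of the distorted one, the distortion step is inverted with a short fixed-point iteration.

```python
import numpy as np

def rot_xyz(rx, ry, rz):
    """Rotation matrix R = Rot(Rx) Rot(Ry) Rot(Rz), angles in radians (Eqn 2)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def project(pw, R, T, f, k1, sx, sy, Cx, Cy):
    """World point (xw, yw, zw) -> frame coordinate (Xf, Yf), Steps 1-4."""
    x, y, z = R @ np.asarray(pw, dtype=float) + T   # Step 1: world -> camera
    Xu, Yu = f * x / z, f * y / z                   # Step 2: perspective projection
    Xd, Yd = Xu, Yu                                 # Step 3: invert Eqn (5) by fixed point
    for _ in range(10):
        r2 = Xd**2 + Yd**2
        Xd, Yd = Xu / (1 + k1 * r2), Yu / (1 + k1 * r2)
    return Xd / sx + Cx, Yd / sy + Cy               # Step 4: frame coordinate (Eqn 7)
```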

Tsai proposed a two-stage calibration method. In the first stage, he calculated the extrinsic camera parameters [Tx, Ty, Rx, Ry, Rz] using the RAC (Radial Alignment Constraint), which represents the relationship $\overline{O_i P_u} \parallel \overline{O_i P_d}$. That is, if we presume that the lens has only radial distortion, the direction of a distorted point is the same as the direction of the corresponding undistorted point. In the next stage, by minimizing the error function with f, Tz, and k1 as unknowns, using a standard optimization scheme such as steepest descent, the optimum f, Tz, and k1 are found. The error function for the optimization is defined as

$$
\mathrm{error} = \sum \left[ (X_u - X_u')^2 + (Y_u - Y_u')^2 \right], \qquad (8)
$$


Figure 4. Coordinate transformation between the graphics coordinate and the camera coordinate.


where (Xu, Yu) is calculated from (Xf, Yf) using equations (5)–(7), and (X'u, Y'u) is a point projected from the world coordinate into the image using the already calculated parameters [Tx, Ty, Rx, Ry, Rz] and f, Tz as variables. This optimization process requires a large number of computations. If, in this stage, one can calculate k1 independently of the other camera parameters, the undistorted image coordinate (Xu, Yu) can be calculated from equations (5)–(7). Then the linear equation involving f and Tz can be derived from equations (1)–(4) as

$$
\begin{bmatrix} y_i & -Y_{ui} \end{bmatrix}
\begin{bmatrix} f \\ T_z \end{bmatrix} = w_i Y_{ui}, \qquad (9)
$$

where

$$
y_i = r_4 x_{wi} + r_5 y_{wi} + r_6 z_{wi} + T_y, \qquad (10)
$$

$$
w_i = r_7 x_{wi} + r_8 y_{wi} + r_9 z_{wi}. \qquad (11)
$$

Since the rotation matrix R and the translations Tx and Ty have all been determined by this point, yi and wi are fixed, so that f and Tz can be linearly calculated from Eqn (9).
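For illustration, Eqn (9) can be stacked over all feature points and solved by linear least squares. The sketch below is our own reconstruction and assumes that the RAC stage has already produced R, Ty and the undistorted y-coordinates Yui.

```python
import numpy as np

def solve_f_tz(world_pts, Yu, R, Ty):
    """Least-squares solution of [y_i  -Y_ui][f, Tz]^T = w_i * Y_ui (Eqn 9)."""
    r4, r5, r6 = R[1]
    r7, r8, r9 = R[2]
    A, b = [], []
    for (xw, yw, zw), Yui in zip(world_pts, Yu):
        yi = r4 * xw + r5 * yw + r6 * zw + Ty   # Eqn (10)
        wi = r7 * xw + r8 * yw + r9 * zw        # Eqn (11)
        A.append([yi, -Yui])
        b.append(wi * Yui)
    (f, Tz), *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return f, Tz
```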

Transformation between Tsai's camera model and the graphics coordinate system

Figure 4 shows the coordinates involved in implementing virtual studio. The rotation matrix R and translation vector T calculated from Tsai's camera model give the orientation and position of the camera in the world coordinate (xw, yw, zw). In order to generate virtual studio, the graphics coordinate (Xg, Yg, Zg) should be defined as illustrated in Figure 4, in which graphics objects and a virtual graphics camera will be placed. Now we have to convert the camera parameters obtained from Tsai's calibration, represented in the world coordinate, to those represented in the graphics coordinate in order for the virtual camera to generate a virtual background of graphic objects represented in the graphics coordinate. The transformation between the world coordinate and the camera coordinate is described as

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R_1 \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T_1, \qquad (12)
$$

where the rotation matrix R1 and the translation T1 represent the orientation and position of the camera in the world coordinate (xw, yw, zw). Let (R2, T2) be the transformation between the world coordinate and the graphics coordinate; then the transformation between the two coordinates becomes

$$
\begin{bmatrix} X_g \\ Y_g \\ Z_g \end{bmatrix} = R_2 \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T_2. \qquad (13)
$$

Once the origin of the graphics coordinate is fixed in the real studio, R2 and T2 can be given by measurements. In particular, if the x-axes of the two coordinates, Xg and xw, are made parallel, R2 can simply be expressed as

$$
R_2 = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\theta & \sin\theta \\
0 & -\sin\theta & \cos\theta
\end{bmatrix}, \qquad (14)
$$

where θ denotes the slant angle of the pattern against the wall. From the above two transformations, Eqns (12) and (13), we can derive the transformation between the graphics coordinate (Xg, Yg, Zg) and the camera coordinate (Xc, Yc, Zc):

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
= R_1 R_2^{-1} \begin{bmatrix} X_g \\ Y_g \\ Z_g \end{bmatrix}
- R_1 R_2^{-1} T_2 + T_1. \qquad (15)
$$

From this transformation we can find the orientation and position of the camera in the graphics coordinate. The graphics generator will use R = R1 R2⁻¹ as the reference orientation and T = −R1 R2⁻¹ T2 + T1 as the reference position of the camera.

Pattern Design

Camera calibration requires a special pattern consisting of feature points whose locations in the world coordinate system are known a priori. Since the camera moves around in a studio with changing zoom level, the pattern should be made so that feature points on it can be identified easily wherever it is seen from, and even when just a portion of it is visible. In order for the feature points in the image to be identifiable, we apply the concept of planar projective geometric invariance — the cross-ratio. In this section, we explain how the cross-ratio is applied to the pattern design, and in the next section we explain how the pattern can be identified in an image using this invariant.

Figure 5. Cross-ratio of a pencil of lines: $\frac{(x_2-x_1)(x_4-x_3)}{(x_4-x_2)(x_3-x_1)} = \frac{(y_2-y_1)(y_4-y_3)}{(y_4-y_2)(y_3-y_1)}$.

Figure 6. A part of the designed pattern.

Cross-ratio is defined for four points on a line as

$$
C(x_1, x_2, x_3, x_4) = \frac{(x_2 - x_1)(x_4 - x_3)}{(x_4 - x_2)(x_3 - x_1)}. \qquad (16)
$$

Figure 5 shows four lines meeting at a point (a pencil of lines) and two lines intersecting them. In this geometry, the cross-ratio of the four lines is constant for the lines that intersect them, e.g. the cross-ratio Cl1 for the intersections of line l1, (x1, x2, x3, x4), and the cross-ratio Cl2 for those of line l2, (y1, y2, y3, y4), are the same:

$$
C_{l_1} = C_{l_2} = \frac{(x_2 - x_1)(x_4 - x_3)}{(x_4 - x_2)(x_3 - x_1)}
= \frac{(y_2 - y_1)(y_4 - y_3)}{(y_4 - y_2)(y_3 - y_1)}. \qquad (17)
$$

Note that when the four lines meet at infinity (an ideal point), in which case the four lines become parallel, they also have the same property. When these lines are projected through a pin-hole camera with any orientation and zoom, the cross-ratios obtained from the image exhibit the same values as those in the real world. Figure 6 shows a part of the designed pattern. In order to extract the lines easily, the pattern is made in a grid shape. In the figure the grid is black and white, but in the real pattern the white and black parts are actually light blue and dark blue, respectively, for chromakeying [23]. Figure 8(a) shows an image of our designed pattern used in the real implementation. Note that the vertical lines and horizontal lines have consecutive cross-ratios; that is to say, the vertical lines have cross-ratio C1 of the x-coordinates (x1, x2, x3, x4) of the first four lines, the second cross-ratio C2 is defined for the next four lines (x2, x3, x4, x5), and so on. Consequently, NV vertical lines give (NV − 3) consecutive cross-ratios and NH horizontal lines give (NH − 3) consecutive cross-ratios. Ideally, the cross-ratio of Eqn (16) has a value from 0 to 1. However, the extreme values are practically not feasible due to the actual distance between consecutive lines. For example, the cross-ratio will become 1 when x4 → ∞ with the other values fixed, and it will become 0 when x4 → x3 or x2 → x1. Hence, we limited the range of cross-ratios in the practical implementation. From the experiments with our images, whose feature points had about a half-pixel accuracy, the standard deviation of the cross-ratios was found to be around 0.03. This means that we have at most nine independent cross-ratios, or equivalently twelve lines, when the actual range of the cross-ratios is limited to 0.2–0.7. However, a small number of cross-ratios (lines) would confine the actual range of the camera work, and therefore we compare more than two consecutive cross-ratios, utilizing combinations of the cross-ratios in the identification step.


Figure 7. Cross-ratios for the designed pattern: (a) vertical lines (NV = 40) and (b) horizontal lines (NH = 20).


The two plots of Figure 7 show the values of the cross-ratios of the designed pattern in practice. The real pattern had 20 horizontal lines and 40 vertical lines, and thus the plots have 37 and 17 consecutive cross-ratios, respectively.
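As an illustration of how the designed cross-ratios can be used, the sketch below (our own, with a tolerance chosen to match the roughly 0.03 standard deviation reported above) computes consecutive cross-ratios and locates a visible run of lines within the full pattern:

```python
def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four collinear coordinates (Eqn 16)."""
    return (x2 - x1) * (x4 - x3) / ((x4 - x2) * (x3 - x1))

def consecutive_cross_ratios(xs):
    """N line coordinates give N-3 consecutive cross-ratios."""
    return [cross_ratio(*xs[i:i + 4]) for i in range(len(xs) - 3)]

def identify_offset(visible_xs, pattern_xs, tol=0.03):
    """Index of the first visible line within the pattern, found by matching
    the visible cross-ratio sequence against the known pattern sequence."""
    seen = consecutive_cross_ratios(visible_xs)
    known = consecutive_cross_ratios(pattern_xs)
    for offset in range(len(known) - len(seen) + 1):
        if all(abs(s - known[offset + i]) < tol for i, s in enumerate(seen)):
            return offset
    return None
```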

Pattern Identification

In order to calibrate a camera, we first must find the feature points in an image and identify them so as to find their correspondence to feature points in the object space (the world coordinate). In real-time applications, these feature points need to be extracted and identified automatically. In the initial identification process, we first extract and identify vertical and horizontal lines of the pattern by comparing their cross-ratios, and then we compute the intersections of the lines. Theoretically, with this method we can identify feature points in every frame automatically, but several situations cause problems in the real experiments.

Figure 8. Situations that make the identification process difficult: (a) when the camera is maximally zoomed-out and (b) when the camera is maximally zoomed-in.

(1) When the camera is maximally zoomed-out: the gaps between adjacent lines become too small to distinguish adjacent lines. In our experiment the gap became less than five pixels for the lens we used (Figure 8(a)).

(2) When the camera is maximally zoomed-in: in this case there are not enough lines in the image. To apply the cross-ratio, at least five consecutive lines (in the case of comparing two consecutive cross-ratios) have to be visible in the image. But as the camera is zoomed-in, the number of lines becomes less than five (Figure 8(b)).

(3) When the pattern is occluded: line identification becomes harder when the lines are broken or hidden by an object, because calculation of cross-ratios requires consecutive lines of the pattern. If one line in the middle of the pattern is hidden by an object, we cannot calculate the cross-ratios containing that line.

To overcome these problems, we first find the intersections of the pattern in the initial identification process and then track them thereafter.



Figure 9. (a) An image 1/4-subsampled in the y-direction. (b) Positions of the local maxima of the x-directional convolution of (a).

Figure 10. The effect of lens distortion: due to the distortion, line l3 in the object space is shown as curve l2, and fitted to line l1, located at a position different from line l3.


Initial identification process

In the initial identification process, we set the camera lens at the middle level of its complete zoom range to find the initial positions of the intersections in an image. Identification of the intersections is accomplished as follows:

Extracting the pattern in an image

When we made the pattern, it was painted with two colors, light blue and dark blue, so that we can extract the pattern from an image using the blue color as a key for chromakeying [23]. For extracting the blue color only, we first convert RGB to YUV and then pick up a chrominance region corresponding to the blue color.
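A minimal sketch of this extraction step (our illustration; the BT.601 conversion is standard, but the chrominance thresholds below are placeholders rather than the values used in the paper):

```python
import numpy as np

def extract_blue_pattern(rgb, u_min=140.0, v_max=120.0):
    """Mask of the blue pattern: convert RGB to YUV and keep pixels whose
    chrominance lies in the blue region (large U/Cb, small V/Cr)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0   # U (Cb)
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0    # V (Cr)
    return (u > u_min) & (v < v_max)
```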

Gradient filtering

To find the edges of the grid, a first-order Derivative of Gaussian (DoG) filter with the kernel h = [−1, −7, −15, 0, 15, 7, 1] is used. One might use another type of kernel for the gradient operation, but in our case this filter showed good performance. We apply the filter to every fourth line in an image to reduce the computation time. The x-directional convolution of an image with the filter has local maxima at vertical edges, and the y-directional convolution has local maxima at horizontal edges. Figure 9(a) shows an image 1/4-subsampled in the y-direction, and (b) shows the positions of the local maxima of the x-directional convolution of (a). Detected local maxima in the x and y directions are then connected and fitted to lines, as can be seen in Figure 11(b).
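The filtering step could be sketched as follows (our illustration; the response threshold is an arbitrary placeholder):

```python
import numpy as np

DOG_KERNEL = np.array([-1, -7, -15, 0, 15, 7, 1], dtype=np.float32)

def vertical_edge_candidates(gray, row_step=4, threshold=50.0):
    """Apply the first-order DoG kernel along x on every row_step-th row and
    keep local maxima of the response as candidate vertical-edge positions."""
    candidates = []
    for y in range(0, gray.shape[0], row_step):
        resp = np.correlate(gray[y].astype(np.float32), DOG_KERNEL, mode="same")
        for x in range(1, len(resp) - 1):
            if resp[x] > threshold and resp[x - 1] <= resp[x] > resp[x + 1]:
                candidates.append((x, y))
    return candidates
```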

Figure 11. (a) Pattern extracted from an input image and (b) the vertical and horizontal lines extracted.

Line fitting

For an ideal pin-hole lens having no distortion, a line in the pattern projects to a line in an image. However, if a lens has distortion the line will not look like a line, but instead like a curve. Hence, the cross-ratio will change from image to image. Note that the cross-ratio is not preserved for the frame coordinate (Xf, Yf) — the positions of the feature points in an image — or for the distorted image coordinate (Xd, Yd). The cross-ratio is invariant only for the undistorted coordinate (Xu, Yu). Figure 10 explains this situation. In this figure a and b denote undistorted points, and a' and b' their distorted points. If there is no lens distortion, a line connecting them will appear as line l3, but the line in an image would have a shape like l2 and be fitted to line l1. As shown in the figure, l1 is located at a position different from l3. This effect of lens distortion can be an obstacle in identifying the lines in the image. However, we can derive the line equation of the undistorted coordinate (Xu, Yu), i.e. line l3 in Figure 10, when we have the lens distortion parameter k1. Let the line equation of the undistorted points (Xu, Yu) be

$$
Y_u = aX_u + b. \qquad (18)
$$

Then we can derive a distorted line equation (actually it is a curve) using Tsai's camera model. Using the relationship between (Xu, Yu) and (Xd, Yd), the line equation for the distorted points can be written as a quadratic form of Xd and Yd involving k1:

$$
Y_d = aX_d + b\,(1 + k_1 r^2)^{-1} = aX_d + b\,\bigl(1 + k_1 (X_d^2 + Y_d^2)\bigr)^{-1}, \qquad (19)
$$

where the distorted point (Xd, Yd) can be calculated from the frame coordinate (Xf, Yf) using Eqn (7). After fitting the curve in the image using Eqn (19), we can calculate the line equation for the undistorted points of Eqn (18) by setting k1 = 0. By comparing the cross-ratios of the undistorted lines with the known cross-ratios of the pattern, we can identify the curves in the image and compute the feature points by calculating the intersections of the vertical lines and the horizontal lines for the distorted points of Eqn (19).

Feature point tracking

In the initial identification process, after the resulting camera parameters become stabilized for several continuous frames, the identification method is changed to tracking the intersection points. In implementing virtual studio, camera parameters are calculated at the rate of 30 frames/s, so that the differences between the positions in two continuous frames are very small. Consequently, we can track the feature points by referring to the previous positions of the points. In this tracking process, we first find intersections of the pattern in the neighborhood of the previous positions using the intersection-filter H that we designed:

$$
H = \begin{bmatrix}
-1 & 0 & 0 & 0 & 1 \\
0 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 \\
1 & 0 & 0 & 0 & -1
\end{bmatrix}. \qquad (20)
$$

The result of the 2D convolution of the intersection-filter with an image shows local maxima or minima at the intersections of the pattern. Figure 12 shows the result obtained for a part of an image.

When the camera is maximally zoomed-out, the gap between two neighboring intersections becomes so close that the intersections are likely to be misidentified. To reduce the possibility of misidentification we separate the intersections into two classes by the sign of the resulting value of the filtering. When we convolve an image with the intersection-filter H, two neighboring intersections have different signs from each other; intersection (1) in Figure 12, for example, has a local minimum (a minus sign) and (2) has a local maximum (a plus sign). By classifying the intersections into two classes we can prevent the feature points from being misidentified as their neighbors.
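For illustration, the tracking-stage filtering and sign classification might look like the following sketch (ours; the response threshold is a placeholder, and SciPy is used for the 2D convolution):

```python
import numpy as np
from scipy.ndimage import convolve

# Intersection filter H of Eqn (20)
H = np.array([[-1, 0, 0, 0,  1],
              [ 0,-1, 0, 1,  0],
              [ 0, 0, 0, 0,  0],
              [ 0, 1, 0,-1,  0],
              [ 1, 0, 0, 0, -1]], dtype=np.float32)

def classify_intersections(mask, threshold=2.0):
    """Convolve the binary pattern mask with H and split strong responses
    into two classes by sign; neighbouring intersections alternate in sign,
    which helps keep them from being confused with each other."""
    resp = convolve(mask.astype(np.float32), H, mode="constant")
    return resp > threshold, resp < -threshold
```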


Figure 12. Result of the convolution of the intersection-filter, H.


To recognize new points that were not shown in the previous frame, we project points of the object space into the image using the camera parameters calculated from the previous frame, and then use the projected points as reference points for the current frame.

Real-time Camera Parameter Extraction

Once the feature points at the intersections are obtained, camera parameters can be calculated using Tsai's algorithm as explained previously. As discussed in that section, if the lens distortion is obtained independently, calibration can be achieved using a linear method, which will speed up the calibration a great deal, enabling the real-time camera parameter calculation required in virtual studio. Therefore, in this section, after discussing the method of calculating the image center, we present new practical methods for calculating lens distortion independently.

Determining image center by zooming

The image center has to be found in advance because it is needed to calculate the lens distortion, as will be explained later. We will use the center of expansion as a constant image center. This image center is found in the initialization process. To find the center of expansion, the camera is operated from its maximum zoom-out end to its maximum zoom-in end, storing the feature points which we find and identify in the images. The center of expansion can be calculated as the common intersection of the line segments connecting corresponding feature points found in the images. For zoom lenses, the image centers vary as the camera zooms because the zooming operation is executed by a composite combination of several lenses. However, when we examined the location of the image center, its standard deviation was about 2 pixels; thus we ignored the effect of the image-center change.

Calculating lens distortion coefficient

In this section, we explain two practical methods for calculating lens distortion independently of the other parameters. One method uses a look-up table (LUT), constructed in the initialization process, relating focal length to lens distortion. The other uses an invariance between feature points in an image without any initialization process. The invariance we adopted is collinearity, which represents the property that lines in the pattern should appear as lines in an image if there is no lens distortion.

LUT-based method

Lens distortion is not an important factor for constant-parameter lenses, which have a constant distortion coefficient. But zoom lenses are zoomed by a complicated combination of several lenses, so that the effective focal length, f, and the distortion coefficient, k1, vary during zooming operations. In this section, to calculate lens distortion, we construct a look-up table (LUT) using the relationship between f and k1 and then refer to this LUT during real-time operations.

Figure 13. N vertical lines (dotted lines) and two lines (solid lines) separating a line.

In the initialization process, we operate the camera from its maximum zoom-out end to its maximum zoom-in end, storing the feature points which we find and identify in images. To make the f–k1 LUT we use the nonlinear optimization method proposed by Tsai [13]. We can find the optimum f–k1 LUT because the nonlinear optimization process is executed off-line. When using the coplanar pattern with small depth variation, it turns out that the focal length f and the z-translation Tz cannot be separated exactly and reliably even with small noise. This can be easily understood when considering that we cannot tell zooming from z-directional motion by watching the input video, even with human eyes. So here we use an alternative that uses Tz/f as the index, presuming the camera does not move in the z-direction. In the real-time process, camera parameters are calculated at the rate of 30 frames/s, so that the change of camera parameters between two adjacent frames is very small. Consequently we can use the Tz/f of the previous frame as the index for the current frame, and through iterative references we can refine it, i.e. we can look up k1 again using the index Tz/f that was calculated from equation (9) using the k1 referred from the Tz/f of the previous frame.
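A sketch of the run-time look-up (our reconstruction; linear interpolation between stored samples is an assumption, not stated in the paper):

```python
import numpy as np

class DistortionLUT:
    """Tz/f -> k1 look-up table built off-line with Tsai's optimization.
    At run time the previous frame's Tz/f serves as the index, and the
    result can be refined by looking up again once f and Tz are recomputed
    from Eqn (9)."""

    def __init__(self, tz_over_f_samples, k1_samples):
        order = np.argsort(tz_over_f_samples)
        self.index = np.asarray(tz_over_f_samples, dtype=float)[order]
        self.k1 = np.asarray(k1_samples, dtype=float)[order]

    def lookup(self, tz_over_f_prev):
        # interpolate between the calibration samples stored off-line
        return float(np.interp(tz_over_f_prev, self.index, self.k1))
```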

Calculating lens distortion using collinearity

The basic idea of finding lens distortion using collinearity is to search for the k1 that maximally preserves collinearity. In other words, the value of k1 we want to find is the one that makes the distorted feature points on a line the most collinear when we correct them into undistorted points. Collinearity represents the property that a line in the world coordinate is also shown as a line in the image. This property is not preserved when the lens has distortion.

To find the lens distortion for a certain frame, we first take an initial lens distortion coefficient k1 and compensate the frame coordinate (Xf, Yf) using the distortion coefficient in order to calculate the undistorted image coordinate (Xu, Yu). Then we find the optimum k1 that meets the collinearity.

Practically, we first identify the feature points found in an image, which are located on a number of straight lines in the pattern used for camera calibration. Then for each line (here we assume the lines are horizontal), we find three points (the center point and the two end points) on the same line, like the small rectangles in Figure 13. The dotted lines of Figure 13 represent lines of the pattern we found in the image. Due to the lens distortion these lines do not look straight. We break a line (a dotted line) into two lines (solid lines) and calculate the slope of each line. The error function is defined by the difference of the slopes between the two lines:

$$
E(k_1) = \frac{1}{N}\sum_{n=1}^{N}\left| \frac{Y_{un1} - Y_{un2}}{X_{un1} - X_{un2}} - \frac{Y_{un2} - Y_{un3}}{X_{un2} - X_{un3}} \right|^2, \qquad (21)
$$

where N is the number of straight lines used, and the undistorted points (Xuni, Yuni), i = 1, 2, 3, are calculated from the distorted image points (Xdni, Ydni) that are in turn calculated from the feature point positions in the image (Xfni, Yfni) using equations (5)–(7). For vertical lines, the error function we use is

$$
E(k_1) = \frac{1}{N}\sum_{n=1}^{N}\left| \frac{X_{un1} - X_{un2}}{Y_{un1} - Y_{un2}} - \frac{X_{un2} - X_{un3}}{Y_{un2} - Y_{un3}} \right|^2. \qquad (22)
$$

Then the lens distortion coefficient minimizing the error function is what we want to obtain, i.e.

$$
k_1 = \arg\min_{k_1} E(k_1). \qquad (23)
$$

Though this method calculates lens distortion through nonlinear optimization, the number of calculations is very small since it is an optimization in a 1D parameter space.

Furthermore, when we use the k1 of the previous frame as the initial value of this optimization, we can minimize the number of iterations. Once the lens distortion is calculated, we can execute camera calibration using linear methods.
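The collinearity search of Eqns (21)–(23) might be sketched as follows (our illustration; the bracketing interval around the previous frame's k1 is an assumption, and SciPy's bounded scalar minimizer stands in for whatever 1D search the system actually uses):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def undistort(Xd, Yd, k1):
    """Eqn (5): distorted (Xd, Yd) -> undistorted (Xu, Yu)."""
    r2 = Xd**2 + Yd**2
    return Xd * (1 + k1 * r2), Yd * (1 + k1 * r2)

def collinearity_error(k1, triples):
    """Eqn (21): mean squared slope difference over line triples, each triple
    holding the distorted coordinates of the two end points and the centre."""
    err = 0.0
    for p1, p2, p3 in triples:
        (x1, y1), (x2, y2), (x3, y3) = (undistort(*p, k1) for p in (p1, p2, p3))
        err += abs((y1 - y2) / (x1 - x2) - (y2 - y3) / (x2 - x3)) ** 2
    return err / len(triples)

def find_k1(triples, k1_prev=0.0, span=1e-6):
    """Eqn (23): 1D search for the k1 that best preserves collinearity."""
    res = minimize_scalar(collinearity_error, args=(triples,),
                          bounds=(k1_prev - span, k1_prev + span),
                          method="bounded")
    return res.x
```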

Temporal Filtering

Detected locations of feature points have noisy components even when the camera is stationary, thus resulting in noisy camera parameters. To reduce the effect of the noise, an averaging filter is applied to the calculated parameters, which is simple to implement and has the advantage of having a constant delay. There is a tradeoff in determining the filter size N, since the frame delay increases as N increases while noise is reduced more. So it is better to choose the smallest N that reduces noise up to the point where the resulting graphic studio with a stationary camera does not give the feeling of trembling. Of course, achieving a moderate value of N requires high accuracy of feature point locations. Through experiments, our system turned out to have a half-pixel accuracy and N is determined to be five. Figure 14 shows an example of noise reduction and time delay for filters with various lengths.

Figure 14. (a) Examples of noise reduction and (b) the time delay for filters with various lengths.
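A minimal sketch of this temporal filter (ours; N = 5 as chosen above):

```python
from collections import deque
import numpy as np

class TemporalFilter:
    """Moving average over the last N frames of the camera parameter vector.
    Larger N reduces noise further but adds more frame delay."""

    def __init__(self, n=5):
        self.window = deque(maxlen=n)

    def update(self, params):
        self.window.append(np.asarray(params, dtype=float))
        return np.mean(self.window, axis=0)
```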

Real-time Implementation

In order to make a realistic virtual studio, graphics must be generated at a rate of 30 frames/s, requiring computation of camera parameters at the same rate. To cope with the tremendous computation required, we use a Pentium-200 PC together with three DSP boards (Genesis boards from Matrox): one main board with one TMS320C80 processor and one grab module, and two processor boards with two TMS320C80 processors each. Consequently, a total of five processing nodes perform independent work in parallel, synchronized by the host PC. A TMS320C80 processor contains one master processor (MP), four parallel processors (PPs) and a transfer controller (TC). The MP has a floating-point unit (FPU) and controls the behavior of the PPs. The TC is in charge of transferring data between external memory and the on-chip memory of each TMS320C80. Inside each TMS320C80 processor, in order to process an image, the MP divides the image into four sections and makes each PP process one of the sections. Each PP can work independently of the other PPs. Images grabbed by the grab module attached to the main board are broadcast through a grab channel, and the five processing nodes can grab images into their memory simultaneously. Data transfer between nodes can be done through the VM Channel or the PCI bus. The host PC also transfers data to the nodes through the PCI bus.

Five processing nodes work in parallel, taking frames in turn, processing them, and sending the results to the host. Therefore, to process 30 frames/s, each node must complete its work within 165 ms (33 ms × 5) and the host PC within 33 ms. The processing nodes perform the tasks from extracting pattern features to finding lines and intersections, while the PC takes charge of all the tasks thereafter. The dividing line between the tasks of the processing nodes and the PC is whether a task requires floating-point operations or not, because the FPU in the C80 processor is too slow for those kinds of operations. Table 1 lists the tasks performed by the processing nodes and the PC and the processing time they expend.

Table 1. Processing time in each processor

Processor        Work                                                  Processing time
DSP processor    Extracting pattern                                    7.69 ms/frame
                 Image subsampling                                     5.11 ms/frame
                 Gradient filtering                                    22.03 ms/frame
                 Finding lines and intersections                       75.00 ms/frame
                 Transfer results to PC                                3.00 ms/frame
PC CPU           Line identification (in initial process)              11.00 ms/frame
                 Identification of intersections (in real operation)   12.00 ms/frame
                 Camera calibration                                    3.2 ms (300 points)
                 Predicting next positions of intersections            4.00 ms/frame

Though each node ideally needs to process each frame within 165 ms, in practice, in order not to lose grab timing, they should complete their tasks in less than 160 ms. Also, the PC, in order to exactly synchronize each processor, must complete its own tasks in less than 30 ms.

Figure 15. Parallel processing of five nodes with a PC.

The timing diagram in Figure 15 shows how the five nodes and the PC process frames in parallel. Each time interval in this diagram is 33 ms (the frame rate is 30 frames/s). As can be seen in the diagram, each processing node grabs and processes a frame in turn. Using the capability that grabbing and processing can be executed independently in each node, we use a double-buffering technique, which allocates two different buffers and makes alternate use of them for grabbing and processing. Therefore, at time t5, node 1 can grab the next frame fg5 while still processing frame fg0.

This parallel processing of frames inevitably causes a time delay. The five processors and the PC working in parallel cause a time delay of six frames. Temporal filtering with a filter length of five adds another two frames of delay, so that the total delay becomes eight frames. In actual operation, to compensate for this time delay, the real objects separated by chromakeying should be delayed by eight frames before being composed with the graphics.

The YUV color coordinate is more efficient than the RGB color coordinate in extracting the blue pattern from an image. Color conversion from RGB to YUV takes about 40 ms on one DSP processor and thus would require two additional processors. These days, new digital cameras can produce YUV-format output, eliminating such a process. In Table 1 the color conversion time is omitted assuming that the YUV color format is supported.

Experiments on Calculating Lens Distortion

Here we present experimental results on the proposed methods that calculate lens distortion. In this experiment, we use the center of expansion as a constant image center, which can be found in the initialization process. The image center should be calculated for every lens because the image center is a specific feature of each lens. The variation of the image center during zooming is less than two pixels in both x and y directions, which seems to have a practically negligible effect on the later calculations.

The lens distortion coefficient k1 varies according to the zooming operation, and the focal length uniquely represents the zooming level, so there is a unique relationship between the focal length and the lens distortion coefficient. Using this basic observation we make a LUT of the focal length f and the lens distortion. In real-time processes, we presume that the difference between the camera parameters of two continuous frames is not large. Therefore, we can find the lens distortion coefficient by referring to the LUT using the previous f as the index for the current frame.

In the initialization process we use Tsai's nonlinear optimization method to calculate the focal length and lens distortion LUT. But in the case where a pattern with a set of coplanar points with small depth variation is used, it is very difficult to reliably separate the focal length and the z-directional translation Tz because they are strongly coupled to each other. Actually, in that case, it is not easy to tell zooming from z-directional camera motion even by the human eye. Figure 16 shows this phenomenon. In this figure, f and Tz seem somewhat noisy, while their ratio, Tz/f, does not, as can be seen in Figure 17.

Figure 16. f and Tz calculated by Tsai's nonlinear optimization for a camera zooming monotonically.

Figure 17. Tz/f and k1 calculated with (a) the three-variable (f, Tz, k1) optimization method, (b) the Tz/f–k1 LUT-based method and (c) the collinearity method. The horizontal axis represents the frame number.

Therefore, in this experiment, assuming a camera with fixed Tz, we make a LUT using the ratio Tz/f, instead of f, as the index of the LUT. Figure 17(a) shows the Tz/f and k1 calculated in the initialization process using Tsai's optimization method. We use 300 frames captured over 10 s (30 frames/s). As described above, we can presume that the difference in Tz/f between two continuous frames is so small that we can find the k1 of the current frame by looking up the Tz/f–k1 LUT using the previous Tz/f as the index for the current frame. As described in a previous section, for more accurate lens distortion one can iterate this look-up. Figure 17(b) shows the results of k1 and Tz/f calculated without an iterative look-up.

In this paper we define the undistorted projection error (UDPE) as a measure of the accuracy of the calibration. The UDPE is defined by the error between the undistorted points projected from the world coordinate using the calculated camera parameters and the undistorted points calculated from the frame coordinate. To obtain the UDPE, we first calculate the undistorted model point (Xum, Yum) from the world coordinate, and another undistorted frame point (Xuf, Yuf) from the frame coordinate (Xf, Yf) using equations (5)–(7), that is,

$$
\mathrm{UDPE} = \frac{1}{N}\sum_{n=1}^{N}\left( (\delta X_u)^2 + (\delta Y_u)^2 \right)^{1/2}, \qquad (24)
$$

where

$$
\delta X_u = (X_{um} - X_{uf})\, s_x^{-1}, \qquad
\delta Y_u = (Y_{um} - Y_{uf})\, s_y^{-1}. \qquad (25)
$$
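A sketch of the measure (ours; it assumes the undistorted model points and undistorted frame points have already been computed with Eqns (5)–(7)):

```python
import numpy as np

def udpe(undistorted_model_pts, undistorted_frame_pts, sx, sy):
    """Eqns (24)-(25): mean Euclidean distance between the undistorted model
    points (Xum, Yum) and the undistorted frame points (Xuf, Yuf), scaled by
    the inverse scale factors sx, sy."""
    mdl = np.asarray(undistorted_model_pts, dtype=float)
    frm = np.asarray(undistorted_frame_pts, dtype=float)
    dX = (mdl[:, 0] - frm[:, 0]) / sx
    dY = (mdl[:, 1] - frm[:, 1]) / sy
    return float(np.mean(np.hypot(dX, dY)))
```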

Figures 18(a) and (b) show the UDPEs of the three-variable (f, Tz, k1) optimization of Tsai's calibration model and of the LUT-based method, respectively. Comparing the results, these two methods show nearly the same UDPEs. The UDPE increases as the frame number increases, because the number of feature points decreases as the camera zooms in.

Figure 17(c) shows the result of the experiment with the collinearity method for the continuous 300 frames. When the previous lens distortion coefficient k1 is used as the initial value of the optimization process, we can find the lens distortion in less than 20 iterations. Using 20 lines to find the lens distortion, we can complete the optimization process in less than 1.2 ms on a Pentium 200 MHz, while the three-variable optimization of Tsai's model takes longer than 30 ms. Therefore the collinearity method can be favorably applied to the real-time process.

Figure 18. UDPE of (a) the three-variable (f, Tz, k1) optimization method, (b) the Tz/f–k1 LUT-based method and (c) the collinearity method.

Figure 18(c) also shows the UDPE of the collinearity method. The feature points we found have an accuracy of about a half-pixel.

Judging from the resulting UDPE, the two methods, the LUT method and the collinearity method, are nearly comparable with Tsai's nonlinear optimization method. A higher UDPE causes greater variance of the camera parameters and, consequently, the graphic background trembles more severely. In real operation, jittering of the graphic background (we used the LUT method in real operation) is not noticeable if we use temporal filtering with a length of five.

Conclusion

In this paper, we introduced a real-time camera parameter extraction system developed for virtual studio. In order to make a very realistic graphics studio generated from a graphics machine, camera parameters needed to be calculated very accurately in real-time, i.e. at 30 frames/s. Toward this goal, we first designed a special pattern which makes it easy to identify feature points using the concept of cross-ratio. For the identification of feature points on the pattern, we applied two modes of operation. In the initialization mode, every feature point in an image was detected and identified first. Then, in the tracking mode, feature points were tracked by searching in the neighborhood of every point of the previous frame. Newly appearing feature points were tracked by projecting the pattern using the camera parameters calculated for the previous frame.

As for the calculation of camera parameters, we applied a modified Tsai's algorithm in which all the parameters can be calculated by a linear method by obtaining the lens distortion independently of the other camera parameters, while the original algorithm involves a nonlinear optimization, which usually requires a large amount of computation. In this paper, we have proposed two methods for obtaining lens distortion separately. One was based on a look-up table (LUT) and the other on the collinearity of feature points. Experiments showed around a factor of 30 improvement in computation time, so that a rather inexpensive computer like a Pentium-200 PC without any DSP board can perform the computation.

All the algorithms were implemented with a PC and PC-based DSP boards. Five processing nodes in three DSP boards worked in parallel, synchronized by the host PC, computing camera parameters at the rate of 30 frames/s.

This system has been tested in connection with a real-time graphics machine (SGI Onyx-2). We could not find any noticeable trembling in the composite video generated from a stationary camera even when it was maximally zoomed-in, which means that the calculated camera parameters were sufficiently accurate.

References

1. Grob, B. & Herndon, C.E. (1998) Basic Television and Video Systems (5th edition), Glencoe: McGraw Hill.

2. Gibbs, S. & Baudisch, P. (1996) Interaction in the Virtual Studio. Computer Graphics, 29–32.

3. Blond, L. et al. (1996) A Virtual Studio for Live Broadcasting: The Mona Lisa Project. IEEE Multimedia, pp. 18–29.

4. Hayashi, M. (1998) Image Compositing Based on Virtual Cameras. IEEE Multimedia, pp. 36–48.

5. Wojdala, A. (1998) Challenges of Virtual Set Technology. IEEE Multimedia, pp. 50–57.

6. CYBERSET, Orad, http://www.orad.co.il


7. 3DK: The Virtual Studio, GMD, http://viswiz.gmd.de/DML/vst/vst.html

8. ELSET: Accom, http://www.studio.sgi.com/Features/VirtualSets/accom.html

9. E & S Mindset, Evans & Sutherland, http://www.es.com./Products/DStudio/index.html

10. Gibbs, S. et al. (1998) Virtual Studios: The State of the Art, Eurographics'96 State of the Art Reports.

11. Semple, J.G. & Kneebone, G.T. (1952) Algebraic Projective Geometry. Oxford University Press.

12. Stolfi, J. (1991) Oriented Projective Geometry: A Framework for Geometric Computations. Academic Press.

13. Tsai, R.Y. (1987) A Versatile Camera Calibration Technique for High Accuracy 3-D Machine Vision Metrology Using Off-the-shelf TV Cameras and Lenses. IEEE Journal of Robotics & Automation, 3: 323–344.

14. Basu, A. (1990) Active Calibration: Alternative Strategy and Analysis. Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 127–140.

15. Faugeras, O. (1993) Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press.

16. Willson, R.G. (1994) Modeling and Calibration of Automated Zoom Lenses, Ph.D. thesis, CMU-RI-TR-94-03, CMU.

17. Devernay, F. & Faugeras, O. (1995) Automatic calibration and removal of distortion from scenes of structured environments. Proc. SPIE Conference, San Diego, CA, July 1995.

18. Stein, G.P. (1995) Accurate Internal Camera Calibration using Rotation with Analysis of Sources of Error. Proc. 5th Int'l Conf. Computer Vision (ICCV), Boston, MA, June 1995.

19. Stein, G.P. (1997) Lens Distortion Calibration Using Point Correspondences. Proc. Int'l Conf. Computer Vision and Pattern Recognition (CVPR), pp. 602–608.

20. Zhang, Z. (1996) On the Epipolar Geometry Between Two Images With Lens Distortion. Proc. Int'l Conf. Pattern Recognition (ICPR), Vienna, Aug. 1996, pp. 407–411.

21. Batista, J., Araujo, H. & de Almeida, A.T. (1998) Iterative Multi-Step Explicit Camera Calibration. Proc. IEEE Int. Conf. Computer Vision, pp. 709–714.

22. Sawhney, H.S. & Kumar, R. (1999) True Multi-Image Alignment and Its Application to Mosaicing and Lens Distortion Correction. IEEE Trans. Pattern Analysis and Machine Intelligence, 21: 235–243.

23. Jack, K. (1996) Video Demystified: A Handbook for the Digital Engineer, 2nd Edition, HighText Interactive.

24. Willson, R.G. & Shafer, S.A. (1993) What is the Center of the Image? Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 670–671.