06.11.Fast Software Image Stabilization With Color Registration


Proceedings of the 1998 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, Victoria, B.C., Canada, October 1998

Fast Software Image Stabilization with Color Registration

Carlos Guestrin    Fabio Cozman    Eric Krotkov
{guestrin,fgcozman,epk}@cs.cmu.edu
Robotics Institute, Carnegie Mellon University

Abstract

We present the formulation and implementation of an image stabilization system capable of stabilizing video with very large displacements between frames. A coarse-to-fine technique is applied in resolution and in model spaces. The registration algorithm uses phase correlation to obtain an initial estimate for the translation between images; then the Levenberg-Marquardt method for non-linear optimization is applied to refine the solution. Registration is performed in color space, using a subset of the pixels selected by a gradient-based subsampling criterion. This software implementation runs at 5 Hz on non-dedicated hardware (Silicon Graphics R10000 workstation).

1 Introduction

Consider a camera moving in the world; for example, a person using a hand-held camera or a camera mounted on a car. The images obtained are often shaky, particularly when the camera moves fast. When there is a significant amount of motion, it is hard to keep track of objects in the video sequence. Getting disoriented is easy when the camera moves quickly. When comparing two consecutive frames of a video sequence, …


Morimoto and Chellappa developed a system that performs motion estimation by tracking a set of feature points [9]. The system of Hansen et al. fits a global motion model to the optical flow [5]. These implementations achieved higher frame-rates (between 10 and 28 Hz) using specialized hardware. These approaches use tracking or flow to stabilize video. Our approach stabilizes the video by directly estimating a global transformation with an image registration algorithm. This result could then be used for tracking or flow estimation. We believe our approach is less prone to local effects. Our approach achieves a lower frame-rate, but does not use specialized hardware. The performance of the other approaches would probably be up to one or two orders of magnitude lower if used on non-dedicated hardware, while our system would have performance comparable to existing systems if implemented on dedicated hardware.

Irani et al. formulate a method for determining 3D ego-motion from the 2D motion obtained from stabilization [7]. Burt and Anandan demonstrate registration of a frame in the video sequence to a mosaic, rather than to the previous frame [2].

The registration algorithm presented in this paper is also applicable to mosaicing. A similar methodology was formulated and used for image mosaicing by Szeliski [12]. Work on image registration is extensive; Brown presents a survey of this field [1].

3 Registration Algorithms

In our system, image registration is used to estimate camera motion between consecutive frames of the video sequence. The effects of camera motion in the sequence are modeled by the camera motion model. In the literature, there are many models for camera motion [7, 9, 12]. The most common models are: translation, rigid, affine and projective [4]. In this Section, we present two methods used in our image stabilization system: phase correlation and minimization of image difference.

3.1 Phase Correlation

An estimate of image translation can be obtained by using the phase difference of the Fourier transform. This algorithm was proposed by Kuglin and Hines [8]. We will introduce this algorithm by analyzing the continuous one-dimensional case. Given f(t) and g(t) such that g(t) = f(t − a), that is, g is a translated version of f, their Fourier transforms differ by an exponential term in the translation a:

F[g(t)] = e^(−jaω) F[f(t)].

Therefore, the phases (Φ) differ by a term linear in a, while the magnitude is invariant to translation:

Φ[g(t)] − Φ[f(t)] = −aω.

The desired translation can be obtained by calculating the inverse Fourier transform of F[d(t)], where:

Φ[d(t)] = Φ[g(t)] − Φ[f(t)] = −aω,    ‖F[d(t)]‖ = 1.

The inverse is a delta function translated by a:

F[d(t)] = e^(−jaω)  ⇒  d(t) = δ(t − a).

Therefore, d(t) will have value zero everywhere except at a, which is the desired translation. This same reasoning can be extended to continuous 2-D functions. Consider g(x, y) = f(x − a, y − b); in this case, the function d(x, y) is defined by:

Φ[d(x, y)] = Φ[g(x, y)] − Φ[f(x, y)] = −(aω₁ + bω₂),    ‖F[d(x, y)]‖ = 1.

Therefore:

d(x, y) = δ(x − a, y − b).

mine the translation between discrete 2D functions, for example images. In the case of images and other sensor data, it is usually the case that g is not a perfect translated copy of f. In this case, the matching function d(x, y) will not have a single non-zero value. The estimate for the translation is obtained by determining the maximum value of d(x, y).
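The discrete procedure can be sketched with NumPy's FFT routines. This is an illustrative sketch, not the authors' implementation; the function name and the wrap-around handling of the peak index are my own choices:

```python
import numpy as np

def phase_correlation(f, g):
    """Estimate the circular shift (dy, dx) such that g is f shifted by (dy, dx)."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = G * np.conj(F)
    # Normalize to unit magnitude: only the phase difference survives,
    # so the inverse transform is (approximately) a shifted delta function.
    d = np.real(np.fft.ifft2(cross / (np.abs(cross) + 1e-12)))
    peak = np.unravel_index(np.argmax(d), d.shape)
    # Indices past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, d.shape))
```

For a perfect circular shift the peak is exact; for real video frames the peak spreads, and its maximum gives the translation estimate.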

An estimate of translation could also be obtained using an iterative method. Such methods yield local estimates and are thus prone to falling into local minima, an effect that is accentuated when displacements are large. The advantage of the phase correlation method over iterative optimization methods is that its estimate is global, overcoming the limitation of local methods. The phase correlation algorithm is capable of dealing with translations of up to 50% of the image size.

3.2 Minimizing Image Difference

The registration problem can also be modelled as the minimization of the intensity difference between the images [12]. It is common to formulate the error function as the sum of squared intensity differences:

E² = Σ_{x,y} [I_{t+1}(u, v) − I_t(x, y)]²,  where (u, v) = T[(x, y)].    (1)

T maps the coordinate system of image t to that of t + 1. This transformation is usually a parametric


Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:16 from IEEE Xplore. Restrictions apply.


model of the effect of the camera motion on the images. Thus, the problem can be defined as finding the values of the parameters of T that minimize E².

An iterative numerical optimization method is used to minimize E² [12]. These methods are prone to falling into local minima instead of the desired global minimum. To circumvent this problem, the method can be performed in a coarse-to-fine fashion, for example, using a Gaussian image pyramid [11]. Another interesting technique is to also perform a coarse-to-fine approach in model space [6].

3.3 Optimizing in Color Space

When using grayscale images, a blue sky could be incorrectly registered to a brown mountain, if they have similar intensities. This kind of mismatch is very disturbing to us, as viewers and users of the registered images, because a blue sky and a brown mountain are very different; they don't even have the same color! We have worked on the hypothesis that color gives a strong cue to the registration process, the cue necessary to minimize this kind of mismatch. Therefore, it was incorporated in the formulation of the algorithm. Consider an image divided into three bands: red (R), green (G) and blue (B). This formulation incorporates color information into the error function:

E² = K_R Σ_{x,y} [R_{t+1}(u, v) − R_t(x, y)]² + K_G Σ_{x,y} [G_{t+1}(u, v) − G_t(x, y)]² + K_B Σ_{x,y} [B_{t+1}(u, v) − B_t(x, y)]²,    (2)

where (u, v) = T[(x, y)].

The constants K_R, K_G and K_B are weights for each component. Again, the problem is defined as finding the values of the parameters of T that minimize E².
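With the identity warp (no motion model), equation (2) reduces to a weighted per-band sum of squared differences. A minimal sketch, with function and parameter names of my own choosing:

```python
import numpy as np

def color_ssd(img_t, img_t1, k=(1.0, 1.0, 1.0)):
    """Weighted color error of eq. (2), assuming an identity transformation T.

    img_t, img_t1: arrays of shape (h, w, 3) holding the R, G, B bands.
    k: the weights (K_R, K_G, K_B) for the three bands.
    """
    e2 = 0.0
    for band, k_band in enumerate(k):
        diff = img_t1[..., band].astype(float) - img_t[..., band].astype(float)
        e2 += k_band * np.sum(diff ** 2)
    return e2
```

In the full system, a parametric warp T would be applied to the coordinates of one frame before the per-band differences are taken.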

4 Motion Compensation

In this Section, we present an overview of some motion compensation techniques [4]:

• overlays on raw video: an application of this work is creating stable graphical overlays on video, so that they appear to stick to world features as the camera moves. For example, buildings can be labeled in an aerial sequence, or the goal of a teleoperated rover could be indicated.

• rigid frame: the motion estimate is used to warp all frames to a reference frame. The viewer feels that camera motion has been removed.

• smoothed camera motion: the high-frequency component of the camera motion is removed, eliminating jittering in the video, and is used to stabilize the overlays. The low-frequency component of the camera motion, such as that caused by a rover turning, should not be altered. A possible approach is to remove the low-frequency component from the motion compensation [4].
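The smoothed-camera-motion idea can be illustrated with a moving-average low-pass filter over the cumulative motion estimates. This is a hypothetical sketch (the paper does not specify which filter was used), separating the path into a low-frequency part to keep and a high-frequency part to cancel:

```python
import numpy as np

def smoothed_compensation(cum_motion, window=5):
    """Offset per frame that removes high-frequency jitter from the camera path.

    cum_motion: array of shape (n, 2), cumulative (dy, dx) camera translation
    per frame. The returned offsets warp each frame toward the low-pass
    (intentional) component of the motion, leaving e.g. a slow turn intact.
    """
    kernel = np.ones(window) / window
    smooth = np.column_stack(
        [np.convolve(cum_motion[:, i], kernel, mode="same") for i in range(2)]
    )
    return smooth - cum_motion
```

For constant-velocity (intentional) motion the correction is zero away from the sequence boundaries, so only the jitter around the smooth path is compensated.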

5 Feature Tracking

Feature tracking is used in many computer vision applications. The motion of the camera and of objects in the scene are the main causes of feature motion. Image stabilization can be used to estimate the camera motion component.

In most tracking approaches, a window (or template) around the desired feature is matched using a local search algorithm. These methods are affected by local effects (local minima), such as textureless regions and local repetitive patterns. A coarse-to-fine approach is usually applied to minimize this problem, but it will not solve it completely.

For applications in which camera motion is the main source of motion in the video sequence, our approach is able to estimate this motion. Therefore, it is possible to use image stabilization to track features. This is achieved by applying the transformation T to the original position of the feature, thus obtaining a new estimate of the feature location [4].

The advantage of using a global method is that it is not affected by local effects. This allows us, for example, to track a region with no texture. However, the motion of the feature may not be perfectly estimated with a global motion model. In this case, image stabilization can be used to obtain an initial estimate of feature motion. This estimate can then be refined using a local method.
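The predict-then-refine scheme can be sketched as follows: the global estimate (simplified here to a pure translation) predicts the feature position, and a local SSD template search refines it. Function names and the search window are illustrative assumptions, not the paper's tracker:

```python
import numpy as np

def track_feature(prev_pos, global_shift, template, frame, radius=3):
    """Predict with the global motion estimate, then refine by local SSD search.

    prev_pos: (y, x) of the template's top-left corner in the previous frame.
    global_shift: (dy, dx) from the image stabilization step.
    """
    y0 = int(prev_pos[0] + global_shift[0])
    x0 = int(prev_pos[1] + global_shift[1])
    th, tw = template.shape
    best = (np.inf, (y0, x0))
    # Local search in a small window around the globally predicted position.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue  # window fell outside the frame
            err = np.sum((patch - template) ** 2)
            if err < best[0]:
                best = (err, (y, x))
    return best[1]
```

Because the global estimate absorbs the camera motion, the local search radius can stay small even when the camera moves a lot between frames.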

6 Implementation and Results

We implemented an image stabilization system using the formulation presented previously. This system was integrated into a Visual Position Estimator designed to aid operators of teleoperated rovers [3]. The system was able to stabilize a live video stream at 5 Hz, using a software-only implementation on a Silicon Graphics R10000 workstation. This frame-rate is adequate for a space robotics application, such as that of the visual position estimation system.




The image stabilization system implemented is able to stabilize video with large displacements between frames, up to 50% of the image size. This is essential when the frame-rate is low.

Video clips and illustrations are available at http://www.cs.cmu.edu/~guestrin/ImageStabilization/.

6.1 Overview

The outline of the implemented system is presented below; for each frame, these steps were followed:

1. Registration: the parameters of the transformation T are determined in two steps:
   • phase correlation: first estimate of translation between images;
   • minimization of SSD: using the Levenberg-Marquardt method in a coarse-to-fine approach on the model and on the image space.

2. Motion Compensation: stable overlays are created; one of these motion compensation techniques is applied:
   • overlays on raw video;
   • rigid frame;
   • smoothed camera motion.

3. Feature Tracking: if desired, local tracking can be performed around the position of the overlays to minimize drift. The transformation is also used to pre-warp the tracking template.

6.2 Registration

Figures 2 and 3 illustrate the quality of a typical result of the algorithm. On inspection of the difference images, notice that the difference in the overlapping region of the stabilized frames is much smaller than the difference in the original frames.

The input to our image stabilization system is a live 320x240 pixel video stream. Since the Fast Fourier Transform (FFT) requires a window size of 2^n pixels, our algorithm used a 256x128 window.

6.2.1 Phase Correlation

Performing the FFT on a large image can be computationally expensive. To increase performance, the images may be subsampled before applying the phase correlation algorithm. Subsampling, however, will have an effect on the accuracy of the algorithm. Thus, there

Figure 2: The top two frames were selected from a video sequence. The bottom image is the square of the difference between the two.

Figure 3: The top two frames were extracted from a stabilized sequence. The bottom image is the square of the difference between the two.

is a tradeoff between increased performance and accuracy. Through experimentation, we found that, for this application, subsampling the images eight times offered the best performance-to-accuracy ratio.

Applying the phase correlation algorithm in this manner yielded a good estimate for the translation between images. This estimate was generally within 5 to 10% of the result obtained with the minimization of image difference using the translation motion model. Subsampling the image eight times decreased the time to apply the phase correlation algorithm to only about 3% of the total computation time of the image stabilization system. As expected theoretically, the phase correlation method was able to detect translations of up to 50% of the image size (in this implementation, up to 128 pixels horizontally and 64 vertically).




6.2.2 Minimizing Image Difference

To perform this minimization, the Levenberg-Marquardt method for non-linear optimization was chosen [10]. Using this iterative optimization method, sub-pixel accuracy can be achieved. In our implementation, optimization is carried out in a coarse-to-fine approach in image space, using a Gaussian pyramid. A coarse-to-fine approach is also applied in model space, starting with a simpler model and using the result as the initial estimate for the next, more detailed, model. In many cases, the largest motion component in video sequences is caused by translation. It is common to see the camera panning, but not rotating about its optical axis. For this reason, we first estimate translation between images and then estimate other motion models, such as the rigid model (rotation+translation) used in our system.
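The coarse-to-fine search in image space can be illustrated with a toy integer-translation version. The actual system uses Levenberg-Marquardt with sub-pixel accuracy and richer motion models; this sketch substitutes a brute-force search over a block-averaging pyramid to keep the example short, and all names are my own:

```python
import numpy as np

def downsample(img):
    """One level of a simple image pyramid: 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def coarse_to_fine_translation(f, g, levels=3, radius=2):
    """Integer translation (dy, dx) minimizing SSD, refined coarse-to-fine."""
    pyramid = [(f, g)]
    for _ in range(levels - 1):
        f, g = downsample(f), downsample(g)
        pyramid.append((f, g))
    dy = dx = 0
    for i, (fl, gl) in enumerate(reversed(pyramid)):  # coarsest level first
        if i > 0:
            dy, dx = 2 * dy, 2 * dx  # propagate the estimate to the finer level
        best = None
        # Small local search around the estimate inherited from the coarser level.
        for ddy in range(-radius, radius + 1):
            for ddx in range(-radius, radius + 1):
                shifted = np.roll(fl, (dy + ddy, dx + ddx), axis=(0, 1))
                err = np.sum((shifted - gl) ** 2)
                if best is None or err < best[0]:
                    best = (err, dy + ddy, dx + ddx)
        _, dy, dx = best
    return dy, dx
```

Because each level only searches a small window around the coarser estimate, a large displacement at full resolution is found with a few cheap searches instead of one huge one.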

Registration in Color Space

We chose to weight each band equally; therefore, the constants K_R, K_G and K_B in equation (2) were set to one. The weights for each band could be tuned through experimentation or with a physical analysis of the application.

Although at first the color formulation seems twice as computationally expensive as the grayscale one, in practice this was not the case. The color formulation was only 20% slower, because the optimization converged in fewer iterations.

Robustness, on the other hand, increased dramatically with the use of color information. For many test sequences, visually noticeable mismatches were much more frequent with the grayscale formulation than with color. This algorithm was also applied to image mosaicing. A large number of sequences were tested, and the number of visually noticeable mismatches decreased by about 50%.

Gradient-based Subsampling

In order to increase performance, we optimize on a subset of the image pixels. If we chose the pixels in this subset randomly or using a regular pattern, we might lose important features that help the algorithm converge to a correct solution. Therefore, we would like to select a subset of the image pixels using a criterion that maintains a large proportion of the information. Thus, a gradient-based subsampling was performed.

The image gradient information is used to determine the partial derivatives and the Hessian of the error function E² with respect to the model parameters. Intuitively, we can think that regions of low intensity variation in the images only make E² numerically larger. The gradient information, however, is necessary to determine how to converge to a solution. Therefore, gradient-based subsampling will decrease the amount of data used in the optimization, but maintain a large part of the information needed for convergence.

In our implementation, we only consider a band of a pixel (each pixel is composed of three bands) if its gradient is above a threshold. This threshold is dynamically determined. First, the percentage of the pixels that will be used in the computation is defined. Then, the threshold gradient value that binds that percentage of the pixels is determined. In our implementation, the threshold value is determined by first sorting the gradient values of the pixels. The desired threshold is the value at index ((100 − binding percentage)/100) × n_pixels in the sorted vector. For our application, we found that using 8% of the pixels yielded the best performance-to-accuracy ratio.
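The dynamic threshold can be sketched as follows. This illustrative version works on a single band and uses `np.gradient` for the gradient magnitude, which is an assumption of mine, not the paper's operator:

```python
import numpy as np

def gradient_threshold_mask(gray, keep_fraction=0.08):
    """Select the pixels whose gradient magnitude is in the top keep_fraction."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    flat = np.sort(mag.ravel())
    # The value at index (1 - keep_fraction) * n in the sorted
    # vector is the dynamically determined threshold.
    idx = int((1.0 - keep_fraction) * (flat.size - 1))
    threshold = flat[idx]
    return mag >= threshold
```

The optimization then only visits the pixels where the mask is true, keeping the high-gradient data that drives convergence while discarding flat regions.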

    6.3 Motion Compensation

Figures 4 and 5 illustrate the results of image stabilization on a video sequence. The position of the flag was estimated using the global motion estimate calculated by the registration algorithm. Experiments were also performed in smoothing camera motion and motion coding, obtaining successful results (see [4]).

During our tests, we compared the visual result of smoothed camera motion to that of just using the raw video. We found that, for low frame-rates, the improvement is minimal. It seems that in the low frame-rate case the high-frequency motion we would like to remove has not been captured, due to the lower acquisition frequency. Therefore, smoothing the camera motion is only necessary when the frame-rate is high.

    6.4 Feature Tracking

A feature tracker, using the global motion estimate as an initial estimate for feature motion, was implemented as described in Section 5. First, the global motion calculated in the stabilization step presented previously is used to estimate the position of the feature in the current frame. Then, a local tracking algorithm is applied to refine this estimate [4]. Using this feature tracking visually improved the accuracy of the positioning of overlays over long video sequences.




Figure 4: The rigid frame technique of motion compensation is applied to stabilize a video sequence.

Figure 5: The motion compensation technique "overlays on raw video" is applied to stabilize the flag.

7 Conclusion

In this paper, we have presented the formulation and implementation of an image stabilization system. The formulation was applied to feature tracking and the creation of stable graphical overlays. A software-only implementation stabilizes a live video stream at 5 Hz.

The image registration algorithm used phase correlation to obtain an initial estimate for the translation between images. The Levenberg-Marquardt method was then used to minimize the image difference in a second step. This minimization was performed in color space on a subset of image pixels selected using a gradient-based subsampling criterion. The system is able to deal with large displacements between the images (up to 128 pixels in the current implementation).

The phase correlation method yields a global measure that helps overcome the limitations of algorithms that tend to fall into local minima. The translation estimate was robust and also fast, using only 3% of the total time and being within 5 to 10% of the result of the iterative method.

Using color in the registration process improved the robustness of the algorithm on the order of 50%. Registration in color space was 20% slower than grayscale. It would be useful to address questions such as when and why the algorithm fails. A challenging issue is to formulate and test physically founded ideas to tune the weights of each band in the color registration process.

Usual subsampling techniques will often affect robustness. In our experiments, using gradient-based subsampling had little effect on robustness and greatly increased performance. For feature tracking, it would be very useful to compare the results of the approach presented here with those of other methods. It is also important to improve the local tracker, possibly by merging templates instead of acquiring new ones on every frame.

References

[1] L. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325-376, December 1992.

[2] P. Burt and P. Anandan. Image stabilization by registration to a reference mosaic. DARPA Image Understanding Workshop, November 1994.

[3] F. Cozman and E. Krotkov. Automatic mountain detection and pose estimation for teleoperation of lunar rovers. Int. Conference on Robotics and Automation, 1997.

[4] C. Guestrin, F. Cozman, and E. Krotkov. Image stabilization for feature tracking and generation of stable video overlays. Technical Report CMU-RI-TR-97-42, Robotics Institute, Carnegie Mellon University, November 1997.

[5] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt. Real-time scene stabilization and mosaic construction. DARPA Image Understanding Workshop, November 1994.

[6] M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5-16, January 1994.

[7] M. Irani, B. Rousso, and S. Peleg. Recovery of ego-motion using image stabilization. In International Conference on Computer Vision and Pattern Recognition, pages 454-460, March 1994.

[8] C. Kuglin and D. Hines. The phase correlation image alignment method. IEEE Conference on Cybernetics and Society, September 1975.

[9] C. H. Morimoto and R. Chellappa. Automatic digital image stabilization. IEEE International Conference on Pattern Recognition, August 1996.

[10] W. Press, W. Vetterling, S. Teukolsky, and B. Flannery. Numerical Recipes in C. Cambridge Press, 1992.

[11] A. Rosenfeld. Multiresolution Image Processing and Analysis. Springer-Verlag, 1984.

[12] R. Szeliski. Image mosaicing for tele-reality applications. Technical Report CRL 94/2, Digital Equipment Corporation, Cambridge Research Lab, May 1994.
