
VIDEO STABILIZATION WITH L1-L2 OPTIMIZATION

Hui Qu, Li Song

Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University

ABSTRACT

Digital videos often suffer from undesirable camera jitters because of unstable camera motions. In this paper we present a novel video stabilization algorithm based on mixed L1-L2 optimization, aiming at removing unwanted camera movements while keeping the original video information to the greatest extent. In the proposed algorithm, we compute smoothed camera paths composed of constant, linear and parabolic segments via L1 constraints, while using the L2 norm of the difference between the smoothed and original camera paths to retain the video information. Unlike other existing methods, there is only one parameter to control the effects of the two terms, which is both flexible and easy to tune to meet different requirements in practice. We further design an efficient moving window scheme to support online processing of videos of unlimited length. Experimental results demonstrate the good performance of our proposed algorithm.

Index Terms— Video stabilization, online processing, mixed L1-L2 optimization

1. INTRODUCTION

With the development of hand-held camera devices, people can easily obtain videos from their cell phones or digital cameras. Compared to film cameras, cell phones are significantly lighter, resulting in low-quality videos with jitters. The same problem occurs when the camera is mounted on a vehicle with unstable motion, such as an unmanned aerial vehicle (UAV). Video stabilization is applied to alleviate this shaking, either to improve the visual quality of these videos or as a preprocessing step for other procedures, such as object tracking and object detection, to increase their precision and robustness.

Most video stabilization algorithms consist of three main steps [1, 2, 3, 4, 5, 6, 7]: (1) original camera path estimation, (2) smooth camera path computation, and (3) synthesizing the stabilized video. Methods differ in how they realize each of these steps.

Video stabilization is achieved by first estimating the original camera path. One can employ feature tracking and 2D linear motion models to compute the 2D camera path [1, 2, 3], or use Structure from Motion (SfM), as Liu et al. [5] do, to estimate the 3D camera path. The 2D methods are computationally efficient, while the 3D camera path is more accurate at the expense of computational complexity. Other methods like block matching [7] are also useful in certain situations.

Smooth camera path estimation removes high-frequency jitters and computes the global transformation necessary to stabilize the current frame. Grundmann et al. [2] used an L1-smoothness constraint based on cinematography principles to obtain an optimal camera path, which leads to good stabilization results but discards much of the original video information. Liu et al. [3] introduced a technique that imposes subspace constraints on feature trajectories. They factor a feature matrix into a product of a camera matrix and a scene matrix, and then smooth the scene matrix. The factorization is not accurate if there are not enough long trajectories.

The final step is synthesizing the stabilized video using the transformations obtained in smooth camera path estimation. Many methods like [2, 3] simply keep the central parts of the original frames to achieve better visual quality. However, further post-processing, e.g. inpainting in [4], can be applied to obtain full frames.

Among 2D stabilization methods, Grundmann's L1 camera path optimization [2], proposed in 2011, can be treated as the state of the art and has been integrated into Google's YouTube Editor. Our proposed optimization is related to this L1 optimization, which minimizes the first, second and third derivatives of the resulting camera path under some linear constraints. However, our algorithm is more general, as we optimize both the L1 norm of the smooth path and the L2 norm of the difference between the smoothed and original camera paths. Lee et al. had a similar motivation [6]: they optimized both feature matches and the motion similarity of neighboring features when estimating the camera path. However, they used the L2 norm for both terms and involved many empirical parameters when solving the optimization problem, which makes their method hard to implement and adapt to different video contents. In contrast, our mixed L1-L2 optimization has only one adjustable parameter and can be efficiently solved by convex optimization tools developed in recent years.

Motivated by the above works [2, 6], we propose a mixed L1-L2 optimization for 2D video stabilization, which not only achieves good stabilization results but also retains as much of the original video information as possible. By adjusting only one parameter, users can control the degree of stabilization and the fidelity to the original video as needed. Furthermore, we design an efficient moving window scheme to support online processing of shaky videos of unlimited length.

The rest of the paper is organized as follows. We introduce Grundmann's work briefly in Section 2, present the new mixed L1-L2 model for video stabilization in Section 3, and discuss some key issues in Section 4. Experiments are reported in Section 5 to demonstrate the performance of our algorithm, and Section 6 concludes the paper.

2. PRIOR L1 OPTIMIZATION FRAMEWORK FOR VIDEO STABILIZATION

In [2], Grundmann et al. used feature tracking and 2D linear motion model fitting to compute the original camera path. The frames of the video are denoted by $I_1, I_2, \cdots, I_n$, and the motion of features from $I_t$ to $I_{t-1}$ is modeled by a 2D motion model $F_t$, which is a similarity or affine transform. The original camera path $C_t$ is then defined as

$$C_{t+1} = C_t F_{t+1} \;\Rightarrow\; C_t = F_1 F_2 \cdots F_t \quad (1)$$

With $C_t$, they expressed the desired optimal path $P_t$ as

$$P_t = C_t B_t \quad (2)$$
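To make equations (1) and (2) concrete, here is a minimal sketch (our own illustration, not code from [2]) that accumulates per-frame motion models into the camera path and applies the update transforms, with all transforms represented as 3x3 homogeneous matrices:

```python
import numpy as np

def camera_path(F_list):
    """Accumulate 2D motion models into the camera path of equation (1).

    F_list[t] is the 3x3 homogeneous similarity/affine matrix modeling
    feature motion between consecutive frames.
    """
    C = [F_list[0]]              # C_1 = F_1
    for F in F_list[1:]:
        C.append(C[-1] @ F)      # C_{t+1} = C_t F_{t+1}
    return C

def optimal_path(C, B):
    """Equation (2): P_t = C_t B_t, with B_t the update transforms."""
    return [Ct @ Bt for Ct, Bt in zip(C, B)]
```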

where $B_t$ is the update transform that stabilizes the corresponding frame. Grundmann et al. assumed that the optimal path is composed of only three kinds of segments: a constant path representing a static camera, a path of constant velocity representing a panning or dolly shot, and a path of constant acceleration representing the ease-in and ease-out transition between static and panning cameras. Therefore, the objective function of their L1 optimization problem is

$$O(P) = \omega_1 \|D(P)\|_1 + \omega_2 \|D^2(P)\|_1 + \omega_3 \|D^3(P)\|_1 \quad (3)$$

where $\omega_1, \omega_2, \omega_3$ are empirical weights and $D$ denotes the derivative. The relative values of $\omega_1, \omega_2, \omega_3$ are crucial to the smoothed camera path and should be carefully set. They also added an inclusion constraint to preserve the intent of the video.

Finally, Grundmann et al. transformed the original frames by $B_t$ and retained the content within a crop window; thus the stabilized video has no blank areas but discards some information on the boundary of the original video. Besides, they performed residual motion suppression to reduce rolling shutter effects.

Their algorithm is effective for a variety of videos. However, it discards information due to cropping, which may not be suitable for videos with important information near the boundary. What is more, the three parameters in equation 3 are set empirically and are hard to adapt to different kinds of videos.

3. VIDEO STABILIZATION WITH MIXED L1-L2 OPTIMIZATION

L1 optimization has the property of sparsity, making the derivatives of the computed optimal path exactly zero for most segments, while L2 optimization achieves the best estimate in a least-squares sense, e.g. fitting a line to noisy sample points. In order to keep the boundary information of the original video as much as possible while performing video stabilization, we expect the optimal smooth camera path to be close to the original path, which is realized by introducing an L2 term into the objective function:

$$O(P) = L_1(P) + \lambda \|P - C\|_2^2 \quad (4)$$

where $L_1(P)$ is the L1 term, similar to that in equation 3 with $\omega_1 = \omega_2 = \omega_3 = 1$:

$$L_1(P) = \|D(P)\|_1 + \|D^2(P)\|_1 + \|D^3(P)\|_1 \quad (5)$$

and $\lambda$ is a weight to adjust the smoothness of the path.

Similar to [2], the L1 term in the objective function consists of the first, second and third derivatives of the optimal path. But unlike [2], we retain $C_t$ in our optimization; e.g. $\|D(P)\|_1$ can be decomposed as in equation 6. Since $C_t$ differs for each frame, it is unreasonable to drop $C_t$ from the sums of differences in equation 6.

$$\|D(P)\|_1 = \sum_{t=1}^{n-1} |P_{t+1} - P_t| = \sum_{t=1}^{n-1} |C_{t+1} B_{t+1} - C_t B_t| \quad (6)$$

The L2 term minimizes the difference between the original camera path and the optimal camera path:

$$\|P - C\|_2^2 = \sum_{t=1}^{n} (P_t - C_t)^2 = \sum_{t=1}^{n} (C_t B_t - C_t)^2 \quad (7)$$

Compared to the algorithm in [2], we use no separate weights in the L1 term. In fact, the introduction of the L2 norm automatically sets the weights of the three kinds of segments according to the shape of the original camera path. If the original path is nearly constant (with high-frequency jitters) over a period of time, then the weight of $\|D(P)\|_1$ in the L1 term is effectively much greater than that of the other two, because the optimal path should be close to constant due to the L2 norm. And if a segment of the original path has almost constant velocity, then $\|D^2(P)\|_1$ dominates the L1 term. The optimal paths obtained via our algorithm and Grundmann's algorithm are shown in fig. 1. Note that the optimal path of our algorithm is smooth without setting weights on the three kinds of segments. Besides, our optimal path is closer to the original path than Grundmann's, so we can set the crop window larger to retain more content of the original video.

There are many off-the-shelf toolboxes to solve such mixed L1-L2 convex optimization problems, as discussed in [8]. Here we use the freely available CVX solver (CVX Research: http://cvxr.com/cvx/).
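As an illustration, the mixed L1-L2 problem can be posed in a few lines with such a modeling tool. The sketch below uses CVXPY (a Python analogue of the CVX solver named above) and treats the path as a 1D track per motion parameter; the full method optimizes 2D motion models under the inclusion and proximity constraints of [2], which are omitted here.

```python
import cvxpy as cp
import numpy as np

def smooth_path(C, lam=0.5):
    """Solve equation (4) for a 1D parameter track C (e.g. x-translation).

    A minimal sketch: the L1 term of equation (5) plus the squared L2
    fidelity term of equation (7), weighted by the single parameter lam.
    """
    n = len(C)
    P = cp.Variable(n)
    l1_term = (cp.norm1(cp.diff(P, 1))     # favors constant segments
               + cp.norm1(cp.diff(P, 2))   # constant-velocity segments
               + cp.norm1(cp.diff(P, 3)))  # constant-acceleration segments
    objective = cp.Minimize(l1_term + lam * cp.sum_squares(P - C))
    cp.Problem(objective).solve()
    return P.value

# Example: smooth a slightly jittery linear pan.
C = np.linspace(0, 100, 150) + np.random.randn(150)
P = smooth_path(C, lam=0.5)
```

Because the whole trade-off sits in the single weight `lam`, sweeping it reproduces the behavior discussed in section 4.2 below.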


[Fig. 1 plots omitted: “Motion in x over frames” and “Motion in y over frames”, each showing the original path, the optimal path via the algorithm in [2], and the optimal path via our algorithm.]

Fig. 1. Optimal camera paths obtained by the algorithm in [2] and by our algorithm for the same video. The crop window is 80% of the original frame size. The parameter $\lambda$ in equation 4 is set to 0.5.

4. KEY ISSUES AND SUMMARY OF OUR ALGORITHM

For a practical video stabilization algorithm, several issues need to be addressed to make the proposed algorithm more robust and efficient.

4.1. Online Video Stabilization

Many post-processing algorithms can handle short clips of video. However, for long videos, the number of variables in the optimization problem may result in low efficiency and large memory consumption. To make the algorithm more efficient, we design an online processing scheme.

Intuitively, long videos can be cut into several segments to be stabilized separately. A problem is that the optimal path may have a shift at the beginning of each segment (see fig. 1) and the whole optimal path may be discontinuous at the joint frames. Therefore, adjacent segments should overlap.

Let $N$ denote the length of each segment (the window) and $K$ the number of overlap frames. When the stabilization process begins, we compute the optimal path $P_t^{(1)}$ of the first $N$ frames within the window, and stabilize only the first $N-K$ frames. The window is then moved to the next $N$ frames, with $K$ frames overlapping the previous segment, i.e. from $I_{N-K+1}$ to $I_{2N-K}$, and the optimal path $P_t^{(2)}$ of these $N$ frames is also computed. For the first $K$ overlap frames, the optimal path is obtained by the weighted average of $P_t^{(1)}$ and $P_t^{(2)}$:

$$P_t = \upsilon_i P_t^{(1)} + (1 - \upsilon_i) P_t^{(2)} \quad (8)$$

where $t = N-K+1, \cdots, N$, and $\upsilon_i$, $i = 1, 2, \cdots, K$ are weights whose values are related to the frame number $t$. In this paper we simply set $\upsilon_i = i/K$, $i = 1, 2, \cdots, K$, with $K = 30$. Subsequently, the optimal path $P_t$ and update transforms $B_t$ of the first $N-K$ frames in the current window are obtained and used to stabilize those frames. As the window moves forward, the same process proceeds until the end of the video. The process is shown in fig. 2.

Fig. 2. The moving window process. The red brace indicates the range of the window.
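The following sketch (our own illustration, under the same 1D simplification as the `smooth_path()` sketch in section 3) shows the moving-window loop with the overlap blending of equation (8):

```python
import numpy as np

def stabilize_online(C, N=150, K=30, lam=0.5):
    """Moving-window smoothing with K-frame overlaps (section 4.1).

    Reuses the smooth_path() sketch from section 3. Each window's first
    K frames are blended with the previous window's path via equation
    (8), with upsilon_i = i / K.
    """
    n = len(C)
    P = np.zeros(n)
    prev_tail = None                    # P^(1) over the overlap frames
    start = 0
    while start < n:
        end = min(start + N, n)
        P_win = smooth_path(C[start:end], lam)
        if prev_tail is not None:       # blend the overlapped frames
            m = min(K, end - start)
            v = np.arange(1, m + 1) / K
            P_win[:m] = v * prev_tail[:m] + (1 - v) * P_win[:m]
        P[start:end] = P_win
        if end == n:
            break
        prev_tail = P_win[-K:].copy()   # carry P^(1) to the next window
        start += N - K                  # emit N-K frames per step
    return P
```

Only one window of $N$ variables is ever in the solver at a time, which is what keeps memory use bounded for arbitrarily long videos.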

4.2. The choice of parameter λ

We do not set different weights for the three L1 terms in the cost function, so the choice of $\lambda$ is crucial for the results. When $\lambda$ is too small, the optimization problem is close to that in equation 3 with $\omega_1 = \omega_2 = \omega_3 = 1$, so the optimal camera path will not be smooth enough at the transition between a constant path and a path of constant velocity. Worse, the shift at the beginning of each segment may be large for small $\lambda$, so the optimal path of the overlap frames computed by equation 8 will be inaccurate (fig. 3(a)), making the whole optimal path less smooth than desired and degrading the visual quality of the stabilized video. If the value of $\lambda$ is too large, the optimization concentrates on the L2 part, pulling the optimal path toward the original camera path and making it lack smoothness (fig. 3(d)).

[Fig. 3 plots omitted: four panels, (a) $\lambda = 0.1$, (b) $\lambda = 0.5$, (c) $\lambda = 1.0$, (d) $\lambda = 2.0$, each showing the original and optimal camera paths.]

Fig. 3. Optimal camera paths obtained by our algorithm with different values of $\lambda$.

Actually, $\lambda$ can be treated as a factor that controls the degree of stabilization. For videos with no important information on the boundary, $\lambda$ can be relatively small to obtain the best visual quality. For videos that may have key information on the boundary, such as surveillance videos and UAV videos, $\lambda$ may be set a little larger. Therefore, we can not only reduce jitters but also preserve most of the information, although the stabilized videos retain some low-frequency shake. In a word, users can set the value of $\lambda$ according to their needs to obtain the stabilized videos they expect.


Fig. 4. Stabilized frames of the “sidewalk” video. The first row is our result with a 95% crop window size, and the second row is that of Grundmann et al. [2] with a 90% crop window size.

Algorithm 1: Video stabilization for each segment
Step 1: Feature selection and tracking, outlier rejection
Step 2: Fit the motion model $F_t$ and compute the original camera path $C_t$ in equation (1)
Step 3: Solve the mixed L1-L2 optimization problem in equation (4) to obtain the update transform $B_t$
Step 4: Reduce rolling shutter effects as in [2]
Step 5: Stabilize the original frames by $B_t$


4.3. Summary of the proposed algorithm

The proposed algorithm for each segment is summarized in Algorithm 1.

In step 1, we track features by pyramidal Lucas-Kanade [9], like Grundmann et al., but we perform global outlier rejection by RANSAC. To improve the accuracy of outlier rejection, we set a minimum distance between features to ensure that the selected features are distributed relatively uniformly over the whole frame. Besides, we re-select features for tracking every 10 frames to reduce the accumulated error of tracked features. In step 3, the problem has inclusion and proximity constraints, which are the same as those in [2]. In step 4, a homography replaces the similarity in some frames to suppress rolling shutter effects, due to its higher accuracy in modeling inter-frame motions. However, homographies are unstable and the replacement should be carefully controlled; we use a method similar to that of Grundmann et al. in [2].
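As a rough illustration of step 1 (not the authors' implementation), the standard OpenCV pipeline below selects features with a minimum-distance constraint, tracks them with pyramidal Lucas-Kanade, and fits a similarity model $F_t$ with RANSAC outlier rejection:

```python
import cv2
import numpy as np

def estimate_motion(prev_gray, curr_gray):
    """Fit the inter-frame motion model F_t (frame I_t to I_{t-1}).

    The minimum feature distance keeps the selected features spread
    roughly uniformly over the frame, as described above.
    """
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=20)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                              pts, None)
    ok = status.ravel() == 1
    # Global outlier rejection: RANSAC while fitting a 4-DOF similarity.
    M, _ = cv2.estimateAffinePartial2D(nxt[ok], pts[ok],
                                       method=cv2.RANSAC)
    return np.vstack([M, [0.0, 0.0, 1.0]])  # as a 3x3 homogeneous matrix
```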

5. EXPERIMENTS

To evaluate the performance of the proposed algorithm, we have applied it to stabilize typical shaky videos, and we compare our method to that of Grundmann et al. [2]. “sidewalk” is a surveillance video obtained by a shaky camera. Some frames of the results stabilized by our algorithm and by Grundmann's algorithm are shown in fig. 4. $\lambda$ is set to 0.5, and the lengths of the window and the overlap are 150 and 30 frames respectively. The weights in equation 3 are $\omega_1 = 10$, $\omega_2 = 1$, $\omega_3 = 100$. We can retain 95% of the original frames since the optimal path is close to the original path, while Grundmann's method preserves only 90% of the content.

There is a time recorder at the bottom of the video. Our stabilized frames clearly contain most of this time information, while Grundmann's results lose it. As a result, if we want to analyze the video after stabilization, e.g. to figure out when the three people in the center of the first frame walked out of the camera's view, we cannot obtain this information from the video stabilized by Grundmann's method. Besides, the visual quality of our stabilized video is nearly the same as that of Grundmann et al. More results and comparisons are available at http://www.youku.com/playlist_show/id_18891274.html.

6. CONCLUSION

We have proposed a novel approach for video stabilization. By introducing a mixed L1-L2 optimization, we obtain stabilized videos while preserving as much information as possible. We further design an efficient moving window scheme to support online processing of videos of unlimited length. In contrast to the algorithm of Grundmann et al. [2], our method is more useful for videos with important information on the boundary.

7. ACKNOWLEDGEMENT

This work was supported by the National 863 project (2012AA011703), NSFC (61221001, 60932006), the 111 Project (B07022) and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.


8. REFERENCES

[1] S. Battiato, G. Gallo, G. Puglisi, and S. Scellato, “SIFT features tracking for video stabilization,” in Proc. International Conference on Image Analysis and Processing (ICIAP), 2007, pp. 825–830.

[2] M. Grundmann, V. Kwatra, and I. Essa, “Auto-directed video stabilization with robust L1 optimal camera paths,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 225–232.

[3] F. Liu, M. Gleicher, J. Wang, H. Jin, and A. Agarwala, “Subspace video stabilization,” ACM Transactions on Graphics, vol. 30, 2011.

[4] Y. Matsushita, E. Ofek, W. Ge, et al., “Full-frame video stabilization with motion inpainting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1150–1163, 2006.

[5] F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content-preserving warps for 3D video stabilization,” in ACM SIGGRAPH, 2009.

[6] K. Y. Lee, Y. Y. Chuang, B. Y. Chen, and M. Ouhyoung, “Video stabilization using robust feature trajectories,” in Proc. IEEE Int. Conf. Computer Vision, 2009, pp. 1397–1404.

[7] S. Battiato, A. R. Bruna, and G. Puglisi, “A robust video stabilization system by adaptive motion vectors filtering,” in Proc. Int. Conf. Multimedia and Expo (ICME), 2008, pp. 373–376.

[8] M. Zibulevsky and M. Elad, “L1-L2 optimization in signal and image processing,” IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 76–88, 2010.

[9] J. Shi and C. Tomasi, “Good features to track,” in IEEE CVPR, 1994.
