
Super-resolution Mosaicing for Object based Video Format Conversion

Matthias Kunter and Thomas Sikora

Communication Systems Group, Technische Universitaet Berlin, 10587 Berlin, Germany
kunter, [email protected]

Abstract This paper presents a new approach to spatial upsampling of digital video based on super-resolution mosaics. First, we robustly generate a background mosaic of higher resolution than the original video. To achieve that goal, we apply hierarchical global image registration, estimating an optimal parabolic parameter set for each view of a scene shot. The final mosaic is generated using statistical and projection grid distance measures to avoid the impact of foreground objects and to accomplish super-resolution, respectively. Second, arbitrarily moving foreground objects are segmented using MRF-based change detection methods based on the calculated mosaic. For the foreground objects an optical flow field between adjacent frames is computed. Third, we create new views with higher spatial resolution by fusing re-projected background content from the mosaic with super-resolution foreground objects obtained using optical flow field calculation. Results show that this method is able to convert videos into higher spatial resolution with very high objective and subjective quality.

1 Introduction

With the proliferation of many different video recording, storage and display devices, video format conversion applications are attracting a great deal of attention. Since consumer devices allow everyone to create visual content in low and medium resolution quality, and due to limited transmission bandwidth and storage capacities, spatio-temporal video upsampling is becoming more and more important.

Many methods for video format conversion have been proposed and applied in recent years, especially motion-compensation based methods and algorithms applied in the coding domain [5]. We present a super-resolution based approach to convert videos into higher spatial resolution. Video mosaicing plays an important role here, summarizing the static background of a scene recorded with camera motion such as zoom, pan, tilt, and rotation. Because freely moving foreground objects are eliminated automatically, super-resolution mosaicing is a very effective tool for raising the background resolution of such a scene [3]. Because it enhances the signal using all available subsets of image samples, it performs well compared to simple block motion or region motion based video format conversion methods [5]. A second advantage of video mosaicing is that it enables very efficient segmentation of freely moving foreground objects. To segment foreground objects, MRF-based change detection algorithms comparing the original and the mosaic based re-projected video scenes are applied [4]. A useful side effect is the creation of an object based scenario as proposed in the MPEG-4 video coding standard [6].

For resolution enhancement of the arbitrarily moving foreground objects, which are not covered by the mosaicing process, an optical flow based super-resolution technique is applied. Since we are only interested in foreground flow, no dense flow calculation between adjacent frames is necessary. This part is inspired by the work of Baker and Kanade, who studied super-resolution optical flow intensively [7].

Figure 1 shows the flowchart of our proposed method. The next three sections describe the techniques for super-resolution background construction, mosaic based object segmentation, and foreground super-resolution computation, respectively. In Section 5 we show first results and compare them to standard upsampling methods in an objective and subjective assessment. Section 6 concludes this paper.

2 Super-resolution Background Construction

Fig. 1 Flowchart of the proposed mosaic based video conversion algorithm

We apply hierarchical global motion estimation for the registration of all frames of a video shot into the reference coordinate system. As the underlying transformation model, the parabolic transformation [2] is used. It is given by

$$
\begin{aligned}
(x', y')^T &= T\left(k; (x, y)^T\right) \\
k &= (a_0, \ldots, a_5, b_0, \ldots, b_5)^T \\
x' &= a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 y^2 + a_5 xy \\
y' &= b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 y^2 + b_5 xy .
\end{aligned}
\tag{1}
$$

It can be shown that this 12-parameter model yields much better registration results than the widely used perspective model (homography) [2], especially in the case of small translational camera motion. Since the model is nonlinear, concatenation and inversion have to be estimated numerically.
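As an illustration of Eq. (1) and of what numerical inversion can look like, the following minimal sketch evaluates the parabolic model and inverts it for a single point with a Newton iteration. The function names, the choice of Newton's method, and the starting point are assumptions made for this example; the paper does not state which numerical scheme is used.

```python
import numpy as np

def parabolic_transform(k, x, y):
    """Evaluate Eq. (1) for k = (a0..a5, b0..b5); x and y may be scalars or arrays."""
    a, b = np.asarray(k[:6], float), np.asarray(k[6:], float)
    x, y = np.asarray(x, float), np.asarray(y, float)
    basis = np.stack([np.ones_like(x), x, y, x**2, y**2, x * y])
    return np.tensordot(a, basis, axes=1), np.tensordot(b, basis, axes=1)

def invert_point(k, xp, yp, iters=20):
    """Find (x, y) with T(k; x, y) = (xp, yp) by Newton iteration.

    Assumption: the transform is close to the identity, so the target
    point itself is a reasonable starting value.
    """
    a, b = k[:6], k[6:]
    x, y = float(xp), float(yp)
    for _ in range(iters):
        fx, fy = parabolic_transform(k, x, y)
        # Jacobian of Eq. (1) with respect to (x, y)
        J = np.array([
            [a[1] + 2 * a[3] * x + a[5] * y, a[2] + 2 * a[4] * y + a[5] * x],
            [b[1] + 2 * b[3] * x + b[5] * y, b[2] + 2 * b[4] * y + b[5] * x],
        ])
        dx, dy = np.linalg.solve(J, [xp - fx, yp - fy])
        x, y = x + dx, y + dy
    return x, y
```

Because the quadratic terms are usually small for camera motion, a few Newton steps are typically enough in this setting; a production implementation would add a convergence check.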

The short-term motion parameters are calculated by applying Levenberg-Marquardt error minimization, where the error function is given by

$$
E(t) = \frac{1}{2} \cdot \frac{1}{N_\Omega} \cdot \sum_{(x,y) \in \Omega} \left( I_{t-1}(x, y) - I_t(x', y') \right)^2 . \tag{2}
$$

$I_{t-1}$ is the reference frame at time $t-1$ and $I_t$ is the transformed and warped frame at time $t$. The energy term is normalized by the number of overlapping pixels $N_\Omega$ in the reference frame domain $\Omega$. To initialize the gradient based error minimization, we calculate a robust feature-based affine transformation parameter set using a RANSAC-based technique. This Monte Carlo-like method prevents the registration algorithm from getting stuck in local minima. Note that the affine transformation parameter set is the linear part of Eq. (1).
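The error term of Eq. (2) can be sketched as follows. This is only an illustration: the bilinear sampling, the function names, and the treatment of pixels that fall outside the current frame are assumptions, and the Levenberg-Marquardt loop that would minimize this value over the parameters is omitted.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample img at real-valued positions; also return an in-bounds mask."""
    h, w = img.shape
    valid = (xs >= 0) & (xs <= w - 1) & (ys >= 0) & (ys <= h - 1)
    xc, yc = np.clip(xs, 0, w - 1), np.clip(ys, 0, h - 1)
    x0 = np.minimum(np.floor(xc).astype(int), w - 2)
    y0 = np.minimum(np.floor(yc).astype(int), h - 2)
    fx, fy = xc - x0, yc - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bot = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot, valid

def registration_error(ref, cur, k):
    """E(t) of Eq. (2): half the mean squared difference over the overlap."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    basis = [np.ones_like(xs), xs, ys, xs**2, ys**2, xs * ys]
    xp = sum(c * t for c, t in zip(k[:6], basis))   # x' of Eq. (1)
    yp = sum(c * t for c, t in zip(k[6:], basis))   # y' of Eq. (1)
    warped, valid = bilinear_sample(cur, xp, yp)
    n_overlap = valid.sum()                          # N_Omega
    if n_overlap == 0:
        return np.inf
    diff = (ref - warped)[valid]
    return 0.5 * np.sum(diff**2) / n_overlap
```

Normalizing by the overlap count keeps the error comparable between parameter sets that produce different amounts of overlap, which is the purpose of the $N_\Omega$ term.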

Fig. 2 Preliminary (left) and final (right) super-resolution mosaic of sequence "Charlottenburg"

For global alignment of every frame, we finally apply direct parabolic parameter estimation between the current frame and a preliminary mosaic constructed with the first frame as reference coordinate system. To achieve this registration we again apply Levenberg-Marquardt energy minimization. The concatenation of the parabolic transformation parameters is used as the initial value. This can be formulated recursively:

$$
T_{0 \to t} = T_{0 \to t-1} \otimes T_{t-1 \to t} . \tag{3}
$$

The preliminary mosaic is a simple accumulation of newly discovered image content (see Fig. 2).
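Composing two parabolic transforms yields a polynomial of degree four, so the concatenation in Eq. (3) cannot be represented exactly by twelve parameters. One possible way to approximate it, shown here purely as a sketch, is to sample the composed mapping on a grid and refit a parabolic parameter set by least squares. The function names, the grid size, and the order of application are assumptions; the paper only states that concatenation is estimated numerically.

```python
import numpy as np

def basis(x, y):
    """Monomials 1, x, y, x^2, y^2, xy of Eq. (1), stacked along the last axis."""
    return np.stack([np.ones_like(x), x, y, x**2, y**2, x * y], axis=-1)

def apply(k, x, y):
    """Evaluate the parabolic model for parameter vector k."""
    B = basis(x, y)
    return B @ np.asarray(k[:6], float), B @ np.asarray(k[6:], float)

def concatenate(k_first, k_second, width, height, n=16):
    """Fit parameters approximating 'apply k_first, then k_second'.

    Assumption: an n x n sampling grid over the frame is dense enough
    for the fit; the composition order is chosen for illustration.
    """
    gx, gy = np.meshgrid(np.linspace(0, width - 1, n),
                         np.linspace(0, height - 1, n))
    x, y = gx.ravel(), gy.ravel()
    xm, ym = apply(k_first, x, y)
    xt, yt = apply(k_second, xm, ym)
    B = basis(x, y)
    a, *_ = np.linalg.lstsq(B, xt, rcond=None)
    b, *_ = np.linalg.lstsq(B, yt, rcond=None)
    return np.concatenate([a, b])
```

For purely affine inputs the fit reproduces the exact composition; with nonzero quadratic terms it returns the closest parabolic model in the least-squares sense, which is adequate as a starting value for the subsequent direct estimation.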

The robustness of this method makes correct registration possible even if large parts of the background are covered by foreground objects.

2.1 Blending Technique

To achieve super-resolution, we search for the frame $I_n$ with the largest zoom into the scene. Without loss of generality, a scaled version of the first frame $I_0$ can be used as reference coordinate system. The scaling factor $s$ should satisfy

$$
s \geq 2 \cdot \left| a_1^n b_2^n - a_2^n b_1^n \right| . \tag{4}
$$

The term inside the absolute value is the determinant of the Jacobian of the affine part of the transformation $T_{0 \to n}$, which is maximal for $I_n$.
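A small sketch of Eq. (4), assuming each parameter vector is stored in the order $(a_0, \ldots, a_5, b_0, \ldots, b_5)$ of Eq. (1). The helper names are hypothetical.

```python
def affine_determinant(k):
    """|a1*b2 - a2*b1| for k = (a0..a5, b0..b5), the affine part of Eq. (1)."""
    a1, a2 = k[1], k[2]
    b1, b2 = k[7], k[8]
    return abs(a1 * b2 - a2 * b1)

def strongest_zoom(params):
    """Index n of the frame whose transform T_{0->n} has the largest determinant."""
    return max(range(len(params)), key=lambda n: affine_determinant(params[n]))

def min_scale_factor(k):
    """Lower bound on the mosaic scaling factor s according to Eq. (4)."""
    return 2.0 * affine_determinant(k)
```

For an identity transform the bound is 2; for a frame zoomed by a factor of 1.5 in both directions the determinant is 2.25 and the bound becomes 4.5.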

For the construction of the final mosaic, a statistical analysis of all candidate pixels for a mosaic pixel over time is performed. Here we assume that foreground objects move enough that simple median filtering recovers the static background. Farin [4] proposes further correlation based analysis, which makes the process very complex. For our blending approach we follow the rule: a pixel candidate $I_t(x', y')$ belongs to the background only if

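$$
\left| I_t\left(T_{0 \to t}(x, y)\right) - m \right| \leq A \cdot \operatorname*{median}_{\forall \tau} \left| I_\tau\left(T_{0 \to \tau}(x, y)\right) - m \right|
\quad \text{with} \quad
m = \operatorname*{median}_{\forall \nu} \left( I_\nu\left(T_{0 \to \nu}(x, y)\right) \right) . \tag{5}
$$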
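The rule in Eq. (5) compares each candidate's deviation from the temporal median with a multiple of the median absolute deviation. A minimal sketch, assuming the candidates for each mosaic pixel have already been collected into an array with time as the first axis; the value of `A` below is a placeholder, since the paper does not state it.

```python
import numpy as np

def background_candidates(samples, A=2.0):
    """Apply the rule of Eq. (5) to candidate values per mosaic pixel.

    samples: array of shape (T, ...) holding I_t(T_{0->t}(x, y)) over time.
    A: tolerance factor (hypothetical default, not taken from the paper).
    Returns a boolean mask of the same shape and the temporal median m.
    """
    samples = np.asarray(samples, float)
    m = np.median(samples, axis=0)
    deviation = np.abs(samples - m)
    mad = np.median(deviation, axis=0)   # right-hand median of Eq. (5)
    return deviation <= A * mad, m
```

A candidate covered by a foreground object typically deviates from the median far more than the background noise level, so it fails the test and is excluded from blending.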

Fig. 3 Image of mosaic pixel candidates over time t for the line y = 90 of sequence "Stefan" (see the foreground object)

Fig. 4 Original (upper left), mosaic re-projection (upper right), luminance difference image (lower left), and generated change detection mask (lower right) of frame 12 of sequence "Stefan"

Fig. 3 shows all possible pixel candidates for a line of the super-resolution mosaic of sequence "Stefan". For super-resolution blending, pixel interpolation, which is equivalent to low-pass filtering, has to be avoided. Therefore, out of the background pixel candidates, the one with the smallest distance $d_t$ to an integer position is chosen [2].
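The selection of the candidate closest to an integer grid position can be sketched as below. The array layout (time as first axis) and the function name are assumptions; the distances $d_t$ and the background mask are taken as given inputs.

```python
import numpy as np

def blend_nearest(values, dist, is_background):
    """Pick, per mosaic pixel, the background candidate with the smallest d_t.

    values, dist, is_background: arrays of shape (T, H, W).
    Returns the (H, W) mosaic (NaN where no background candidate exists)
    and a boolean coverage map.
    """
    d = np.where(is_background, dist, np.inf)     # ignore foreground candidates
    best = np.argmin(d, axis=0)                   # index t of the nearest candidate
    mosaic = np.take_along_axis(values, best[None], axis=0)[0]
    covered = np.isfinite(d).any(axis=0)
    return np.where(covered, mosaic, np.nan), covered
```

Copying a single sample instead of averaging several candidates avoids the implicit low-pass filtering that interpolation would introduce.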

3 Mosaic based Foreground Segmentation

Re-projecting the content of the final mosaic into the original frame coordinate systems generates very accurate background models for every single image $I_t$ of a video shot. Thus, the problem of geometric adjustment [1] for object segmentation based on change detection does not arise.

We consider significance based change detection for a block of pixels centered at every image pixel of $I_t$. Additionally, to exploit the spatial and temporal consistency of the computed change mask, a simplified Markov-Gibbs random field approach is applied. A pixel is marked as unchanged if the conditional probability of its absolute block pixel difference exceeds a certain threshold [5]:

$$
p\left(D(x) \mid H_0\right) > \tau \tag{6}
$$

$D(x)$ is the mean pixel difference for a block $\Omega_x$ and $H_0$ is the null hypothesis that a pixel belongs to the image background. Since $p(D(x) \mid H_0)$ and the alternative conditional pdf $p(D(x) \mid H_1)$ can be modeled by Gaussian random variables, we end up with simple thresholding of the block pixel differences.
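Since the test reduces to thresholding block differences, the basic detector can be sketched in a few lines. The block radius, the use of an integral image, and the function names are assumptions for illustration; the threshold applies directly to $D(x)$ rather than to the probability in Eq. (6).

```python
import numpy as np

def block_mean_abs_diff(frame, background, radius=2):
    """D(x): mean absolute difference over a (2r+1) x (2r+1) block per pixel."""
    d = np.abs(frame.astype(float) - background.astype(float))
    p = np.pad(d, radius, mode="edge")            # replicate borders
    # Integral image with a leading zero row and column
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    n = 2 * radius + 1
    block_sum = ii[n:, n:] - ii[:-n, n:] - ii[n:, :-n] + ii[:-n, :-n]
    return block_sum / (n * n)

def change_mask(frame, background, threshold, radius=2):
    """Mark a pixel as changed when its block difference exceeds the threshold."""
    return block_mean_abs_diff(frame, background, radius) > threshold
```

Here `background` stands for the mosaic content re-projected into the coordinate system of the current frame.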

Fig. 5 Principle of flow based super-resolution foreground construction using two adjacent frames: (a) image interpolation, (b) optical flow calculation, (c) image warping, (d) robust blending

To incorporate spatial and temporal consistency, we use the previously generated change mask as state map. Since the probability of a pixel being marked as changed rises with the number of marked neighboring pixels in space and time, the threshold is decreased linearly with the number of neighboring changed pixels in the temporally preceding change mask.
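The linear threshold adaptation can be illustrated as follows. The 8-neighbourhood and the parameters `base` and `step` are assumptions; the paper does not specify the neighbourhood size or the slope.

```python
import numpy as np

def adaptive_threshold(prev_mask, base, step):
    """Per-pixel threshold lowered linearly with the number of changed
    8-neighbours in the previous change mask (hypothetical parameters)."""
    h, w = prev_mask.shape
    p = np.pad(prev_mask.astype(int), 1)          # zero border
    count = sum(
        p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return base - step * count
```

The resulting threshold map would replace the scalar threshold in the block difference test, so that pixels surrounded by previously changed pixels are marked as changed more readily.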

Figure 4 shows an example result of the segmentation method for frame 12 of sequence "Stefan". Note that background regions are also detected because of independently moving background objects.

4 Object Super-Resolution using Optical Flow

Foreground objects are not covered by our robust mosaic super-resolution process. Since it is very desirable to enhance the resolution of the foreground objects, we apply an optical flow based super-resolution method proposed in [7]. The authors suggest a five-step algorithm whose principle is shown in Fig. 5. Using the four neighboring frames of the current frame $I_n$, we compute the optical flow between the upscaled frames and apply a robust blending process to the warped pictures.

For optical flow estimation, a hierarchical Lucas-Kanade algorithm is used [7], which is less robust for whole pictures but good enough for foreground objects where only small occlusions occur. Any other flow algorithm could be used instead [8]. Results show that, especially for foreground objects, this process yields more detailed pictures than any common interpolation method (see Fig. 6).
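To make the flow step concrete, the sketch below performs a single Lucas-Kanade estimate for one window. It is deliberately simplified: there is no image pyramid, no iteration, and one displacement is estimated for the whole window, whereas the method described above uses a hierarchical variant. The function name and sign convention are assumptions.

```python
import numpy as np

def lucas_kanade_window(i1, i2):
    """Estimate one displacement (u, v) with i1(x, y) ~ i2(x + u, y + v).

    Solves the 2x2 normal equations of the brightness constancy
    constraint Ix*u + Iy*v + It = 0 over all pixels of the window.
    """
    i1 = i1.astype(float)
    i2 = i2.astype(float)
    iy, ix = np.gradient(i1)          # derivatives along rows (y) and columns (x)
    it = i2 - i1                      # temporal derivative
    A = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    u, v = np.linalg.solve(A, b)
    return u, v
```

The estimate is only valid for small displacements, which is why practical implementations run it coarse-to-fine on a pyramid and warp between levels.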

5 Experimental Results

To demonstrate the accuracy of our proposed method, we applied it to lowpass filtered and downsampled scenes and compared the results with the original sequences.


Fig. 6 Frame 3 of sequence "Stefan" after spline interpolation (left) and super-resolution optical flow (right); see slight differences in the script region and face region

Fig. 7 Results after super-resolution format conversion for the first 100 frames of sequence "Stefan", downsampled (176x120)

Figure 7 shows the calculated PSNR values for sequence "Stefan" for three different spatial upscaling approaches over 100 frames. The first is a simple interpolation approach using B-splines, known to be a precise type of image interpolation. The second is super-resolution based on optical flow only (see Section 4). Since the applied optical flow method is not reliable in occlusion regions, the PSNR curve is only slightly improved in comparison to the B-spline approach. The third result is obtained with our proposed method. The gain in terms of PSNR is up to 0.6 dB. The mean values over 100 frames for the B-spline, optical flow based, and proposed methods are 23.22 dB, 23.33 dB, and 23.56 dB, respectively.
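For reference, PSNR between an original frame and its upscaled reconstruction is commonly computed as in the sketch below. The peak value of 255 assumes 8-bit luminance, which the paper does not state explicitly.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak**2 / mse)
```

A per-frame curve like the one in Fig. 7 would be obtained by evaluating this function for every frame of the sequence and averaging the values for the reported means.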

To strengthen our results, we applied the method to incorrectly subsampled scenes containing spatial aliasing. Here, the improvement can mainly be assessed subjectively because the flickering caused by aliasing cannot be measured in terms of PSNR values. See Figure 8 for the correct restoration of horizontal lines. The mosaic based multi-sample reconstruction is very helpful in overcoming the aliasing effect.

6 Summary and Conclusion

We presented a new approach for spatial up-conversion of video scenes based on super-resolution mosaicing. Since the mosaic process can only be applied to rigid background objects, an optical flow super-resolution method is used for independently moving foreground objects. The main advantage of this approach is the simultaneous generation of meaningful objects due to the mosaic based object segmentation. Results show that for most parts of a video sequence we outperform commonly used upscaling methods, such as complex interpolation approaches. Additionally, subjective assessment shows that we can largely overcome the effects of spatial aliasing. Further work includes the incorporation of more reliable flow calculation algorithms.

Fig. 8 Results: frame 79 of sequence "Stefan", B-spline interpolation (left), proposed method (right); see aliasing effects for the ground line in the left image

Acknowledgement: This work was developed within 3DTV (FP6-PLT-511568-3DTV), a European Network of Excellence funded under the European Commission IST FP6 programme.

References

1. R. J. Radke, et al., "Image Change Detection Algorithms: A Systematic Survey," IEEE Trans. Image Processing, vol. 14, no. 3, pp. 294-307, March 2005.

2. M. Kunter, J. Kim, T. Sikora, "Super-resolution Mosaicing using Embedded Hybrid Recursive Flow-based Segmentation," Int. Conf. on Information, Communications and Signal Processing (ICICS '05), Bangkok, Thailand, Dec. 2005.

3. G. Ye, et al., "A Robust Approach to Super-resolution Sprite Generation," Int. Conf. on Image Processing (ICIP'05), Genova, Italy, Sept. 2005.

4. D. Farin, P. H. N. de With, and W. Effelsberg, "Video-Object Segmentation using Multi-Sprite Background Subtraction," IEEE Int. Conf. on Multimedia and Expo (ICME), Taipei, Taiwan, 2004.

5. A. Smolic, et al., "MPEG-4 Video Transmission over DAB/DMB: Joined Optimization of Encoding and Format Conversion," Int. Workshop on Multimedia Commun. (MoMuc'98), Berlin, Germany, Oct. 1998.

6. T. Sikora, "MPEG digital video coding standards," IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 82-100, Sep. 1997.

7. S. Baker, T. Kanade, "Super-resolution optical flow," Technical Report CMU-RI-TR-99-36, Carnegie Mellon University, 1999.

8. R. Fransens, C. Strecha, L. V. Gool, "A Probabilistic Approach to Optical Flow based Super-resolution," Conf. on Computer Vision and Pattern Recognition (CVPRW'04), Washington, DC, USA, June 2004.
