
On Quality of Object-Based Video Material
Juergen Wuenschmann∗, Julian Forster†, Christian Feller∗ and Albrecht Rothermel∗

∗Universität Ulm, Institute of Microelectronics, Albert-Einstein-Allee 43, 89081 Ulm, Germany

Email: {juergen.wuenschmann, christian.feller, albrecht.rothermel}@uni-ulm.de
†Daimler AG

Group Research & Advanced Engineering, Wilhelm-Runge-Str. 11, 89081 Ulm, Germany
Email: [email protected]

Abstract—Object-based video coding is a fairly new topic, and while quality assessment is a central part of traditional video coding, it has not been investigated much for object-based material. Many quality assessment metrics have been published and some are widely used. Whether they can be used for object-based material as well is evaluated in this paper. The two most widely used full-reference algorithms and two promising no-reference algorithms have been tested. Only one of the no-reference algorithms correlates with subjective impressions and the expected quality results. Unfortunately, this algorithm is very slow (over 8 min/frame).

I. INTRODUCTION

The quality of video material depends on many factors. Besides the recording quality, which usually has to be taken as is, a major influence on video quality is video encoding. Encoding is necessary because in most, if not all, applications the data rate of uncompressed video material is too high to be acceptable. For pixel-based video material, many encoding algorithms have evolved over the years, and the data rate needed for a certain quality has been reduced in that process. Nevertheless, quality degradation is still an issue, as commonly used video coding algorithms are lossy and therefore introduce coding artifacts.
In contrast to traditional pixel-based video encoding, an object-based encoding scheme can be used for modeled and animated material [1]. To be competitive with pixel-based video encoding, first the data rate has to be comparable [2] and second the quality has to be at least the same.
To investigate whether the second goal can be reached, a quality evaluation has to be done. There are several well-known video quality evaluation algorithms. The usability of these algorithms for object-based video material is investigated in this paper.
In Section II an overview of the artifacts introduced by video coding is given. It is also discussed whether they are an issue in pixel-based video coding, in object-based video coding, or in both. A brief insight into video quality algorithms is given in Section III. In Section IV the results of the quality evaluation for object-based video and, for comparison, for pixel-based video are shown. Section V concludes the paper.

II. ARTIFACTS

The most well-known artifact introduced by video coding is blocking. When a codec from the MPEG family is used for encoding, a frame is separated into macroblocks. If the data rate becomes too small, the edges of these macroblocks become visible. Another visible artifact is mosquito noise, also called the Gibbs effect. It is caused by the discrete cosine transformation and is mostly visible in uniform regions near high-frequency components like edges. Blurriness is also very common and is caused mainly by excessive quantization. The latter two artifacts occur in wavelet-based video compression, such as Motion JPEG 2000, as well.
All the aforementioned artifacts are common in traditional video coding, and it is well known which methods inside a codec they arise from. As object-based compression works completely differently, it cannot be inferred that it shows similar artifacts. Since object-based compression is lossy as well, artifacts can be expected; whether they are visible and annoying needs investigation. The parts of the object-based codec that use lossy encoding are geometry encoding and animation encoding.

III. QUALITY ASSESSMENT ALGORITHMS

In quality assessment, the most valuable results are always obtained by subjective measurements. A number of test viewers are asked to watch certain clips and judge their quality, either on their own or in comparison to other clips [3]. Since this is a very time-consuming process, it is often preferred to use objective quality measurement algorithms instead. Much research has been done to develop simple and reliable quality assessment algorithms. They can be grouped into full-, reduced-, and no-reference methods.
Many quality assessment algorithms were developed for image quality assessment, but they can easily be extended to video by calculating the average quality score over all frames. The most used methods are full-reference metrics, which directly compare a reference picture/block with the one under test. Common ones are the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) score [4]. While PSNR is a purely mathematical metric, SSIM tries to emulate the behavior of the human visual system. It therefore tries to rate disturbing artifacts down, while e.g. slight luminance differences result in an only slightly lower score.
Reduced-reference metrics only use a set of pre-computed features of the original frame to judge the quality of the distorted image [5]. Since these features have to be adapted to different kinds of artifacts, the results would not be comparable between pixel-based and object-based video material, and therefore they are not applicable to our test scenario.
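As a concrete illustration, a minimal NumPy version of both full-reference metrics is sketched below. The SSIM shown here is the simplified single-window form (the metric from [4] aggregates it over local sliding windows), and the video score is the per-frame average mentioned above:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def ssim_global(ref, test, peak=255.0):
    """Simplified single-window SSIM in [-1, 1] (1 is best).
    The full metric in [4] applies this per local window; computing it
    once over the whole frame keeps the sketch short."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    x, y = ref.astype(float), test.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def video_score(ref_frames, test_frames, metric=psnr):
    """Image metric extended to video: average score over all frames."""
    return np.mean([metric(r, t) for r, t in zip(ref_frames, test_frames)])
```

A uniform +10 luminance offset on an 8-bit scale, for example, yields a PSNR of only about 28 dB, while the simplified SSIM stays close to one, matching the intended behavior described above.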

2012 IEEE Second International Conference on Consumer Electronics - Berlin (ICCE-Berlin)

978-1-4673-1547-0/12/$31.00 ©2012 IEEE


No-reference metrics do not need any information about the undistorted source to evaluate the quality. To accomplish this task, the algorithms have to deal with the different classes of possible artifacts. This means artifacts and their strength have to be identified and the frame has to be rated accordingly. Because this is a difficult task, most published algorithms can only deal with one class of artifacts, while ignoring all others. An overview of no-reference metrics can be found in [6] and [7].
Since there are no dedicated quality metrics for object-based video material, it is investigated whether the known algorithms give valuable results for this material. This would be favorable, since the results should be comparable to pixel-based video coding quality scores. A reference can be rendered from the original object-based file. This reference video is h.264 en-/decoded. The object-based file is en-/decoded and rendered to serve as the video under test. Therefore, the two mentioned full-reference metrics will be tested. Besides that, it is always possible to test a no-reference metric on any available video. Since the object-based material has no distinct artifacts, only metrics that can deal with various artifacts are promising. Obviously not usable are metrics that work on encoded material, e.g. in the DCT domain. BIQI and BLIINDS fulfill the given preconditions and are used in this paper [8], [9].
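The evaluation setup described above could, for example, be scripted around ffmpeg. The helper below is only a sketch: the file names are placeholders, but `-c:v libx264`, `-b:v` and `-crf` are standard ffmpeg/x264 options for the two kinds of encoder settings used later in Section IV:

```python
def x264_cmd(src, dst, bitrate=None, crf=None):
    """Build an ffmpeg argument list for x264 encoding, either with a
    target bit rate (e.g. 10 MBit/s) or a constant rate factor (crf).
    True CBR would additionally need -maxrate/-bufsize; this sketch
    only sets the average target rate."""
    cmd = ['ffmpeg', '-i', src, '-c:v', 'libx264']
    if bitrate is not None:
        cmd += ['-b:v', bitrate]
    if crf is not None:
        cmd += ['-crf', str(crf)]
    return cmd + [dst]

# The four settings from the evaluation: two constant bit rates and
# two constant rate factors (file names are purely illustrative).
settings = [{'bitrate': '10M'}, {'bitrate': '20M'}, {'crf': 18}, {'crf': 24}]
commands = [x264_cmd('reference.avi', f'test_{i}.mp4', **s)
            for i, s in enumerate(settings)]
```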

IV. RESULTS

The results obtained using PSNR and SSIM are shown in Figs. 1 and 2. As reference, the original object-based video was rendered. This video was h.264 encoded with four different settings. Two constant bit rate versions (10 MBit/s,

Fig. 1. PSNR scores for a rotating object with rising complexity. The h.264 encoded videos show the expected behavior, whereas the object-based encoded material gets low quality scores.

20 MBit/s) and two variable bit rate versions (crf18, crf24; crf means constant rate factor and is the quantization parameter) are shown here. On the object-based side, the original file was en-/decoded using the codec described in [1] (MP4 enc/dec). As test scenario, the third scenario from [1] was chosen. It contains one object that is repeatedly subdivided

Fig. 2. SSIM scores for a rotating object with rising complexity. The h.264 encoded videos get very good quality scores. For higher complexity of the scene, the calculated quality of the object-based videos drops quickly.

with every iteration, and therefore the level of detail (in this case the number of vertices) is increased with every iteration i to v = (2^(i-1) + 1)^3 - (2^(i-1) - 1)^3 (see also Fig. 3). It is observed that all five versions show a declining PSNR value with increasing complexity. For the variable bit rate files this is remarkable, since this setting should keep the quality constant. It is also remarkable that the object-based encoded files show a significantly lower PSNR value (up to 38 dB) than all pixel-based files. For SSIM (Fig. 2), which has an upper limit of one, the pixel-based files show a very high quality even for high complexity, whereas the object-based files show a large decline for increasing complexity.
To find the reasons for these very bad results, the scenes were judged subjectively, and no distortion could be found while the videos were running. When compared frame by frame, slight differences between the original and the object-based en-/decoded scenes could be found (Fig. 3(a) and 3(b)). It can be noticed that Fig. 3(b) is slightly darker. Besides that, there is no noticeable difference. The average luminance difference between the original and the object-based coded version has been calculated for different frames, and the results are shown in Table I. There is only a slight luminance difference between the original Collada files and the version encoded with the normal settings (10 bit precision). The luminance difference in the object region is expected to be higher, but it is diminished by the uniform background. More differences can be found if a frame is saved in a text-based format. That way, pixel values can be compared directly. It was observed that the object was shifted by one or two pixels compared to the original, and the pixel values in the object region were slightly different from the original. The reason for these deviations lies in the object-based encoding process: geometry and animations get quantized and rounded during encoding.
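The effect of such a sub-object shift on a full-reference metric can be illustrated with a synthetic frame (the sizes and gray values below are arbitrary assumptions, not the actual test scene):

```python
import numpy as np

# A bright square object on a uniform background, as a stand-in for a
# rendered frame of the test scene.
frame = np.full((256, 256), 50.0)
frame[78:178, 78:178] = 200.0

# The decoded object-based version was observed to be shifted by one or
# two pixels; model a one-pixel horizontal shift of the object.
shifted = np.full((256, 256), 50.0)
shifted[78:178, 79:179] = 200.0

mse = np.mean((frame - shifted) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)
# Only the pixels along the two vertical object edges differ, yet each
# of them differs by 150 gray levels, so the PSNR collapses to around
# 30 dB even though both frames look identical to a viewer.
```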
In particular, not all animation values are stored in the encoded file; they are interpolated from so-called key frames during decoding. While the object was not visibly distorted as such, the change



Fig. 3. First frame of the incrementing-vertices script for eight vertices. (a) Original, (b) object-based encoded with 10 bit precision for coordinates and normals, (c) encoded with 20 bit precision for coordinates and normals.

TABLE I
AVERAGE LUMINANCE OF THE ORIGINAL COLLADA FILE AND OF THE OBJECT-BASED ENCODED VERSIONS WITH 10 BIT AND 20 BIT PRECISION.

                            Frame 1   Frame 256
Original                     58.28      58.52
Encoded, 10 bit precision    58.03      58.18
Encoded, 20 bit precision    58.26      58.46

in the geometry introduced the pixel shift. The luminance change is caused by distortion of the face normals. The normals are used for shading in the rendering process; if a normal gets slightly distorted, the luminance of a whole plane/face changes. The animation changed due to the interpolation process, and deviations of up to 3° were found. To visualize


Fig. 4. Pixel-by-pixel difference of frame 256 for the normal setting with 10 bit precision for coordinates and normals (a) and for 20 bit precision for coordinates and normals (b).

this, the pixel-by-pixel difference between the original and the object-based coded version has been calculated; the section containing the object is shown in Fig. 4(a) for frame 256. A clearly visible shift of the object is observed, which distorts full-reference quality assessment metrics while it is not noticed when watching the clip.
These non-visible deviations cannot be handled by PSNR in particular. PSNR compares a frame pixel by pixel, and since every pixel in the object region differs from the original, the PSNR score is bad. Although in [4] it is stated that SSIM should be invariant to e.g. slight luminance changes, it has big problems with this scenario, as can be seen in Fig. 2. The cause must be the shift and the differing animation.
If the quality parameters in the encoder are set accordingly, these factors can be minimized. In Fig. 3(c) no visual difference compared to Fig. 3(a) can be noticed, and the luminance difference is negligible. Nevertheless, the PSNR and SSIM values are significantly worse than for the pixel-based videos. The reason for this is again the deviation in the animation curve, which is visualized in Fig. 4(b) for the high-quality settings.
Since the problems that degrade the scores have been identified and explicitly do not degrade the visual quality as well-known artifacts do, it must be concluded that these two metrics are not usable to judge the quality of object-based video files. All the identified factors that degrade the objective quality scores have in common that they are caused by (hopefully) non-visible shifts and alterations of the scene.
Another class of video quality assessment algorithms that does not face these shortcomings are no-reference quality assessment metrics. They do not use a reference to compare the frame under test against and therefore cannot identify mathematical differences as potential problems. As item under test, the second scenario from [1] was taken. The scene contains a constant number of objects, which rotate. The rotational speed is increased with every iteration.
As no reference is needed, the quality score is also calculated for the uncompressed reference Orig Avi, which of course should have the best quality.
The results obtained using BIQI are shown in Fig. 5. BIQI has a score range from 0 to 100 (0 is best, 100 is worst quality). When the rotational speed is 0, which means there is no animation, the quality scores for all variations are worse than for higher rotational speeds. The scores of the uncompressed and of the object-based encoded MP4 enc/dec Avi even exceed the lower quality limit of 100. A test with simple pictures revealed the cause of this behavior. BIQI uses five independent features to judge the quality. One of them is the JPEG quality score (JQS, [10]), which evaluates JPEG artifacts (blocking). This algorithm does not work with uncompressed, and therefore undistorted, simple images and is also highly unreliable for JPEG compressed material. Therefore, BIQI was calculated again without JQS. The results are shown in Fig. 6. BIQI now judges the first iteration with no animation with a very good score close to one for all versions of the video. With animation, the score drops significantly but still indicates very good quality. The uncompressed version and the object-based encoded version show a slightly worse quality than the h.264


Fig. 5. BIQI scores with JQS for the animation of many objects with rising rotational speed. A lower score means better quality. The lower quality limit should be 100.

Fig. 6. BIQI scores with JQS deactivated for the animation of many objects with rising rotational speed. The score distribution is partly not as expected; e.g. crf18 h.264 has lower scores than crf24 h.264, and the uncompressed reference has even lower scores.

encoded files, except for the 10 MBit/s h.264. This must be interpreted as a misjudgment, since at least the uncompressed version must have a better quality than every pixel-based encoded file. Another clear misjudgment is that crf18 h.264 shows up to one point worse quality than crf24 h.264. In conclusion, BIQI cannot reliably evaluate the quality of h.264 compressed scenes and is also not able to rank uncompressed material correctly. Therefore, its judgment of the object-based compressed scenes has to be mistrusted as well, and BIQI cannot be used to evaluate the objective video quality for the given material.

Fig. 7. BLIINDS scores for the animation of many objects with rising rotational speed. A higher score means higher quality, and the upper limit is 100. The quality score distribution is as expected, but all videos have mediocre scores, even the uncompressed reference Orig Avi. Apart from outliers, the uncompressed reference and MP4 enc/dec Avi get similar quality scores.

In Fig. 7 the results for BLIINDS are shown. BLIINDS has a score range from 0 to 100 (0 is worst, 100 is best quality). All BLIINDS scores here are between 30 and 50, which would indicate a mediocre quality. Even the original, uncompressed files have scores below 50. Comparing the files against each other, they show the expected order for the h.264 compressed files: 20 MBit/s h.264 has a 2-4 points better score than 10 MBit/s h.264, and crf18 h.264 has a 1-2 points better score than crf24 h.264. The uncompressed files Orig Avi are rated up to 9 points better than the best-rated h.264 compressed file, 20 MBit/s h.264. The object-based compressed files MP4 enc/dec Avi show mostly the same scores as the uncompressed Orig Avi, but are in some cases ≈ 5 points better than the reference. An explanation for this behavior is still to be found.
Despite the low overall scores, BLIINDS seems to work very well with the given material, but its computation time makes it unusable for video material in practice. One frame of the given scenario in full-HD resolution with a fairly low level of detail takes 8 min 20 s to calculate on an Intel i5 at 2.8 GHz. In this case one scene has 501 frames, and there are 6 different versions of the scene with 13 different animation speeds, which would sum up to a computation time of ≈ 226 days. For all the scenarios it would take years to calculate. The results shown here were obtained by calculating the quality score only for every 23rd frame to lower the computation time.
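The quoted computation time can be checked with simple arithmetic (8 min 20 s = 500 s per frame):

```python
frames = 501 * 6 * 13        # frames per scene x versions x animation speeds
seconds = frames * 500       # 8 min 20 s of BLIINDS time per full-HD frame
days = seconds / 86_400      # seconds per day
# days evaluates to roughly 226, matching the estimate in the text;
# scoring only every 23rd frame reduces this by a factor of 23,
# i.e. to about ten days for this one scenario.
```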

V. CONCLUSION

Four video quality assessment metrics have been evaluated for their usability with object-based encoded video material. For comparison and to verify the results, the rendered reference version has been h.264 encoded with four different quality settings. First, the two most used full-reference algorithms, PSNR and SSIM, were tested. The results revealed low quality scores for the object-based encoding, but the causes could be identified and shown not to reflect visual degradation. The low scores were caused by pixel shifts, luminance differences and interpolation errors in the decoding process. All these factors cause no visual degradation, but they lower the scores of both algorithms,


because of their mathematical working principle.
Next, two of the most versatile no-reference quality assessment metrics, BIQI and BLIINDS, were tested. BIQI uses JQS as one of its features, which does not work for uncompressed and artificial material. Therefore, BIQI was used without JQS, but the results showed that BIQI cannot judge the quality reliably. For example, the clips encoded with quantization factor crf18 got up to one point lower scores than the ones encoded with crf24, which represents a considerably worse quality. Therefore, the quality classification for the object-based encoded videos has to be mistrusted. BLIINDS shows the expected results for h.264 as well as for the reference and the object-based encoded versions, but exhibits a strange behavior in some of the measurements for the object-based encoded version: the quality jumps four points up, then goes back down to normal, only to go up again. Besides that, BLIINDS has a very long computation time of 8 min 20 s per frame at full-HD resolution and is therefore not usable for a full quality evaluation of many video clips.
In summary, all of the evaluated algorithms show major deficiencies when applied to object-based video data and are therefore infeasible for quality assessment in this scenario. To reliably judge the quality of this material, a subjective quality assessment is currently the only solution. In the long run, new algorithms that can deal with object-based video material must be developed.

REFERENCES

[1] J. Wuenschmann, T. Roll, C. Feller, and A. Rothermel, “Analysis and improvements to the object based video encoder MPEG-4 Part 25,” in Proc. IEEE 1st International Conference on Consumer Electronics - Berlin, September 2011.

[2] J. Wuenschmann, C. Feller, and A. Rothermel, “File size comparison of modeled and pixel based video in five scenarios,” in Proc. International Conference on Advances in Multimedia, May 2012.

[3] Recommendation ITU-R BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” 2002.

[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.

[5] S. Gauss, T. Müller, T. Roll, J. Wünschmann, and A. Rothermel, “Objective video quality assessment of mobile television receivers,” in Proc. IEEE 14th International Symposium on Consumer Electronics (ISCE), 2010, pp. 1–6.

[6] A. V. Murthy and L. J. Karam, “A MATLAB-based framework for image and video quality evaluation,” in Proc. Second International Workshop on Quality of Multimedia Experience (QoMEX), 2010, pp. 242–247.

[7] F. M. Ciaramello and A. R. Reibman, “Systematic stress testing of image quality estimators,” in Proc. 18th IEEE International Conference on Image Processing (ICIP), 2011, pp. 3101–3104.

[8] A. K. Moorthy and A. C. Bovik, “A two-step framework for constructing blind image quality indices,” IEEE Signal Processing Letters, vol. 17, no. 5, pp. 513–516, 2010.

[9] M. A. Saad, A. C. Bovik, and C. Charrier, “A DCT statistics-based blind image quality index,” IEEE Signal Processing Letters, vol. 17, no. 6, pp. 583–586, 2010.

[10] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-reference perceptual quality assessment of JPEG compressed images,” in Proc. IEEE International Conference on Image Processing (ICIP), 2002, vol. 1, pp. I-477–I-480.