[IEEE 2012 IEEE Second International Conference on Consumer Electronics - Berlin (ICCE-Berlin) - Berlin, Germany (2012.09.3-2012.09.5)] 2012 IEEE Second International Conference on

Fast Motion Estimation Algorithm for HEVC

N. Purnachand1,2, Luis Nero Alves1,2, Antonio Navarro1,2

1Instituto de Telecomunicações, Pólo-Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal

2Departamento de Electrónica, Telecomunicações e Informática, Universidade de Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal

Abstract: Motion Estimation (ME) is an essential process in many video coding standards such as MPEG-2, H.264/AVC and HEVC. Although ME has traditionally been performed at the encoder, it is expected to be used in future consumer devices within distributed video coding architectures. ME alone accounts for more than 50% of the coding complexity, that is, of the time needed to encode. To reduce the computation time, many fast ME algorithms have been proposed and implemented. The present paper proposes a new fast ME algorithm which outperforms the fast ME algorithm implemented in the HEVC reference software HM.

Keywords: Motion Estimation; Early Termination; HEVC

I. INTRODUCTION

Block based video coding is widely used in current video coding standards due to its efficiency. H.264/AVC and HEVC are examples of standards employing block based video coding [1,2]. Motion Estimation (ME) and Motion Compensation are two essential processes in block based video coding. The ME tool finds the position of the best matched block (the Motion Vector) in past or future frames for every block in the current video frame, whereas motion compensation generates compensated frames by using these motion vectors. Of the two, ME is the more challenging and time consuming stage. In part, this is due to variable block size motion estimation and multiple reference frames, both of which make ME consume a large portion of the encoding time. Efficient algorithms can mitigate part of this problem, and several ME algorithms are reported in the literature [3-5]. In distributed video coding the ME complexity may be shifted to the decoder side, but the overall complexity remains almost unaffected [6]. Apart from video compression, ME is also used in frame interpolation techniques for frame rate up-conversion [7, 8].

The present paper proposes a new ME algorithm for the HEVC encoder, using rotating hexagonal grids to find the global minimum. It also introduces an adaptive threshold for early termination. Section II presents a short overview of HEVC motion estimation methods and Section III presents the new methods and their integration within motion estimation algorithms. Section IV discusses the experimental results and, finally, Section V presents the concluding remarks.

II. MOTION ESTIMATION AND ITS TOOLS

HEVC is the latest video coding standard developed by JCT-VC (Joint Collaborative Team on Video Coding). In HEVC, the block coding hierarchy is more generalized. Each frame is divided into square blocks called Coding Units (CUs) with a maximum size of 64x64, which are recursively subdivided into square blocks down to 8x8. The CUs are organized as quadtrees, where each CU is sub-divided into quadtree based prediction blocks called Prediction Units (PUs) of either intra, inter or skip type. Each PU is again partitioned into quadtree based transform blocks called Transform Units (TUs), which specify the transform size. The quadtree representation of CUs with their PU structure is shown in Fig. 1.

Motion estimation is the process of finding the best matched block in a search window of the reference frame (a past or future frame) for each block in the current frame. The process is illustrated in Fig. 2. To find the best matched block, the ME algorithm minimizes the Lagrangian cost function JMV given in (1),

$$J_{MV} = SAD + \lambda_M \cdot R(MV - PMV) \qquad (1)$$

where SAD is the matching function used (here the Sum of Absolute Differences), λM is the Lagrangian multiplier, MV is the current motion vector and PMV is the predicted motion vector obtained from the motion prediction process defined by the standard (the motion prediction algorithm should be the same in both encoder and decoder). 'MV-PMV' gives the motion vector difference (MVD) and 'R(MV-PMV)' denotes the rate required to encode this MVD. The final motion vector is the vector of the block with the minimum cost.

Fig. 1. Quad-tree Coding Structure in HEVC.

Fig. 2. Illustration of ME Process.

978-1-4673-1547-0/12/$31.00 ©2012 IEEE
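As an illustration of the cost in (1), the following minimal Python sketch evaluates SAD plus a weighted rate term for one candidate vector. The rate model is an assumption: the text does not specify how R is computed, so the bit length of a signed Exp-Golomb code is used here as a stand-in, and the helper names `sad`, `se_golomb_bits` and `mv_cost` are hypothetical.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


def se_golomb_bits(v):
    """Bit length of a signed Exp-Golomb code word.

    Assumption: used only as a simple stand-in for the rate R in (1).
    """
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def mv_cost(cur, ref, mv, pmv, lam):
    """Lagrangian cost J_MV = SAD + lambda_M * R(MV - PMV)."""
    mvd = (mv[0] - pmv[0], mv[1] - pmv[1])
    rate = se_golomb_bits(mvd[0]) + se_golomb_bits(mvd[1])
    return sad(cur, ref) + lam * rate


cur_block = [[10, 20], [30, 40]]
ref_block = [[12, 18], [30, 45]]
cost = mv_cost(cur_block, ref_block, mv=(3, -1), pmv=(1, -1), lam=4)
```

In this toy case the SAD is 9, the MVD is (2, 0) and the assumed rate is 6 bits, so the candidate is charged for both its mismatch and its distance from the predictor.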

ME algorithms that search all the blocks in the reference frame search window are called full search algorithms. In contrast, ME algorithms that examine only a subset of candidate blocks, and may therefore return suboptimal MVs, are called fast ME algorithms. There are many fast ME algorithms. Some of the important ones implemented in the AVC reference software are UMHexS (Unsymmetrical-cross Multi-hexagon-grid Search), SUMHexS (Simple UMHexS) and EPZS (Enhanced Predictive Zonal Search) [3]. The H.264 MVC reference software encoder employs the TZS (Test Zone Search) algorithm [4]. TZS is also used in the encoder of the HEVC reference software [5].
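For contrast with the fast algorithms named above, a full search can be sketched in a few lines of Python. This is only an illustrative baseline that uses plain SAD as the cost, not code from any reference software; the function names and the list-of-lists frame layout are assumptions.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced copy in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in [-search_range, search_range]^2 and keep the best."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

A search range of R gives up to (2R+1)^2 candidates per block, which is the cost that fast algorithms avoid by visiting only selected positions.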

The process of searching the best matched block in fast ME algorithms typically consists of four stages. The first stage is the prediction stage (prediction algorithm defined inside ME process), where the algorithm uses motion vectors of previously coded neighboring blocks/Coding-Units (CUs) to predict the initial block for the search process. In the second stage, search patterns are employed to find the global minimum point. The third stage is the early termination, where the ME algorithm uses a threshold to terminate the search process. In the final stage (fine refinement stage), the ME algorithm refines the motion vector if it does not satisfy the early termination criterion.

III. PROPOSED ALGORITHM

The present paper proposes two changes to the ME algorithm implemented in HM, namely a new global search pattern for finding the global minimum and an adaptive early termination condition. Both comply with the HEVC standard. The improvements in each stage are explained in the following sub-sections.

A. Rotating Hexagonal Grid

The global minimum point can be estimated using grid patterns such as diamond, square or hexagon. The diamond and square patterns are shown in Fig. 3(a) and 3(b), respectively. Both have eight points per grid. For instance, using diamond search patterns for a 64x64 search window requires 6 grids with 8 inspection points each and 1 grid with 4 inspection points (the internal small diamond), making a total of 52 (6x8+4) search points. Hexagon patterns like the ones depicted in Fig. 3(c) to 3(f) have six points per grid, which gives them a computation time advantage. Using hexagonal search patterns for the same 64x64 search window requires 40 (6x6+4) search points. Compared with diamond patterns, the hexagonal patterns therefore save around 23% of the search points per search window. However, the computation time savings come at the expense of a slight decrease in PSNR. This is simple to understand: diamond patterns cover motion in all 8 directions, whereas the hexagon covers motion in only 6 directions.
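The search point counts quoted above can be checked with a short Python calculation. The numbers of grids and points are taken directly from the text; the function name is a hypothetical helper.

```python
def search_points(num_full_grids, points_per_grid, inner_points=4):
    """Total inspection points: full grids plus the small inner diamond."""
    return num_full_grids * points_per_grid + inner_points


diamond = search_points(6, 8)   # 6 x 8 + 4
hexagon = search_points(6, 6)   # 6 x 6 + 4
saving_percent = 100.0 * (diamond - hexagon) / diamond

print(diamond, hexagon, round(saving_percent, 1))
```

The saving is 12 out of 52 points, which is the "around 23%" figure given in the text.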

Generally speaking, it is possible to envisage four hexagonal search pattern configurations. The basic ones are the horizontal and vertical hexagonal patterns, depicted in Fig. 3(c) and 3(d). Horizontal hexagons are good for horizontal motion estimation; however, they lose performance for vertical motion, since they interpolate the top and bottom search points. A vertical hexagon, on the other hand, gives a good estimate for vertical motion but loses performance for horizontally moving objects. Hence, in order to cope with both vertical and horizontal motion without performance impairments, it is possible to consider rotating hexagonal patterns, as shown in Fig. 3(e) and 3(f).

Fig. 3. Search Patterns for Motion Estimation with stride length 8: (a) Diamond Pattern, (b) Square Pattern, (c) Horizontal Hexagonal Pattern, (d) Vertical Hexagonal Pattern, (e) Rotating Hexagonal Pattern, Type 1, (f) Rotating Hexagonal Pattern, Type 2.

There are two possible types of rotating hexagonal patterns, as shown in Fig. 3(e) and Fig. 3(f). The type 1 rotating hexagonal pattern (Fig. 3(e)) is slightly better than the type 2 rotating hexagonal pattern (Fig. 3(f)). This can be understood through the translational motion vector model. According to this model, the motion vector density is concentrated around the origin. Furthermore, motion vectors have higher densities in the horizontal direction. Type 1 hexagonal patterns perform better under these conditions, since they reinforce horizontal motion estimation. Table 2 shows simulation evidence of this fact. Section IV discusses the results reported in Table 2 in detail.
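One way to picture a rotating hexagonal grid is sketched below in Python. The exact point layout is defined by Fig. 3(e) and 3(f), which cannot be reconstructed from the text alone, so everything here is an assumption: the hexagon offsets, the idea that the orientation alternates between horizontal and vertical from one stride to the next, and the `start_horizontal` flag used to tell the two types apart. All function names are hypothetical.

```python
def hexagon_offsets(stride, horizontal):
    """Six offsets of a hexagon with the given stride (assumed geometry)."""
    half = max(stride // 2, 1)
    if horizontal:
        # Two points on the horizontal axis, four above and below.
        return [(-stride, 0), (stride, 0),
                (-half, -stride), (half, -stride),
                (-half, stride), (half, stride)]
    # Vertical hexagon: the horizontal one with x and y swapped.
    return [(0, -stride), (0, stride),
            (-stride, -half), (-stride, half),
            (stride, -half), (stride, half)]


def rotating_hexagon_grids(max_stride, start_horizontal=True):
    """Grids for strides 2, 4, ..., max_stride with alternating orientation.

    Assumption: "rotating" is modelled as flipping the orientation at
    every stride; start_horizontal selects which orientation comes first.
    """
    grids = []
    stride, horizontal = 2, start_horizontal
    while stride <= max_stride:
        grids.append((stride, hexagon_offsets(stride, horizontal)))
        stride *= 2
        horizontal = not horizontal
    return grids


grids = rotating_hexagon_grids(64)
total_points = sum(len(offsets) for _, offsets in grids)
```

With a maximum stride of 64 this yields six grids of six points each, matching the 6x6 part of the 40-point count given earlier for hexagonal patterns.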

B. Adaptive Threshold For Early Termination

After the global minimum point has been found, a threshold for terminating the algorithm can save a significant amount of computation time. This threshold can be chosen adaptively by averaging the costs of all the previous inter prediction coding units in the first frame of the GOP (excluding the intra frame). Since each inter-PU block has a different size, the average cost for each one will be different and hence the threshold for each block size will also be different. In HEVC, there are four inter PU sizes (also called partition sizes) for each CU size as shown in Fig. 1. A generalized representation of the threshold value for each block with CU depth d and partition size p is given by,

$$Th(d,p) = \frac{1}{N(d,p)} \sum_{i=1}^{N(d,p)} cost_i(d,p) \qquad (2)$$

where N(d,p) represents the total number of coding units in the first frame of the GOP for depth d and partition size p, and cost is the distortion cost used in the algorithm (here SAD). For example, in HEVC, the maximum CU depth is 4 and each CU has 3 partition sizes in inter-prediction (2Nx2N, 2NxN, Nx2N), along with the unpartitioned 4x4 inter-PU size. Hence there will be 13 (4x3+1) thresholds Th(0,0), Th(0,1), Th(0,2), Th(1,0), Th(1,1) and so on until Th(3,2), Th(4,0), corresponding to the CU sizes from 64x64 through 8x8 and the partition sizes 2Nx2N, 2NxN, Nx2N. For other codecs, the threshold equation can be adapted to the coding structure. For example in AVC, the coding depth is fixed to 2, with a maximum CU size (macroblock) of 16, thus allowing modes from 16x16 through 4x4, with thresholds Th(0,0) through Th(1,2) and Th(2,0), making altogether 7 thresholds, one for each of the 7 modes.
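The per-size averaging in (2) can be sketched in Python as a running mean keyed by (d, p). This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the zero-based (d, p) indexing follows the Th(3,2), Th(4,0) end of the list above, and the termination rule "stop when the current cost does not exceed the threshold" is an assumption, since the text does not spell out the comparison.

```python
from collections import defaultdict


def hevc_threshold_keys():
    """The 13 (depth, partition) pairs: 4 depths x 3 partitions, plus 4x4."""
    return [(d, p) for d in range(4) for p in range(3)] + [(4, 0)]


class AdaptiveThreshold:
    """Running average of inter-prediction costs per (depth, partition)."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def observe(self, depth, part, cost):
        """Record the cost of one coded unit from the first frame of the GOP."""
        self._sum[(depth, part)] += cost
        self._count[(depth, part)] += 1

    def threshold(self, depth, part):
        """Th(d, p) as in (2), or None if nothing has been observed yet."""
        n = self._count.get((depth, part), 0)
        if n == 0:
            return None
        return self._sum[(depth, part)] / n

    def should_terminate(self, depth, part, cost):
        """Assumed rule: terminate when cost is at or below the threshold."""
        th = self.threshold(depth, part)
        return th is not None and cost <= th


th = AdaptiveThreshold()
th.observe(1, 0, 100)
th.observe(1, 0, 300)
```

Here Th(1,0) becomes 200, so a later block of the same size with cost 150 would stop early, while one with cost 250 would continue to the refinement stage.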

C. Prediction and Fine Refinement Stages

The proposed algorithm uses the spatial up, left, top-right/top-left and median predictors. The prediction algorithm is similar to that of the fast ME algorithm used in HEVC [5], but in the fine refinement stage the patterns are changed to the type 1 rotating hexagonal grid shown in Fig. 3(e). In the global search stage the rotating hexagonal grid uses all possible stride lengths, whereas in the fine refinement stage the stride length is reduced in every loop.

IV. SIMULATION RESULTS

The algorithm is implemented in the HEVC reference software HM 3.4 [5] and compared with the TZ Search algorithm implemented in the reference encoder. The search range used is 256, with a maximum CU size of 64 and a maximum CU partition depth of 4. The number of frames taken from each sequence is 100. The simulations are carried out on a Windows 7 platform with an Intel i7 Extreme CPU @ 2.93 GHz and 4 GB RAM. Table 1 shows the comparison of ME time and total encoding time for the different patterns, with the diamond pattern (TZSD) as reference. Table 2 shows the Bjontegaard delta metrics [9] for the different patterns: TZSD (TZS Diamond), TZSH (TZS Hexagon), TZSRH1 (TZS Rotating Hexagon type 1) and TZSET-RH1 (TZSRH1 with early termination step). In Table 2, "TZSD vs. TZSH" denotes that TZSD is the reference for the comparison. A negative BD-PSNR means that the PSNR is decreased with respect to the reference algorithm and a positive value that it is increased; likewise, a negative BD-bitrate means that the bitrate is decreased with respect to the reference algorithm and a positive value that it is increased.
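The BD-PSNR sign convention described above can be illustrated with a small pure-Python sketch of the usual Bjontegaard procedure: fit a cubic to PSNR as a function of log10(bitrate) for each curve, integrate both over the common rate interval and take the average gap. The sketch assumes exactly four RD points per curve, so the cubic fit reduces to interpolation; the function names and the sample numbers are illustrative and are not taken from the paper's tables.

```python
import math


def _cubic_through(xs, ys, x):
    """Evaluate the cubic through four points (Lagrange form) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gap (test minus reference) over the common log-rate range."""
    if len(rates_ref) != 4 or len(rates_test) != 4:
        raise ValueError("this sketch assumes four RD points per curve")
    lr_ref = [math.log10(r) for r in rates_ref]
    lr_test = [math.log10(r) for r in rates_test]
    lo = max(min(lr_ref), min(lr_test))
    hi = min(max(lr_ref), max(lr_test))
    area_ref = _simpson(lambda x: _cubic_through(lr_ref, psnr_ref, x), lo, hi)
    area_test = _simpson(lambda x: _cubic_through(lr_test, psnr_test, x), lo, hi)
    return (area_test - area_ref) / (hi - lo)


# Illustrative numbers only: the test curve sits 0.5 dB below the reference.
rates = [1000.0, 2000.0, 4000.0, 8000.0]
ref_psnr = [32.0, 35.0, 37.5, 39.5]
test_psnr = [p - 0.5 for p in ref_psnr]
delta = bd_psnr(rates, ref_psnr, rates, test_psnr)
```

A test curve that lies uniformly below the reference gives a negative result, which is the case the text describes as a PSNR decrease with respect to the reference algorithm.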

The simulation results reveal that the proposed algorithm removes a major part (around 40% to 80%, depending on the motion in the video sequence) of the ME complexity compared to the fast algorithm (TZ Search) implemented in the HEVC reference software. The rotating hexagonal pattern alone (TZSRH1) gives an overall decrease of almost 15% in computational time on average. The RD curves shown in Fig. 4 show that the RD performance remains almost unchanged compared with TZSD, which can also be checked in Table 2. The difference in ME time and total encoding time between TZSRH1 and TZSRH2 (TZS with rotating hexagon type 2), shown in Table 1, is negligible, although TZSRH1 has a slightly lower ME time.

Table 1. ME Time Comparison Results between Diamond, Hexagon and Rotating Hexagon Patterns

Columns: Sequence; QP; ME Time (sec) for TZSD, TZSRH1, TZSRH2, TZSET-RH1; Δ ME Time (%) for TZSD vs. TZSRH1 and TZSD vs. TZSET-RH1; Encoding Time (sec) for TZSD, TZSRH1, TZSRH2, TZSET-RH1; Δ Encoding Time (%) for TZSD vs. TZSRH1 and TZSD vs. TZSET-RH1.

Rows (QP 27 and 32 for each sequence): BasketballPass_416x240, RaceHorses_416x240, BQMall_832x480, PartyScene_832x480, BasketBallDrive_1920x1080, Cactus_1920x1080.

[Numeric entries not recoverable.]

The Bjontegaard delta (BD) results shown in Table 2 give a useful comparison of the different patterns. Compared with the diamond pattern (TZSD), there is an overall reduction of 0.01 dB in BD-PSNR (Bjontegaard delta PSNR) for the horizontal hexagon pattern (TZSH), and this value may increase for large sequences. When the rotating hexagon pattern (TZSRH1) is used, however, the decrease is just 0.0013 dB BD-PSNR compared to TZSD. Thus, the overall reduction of BD-PSNR is compensated by the rotating hexagon patterns, even after early termination is added (TZSET-RH1). Similarly for BD-bitrate, the hexagon pattern gives an increase of 0.29% compared to TZSD, whereas the rotating hexagon pattern gives an increase of just 0.019% compared to TZSD, so the bitrate increase is likewise compensated, also for TZSET-RH1.

V. CONCLUSION

The proposed algorithm reduces the motion estimation computational complexity by 53.1% on average, with negligible loss in RD performance, compared to the TZ search algorithm implemented in the HEVC reference software, and hence can be used in HEVC. The algorithm can also be used in encoders of other codec standards such as H.264/AVC and MPEG-4, and in decoders of distributed video codecs such as Wyner-Ziv codecs.

ACKNOWLEDGEMENTS

This work is supported by “Fundação para a Ciência e a Tecnologia (FCT)”, Portugal, grant Ref. SFRH/BD/73266/2010.

REFERENCES

[1] T. Wiegand, B. Bross, W. J. Han, J.-R. Ohm, and G. J. Sullivan, “Working Draft 3 of High Efficiency Video Coding,” JCTVC-E603, JCT-VC of ISO/IEC and ITU-T, Geneva, Switzerland, Mar. 2011.

[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circ. Sys. for Video Tech., vol. 13, no. 7, July 2003.

[3] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, “Comments on Motion Estimation Algorithms in Current JM Software (JVT-Q089),” Joint Video Team Document, 17th Meeting: Nice, FR, 14-21 October 2005.

[4] N. Purnachand, L. Nero Alves, and A. Navarro, “Improvements to TZ search motion estimation algorithm for multiview video coding,” IEEE IWSSIP 2012, Vienna, Apr. 2012.

[5] HM Reference Software 3.4 [Online]. Available: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware

[6] F. Dufaux, W. Gao, S. Tubaro, and A. Vetro, “Distributed video coding: trends and perspectives,” EURASIP J. Image Video Process., 2009.

[7] J. Ascenso, C. Brites, and F. Pereira, “Improving Frame Interpolation with Spatial Motion Smoothing for Pixel Domain Distributed Video Coding,” in Proc. EURASIP Conference on Speech and Image Processing, Multimedia Comm and Services, Slovak Republic, June-July 2005.

[8] W. Hong, “Coherent Block-Based Motion Estimation for Motion-Compensated Frame Rate Up-Conversion,” International Conference on Consumer Electronics, Jan. 2010.

[9] G. Bjontegaard, “Calculation of average PSNR difference between RD curves,” VCEG-M33, 2001.

Fig. 4. RD Curves for Video Sequences with QP 37,32,27,22.

Table 2. Bjontegaard Delta Results for TZS Algorithm with Different Patterns

Columns: Sequence; BD-PSNR-Y (dB) for TZSD vs. TZSH, TZSD vs. TZSRH1, TZSD vs. TZSRH2, TZSD vs. TZSET-RH1; BD-BitRate (%) for TZSD vs. TZSH, TZSD vs. TZSRH1, TZSD vs. TZSRH2, TZSD vs. TZSET-RH1.

Rows: BasketballPass_416x240, RaceHorses_416x240, BQMall_832x480, PartyScene_832x480, BasketBallDrive_1920x1080, Cactus_1920x1080.

[Numeric entries not recoverable.]
