
Complexity Model Based Load-balancing Algorithm For Parallel Tools Of HEVC

Yong-Jo Ahn, Tae-Jin Hwang, Dong-Gyu Sim, and Woo-Jin Han

2013 IEEE International Conference on Visual Communications and Image Processing (VCIP)


Outline

• Introduction
• Related Work
• Proposed Method
• Experimental Results
• Conclusion


Introduction

• Demand for new video coding standards has been increasing due to recent expansion of digital broadcasting services and the advent of various multimedia devices.

• Newly supported coding tools bring not only higher coding efficiency but also high computational complexity, caused by the decision process over the diverse modes.


Cont.

• Studies on parallel processing methods, as well as fast mode decision algorithms, are considered a key part of ongoing work on fast HEVC encoders.

• In this paper, parallel processing methods using the slice and tile tools supported by HEVC are introduced, and a load-balancing algorithm that enhances slice and tile parallel processing is proposed.


Related Work

• A few parallel tools are adopted in the HEVC main profile, and the key tools for parallel processing are tile [5] and wavefront parallel processing (WPP) [6].

• Parallel methods
– Tile
– Entropy slice
– WPP (wavefront parallel processing)

• [5] A. Fuldseth, M. Horowitz, S. Xu, A. Segall, and M. Zhou, "Tiles," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E196, Mar. 2011.

• [6] F. Henry and S. Pateux, "Wavefront parallel processing," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E196, Mar. 2011.


Cont.

(a) Tile

(b) Entropy slice

(c) WPP


Cont.

• To select suitable parallel options, several factors such as encoding time saving, coding efficiency decrease, and extensibility for the number of processing cores should be considered.

• Coding efficiency decrease is also one of the most important factors in adopting parallel processing.


Cont.

• Data-level parallelism can be applied to the frame-, slice-, tile-, or coding unit-level according to the parallelization methods.

• The number of non-referenced B frames in IBBP coding structures significantly impacts coding efficiency and restricts the extensibility of processing cores.


Cont.

• Extensibility of the number of processing cores is the highest and coding efficiency loss is also the smallest when using WPP.

• However, it is hard to expect a large encoding time saving with WPP due to restricted data dependency.

• Generally, increasing the number of slices and tiles has a large impact on bitrate for low-resolution sequences, but little influence on bitrate for high-resolution sequences.


Proposed Method

• To resolve the high computational complexity of the HEVC encoder, various technical contributions on early termination methods and fast mode decision algorithms have been adopted in the reference software [7][8].

• However, it is not easy to achieve a real-time encoder with only the fast algorithms.

• Computational load should be balanced among cores.

• [7] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early termination of CU encoding to reduce HEVC complexity," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F045, July 2011.

• [8] K. Choi and E. S. Jang, "Coding tree pruning based CU early termination," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F092, July 2011.


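Complexity Model For HEVC Encoder

• For slice and tile tools, the number of CTUs should be determined earlier than actual encoding, using complexity prediction.

\[
\mathrm{CHK}(s, m \mid l) =
\begin{cases}
1, & \text{if } \mathrm{selected}(s, m \mid l) \\
0, & \text{otherwise}
\end{cases}
\]

\[
\mathrm{CC}_i(l) = \sum_{s \in S} \sum_{m \in M} \mathrm{CEM}(s, m) \times \mathrm{CHK}(s, m \mid l) \tag{1}
\]

\[
S = \{\, s \mid 64 \times 64,\ 32 \times 32,\ 16 \times 16,\ 8 \times 8 \,\}, \qquad
M = \{\, m \mid \mathrm{MERGE},\ \mathrm{INTER},\ \mathrm{INTRA} \,\}
\]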
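As a rough illustration of Eq. (1), the per-CTU complexity can be computed as a table lookup gated by the indicator CHK. This is a minimal sketch: the `CEM` values below are made-up placeholders rather than figures from the paper, and the names `SIZES`, `MODES`, and `ctu_complexity` are hypothetical.

```python
# Sketch of Eq. (1): CC_i(l) = sum over s in S, m in M of CEM(s, m) * CHK(s, m | l).
# All numeric weights are illustrative placeholders, NOT values from the paper.

SIZES = (64, 32, 16, 8)               # CU widths standing in for 64x64 ... 8x8
MODES = ("MERGE", "INTER", "INTRA")

# Hypothetical fixed-point complexity weights CEM(s, m).
CEM = {
    (64, "MERGE"): 2,  (64, "INTER"): 10, (64, "INTRA"): 6,
    (32, "MERGE"): 4,  (32, "INTER"): 20, (32, "INTRA"): 12,
    (16, "MERGE"): 8,  (16, "INTER"): 40, (16, "INTRA"): 24,
    (8, "MERGE"): 16,  (8, "INTER"): 80,  (8, "INTRA"): 48,
}


def ctu_complexity(selected, cem=CEM):
    """Return CC_i(l) for one CTU.

    `selected` is the set of (size, mode) pairs chosen in CTU l,
    i.e. the pairs for which CHK(s, m | l) equals 1.
    """
    total = 0
    for s in SIZES:
        for m in MODES:
            chk = 1 if (s, m) in selected else 0   # indicator CHK(s, m | l)
            total += cem[(s, m)] * chk
    return total
```

With the placeholder table, a CTU coded as a single 64×64 MERGE CU evaluates to 2, while a CTU that uses both 32×32 INTER and 16×16 INTRA evaluates to 20 + 24 = 44.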


Cont.

\[
R(s, m) = r(s, m) \times 2^{\,w(\mathrm{CTU})/w(s)}
\]

\[
\mathrm{CEM}(s, m) = R(s, m) \times NF
\]

• R(s, m) : complexity per unit.
• r(s, m) : complexity ratio of each CU size and mode.
• w(s) : width of CU size.
• NF : a normalization factor for fixed-point operation.


Cont.

• The proposed complexity model for the HEVC encoder is evaluated using the Pearson product-moment correlation coefficient, on the HEVC common test sequences under the HEVC common test conditions.


Cont.

• Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

\[
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}
           = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}
\]
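The coefficient above can be computed directly from its definition. The following is a small sketch using population (divide-by-n) moments; the function name `pearson` is illustrative, and in the setting of these slides X might be the predicted complexity and Y the measured encoding time, which is an assumption about how the evaluation pairs its data.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Raises ZeroDivisionError if either sequence is constant (zero variance).
    return cov / (sigma_x * sigma_y)
```

A perfectly linear increasing relation such as `pearson([1, 2, 3], [2, 4, 6])` gives a value of 1 up to floating-point rounding, and reversing one sequence gives −1.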


Complexity Model Based Load-balancing Algorithm For Parallel Tools Of HEVC

• Number of CTUs for each temporal level slice

(2)

(3)

\[
\mathrm{SC}_i(k) = \sum_{l=0}^{L(k)-1} \mathrm{CC}_i(l)
\]

• L(k) : the number of CTUs assigned to the k-th slice.
• i : frame index.
• j : temporal layer id.
• k : slice number.
• N : the number of slices in a frame.
• CTUinFrame : the number of CTUs in the frame.
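Equations (2) and (3) did not survive extraction, so the exact assignment rule is not shown here. As a stand-in, the sketch below uses a simple greedy rule (an assumption, not the authors' formula): walk the CTUs in raster order and close slice k once the cumulative predicted complexity reaches k/N of the frame total. The function name `balance_slices` and the input `ctu_cost` (a list of predicted CC values per CTU) are hypothetical.

```python
def balance_slices(ctu_cost, n_slices):
    """Return [L(0), ..., L(N-1)]: CTU counts per slice, in raster order.

    Greedy assumption: slice boundaries are placed so that the cumulative
    complexity up to the end of slice k stays at or below k/N of the total.
    Every slice receives at least one CTU.
    """
    n_ctus = len(ctu_cost)
    if not 1 <= n_slices <= n_ctus:
        raise ValueError("need 1 <= n_slices <= number of CTUs")
    total = sum(ctu_cost)
    counts = []
    start = 0
    acc = 0  # cumulative complexity from the start of the frame
    for k in range(1, n_slices):
        target = total * k / n_slices
        # Leave at least one CTU for each of the remaining slices.
        max_end = n_ctus - (n_slices - k)
        acc += ctu_cost[start]          # first CTU of the slice is mandatory
        end = start + 1
        while end < max_end and acc + ctu_cost[end] <= target:
            acc += ctu_cost[end]
            end += 1
        counts.append(end - start)
        start = end
    counts.append(n_ctus - start)       # last slice takes the remainder
    return counts
```

For example, `balance_slices([3, 3, 1, 1, 1, 1, 1, 1], 2)` returns `[2, 6]`, giving both slices a complexity SC of 6, whereas an even split of four CTUs each would give 8 and 4.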


Cont.

• CTUs are assigned to each tile for a temporal layer using column and row offsets, for load balancing in tile-level parallel processing.

(4)

(5)

• L(k) : the number of CTUs assigned to the k-th tile.
• i : frame index.
• j : temporal layer id.
• k : tile number.
• NlnWidth and Nheight : number of tiles composing a frame in horizontal and vertical directions.
• CTUlnWidth and CTUheight : number of CTUs of a tile in horizontal and vertical directions.

(6)


Cont.

• Controlling complexity balance for tile-level parallelism is harder than for slice-level parallelism, because the size of a tile is determined only by the tile width and height, not by the CTU offset used in load balancing for slice-level parallelism.
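The coupling described above can be made concrete with a small sketch. It assumes tiles form a grid defined by shared column widths and row heights (in CTUs) and sums a per-CTU complexity map over each tile; the function name `tile_loads` and the toy cost map are hypothetical.

```python
def tile_loads(cost, col_widths, row_heights):
    """Sum per-CTU complexity over each tile of a column/row tile grid.

    cost        : 2D list indexed [row][col] of per-CTU complexity
    col_widths  : tile widths in CTUs, left to right
    row_heights : tile heights in CTUs, top to bottom
    Returns a 2D list of tile loads indexed [tile_row][tile_col].
    """
    if sum(col_widths) != len(cost[0]) or sum(row_heights) != len(cost):
        raise ValueError("tile grid must cover the frame exactly")
    loads = []
    y0 = 0
    for h in row_heights:
        row_loads = []
        x0 = 0
        for w in col_widths:
            row_loads.append(sum(cost[y][x]
                                 for y in range(y0, y0 + h)
                                 for x in range(x0, x0 + w)))
            x0 += w
        loads.append(row_loads)
        y0 += h
    return loads


# Toy 4x4 CTU frame with one expensive CTU in the top-left corner.
COST = [[4, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
```

With an even 2×2 grid, `tile_loads(COST, [2, 2], [2, 2])` gives `[[7, 4], [4, 4]]`. Narrowing the first column to relieve the heavy top-left tile, `tile_loads(COST, [1, 3], [2, 2])`, gives `[[5, 6], [2, 6]]`: the bottom-left tile shrinks as well, because every tile in a column shares the same width. A slice boundary, by contrast, can be moved one CTU at a time.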


Experimental Results

• HM 11.0 reference software is utilized.

• A PC equipped with the Intel® Core™ i7-3930K CPU and 16GB memory was used for this evaluation. The Intel® C++ 64-bit compiler XE 13.0 was used on the Windows 7 64-bit operating system.

• A frame is partitioned into four slices or tiles for fair evaluation.

• Two fast encoding algorithms adopted in HM, CFM [7] and ECU [8], are employed to evaluate the proposed load-balanced parallelization.

• [7] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early termination of CU encoding to reduce HEVC complexity," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F045, July 2011.

• [8] K. Choi and E. S. Jang, "Coding tree pruning based CU early termination," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F092, July 2011.


Conclusion

• To maximize the encoding time gain of parallel processing for the HEVC encoder, load-balancing algorithms based on a complexity prediction model are proposed.

• An average ATS gain of 12.05% is achieved for slice-level parallel processing by adaptively adjusting the number of CTUs. The average ATS gain for tile-level parallel processing is 3.81%.

• ATS gain obtained by load-balancing algorithm is higher in slice-level than in tile-level parallelism.