33
Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power-Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad Shafique, Jörg Henkel Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014

Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Embed Size (px)

Citation preview

Page 1: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power-Efficient Workload

Balancing

Muhammad Usman Karim Khan, Muhammad Shafique, Jörg Henkel

Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany

Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014

Page 2: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

2

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 3: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

3

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 4: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

4

Introduction(1/4)

By aiming at 50% bit-rate reduction and preserving the same subjective video quality as H.264, HEVC has become a prime candidate to replace H.264 encoders .

This gain in compression efficiency comes at a high cost of computational complexity due to the inclusion of numerous additional encoding tools.

Page 5: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

5

Introduction(2/4)

In real-world video encoding systems, the video must be compressed under tight constraints of time budget and output bit-rate.

The additional tools and timing-constraints give rise to several challenges for implementing a HEVC system on a hardware platform.

Page 6: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

6

Introduction(3/4)

By distributing the workload of HEVC encoder on multiple cores, the total encoding time can be reduced that may potentially improve the overall energy efficiency.

HEVC standard allows exploiting parallel encoding tools, like slices and video tiles to fulfill these requirements.

Page 7: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

7

Introduction(4/4)

HEVC video encoder on a many-core system must exploit the changing workload to reduce the total power consumption while meeting the quality of service demands.

The power consumption can be dynamically reduced by using a workload-driven operating frequency adaptation scheme.

Page 8: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

8

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 9: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Analysis of HEVC Encoder(1/3)

Tiles are at the lowest level of coding hierarchy. Therefore, they will consume least system memory, and hence, will be fastest among other parallelisms.

Unlike slices, tiles do not have their associated headers. Thus, tiles exhibit the potential to provide relatively better output video quality compared to slices.

9

Page 10: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Analysis of HEVC Encoder(2/3)

10

Page 11: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Analysis of HEVC Encoder(3/3)

A single tile per frame generates the best quality. For T2 tiles per frame, T×T tile results in the best video

quality[24]. The total number of tiles within a slice must be minimized

and the total tile-rows and tile-columns within a slice must be equal.

11

• [24] C. Chi et al., “Improving the parallelization efficiency of HEVC decoding,” in ICIP, pp. 213–216, 2012.

Page 12: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

12

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 13: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

13

Proposed method(1/2)

Workload Estimation Select the tile structure and the maximum workload of each

core. considering operating frequency, total number of cores and

frames per second. Workload Allocator

Allocating workload to each core by utilizing user’s tolerance of the output bit-rate.

Workload Manager Managing the workload by adapting the operating frequency

of each core in order to reduce power consumption.

Page 14: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

14

Proposed method(2/2)

Page 15: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

15

Proposed method – Workload Estimation(1/3)

Tile Formation and Maximum Workload Estimation Number of cores is determined to distribute the

HEVC-Intra application’s workload. Adjust the number of Intra directions to curtail the

computational complexity.

Page 16: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

16

Proposed method – Workload Estimation(2/3)

Workload is given by:

(2)

• : the total number of cycles consumed per second for the given tile.• k : number of tiles.• fp : frame rate.

• Nk : total number of CTUs of tile k.• : number of cycles consumed by a CTU of a frame with K tiles, with the

given θ and QP values .

Page 17: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

17

Proposed method – Workload Estimation(3/3)

Page 18: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

18

Proposed method – Workload Allocator(1/6)

Workload Allocator For workload balancing, an adaptation interval is

defined.

Page 19: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

19

Proposed method – Workload Allocator(2/6)

The starting tile of this interval is always a fully searched tile (θ = θinit,k) to achieve best compression.

Workload of tile in the future frames is gradually adjusted down (θ ≤ θinit,k) to reduce workload and power consumption.

If the total number of compressed bytes for the current NKT increases beyond a certain threshold, we increase θ, thereby increasing the workload.

Page 20: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

20

Proposed method – Workload Allocator(3/6)

The threshold is set statistically using the following equation:

(3)

• B : total number of compressed bytes. • μ : the average bit-rate.• υ : the variance of B.

Page 21: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

21

Proposed method – Workload Allocator(4/6)

If a certain number of frames have been processed or B exceeds a threshold, adaptation and KT insertion is required.

(4)

• : equals the bit-rate of KT.• n : a user-defined parameter for tolerance in the bit-rate parameter.

Page 22: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

22

Proposed method – Workload Allocator(5/6)

For every CTU, if threshold in equation 4 is satisfied, θ is adjusted as:

(5)

• u : a user defined parameter.

Page 23: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

23

Proposed method – Workload Allocator(6/6)

We can estimate the total cycles consumed per CTU:

(6)

Page 24: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

24

Proposed method – Workload Manager(1/3)

Workload Manager Adjusting θ will increase or decrease the workload. The intra prediction mode selected corresponds to

the direction of texture flow. Determine the most probable prediction and θ is

centered on this prediction.

Page 25: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

Proposed method – Workload Manager(2/3)

This prediction/direction is obtained by sorting a histogram created by gradients of each individual pixel [22][26].

We propose a much simpler solution, similar to the one presented in [27].

25

• [22] W. Jiang, H. Ma, Y. Chen, “Gradient based fast mode decision algorithm for intra prediction in HEVC,” in CECNet, pp. 1836–1840, 2012.

• [26] M. Shafique, B. Molkenthin, J. Henkel, “An HVS-based Adaptive Computational Complexity Reduction Scheme for H.264/AVC video encoder using Prognostic Early Mode Exclusion,” in DATE, pp.1713–1718, 2010.

• [27] M. U. K. Khan, J. M. Borrmann, L. Bauer, M. Shafique, J. Henkel, “An H.264 Quad-FullHD low-latency intra video encoder,” in DATE, pp.115–120, 2013.

Page 26: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

26

Proposed method – Workload Manager(3/3)

Page 27: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

27

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 28: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

28

Experimental Result(1/4)

We have developed a C++ based multi-threaded HEVC Intra-encoder in our lab.

With 1-tile (single thread) configuration, our software is ~13 faster than HM-9.2 reference software for full-HD (1920*1080) video sequences.

Hardware platform simulation is performed via the Sniper many-core simulator [30].

• [30] T.E. Carlson, W. Heirman, L. Eeckhout, “Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation,” in SC, pp. 1–12, 2011.

Page 29: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

29

Experimental Result(2/4)

Page 30: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

30

Experimental Result(3/4)

Page 31: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

31

Experimental Result(4/4)

Page 32: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

32

Outline

Introduction Analysis of HEVC Encoder Proposed method Experimental Result Conclusion

Page 33: Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power- Efficient Workload Balancing Muhammad Usman Karim Khan, Muhammad

33

Conclusion

A novel software architecture of HEVC-Intra encoding with run-time power-efficient workload balancing on many-core systems is presented.

This adjusted workload is used to adapt operating frequency, thereby reducing the power consumption of the many-core system.