
1

Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection

Bongsoo Jung, Byeungwoo Jeon

Journal of Visual Communication and Image Representation 2008

2

Outline

Introduction

Complexity Analysis

Method: Pre Macroblock Mode Selection, Adaptive Slice-level Parallelism

Experimental Results

Conclusions

3

Introduction

H.264/AVC achieves high coding efficiency through variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.

The cost is high computational complexity

Two ways to address it: complexity reduction algorithms and parallel processing

4

Introduction

GOP level: simple, but high latency

Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability

Slice level: slices are encoded independently, but with lower coding efficiency

Macroblock level: high dependency

5

Introduction

MBs in a slice may not have similar computational complexity, which causes unnecessary extra waiting time in some threads.

[Figure: encoding time of slices 0 to 7 on processing units PU0 to PU7]

6

Main Purpose

Objective: use a parallel algorithm to speed up the H.264/AVC encoder, and maximize parallel efficiency by distributing the workload equally

Method: pre-processing by fast MB mode selection; adaptive slice-level parallelism

7

Complexity Analysis

Inter prediction modes of MBs in H.264

Intra prediction modes: 4*4, 16*16

8

Complexity Analysis

The run-time complexity of the H.264/AVC encoder

Test setup: Pentium IV 2.4 GHz, Foreman_CIF with IPPP structure

9

Pre Macroblock Mode Selection: Overview

Why? Motion estimation (ME) with variable block sizes has high computational complexity

Remove unnecessary ME block sizes and the RD calculation of intra prediction modes

This removal leads to complexity reduction and to workload balancing among slices

10

Pre Macroblock Mode Selection: Inter MB mode selection

MC block sizes in a video sequence: foreground regions use 8*8 or smaller; non-moving regions use 16*16

High temporal correlation: check the consistency history of block size 16*16 and zero MV

Two measurements: zero motion consistency (ZMC) and large block consistency (LBC)

11

Pre Macroblock Mode Selection: Inter MB mode selection

Zero Motion Consistency (ZMC): indicates how long a specified block has had a zero MV consecutively

When a block is encoded in intra mode, ZMC is set to 0

t : frame index, ZMC0 = 0

(n,m;i,j) indicates a 4*4 block at (n,m) within MB (i,j)

A high value of ZMC means a high probability of belonging to the background region

12

Pre Macroblock Mode Selection: Inter MB mode selection

Zero Motion Consistency Score (ZMCS): indicates how likely an MB is to be a stationary region

TMOTION : a threshold value
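The ZMC bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the counter reset on a non-zero MV is inferred from the word "consecutively", and the rule that an MB scores High only when all of its 4*4 blocks reach the threshold is an assumption, since the slides do not show the exact ZMCS formula. The default threshold of 4 is the TMOTION value quoted in the slides.

```python
from typing import List

T_MOTION = 4  # threshold value quoted in the slides


def update_zmc(prev_zmc: int, mv_is_zero: bool, is_intra: bool) -> int:
    """Return the ZMC counter of one 4*4 block for the current frame."""
    if is_intra or not mv_is_zero:
        return 0          # intra coding or a non-zero MV breaks the run
    return prev_zmc + 1   # one more consecutive frame with a zero MV


def zmcs(block_zmcs: List[int], threshold: int = T_MOTION) -> str:
    """Classify an MB from the ZMC counters of its 4*4 blocks.

    Assumption: the MB is scored High only if every block has reached
    the threshold; the exact aggregation rule is not given in the slides.
    """
    return "High" if min(block_zmcs) >= threshold else "Low"
```

A block that has kept a zero MV for four frames and then moves drops back to a counter of 0, so a single moving 4*4 block is enough to pull the whole MB down to Low under this assumed rule.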

13

Pre Macroblock Mode Selection: Inter MB mode selection

Large Block Consistency (LBC): indicates the number of consecutive frames having a 16*16 MC block size at the (i,j)th MB

When a block is encoded in intra mode, LBC is set to 0

bestModet(i,j) : the best MB mode of the (i,j)th MB in the tth frame

LBC0 = 0

14

Pre Macroblock Mode Selection: Inter MB mode selection

Large Block Consistency Score (LBCS): indicates how likely an MB is to be partitioned as 16*16

TMODE1, TMODE2 : threshold values used to assess the LBC
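A minimal sketch of the LBC counter and its three-level score, under stated assumptions: the function names and mode strings are hypothetical, SKIP and P16x16 are assumed to be the modes that count as a 16*16 block, and the mapping of the two thresholds to High, Medium, and Low is a guess at the missing formula. The threshold values 1 and 4 are the ones quoted in the slides.

```python
T_MODE1 = 1  # thresholds quoted in the slides
T_MODE2 = 4

# Assumption: these inter modes are the ones counted as a 16*16 block.
LARGE_BLOCK_MODES = {"SKIP", "P16x16"}


def update_lbc(prev_lbc: int, best_mode: str) -> int:
    """Return the LBC counter of one MB after coding the current frame."""
    if best_mode in LARGE_BLOCK_MODES:
        return prev_lbc + 1   # one more consecutive 16*16 frame
    return 0                  # smaller partitions and intra modes reset the run


def lbcs(lbc: int) -> str:
    """Map an LBC counter to a score (assumed threshold mapping)."""
    if lbc >= T_MODE2:
        return "High"
    if lbc >= T_MODE1:
        return "Medium"
    return "Low"
```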

15

Pre Macroblock Mode Selection: Inter MB mode selection

An illustration of LBCS

16

Pre Macroblock Mode Selection: Inter MB mode selection

Conditional probability of MB modes given ZMCS = High (TMOTION = 4)

The other block sizes are very unlikely to appear (probability less than about 0.04)

This allows early detection of the SKIP and P16*16 modes

17

Pre Macroblock Mode Selection: Inter MB mode selection

Joint conditional probability of MB modes given LBCS, with ZMCS = Low

A: LBCS = High, B: LBCS = Medium, C: LBCS = Low

TMODE1 = 1, TMODE2 = 4

18

Pre Macroblock Mode Selection: Selective intra mode selection

Computing the RD costs of intra modes carries a high computational load

Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding

19

Pre Macroblock Mode Selection: Selective intra mode selection

Mean Absolute Temporal Difference (MATD)

Mean Absolute Spatial Difference (MASD)

cx,y : pixel value at location (x,y) of the MB in the current frame

rx,y : pixel value at location (x,y) of the MB in the previous frame

X, Y : horizontal and vertical dimensions of an MB

MASDH : the MASD between horizontally neighboring pixels

MASDV : the MASD between vertically neighboring pixels

20

Pre Macroblock Mode Selection: Selective intra mode selection

Compare MATD and MASD to determine whether the RD costs of intra modes should be calculated for the current MB

When the MB is more temporally correlated than spatially correlated, the intra mode search is skipped

w : weighting factor, currently set to 0.6

A larger w makes skipping the intra mode search easier

A smaller QP will incur more intra modes than a larger QP
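The comparison of temporal and spatial correlation can be sketched as below. The slide formulas are not reproduced in the text, so several points are assumptions: MASD is taken as the plain average of MASDH and MASDV, the skip test is taken to be MATD < w * MASD (which is consistent with a larger w making the skip easier), and any dependence on QP is left out. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values, indexed block[y][x]


def matd(cur: Block, ref: Block) -> float:
    """Mean absolute difference between co-located pixels of two frames."""
    total = sum(abs(c - r)
                for cur_row, ref_row in zip(cur, ref)
                for c, r in zip(cur_row, ref_row))
    return total / (len(cur) * len(cur[0]))


def masd(cur: Block) -> float:
    """Mean absolute difference between neighboring pixels of one MB."""
    rows, cols = len(cur), len(cur[0])
    horiz = sum(abs(cur[y][x] - cur[y][x + 1])
                for y in range(rows) for x in range(cols - 1))
    vert = sum(abs(cur[y][x] - cur[y + 1][x])
               for y in range(rows - 1) for x in range(cols))
    masd_h = horiz / (rows * (cols - 1))
    masd_v = vert / ((rows - 1) * cols)
    return (masd_h + masd_v) / 2  # assumption: average of both directions


def skip_intra_check(cur: Block, ref: Block, w: float = 0.6) -> bool:
    """True when the MB looks more temporally than spatially correlated."""
    return matd(cur, ref) < w * masd(cur)
```

A textured MB that is unchanged from the previous frame gives MATD = 0 and skips the intra search, while a flat MB whose brightness jumps between frames gives MASD = 0 and keeps it.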

21

Pre Macroblock Mode Selection: MB mode classification

Decision table of candidate MB modes

A block diagram of MB mode selection

22

Adaptive Slice-level Parallelism: Overview

Characteristics: easy to implement; lower overhead of inter-communication among processor units; good scalability; increased bitrate

A slice boundary is defined on the basis of a fixed number of MBs or a fixed number of bits

It is hard to decide a slice boundary prior to encoding

23

Adaptive Slice-level Parallelism: Fixed MB assignment

The number of consecutive MBs in each slice

L : the number of processor units on a multi-core system

M : the total number of MBs in a frame

i : slice index

Example: with L = 8 processing units and CIF resolution (352*288), M = 22*18 = 396, so about 49 MBs are assigned to each slice (396/8 = 49.5)
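The fixed split in the CIF example can be sketched as follows. The function name is hypothetical, and handing the leftover MBs to the first slices is an assumption; the slides only state that each slice receives about 49 MBs.

```python
from typing import List


def fixed_assignment(total_mbs: int, num_slices: int) -> List[int]:
    """Split total_mbs consecutive MBs into num_slices near-equal slices."""
    base, extra = divmod(total_mbs, num_slices)
    # Assumption: the first `extra` slices each take one additional MB.
    return [base + 1 if i < extra else base for i in range(num_slices)]


counts = fixed_assignment(396, 8)   # CIF frame, eight processing units
print(counts)                       # four slices of 50 MBs, four of 49
```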

24

Adaptive Slice-level Parallelism: Fixed MB assignment

The scheduling of slice-level parallelism on eight processor units

[Figure: encoding time of slices 0 to 7 on PU0 to PU7. Panels: ideal case; practical case, with the bottleneck marked]

25

Adaptive Slice-level Parallelism: Fixed MB assignment

The imbalance of computational load distribution

[Figure panels: Exhaustive Search Method; Fast ME / Fast Mode Search]

26

Adaptive Slice-level Parallelism: Fixed MB assignment

Computational load for encoding one frame in slice-level parallelism

Computational load of the tth frame on a single processor system

Ctslice(i) : the computational load of the ith slice in the tth frame

L : number of slices in a frame

27

Adaptive Slice-level Parallelism: Fixed MB assignment

The speedup of a multiprocessor system over a single processor system

To achieve the maximum speedup, the computational loads of the slices should be as similar as possible

This motivates an adaptive slice partition method
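A small numeric illustration of why balanced slices matter. It assumes that one processor unit handles each slice, that the frame is finished when the slowest slice is finished, and that communication overhead is ignored; the function name and the load values are made up for the example.

```python
from typing import List


def load_speedup(slice_loads: List[float]) -> float:
    """Speedup over a single processor, which would process the summed load.

    Assumption: the parallel frame time equals the largest slice load.
    """
    return sum(slice_loads) / max(slice_loads)


balanced = [10.0] * 8             # eight equal slices
skewed = [10.0] * 7 + [30.0]      # one slice three times heavier

print(load_speedup(balanced))     # reaches the number of slices
print(load_speedup(skewed))       # limited by the heaviest slice
```

With equal loads the speedup equals the number of slices. A single heavy slice caps it at the total load divided by that slice's load, no matter how many processor units are idle.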

28

Adaptive Slice-level Parallelism: Complexity estimation model

A simple estimation method that utilizes the result of fast MB mode selection

Define the group value g corresponding to the candidate MB modes

29

Adaptive Slice-level Parallelism: Complexity estimation model

Complexity model

Ck,CHKIntra(g) : complexity cost of the kth MB

g : group index

einter : estimated complexity cost of inter mode in g = 1

eintra : complexity cost according to the intra mode check in g = 1

α1, α2, α3, β1, β2, β3 : weighting values of the complexity cost

30

Adaptive Slice-level Parallelism: Complexity estimation model

Relative computational load

$$
C_{k,CHK_{Intra}}(g) =
\begin{cases}
e_{inter} + e_{intra}, & g = 1 \\
\alpha_1 e_{inter} + \beta_1 e_{intra}, & g = 2 \\
\alpha_2 e_{inter} + \beta_2 e_{intra}, & g = 3 \\
\alpha_3 e_{inter} + \beta_3 e_{intra}, & g = 4
\end{cases}
$$

α1 = 2.42, α2 = 3.12, α3 = 5.28

β1 = 0.82, β2 = 0.83, β3 = 0.84

CHKintra = 0 (assume einter = 1, eintra = 0): relative load 1 for g = 1, 2.42 for g = 2, 3.12 for g = 3, 5.28 for g = 4

CHKintra = 1 (assume einter = 1, eintra = 3.97): relative load 4.97 for g = 1, 6.48 for g = 2, 7.23 for g = 3, 9.48 for g = 4
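A minimal sketch of the per-MB cost estimate, assuming the model is a weighted sum of the inter and intra costs with weight 1 for g = 1 and the α and β weights quoted in the slides for g = 2 to 4. The function name and the dictionary layout are hypothetical. The sketch reproduces the CHKintra = 0 loads (1, 2.42, 3.12, 5.28) from the quoted weights.

```python
ALPHA = {1: 1.0, 2: 2.42, 3: 3.12, 4: 5.28}  # inter weights per group g
BETA = {1: 1.0, 2: 0.82, 3: 0.83, 4: 0.84}   # intra weights per group g


def mb_cost(g: int, chk_intra: bool,
            e_inter: float = 1.0, e_intra: float = 3.97) -> float:
    """Estimated relative complexity of one MB in group g.

    Assumption: weighted-sum form; the intra term is dropped when the
    intra mode check is skipped (chk_intra is False).
    """
    if g not in ALPHA:
        raise ValueError("group index g must be 1, 2, 3 or 4")
    intra_term = BETA[g] * e_intra if chk_intra else 0.0
    return ALPHA[g] * e_inter + intra_term


for g in (1, 2, 3, 4):
    print(g, mb_cost(g, chk_intra=False))
```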

31

Adaptive Slice-level Parallelism: Adaptive MB assignment

The total computational load at the tth frame

$$
\tilde{C}^{t} = \sum_{k=0}^{M-1} C_{k,CHK_{Intra}}(g)
$$

Ideal computational load of each slice for the uniform workload distribution

$$
\tilde{C}^{t}_{slice} = \frac{\tilde{C}^{t}}{L}
$$

32

Adaptive Slice-level Parallelism: Adaptive MB assignment

MB assignment of slices

The resulting load balance is much better than with a fixed MB assignment in each slice
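One way to turn the ideal per-slice load into slice boundaries is a greedy pass over the MBs in coding order. This is a sketch, not the authors' algorithm: the rule of closing slice i once its next MB would push the cumulative load past (i + 1) times the ideal slice load is an assumption, as are the function names and the synthetic cost profile (cheap MBs with relative load 1 followed by expensive MBs with relative load 5.28).

```python
from typing import List


def adaptive_assignment(costs: List[float], num_slices: int) -> List[int]:
    """Return the number of consecutive MBs given to each slice."""
    if not 1 <= num_slices <= len(costs):
        raise ValueError("need between 1 and len(costs) slices")
    target = sum(costs) / num_slices      # ideal load of one slice
    counts, start, cumulative = [], 0, 0.0
    for i in range(num_slices - 1):
        # keep at least one MB for every slice that still follows
        limit = len(costs) - (num_slices - 1 - i)
        end = start
        while end < limit and (end == start or
                               cumulative + costs[end] <= (i + 1) * target):
            cumulative += costs[end]
            end += 1
        counts.append(end - start)
        start = end
    counts.append(len(costs) - start)     # last slice takes the rest
    return counts


def slice_loads(costs: List[float], counts: List[int]) -> List[float]:
    """Sum the MB costs inside each slice."""
    loads, start = [], 0
    for n in counts:
        loads.append(sum(costs[start:start + n]))
        start += n
    return loads


costs = [1.0] * 300 + [5.28] * 96          # synthetic CIF frame, 396 MBs
adaptive = adaptive_assignment(costs, 8)
fixed = [50, 50, 50, 50, 49, 49, 49, 49]
print(max(slice_loads(costs, adaptive)), max(slice_loads(costs, fixed)))
```

On this profile the fixed split packs almost all expensive MBs into the last two slices, while the adaptive split gives the cheap region fewer, longer slices and the expensive region more, shorter ones, so the heaviest slice load is much lower.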

33

Adaptive Slice-level Parallelism: Adaptive MB assignment

Entire block diagram

34

Experimental Results: Overview

Performance comparison between the proposed MB mode decision and the conventional method

Comparison of adaptive slice-level parallelism with fixed slice-level parallelism

35

Experimental Results: MB mode selection

Average encoding time saving, AST [%]

BDPSNR and BDBR are used to measure the performance against FULL_1Slice

FULL_1Slice : exhaustive method

FMD_1Slice : fast MB mode search method

36

Experimental Results: Rate distortion curves

37

Experimental Results

R-D performance compared to one slice per frame (FMD_1Slice)

38

Experimental Results: Rate distortion curves

39

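Experimental Results: Slice-level parallelism

Comparing adaptive and fixed slice-level parallelism

Speedup

$$
\mathrm{Speedup}_{FMD\_Fixed} = \frac{\mathrm{EncTime}(FMD\_1Slice)}{\max_i \mathrm{EncTime}_{FMD\_Fixed}(slice_i) + \mathrm{OverheadTime}}
$$

$$
\mathrm{Speedup}_{FMD\_Adaptive} = \frac{\mathrm{EncTime}(FMD\_1Slice)}{\max_i \mathrm{EncTime}_{FMD\_Adaptive}(slice_i) + \mathrm{OverheadTime}}
$$

EncTime(FMD_1Slice) : encoding time of one slice per frame on a single processor system

MAX_i EncTime_FMD_Fixed(slice_i) : the longest encoding time of a slice using fixed mode

MAX_i EncTime_FMD_Adaptive(slice_i) : the longest encoding time of a slice using adaptive mode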

40

Experimental Results: Speedup

41

Conclusions

Proposed a fast MB mode selection using the consistency history of the 16*16 block size and zero MV

Proposed an intra mode selection that compares temporal and spatial correlation

Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder

