1

H.264 Advanced Video CODEC

IC-SOC, Hawaii, November 22, 2004

Youn-Long Lin
Department of Computer Science
National Tsing Hua University


2

Video Coding Flow

RGB -> Color Space Conversion -> YUV -> Prediction (Redundancy Reduction) -> Residual -> Transformation to Frequency Domain -> Coefficients -> Quantization -> Large Coefficients -> Entropy Coding -> Bit Stream

3

Video Coding Standards

MPEG-2 Video:
    Big success in digital TV, DVD, ...
    Designed for high bit-rates > 1.5 Mbit/s
    Not suitable for wireless applications

H.263/MPEG-4 Video:
    Designed also for low bit-rates < 100 kbit/s
    Contains some network adaptation features
    Suitable for wireless applications (chosen by 3GPP)

H.26L: New ITU-T Q.6/SG16 (VCEG) project
    Superior design for all bit-rates and error resilience
    Contains improved network adaptation
    Will replace H.263/MPEG-4 Video in wireless?
    Also called H.264 or MPEG-4 Part 10

4

Standard          | MPEG-1             | MPEG-2                      | MPEG-4           | H.264
MB size           | 16*16              | 16*16 (frame), 16*8 (field) | 16*16            | 16*16
Block size        | 8*8                | 8*8                         | 16*16, 8*8, 16*8 | 8*8, 8*16, 16*8, 16*16, 4*8, 8*4, 4*4
Transform         | DCT                | DCT                         | DCT/Wavelet      | 4*4 int transform
Quant. step size  | constant increment | constant increment          | Vector quant.    | Increase at the rate of 12.5%
Entropy coding    | VLC                | VLC                         | VLC              | VLC, CAVLC and CABAC
ME, MC            | Yes                | Yes                         | Yes              | 16 MVs per MB
Pixel accuracy    | ½ pel              | ½ pel                       | ¼ pel            | ¼ pel
Profiles          | No                 | 5 profiles                  | 8 profiles       | 3 profiles
Ref frame         | One frame          | One frame                   | One frame        | 5 frames
Picture type      | I, P, B, D         | I, P, B                     | I, P, B          | I, P, B, SI, SP
Transmission rate | Up to 1.5 Mbps     | 2-15 Mbps                   | 64 kbps ~ 2 Mbps | 64 kbps ~ 150 Mbps

5

H.264 Profiles

Profiles: Baseline profile, Main profile, Extended profile

Coding tools shown in the profile diagram:
    I slices
    P slices
    B slices
    SP and SI slices
    CAVLC
    CABAC
    Weighted prediction
    Interlace
    Data partitioning
    Slice Group and ASO
    Redundant Slices
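The grouping of tools into profiles is not recoverable from the extracted slide text. As a rough sketch, the sets below follow the profile definitions of the first edition of the H.264 standard (an assumption about what the diagram showed); the names `PROFILES` and `profiles_supporting` are hypothetical.

```python
# Coding tools per profile, following the first-edition H.264 profile
# definitions (assumed to match the slide's diagram).
BASELINE = {"I slices", "P slices", "CAVLC",
            "Slice Group and ASO", "Redundant Slices"}

MAIN = {"I slices", "P slices", "B slices", "Weighted prediction",
        "Interlace", "CAVLC", "CABAC"}

# Extended contains every Baseline tool plus the ones listed here (no CABAC).
EXTENDED = BASELINE | {"B slices", "Weighted prediction", "Interlace",
                       "SP and SI slices", "Data partitioning"}

PROFILES = {"Baseline": BASELINE, "Main": MAIN, "Extended": EXTENDED}


def profiles_supporting(tool):
    """Return the sorted names of the profiles that include a coding tool."""
    return sorted(name for name, tools in PROFILES.items() if tool in tools)


if __name__ == "__main__":
    for tool in ("CABAC", "B slices", "Redundant Slices"):
        print(tool, "->", profiles_supporting(tool))
```

Under these sets, the tools shared by all three profiles are I slices, P slices, and CAVLC.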

6

H.264 Decoding Profile

MC: 18%
Pic Rec: 3%
DF: 15%
Intra Pred: 10%
IDCT/IQ: 10%
Frame Level Decoding: 20%
CABAC: 24%

7

H.264 Decoder Block Diagram

[Block diagram: H.264 stream in, raw stream out. Legend: Software, Hardware, Memory.]

Processing blocks: VLC, CABAC, MC, Intra pred, IDCT/IQ, Pic Rec, DF, Picture manage

Memories: Para mem, MB info mem, Coeff mem, Pred mem, Residual mem, reconstruct mem, unfilter mem, MV mem, Ref idx mem, pic num mem, ref MB mem, Ref frame mem

8

Entropy Decoding

H.264/AVC entropy coding methods
    VLC
    CAVLC
    CABAC

CABAC can save up to 7% of bit-rate in comparison with CAVLC

9

Variable Length Coding

10

Variable Length Coding

Exp-Golomb code is used universally for all symbols

VLC with regular construction
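The regular construction of Exp-Golomb codes can be sketched as follows: a run of leading zeros, a 1 bit, then as many info bits as there were zeros. This is a minimal illustration, not the presenter's hardware design; `BitReader`, `read_ue`, and `read_se` are hypothetical names chosen for this example.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (helper for this sketch)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader):
    """Unsigned Exp-Golomb, ue(v): codeNum = 2^zeros - 1 + info bits."""
    leading_zeros = 0
    while reader.read_bit() == 0:
        leading_zeros += 1
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)


def read_se(reader):
    """Signed Exp-Golomb, se(v): code numbers 1, 2, 3, 4 map to +1, -1, +2, -2."""
    code_num = read_ue(reader)
    magnitude = (code_num + 1) >> 1
    return magnitude if code_num & 1 else -magnitude


if __name__ == "__main__":
    # Bit string 1 | 010 | 011 | 00100, padded with zeros to two bytes.
    stream = bytes([0xA6, 0x40])
    reader = BitReader(stream)
    print([read_ue(reader) for _ in range(4)])
```

Because every codeword has the same shape, the decoder needs no lookup table; only the mapping from code number to syntax value differs between ue(v), se(v), and me(v).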

11

VLC decoding

[Flowchart]
Input: bit stream & syntax element
syntax = u(v)?
    Yes: u_v
    No: Get VLC length, then linfo_ue (Exp-Golomb) gives Code Num
syntax = (se(v) | me(v))?
    No: ue
    Yes: syntax = me(v)?
        Yes: linfo_cbp inter/intra (me)
        No: linfo_se (se)
Outputs: me, se, ue, u_v
me_disable = 1

12

Context-based Adaptive Binary Arithmetic Coding (CABAC)

13

CABAC algorithm

[Flowchart]
Init new slice
    Initialize context table
    Get 2 bytes from bit stream
    Initialize codIOffset, codIRange
Init decode new macroblock
    Decide next syntax element to be decoded
    Decide context from neighbor syntax element
    Normal decoding process
    Bypass decoding process
    Terminal decoding process
    Get one byte from bit stream
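Of the three decoding processes in the flowchart, the bypass process is the simplest and can be sketched in a few lines. The sketch below follows the arithmetic decoding engine as specified in the H.264 standard (range initialized to 510, offset initialized from 9 bits), which differs from the slide's hardware, where 2 bytes are fetched at initialization. The normal (context-based) and terminal processes are omitted, and `CabacBypassDecoder` is a hypothetical name.

```python
class CabacBypassDecoder:
    """Sketch of CABAC engine initialization and the bypass decoding path only."""

    def __init__(self, data):
        self.data = data
        self.bit_pos = 0
        # Initialization per the standard: codIRange = 510, codIOffset = 9 bits.
        self.cod_i_range = 510
        self.cod_i_offset = self._read_bits(9)

    def _read_bits(self, n):
        """Read n bits MSB-first from the byte buffer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.bit_pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.bit_pos & 7))) & 1)
            self.bit_pos += 1
        return value

    def decode_bypass(self):
        """Decode one equiprobable bin: shift in one bit, compare with the range."""
        self.cod_i_offset = (self.cod_i_offset << 1) | self._read_bits(1)
        if self.cod_i_offset >= self.cod_i_range:
            self.cod_i_offset -= self.cod_i_range
            return 1
        return 0


if __name__ == "__main__":
    decoder = CabacBypassDecoder(bytes([0x80, 0x50]))
    print([decoder.decode_bypass() for _ in range(3)])
```

The bypass path needs no context lookup and no renormalization loop, which is why it costs fewer cycles per bin than the normal path.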

14

Handshaking

[Timing diagram]
Read parameter: start_read_pa, end_read_pa
Build Table: Init_slice, end_buld
Decoder: Init_cabac, end_cabac, end_slice
Read & write 24 macroblock: Init_rw, end_rw_macroblock
Read & write 24 macroblock: end_rw_coeff
90 clock cycle

15

CABAC Experimental results

Gate count: 138,226 gates
200 MHz based on TSMC 0.13 µm standard cell library
2 to 3 cycles to generate one bit of data
Sufficient to decode main profile CIF video stream at 30 fps
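A back-of-the-envelope check of the throughput claim, assuming CIF means 352 x 288 luma samples (the standard CIF size, not stated on the slide) and taking the worst case of 3 cycles per decoded bit. The function name `cabac_cycle_budget` is hypothetical.

```python
def cabac_cycle_budget(width=352, height=288, fps=30,
                       clock_hz=200_000_000, cycles_per_bit=3):
    """Return (macroblocks per second, cycles per macroblock, bits per macroblock)."""
    mbs_per_frame = (width // 16) * (height // 16)   # 22 * 18 macroblocks
    mbs_per_second = mbs_per_frame * fps
    cycles_per_mb = clock_hz // mbs_per_second
    bits_per_mb = cycles_per_mb // cycles_per_bit    # decodable at worst-case rate
    return mbs_per_second, cycles_per_mb, bits_per_mb


if __name__ == "__main__":
    mbs, cycles, bits = cabac_cycle_budget()
    print(f"{mbs} MB/s, {cycles} cycles per MB, about {bits} bits per MB")
```

At 200 MHz this leaves roughly 16,800 cycles per macroblock, so even at 3 cycles per bit the decoder can produce several thousand bits per macroblock, ignoring any cycles spent on initialization and handshaking.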

16

Inverse Quantization and Inverse Transform

17

Index of Sub_Blocks

[Figure: scanning order of residual blocks within a macroblock (16x16 pixels)]

Luma (16x16), blocks 0 to 15:
     0  1  4  5
     2  3  6  7
     8  9 12 13
    10 11 14 15

Block -1 (16x16 Intra mode only)

Blocks 16 and 17

Cb (8 x 8), blocks 18 to 21:
    18 19
    20 21

Cr (8 x 8), blocks 22 to 25:
    22 23
    24 25

Coefficient positions within a 4x4 block:
    00 01 02 03
    10 11 12 13
    20 21 22 23
    30 31 32 33
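The luma scanning order in the figure (0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15) is a nested 2x2 ordering, so a block index can be mapped to its position with a few bit operations. A minimal sketch; the function name `luma_block_position` is hypothetical.

```python
def luma_block_position(idx):
    """Map a luma 4x4 block index (0..15) to (row, col) in the 4x4 grid of blocks."""
    if not 0 <= idx <= 15:
        raise ValueError("luma block index must be in 0..15")
    # Bits 0 and 1 select the block inside an 8x8 quadrant,
    # bits 2 and 3 select the quadrant inside the macroblock.
    col = (idx & 1) + 2 * ((idx >> 2) & 1)
    row = ((idx >> 1) & 1) + 2 * ((idx >> 3) & 1)
    return row, col


def luma_scan_grid():
    """Rebuild the 4x4 grid of block indices from the mapping."""
    grid = [[None] * 4 for _ in range(4)]
    for idx in range(16):
        row, col = luma_block_position(idx)
        grid[row][col] = idx
    return grid


if __name__ == "__main__":
    for line in luma_scan_grid():
        print(" ".join(f"{v:2d}" for v in line))
```

Multiplying the returned row and column by 4 gives the pixel offset of the block's top-left corner inside the macroblock.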

18

16-Way Parallel Inverse Transform

[Datapath diagram]
Inputs: c00 to c33 (16 coefficients)
Row by row stage: rows of 16 adders (+), with 1/2 scaling on 8 paths
Reorder
Column by column stage: rows of 16 adders (+), with 1/2 scaling on 8 paths
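The row-by-row and column-by-column stages, with their 1/2 scalers, correspond to the 4x4 inverse integer transform of H.264. The sketch below is a sequential software version of the butterfly as defined in the standard, not the 16-way parallel datapath itself; it assumes the input coefficients have already been rescaled (inverse quantized), and the function names are hypothetical.

```python
def inverse_1d(w0, w1, w2, w3):
    """One 4-point inverse butterfly; the >> 1 shifts are the 1/2 scalers."""
    e0 = w0 + w2
    e1 = w0 - w2
    e2 = (w1 >> 1) - w3
    e3 = w1 + (w3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(coeffs):
    """Apply the butterfly to each row, then each column, then round and scale."""
    rows = [inverse_1d(*row) for row in coeffs]
    # cols[c] holds the transformed column c, indexed by row.
    cols = [inverse_1d(*(rows[r][c] for r in range(4))) for c in range(4)]
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    for line in inverse_transform_4x4(dc_only):
        print(line)
```

Each stage uses only additions, subtractions, and shifts, which is what allows all 16 outputs of a stage to be computed in parallel in hardware.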

19

Synthesis Report

Synthesized using the Synopsys Design Compiler with TSMC 0.13 μm standard cell library

Module: “itrans & rescale”
Frequency: 100 MHz
Gate Count: 53,623
Power Consumption (Dynamic Power): 17.7 mW

20

Intra-Frame Prediction

21

4x4 Luma Prediction Mode

22

16x16 Luma Prediction Mode

23

Inter-Frame Prediction (Motion Compensation)

24

H.264 Motion Compensation

Variable block size:
    Macroblock types: 16x16, 16x8, 8x16, 8x8
    Sub macroblock types: 8x8, 8x4, 4x8, 4x4

25

Block Size Tradeoff

Large Block: Fewer MVs, Big residuals

Small Block: More MVs, Little residuals
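The motion-vector side of this tradeoff is easy to quantify. A small sketch, assuming one motion vector per partition and the same partition size across the whole macroblock; `motion_vectors_per_mb` is a hypothetical name.

```python
def motion_vectors_per_mb(block_w, block_h, mb_size=16):
    """Number of partitions (one MV each) when a macroblock is split uniformly."""
    if mb_size % block_w or mb_size % block_h:
        raise ValueError("block size must divide the macroblock size")
    return (mb_size // block_w) * (mb_size // block_h)


if __name__ == "__main__":
    for w, h in [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]:
        print(f"{w}x{h}: {motion_vectors_per_mb(w, h)} MVs")
```

The 4x4 case gives 16 motion vectors, the per-macroblock maximum listed for H.264 in the standards comparison.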

26

6-tap Filter Block Diagram

[Block diagram]
Inputs: pixel_1 to pixel_6
Coefficients: coeff1 to coeff6
Datapath: six multipliers (x), pipeline registers (reg), and an adder tree (+)
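The slide does not list the coefficient values. As an illustration, the sketch below uses the 6-tap filter that the H.264 standard defines for luma half-sample interpolation, (1, -5, 20, 20, -5, 1) with rounding and a divide by 32; treating these as coeff1 to coeff6 is an assumption, and `half_pel` is a hypothetical name.

```python
# Assumed values for coeff1..coeff6: the H.264 luma half-sample filter taps.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(pixels):
    """Interpolate one half-sample value from six neighbouring 8-bit pixels."""
    if len(pixels) != 6:
        raise ValueError("the filter needs exactly six input pixels")
    acc = sum(tap * p for tap, p in zip(TAPS, pixels))
    value = (acc + 16) >> 5          # round, then divide by 32 (sum of the taps)
    return min(255, max(0, value))   # clip to the 8-bit range


if __name__ == "__main__":
    print(half_pel([100] * 6))
    print(half_pel([0, 0, 255, 255, 0, 0]))
```

Since 20 = 16 + 4 and 5 = 4 + 1, a hardware version can replace the multipliers with shifts and adds.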

27

Pipeline Scheduling

[Pipeline diagram: the stages below overlap across successive pipeline slots]
Picture ordering
Read MV & RefIdx Memory
Motion vector generation
Write MV & RefIdx Memory
Get reference Frame
Interpolation
Weight prediction

28

MC Memory Usage

Global memory
    Parameter information memory
    CABAC memory
    Reference frame memory

Local memory (total 26,112 bits)
    mv memory (2 x 16 x 23 x 16 = 11,776 bits)
    Reference pixel memory (2 x 8 x 16 x 16 = 4,096 bits)
    Down pixel memory (2 x 8 x 3 x 16 = 768 bits)
    Left pixel memory (2 x 8 x 2 x 16 = 512 bits)
    Right pixel memory (2 x 8 x 3 x 16 = 768 bits)
    Temporal pixel memory (2 x 8 x 16 x 16 = 4,096 bits)
    Predict pixel memory (2 x 8 x 16 x 16 = 4,096 bits)
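The local memory total can be checked directly from the factors on the slide. The slide does not say what each factor stands for, so the sketch below only multiplies them as given; the dictionary keys are shortened names chosen for this example.

```python
from math import prod

# Factors copied from the slide; their individual meanings are not stated there.
LOCAL_MEMORIES = {
    "mv": (2, 16, 23, 16),
    "reference pixel": (2, 8, 16, 16),
    "down pixel": (2, 8, 3, 16),
    "left pixel": (2, 8, 2, 16),
    "right pixel": (2, 8, 3, 16),
    "temporal pixel": (2, 8, 16, 16),
    "predict pixel": (2, 8, 16, 16),
}


def memory_bits():
    """Return the size in bits of each local memory."""
    return {name: prod(factors) for name, factors in LOCAL_MEMORIES.items()}


if __name__ == "__main__":
    sizes = memory_bits()
    for name, bits in sizes.items():
        print(f"{name} memory: {bits} bits")
    print("total:", sum(sizes.values()), "bits")
```

The sum reproduces the 26,112 bits quoted for local memory, with the mv memory accounting for a little under half of it.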

29

MC Synthesis Report

Synthesized using the Synopsys Design Compiler with TSMC 0.13 μm standard cell library

Frequency: 100 MHz
Gate Count: 78,056
Power Consumption: 156 mW

30

Deblocking Filter

31

Deblocking Filter Introduction

Deblocking filter can achieve up to 9% bit-rate saving without degrading video quality

Reference from EBU Technical Review

32

Deblocking Filter Algorithm

The deblocking filter process can be divided into
    horizontal filter across vertical edges,
    vertical filter across horizontal edges
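The two passes can be sketched as an enumeration of edges. This assumes the ordering of the H.264 standard, where the vertical edges of a macroblock are filtered first (left to right) and the horizontal edges second (top to bottom), with edges on a 4-sample grid; it does not model the filter itself or the presenter's hardware schedule. `block_edges` is a hypothetical name.

```python
def block_edges(size=16, step=4):
    """List (edge orientation, sample offset) pairs in filtering order."""
    # Pass 1: horizontal filtering across vertical edges, left to right.
    edges = [("vertical", x) for x in range(0, size, step)]
    # Pass 2: vertical filtering across horizontal edges, top to bottom.
    edges += [("horizontal", y) for y in range(0, size, step)]
    return edges


if __name__ == "__main__":
    print("luma 16x16:", block_edges(16))
    print("chroma 8x8:", block_edges(8))
```

For a 16x16 luma block this gives four edges per direction, and for each 8x8 chroma block two per direction.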

33

Horizontal Filter across Vertical Edges

[Figure: Macroblock N, drawn as one 16 x 16 block and two 8 x 8 blocks, with positions numbered 0 to 95]

34

Vertical Filter across Horizontal Edges

[Figure: Macroblock N, drawn as one 16 x 16 block and two 8 x 8 blocks, with positions numbered 0 to 95]

35

H.264 Decoder Integration

36

Integration Overview

MB decoder IP for H.264/AVC
IP integration
    CABAC
    MC, Intra Prediction
    IDCT/IQ
    Picture Reconstruction
    DF
    Main Controller
AMBA interface
FPGA prototyping, HW/SW Co-Verification
Main Profile; CIF 30 fps

37

MB Decoder FSM

[State diagram]
States: Start, Slice, MB_type, "CABAC, I_decode", "CABAC, PB_decode", I_decode, PB_decode, CABAC, Last_MB?, Slice done
Transition labels: I_MB, PB_MB, End_MB, NO, Yes

38

Current Progress

IP integration
    Main Controller, Deblocking Filter
    Picture Reconstruction
HW/SW Co-Simulation for reference software and deblocking filter
FPGA prototyping for CABAC

39

Future Work

IP integration
AMBA interface
HW/SW Co-Verification
An AMBA-Compliant MB decoder for H.264/AVC