FINAL PROJECT - TU Wien

TITLE : Application of SP and SI frames in wireless multimedia communication

TITULATION: Enginyeria Tècnica de Telecomunicació, especialitat Sistemes de Telecomunicació

AUTHOR: Maria Salvat Perarnau

DIRECTOR: Markus Rupp

SUPERVISOR: Luca Superiori

DATE: July 25, 2007

Overview

The objective of this thesis is to study the SP and SI pictures introduced by the H.264/AVC codec. We study their characteristics and behavior, and how they differ from video sequences formed only by I and P frames. These two frame types, SP and SI, improve several video streaming applications, such as random access and switching between different bitrates.

The first chapter summarizes the basic concepts and the most important aspects of the H.264/AVC standard: its applications, its profiles and its frame types.

The next three chapters study the behavior of a video sequence, Foreman: first with I and P frames only, then with SP frames added, and finally with I, P and SI frames.

Chapter 5 looks for the settings that best fit the bandwidths used by UMTS technology.

Chapter 6 studies switching between a high-quality and a low-quality sequence.

Next, we analyze the changes needed in the encoder code so that switching happens at defined points.

Finally, we will discuss the conclusions.

To my family and Jordi.

CONTENTS

INTRODUCTION

CHAPTER 1. H.264/AVC Overview
1.1. Applications
1.2. H.264/AVC in wireless environments
1.2.1. Transport in Wireless systems
1.3. Profiles
1.3.1. Baseline
1.3.2. Main
1.3.3. Extended
1.4. Frames types and format
1.4.1. Frame types definition

CHAPTER 2. I and P frames
2.1. Encoding works in baseline profile
2.1.1. I frames
2.1.2. P frames
2.2. Standard sequence description
2.2.1. Simulation and results of I and P sequences
2.3. Bitstream switching in baseline profile

CHAPTER 3. SP frames
3.1. Advantages and disadvantages of using SP frames
3.2. Primary and secondary SP frames
3.3. Encoding and decoding SP frames
3.3.1. Encoding and decoding process of primary SP frames
3.3.2. Encoding and decoding process of secondary SP frames
3.4. How does an SP frame work?
3.5. Simulations and results of bitstream switching
3.6. Comparison of I switching and SP switching

CHAPTER 4. SI frames
4.1. Advantages and disadvantages on using SI frames
4.2. Encoding and decoding process of SI frames
4.3. How does an SI frame work?
4.4. Simulation and results

CHAPTER 5. Simulation Scenario
5.1. 44 Kbps
5.2. 105 Kbps
5.3. 360 Kbps
5.4. Comparison of results

CHAPTER 6. Switching simulations
6.1. High and Low Quality Simulation
6.1.1. Graphics
6.2. Switching simulation

CHAPTER 7. Code improvements
7.1. Original code
7.2. Modified code

CHAPTER 8. Conclusions

BIBLIOGRAPHY

APPENDIX A.
A.1. Abbreviations
A.2. Table of results for the simulation scenario

APPENDIX B.
B.1. Configuration file: encoder.cfg
B.2. Configuration file: decoder.cfg

LIST OF FIGURES

1.1 Standardization Scheme.
1.2 H.264/AVC standard in transport environment.
1.3 H.264/AVC Profiles.
1.4 Subdivision of a frame into slices.

2.1 Spatial Prediction.
2.2 Full macroblock prediction.
2.3 Multi-frame motion compensation.
2.4 Standard Sequence.
2.5 Relation between I & P frames and the QP.
2.6 On the right, the I frame size vs the GOP size. On the left, the P frame size.
2.7 Switching by means of I frames, first step.
2.8 Switching by means of I frames, second step.

3.1 An SP frame in a stream.
3.2 Secondary SP frames, SPABn.
3.3 Block diagram of the encoding process of primary SP frames.
3.4 Block diagram of the decoding process of primary SP frames.
3.5 Resumed block diagram of the encoding process of secondary SP frames.
3.6 Block diagram of the decoding process of secondary SP frames.
3.7 Temporal sequence of switching on the right. The two bitrates on the left.
3.8 Bitstream switching process, steps 1 and 2.
3.9 Bitstream switching process, steps 3 and 4.
3.10 The size of I frames versus SP rate, for two different QP, 23 and 37.
3.11 The quality of I frames versus SP rate, for two different QP, 23 and 37.
3.12 The size of P frames versus SP rate, for two different QP, 23 and 37.
3.13 The quality of P frames versus SP rate, for two different QP, 23 and 37.
3.14 The size of SP frames versus SP rate, for two different QP, 23 and 37.
3.15 Graphic with the size of the three frame types.
3.16 Comparison of the qualities between I and SP switching.
3.17 Comparison of the bitrates between I and SP switching.

4.1 Resumed block diagram of the encoding process of SI frames.
4.2 Block diagram of the decoding process of secondary SP frames.
4.3 Sending the video stream.
4.4 What happens without SI frames when an error appears.
4.5 SI frames: the client finds the error and sends a warning.
4.6 SI frames: an SI frame is generated.
4.7 SI frames: the error is corrected.
4.8 I frame size versus SI rate for two different QP, 23 and 38.
4.9 I frame quality versus SI rate for two different QP, 23 and 38.
4.10 P frame size versus SI rate for two different QP, 23 and 38.
4.11 Comparison graphic of the size of the three frame types.

5.1 Videostream visualization of the pair selected, QP 37 and GOP 40.
5.2 Videostream visualization of the pair selected, SP rate of 15 and QP 37.
5.3 Videostream visualization of the pair selected, SI rate 45 and QP 38.
5.4 Videostream visualization of the pair selected, QP of 30 and a GOP 68.
5.5 Videostream visualization of the pair selected, SP rate 8 and QP 31.
5.6 Videostream visualization of the pair selected, QP 38 and SI rate 24.
5.7 Videostream visualization of the pair selected, QP of 23 and a GOP value of 20.
5.8 Videostream visualization of the pair selected, SP rate 45 and QP 23.
5.9 Videostream visualization of the pair selected, QP 23 and SI rate 49.

6.1 Comparison graphic of P and SP frames.
6.2 Visualization on High and Low quality.
6.3 Quality comparison between sequences with and without SP frames, for high and low quality.
6.4 Visualization of the frames, 1st frame of the streaming.
6.5 Visualization of the frames, 8th frame of the streaming.
6.6 Text visualization of the sequence.
6.7 Quality of switching simulation.
6.8 Comparison of the Qualities.
6.9 Frames of the switching video streaming.

7.1 encoder.cfg: SP rate variable.
7.2 encoder.cfg: change QP variable.
7.3 Encoder code: SetImgType function.
7.4 Text file of a switching simulation.
7.5 Example of a file with switching point.
7.6 Text file of the switching simulation with the modified code.
7.7 Function SetImgType modified.

A.1 Results for I & P sequences.
A.2 Results for I & P & SP sequences.
A.3 Results for I & P & SI sequences.

LIST OF TABLES

5.1 Results for 44 Kbps by I&P sequence.
5.2 Results for 44 Kbps by I&P&SP sequence.
5.3 Results for 44 Kbps by I&P&SI sequence.
5.4 Results for 105 Kbps by I&P sequence.
5.5 Results for 105 Kbps by I&P&SP sequence.
5.6 Results for 105 Kbps by I&P&SI sequence.
5.7 Results for 360 Kbps by I&P sequence.
5.8 Results for 360 Kbps by I&P&SP sequence.
5.9 Results for 360 Kbps by I&P&SI sequence.

6.1 Frame sizes for a high quality simulation.
6.2 Frame sizes for a low quality simulation.
6.3 Frame sizes for a low quality simulation.
6.4 Frame sizes for a low quality simulation.
6.5 Pair number definition.
6.6 Frame sizes for a switching simulation.


INTRODUCTION

H.264/AVC is the newest coding standard based on hybrid block-based video compression. It is the result of the joint standardization effort of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG).

The main purpose of this standard is to improve the compression efficiency of video streaming. Because of this characteristic, H.264/AVC is particularly well suited to bandwidth-limited wireless systems.

Slow variations due to distance, shadowing, handover, etc. turn the wireless channel into a slowly varying, variable-bit-rate channel. Because of these bitrate variations, it is worth studying the new frame types introduced by the standard, SP and SI frames, which allow smoother switching between bitrates. These frames also support other applications, such as random access and error recovery.

We start this work with the study of a standard sequence formed only by I and P frames. An I frame depends only on itself, whereas P frames depend on previously encoded frames. The size of the sequence, as for any other sequence type, depends on the Quantization Parameter (QP) and on the GOP (Group Of Pictures) value, which is the distance, in pictures, between two consecutive I frames. Therefore, if an error occurs during transmission, it propagates until the next I frame, which may arrive several seconds later, and only then is the video displayed without errors again. The standard introduces the two new frame types not to reduce transmission errors themselves but to limit their propagation.
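To make the effect of the GOP value concrete, the sketch below lists the frame types of an I/P sequence and counts how many frames are affected when one frame is lost. It assumes a simplified model, not taken from the thesis, in which the error propagates through every following P frame until the next I frame; the frame rate is a hypothetical parameter.

```python
def frame_types(num_frames: int, gop: int) -> list[str]:
    """Frame types of an I/P sequence: an I frame every `gop` frames, P otherwise."""
    return ["I" if n % gop == 0 else "P" for n in range(num_frames)]


def frames_until_refresh(lost_frame: int, gop: int) -> int:
    """Frames affected by a loss at index `lost_frame` (the lost one included),
    assuming the error propagates until the next I frame."""
    next_i = (lost_frame // gop + 1) * gop
    return next_i - lost_frame


def refresh_delay_seconds(lost_frame: int, gop: int, fps: float) -> float:
    """Time until an error-free picture is displayed again."""
    return frames_until_refresh(lost_frame, gop) / fps


print("".join(frame_types(12, 4)))         # IPPPIPPPIPPP
# Hypothetical numbers: GOP of 40 at 10 frames per second, loss at frame 1.
print(frames_until_refresh(1, 40))         # 39
print(refresh_delay_seconds(1, 40, 10.0))  # 3.9
```

A larger GOP saves bits, because I frames are expensive, but lengthens this refresh delay; this trade-off is what SP and SI frames address.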

In H.264/AVC, bitstream switching can be performed by means of I frames or SP frames. For the same quality, SP frames are smaller than I frames, so they can be inserted more often without increasing the bitstream size as much as I frames would. There are two types of SP frames, primary and secondary. Each primary SP frame has a corresponding secondary SP frame, and secondary SP frames are sent only when a switch takes place.
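The role of primary and secondary SP frames can be illustrated with a toy schedule. In the sketch below, two pre-encoded streams A and B carry primary SP frames at the same positions. A switch requested at frame `switch_request` takes effect at the next SP position, where the secondary frame (labelled `SP_AB`, a hypothetical name) is sent instead of a primary one. The periodic SP placement and the labels are assumptions for illustration, not the thesis's encoder configuration.

```python
def stream_types(num_frames: int, sp_period: int, gop: int) -> list[str]:
    """Frame types of one pre-encoded stream: an I frame at each GOP start,
    a primary SP frame every `sp_period` frames, P frames elsewhere."""
    types = []
    for n in range(num_frames):
        if n % gop == 0:
            types.append("I")
        elif n % sp_period == 0:
            types.append("SP")
        else:
            types.append("P")
    return types


def switch_schedule(num_frames: int, sp_period: int, gop: int,
                    switch_request: int) -> list[str]:
    """Frames actually sent when switching from stream A to stream B.

    Frames come from A until the first SP position at or after
    `switch_request`; there the secondary frame SP_AB is sent, and all
    later frames come from B."""
    sent, switched = [], False
    for n, t in enumerate(stream_types(num_frames, sp_period, gop)):
        if not switched and n >= switch_request and t == "SP":
            # Secondary SP frame: decoded with A's previous picture as
            # reference, it reconstructs exactly B's primary SP picture.
            sent.append("SP_AB")
            switched = True
        else:
            sent.append(f"{t}@{'B' if switched else 'A'}")
    return sent


print(switch_schedule(8, sp_period=4, gop=100, switch_request=2))
# ['I@A', 'P@A', 'P@A', 'P@A', 'SP_AB', 'P@B', 'P@B', 'P@B']
```

Because only the single secondary frame is extra, the switch costs far fewer bits than inserting an I frame at the same point.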

SI frames share the instant-refresh property of I frames, but they are sent only after a frame is lost or for random access. Their main advantage is that they do not need any reference frame, because they use intra prediction. Their main drawback is that they are even bigger than I frames, which restricts their use.
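The error-recovery use of SI frames can be sketched in the same way. The function below assumes a hypothetical model in which the client reports a loss immediately and the server replaces the next SP frame of the stream with an SI frame that reconstructs the same picture without any reference. Frames between the loss and that point are shown with propagated errors, marked with `*`. The function name and marker are illustrative.

```python
def recover_with_si(types: list[str], lost_frame: int) -> list[str]:
    """Frames displayed after a loss when the server answers with an SI frame.

    Frames from the loss up to the next SP position (or an earlier I frame)
    are marked '*' as corrupted; that SP frame is replaced by an SI frame."""
    out = list(types)
    for n in range(lost_frame, len(types)):
        if n > lost_frame and types[n] == "I":
            break                  # a regular I frame refreshes the decoder anyway
        if n > lost_frame and types[n] == "SP":
            out[n] = "SI"          # intra coded: no dependency on the lost data
            break
        out[n] = types[n] + "*"
    return out


print(recover_with_si(["I", "P", "P", "SP", "P", "P", "SP", "P"], lost_frame=1))
# ['I', 'P*', 'P*', 'SI', 'P', 'P', 'SP', 'P']
```

Compared with waiting for the next I frame, the error stops at the next SP position; the price is the large size of the SI frame.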

In this thesis we have also analyzed the simulation scenario used by UMTS, since these frame types are expected to be introduced in that technology environment.

We also studied bitstream switching, because it is an important application of SP frames.

Finally, we present our code improvements, which let the switch happen at defined points rather than at every switching period, as in the previous implementation.

After this work, we can conclude that SP frames are better than I frames in a switching scenario, because SP switching keeps the bitrate at a more constant level, whereas I switching introduces very high peaks. SP frames also require fewer bits than SI frames.

As for SI frames, we conclude that their use is more restricted, because they require more bits than any other frame type; they are even bigger than I frames.


CHAPTER 1. H.264/AVC OVERVIEW

H.264/AVC is the newest coding standard based on hybrid block-based video compression. It is the result of the joint standardization effort of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). Its official name is Advanced Video Coding (MPEG-4 Part 10) in ISO/IEC and H.264 in ITU-T, and it is commonly known as H.264/AVC.

The goals of the standardization effort are to improve compression performance, to develop a simple and unified video coding design, and to provide a "network-friendly" video representation that addresses both "conversational" (video telephony) and "non-conversational" (storage, broadcasting or streaming) applications [1]. In addition, growing service demands and the popularity of high-definition TV have created the need for higher coding efficiency, high quality and high bitrates. With earlier standards this was a problem, because transmission media such as xDSL, UMTS or cable modems offer much lower data rates than broadcast channels, and even DVB-T does not have enough spectrum available.

In previous years, each of the two groups had developed its own standards. MPEG focused on video storage, while the main target of VCEG was real-time video communication, such as videoconferencing. The figure below shows the standards developed by each organization.

Figure 1.1: Standardization Scheme.

In 1991, the ISO Moving Picture Experts Group introduced the MPEG-1 standard. MPEG-1 was the first MPEG standard for video and audio compression; it is used by Video CD (VCD) and includes the popular MP3 audio compression format. The quality obtained with VCD is similar to that of a domestic VHS tape. As an extension of this first standard, MPEG-2 appeared in 1994. MPEG-2 focuses on the generic coding of moving pictures and associated audio. It is widely used for video and audio compression in terrestrial TV (DVB-T), satellite TV (DVB-S), cable TV (DVB-C) and High Definition TV (HDTV), and it is also used by SVCD (Super VCD) and DVD (Digital Versatile Disc). After this second standard, ISO MPEG designed MPEG-3, intended as a video compression standard for HDTV; however, MPEG-2 proved able to achieve similar results, so work on MPEG-3 was discontinued.

While MPEG was developing its standards, the ITU-T Video Coding Experts Group developed its own. It started in 1990 with H.261, a standard originally designed for transmission over ISDN lines. H.261 supports two picture resolutions: QCIF (Quarter Common Intermediate Format), 176x144 pixels, and CIF (Common Intermediate Format), 352x288 pixels. Next came H.262, which is identical to MPEG-2, the standard developed by MPEG for HDTV and DVD. The two standards have exactly the same content because they were developed jointly by ITU-T and ISO, as later happened with H.264/AVC. In 1996, after this joint standardization, ITU-T designed the new standard H.263, a low-bitrate compressed format for videoconferencing. It was originally designed as an enhancement of H.261, the previous ITU-T video compression standard, also drawing on MPEG-1 and MPEG-2.

Once the two organizations had developed these standards, they agreed to design a new video compression standard through a collective partnership known as the Joint Video Team (JVT). The name H.264 follows the ITU-T naming convention, while the name MPEG-4 AVC follows the ISO/IEC MPEG naming convention. The first complete version of H.264/AVC was presented in May 2003.

The intention was to develop a new standard without increasing design complexity, enhancing only some parts to make them more efficient, and to make it compatible with many more applications than the previous standards. The basic functional elements differ little from those of previous standards; the important changes in H.264/AVC lie in the details of each functional element [2].

1.1. Applications

H.264/AVC aims to be compatible with most existing applications, many of which were already possible with previous standards.

The most important applications defined in the Draft ITU-T Recommendations [3] are:

1. Cable TV on optical networks (CATV).

2. Direct broadcast satellite video services (DBS).

3. Digital subscriber line video services (DSL).

4. Digital terrestrial television broadcasting (DTTB).

5. Interactive Storage Media (ISM): optical disks, etc.

6. Multimedia Mailing (MMM).


7. Multimedia services over packet networks (MSPN).

8. Real-time conversational services (RTC): videoconferences, videophone, etc.

9. Remote video surveillance (RVS).

10. Serial Storage Media (SSM).

The development of H.264/AVC has opened new markets and industrial opportunities, and many companies have already adopted the standard in their own products [4]. One example is "mobile TV", the reception of audio-visual content on cell phones or portable devices, which is presently on the verge of commercial deployment. Several mobile broadcasting systems are currently under consideration:

• Digital Multimedia Broadcasting (DMB) in South Korea.

• Digital Video Broadcasting - Handheld (DVB-H) in Europe and the United States of America.

• Multimedia Broadcast/Multicast Service (MBMS), as specified in Release 6 of 3GPP.

In these three mobile TV services, H.264/AVC is used to obtain better video compression. Better compression frees bits that can be spent on error protection, which in turn improves error robustness and quality over the transmission system.

Another field of application is satellite TV, where major satellite TV distributors have announced H.264/AVC deployments.

1.2. H.264/AVC in wireless environments

Because of its improved compression efficiency and error-resilience features, H.264/AVC is particularly well suited to wireless systems. The limited bandwidth of the radio link demands efficient video compression, which is the main requirement for a video coding standard to succeed in a mobile environment, and the H.264/AVC video coding layer typically provides a significant improvement in this respect. The network-friendly design goal is addressed by the network abstraction layer, which was developed to transport the coded video data over existing and future networks, including wireless systems. These two layers are described in section 1.2.1, Transport in Wireless systems.

Slow variations due to distance, shadowing, handover, etc. turn the wireless channel into a slowly varying, variable-bit-rate channel. With an appropriate choice of initial delay and receiver buffer size, a certain quality of service can be guaranteed [5].
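The role of the initial delay can be quantified with a simple playout model. Assume, for illustration, a constant channel rate R, a frame rate f, and frame n displayed at time D + n/f. No underflow occurs if every frame has fully arrived by its display time, which gives D_min = max(0, max_n(S_n / R - n / f)), where S_n is the number of bits in frames 0 to n. The frame sizes below are invented; a time-varying channel would replace R·t with the cumulative channel capacity.

```python
from itertools import accumulate


def min_initial_delay(frame_bits: list[int], rate_bps: float, fps: float) -> float:
    """Smallest start-up delay (seconds) that avoids receiver-buffer underflow.

    Assumes a constant channel rate `rate_bps` and that frame n is displayed
    at time delay + n / fps, so it must be fully received by then."""
    delay = 0.0
    for n, received_bits in enumerate(accumulate(frame_bits)):
        # Frames 0..n have fully arrived at time received_bits / rate_bps.
        delay = max(delay, received_bits / rate_bps - n / fps)
    return delay


# Invented example: a 20 kbit I frame followed by 2 kbit P frames at 10 fps
# over a 44 kbps channel; the I frame alone dictates the delay (about 0.45 s).
print(min_initial_delay([20000, 2000, 2000, 2000], rate_bps=44000, fps=10))
```

The example shows why large I frames are costly on a narrow channel: they set the start-up delay the receiver must absorb.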


Techniques such as switching-predictive (SP) pictures, which enable channel-adaptive streaming, are supported by H.264/AVC. Since the streaming server is generally aware of the current channel bitrate, the transmitter can decide which of several pre-encoded versions of the same content to send, taking the expected channel behavior into account.
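A minimal sketch of this server decision selects the highest-rate pre-encoded version that fits the estimated channel bitrate, with a safety margin. The three rates reuse the UMTS bandwidths studied in Chapter 5 (44, 105 and 360 kbps). The 0.9 margin and the fallback to the lowest version are assumptions.

```python
def select_version(available_kbps: list[float], channel_kbps: float,
                   margin: float = 0.9) -> float:
    """Return the highest encoded bitrate not exceeding margin * channel_kbps.

    If no version fits, fall back to the lowest-rate version."""
    budget = margin * channel_kbps
    fitting = [rate for rate in available_kbps if rate <= budget]
    return max(fitting) if fitting else min(available_kbps)


versions = [44, 105, 360]
print(select_version(versions, channel_kbps=128))  # 105
print(select_version(versions, channel_kbps=40))   # 44
```

In practice, the newly selected version can only be joined at a switching point, that is, at an SP or I frame of the new stream.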

1.2.1. Transport in Wireless systems

Like earlier standards, H.264/AVC does not explicitly define a CODEC (COder/DECoder). It defines the syntax of an encoded bitstream and the method for decoding it.

H.264/AVC is designed in two layers. The first is the Video Coding Layer (VCL), which efficiently represents the video content. The second is the Network Abstraction Layer (NAL), which formats the coded video for the transmission medium and provides header information suited to different transport layers or storage media [6].
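The NAL packages coded data into NAL units, each starting with a one-byte header. The sketch below decodes that header using the field layout of the H.264 specification (forbidden_zero_bit, nal_ref_idc, nal_unit_type). Only a few common nal_unit_type values are named; the dictionary is not a complete list.

```python
NAL_UNIT_TYPES = {
    1: "coded slice (non-IDR)",
    2: "slice data partition A",
    3: "slice data partition B",
    4: "slice data partition C",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
}


def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    nal_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,          # 0 = not used for reference
        "nal_unit_type": nal_type,
        "name": NAL_UNIT_TYPES.get(nal_type, "other/reserved"),
    }


print(parse_nal_header(0x67))
# {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 7, 'name': 'sequence parameter set'}
```

The data partition types (2 to 4) belong to the slice data partitioning tool, which is available only in the Extended profile.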

Figure 1.2 depicts the standard in a transport environment:

Figure 1.2: H.264/AVC standard in transport environment.

1.3. Profiles

Profiles and levels specify conformance points that provide interoperability between encoder and decoder implementations within an application of the standard, and between different applications with similar functional requirements.

Currently, six profiles are defined in the standard. The most important are Baseline (BP), Main (MP) and Extended (EP); the other three are High, High 10 and High 4:2:2. Each profile has its own characteristics and applications. Figure 1.3 shows the contents of each one.

Each profile defines a set of coding tools or algorithms that can be used to generate a compliant bitstream. All decoders conforming to a specific profile must support all features of that profile. Encoders are not required to use any particular set of features supported in a profile, but they must produce conforming bitstreams.

The High profile adds 8x8 spatial prediction and transform, the monochrome format and scaling matrices. High 10 supports sample bit depths of up to 10 bits. Finally, High 4:2:2 adds support for the 4:2:2 chroma format. The three foremost profiles are explained next.

Figure 1.3: H.264/AVC Profiles.

1.3.1. Baseline

The Baseline profile is typically considered the simplest profile and includes the basic features of the H.264/AVC standard:

• Only I and P slice types may be present.

• Flexible Macroblock Ordering (FMO).

• Arbitrary Slice Ordering (ASO).

• Redundant Pictures.

• Motion-compensated prediction.

• In-loop deblocking.

• Intra-prediction.

• Context-Adaptive Variable-Length Coding (CAVLC).

This profile emphasizes coding efficiency and robustness with low computational complexity. The features not supported by the Baseline profile are:


• B slices.

• Weighted prediction.

• Picture or macroblock adaptive switching between frame and field coding.

• SP and SI slices.

1.3.2. Main

The Main profile emphasizes primarily coding efficiency. It typically provides the best quality at the cost of higher complexity (mainly due to B slices and CABAC) and delay. This second profile contains all features of the Baseline profile except FMO, ASO and redundant pictures.

It also includes:

• B slices.

• Field coding.

• Weighted prediction.

• Macroblock adaptive frame-field (MBAFF).

• Context-Adaptive Binary Arithmetic Coding (CABAC).

Only a subset of the coded video sequences that are decodable by a Baseline profile decoder can be decoded by a Main profile decoder.

1.3.3. Extended

The Extended profile emphasizes robustness and flexibility with high coding efficiency.

This profile is a superset of the Baseline profile and also supports all Main profile tools except CABAC. The SP/SI slice and slice data partitioning tools are included only in this profile.
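The slice-type rules of this section can be summarized in a small lookup table. The sketch below encodes which slice types each profile allows, together with the profile_idc value that identifies each profile in the sequence parameter set (66 Baseline, 77 Main, 88 Extended, 100 High, 110 High 10, 122 High 4:2:2). It covers only the profiles discussed here, and the names are illustrative.

```python
PROFILE_IDC = {66: "Baseline", 77: "Main", 88: "Extended",
               100: "High", 110: "High 10", 122: "High 4:2:2"}

ALLOWED_SLICES = {
    "Baseline": {"I", "P"},
    "Main": {"I", "P", "B"},
    "Extended": {"I", "P", "B", "SP", "SI"},
    "High": {"I", "P", "B"},
    "High 10": {"I", "P", "B"},
    "High 4:2:2": {"I", "P", "B"},
}


def profile_allows(profile_idc: int, slice_type: str) -> bool:
    """True if a stream signalling `profile_idc` may contain `slice_type` slices."""
    return slice_type in ALLOWED_SLICES[PROFILE_IDC[profile_idc]]


print(profile_allows(66, "SP"), profile_allows(88, "SP"))  # False True
```

This is why the SP and SI experiments require an Extended profile configuration rather than the Baseline profile used by UMTS.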

1.4. Frames types and format

In existing video coding standards, such as MPEG-2, H.263 and MPEG-4, three main frame types are defined: I, P and B. H.264/AVC supports these three types and adds two new ones: SP and SI slices.


Each picture of a video sequence (a frame) is divided into macroblocks. Each macroblock has a fixed size and covers a rectangular picture area of 16x16 samples of the luma (brightness) component and, in the usual 4:2:0 format, 8x8 samples of each of the two chroma components.
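As a check on these numbers, a QCIF picture (176x144 luma samples, the format mentioned for H.261 above) is 11 macroblocks wide and 9 high, or 99 macroblocks. Each macroblock carries 256 luma samples and 2 x 64 chroma samples. The sketch below computes this for any picture size, assuming 4:2:0 sampling; the function names are illustrative.

```python
import math

MB_SIZE = 16  # luma samples per macroblock side


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Macroblock columns and rows needed to cover a width x height luma picture
    (a partial macroblock at the border counts as a whole one)."""
    return math.ceil(width / MB_SIZE), math.ceil(height / MB_SIZE)


def samples_per_macroblock() -> int:
    """16x16 luma samples plus two 8x8 chroma blocks (4:2:0 sampling)."""
    return MB_SIZE * MB_SIZE + 2 * (MB_SIZE // 2) ** 2


cols, rows = macroblock_grid(176, 144)      # QCIF
print(cols, rows, cols * rows)              # 11 9 99
print(samples_per_macroblock())             # 384
```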

Macroblocks are organized in slices, which are regions of a picture that can be decoded independently of each other. For example:

Figure 1.4: Subdivision of a frame into slices.

For more details on the luma and chroma components, see [7].

1.4.1. Frame types definition

I ("Intra") slices are the simplest ones. They are coded using intra prediction only: they do not refer to any previous slice of the video sequence, only to data within themselves. The first frame of a sequence has to be intra coded. All profiles support this type.

P ("Predicted") slices are coded using inter prediction. Inter prediction creates a prediction model from one or more previously encoded video frames. The model is formed by shifting samples in the reference frame(s) (motion-compensated prediction). The AVC codec uses block-based motion compensation, the same principle adopted by every major coding standard since H.261, with at most one motion-compensated prediction signal per prediction block [8]. All profiles support this type.
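Block-based motion compensation relies on finding, for each block, a displacement (motion vector) into the reference frame. The sketch below is a textbook integer-pixel full search that minimizes the sum of absolute differences (SAD). It is a simplified stand-in, not the motion estimation of the H.264 reference encoder, which also uses sub-pixel accuracy, variable block sizes and rate-constrained decisions.

```python
Frame = list[list[int]]  # frame[y][x] luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, x0: int, y0: int, size: int) -> int:
    """Sum of absolute differences between the size x size block at (bx, by)
    in `cur` and the block at (x0, y0) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
               for y in range(size) for x in range(size))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int = 4, search_range: int = 2) -> tuple[int, int]:
    """Integer-pel full-search motion estimation for one block.

    Returns the motion vector (dx, dy) with the lowest SAD; candidates that
    fall outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, x0, y0, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


# The current frame is the reference shifted by one sample left and up,
# so the block at (2, 2) is found at displacement (1, 1) with SAD 0.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 1] if y < 7 and x < 7 else 0 for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2))  # (1, 1)
```

A P block is then coded as this motion vector plus the residual between the current block and the displaced reference block.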

B ("Bi-predicted") slices are coded using inter prediction with two motion-compensated prediction signals per prediction block, combined using a weighted average. All profiles except Baseline support this type.

SP ("Switching P") slices allow efficient switching between bitstreams coded at different bitrates without the large number of bits required by I slices. They are supported only by the Extended profile.

SI ("Switching I") slices are encoded using intra prediction only and allow an exact match with SP slices for random access or error recovery. They are supported only by the Extended profile.
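In the bitstream, the type of each slice is signalled by the slice_type field of the slice header. The mapping below follows the H.264 specification. Values 5 to 9 mean the same types as 0 to 4 but also indicate that all slices of the picture share that type. The extra table records how many reference picture lists each type predicts from, as described above.

```python
SLICE_TYPE_NAMES = ("P", "B", "I", "SP", "SI")

# Reference picture lists used: intra types need none, P/SP one, B two.
REFERENCE_LISTS = {"I": 0, "SI": 0, "P": 1, "SP": 1, "B": 2}


def slice_type_name(slice_type: int) -> str:
    """Map the slice_type field of an H.264 slice header (0-9) to its name."""
    if not 0 <= slice_type <= 9:
        raise ValueError(f"invalid slice_type: {slice_type}")
    return SLICE_TYPE_NAMES[slice_type % 5]


name = slice_type_name(3)
print(name, REFERENCE_LISTS[name])  # SP 1
```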



CHAPTER 2. I AND P FRAMES

2.1. Encoding in the baseline profile

At present, UMTS only allows the baseline profile (reference TS 260). We therefore start the study with the baseline profile, even though it is not the profile we will finally use.

The baseline profile only accepts I and P frames, so SP and SI frames are not allowed. We therefore start with a study of these basic frame types.

2.1.1. I frames

Unlike previous standards, in which intra prediction was performed in the transform domain (on frequency-domain coefficients), H.264/AVC always performs intra prediction in the spatial domain. Spatial prediction is based on the observation that adjacent macroblocks tend to have similar textures, so the current macroblock can be predicted from the surrounding macroblocks.

Figure 2.1: Spatial Prediction.

One advantage of spatial prediction is that it improves the quality of the predicted signal and allows areas that are not temporally predicted to be used as a reference.

There are two basic types of Intra prediction:

1. Full macroblock intra prediction (16x16 luma prediction)

2. 4x4 luma prediction

Intra 16x16 luma prediction defines a single spatial prediction scheme for the whole macroblock. The samples are predicted from the neighbouring macroblocks at the left and upper edges using one of four possible prediction modes. Intra prediction is also performed for the chroma planes using the same range of prediction modes; however, different modes may be selected for luma and chroma [9]. The four full-macroblock prediction modes are horizontal, vertical, DC and plane. They are easiest to understand through an example:

Figure 2.2: Full macroblock prediction.

The vertical mode (0) predicts the current macroblock by extending the samples of the macroblock above it downwards. The horizontal mode (1) works like mode 0 but extends the samples of the macroblock to the left across the block. The DC mode (2) predicts every sample as the average of the neighbouring samples. Finally, the plane mode (3) uses a three-parameter curve-fitting equation (a brightness, a slope in the horizontal direction and a slope in the vertical direction) that approximately matches the neighbouring pixels.
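The four modes can be sketched in Python as follows. This is a minimal, non-normative sketch for 8-bit samples that follows the Intra 16x16 luma prediction equations of the H.264/AVC specification, assuming that all neighbouring samples are available (the standard restricts the modes when they are not). `intra16x16_predict` is a hypothetical helper name, and chroma prediction is omitted.

```python
import numpy as np


def intra16x16_predict(top, left, corner, mode):
    """Predict a 16x16 luma block from already reconstructed neighbouring samples.

    top:    the 16 samples in the row directly above the block
    left:   the 16 samples in the column directly to the left
    corner: the sample above and to the left (only the plane mode uses it)
    mode:   0 = vertical, 1 = horizontal, 2 = DC, 3 = plane
    """
    top = np.asarray(top, dtype=np.int64)
    left = np.asarray(left, dtype=np.int64)
    if mode == 0:  # vertical: every row repeats the samples above the block
        return np.tile(top, (16, 1))
    if mode == 1:  # horizontal: every column repeats the samples to the left
        return np.tile(left[:, None], (1, 16))
    if mode == 2:  # DC: rounded average of the 32 neighbouring samples
        dc = (int(top.sum()) + int(left.sum()) + 16) >> 5
        return np.full((16, 16), dc, dtype=np.int64)
    if mode == 3:  # plane: brightness plus horizontal and vertical gradients
        # Prepend the corner so that index k + 1 holds neighbour k (k = -1 is the corner).
        t = np.concatenate(([corner], top))
        l = np.concatenate(([corner], left))
        h = sum((k + 1) * (int(t[9 + k]) - int(t[7 - k])) for k in range(8))
        v = sum((k + 1) * (int(l[9 + k]) - int(l[7 - k])) for k in range(8))
        a = 16 * (int(left[15]) + int(top[15]))
        b = (5 * h + 32) >> 6
        c = (5 * v + 32) >> 6
        y, x = np.mgrid[0:16, 0:16]
        return np.clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5, 0, 255)
    raise ValueError("mode must be 0, 1, 2 or 3")


top = np.arange(16) * 4 + 60   # brightness rising from left to right
left = np.full(16, 80)
for mode in range(4):
    print(mode, intra16x16_predict(top, left, 60, mode)[0, :4])
```

The encoder would typically try all four modes and keep the one with the smallest residual; that decision step is not shown here.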

2.1.2. P frames

P frames are encoded using Inter prediction, which creates a prediction model from previously coded pictures. In previous standards, such as MPEG-2 and its predecessors, only one previous picture was used to predict the values of the incoming pictures. H.264/AVC gives the freedom of choosing the reference frame among several pictures: it enables more efficient coding by allowing the encoder to select among a larger number of pictures, already decoded and stored at the decoder, as reference for the motion compensation model. The motion compensation model is formed by shifting samples in the reference frame.

The new motion compensation features introduced by H.264/AVC are:

• Multiple reference picture motion compensation: inter prediction can use a larger number of previously decoded pictures as references (figure 2.3).

• Quarter-sample-accuracy motion compensation: motion compensation describes each block of a picture by its displaced counterpart in a previous picture, the frames being partitioned into blocks of pixels. In earlier standards such as MPEG-2 the offset accuracy is half a pixel, whereas in H.264/AVC the offset between the two areas has 1/4-pixel resolution.
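Quarter-sample accuracy can be illustrated with a one-dimensional sketch. In H.264/AVC, luma half-sample positions are interpolated with the 6-tap filter (1, -5, 20, 20, -5, 1)/32, and, in one dimension, quarter-sample positions are obtained by averaging the nearest integer and half samples. The code below applies this along a single row only, with edge clamping standing in for picture padding; the real process is two-dimensional and also covers diagonal positions. The function names are hypothetical.

```python
import numpy as np

TAPS = np.array([1, -5, 20, 20, -5, 1])  # 6-tap filter for half-sample positions


def pixel(row, k):
    """Integer-position sample with edge clamping (stands in for picture padding)."""
    return int(row[min(max(k, 0), len(row) - 1)])


def half_sample(row, i):
    """Half-sample value between integer positions i and i + 1."""
    acc = sum(int(t) * pixel(row, i - 2 + j) for j, t in enumerate(TAPS))
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip to 8 bits


def sample_at(row, pos_q):
    """1-D luma value at pos_q, a position expressed in quarter-sample units."""
    i, frac = pos_q >> 2, pos_q & 3        # integer part and quarter fraction
    if frac == 0:
        return pixel(row, i)
    half = half_sample(row, i)
    if frac == 2:
        return half
    # Quarter positions: rounded average of the nearest integer and half samples.
    nearest = pixel(row, i) if frac == 1 else pixel(row, i + 1)
    return (nearest + half + 1) >> 1


ramp = np.arange(16) * 10                            # 0, 10, 20, ..., 150
print([sample_at(ramp, q) for q in range(16, 21)])   # [40, 43, 45, 48, 50]
```

A motion vector component stored in quarter-sample units splits into `mv >> 2` whole samples and `mv & 3` quarters, which is how `sample_at` interprets `pos_q`.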


Figure 2.3: Multi-frame motion compensation.

2.2. Standard sequence description

In the baseline profile, a standard H.264/AVC sequence is made of I and P slices. To introduce the other frame types we have to change the profile, as we will see in the next chapters.

The size of the sequence depends on the Quantization Parameter (QP) and on the GOP (Group of Pictures) size.

The QP is the parameter that determines the quantization of the transform coefficients in H.264/AVC. It can take 52 values (0 to 51) and makes it possible to change the quality of the coded sequence: the bigger the QP, the lower the quality of the video stream.
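The QP is not the step size itself: it indexes a quantizer step size (Qstep) that doubles every time the QP increases by 6. The sketch below uses the Qstep values commonly tabulated for H.264/AVC. The standard itself implements quantization with integer scaling tables, so these floating-point values are only a conceptual model, and `qstep` is a hypothetical helper.

```python
# Step sizes for QP 0..5; every increase of 6 in QP doubles the step size.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Approximate quantizer step size for a QP in the range 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


# QP values that appear in the simulations of this work.
for qp in (23, 28, 36, 37, 49):
    print(f"QP {qp}: Qstep = {qstep(qp)}")
```

A coarser step discards more detail in each coefficient, which is why a bigger QP gives smaller frames and lower quality.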

The GOP size is the number of frames between two consecutive I frames. It determines the error resilience: a bigger GOP compresses better, but errors can propagate for longer, since an error propagates until the next I frame, which starts a new GOP. The GOP can be described using two variables:

• N+1, the number of pictures in the GOP, including the I frame.

• N, the spacing between I frames.

Here is an example:

Figure 2.4: Standard Sequence.
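The GOP structure and the error-propagation argument above can be sketched in a few lines of Python. This is a simplified model, assuming a single corrupted frame, no error concealment and no intra-coded macroblocks inside P frames, so the error simply lasts until the next I frame. The function names are hypothetical.

```python
def gop_pattern(num_frames: int, gop_size: int) -> str:
    """Frame types of an I/P sequence: an I frame every gop_size frames, P otherwise."""
    return "".join("I" if n % gop_size == 0 else "P" for n in range(num_frames))


def frames_hit_by_error(error_frame: int, gop_size: int, num_frames: int) -> int:
    """Frames affected when frame error_frame is corrupted.

    The error propagates through the following P frames until the next
    I frame starts a new GOP (or the sequence ends).
    """
    next_i = (error_frame // gop_size + 1) * gop_size
    return min(next_i, num_frames) - error_frame


print(gop_pattern(12, 4))                # IPPPIPPPIPPP
print(frames_hit_by_error(21, 20, 400))  # 19
print(frames_hit_by_error(21, 50, 400))  # 29
```

The same loss lasts 19 frames with a GOP size of 20 but 29 frames with a GOP size of 50, which is the trade-off between compression and error resilience described above.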


2.2.1. Simulation and results of I and P sequences

This section studies the behaviour of I and P frames: we analyse how they depend on the Quantization Parameter and on the GOP size. All the simulations use the entire Foreman sequence (400 frames).

As a first analysis, we discuss the relation between the size of I and P frames and the QP. In this case, the video sequence was simulated with a GOP size of 20.

Figure 2.5: Relation between I & P frames and the QP.

As expected, the size of the frames depends on the QP of the encoded stream, simply because the QP sets the quality of the resulting frame: the bigger the QP value, the lower the quality of the sequence, and vice versa.

Another important point, clearly visible in the graph, is that I frames are much bigger than P frames. The explanation is simple: the frame size follows the efficiency of spatial and temporal prediction. Frames that are harder to predict, such as I frames, which can only use spatial prediction, need more bits for their description, while P frames, which also use temporal prediction, are described with fewer bits.

Next, let us look at the size of I and P frames as a function of the GOP size. For these simulations the QP is fixed at 36.

Looking at the results (figure 2.6), we notice that the size of the I frames does not depend on the GOP size; it only depends on the characteristics of the frames selected to be I-encoded. For the P frames there is some dependency on the GOP size: the bigger the GOP, the bigger the frames, because prediction becomes less efficient.


Figure 2.6: On the right, the I frame size vs the GOP size. On the left, the P frame size.

2.3. Bitstream switching in baseline profile

In a wireless video streaming system, it may be difficult to guarantee end-to-end quality of service over the entire streaming period. The data rate of wireless multimedia communication channels changes very often, so, to give the user a better quality of service, the transmitted video data rate has to be changed accordingly. Such switching follows the channel characteristics, optimizes the use of the radio resources and helps to provide the required quality of service.

In previous standards and in the baseline profile, perfect (mismatch-free) bitstream switching is only possible at I frames.

Consider an example: there are two encoded bitstreams, and we want to switch from the one encoded with high quality to the low-quality one.

When switching is necessary, we have to wait until the next I frame, because P frames are predicted temporally, taking the previously processed frames as reference.

Figure 2.7: Switching by means of I frames, first step.


We keep transmitting the higher-bitrate sequence until we reach the I frame. At that point we switch from the first sequence to the other by sending the I frame of the lower-bitrate bitstream.

Then we continue transmitting the P frames of the second bitstream.

Figure 2.8: Switching by means of I frames, second step.
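The switching rule can be written as a small Python sketch. It assumes that both bitstreams have their I frames at the same positions (multiples of the GOP size), which is what makes mismatch-free switching at I frames possible. Labels such as `A:P7` and the function names are hypothetical.

```python
def next_multiple(n: int, period: int) -> int:
    """Smallest multiple of period that is >= n."""
    return -(-n // period) * period


def switch_at_i_frame(request: int, gop_size: int, num_frames: int) -> list:
    """Frames sent when switching from stream A to stream B using only I/P frames.

    Stream A is sent until the first I frame at or after the request; that
    I frame and everything after it are taken from stream B.
    """
    switch = next_multiple(request, gop_size)
    sent = []
    for n in range(num_frames):
        ftype = "I" if n % gop_size == 0 else "P"
        stream = "A" if n < switch else "B"
        sent.append(f"{stream}:{ftype}{n}")
    return sent


# Switch requested at frame 5 with a GOP size of 8: stream B starts at frame 8.
print(switch_at_i_frame(request=5, gop_size=8, num_frames=12))
```

The waiting time is `next_multiple(request, gop_size) - request` frames, so with a large GOP the client may stay on an unsuitable bitrate for a long time.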

The drawback of using I frames in these applications is that they require a much larger number of bits than P frames at the same quality. Switching at I frames therefore leads to an unacceptable bitrate.

That is why H.264/AVC introduced SP and SI frames: to improve the quality while using fewer resources.

SP frames 17

CHAPTER 3. SP FRAMES

SP frames are specially coded frames that enable efficient switching between video streams and efficient random access for video decoders. They are only permitted in the Extended profile.

The difference between SP frames and P frames is that SP frames allow identical frames to be reconstructed even when they are predicted using different reference frames. SP frames are smaller than I frames and are designed to support switching between similarly coded sequences without the increased bitrate penalty of I slices.

SP frames are classified as primary and secondary SP frames. For each primary SP frame a corresponding secondary SP frame is generated, which has exactly the same reconstructed values as the primary one; secondary SP frames are only sent during bitstream switching.

3.1. Advantages and disadvantages of using SP frames

Using SP frames brings some advantages that make them more efficient and convenient than other frame types:

• They require fewer bits than I frames to achieve the same quality. For example, to achieve a quality of 36 dB, an SP frame uses 4.8 Kbits, while an I frame uses 24 Kbits.

• It is possible to reconstruct a picture with different reference frames.

• They can be used instead of I frames for switching, fast forward, fast backward, random access, and error resilience and recovery.

On the other hand, they have some disadvantages:

• They are not allowed in baseline profile.

• They are less efficient than normal P pictures: the overall coding efficiency degrades if many switching points are assigned.

3.2. Primary and secondary SP frames

SP frames are designed to support bitstream switching. However, as shown in figure 3.1, they can also be placed in a single bitstream even when no bitstream switching is foreseen.


Figure 3.1: An SP frame in a stream.

There are two types: primary and secondary SP frames.

Now assume that there are two bitstreams of the same video sequence, encoded with different parameters, and that we want to switch from bitstream A to bitstream B.

At the switching point there are three SP frames (figure 3.2).

Figure 3.2: Secondary SP frames, SPABn.

The first one is SP An, which is generated in bitstream A; its reference is the previously encoded frame of its own bitstream, A'n-1, and SP An is in turn the reference of A'n+1. The second one, SP Bn, is encoded in the second bitstream; it refers to the previous frame B'n-1 and is the reference of B'n+1. These two SP frames are primary SP frames.

Then there is the third SP frame, SP ABn, which is a secondary SP frame. This frame is only generated when switching takes place. To encode it, the server takes as reference the previous frame of the origin bitstream, A'n-1. In this case, however, SP ABn is the reference for the next frame, which belongs to bitstream B, B'n+1.

Secondary SP frames are only used when switching from one bitstream to another; if there is no bitstream switching, they are not used.

In summary, when switching is needed, a secondary SP frame is generated in place of the primary SP frame of the first bitstream in order to change the bitrate.

In the next sections we will see exactly how the switching is performed.


3.3. Encoding and decoding SP frames

This section describes the encoding and decoding processes for primary and secondary SP frames.

In both cases we assume that non-intra blocks of the SP frame are being encoded and decoded. Intra blocks are processed exactly as in I frames.

3.3.1. Encoding and decoding process of primary SP frames

Figure 3.3 shows the block diagram of the encoder for primary SP frames [10].

Figure 3.3: Block diagram of the encoding process of primary SP frames.

The encoding process works on the difference between the original image and a motion-compensated version of the last reconstructed frame. First, the predicted block P is obtained by motion compensation from the previously reconstructed frame. A forward transform is applied to P, and the transform coefficients are quantized and dequantized with SPQP as the quantization parameter; the result is marked in the figure as dpred. The encoder then subtracts dpred from the transform coefficients of the original image, giving cerr. Finally, cerr is quantized using PQP as the quantization parameter and sent to the multiplexer together with the motion information (motion vectors).

Having seen the encoding process of primary SP frames, let us look at the decoding.

The decoding process follows the same steps as the encoder, but in a different order (figure 3.4) [10].

Figure 3.4: Block diagram of the decoding process of primary SP frames.

By motion compensation, a predicted block P is obtained and transformed, giving cpred. The cpred coefficients are added to the received inverse-quantized error coefficients, crec. The result of the addition is quantized and dequantized with SPQP as the quantization parameter.

The quantization parameter used for the inverse quantization of the error coefficients, PQP, does not have to be the same as the quantization parameter used to quantize the result of the addition, SPQP.
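The decoder steps above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not the arithmetic of the standard: the transform is omitted (coefficients are handled directly), and PQP and SPQP are replaced by hypothetical uniform step sizes (`p_step`, `sp_step`). What it shows is that the reconstructed coefficients always lie on the SPQP quantization grid, whatever the prediction was.

```python
import numpy as np


def quant(c, step):
    """Uniform quantizer, rounding half away from zero: coefficients -> levels."""
    c = np.asarray(c, dtype=float) / step
    return (np.sign(c) * np.floor(np.abs(c) + 0.5)).astype(int)


def dequant(levels, step):
    """Inverse quantizer: levels -> coefficients."""
    return np.asarray(levels) * step


def decode_primary_sp(c_pred, l_err, p_step, sp_step):
    """Primary SP decoding of one block, working directly on transform coefficients.

    c_pred: coefficients of the motion-compensated prediction P (cpred)
    l_err:  received error levels, quantized with PQP (step size p_step)
    Returns the reconstructed coefficients and their SPQP levels.
    """
    c_rec = dequant(l_err, p_step)                      # crec: inverse-quantized error
    l_rec = quant(np.asarray(c_pred) + c_rec, sp_step)  # requantize the sum with SPQP
    return dequant(l_rec, sp_step), l_rec               # dequantize with SPQP


coeffs, levels = decode_primary_sp(c_pred=[30, -13, 5, 0], l_err=[1, -2, 0, 2],
                                   p_step=4, sp_step=8)
print(coeffs.tolist(), levels.tolist())  # [32, -24, 8, 8] [4, -3, 1, 1]
```

Because the output is fixed entirely by the SPQP levels, any other frame that produces the same levels yields an identical reconstruction; this is the property that secondary SP frames exploit.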

3.3.2. Encoding and decoding process of secondary SP frames

Secondary SP frames follow the same encoding scheme as primary SP frames, with two differences.

First, secondary SP frames predict the block P using, as the previously reconstructed frame, the frame of the origin bitstream (bitstream A in the previous section of this chapter). Second, the transform coefficients of the predicted block P are subtracted from the original image of the destination bitstream, bitstream B. The following simplified diagram shows the encoding process [11]:


Figure 3.5: Simplified block diagram of the encoding process of secondary SP frames.

The decoding process, however, is quite different from that of primary SP frames.

The scheme is as follows [10]:

Figure 3.6: Block diagram of the decoding process of secondary SP frames.

Comparing this diagram with the block scheme of primary SP frames (figure 3.4), we can see two important differences. In primary SP frames, the predicted block P is transformed (cpred) and added to the inverse-quantized error coefficients, and then the result of the addition is quantized.

In secondary SP frames, the predicted block is transformed and quantized before the addition, giving lpred. This result is then added to the received error coefficients without inverse quantization. Finally, the result of the addition is inverse quantized and inverse transformed.
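The following toy Python model sketches why a secondary SP frame reproduces the primary one exactly. It is self-contained and uses the same simplifications throughout: no transform, and hypothetical uniform step sizes standing in for PQP and SPQP. SP Bn is decoded inside bitstream B; SP ABn is then encoded from a different prediction (from A'n-1) by sending the difference in the quantized (level) domain. Because the decoder adds levels and performs no requantization, the result matches SP Bn exactly.

```python
import numpy as np

P_STEP, SP_STEP = 4, 8  # illustrative step sizes standing in for PQP and SPQP


def quant(c, step):
    """Uniform quantizer, rounding half away from zero: coefficients -> levels."""
    c = np.asarray(c, dtype=float) / step
    return (np.sign(c) * np.floor(np.abs(c) + 0.5)).astype(int)


def dequant(levels, step):
    """Inverse quantizer: levels -> coefficients."""
    return np.asarray(levels) * step


def primary_sp_levels(c_pred, l_err):
    """Primary SP decoding (bitstream B): SPQP levels of the reconstructed block."""
    return quant(np.asarray(c_pred) + dequant(l_err, P_STEP), SP_STEP)


def encode_secondary_sp(target_levels, c_pred_other):
    """Secondary SP encoding: level-domain difference from the other stream's prediction."""
    return np.asarray(target_levels) - quant(c_pred_other, SP_STEP)


def decode_secondary_sp(l_err, c_pred_other):
    """Secondary SP decoding: quantize the prediction (lpred), add levels, dequantize."""
    l_pred = quant(c_pred_other, SP_STEP)
    return dequant(l_pred + l_err, SP_STEP)


# SP Bn as decoded inside bitstream B (prediction from B'n-1).
levels_b = primary_sp_levels(c_pred=[30, -13, 5, 0], l_err=[1, -2, 0, 2])
recon_b = dequant(levels_b, SP_STEP)

# SP ABn: the decoder only has A'n-1, so its prediction differs.
c_pred_a = [27, -9, 11, -6]
l_err_ab = encode_secondary_sp(levels_b, c_pred_a)
recon_ab = decode_secondary_sp(l_err_ab, c_pred_a)

print(recon_b.tolist(), recon_ab.tolist())  # [32, -24, 8, 8] [32, -24, 8, 8]
```

The price of this exact match is that the level differences sent for SP ABn are usually larger than a normal prediction error, which is why secondary SP frames are only transmitted when a switch actually happens.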

3.4. How does an SP frame work?

The best way to understand how an SP frame works is through an example.

First, figure 3.7 shows the temporal view of the bitstream switching we want to perform.


Figure 3.7: Temporal sequence of switching on the right. The two bitrates on the left.

The streaming server holds two encoded sequences, one at a high bitrate, 128 Kbps, and another at a low bitrate, 64 Kbps, both with an SP rate of 7.

To switch without SP frames we would have to wait until the next I frame, which can be several seconds later. With SP frames we only have to wait for the next SP frame, which will usually arrive well before the next I frame.

Due to the channel conditions we cannot keep using the 128 Kbps bitrate, so at time t1 we have to switch to the lower bitrate.

We send the coded sequence of the high-bitrate stream up to a primary SP frame (number one in figure 3.8). This primary SP frame, SP An, refers to the previously encoded P frame. If there is no switching, SP An is the reference for the next P frame of the 128 Kbps stream. When switching from 128 Kbps to 64 Kbps, however, a secondary SP frame, SP ABn, is generated (number two in figure 3.8) and sent in place of the primary SP frame SP An.

Figure 3.8: Bitstream switching process, steps 1 and 2.

The secondary SP frame is the reference for the next P frame of the 64 Kbps stream (number three in figure 3.9). To switch back from 64 Kbps to 128 Kbps, we proceed the same way in the opposite direction (number four in figure 3.9): we wait until the next SP frame, SP Bn, at which point the secondary frame SP BAn is generated to switch.

Figure 3.9: Bitstream switching process, steps 3 and 4.

In the decoding process, when we decode a P frame that follows a switching point, it does not matter which type of SP frame precedes it: primary and secondary SP frames reconstruct the same picture. Knowing which one was used only tells us whether a switch has taken place.
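The switching sequence of this example can be sketched in Python. The sketch assumes that the SP rate is the spacing, in frames, between consecutive SP frames and that both streams carry their primary SP frames at the same positions; for brevity only the initial I frame is modelled. Function names and frame labels are hypothetical.

```python
def next_multiple(n: int, period: int) -> int:
    """Smallest multiple of period that is >= n."""
    return -(-n // period) * period


def frame_label(stream: str, n: int, sp_period: int) -> str:
    """Label frame n of a stream: I at frame 0, SP every sp_period frames, P otherwise."""
    if n == 0:
        return f"{stream}:I0"
    return f"{stream}:{'SP' if n % sp_period == 0 else 'P'}{n}"


def switch_at_sp_frame(request: int, sp_period: int, num_frames: int) -> list:
    """Frames sent when switching from stream A to stream B at the next SP position.

    At the switching position the secondary frame SP ABn replaces SP An.
    """
    switch = max(next_multiple(request, sp_period), sp_period)
    sent = []
    for n in range(num_frames):
        if n < switch:
            sent.append(frame_label("A", n, sp_period))
        elif n == switch:
            sent.append(f"AB:SP{n}")  # secondary SP frame
        else:
            sent.append(frame_label("B", n, sp_period))
    return sent


# SP frames every 7 frames, switch requested at frame 3 (time t1).
print(switch_at_sp_frame(request=3, sp_period=7, num_frames=10))
# Waiting time for a request at frame 10: next SP position vs next I frame (GOP 50).
print(next_multiple(10, 7) - 10, next_multiple(10, 50) - 10)  # 4 40
```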

3.5. Simulations and results of bitstream switching

This section analyses the impact of SP frames on the size and quality of the frames in the video sequences. Specifically, we examine the size of the frames as a function of the SP rate and the QP.

For these simulations, the GOP size is fixed at 50 and the complete Foreman sequence (400 frames) is used.

Figure 3.10 shows what happens with the I frames: the graphs represent the size of the I frames for two fixed QPs, 23 and 37, versus the SP rate.

Figure 3.10: The size of I frames versus SP rate, for two different QP, 23 and 37.

As the graphs show, there is no relation between the SP rate and the number of bits of the I frames: it does not change when the SP rate is modified. Exactly the same happens with the quality (figure 3.11).

Figure 3.11: The quality of I frames versus SP rate, for two different QP, 23 and 37.

For P frames the results are quite different (figure 3.12).

Figure 3.12: The size of P frames versus SP rate, for two different QP, 23 and 37.

By contrast, there is a relation between the SP rate and the size of the P frames: the bigger the SP rate, the lower the number of bits of the P frames.

From the quality of the P frames (figure 3.13), we can conclude that SP frames degrade the quality of the bitstreams.


Figure 3.13: The quality of P frames versus SP rate, for two different QP, 23 and 37.

Figure 3.14 shows the analysis of the SP frames.

Figure 3.14: The size of SP frames versus SP rate, for two different QP, 23 and 37.

The size of the SP frames does not depend on the SP rate value. Naturally, since the size does not depend on the SP rate, neither does the quality.

Finally, let us look at the graph comparing the size of the three frame types with the SP rate.

As we know, for a given QP the size in bits of SP frames is smaller than that of I frames but bigger than that of P frames (figure 3.15).

So, when sizes are compared for a given QP, SP frames lie in the middle, between the sizes of I frames and P frames. However, if the reference is the SP rate, the smallest frames are the SP frames: for a given QP, with the size of the frames depending only on the SP rate, the SP frames are the smallest.

Figure 3.15: Graphic with the size of the three frame types.

3.6. Comparison of I switching and SP switching

In this part of the chapter we compare the results obtained with SP switching and with I switching.

We want to switch between a high-quality and a low-quality sequence. The high-quality sequence corresponds to a video stream with a QP of 28, while the low-quality sequence corresponds to a QP of 49.

As a first analysis, figure 3.16 shows the quality over time.

On the one hand, I switching reaches a higher quality than SP switching: both quality values, the high and the low one, are higher with I switching than with SP switching. We can therefore conclude that SP switching degrades the quality of the video stream.

In both cases, the difference between the high and the low quality is between 15 and 16 dB.

On the other hand, I switching produces higher bitrate peaks than SP switching. With SP frames the bitrate is less variable, and the difference between the peak values is lower. When the switching takes place, the bitrate reached by the I frame is about 3.25 Kbps higher than that reached by the SP frame (figure 3.17).

Both graphs show only the first 100 frames, because the behaviour is periodic.


Figure 3.16: Comparison of the qualities between I and SP switching.

Figure 3.17: Comparison of the bitrates between I and SP switching.


SI frames 29

CHAPTER 4. SI FRAMES

SI frames allow an exact match with an SP slice for random access or error recovery purposes, while using only Intra prediction.

SI frames share the instant-refresh property of I frames, but they are only sent after a frame is lost.

In some applications, such as video stream switching, SI frames are used together with SP frames, but they also bring advantages in other fields.

An SI frame uses intra prediction, like an I frame, and still reconstructs exactly the same frame as the corresponding SP frame, which uses motion-compensated prediction.

4.1. Advantages and disadvantages of using SI frames

The main advantage of SI frames is that they do not need any reference frame, because they use intra prediction. This makes them very useful in applications such as:

• Error recovery: if an error occurs, the following stream can be reconstructed without any reference frame.

• Random access: playback can restart from any SP frame position, where an SI frame is then generated.

Another advantage is that they only have to be sent when the client requests them with a warning message.

As a disadvantage, they are much bigger than any other frame type, even bigger than I frames. This makes them less useful than SP frames in a switching scenario, because SP frames achieve the same result with fewer bits.

4.2. Encoding and decoding process of SI frames

This section describes the encoding and decoding processes for SI frames.

Secondary SP frames predict the block P by motion compensation, using as the previously reconstructed frame the frame of the origin bitstream (bitstream A). To encode an SI frame, the prediction is instead obtained by intra prediction, and it is subtracted from the original image of frame B [11].


Figure 4.1: Simplified block diagram of the encoding process of SI frames.

The decoder scheme is shown below [10]:

Figure 4.2: Block diagram of the decoding process of secondary SP frames.

By intra prediction, a predicted block P is obtained, transformed and quantized, giving lpred. lpred is added to the received error coefficients, lerr. The result of the addition, lrec, is dequantized with SPQP as the quantization parameter and finally inverse transformed.

4.3. How does an SI frame work?

As with SP frames, the best way to understand how an SI frame works is through an example.

There is one video stream to be sent. The streaming server sends an encoded sequence to the client over an RTP (Real-time Transport Protocol) stream.

Now suppose that an erroneous frame is transmitted. With a standard sequence, we would have to wait until the next I frame, which could be several seconds later.


Figure 4.3: Sending the video stream.

Figure 4.4: What happens without SI frames when an error appears.

With SI frames, however, when an error is detected, the client sends the server an RTCP message with a warning.

Figure 4.5: SI frames: the client finds the error and sends a warning.

Once the server receives the RTCP (RTP Control Protocol) message, it only has to wait until the next SP frame. Then, instead of that SP frame, the server sends an SI frame synchronized with it, so that correct frames are transmitted for the rest of the sequence.


Figure 4.6: SI frames: an SI frame is generated.

Even with SI frames, as we have just seen, some erroneous pictures still reach the client. However, their number can be much lower than with the standard sequence, because SP frames can be inserted more frequently than I frames. The final sequence transmitted from the server to the client is therefore the following:

Figure 4.7: SI frames: the error is corrected.
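A simple count of the erroneous frames, with and without SI frames, can be sketched as follows. The model is an illustration under several assumptions: the RTCP feedback delay is expressed in frames (`feedback_delay`, a hypothetical parameter), SP positions are regularly spaced, and an error lasts until the frame that refreshes the decoder (an I frame, or an SI frame sent in place of an SP frame).

```python
def next_multiple(n: int, period: int) -> int:
    """Smallest multiple of period that is >= n."""
    return -(-n // period) * period


def erroneous_frames(error_frame, gop_size, sp_period=None, feedback_delay=0):
    """Frames decoded with errors after a loss at error_frame.

    Without SI frames (sp_period is None) the error lasts until the next I frame.
    With SI frames, the client's RTCP warning reaches the server feedback_delay
    frames later, and the server sends an SI frame at the next SP position after
    that; whichever refresh comes first ends the error.
    """
    next_i = next_multiple(error_frame + 1, gop_size)
    if sp_period is None:
        return next_i - error_frame
    next_si = next_multiple(error_frame + feedback_delay + 1, sp_period)
    return min(next_i, next_si) - error_frame


# Loss at frame 12, GOP size 50, SP frames every 7 frames, 3 frames of feedback delay.
print(erroneous_frames(12, 50))                                 # 38
print(erroneous_frames(12, 50, sp_period=7, feedback_delay=3))  # 9
```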

4.4. Simulation and results

We have simulated a sequence with SI frames. As with SP frames, we analyse how the size and the quality of the frames depend on the SI rate.

For these simulations, the GOP size is fixed at 50 and the complete Foreman sequence (400 frames) is used.

We start with the I frames. The graphs below show the size of the I frames for two fixed QPs, 23 and 38, versus the SI rate.


Figure 4.8: I frame size versus SI rate for two different QP, 23 and 38.

As with the SP rate, the size of the I frames does not depend on the SI rate. There is no connection between them; as we saw for the I and P sequence, the size depends on the frame that happens to be selected for I encoding.

Looking at the quality, we reach exactly the same conclusion:

Figure 4.9: I frame quality versus SI rate for two different QP, 23 and 38.

The results for P frames are quite similar to those obtained for the SP rate.

There is a relation between the SI rate and the size of the P frames: the bigger the SI rate, the lower the number of bits of the P frames.

As with SP frames, the size of the SI frames does not depend on the rate at which they are introduced in the encoding process.


Figure 4.10: P frame size versus SI rate for two different QP, 23 and 38

Finally, the following graph compares the sizes of these three frame types:

Figure 4.11: Comparison graphic of the size of the three frame types.

As already mentioned in the section on disadvantages, SI frames are even bigger than I frames. This graph clearly shows the difference between the frame sizes.

Simulation Scenario 35

CHAPTER 5. SIMULATION SCENARIO

The usual UMTS user profiles are 64, 128 and 384 Kbps, but the bandwidth available for the actual multimedia transmission is 44, 105 and 360 Kbps, because of the packet headers and the audio.

The bitrate is a function of the SP or SI rate and of the QP, so we ran several simulations trying to reach these target bitrates. For the sequence with only I and P frames, the bitrate is a function of the GOP size and the QP.

The method consists of finding the optimal SP rate/QP pair. For us, the optimal pair is the one with the best quality whose bitrate does not exceed the target bitrate, which acts as a threshold.

We carried out this study for the three types of sequences analysed in this work:

• I & P frames sequences.

• I & P & SP frames sequences.

• I & P & SI frames sequences.

In all the simulations the GOP size is 50, except for the I and P frame sequence, where the GOP size is the variable.

All the simulation results of this section are given in Appendix B.

5.1. 44 Kbps

In this section we find, for each type of sequence, the best pairs to achieve the target bitrate of 44 Kbps.

Let us start with the I and P frame sequence. We reached this bitrate with different QP-GOP pairs: we fixed the QP and then changed the GOP size until we obtained a suitable bitrate. The complete Foreman sequence (400 frames) was used.

The best QP-GOP pair is the one with the best quality that does not exceed the 44 Kbps threshold. In this case, the optimal pair is QP 37 with GOP 40, since its bitrate does not exceed 44 Kbps. The quality of this pair is 29.91 dB.

The following table lists the pairs that approximately achieve the target bitrate:


QP   GOP   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
37   40    43.99            29.91         586472
37   41    44.06            29.91         587408
38   20    44.30            29.34         590688
38   21    44.04            29.30         587264
38   22    43.11            29.30         574840
39   14    43.80            28.81         584000
39   15    42.77            28.79         570216
40   10    44.38            28.25         591688
40   11    42.78            28.23         570399
41   8     43.30            27.65         577304

Table 5.1: Results for 44 Kbps by I&P sequence.
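The selection rule can be reproduced from Table 5.1 with a short Python sketch. The frame rate is not stated in this section; 30 frames/s is assumed here because size x 30 / 400 reproduces the tabulated bitrates to within rounding. Ties in quality are broken by the smaller size, as is done later for the 105 Kbps case. The helper names are hypothetical.

```python
# Table 5.1 (I & P sequence): (QP, GOP, bitrate [Kbps], Y-PSNR [dB], size [bits])
TABLE_5_1 = [
    (37, 40, 43.99, 29.91, 586472),
    (37, 41, 44.06, 29.91, 587408),
    (38, 20, 44.30, 29.34, 590688),
    (38, 21, 44.04, 29.30, 587264),
    (38, 22, 43.11, 29.30, 574840),
    (39, 14, 43.80, 28.81, 584000),
    (39, 15, 42.77, 28.79, 570216),
    (40, 10, 44.38, 28.25, 591688),
    (40, 11, 42.78, 28.23, 570399),
    (41, 8, 43.30, 27.65, 577304),
]
FRAMES = 400  # complete Foreman sequence
FPS = 30      # assumed frame rate (consistent with the tabulated bitrates)


def bitrate_kbps(size_bits: int, frames: int = FRAMES, fps: int = FPS) -> float:
    """Average bitrate in Kbps of a sequence of size_bits bits spanning frames frames."""
    return size_bits * fps / frames / 1000


def best_pair(rows, limit_kbps: float):
    """Row with the highest Y-PSNR whose bitrate does not exceed limit_kbps.

    Ties in Y-PSNR are broken in favour of the smaller size in bits.
    """
    feasible = [r for r in rows if r[2] <= limit_kbps]
    if not feasible:
        return None
    return max(feasible, key=lambda r: (r[3], -r[4]))


print(best_pair(TABLE_5_1, 44.0))      # (37, 40, 43.99, 29.91, 586472)
print(round(bitrate_kbps(586472), 2))  # 43.99
```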

The video stream is shown below:

Figure 5.1: Videostream visualization of the pair selected, QP 37 and GOP 40.

For the I, P and SP frame sequence the method is different. We fixed the GOP size at 50 and then varied the QP value and the SP rate until we reached 44 Kbps. Again, the complete Foreman sequence (400 frames) was used.

The following SP rate-QP pairs achieve the bitrate of 44 Kbps:

QP   SP rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
37   3         45.73            29.34         609704
37   5         44.35            29.56         591360
37   6         43.95            29.61         585976
37   7         43.78            29.64         583712
37   10        43.46            29.73         580800
37   15        43.14            29.79         575184

Table 5.2: Results for 44 Kbps by I&P&SP sequence.

As with the I and P sequence, we now choose the best pair: its bitrate must not exceed 44 Kbps and it must achieve the best quality.

From the table, the best pair is SP rate 15 with QP 37, which gives a quality of 29.79 dB. It has the lowest bitrate of the pairs, yet the best quality. The visualization is shown in figure 5.2.

Finally, we look at the results for the sequence with SI frames. As with the previous sequence, we fixed the GOP size and then changed the QP value until we obtained a suitable bitrate.


Figure 5.2: Videostream visualization of the pair selected, SP rate of 15 and QP 37.

The following table lists the QP-SI rate pairs that achieve the target bitrate:

QP   SI rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
38   40        44.80            29.15         597344
38   41        44.54            29.17         593840
38   43        44.28            29.18         590360
38   44        44.37            29.17         591560
38   45        43.60            29.17         581272

Table 5.3: Results for 44 Kbps by I&P&SI sequence.

From this table we select the pair with the best quality that does not exceed the 44 Kbps limit. The selected pair is SI rate 45 with QP 38: it is the only pair below the limit, and its bitrate is very close to the reference value. Its quality is 29.17 dB.

Although SI frames add a large number of bits to the sequence, since they are much bigger than SP frames, the quality is almost the same, as the visualization of the video stream shows:

Figure 5.3: Videostream visualization of the pair selected, SI rate 45 and QP 38.

5.2. 105 Kbps

As with the previous bitrate, 44 Kbps, we now look for the best results to achieve 105 Kbps.

First, for the sequence formed only by I and P frames, we obtained the QP-GOP pairs in Table 5.4 to achieve the target bitrate.

The best pairs are those with a QP of 30 and a GOP size of 68 or 69, which give the same quality. We choose the one with fewer bits, so the selected pair has a GOP size of 68.

We can see the visualization of the results in figure 5.4.

Next, the sequence formed by I, P and SP frames. The pairs obtained in the simulations that reach the bitrate of 105 Kbps are also given in a table.


QP   GOP   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
30   60    104.88           34.15         1398432
30   68    104.19           34.16         1389240
30   69    104.65           34.16         1395630
30   70    104.78           34.15         1393024
30   80    102.42           34.13         1365568
30   100   100.7            34.10         1342632
32   12    104.7            33.12         1396032
34   5     115.19           32.05         1535816
35   5     100.83           31.36         1344448

Table 5.4: Results for 105 Kbps by I&P sequence.


Figure 5.4: Videostream visualization of the pair selected, QP 30 and GOP 68.

As with the other bitrate, the simulated bitrate must be less than or equal to 105 Kbps, not higher. The selected pair is SP rate 8 and QP 31, which achieves a quality of 33.36 dB. It also happens to have the lowest bitrate, 94.61 Kbps, but the selection criterion is the highest possible PSNR among the pairs within the limit.

QP   SP rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
30   40        106.20           34.13         1415960
30   43        106.28           34.13         1417080
30   46        106.03           34.13         1413752
30   49        106.07           34.13         1414288
31   3         98.04            33.00         1307176
31   4         96.86            33.12         1291488
31   6         95.62            33.30         1274992
31   7         95.47            33.34         1272952
31   8         94.61            33.36         1261512


The visualization of the video stream for these values is shown below:


Figure 5.5: Videostream visualization of the pair selected, SP rate 8 and QP 31.

Finally, the sequence formed by I, P and SI frames. The table with the best pairs is the following:

QP   SI rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
32   20        106.06           32.80         1414144
32   21        103.37           32.79         1378296
32   22        104.20           32.80         1389272
32   23        102.94           32.80         1372584
32   24        101.18           32.81         1349032


Just by looking at the table, we can tell which pair best achieves the target bitrate of 105 Kbps: QP 32 and SI rate 24, with a quality of 32.81 dB. Its bitrate, 101.18 Kbps, is the lowest in the table, yet it gives the best quality.

At this bitrate, the quality is lower than that of the I & P & SP sequence. The SP pairs simulated around 105 Kbps range between 33.00 and 34.13 dB, whereas the SI pairs stay at about 32.8 dB. In some cases, therefore, the difference reaches about 1.3 dB.

However, the difference is not so evident when we visualize the video stream:

Figure 5.6: Videostream visualization of the pair selected, QP 32 and SI rate 24.

5.3. 360 Kbps

As the last analysis, we look for the best pairs that achieve a bitrate of 360 Kbps.

Table 5.7 lists the results obtained in the different simulations of the I and P frame sequence targeting 360 Kbps.

To find the optimal result, we must take into account that 360 Kbps is the maximum bitrate: any pair with a higher bitrate is not valid.

Once the valid bitrates are identified, we look at the quality. Five pairs do not exceed the bitrate limit, and among them the two with QP 23 (GOP 19 and GOP 20) give the highest quality.


QP   GOP   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
23   18    360.88           39.21         4811760
23   19    355.68           39.21         4742400
23   20    355.55           39.29         4740720
24   8     360.93           38.53         4812352
24   9     348.66           38.5          4648792
24   10    341.13           38.49         4548448
25   5     366.00           38.04         4879960
25   6     348.72           38.00         4649640


The selected pair is QP 23 and GOP 20, which gives the highest quality, 39.29 dB.

Its visualization is the following:

Figure 5.7: Videostream visualization of the pair selected, QP of 23 and a GOP value of 20.

Now, the sequence formed by I, P and SP frames. The following table lists the pairs that achieve this bitrate:

QP   SP rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
23   3         335.55           38.53         4473944
23   25        358.86           38.89         4784824
23   30        358.98           38.9          4786400
23   35        357.38           38.91         4772112
23   40        357.53           38.92         4767024
23   45        357.32           38.93         4764264
23   49        325.86           39.11         4344832


We need a maximum bitrate of 360 Kbps. Among the SP rate-QP pairs that do not exceed it, we choose the one with the highest quality. The selected pair is SP rate 49 and QP 23, with a quality of 39.11 dB.

With this bitrate we can achieve a very good quality of the video stream:


Figure 5.8: Videostream visualization of the pair selected, SP rate 49 and QP 23.

Finally, the results of the sequence with I, P and SI frames. At this bitrate we proceed as with the others; the table is the following:

QP   SI rate   Bitrate [Kbps]   Y-PSNR [dB]   Size [bits]
23   20        381.52           39.04         5086896
23   34        349.15           39.10         4655376
23   35        357.88           39.09         4771768
23   36        356.66           39.08         4759440
23   38        353.74           39.09         4719136
23   40        352.28           39.09         4697096
23   49        348.13           39.11         4641752


The selected pair is QP 23 and SI rate 49, which gives a quality of 39.11 dB. This is the same result as the selected pair of the I, P and SP sequence.

So the visualization has the same quality:

Figure 5.9: Videostream visualization of the pair selected, QP 23 and SI rate 49.

5.4. Comparison of results

Comparing the results of the three types of sequences shows that, for the same bitrate, introducing any frame type other than the usual I and P frames decreases the quality.

This is an expected conclusion: as discussed in Chapter 3, a sequence without SP frames achieves better quality than one with SP frames. The same happens with SI frames, as discussed in Chapter 4.



CHAPTER 6. SWITCHING SIMULATIONS

In this chapter we describe the results of three simulations: a High Quality simulation, a Low Quality simulation, and a switching simulation between high and low quality.

The three simulations are performed on the whole Foreman sequence (400 frames), with a GOP of 50 and an SP rate of 8.

6.1. High and Low Quality Simulation

This section analyzes the results of the high and low quality simulations.

For the High Quality simulation, the QP of the I and P frames is set to 28 and the QP of the SP frames to 26. For the Low Quality simulation, the QP of the I and P frames is set to 51 and the QP of the SP frames to 49. This large difference between the QPs is introduced only to make the degradation of the video stream with the QP value more evident.

As expected, in the high quality simulation the SP frames need fewer bits than the I frames but more than the P frames:

I size: 26069
P size: 4757.3
SP size: 7987.3

Table 6.1: Frame sizes for a high quality simulation.

For the Low Quality simulation the results are different: the SP frames are smaller than any other frame type:

I size: 1994
P size: 246.7331
SP size: 137.3

Table 6.2: Frame sizes for a low quality simulation.

Because of this unexpected behavior, we ran a simulation with an intermediate QP of 35 and an SP QP of 33:

I size: 12089
P size: 1657.1
SP size: 2227.9


As we can see, the results are similar to those of the high quality simulation: the SP frames have more bits than the P frames but fewer than the I frames.


To confirm our results, we ran another simulation with a QP of 50 for I and P frames and an SP QP of 48. The result is the same as in the first low quality simulation: the SP frames are smaller than the P frames:

I size: 2253
P size: 303.5322
SP size: 170.2857


From these results we can deduce that there must be some QP at which P and SP frames have the same size. To locate this crossing point, we plotted the sizes for several QP pairs, from PQP = 39 and SPQP = 37 to PQP = 50 and SPQP = 48.

Figure 6.1: Comparison graphic of P and SP frames.

The crossing point is therefore pair number 4, which corresponds to PQP 42 and SPQP 40.

Pair number   PQP   SPQP
2             40    38
4             42    40
6             44    42
8             46    44
10            48    46
12            50    48

Table 6.5: Pair number definition.
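The search for the crossing can be sketched in C using only the mean sizes reported above for PQP 28, 35, 50 and 51. The sketch assumes that the SP QP is always the P QP minus 2, as in every simulation of this section. It only brackets the crossing between two measured QPs, here 35 and 50. The exact crossing at pair 4 comes from the denser points of Figure 6.1, which the text does not reproduce. The type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Mean frame sizes (bits) measured at one P-frame QP; in these
 * simulations the SP QP is always the P QP minus 2. */
struct size_point {
    int p_qp;
    double p_bits;
    double sp_bits;
};

/* Pair numbering of Table 6.5: pair n uses PQP = 38 + n, SPQP = PQP - 2. */
int pair_pqp(int n)  { return 38 + n; }
int pair_spqp(int n) { return pair_pqp(n) - 2; }

/* Points must be sorted by increasing p_qp. Returns the index i of the
 * first pair of neighbours (i, i + 1) where the sign of SP - P changes,
 * so the P/SP crossing lies between their QPs; returns -1 if none. */
int bracket_crossing(const struct size_point *pt, size_t n)
{
    assert(pt != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        int sp_bigger_here = pt[i].sp_bits > pt[i].p_bits;
        int sp_bigger_next = pt[i + 1].sp_bits > pt[i + 1].p_bits;
        if (sp_bigger_here != sp_bigger_next)
            return (int)i;
    }
    return -1;
}
```

The pair helpers reproduce Table 6.5. For example, pair_pqp(4) is 42 and pair_spqp(4) is 40.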


6.1.1. Graphics

Next, the quality of the first GOP of each simulation, high and low, is represented.

As the graphs below show, the quality of the SP frames is lower than that of the other frame types:

Figure 6.2: Visualization on High and Low quality.

Next, each simulation is compared with a simulation with the same characteristics but without SP frames, for both high and low quality.

Figure 6.3: Quality comparison between sequences with and without SP frames, for high and low quality.

Now, let us review the visualization of these sequences. If we visualize the first frame of each simulation, it is easy to tell which one corresponds to high quality and which one to low quality.

Figure 6.4: Visualization of the frames, 1st frame of the streaming.

When we visualize the 8th frame, where the SP frame appears because of the SP rate, we observe the following:

Figure 6.5: Visualization of the frames, 8th frame of the streaming.

While watching the video stream, we cannot tell when an SP frame appears: it makes no visible difference whether a frame is an SP, a P or an I frame. The whole video stream is perceived with the same quality.

6.2. Switching simulation

In this part we analyze a simulation with the switching flag activated. In this simulation the SP rate and the switching rate are synchronized, both set to 8. It makes no sense to simulate without synchronizing these two rates, because the switching has to take place at SP frames. Otherwise, within a GOP of 50 frames, the switching would fall on P frames, which would lead to noticeable impairments.
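A simplified frame-type model makes this argument concrete. The sketch assumes an I frame every GOP frames, starting at frame 0. Every other frame at a multiple of the SP rate is an SP frame, and all remaining frames are P frames. This periodic rule is an illustrative assumption, not a copy of the encoder's SetImgType, and frame_type_at and count_switches_on_p are hypothetical names.

```c
#include <assert.h>

enum frame_type { FRAME_I, FRAME_P, FRAME_SP };

/* Simplified periodic model: an I frame every gop frames (frame 0
 * included), otherwise an SP frame every sp_rate frames, otherwise a
 * P frame. sp_rate == 0 means no SP frames. */
enum frame_type frame_type_at(int n, int gop, int sp_rate)
{
    assert(gop > 0 && sp_rate >= 0);
    if (n % gop == 0)
        return FRAME_I;
    if (sp_rate > 0 && n % sp_rate == 0)
        return FRAME_SP;
    return FRAME_P;
}

/* Counts the switching points (every switch_period frames, starting at
 * frame switch_period) that would fall on a P frame. */
int count_switches_on_p(int total, int gop, int sp_rate, int switch_period)
{
    assert(switch_period > 0);
    int on_p = 0;
    for (int n = switch_period; n < total; n += switch_period)
        if (frame_type_at(n, gop, sp_rate) == FRAME_P)
            on_p++;
    return on_p;
}
```

Take 400 frames, a GOP of 50 and SP rate 8. Switching every 8 frames never lands on a P frame under this model: frame 200 is an I frame and the other switching points are SP frames. Switching every 10 frames, in contrast, lands on a P frame 24 times.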

First, let us look at the mean number of bits per frame type, starting with the synchronized simulation.

I size: 10644
P size: 2461.1
SP size: 21905

Table 6.6: Frame sizes for a switching simulation.

In this case, the SP frames are bigger than the I frames. The reason is that the reported SP size is the sum of the primary and secondary SP frames. In the text visualization of the sequence it is impossible to distinguish between the two types of SP frames:


Figure 6.6: Text visualization of the sequence.

Now, let us look at the quality graph:

Figure 6.7: Quality of switching simulation.

In the synchronized switching, as expected, whenever an SP frame appears in the sequence, the quality of the video stream changes from low to high or, conversely, from high to low. The quality changes take place at the SP frames because the SP rate is synchronized with the switching period: as mentioned, both occur every 8th frame.

In the graph, each of these frames appears one position later than in the actual sequence, because the simulation numbers the first frame as zero while the graph numbers it as one.


Next, we compare the switching simulation with the results of the high and low quality simulations without SP frames.

Up to the first SP frame, the low quality simulation and the switching simulation, which starts in low quality, have the same quality values. From that point on, the quality of the switching simulation is lower than that of the original simulations without SP frames. The graph is the following:

Figure 6.8: Comparison of the Qualities.

Now, let us look at some frames of the video stream.

Figure 6.9: Frames of the switching video streaming.

The picture marked as 1st corresponds to the first frame of the sequence, the IDR frame. It is not clear because we are in low quality. In the next picture, the 8th, which corresponds to an SP frame, the quality is already high. The last picture, marked as 16th, corresponds to the 16th frame of the sequence and the 2nd SP frame, and it is in low quality.


CHAPTER 7. CODE IMPROVEMENTS

The purpose of modifying the encoder code is to let the switching happen at defined points, rather than once per switching period.

Our intention is to specify, through a text file, the numbers of the frames at which we want to switch, without any periodicity of the SP frames. The text file contains the positions of the switching frames.

7.1. Original code

In the original code, the SP rate is set by the variable SPPicturePeriodicity.

Figure 7.1: encoder.cfg: SP rate variable.

The variable that controls the change from one quality to the other is ChangeQPStart.

Figure 7.2: encoder.cfg: change QP variable.

With this variable we set the frame at which we want to change the Quantization Parameter. The variables ChangeQPI and ChangeQPP hold the new QP values.

For example, if SPPicturePeriodicity is 8, an SP frame is introduced every 8 frames. If ChangeQPStart is 10, the QP of the frames changes every 10 frames, and with it the quality of the sequence.

Both variables are initialized in encoder.cfg.

As already seen in Section 6.2, Switching simulation, it is important that SPPicturePeriodicity and ChangeQPStart be synchronized.

In the source code, the main function that sets the frame type is the following:


Figure 7.3: Encoder code: SetImgType function.

Finally, let’s see the text file of the simulation:

Figure 7.4: Text file of a switching simulation.

We can see that an SP frame is introduced every 8 frames and that the switching takes place there.

7.2. Modified code

As already said in the introduction of this chapter, our purpose is to achieve the switching without enabling SPPicturePeriodicity and ChangeQPStart.


We focused on the SetImgType function, introduced in the Original code section.

To make it easy to introduce the positions of the switching points without touching the code, we use a text file.

Figure 7.5: Example of a file with switching point.

We changed the definition of the SetImgType function. In the original code the function takes no input parameter, whereas our version is SetImgType(int aux), where aux is the variable used to read each row of the text file. Every time a switching point matches a frame position, an SP frame is introduced and aux is incremented by one (if (img->type == SP_SLICE) aux++;).

The first part of the function is exactly the same as the original one.

Then, we open the text file and read it. While the file is read, the switching points are saved in the local vector frame[400]. The vector has 400 entries because that is the maximum number of frames we can code (the Foreman sequence has 400 frames).

If the frame number matches the value stored in the frame vector, we introduce an SP frame and enable the secondary SP frame indicator. Then, if qp2start is 0, we set input->qp2start to the current frame number, so the sequence changes from the first quality to the second one. After this, we disable the secondary SP frame indicator.

The third part of the function is only used when we want to change from the second quality back to the first one.
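The logic described above can be sketched as a standalone C fragment. It is not the code of Figure 7.7: it reads the file once rather than inside SetImgType. The names load_switch_points, is_switch_frame and MAX_FRAMES are hypothetical. The sketch assumes that the file lists the switching frame numbers in increasing order, separated by whitespace (for example, one per line).

```c
#include <assert.h>
#include <stdio.h>

#define MAX_FRAMES 400   /* Foreman has 400 frames, hence frame[400] */

/* Reads whitespace-separated frame numbers (e.g. one per line) from fp
 * into frame[], up to max entries; returns how many were read. */
int load_switch_points(FILE *fp, int frame[], int max)
{
    assert(fp != NULL);
    int n = 0;
    while (n < max && fscanf(fp, "%d", &frame[n]) == 1)
        n++;
    return n;
}

/* Returns 1 if frame img_number is the next switching point, i.e. it
 * must be coded as an SP frame, and 0 otherwise. *aux is the index of
 * the next unused switching point and is advanced on a match, in the
 * spirit of the aux++ of the modified SetImgType. The points are
 * assumed to be sorted in increasing order. */
int is_switch_frame(int img_number, const int frame[], int count, int *aux)
{
    if (*aux < count && frame[*aux] == img_number) {
        (*aux)++;
        return 1;
    }
    return 0;
}
```

With a file containing 3, 5, 7, 14 and 25, is_switch_frame returns 1 at exactly those five frames.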

The code of this function is shown in figure 7.7 at the end of this section.


Now, let’s see the text file of the simulation.

Figure 7.6: Text file of the switching simulation with the modified code.

As defined in the text file of switching points, we change from one quality to the other at frames 3, 5, 7, 14 and 25. Hence there is no primary SP period: SP frames are introduced only where we want to switch.

The bits per picture obtained with the original code and with the modified code are almost the same.


Figure 7.7: Function SetImgType modified.



CHAPTER 8. CONCLUSIONS

In this work we have studied the SP and SI frames introduced by the Extended profile of the H.264 standard. The purpose was to learn the behavior of these two frame types and to modify the existing code in order to make it more efficient.

As a main conclusion, the size of every frame type depends on the QP. The size of SP frames also depends on their own rate: the larger the SP rate, the fewer bits the SP frames need. The same happens with SI frames and the SI rate. The other two frame types, I and P, depend on neither of these rates.

We can conclude that SP frames are better than I frames in a switching scenario, because SP switching gives a more constant bitrate than I switching, which introduces very high peaks.

Comparing SP and SI frames, SP frames are better for switching applications because they are smaller than SI frames. SI frames, on the other hand, are better for random access, fast forward and rewind, because they are coded with intra prediction and therefore need no reference frame.

In the High and Low quality simulations without switching, the insertion of SP frames in the sequences reduces the overall quality of the video stream.

In the switching simulations, it is better to coordinate the SP rate and the switching rate in order to prevent unexpected behavior. It is also much more effective to switch at an SP frame than at any other frame, because the behavior of the stream is then predictable.

After discussing all the results, we modified the code so that it works with the basic streaming sequence, formed by I and P frames, and introduces SP frames only when they are needed.

Finally, to profit from these two new frame types we have to take into account where and when we are going to use them. More importantly, we must decide whether the aim is to improve quality or to reduce the number of bits while maintaining the same quality.


BIBLIOGRAPHY 57

BIBLIOGRAPHY

[1] Klaus von Klitzing. The quantised hall effect. RMP, 58:519, 1986.

[2] Iain E.G. Rchardson. H.264 / mpeg-4 part 10 white paper - h.264 overview.www.vcodex.com, page 1, 2002.

[3] Thomas Wiegand, Gary Sulliivan, and Ajay Luthra. Draft itu-t recommendation andfinal draft international standard of joint video specification (itu-t rec. h.264 — iso/iec14496-10 avc). Joint Video Team, 2002.

[4] Detlev Marpe, Thomas Wiegand, and Gary Sullivan. The h.264/mpeg4 advancedvideo coding standard and its applications. IEEE Communications Magazine, page134, 2002.

[5] J Ribas-Corbera, P.A. Chou, and S. Regunathan. A generalized hypothetical refer-ence decoder for h.264/avc. IEEE Trans. Circuits Syst. Video Technol., 13:674, 2003.

[6] Thomas Wiegand, Sullivan Gary, Gisle Bjøntegaard, and Ajay Luthra. Overview ofthe h.264/avc video coding standard. IEEE Tansactions on Circuits and Systems forVideo Technology, 13:2, 2003.

[7] Gary Sullivan and Thomas Wiegand. Video compression - from concepts to theh.264/avc standard. ieeexplore.ieee.org, page 1, 2004.

[8] Ian E. G. Richardson. H.264 / mpeg-4 part 10 white paper: Prediction of inter mac-roblocks in p-slices. page 3, 2003.

[9] Martin Fiedler. Seminar paper: Implementation of a basic h.264/avc decoder. page 6,2004.

[10] Marta Karczewicz and Ragip Kurceren. The sp- and si-frames desing for h.264/avc.IEEE Transactions on circuits and systems for video technology, 13:8, 2003.

[11] Ian E. G. Richardson. H.264 and mpeg-4 video compression: Video coding for next-generation multimedia. page 306, 2003.

APPENDIXES


APPENDIX A.

A.1. Abbreviations

• ISO: International Organization for Standardization

• ITU-T: International Telecommunication Union - Telecommunication Standardization Sector

• JVT: Joint Video Team

• MPEG: Moving Picture Experts Group

• VCEG: Video Coding Experts Group

• AVC: Advanced Video Coding

• xDSL: Digital Subscriber Line

• UMTS: Universal Mobile Telecommunications System

• DVB-T: Digital Video Broadcasting - Terrestrial

• HDTV: High Definition TV

• ISDN: Integrated Services Digital Network

• QCIF: Quarter Common Intermediate Format

• NAL: Network Abstraction Layer

• VCL: Video Coding Layer

• FMO: Flexible Macroblock Ordering

• CAVLC: Context-Adaptive Variable Length Coding

• CABAC: Context-Adaptive Binary Arithmetic Coding

• GOP: Group Of Pictures

• QP: Quantization Parameter

• RTP: Real-time Transport Protocol

• RTCP: RTP Control Protocol

• PSNR: Peak Signal-to-Noise Ratio

A.2. Table of results for the simulation scenario

This section collects the results of all the simulations carried out for the Simulation Scenario chapter.

Figure A.1: Results for I & P sequences.

Figure A.2: Results for I & P & SP sequences.

Figure A.3: Results for I & P & SI sequences.


APPENDIX B.

This appendix includes the code of the encoder and the decoder of the standard.

B.1. Configuration file: encoder.cfg

# New Input File Format is as follows
# <ParameterName> = <ParameterValue> # Comment
#
# See configfile.h for a list of supported ParameterNames

#########################################################################
# Files
#########################################################################
InputFile = "foreman_QCIF_420.yuv" # Input sequence
InputHeaderLength = 0 # If the inputfile has a header, state its length in bytes here
StartFrame = 0 # Start frame for encoding. (0-N)
FramesToBeEncoded = 400 # Number of frames to be coded
FrameRate = 30.0 # Frame Rate per second (0.1-100.0)
SourceWidth = 176 # Frame width
SourceHeight = 144 # Frame height
TraceFile = "foreman_trace_enc.txt"
ReconFile = "foreman_test_rec.yuv"
OutputFile = "foreman_test.264"

#########################################################################
# Encoder Control
#########################################################################
ProfileIDC = 88 # Profile IDC (66=baseline, 77=main, 88=extended; FREXT Profiles: 100=High, 110=High 10, 122=High 4:2:2, 144=High 4:4:4, for params see below)
LevelIDC = 40 # Level IDC (e.g. 20 = level 2.0)

IntraPeriod = 51 # Period of I-Frames (0=only first)
EnableOpenGOP = 0 # Support for open GOPs (0: disabled, 1: enabled)
IDRIntraEnable = 0 # Force IDR Intra (0=disable 1=enable)
QPISlice = 28 # Quant. param for I Slices (0-51)
QPPSlice = 28 # Quant. param for P Slices (0-51)
FrameSkip = 0 # Number of frames to be skipped in input (e.g 2 will code every third frame)
ChromaQPOffset = 0 # Chroma QP offset (-51..51)
UseHadamard = 1 # Hadamard transform (0=not used, 1=used for all subpel positions, 2=use only for qpel)
DisableSubpelME = 0 # Disable Subpixel Motion Estimation (0=off/default, 1=on)
SearchRange = 16 # Max search range
NumberReferenceFrames = 5 # Number of previous frames used for inter motion search (1-16)
PList0References = 0 # P slice List 0 reference override (0 disable, N <= NumberReferenceFrames)

Log2MaxFNumMinus4 = 0 # Sets log2 max frame num minus4 (-1 : based on FramesToBeEncoded/Auto, >=0 : Log2MaxFNumMinus4)
Log2MaxPOCLsbMinus4 = -1 # Sets log2 max pic order cnt lsb minus4 (-1 : Auto, >=0 : Log2MaxPOCLsbMinus4)

GenerateMultiplePPS = 0 # Transmit multiple parameter sets. Currently parameters basically enable all WP modes (0: disabled, 1: enabled)
ResendPPS = 0 # Resend PPS (with pic parameter set id 0) for every coded Frame/Field pair (0: disabled, 1: enabled)

MbLineIntraUpdate = 0 # Error robustness (extra intra macro block updates) (0=off, N: One GOB every N frames are intra coded)
RandomIntraMBRefresh = 0 # Forced intra MBs per picture
InterSearch16x16 = 1 # Inter block search 16x16 (0=disable, 1=enable)
InterSearch16x8 = 1 # Inter block search 16x8 (0=disable, 1=enable)
InterSearch8x16 = 1 # Inter block search 8x16 (0=disable, 1=enable)
InterSearch8x8 = 1 # Inter block search 8x8 (0=disable, 1=enable)
InterSearch8x4 = 1 # Inter block search 8x4 (0=disable, 1=enable)
InterSearch4x8 = 1 # Inter block search 4x8 (0=disable, 1=enable)
InterSearch4x4 = 1 # Inter block search 4x4 (0=disable, 1=enable)

IntraDisableInterOnly = 0 # Apply Disabling Intra conditions only to Inter Slices (0:disable/default, 1: enable)
Intra4x4ParDisable = 0 # Disable Vertical & Horizontal 4x4
Intra4x4DiagDisable = 0 # Disable Diagonal 45degree 4x4
Intra4x4DirDisable = 0 # Disable Other Diagonal 4x4
Intra16x16ParDisable = 0 # Disable Vertical & Horizontal 16x16
Intra16x16PlaneDisable = 0 # Disable Planar 16x16
ChromaIntraDisable = 0 # Disable Intra Chroma modes other than DC
EnableIPCM = 1 # Enable IPCM macroblock mode

DisposableP = 0 # Enable Disposable P slices in the primary layer (0: disable/default, 1: enable)
DispPQPOffset = 0 # Quantizer offset for disposable P slices (0: default)

#########################################################################
# B Slices
#########################################################################

NumberBFrames = 0 # Number of B coded frames inserted (0=not used)
QPBSlice = 35 # Quant. param for B slices (0-51)
BRefPicQPOffset = 0 # Quantization offset for reference B coded pictures (-51..51)
DirectModeType = 1 # Direct Mode Type (0:Temporal 1:Spatial)
DirectInferenceFlag = 1 # Direct Inference Flag (0: Disable 1: Enable)

BList0References = 0 # B slice List 0 reference override (0 disable, N <= NumberReferenceFrames)
BList1References = 1 # B slice List 1 reference override (0 disable, N <= NumberReferenceFrames)
# 1 List1 reference is usually recommended for normal GOP Structures.
# A larger value is usually more appropriate if a more flexible
# structure is used (i.e. using HierarchicalCoding)

BReferencePictures = 0 # Referenced B coded pictures (0=off, 1=on)

HierarchicalCoding = 0 # B hierarchical coding (0= off, 1= 2 layers, 2= 2 full hierarchy, 3 = explicit)
HierarchyLevelQPEnable = 1 # Adjust QP based on hierarchy level (in increments of 1). Overrides BRefPicQPOffset behavior. (0=off, 1=on)
ExplicitHierarchyFormat = "b2r28b0e30b1e30b3e30b4e30" # Explicit Enhancement GOP. Format is FrameDisplay_orderReferenceQP.
# Valid values for reference type is r:reference, e:non reference.
ReferenceReorder = 1 # Reorder References according to Poc distance for HierarchicalCoding (0=off, 1=enable)
PocMemoryManagement = 1 # Memory management based on Poc Distances for HierarchicalCoding (0=off, 1=on)

BiPredMotionEstimation = 0 # Enable Bipredictive based Motion Estimation (0:disabled, 1:enabled)
BiPredMERefinements = 3 # Bipredictive ME extra refinements (0: single, N: N extra refinements (1 default)
BiPredMESearchRange = 16 # Bipredictive ME Search range (8 default). Note that range is halved for every extra refinement.
BiPredMESubPel = 1 # Bipredictive ME Subpixel Consideration (0: disabled, 1: single level, 2: dual level)

#########################################################################
# SP Frames
#########################################################################

SPPicturePeriodicity = 8 # SP-Picture Periodicity (0=not used)
QPSPSlice = 26 # Quant. param of SP-Slices for Prediction Error (0-51)
QPSP2Slice = 25 # Quant. param of SP-Slices for Predicted Blocks (0-51)
SI_FRAMES = 0 # SI frame encoding flag (0=not used, 1=used)
SP_output = 1 # Controls whether coefficients will be output to encode switching SP frames (0=no, 1=yes)
SP_output_name = "low_quality.dat" # Filename for SP output coefficients
SP2_FRAMES = 0 # switching SP frame encoding flag (0=not used, 1=used)
SP2_input_name1 = "high_quality.dat" # Filename for the first switched bitstream coefficients
SP2_input_name2 = "low_quality.dat" # Filename for the second switched bitstream coefficients

#########################################################################
# Output Control, NALs
#########################################################################

SymbolMode = 0 # Symbol mode (Entropy coding method: 0=UVLC, 1=CABAC)
OutFileMode = 1 # Output file mode, 0:Annex B, 1:RTP
PartitionMode = 0 # Partition Mode, 0: no DP, 1: 3 Partitions per Slice

#########################################################################
# CABAC context initialization
#########################################################################

ContextInitMethod = 1 # Context init (0: fixed, 1: adaptive)
FixedModelNumber = 0 # model number for fixed decision for inter slices ( 0, 1, or 2 )

#########################################################################
# Interlace Handling
#########################################################################

PicInterlace = 0 # Picture AFF (0: frame coding, 1: field coding, 2:adaptive frame/field coding)
MbInterlace = 0 # Macroblock AFF (0: frame coding, 1: field coding, 2:adaptive frame/field coding)
IntraBottom = 0 # Force Intra Bottom at GOP Period

#########################################################################
# Weighted Prediction
#########################################################################

WeightedPrediction = 0 # P picture Weighted Prediction (0=off, 1=explicit mode)
WeightedBiprediction = 0 # B picture Weighted Prediction (0=off, 1=explicit mode, 2=implicit mode)
UseWeightedReferenceME = 0 # Use weighted reference for ME (0=off, 1=on)

#########################################################################
# Picture based Multi-pass encoding
#########################################################################

RDPictureDecision = 0 # Perform RD optimal decision between different coded picture versions.
# If GenerateMultiplePPS is enabled then this will test different WP methods.
# Otherwise it will test QP +-1 (0: disabled, 1: enabled)
RDPictureIntra = 0 # Perform RD optimal decision also for intra coded pictures (0: disabled (default), 1: enabled).
RDPSliceWeightOnly = 1 # Only consider Weighted Prediction for P slices in Picture RD decision. (0: disabled, 1: enabled (default))
RDBSliceWeightOnly = 0 # Only consider Weighted Prediction for B slices in Picture RD decision. (0: disabled (default), 1: enabled )

#########################################################################
# Loop filter parameters
#########################################################################

LoopFilterParametersFlag = 0 # Configure loop filter (0=parameter below ignored, 1=parameters sent)
LoopFilterDisable = 0 # Disable loop filter in slice header (0=Filter, 1=No Filter)
LoopFilterAlphaC0Offset = 0 # Alpha & C0 offset div. 2, -6, -5, ... 0, +1, .. +6
LoopFilterBetaOffset = 0 # Beta offset div. 2, -6, -5, ... 0, +1, .. +6

#########################################################################
# Error Resilience / Slices
#########################################################################

SliceMode = 2 # Slice mode (0=off 1=fixed #mb in slice 2=fixed #bytes in slice 3=use callback)
SliceArgument = 700 # Slice argument (Arguments to modes 1 and 2 above)

num_slice_groups_minus1 = 0 # Number of Slice Groups Minus 1, 0 == no FMO, 1 == two slice groups, etc.
slice_group_map_type = 0 # 0: Interleave, 1: Dispersed, 2: Foreground with left-over,
# 3: Box-out, 4: Raster Scan 5: Wipe
# 6: Explicit, slice_group_id read from SliceGroupConfigFileName
slice_group_change_direction_flag = 0 # 0: box-out clockwise, raster scan or wipe right,
# 1: box-out counter clockwise, reverse raster scan or wipe left
slice_group_change_rate_minus1 = 85 #
SliceGroupConfigFileName = "sg0conf.cfg" # Used for slice_group_map_type 0, 2, 6

UseRedundantPicture = 0 # 0: not used, 1: enabled
NumRedundantHierarchy = 1 # 0-4
PrimaryGOPLength = 10 # GOP length for redundant allocation (1-16)
# NumberReferenceFrames must be no less than PrimaryGOPLength when redundant slice enabled
NumRefPrimary = 1 # Actually used number of references for primary slices (1-16)

#########################################################################
# Search Range Restriction / RD Optimization
#########################################################################

RestrictSearchRange = 2 # restriction for (0: blocks and ref, 1: ref, 2: no restrictions)
RDOptimization = 1 # rd-optimized mode decision
# 0: RD-off (Low complexity mode)
# 1: RD-on (High complexity mode)
# 2: RD-on (Fast high complexity mode - not work in FREX Profiles)
# 3: with losses
DisableThresholding = 0 # Disable Thresholding of Transform Coefficients (0:off, 1:on)
DisableBSkipRDO = 0 # Disable B Skip Mode consideration from RDO Mode decision (0:off, 1:on)
SkipIntraInInterSlices = 0 # Skips Intra mode checking in inter slices if certain mode decisions are satisfied (0: off, 1: on)

# Explicit Lambda Usage
UseExplicitLambdaParams = 0 # Use explicit lambda scaling parameters (0:disabled, 1:enabled)
LambdaWeightIslice = 0.65 # scaling param for I slices. This will be used as a multiplier i.e. lambda=LambdaWeightISlice * 2^((QP-12)/3)
LambdaWeightPslice = 0.68 # scaling param for P slices. This will be used as a multiplier i.e. lambda=LambdaWeightPSlice * 2^((QP-12)/3)
LambdaWeightBslice = 2.00 # scaling param for B slices. This will be used as a multiplier i.e. lambda=LambdaWeightBSlice * 2^((QP-12)/3)
LambdaWeightRefBslice = 1.50 # scaling param for Referenced B slices. This will be used as a multiplier i.e. lambda=LambdaWeightRefBSlice * 2^((QP-12)/3)
LambdaWeightSPslice = 1.50 # scaling param for SP slices. This will be used as a multiplier i.e. lambda=LambdaWeightSPSlice * 2^((QP-12)/3)
LambdaWeightSIslice = 0.65 # scaling param for SI slices. This will be used as a multiplier i.e. lambda=LambdaWeightSISlice * 2^((QP-12)/3)

LossRateA = 5 # expected packet loss rate of the channel for the first partition, only valid if RDOptimization = 2
LossRateB = 0 # expected packet loss rate of the channel for the second partition, only valid if RDOptimization = 2
LossRateC = 0 # expected packet loss rate of the channel for the third partition, only valid if RDOptimization = 2
NumberOfDecoders = 30 # Numbers of decoders used to simulate the channel, only valid if RDOptimization = 2
RestrictRefFrames = 1 # Doesn't allow reference to areas that have been intra updated in a later frame.

########################################################################
# Additional Stuff
########################################################################

UseConstrainedIntraPred = 0 # If 1, Inter pixels are not used for Intra macroblock prediction.
LastFrameNumber = 0 # Last frame number that has to be coded (0: no effect)

ChangeQPI = 28 # QP (I-slices) for second part of sequence (0-51)
ChangeQPP = 28 # QP (P-slices) for second part of sequence (0-51)
ChangeQPB = 28 # QP (B-slices) for second part of sequence (0-51)
ChangeQPBSRefOffset = 0 # QP offset (stored B-slices) for second part of sequence (-51..51)
ChangeQPStart = 8 # Frame no. for second part of sequence (0: no second part)
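The ChangeQP* entries split the sequence into two parts with different QPs. A minimal sketch of that selection follows, assuming the second part starts at the frame whose index equals ChangeQPStart and that frames are counted from 0; the first-part QPs (presumably the QPISlice/QPPSlice entries earlier in encoder.cfg) are not shown in this listing, so the values in `BASE_QP` are hypothetical, as is the function name `frame_qp`.

```python
# Hypothetical first-part QPs; the real ones are set earlier in encoder.cfg.
BASE_QP = {"I": 30, "P": 31}
CHANGE_QP = {"I": 28, "P": 28}   # ChangeQPI, ChangeQPP
CHANGE_QP_START = 8              # ChangeQPStart


def frame_qp(frame_no, slice_type, base_qp=BASE_QP, change_qp=CHANGE_QP,
             change_start=CHANGE_QP_START):
    """Return the QP for a frame.

    Assumption: the second-part QP applies from frame `change_start` onwards,
    and change_start == 0 disables the second part entirely.
    """
    if change_start > 0 and frame_no >= change_start:
        return change_qp[slice_type]
    return base_qp[slice_type]


if __name__ == "__main__":
    for n in range(6, 10):
        print(n, frame_qp(n, "P"))
```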

NumberofLeakyBuckets = 8 # Number of Leaky Bucket values
LeakyBucketRateFile = "leakybucketrate.cfg" # File from which encoder derives rate values
LeakyBucketParamFile = "leakybucketparam.cfg" # File where encoder stores leakybucketparams
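Each leaky bucket is described by a rate R, a bucket (buffer) size B and an initial fullness F. The following is a minimal sketch of how such an (R, B, F) triple constrains a stream, under a simplified decoder-buffer model that is an assumption and not the JM implementation: the buffer starts with F bits, gains R / fps bits per frame interval, each frame is removed instantly at its decoding time, and the channel pauses while the buffer is full, so only underflow counts as a violation. The function name `leaky_bucket_ok` and the frame-size trace are hypothetical.

```python
def leaky_bucket_ok(frame_bits, rate_bps, bucket_bits, initial_bits, fps):
    """Check a frame-size trace (bits per frame) against a leaky bucket (R, B, F).

    Simplified model: the buffer starts at F, each frame is removed instantly,
    rate_bps / fps bits arrive per interval, and arrival stops while the
    buffer is full (so overflow is not an error, underflow is).
    """
    if initial_bits > bucket_bits:
        raise ValueError("initial fullness F cannot exceed bucket size B")
    fullness = initial_bits
    bits_per_interval = rate_bps / fps
    for bits in frame_bits:
        if bits > fullness:          # frame not completely received in time
            return False
        fullness -= bits             # decoder removes the frame
        fullness = min(fullness + bits_per_interval, bucket_bits)  # refill, capped at B
    return True


if __name__ == "__main__":
    # Hypothetical trace: one large I frame followed by small P frames
    trace = [40000] + [3000] * 29
    print(leaky_bucket_ok(trace, rate_bps=100000, bucket_bits=60000,
                          initial_bits=45000, fps=30))
```

In this model a larger F lets the stream start with a bigger intra frame, while B bounds how much the buffer can store ahead of time.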

NumberFramesInEnhancementLayerSubSequence = 0 # number of frames in the Enhanced Scalability Layer (0: no Enhanced Layer)
NumberOfFrameInSecondIGOP = 0 # Number of frames to be coded in the second IGOP

SparePictureOption = 0 # (0: no spare picture info, 1: spare picture available)
SparePictureDetectionThr = 6 # Threshold for spare reference pictures detection
SparePicturePercentageThr = 92 # Threshold for the spare macroblock percentage

PicOrderCntType = 0 # (0: POC mode 0, 1: POC mode 1, 2: POC mode 2)

########################################################################
# Rate control
########################################################################

RateControlEnable = 0 # 0 Disable, 1 Enable
Bitrate = 45020 # Bitrate(bps)
InitialQP = 0 # Initial Quantization Parameter for the first I frame
              # InitialQp depends on two values: Bits Per Picture,
              # and the GOP length
BasicUnit = 11 # Number of MBs in the basic unit
               # should be a factor of the total number
               # of MBs in a frame
ChannelType = 0 # type of channel( 1=time varying channel; 0=Constant channel)
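The BasicUnit comment requires the basic unit to divide the number of macroblocks in a frame. A minimal sketch of that check follows, assuming progressive frames whose dimensions are multiples of 16 (16x16 macroblocks); the helper names are hypothetical. For the QCIF Foreman sequence (176x144) a frame has 11 x 9 = 99 macroblocks, so BasicUnit = 11 corresponds to one macroblock row and gives 9 basic units per frame.

```python
def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks in a progressive frame."""
    if width % 16 or height % 16:
        raise ValueError("frame dimensions are assumed to be multiples of 16")
    return (width // 16) * (height // 16)


def valid_basic_units(width: int, height: int) -> list:
    """All BasicUnit sizes that divide the macroblock count of a frame."""
    total = macroblocks_per_frame(width, height)
    return [n for n in range(1, total + 1) if total % n == 0]


if __name__ == "__main__":
    width, height = 176, 144  # QCIF
    total = macroblocks_per_frame(width, height)
    print(total, "MBs per frame; BasicUnit candidates:", valid_basic_units(width, height))
    print("BasicUnit = 11 gives", total // 11, "basic units per frame")
```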

########################################################################
# Fast Mode Decision
########################################################################

EarlySkipEnable = 0 # Early skip detection (0: Disable 1: Enable)
SelectiveIntraEnable = 0 # Selective Intra mode decision (0: Disable 1: Enable)

########################################################################
# FREXT stuff
########################################################################

YUVFormat = 1 # YUV format (0=4:0:0, 1=4:2:0, 2=4:2:2, 3=4:4:4)
RGBInput = 0 # 1=RGB input, 0=GBR or YUV input
BitDepthLuma = 8 # Bit Depth for Luminance (8...12 bits)
BitDepthChroma = 8 # Bit Depth for Chrominance (8...12 bits)
CbQPOffset = 0 # Chroma QP offset for Cb-part (-51..51)
CrQPOffset = 0 # Chroma QP offset for Cr-part (-51..51)
Transform8x8Mode = 0 # (0: only 4x4 transform, 1: allow using 8x8 transform additionally, 2: only 8x8 transform)
ResidueTransformFlag = 0 # (0: no residue color transform 1: apply residue color transform)
ReportFrameStats = 0 # (0:Disable Frame Statistics 1: Enable)
DisplayEncParams = 0 # (0:Disable Display of Encoder Params 1: Enable)
Verbose = 1 # level of display verboseness (0:short, 1:normal, 2:detailed)
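YUVFormat and the two bit depths determine how many bytes one raw input frame occupies. The following is a minimal sketch, assuming planar Y, U, V storage and that samples deeper than 8 bits take 2 bytes each; the function name `frame_size_bytes` is hypothetical. For the QCIF 4:2:0, 8-bit input used here, one frame is 176 x 144 + 2 x 88 x 72 = 38016 bytes.

```python
# Chroma subsampling divisors (horizontal, vertical) per YUVFormat value.
CHROMA_SUBSAMPLING = {1: (2, 2), 2: (2, 1), 3: (1, 1)}  # 0 = 4:0:0, no chroma


def frame_size_bytes(width, height, yuv_format=1, bit_depth_luma=8, bit_depth_chroma=8):
    """Bytes of one raw planar frame (Y plane, then U and V planes)."""
    luma = width * height
    if yuv_format == 0:
        chroma = 0
    else:
        sub_w, sub_h = CHROMA_SUBSAMPLING[yuv_format]
        chroma = 2 * (width // sub_w) * (height // sub_h)  # U and V planes
    # Assumption: samples deeper than 8 bits are stored in 2 bytes each.
    bytes_y = 1 if bit_depth_luma <= 8 else 2
    bytes_c = 1 if bit_depth_chroma <= 8 else 2
    return luma * bytes_y + chroma * bytes_c


if __name__ == "__main__":
    for fmt in range(4):
        print("YUVFormat", fmt, "->", frame_size_bytes(176, 144, fmt), "bytes per QCIF frame")
```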

########################################################################
# Q-Matrix (FREXT)
########################################################################

QmatrixFile = "q_matrix.cfg"

ScalingMatrixPresentFlag = 0 # Enable Q Matrix (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag0 = 3 # Intra4x4 Luma (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag1 = 3 # Intra4x4 ChromaU (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag2 = 3 # Intra4x4 chromaV (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag3 = 3 # Inter4x4 Luma (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag4 = 3 # Inter4x4 ChromaU (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag5 = 3 # Inter4x4 ChromaV (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag6 = 3 # Intra8x8 Luma (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
ScalingListPresentFlag7 = 3 # Inter8x8 Luma (0 Not present, 1 Present in SPS, 2 Present in PPS, 3 Present in both SPS & PPS)
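The 0-3 presence codes above can be read as a two-bit mask: bit 0 means "present in the SPS" and bit 1 means "present in the PPS", so 3 means both. The following is a minimal sketch of that reading, which follows directly from the listed meanings; the names `SCALING_LISTS` and `parameter_sets` are hypothetical. The per-list flags presumably only matter when ScalingMatrixPresentFlag is non-zero, which is not the case in this configuration.

```python
SCALING_LISTS = [
    "Intra4x4 Luma", "Intra4x4 ChromaU", "Intra4x4 ChromaV",
    "Inter4x4 Luma", "Inter4x4 ChromaU", "Inter4x4 ChromaV",
    "Intra8x8 Luma", "Inter8x8 Luma",
]


def parameter_sets(flag: int) -> set:
    """Parameter sets signalled by a 0-3 presence code (bit 0: SPS, bit 1: PPS)."""
    if flag not in (0, 1, 2, 3):
        raise ValueError("presence code must be 0, 1, 2 or 3")
    sets = set()
    if flag & 1:
        sets.add("SPS")
    if flag & 2:
        sets.add("PPS")
    return sets


if __name__ == "__main__":
    list_flags = [3] * 8  # ScalingListPresentFlag0..7 as configured above
    for name, flag in zip(SCALING_LISTS, list_flags):
        print(f"{name}: {sorted(parameter_sets(flag))}")
```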

########################################################################
# Rounding Offset control
########################################################################

OffsetMatrixPresentFlag = 0 # Enable Explicit Offset Quantization Matrices (0: disable 1: enable)
QOffsetMatrixFile = "q_offset.cfg" # Explicit Quantization Matrices file

AdaptiveRounding = 0 # Enable Adaptive Rounding based on JVT-N011 (0: disable, 1: enable)
AdaptRndPeriod = 1 # Period in terms of MBs for updating rounding offsets.
                   # 0 performs update at the picture level. Default is 16. 1 is as in JVT-N011.
AdaptRndChroma = 0 # Enables coefficient rounding adaptation for chroma

AdaptRndWFactorIRef = 4 # Adaptive Rounding Weight for I/SI slices in reference pictures /4096
AdaptRndWFactorPRef = 4 # Adaptive Rounding Weight for P/SP slices in reference pictures /4096
AdaptRndWFactorBRef = 4 # Adaptive Rounding Weight for B slices in reference pictures /4096
AdaptRndWFactorINRef = 4 # Adaptive Rounding Weight for I/SI slices in non reference pictures /4096
AdaptRndWFactorPNRef = 4 # Adaptive Rounding Weight for P/SP slices in non reference pictures /4096
AdaptRndWFactorBNRef = 4 # Adaptive Rounding Weight for B slices in non reference pictures /4096

########################################################################
# Lossless Coding (FREXT)
########################################################################

QPPrimeYZeroTransformBypassFlag = 0 # Enable lossless coding when qpprime y is zero(0 Disabled, 1 Enabled)

########################################################################
# Fast Motion Estimation Control Parameters
########################################################################

UseFME = 0 # Use fast motion estimation (0=disable/default, 1=UMHexagonS,
           # 2=Simplified UMHexagonS, 3=EPZS patterns)
FMEDSR = 1 # Use Search Range Prediction. Only for UMHexagonS method
           # (0:disable, 1:enabled/default)
FMEScale = 3 # Use Scale factor for different image sizes. Only for UMHexagonS method
             # (0:disable, 3:/default)
             # Increasing value can speed up Motion Search.

EPZSPattern = 2 # Select EPZS primary refinement pattern.
                # (0: small diamond, 1: square, 2: extended diamond/default,
                # 3: large diamond)
EPZSDualRefinement = 3 # Enables secondary refinement pattern.
                       # (0:disabled, 1: small diamond, 2: square,
                       # 3: extended diamond/default, 4: large diamond)
EPZSFixedPredictors = 2 # Enables Window based predictors
                        # (0:disabled, 1: P only, 2: P and B/default)
EPZSTemporal = 1 # Enables temporal predictors
                 # (0: disabled, 1: enabled/default)
EPZSSpatialMem = 1 # Enables spatial memory predictors
                   # (0: disabled, 1: enabled/default)
EPZSMinThresScale = 0 # Scaler for EPZS minimum threshold (0 default).
                      # Increasing value can speed up encoding.
EPZSMedThresScale = 1 # Scaler for EPZS median threshold (1 default).
                      # Increasing value can speed up encoding.
EPZSMaxThresScale = 1 # Scaler for EPZS maximum threshold (1 default).
                      # Increasing value can speed up encoding.
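Every entry in encoder.cfg has the form `Name = Value # comment`, and lines made only of `#` characters are separators. The following is a minimal sketch of a reader for that simple form, useful for checking a configuration outside the encoder; it is an assumption-based illustration, not the JM parser (which accepts more syntax), and the function names are hypothetical. It also assumes that `#` never occurs inside a quoted value.

```python
def _convert(value: str):
    """Turn numeric strings into int or float; leave everything else as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_encoder_cfg(text: str) -> dict:
    """Parse simple 'Name = Value # comment' lines into a dictionary."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and separator rows
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"malformed line: {raw!r}")
        key, value = (part.strip() for part in line.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # quoted file names such as "q_matrix.cfg"
        else:
            value = _convert(value)
        params[key] = value
    return params


if __name__ == "__main__":
    sample = 'RDOptimization = 1 # rd-optimized mode decision\nQmatrixFile = "q_matrix.cfg"\n'
    print(parse_encoder_cfg(sample))
```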

B.2. Configuration file: decoder.cfg

foreman_test.264 ........H.26L coded bitstream
foreman_test2_dec.yuv ........Output file, YUV/RGB
foreman_QCIF_420.yuv ........Ref sequence (for SNR)
1 ........Write 4:2:0 chroma components for monochrome streams
1 ........NAL mode (0=Annex B, 1: RTP packets)
0 ........SNR computation offset
2 ........Poc Scale (1 or 2)
500000 ........Rate Decoder
104000 ........B decoder
73000 ........F decoder
leakybucketparam.cfg ........LeakyBucket Params
0 ........Err Concealment(0:Off,1:Frame Copy,2:Motion Copy)
2 ........Reference POC gap (2: IPP (Default), 4: IbP / IpP)
2 ........POC gap (2: IPP /IbP/IpP (Default), 4: IPP with frame skip = 1 etc.)

This is a file containing input parameters to the JVT H.264/AVC decoder. The text following each parameter value on the same line is discarded by the decoder.
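Unlike encoder.cfg, decoder.cfg is positional: each line holds one value and the text after it is ignored. The following is a minimal sketch of that reading, assuming the value is the first whitespace-delimited token on each non-empty line (which is why file names in this file cannot contain spaces); the field names in `DECODER_FIELDS` are hypothetical labels chosen to follow the order of the listing.

```python
# Hypothetical labels, in the order the values appear in decoder.cfg.
DECODER_FIELDS = [
    "bitstream", "output_yuv", "reference_yuv", "write_uv", "nal_mode",
    "snr_offset", "poc_scale", "rate_decoder", "b_decoder", "f_decoder",
    "leaky_bucket_params", "err_concealment", "ref_poc_gap", "poc_gap",
]


def parse_decoder_cfg(text: str) -> dict:
    """Map the first token of each non-empty line to its positional field."""
    values = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:                      # assumption: blank lines are skipped
            values.append(tokens[0])    # the rest of the line is discarded
    return dict(zip(DECODER_FIELDS, values))


if __name__ == "__main__":
    sample = "foreman_test.264 ........H.26L coded bitstream\n500000 ........Rate Decoder\n"
    print(parse_decoder_cfg(sample))
```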
