Objective of h.264 Content

http://access.feld.cvut.cz/view.php?cisloclanku=2013010001

Objective Video Quality Evaluation and H.264/SVC Content Streaming over WLANs

Published on 24. 01. 2013 (486 reads)

In this article, we study the H.264/SVC video delivery and its objective quality assessment with

respect to IEEE 802.11 networks, in the presence of background traffic. In particular, we

consider a scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another

client. We objectively evaluate the quality of the streamed video given background traffic with

varying bit rates, contents with different spatio-temporal information encoded at different

quantization parameter levels. All packets were given equal priority.

Keywords: SVC, WLAN, Video streaming, Background traffic, Objective video quality

Introduction

With the increasing proliferation of multimedia content over the Internet and the emergence of

handheld mobile devices like tablets, smartphones and laptops capable of streaming video content, wireless video communication has become more attractive than ever before, receiving

significant attention from both the industry and academia. Wireless video transmission

applications are easily deployed in homes, offices and transport vehicles.

Wireless Local Area Network (WLAN) technologies support applications such as video streaming, VoIP and many others, mainly because of their mobility, good throughput and low cost. Many WLAN variants are currently available, including IEEE 802.11a, IEEE 802.11b and IEEE 802.11g. The IEEE 802.11 a/b/g standards share the contention-based access mechanism of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Although this mechanism has become very common, it is considered inefficient for achieving reasonable video quality in scenarios with high background traffic, because it provides only best-effort service, which restricts QoS for critical multimedia applications. Wireless video communication faces many challenges. Delivery of real-time video over wireless networks imposes stringent requirements, especially in terms of bandwidth, delay constraints, and variations in latency and loss. As with other wireless technologies, channel impairments can affect the IEEE 802.11 physical transmission rate assigned to mobile users. The actual throughput achieved by a specific user can also vary, depending on the number of users and the nature of the applications sharing the same channel.


The Scalable Video Coding extension of H.264/MPEG-4 AVC (Advanced Video Coding)

facilitates efficient video transmissions, especially over wireless networks, allowing the encoding

of a video sequence and its streaming over heterogeneous networks to a variety of end devices. With H.264/SVC, different scalability techniques can be used in order to deliver the

most appropriate video bitstream based on network characteristics and mobile device

capabilities.

A multitude of studies [1-2] have been carried out on video transmission over loss-prone wireless channels. The authors of [3-4] have carried out research on SVC streaming over IEEE 802.11 networks. In this paper, the quality of H.264/SVC video transmitted over IEEE 802.11 networks in the presence of background traffic is studied. In particular, we consider a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video for background traffic of varying bit rates and for contents with different spatio-temporal information encoded at different quantization parameter levels. All packets were given equal priority. Results indicate that received video quality deteriorates with increasing background traffic and high content bit rate, given no packet differentiation at the MAC (Media Access Control) layer. Also, contents may be affected differently, depending on the scene complexity and coding efficiency.

H.264/SVC Encoding and Transmission

The latest H.264/MPEG-4 AVC standard provides a scalable extension, called H.264/SVC [5], making it the first international standard to define multi-dimensional scalability. H.264/SVC achieves significant compression efficiency and a reduction in processing complexity, as well as very good subjective quality ratings [6]. The H.264/SVC scheme is known to be valuable for video applications over the Internet and wireless networks, low-resolution video applications, multicast applications, delivery of a range of qualities suited to heterogeneous receiver capabilities, and resilience in scenarios with varying bandwidth [7]. The bit rate adaptability that is native to the scalable codec design allows content to be adapted to changes in network conditions. H.264 scalable video coding reuses the key features of H.264/MPEG-4 Advanced Video Coding and also employs other techniques to provide scalability extensions and to improve coding gain.


Fig. 1: Diagrammatic representation of SVC scalabilities

In general, SVC can provide three types of scalability, namely temporal, spatial and SNR (quality) scalability, allowing multiple representations of a video to be obtained by leaving out parts of the encoded bit stream, thereby adapting bit rate and quality levels during video transmission. A scalable bit stream is organized into a base layer and one or several enhancement layers. The base layer is considered more important than the enhancement layers. While the base layer needs less transmission bandwidth due to its coarser quality, the enhancement layers require more transmission bandwidth due to their finer quality. Consequently, SNR/spatial/temporal scalability achieves bandwidth scalability. Fig. 1 above shows a diagrammatic representation of the SVC scalabilities.

Spatial scalability refers to the possibility of representing the same video in different spatial resolutions or sizes (e.g. QCIF, CIF and 4CIF). Generally, spatially scalable video is encoded by using spatially up-sampled pictures from a lower layer as a prediction in a higher layer. Inter-layer prediction techniques are used to further improve the coding efficiency.

Temporal scalability refers to the possibility of representing the same video in different temporal resolutions or frame rates, i.e. the number of frames contained in one second of the video, allowing the video to be played at different frame rates. In H.264/SVC it is typically implemented with hierarchical prediction structures, in which pictures of a higher temporal layer are predicted only from pictures of the same or a lower temporal layer, so that the higher layers can be dropped without affecting the decoding of the lower ones.
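To make the idea of temporal layers concrete, the following minimal sketch assigns a temporal layer index to each frame, assuming a dyadic hierarchical prediction structure and a power-of-two GOP size such as the GOP size of 16 used later in Table 1. The dyadic structure, the function names and the default values are assumptions made for illustration; the paper itself streams a single layer only.

```python
def temporal_id(frame_index: int, gop_size: int = 16) -> int:
    """Temporal layer of a frame in a dyadic hierarchy (assumed structure)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = frame_index % gop_size
    if pos == 0:
        return 0                                # key pictures form layer 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frame_rate_up_to_layer(full_fps: float, gop_size: int, max_tid: int) -> float:
    """Frame rate obtained when only layers 0..max_tid are decoded."""
    levels = gop_size.bit_length() - 1
    return full_fps / (2 ** (levels - max_tid))


if __name__ == "__main__":
    print([temporal_id(i) for i in range(17)])
    for tid in range(5):
        print(tid, frame_rate_up_to_layer(30.0, 16, tid))
```

With a GOP size of 16 this yields five temporal layers; dropping the highest layer halves the frame rate each time.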

Quality scalability, also called signal-to-noise ratio (SNR) scalability, refers to the possibility of

representing the same video in different perceptual quality levels. SNR-scalable coding quantizes

the DCT coefficients to different levels of accuracy by using different quantization parameters.
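As a rough illustration of how the quantization parameter controls this accuracy: in H.264 the quantizer step size approximately doubles for every increase of 6 in QP. The sketch below uses the base step sizes for QP 0 to 5 as commonly tabulated in the H.264 literature; the function name `qstep` is chosen here for illustration and is not part of any codec API.

```python
# Base quantizer step sizes for QP 0..5, as commonly tabulated for H.264.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Approximate quantizer step size for an H.264 QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    # The step size doubles every 6 QP values.
    return _BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in (44, 38, 32, 26, 20, 15, 10):
        print(f"QP={qp:2d}  Qstep={qstep(qp):g}")
```

Under this approximation, the coarsest setting used later in the paper (QP = 44) quantizes with a step roughly 50 times larger than the finest one (QP = 10).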


Scalable Video Coding, as an extension of H.264/AVC, maintains the concepts of the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL acts as the interface between the encoder and the video frames, employing a block-based structure and supporting the different scalabilities, while the NAL acts as the interface between the encoder and the actual network protocol, formatting the coded video for transmission over packet networks and providing the necessary header information. A NAL unit (NALU) consists of a header and a payload, carrying the actual encoded video frame and its relevance in the decoding process [8]. The NALU header defines different parameters, including the dependency id (DID), describing the spatial scalability; the temporal id (TID), indicating the temporal scalability hierarchy; the quality id (QID), which defines the quality scalability structure; and the priority id (PID), which assigns priority to the stream. For more details, please consult [9].
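The header fields named above can be read with a few bit operations. The sketch below follows the bit layout of the one-byte NAL unit header and the three-byte SVC header extension as described in the SVC RTP payload format [8]; the class and function names are illustrative assumptions, and only NAL unit types 14 and 20, which carry the extension, are decoded further.

```python
from typing import NamedTuple, Optional


class SvcNalHeader(NamedTuple):
    nal_ref_idc: int
    nal_unit_type: int
    priority_id: Optional[int]    # PID
    dependency_id: Optional[int]  # DID
    quality_id: Optional[int]     # QID
    temporal_id: Optional[int]    # TID


def parse_nal_header(data: bytes) -> SvcNalHeader:
    """Parse the NAL header and, if present, the SVC header extension."""
    if len(data) < 1:
        raise ValueError("empty NAL unit")
    nal_ref_idc = (data[0] >> 5) & 0x03
    nal_unit_type = data[0] & 0x1F
    if nal_unit_type not in (14, 20):
        # No SVC extension: scalability ids are not signalled.
        return SvcNalHeader(nal_ref_idc, nal_unit_type, None, None, None, None)
    if len(data) < 4:
        raise ValueError("truncated SVC header extension")
    priority_id = data[1] & 0x3F           # lower 6 bits of extension byte 1
    dependency_id = (data[2] >> 4) & 0x07  # 3 bits after the N flag
    quality_id = data[2] & 0x0F            # lower 4 bits of extension byte 2
    temporal_id = (data[3] >> 5) & 0x07    # upper 3 bits of extension byte 3
    return SvcNalHeader(nal_ref_idc, nal_unit_type, priority_id,
                        dependency_id, quality_id, temporal_id)


if __name__ == "__main__":
    print(parse_nal_header(bytes([0x74, 0x85, 0x21, 0x67])))
```

A streaming server or router could use such fields to decide which NAL units to drop first, although in the experiment described in this paper all packets are treated equally.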

Implementations

In this section, we describe the implementation steps, starting with video sequence encoding,

simulation methodology and objective video quality evaluation. We consider a scenario where a

wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another client. We objectively evaluate

the quality of the streamed video given background traffic with varying bit rates, contents with

different spatio-temporal information encoded at different quantization parameter levels.

Test Sequences

Three sequences, each of 10 seconds duration, with different genres and characteristics covering

varying spatial and temporal complexity, namely, Foreman, News and Coastguard were selected

[10].

Fig. 2: Snapshots of the video sequences

The diagram above shows the frames of the three sequences: Foreman, News and Coastguard, in

that order.


Fig. 3: Spatial and temporal indicators of the three contents

Fig. 3 above shows the spatial information (SI) and temporal information (TI) indices computed on the luminance component of the contents, respectively: Foreman: 59.38, 20.57; News: 75.41, 23.52; and Coastguard: 76.43, 23.50. Spatial perceptual information (SI) and temporal perceptual information (TI), based on the Sobel filter as defined in ITU-T Rec. P.910 [11], were used to measure the complexity of the scenes, as given in Eqs. (1) and (2)

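$$\mathrm{SI} = \max_{\mathrm{time}} \left\{ \mathrm{std}_{\mathrm{space}} \left[ \mathrm{Sobel}(F_n) \right] \right\} \qquad (1)$$

$$\mathrm{TI} = \max_{\mathrm{time}} \left\{ \mathrm{std}_{\mathrm{space}} \left[ F_n(i,j) - F_{n-1}(i,j) \right] \right\} \qquad (2)$$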

where F_n represents the luminance plane of the video frame at time n. It is observed that Foreman has smaller SI and TI values than News and Coastguard. Detailed information regarding the three sequences and the encoder configuration is summarized in Table 1. The video sequences were sourced from publicly available video trace libraries, including [10].
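The SI and TI measures of ITU-T Rec. P.910 described above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions: frames are given as 2-D arrays of luminance samples, the Sobel filter is evaluated on interior pixels only, and the function names are chosen here rather than taken from any tool used by the authors.

```python
import numpy as np


def sobel_magnitude(y: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a luminance plane (interior pixels only)."""
    y = y.astype(np.float64)
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) \
        - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) \
        - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)


def si_ti(frames):
    """Return (SI, TI): maxima over time of the spatial standard deviations."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    si = max(float(np.std(sobel_magnitude(f))) for f in frames)
    diffs = [frames[n] - frames[n - 1] for n in range(1, len(frames))]
    ti = max((float(np.std(d)) for d in diffs), default=0.0)
    return si, ti


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = [rng.integers(0, 256, size=(288, 352)) for _ in range(3)]  # CIF-sized noise
    print(si_ti(clip))
```

Larger SI values indicate more spatial detail and larger TI values more motion, which is why these indices are used to characterise the test sequences.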


Encoding and Simulation

Fig. 4: Implementation methodology

TABLE 1: Encoder configurations

Parameter              Value
Input YUV files        Foreman, News, Coastguard
Resolution             CIF
Frame rate             30 fps
Number of frames       300
Number of layers       1
GOP size               16
Search range           32
Search mode            4
MGSControl             1
CgsnrRefinement        1
Base layer mode        0
Encode key pictures    1

The implementation methodology is shown in Fig. 4 above. The three YUV video files were first encoded with the JSVM reference software [12], according to the configurations summarized in Table 1. A set of different QP scenarios was designed to cover a wide range of quality levels. We encoded each video in 7 scenarios, in which the QP value of the base layer was set to 44, 38, 32, 26, 20, 15 and 10. The coding efficiency of H.264/SVC depends on the quantization parameters of each layer. Packet traces (Network Abstraction Layer Units) of the H.264 bit streams were generated using BitStreamExtractor.

Fig. 5: Simulation topology

TABLE 2: Wireless channel configurations

Parameter                 Value
MAC type                  802.11
Radio propagation         Propagation/TwoRayGround
Interface queue type      Queue/DropTail
Routing                   DSDV
Antenna model             Antenna/OmniAntenna
Data rate                 11 Mbit/s
Basic rate                1 Mbit/s
Number of mobile nodes    3
Interface queue length    50

The NALUs are then prepared for transmission over the IP network (hinting, packetization). The resulting H.264 video trace files are hinted using MP4Box [13], which emulates the streaming of the *.h264 video over the network based on the RTP/UDP/IP protocol stack. Large NALUs are thus split through IP-layer fragmentation. The Real-time Transport Protocol (RTP) is used for the transfer of real-time data such as streamed video, and runs on top of existing transport protocols such as UDP (User Datagram Protocol). RTP provides real-time applications with end-to-end delivery services and information such as sequence numbers, frame types, sizes of the video frames, the number of UDP packets used to transmit each frame, and timestamps (for packet loss and reordering detection, and end-to-end delay).
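To illustrate how sequence numbers and timestamps support loss detection, the following sketch parses the 12-byte fixed RTP header (layout as defined in RFC 3550) and lists the sequence numbers missing from a set of received packets. The function names are illustrative assumptions, and 16-bit sequence number wrap-around is deliberately ignored to keep the example short.

```python
import struct
from typing import Iterable, List, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(b0 >> 6, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc)


def missing_sequence_numbers(received: Iterable[int]) -> List[int]:
    """Sequence numbers absent between the lowest and highest received one."""
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


if __name__ == "__main__":
    pkt = struct.pack("!BBHII", 0x80, 0x80 | 96, 7, 90000, 0x1234) + b"payload"
    print(parse_rtp_header(pkt))
    print(missing_sequence_numbers([1, 2, 4, 7]))
```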

We conduct the simulations of H.264/SVC video transmission over IEEE 802.11 [14] using NS-2 [15]. The wireless channel configuration is summarized in Table 2. The simulated scenario consists of three wireless nodes, one multimedia server and two clients, all within reasonable transmission range. The multimedia server transmits H.264/SVC video and CBR traffic to Client 1, while Client 2 simultaneously receives FTP and CBR traffic from the server. Packet sizes were set to 1500 bytes. The network topology is depicted in Fig. 5. The background traffic generated at the server for the two clients while the video is being streamed increases the virtual collisions that occur at the server's MAC layer. All packets were assigned equal priority and scheduled from the same access point of the multimedia server. The experiment is designed to study the impact of competing background traffic with different sending rates on the streamed video quality. In order to overload the wireless channel, the CBR flows to the two clients are set to 0.1, 0.5 and 1 Mbit/s each, while streaming the video sequences of different contents and different encoding QP values.

Ten different initial seeds for random number generation were chosen for the simulations, and the results were averaged over these 10 runs. After each simulation, a received trace file is generated. The received and the original NALU trace files are then combined and processed to generate the received NALU trace. The maximum playout buffer delay at the video client is set to 5 seconds. After further processing, the received NALU trace is passed through BitStreamExtractor, which generates an H.264 video stream; this is in turn decoded with the JSVM H264Decoder to obtain an uncompressed YUV file. The reconstructed YUV file and the original one are compared using an objective video quality metric to compute the overall video quality.
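The combination of the sent and received traces with the playout deadline can be sketched as follows. This is a minimal illustration only: the trace representation (dictionaries mapping a NALU id to a timestamp in seconds) and the function name `usable_nalus` are assumptions made here, not the file format or tooling used in the experiment.

```python
from typing import Dict, List


def usable_nalus(sent: Dict[int, float],
                 received: Dict[int, float],
                 max_playout_delay_s: float = 5.0) -> List[int]:
    """Ids of NALUs that arrived, and arrived within the playout deadline."""
    usable = []
    for nalu_id, t_send in sorted(sent.items()):
        t_recv = received.get(nalu_id)
        if t_recv is None:
            continue  # lost in the network
        if t_recv - t_send <= max_playout_delay_s:
            usable.append(nalu_id)  # arrived in time for playout
        # otherwise: arrived too late and is treated as lost
    return usable


if __name__ == "__main__":
    sent = {0: 0.000, 1: 0.033, 2: 0.066}
    received = {0: 0.010, 2: 5.500}   # NALU 1 lost, NALU 2 too late
    print(usable_nalus(sent, received))
```

Only the NALUs returned by such a filter would be kept in the received NALU trace that is handed to the extractor and decoder.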

Objective Video Quality Evaluation

Objective video quality algorithms are based on mathematical models that predict image and multimedia quality by comparing a distorted signal against a reference, typically by modeling the human visual system. Some existing objective criteria are Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) and Visual Information Fidelity (VIF). In this experiment, PSNR [16] is adopted as our objective metric, because it is the most widely used one.

PSNR can be computed for both the luminance (Y-PSNR) and chrominance (U-PSNR and V-PSNR) components of the video. The human eye is considered more sensitive to luminance (brightness) than to chrominance (colour); therefore, the PSNR is usually evaluated only for the luminance (Y) component. The equation below gives the PSNR between the luminance component Y of the original image and the degraded image D:

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{V_{peak}}{\sqrt{\frac{1}{N_{col} N_{row}} \sum_{i=1}^{N_{col}} \sum_{j=1}^{N_{row}} \left[ Y(i,j) - D(i,j) \right]^2}} \right) \; [\mathrm{dB}] \qquad (3)$$

where V_peak = 2^k - 1, k denotes the number of bits per pixel, N_col is the number of columns and N_row the number of rows in an image. PSNR measures the error between a reconstructed image and the original one. A larger PSNR value denotes better image quality.
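The luminance PSNR described above can be computed per frame with NumPy as in the sketch below. The function name and the default of 8 bits per pixel are assumptions for illustration; identical frames, for which the mean squared error is zero, are reported as infinite PSNR.

```python
import numpy as np


def psnr_luma(original, degraded, bits_per_pixel: int = 8) -> float:
    """PSNR in dB between two luminance planes of equal size."""
    y = np.asarray(original, dtype=np.float64)
    d = np.asarray(degraded, dtype=np.float64)
    if y.shape != d.shape:
        raise ValueError("frames must have the same dimensions")
    mse = np.mean((y - d) ** 2)       # mean squared error over all pixels
    if mse == 0:
        return float("inf")           # identical frames
    v_peak = 2 ** bits_per_pixel - 1  # 255 for 8-bit video
    return float(20 * np.log10(v_peak / np.sqrt(mse)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref = rng.integers(0, 256, size=(288, 352))          # CIF-sized test frame
    noisy = np.clip(ref + rng.integers(-3, 4, size=ref.shape), 0, 255)
    print(f"{psnr_luma(ref, noisy):.2f} dB")
```

A sequence-level score is then typically obtained by averaging the per-frame values over all frames of the clip.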

Results and Discussions

Fig. 6 to Fig. 14 depict the results obtained from this experiment. Fig. 6 compares the quality of the encoded-only video sequences for the three contents, encoded at different quantization parameter values. Results indicate that lower quantization parameters lead to better perceptual quality, reflected in higher PSNR values. Fig. 7 plots PSNR versus frame number for the Foreman sequence encoded only at QP = 44 and QP = 10, and for Foreman encoded at QP = 10 and transmitted under a 1 Mbit/s background traffic level.

Fig. 6: Impact of quantization parameter on video quality


Fig. 7: Quality comparison for encoded only and transmitted sequences

Fig. 8: Quality comparison for transmitted sequences, QP = 44

Analysis of the generated bit streams (Table 3) shows that the lower the quantization parameter, the larger the generated file and consequently the higher the bit rate (204 Kbit/s for Foreman at QP = 44, 6.53 Mbit/s at QP = 10; 122 Kbit/s for News at QP = 44, 2.63 Mbit/s at QP = 10; 296 Kbit/s for Coastguard at QP = 44, 8.30 Mbit/s at QP = 10). The QP value may, however, vary during the encoding process, depending on the position of each frame within the Group of Pictures.

TABLE 3: QP vs. bit rates

QP    Foreman      News         Coastguard
      bit rate     bit rate     bit rate
      [Kbit/s]     [Kbit/s]     [Kbit/s]
44    204          122          296
20    1440         645          2700
15    3240         1240         4990
10    6530         2630         8300
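The relation between encoded file size and the bit rates in Table 3 follows from the clip duration of 300 frames at 30 fps, i.e. 10 seconds. The sketch below shows the conversion in both directions; it assumes that 1 Kbit means 1000 bits, and the function names are chosen here for illustration.

```python
def bitrate_kbps(stream_bytes: float, n_frames: int = 300, fps: float = 30.0) -> float:
    """Average bit rate in Kbit/s of a stream of the given size."""
    duration_s = n_frames / fps
    return stream_bytes * 8 / duration_s / 1000.0


def stream_size_bytes(rate_kbps: float, n_frames: int = 300, fps: float = 30.0) -> float:
    """Approximate stream size in bytes for a given average bit rate."""
    duration_s = n_frames / fps
    return rate_kbps * 1000 * duration_s / 8


if __name__ == "__main__":
    # Foreman at QP = 44 and QP = 10, bit rates taken from Table 3.
    for rate in (204, 6530):
        print(f"{rate} Kbit/s -> {stream_size_bytes(rate) / 1e6:.3f} MB")
```

For example, the 6530 Kbit/s Foreman stream at QP = 10 corresponds to roughly 8.2 MB for the 10-second clip, against about 0.26 MB at QP = 44.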


Fig. 8 to Fig. 14 plot the PSNR values for the three video sequences encoded at seven different quantization levels and transmitted from the same multimedia server at varying background traffic bit rates. Encoded-only videos generally have higher PSNR values than their transmitted counterparts. At higher quantization levels (QP = 44 to 32) and lower background bit rates, the PSNR values of the streamed video sequences remain the same as those of their encoded-only counterparts, meaning that no video packets were lost during transmission. However, at lower quantization levels and higher background traffic levels, the PSNR values of the streamed video decline sharply. Content-based analysis reveals that the video sequences can react differently to the competition for channel bandwidth arising from background traffic of different bit rates. This could be attributed to the different spatio-temporal complexities of the sequences. Given no packet prioritization, contents with high bit rates (e.g. Coastguard) suffer higher PSNR degradation, caused by collision-induced video packet loss at the MAC layer of the streaming server, even at the same encoding quantization level.
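A back-of-the-envelope check of this observation compares the offered load with the nominal channel rate. The sketch below adds the video bit rate from Table 3 to the two CBR flows and relates the sum to the 11 Mbit/s data rate of Table 2. It is a simplification under stated assumptions: the FTP flow and all protocol and MAC overhead are ignored, so the real channel saturates earlier than these fractions suggest, and the function names are hypothetical.

```python
# Video bit rates in Kbit/s copied from Table 3 (only the QP values listed there).
VIDEO_KBPS = {
    "Foreman":    {44: 204, 20: 1440, 15: 3240, 10: 6530},
    "News":       {44: 122, 20: 645, 15: 1240, 10: 2630},
    "Coastguard": {44: 296, 20: 2700, 15: 4990, 10: 8300},
}
NOMINAL_RATE_KBPS = 11_000  # data rate from Table 2


def offered_load_kbps(content: str, qp: int, cbr_kbps: int) -> int:
    """Video rate plus one CBR flow per client; the FTP flow is ignored."""
    return VIDEO_KBPS[content][qp] + 2 * cbr_kbps


def load_fraction(content: str, qp: int, cbr_kbps: int) -> float:
    """Offered load as a fraction of the nominal data rate."""
    return offered_load_kbps(content, qp, cbr_kbps) / NOMINAL_RATE_KBPS


if __name__ == "__main__":
    for content in VIDEO_KBPS:
        for qp in (44, 10):
            frac = load_fraction(content, qp, 1000)
            print(f"{content:10s} QP={qp:2d}  load={frac:.0%} of nominal rate")
```

Even under these optimistic assumptions, Coastguard at QP = 10 with 1 Mbit/s CBR flows already offers more than 90% of the nominal rate, whereas the QP = 44 streams stay far below it, which is consistent with the trend reported above.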

Conclusion

This paper has presented a detailed video quality evaluation of the transmission of H.264/SVC video over IEEE 802.11 networks in the presence of background traffic. We considered a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluated the quality of the streamed video for background traffic of varying bit rates and for contents with different spatio-temporal information encoded at different quantization parameter levels. Results indicate that received video quality deteriorates with increasing background traffic and high content bit rate, given no packet differentiation at the MAC (Media Access Control) layer. Also, contents may be affected differently, depending on the scene complexity and coding efficiency. For future work, we intend to extend the study to trade-offs in video quality optimization in the presence of background traffic, including packet prioritization and QoS mapping, and the use of IEEE 802.11e for SVC content streaming in IEEE 802.11 networks.

Acknowledgements

This work was supported by the COST IC1003 European Network on Quality of Experience in Multimedia Systems and Services (QUALINET); by the COST CZ LD12018 project "Modeling and verification of methods for Quality of Experience (QoE) assessment in multimedia systems" (MOVERIQ); by grant No. P102/10/1320 "Research and modeling of advanced methods of image quality evaluation" of the Grant Agency of the Czech Republic; and by the project of the Student Grant Agency of the Czech Technical University in Prague SGS12/077/OHK3/1T/13, "Cross-Layer Quality Optimization in New Generation Heterogeneous Wireless Mobile Networks".

References

[1] Z. He, J. Cai, C.W. Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding," IEEE Trans. Circuits Syst. Video Technol., 12 (6), 2002.

[2] C.-M. Chen, C.-W. Lin, H.-C. Wei, Y.-C. Chen, "Robust video streaming over wireless LANs using multiple description transcoding and prioritized retransmission," Visual Commun. Image Represent., 18 (3), 2007.

[3] C.H. Foh, Y. Zhang, Z. Ni, J. Cai, K.N. Ngan, "Optimized cross-layer design for scalable video transmission over the IEEE 802.11e networks," IEEE Trans. Circuits Syst. Video Technol., 17 (12), 2007.

[4] A. Fiandrotti, D. Gallucci, E. Masala, E. Magli, "Traffic prioritization of H.264/SVC video over 802.11e ad hoc wireless networks," Proceedings of 17th International Conference on Computer Communications and Networks, Virgin Islands, USA, 2008.

[5] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.

[6] J. Lee, F. De Simone, and E. Ebrahimi, "Subjective quality assessment of scalable video coding: A survey," 2011 Third International Workshop on Quality of Multimedia Experience (QoMEX), pp. 25-30, 7-9 Sept. 2011.

[7] T. Schierl, T. Stockhammer, and T. Wiegand, "Mobile Video Transmission Using Scalable Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1204-1217, Sept. 2007.

[8] S. Wenger, Y. K. Wang, T. Schierl, and A. Eleftheriadis, "RTP payload format for SVC video," Internet Engineering Task Force (IETF), September 2009.

[9] Y.-K. Wang, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, "System and transport interface of SVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1149-1163, Sept. 2007.

[10] Video Trace Library, http://dbq.multimediatech.cz/ [online]

[11] ITU-T Rec. P.910, "Subjective video quality assessment methods for multimedia applications", Geneva, Sep. 1999.

[12] JSVM Software Manual, http://evalsvc.googlecode.com/files/SoftwareManual.doc [online]

[13] MP4BOX, http://www.videohelp.com/tools/mp4box [online]

[14] IEEE Standard 802.11-2007, "Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications", June 2007.

[15] The Network Simulator NS-2, http://www.isi.edu/nsnam/ns/ [online]

[16] Z. Wang, L. Lu, and A. C. Bovik, "Video quality assessment based on structural distortion measurement," Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, 2004.