Objective of h.264 Content


  • 7/30/2019 Objective of h.264 Content


    http://access.feld.cvut.cz/view.php?cisloclanku=2013010001

    Objective Video Quality Evaluation and

    H.264/SVC Content Streaming over WLANs

    Published on 24. 01. 2013 (486 reads)

    In this article, we study the H.264/SVC video delivery and its objective quality assessment with

    respect to IEEE 802.11 networks, in the presence of background traffic. In particular, we

    consider a scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another

    client. We objectively evaluate the quality of the streamed video given background traffic with

    varying bit rates, contents with different spatio-temporal information encoded at different

    quantization parameter levels. All packets were given equal priority.

    Keywords: SVC, WLAN, Video streaming, Background traffic, Objective video quality

    Introduction

    With the increasing proliferation of multimedia content over the Internet and the emergence of

    handheld mobile devices like tablets, smartphones and laptops capable of streaming video content, wireless video communication has become more attractive than ever before, receiving

    significant attention from both the industry and academia. Wireless video transmission

    applications are easily deployed in homes, offices and transport vehicles.

    Wireless Local Area Network (WLAN) technologies support applications such as video

    streaming, VoIP and many others, owing especially to mobility, good throughput, and low budget requirements. Currently, there are many available WLAN standards, including IEEE 802.11a, IEEE

    802.11b, and IEEE 802.11g. The IEEE 802.11 a/b/g standards support the contention-based

    communication mechanism of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Although this mechanism has become very common, it is considered

    inefficient for achieving a reasonable video quality in scenarios with high background traffic,

    because it provides best-effort service, which restricts QoS for highly critical multimedia

    applications. Wireless video communications face many challenges. Delivery of real-time video over wireless networks imposes stringent requirements, especially in terms of bandwidth,

    delay constraints, latency and loss variations. As with other wireless technologies, channel

    impairments can affect the IEEE 802.11 physical transmission rate assigned to mobile users. The

    actual throughput achieved by a specific user can also vary, depending on the number of users and the nature of the applications sharing the same channel.

    The Scalable Video Coding extension of H.264/MPEG-4 AVC (Advanced Video Coding)

    facilitates efficient video transmissions, especially over wireless networks, allowing the encoding

    of a video sequence and its streaming over heterogeneous networks to a variety of end devices. With H.264/SVC, different scalability techniques can be used in order to deliver the

    most appropriate video bitstream based on network characteristics and mobile device

    capabilities.

    A multitude of studies [1-2] have been carried out on video transmission over loss-prone wireless

    channel networks. The authors of [3-4] have carried out research on SVC streaming over IEEE 802.11 networks. In this paper, H.264/SVC video quality transmission over IEEE 802.11

    networks in the presence of background traffic is studied. In particular, we consider a scenario

    where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and

    background traffic to one client and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video given background traffic with varying bit

    rates, contents with different spatio-temporal information encoded at different quantization

    parameter levels. All packets were given equal priority. Results indicate that received video

    quality deteriorates with increasing background traffic and high content bit rate, given no packet differentiation at the MAC (Media Access Control) layer level. Also, contents may be affected

    differently, depending on the scene complexity and coding efficiency.

    H.264/SVC Encoding and Transmission

    The latest H.264/MPEG-4 AVC standard provides a scalable extension, called H.264/SVC [5], making it the first international standard that defines multi-dimensional scalability. H.264/SVC

    achieves significant compression efficiency and reduction in processing complexity, as well as

    very good subjective quality ratings [6]. The H.264/SVC scheme is known to be very valuable in

    video applications over the Internet and wireless video transmission, low resolution video

    applications, multicast applications, delivery of a range of qualities suited to heterogeneous receiver capabilities, and resilience in bandwidth variation scenarios [7]. The bit rate adaptability

    that is native to the scalable codec design provides content adaptation based on changes in network conditions. H.264 scalable video coding reuses the key features of

    H.264/MPEG-4 Advanced Video Coding and also employs other techniques to provide

    scalability extensions and to improve coding gain.

    Fig. 1: Diagrammatic representation of SVC scalabilities

    In general, SVC can provide three types of scalability, namely temporal, spatial and SNR (quality), allowing multiple video representations to be obtained by leaving out parts of the encoded

    representation, thereby adapting bit rate and quality levels during video transmission. A scalable

    bit stream is organized into a base layer and one or several enhancement layers. The base layer is

    considered more important than the enhancement layers. While the base layer needs less

    transmission bandwidth due to its coarser quality, the enhancement layers require more transmission bandwidth due to their finer quality. Consequently, SNR/spatial/temporal scalability

    achieves bandwidth scalability. Fig. 1 above shows a diagrammatic representation of SVC scalabilities.

    Spatial scalability refers to the possibility of representing the same video in different spatial resolutions or sizes (e.g. QCIF, CIF and 4CIF). Generally, spatially scalable video is encoded by

    using spatially up-sampled pictures from a lower layer as a prediction in a higher layer. Inter-

    layer prediction techniques are used to further improve the coding efficiency.

    Temporal scalability refers to the possibility of representing the same video in different

    temporal resolutions or frame rates, i.e. the number of frames contained in one second of the video, allowing video to be played at different frame rates. It is typically implemented by making

    use of temporally up-sampled pictures from a lower layer as a prediction in a higher layer.
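
    As an illustration of how temporal layers map to frame rates, the sketch below assumes a dyadic hierarchical prediction structure with a GOP of 16 frames and a full rate of 30 fps (the values listed later in Table 1). The dyadic layout and the function names temporal_id and frame_rate_at are assumptions made for this example; the article does not state which prediction structure the encoder used.

```python
def temporal_id(frame_index: int, gop_size: int = 16) -> int:
    """Temporal layer of a frame in a dyadic hierarchy (assumed layout).

    Frames at GOP boundaries are layer 0; each further layer doubles
    the frame rate.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = frame_index % gop_size
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frame_rate_at(max_tid: int, full_rate: float = 30.0, gop_size: int = 16) -> float:
    """Frame rate obtained when only layers 0..max_tid are kept."""
    levels = gop_size.bit_length() - 1
    if not 0 <= max_tid <= levels:
        raise ValueError("max_tid out of range for this GOP size")
    return full_rate / 2 ** (levels - max_tid)


if __name__ == "__main__":
    print([temporal_id(i) for i in range(17)])
    print(frame_rate_at(3))  # rate after dropping the highest layer
```

    Under these assumptions, discarding the highest temporal layer halves the frame rate, which is the mechanism by which a stream can be thinned without re-encoding.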

    Quality scalability, also called signal-to-noise ratio (SNR) scalability, refers to the possibility of

    representing the same video in different perceptual quality levels. SNR-scalable coding quantizes

    the DCT coefficients to different levels of accuracy by using different quantization parameters.

    Scalable Video Coding, as an extension of H.264/AVC, maintains the concepts of

    Video Coding Layer (VCL) and Network Abstraction Layer (NAL). While the VCL acts as the

    interface between the encoder and video frames, employing a block-based structure and supporting different scalabilities, the NAL acts as the interface between the encoder and the actual network

    protocol, formatting the coded video for transmission over packet networks and

    providing the necessary header information. A NAL unit (NALU) consists of a header and a payload; the payload carries the actual encoded video data, and the header signals its relevance in the decoding process [8]. The NALU header defines different parameters, including the dependency id (DID), describing the spatial

    scalability; the temporal id (TID), indicating the temporal scalability hierarchically; the quality id

    (QID), which is used to define the quality scalability structure; and the priority id (PID), which assigns priority to the stream. For more details, please consult [9].
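
    The header fields named above can be read directly from the first bytes of an SVC NAL unit. The following is a minimal sketch, assuming the bit layout of the SVC NAL unit header extension (NAL unit types 14 and 20, one base header byte followed by three extension bytes). The class and function names are chosen for this example and are not taken from the article or from JSVM.

```python
from typing import NamedTuple, Optional


class SvcNalHeader(NamedTuple):
    nal_ref_idc: int
    nal_unit_type: int
    priority_id: int    # PID
    dependency_id: int  # DID
    quality_id: int     # QID
    temporal_id: int    # TID


def parse_svc_nal_header(nalu: bytes) -> Optional[SvcNalHeader]:
    """Parse the header of one NAL unit (start code already stripped).

    Returns None for NAL units without an SVC extension header,
    for example plain H.264/AVC slices.
    """
    if len(nalu) < 1:
        raise ValueError("empty NAL unit")
    nal_ref_idc = (nalu[0] >> 5) & 0x03
    nal_unit_type = nalu[0] & 0x1F
    if nal_unit_type not in (14, 20):
        return None
    if len(nalu) < 4:
        raise ValueError("truncated SVC NAL unit header")
    b1, b2, b3 = nalu[1], nalu[2], nalu[3]
    return SvcNalHeader(
        nal_ref_idc=nal_ref_idc,
        nal_unit_type=nal_unit_type,
        priority_id=b1 & 0x3F,           # low 6 bits of extension byte 1
        dependency_id=(b2 >> 4) & 0x07,  # bits 6..4 of extension byte 2
        quality_id=b2 & 0x0F,            # low 4 bits of extension byte 2
        temporal_id=(b3 >> 5) & 0x07,    # top 3 bits of extension byte 3
    )


if __name__ == "__main__":
    print(parse_svc_nal_header(bytes([0x74, 0x85, 0x12, 0x67])))
```

    A streaming server or a trace tool could use these four values to decide which NAL units to keep or drop when adapting the stream.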

    Implementations

    In this section, we describe the implementation steps, starting with video sequence encoding,

    simulation methodology and objective video quality evaluation. We consider a scenario where a

    wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another client. We objectively evaluate

    the quality of the streamed video given background traffic with varying bit rates, contents with

    different spatio-temporal information encoded at different quantization parameter levels.

    Test Sequences

    Three sequences, each of 10 seconds duration, with different genres and characteristics covering

    varying spatial and temporal complexity, namely, Foreman, News and Coastguard were selected

    [10].

    Fig. 2: Snapshots of the video sequences

    The diagram above shows the frames of the three sequences: Foreman, News and Coastguard, in

    that order.

    Fig. 3: Spatial and temporal indicators of the three contents

    Fig. 3 above shows the spatial information (SI) and temporal information (TI) indices computed on the luminance

    component of the contents, respectively: Foreman: 59.38, 20.57; News: 75.41, 23.52; and

    Coastguard: 76.43, 23.50. Spatial Perceptual Information (SI) and Temporal Perceptual Information (TI), based on the Sobel filter from ITU-T Rec. P.910 [11], were used to measure

    the complexity of the scene, as given in Eqs. (1) and (2):

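    \[ SI = \max_{time} \left\{ \mathrm{std}_{space} \left[ \mathrm{Sobel}(F_n) \right] \right\} \tag{1} \]

    \[ TI = \max_{time} \left\{ \mathrm{std}_{space} \left[ F_n(i,j) - F_{n-1}(i,j) \right] \right\} \tag{2} \]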

    where F_n represents the luminance plane of the video frame at time n. It is observed that Foreman

    has smaller SI and TI values, compared to News and Coastguard. Detailed information regarding

    the three sequences and encoder configuration is summarized in Table 1. The video sequences were sourced from different publicly available video traffic traces, including [10].
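
    The SI and TI indicators can be sketched in a few lines of standard-library Python. This is an illustrative version that follows the ITU-T Rec. P.910 definitions (Sobel magnitude on interior pixels, population standard deviation over space, maximum over time); it is not the tool used by the authors, and the function names are chosen for this example.

```python
import math
from statistics import pstdev
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # luminance plane indexed as frame[row][col]


def sobel_magnitude(frame: Frame) -> List[float]:
    """Sobel gradient magnitude at every interior pixel of one frame."""
    rows, cols = len(frame), len(frame[0])
    if rows < 3 or cols < 3:
        raise ValueError("frame must be at least 3x3")
    out = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r][c - 1] - frame[r + 1][c - 1])
            gy = (frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r - 1][c] - frame[r - 1][c + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames: Sequence[Frame]) -> float:
    """SI: maximum over time of the spatial std of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames: Sequence[Frame]) -> float:
    """TI: maximum over time of the spatial std of the frame difference."""
    if len(frames) < 2:
        raise ValueError("TI needs at least two frames")
    return max(
        pstdev([cur - prev
                for row_c, row_p in zip(c, p)
                for cur, prev in zip(row_c, row_p)])
        for p, c in zip(frames, frames[1:])
    )


if __name__ == "__main__":
    black = [[0] * 6 for _ in range(4)]
    edge = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
    print(spatial_information([black, edge]), temporal_information([black, edge]))
```

    Pure Python is slow for CIF-sized frames; the sketch is meant to make the two definitions concrete rather than to process full sequences.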

    Encoding and Simulation

    Fig. 4: Implementation methodology

    TABLE 1: Encoder configurations

    Parameter              Value
    Input YUV files        Foreman, News, Coastguard
    Resolution             CIF
    Frame rate             30 fps
    Number of frames       300
    Number of layers       1
    GOP size               16
    Search range           32
    Search mode            4
    MGSControl             1
    CgsnrRefinement        1
    Base layer mode        0
    Encode key pictures    1

    The implementation methodology is shown in Fig. 4 above. The three YUV video files were first encoded using the JSVM software [12], according to the configurations further

    summarized in Table 1. A set of different QP scenarios was designed to cover a wide range of

    quality levels. We encoded each video using 7 scenarios in which the QP values for the base

    layer are set to 44, 38, 32, 26, 20, 15 and 10. The coding efficiency of H.264/SVC is

    dependent on the quantization parameters of each layer. Packet traces (Network Abstraction

    Layer Units) of the H.264 bit streams are generated using BitStreamExtractor.

    Fig. 5: Simulation topology

    TABLE 2: Wireless channel configurations

    Parameter                 Value
    MAC type                  802.11
    Radio propagation         Propagation/TwoRayGround
    Interface queue type      Queue/DropTail
    Routing                   DSDV
    Antenna model             Antenna/OmniAntenna
    Data rate                 11 Mbit/s
    Basic rate                1 Mbit/s
    Number of mobile nodes    3
    Interface queue length    50

    The NALUs are prepared for transmission over the IP network (hinting, packetization). The resulting H.264 video trace files are hinted using MP4Box [13], which emulates the streaming of

    the *.h264 video over the network based on the RTP/UDP/IP protocol stack. Large NALUs are thus

    split through IP layer fragmentation. The Real-time Transport Protocol (RTP) is used for the transfer of real-time data such as video streams. Existing transport protocols like UDP (User Datagram

    Protocol) run underneath RTP. RTP provides real-time applications with end-to-end

    delivery services, such as sequence numbers, types, sizes of the video frames and the number of

    UDP packets used to transmit each frame, and timestamps (for packet loss and reordering detection, and end-to-end delay).

    We conduct the simulations of H.264/SVC video transmission over IEEE 802.11 [14] using NS-2 [15]. The wireless channel configuration is summarized in Table 2. The simulated scenario

    consists of three wireless nodes, one multimedia server and two clients, all within reasonable

    transmission range. The multimedia server transmits H.264/SVC video and CBR traffic to Client 1, while Client 2 receives FTP and CBR traffic from the server, all happening simultaneously.

    Packet sizes were set to 1500 Bytes. The network topology is depicted in Fig. 5. The background

    traffic generated at the server and accessed by the two clients, while streaming video traffic,

    increases the virtual collisions that occur at the server's MAC layer. All the packets were assigned equal priority and scheduled from the same access point of the multimedia server. The

    experiment is designed to study the impacts of competing background traffic with different

    sending rates on the streamed video quality. In order to overload the wireless transmission, the

    CBR flows for the two clients are set to 0.1, 0.5 and 1 Mbit/s each, while streaming the video sequences of different contents and different encoding QP values.

    Ten different initial seeds for random number generation were chosen for the simulations, and the results

    were averaged over these 10 runs. After simulation, the received trace file is

    generated. The received and the original NALU trace files are further combined and processed to

    generate the received NALU trace. The maximum playout buffer delay at the video client is set to 5 seconds. After further processing, the received NALU trace is passed through

    BitStreamExtractor which generates H.264 video, which is in turn decoded with the JSVM

    H264Decoder, thus obtaining an uncompressed YUV file. The reconstructed YUV file and the original one are compared using an objective video quality metric to compute the overall video

    quality.
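
    The effect of the playout buffer limit on the received trace can be illustrated with a small filter. The sketch below is hypothetical: the record layout (NALU id, send time, receive time) and the function name usable_nalus are assumptions for this example and do not reproduce the trace format or the tools used in the experiment. It treats a NALU as usable only if every one of its packets arrived within the 5 second limit.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical record: (nalu_id, send_time_s, recv_time_s or None if lost)
PacketRecord = Tuple[int, float, Optional[float]]


def usable_nalus(packets: Iterable[PacketRecord],
                 max_delay_s: float = 5.0) -> Set[int]:
    """Return ids of NALUs whose packets all arrived within max_delay_s."""
    status: Dict[int, bool] = {}
    for nalu_id, sent, received in packets:
        on_time = received is not None and (received - sent) <= max_delay_s
        # A single lost or late packet makes the whole NALU unusable.
        status[nalu_id] = status.get(nalu_id, True) and on_time
    return {nalu_id for nalu_id, ok in status.items() if ok}


if __name__ == "__main__":
    trace = [
        (0, 0.0, 0.1), (0, 0.0, 0.2),   # both fragments on time
        (1, 0.1, None),                 # lost
        (2, 0.2, 5.5),                  # later than the playout limit
        (3, 0.3, 1.0),                  # on time
    ]
    print(sorted(usable_nalus(trace)))
```

    NALUs rejected by such a filter would be missing from the bit stream handed to the decoder, which is where the PSNR loss reported later originates.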

    Objective Video Quality Evaluation

    Objective video quality algorithms are based on mathematical models that predict image and

    multimedia quality by comparing a distorted signal against a reference, typically by modeling the human visual system. Some existing objective criteria are Mean Squared Error (MSE), Peak

    Signal-to-Noise Ratio (PSNR), SSIM (structural similarity) and VIF (Visual Information

    Fidelity). In this experiment, PSNR [16] is adopted as our objective metric. PSNR has been selected because it is the most widely used metric.

    PSNR can be computed for both the luminance (Y-PSNR) and chrominance (U-PSNR and V-PSNR) components of the video. The human eye is considered more sensitive to luminance (brightness) than to chrominance (colour); therefore, the PSNR is usually evaluated only for the luminance (Y)

    component. Eq. (3) gives the PSNR computed between the luminance

    component Y of the original image and the degraded image D:

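    \[ PSNR = 20 \log_{10} \left( \frac{V_{peak}}{\sqrt{\frac{1}{N_{col} N_{row}} \sum_{i=1}^{N_{col}} \sum_{j=1}^{N_{row}} \left[ Y(i,j) - D(i,j) \right]^2}} \right) \tag{3} \]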

    where V_peak = 2^k - 1 and k denotes the number of bits per pixel; N_col represents the number of columns and

    N_row the number of rows in an image. PSNR computes the error between a reconstructed image

    and the original one. A larger PSNR value denotes better image quality.
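
    For a single frame, the luminance PSNR described above can be computed as in the following sketch. It assumes both planes are given as equally sized row-major lists of samples and uses 8 bits per pixel by default; the function name y_psnr is chosen for this example, and identical frames are reported as infinity by convention.

```python
import math
from typing import Sequence


def y_psnr(original: Sequence[Sequence[float]],
           degraded: Sequence[Sequence[float]],
           bits_per_pixel: int = 8) -> float:
    """Luminance PSNR in dB between an original and a degraded frame."""
    if len(original) != len(degraded) or any(
            len(ro) != len(rd) for ro, rd in zip(original, degraded)):
        raise ValueError("frames must have the same dimensions")
    v_peak = 2 ** bits_per_pixel - 1
    count = 0
    squared_error = 0.0
    for row_o, row_d in zip(original, degraded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames are empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 20 * math.log10(v_peak / math.sqrt(mse))


if __name__ == "__main__":
    ref = [[100, 100], [100, 100]]
    out = [[101, 101], [101, 101]]
    print(round(y_psnr(ref, out), 2))
```

    A sequence-level score is then usually obtained by averaging the per-frame values over all frames of the decoded YUV file.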

    Results and Discussions

    Fig. 6 to Fig. 14 depict the results obtained from this experiment. Fig. 6 depicts the quality

    comparison of the encoded-only video sequences for the three contents, encoded at different quantization parameter values. Results indicate that lower quantization parameters lead to better

    perceptual quality, depicted by higher PSNR values. Fig. 7 plots the PSNR curve vs. frame

    number for the Foreman sequence, encoded only, at QP = 44 and 10, and for Foreman encoded at QP = 10 and transmitted under the 1 Mbit/s background traffic level.

    Fig. 6: Impact of quantization parameter on video quality

    Fig. 7: Quality comparison for encoded only and transmitted sequences

    Fig. 8: Quality comparison for transmitted sequences, QP = 44

    Fig. 9: Quality comparison for transmitted sequences, QP = 38

    Fig. 10: Quality comparison for transmitted sequences, QP = 32

    Fig. 11: Quality comparison for transmitted sequences, QP = 26

    Fig. 12: Quality comparison for transmitted sequences, QP = 20

    Fig. 13: Quality comparison for transmitted sequences, QP = 15

    Fig. 14: Quality comparison for transmitted sequences, QP = 10

    Analysis of the generated bit streams (Table 3) shows that the lower the quantization parameter, the larger the generated file and consequently the higher the bit rate (204 Kbit/s for Foreman at QP

    = 44, 6.53 Mbit/s at QP = 10; 122 Kbit/s for News at QP = 44, 2.63 Mbit/s at QP = 10; 296

    Kbit/s for Coastguard at QP = 44, 8.30 Mbit/s at QP = 10). The QP value may, however, vary during the encoding process, depending on the position of each frame within the Group of

    Pictures.

    TABLE 3: QP vs. bit rates

    QP    Foreman bit rate [Kbit/s]    News bit rate [Kbit/s]    Coastguard bit rate [Kbit/s]
    44    204                          122                       296
    20    1440                         645                       2700
    15    3240                         1240                      4990
    10    6530                         2630                      8300
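
    The bit rates in Table 3 can be related to stream size and to the load offered to the channel with simple arithmetic. The sketch below assumes that 1 Kbit/s means 1000 bit/s, that each sequence lasts 10 seconds, and that two CBR flows run alongside the video; it ignores the FTP flow and all protocol and MAC overhead, so it is a rough lower bound on the load rather than a model of the simulation. The function names are chosen for this example.

```python
def stream_size_bytes(bit_rate_kbit_s: float, duration_s: float = 10.0) -> float:
    """Approximate size of an encoded stream from its average bit rate."""
    return bit_rate_kbit_s * 1000 * duration_s / 8


def offered_load_mbit_s(video_kbit_s: float, cbr_mbit_s: float,
                        n_cbr_flows: int = 2) -> float:
    """Video bit rate plus CBR background flows, in Mbit/s (FTP ignored)."""
    return video_kbit_s / 1000 + n_cbr_flows * cbr_mbit_s


if __name__ == "__main__":
    # Foreman at QP = 44 (204 Kbit/s) over 10 seconds
    print(stream_size_bytes(204))
    # Coastguard at QP = 10 (8300 Kbit/s) with two 1 Mbit/s CBR flows
    print(offered_load_mbit_s(8300, 1.0))
```

    Under these assumptions, Coastguard at QP = 10 with 1 Mbit/s CBR flows already offers about 10.3 Mbit/s against a nominal data rate of 11 Mbit/s, before FTP traffic and MAC overhead are counted.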

    Fig. 8 to Fig. 14 plot the PSNR values for the three video sequences encoded at seven different

    quantization levels and transmitted from the same multimedia server under varying background

    traffic bit rates. Encoded-only videos generally have higher PSNR values than their transmitted counterparts. At higher quantization levels (QP = 44 to 32) and lower background bit

    rates, the PSNR values of the streamed video sequences remain the same as those of their encoded-only

    counterparts, meaning that no video packets were lost during transmission. However, at lower quantization levels and higher background traffic rates, the PSNR values of the streamed video decline sharply. Content-based analysis reveals that the video sequences can react

    differently to competition for channel bandwidth arising from background traffic of different bit

    rates. This could be attributed to the different spatio-temporal complexities of the sequences. Given no packet prioritization, contents with high bit rates (e.g. Coastguard) suffer greater PSNR

    degradation, caused by collision-induced video packet loss at the MAC layer of the streaming

    server, even at the same encoding quantization level.

    Conclusion

    This paper has presented a detailed video quality evaluation in the transmission of H.264/SVC video over IEEE 802.11 networks in the presence of background traffic. We considered a

    scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and

    background traffic to one client and two sets of background traffic to another client. We objectively evaluated the quality of the streamed video given background traffic with varying bit

    rates, contents with different spatio-temporal information encoded at different quantization

    parameter levels. Results indicate that received video quality deteriorates with increasing

    background traffic and high content bit rate, given no packet differentiation at the MAC (Media Access Control) layer. Also, contents may be affected differently, depending on the scene

    complexity and coding efficiency. For future work, we intend to expand the studies to tradeoffs

    in video quality optimization in the presence of background traffic, which includes packet

    prioritization and QoS mapping, and the use of IEEE 802.11e for SVC content streaming in IEEE 802.11 networks.

    Acknowledgements

    This work was supported by the COST IC1003 European Network on Quality of Experience in

    Multimedia Systems and Services (QUALINET); by the COST CZ LD12018 "Modeling and verification of methods for Quality of Experience (QoE) assessment in multimedia systems"

    (MOVERIQ); by the grant No. P102/10/1320 "Research and modeling of advanced methods of

    image quality evaluation" of the Grant Agency of the Czech Republic; and by the project of the

    Student grant agency of the Czech Technical University in Prague SGS12/077/OHK3/1T/13,

    "Cross-Layer Quality Optimization in New Generation Heterogeneous Wireless Mobile Networks".

    References

    [1] Z. He, J. Cai, C.W. Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding", IEEE Trans. Circuits Syst. Video Technol. 12 (6), 2002.

    [2] C.-M. Chen, C.-W. Lin, H.-C. Wei, Y.-C. Chen, "Robust video streaming over wireless LANs using multiple description transcoding and prioritized retransmission", Visual Commun. Image Represent. 18 (3), 2007.

    [3] C.H. Foh, Y. Zhang, Z. Ni, J. Cai, K.N. Ngan, "Optimized cross-layer design for scalable video transmission over the IEEE 802.11e networks", IEEE Trans. Circuits Syst. Video Technol. 17 (12), 2007.

    [4] A. Fiandrotti, D. Gallucci, E. Masala, E. Magli, "Traffic prioritization of H.264/SVC video over 802.11e ad hoc wireless networks", Proceedings of 17th International Conference on Computer Communications and Networks, Virgin Islands, USA, 2008.

    [5] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard", IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.

    [6] J. Lee, F. De Simone, and E. Ebrahimi, "Subjective quality assessment of scalable video coding: A survey", 2011 Third International Workshop on Quality of Multimedia Experience (QoMEX), pp. 25-30, 7-9 Sept. 2011.

    [7] T. Schierl, T. Stockhammer, and T. Wiegand, "Mobile Video Transmission Using Scalable Video Coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1204-1217, Sept. 2007.

    [8] S. Wenger, Y. K. Wang, T. Schierl, and A. Eleftheriadis, "RTP payload format for SVC video", Internet Engineering Task Force (IETF), September 2009.

    [9] W. Ye-Kui, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, "System and transport interface of SVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1149-1163, Sept. 2007.

    [10] Video Trace Library, http://dbq.multimediatech.cz/ [online]

    [11] ITU-T Rec. P.910, "Subjective video quality assessment methods for multimedia applications", Geneva, Sep. 1999.

    [12] JSVM Software Manual, http://evalsvc.googlecode.com/files/SoftwareManual.doc [online]

    [13] MP4Box, http://www.videohelp.com/tools/mp4box [online]

    [14] IEEE Standard 802.11-2007, "Local and metropolitan area networks, Specific requirements, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications", June 2007.

    [15] The Network Simulator NS-2, http://www.isi.edu/nsnam/ns/ [online]

    [16] Z. Wang, L. Lu, and A. C. Bovik, "Video quality assessment based on structural distortion measurement", Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, 2004.
