7/30/2019 Objective of h.264 Content
http://access.feld.cvut.cz/view.php?cisloclanku=2013010001
Objective Video Quality Evaluation and H.264/SVC Content Streaming over WLANs
Published on 24. 01. 2013 (486 views)
In this article, we study H.264/SVC video delivery and its objective quality assessment over IEEE 802.11 networks in the presence of background traffic. In particular, we consider a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video under background traffic with varying bit rates, using contents with different spatio-temporal information encoded at different quantization parameter levels. All packets were given equal priority.
Keywords: SVC, WLAN, Video streaming, Background traffic, Objective video quality
Introduction
With the increasing proliferation of multimedia content over the Internet and the emergence of handheld mobile devices such as tablets, smartphones and laptops capable of streaming video content, wireless video communication has become more attractive than ever before, receiving significant attention from both industry and academia. Wireless video transmission applications are easily deployed in homes, offices and transport vehicles.
Wireless Local Area Network (WLAN) technologies support applications such as video streaming, VoIP and many others, mainly thanks to mobility, good throughput and low budget requirements. Several WLAN variants are currently available, including IEEE 802.11a, IEEE 802.11b and IEEE 802.11g. The IEEE 802.11a/b/g standards use the contention-based communication mechanism of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Although this mechanism has become very common, it is considered inefficient for achieving reasonable video quality in scenarios with high background traffic, because it provides only best-effort service, which restricts QoS for quality-critical multimedia applications. Wireless video communication faces many challenges. Delivery of real-time video over wireless networks imposes stringent requirements, especially in terms of bandwidth, delay constraints, latency and loss variations. As with other wireless technologies, channel impairments can affect the IEEE 802.11 physical transmission rate assigned to mobile users. The actual throughput achieved by a specific user can also vary, depending on the number of users and the nature of the applications sharing the same channel.
The Scalable Video Coding (SVC) extension of H.264/MPEG-4 AVC (Advanced Video Coding) facilitates efficient video transmission, especially over wireless networks, allowing a video sequence to be encoded once and streamed over heterogeneous networks to a variety of end devices. With H.264/SVC, different scalability techniques can be used to deliver the most appropriate video bitstream based on network characteristics and mobile device capabilities.
A multitude of studies [1-2] have been carried out on video transmission over loss-prone wireless channels. The authors of [3-4] have investigated SVC streaming over IEEE 802.11 networks. In this paper, H.264/SVC video transmission quality over IEEE 802.11 networks in the presence of background traffic is studied. In particular, we consider a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video under background traffic with varying bit rates, using contents with different spatio-temporal information encoded at different quantization parameter levels. All packets were given equal priority. Results indicate that received video quality deteriorates with increasing background traffic and higher content bit rates when there is no packet differentiation at the MAC (Media Access Control) layer. Moreover, contents may be affected differently, depending on scene complexity and coding efficiency.
H.264/SVC Encoding and Transmission
The latest H.264/MPEG-4 AVC standard provides a scalable extension, called H.264/SVC [5], making it the first international standard to define multi-dimensional scalability. H.264/SVC achieves significant compression efficiency and a reduction in processing complexity, as well as very good subjective quality ratings [6]. The H.264/SVC scheme is known to be very valuable for video applications over the Internet and wireless video transmission, low-resolution video applications, multicast applications, serving a range of quality levels suited to heterogeneous receiver capabilities, and resilience in bandwidth-variation scenarios [7]. The bit rate adaptability native to the scalable codec design provides content adaptation based on changes in network conditions. H.264 scalable video coding reuses the key features of H.264/MPEG-4 Advanced Video Coding and also employs other techniques to provide the scalability extensions and to improve coding gain.
Fig. 1: Diagrammatic representation of SVC scalabilities
In general, SVC can provide three types of scalability, namely the temporal, spatial and SNR dimensions, allowing multiple video representations: by leaving out parts of the encoded representation, the bit rate and quality level can be adapted during video transmission. A scalable bit-stream is organized into a base layer and one or several enhancement layers. The base layer is considered more important than the enhancement layers. While the base layer needs less transmission bandwidth due to its coarser quality, the enhancement layers require more transmission bandwidth due to their finer quality. Consequently, SNR/spatial/temporal scalability achieves bandwidth scalability. Fig. 1 above shows a diagrammatic representation of the SVC scalabilities.
Spatial scalability refers to the possibility of representing the same video in different spatial resolutions or sizes (e.g. QCIF, CIF and 4CIF). Generally, spatially scalable video is encoded by using spatially up-sampled pictures from a lower layer as a prediction in a higher layer. Inter-layer prediction techniques are used to further improve the coding efficiency.
Temporal scalability refers to the possibility of representing the same video in different temporal resolutions or frame rates, i.e. the number of frames contained in one second of the video, allowing the video to be played at different frame rates. It is typically implemented by means of hierarchical prediction structures, in which the frames of higher temporal layers are predicted from frames of lower temporal layers.
Quality scalability, also called signal-to-noise ratio (SNR) scalability, refers to the possibility of
representing the same video in different perceptual quality levels. SNR-scalable coding quantizes
the DCT coefficients to different levels of accuracy by using different quantization parameters.
Scalable Video Coding, being an extension of H.264/AVC, maintains the concepts of the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). While the VCL acts as the interface between the encoder and the video frames, employing a block-based structure and supporting the different scalabilities, the NAL acts as the interface between the encoder and the actual network protocol, formatting the coded video for transmission over packet networks and providing the necessary header information. A NAL unit consists of a header and a payload, carrying the actual encoded video frame and information on its relevance in the decoding process [8]. The NALU header defines several parameters, including the dependency id (DID), describing the spatial scalability; the temporal id (TID), indicating the temporal scalability hierarchy; the quality id (QID), which is used to define the quality scalability structure; and the priority id (PID), which assigns a priority to the stream. For more details, please consult [9].
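As an illustration of how these fields are laid out, the following Python sketch parses the three-byte SVC extension that follows the one-byte NAL unit header for NAL unit types 14 and 20 (H.264 Annex G / RFC 6190); the function name and the returned dictionary are our own convention, not part of any reference software.

```python
def parse_svc_nalu_header(data: bytes) -> dict:
    """Parse the 1-byte NAL header plus the 3-byte SVC extension
    (NAL unit types 14 and 20, per H.264 Annex G / RFC 6190)."""
    nal_type = data[0] & 0x1F
    if nal_type not in (14, 20):
        raise ValueError("NAL unit type %d carries no SVC extension" % nal_type)
    b1, b2, b3 = data[1], data[2], data[3]
    return {
        "PID": b1 & 0x3F,         # priority_id   (6 bits)
        "DID": (b2 >> 4) & 0x07,  # dependency_id (3 bits)
        "QID": b2 & 0x0F,         # quality_id    (4 bits)
        "TID": (b3 >> 5) & 0x07,  # temporal_id   (3 bits)
    }
```

Filtering a scalable stream then amounts to keeping only NAL units whose DID/QID/TID fall below the target operating point.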
Implementation
In this section, we describe the implementation steps: video sequence encoding, simulation methodology and objective video quality evaluation. We consider a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video under background traffic with varying bit rates, using contents with different spatio-temporal information encoded at different quantization parameter levels.
Test Sequences
Three sequences, each of 10 seconds duration, with different genres and characteristics covering varying spatial and temporal complexity, namely Foreman, News and Coastguard, were selected [10].
Fig. 2: Snapshots of the video sequences
The diagram above shows the frames of the three sequences: Foreman, News and Coastguard, in
that order.
Fig. 3: Spatial and temporal indicators of the three contents
Fig. 3 above shows the spatial information (SI) and temporal information (TI) indices computed on the luminance component of the contents, respectively: Foreman: 59.38, 20.57; News: 75.41, 23.52; and Coastguard: 76.43, 23.50. The Spatial perceptual Information (SI) and Temporal perceptual Information (TI) measures, based on the Sobel filter as defined in ITU-T Rec. P.910 [11], were used to quantify scene complexity; they are given in Eqs. (1) and (2).
SI = max_time { std_space [ Sobel(F_n) ] }   (1)

TI = max_time { std_space [ F_n - F_(n-1) ] }   (2)
where F_n represents the luminance plane of the video frame at time n. It is observed that Foreman has smaller SI and TI values than News and Coastguard. Detailed information regarding the three sequences and the encoder configuration is summarized in Table 1. The video sequences were sourced from publicly available video traffic trace repositories, including [10].
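The SI/TI computation of Eqs. (1) and (2) can be sketched in Python on a list of luminance frames. The Sobel approximation below uses plain NumPy slicing and evaluates interior pixels only, which is one of several reasonable border-handling choices; the function names are ours.

```python
import numpy as np

def sobel_magnitude(frame):
    # 3x3 Sobel gradient magnitude on a luminance plane (interior pixels only)
    f = frame.astype(float)
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[1:-1, :-2] - f[2:, :-2])
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[:-2, 1:-1] - f[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)

def si_ti(frames):
    """SI and TI per ITU-T Rec. P.910: the per-frame spatial standard
    deviations (of the Sobel image, and of the frame difference) are
    maximized over time."""
    si = max(sobel_magnitude(f).std() for f in frames)
    ti = max((frames[n].astype(float) - frames[n - 1].astype(float)).std()
             for n in range(1, len(frames)))
    return si, ti
```

High SI indicates spatially detailed scenes; high TI indicates strong motion between consecutive frames.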
Encoding and Simulation
Fig. 4: Implementation methodology
TABLE 1: Encoder configuration

Input YUV files: Foreman, News, Coastguard
Resolution: CIF
Frame rate: 30 fps
Number of frames: 300
Number of layers: 1
GOP size: 16
Search range: 32
Search mode: 4
MGSControl: 1
CgsnrRefinement: 1
Base layer mode: 0
Encode key pictures: 1
The implementation methodology is shown in Fig. 4 above. The three YUV video files were first encoded using the JSVM reference software [12], according to the configuration summarized in Table 1. A set of different QP scenarios was designed to cover a wide range of quality levels. We encoded each video using seven scenarios, in which the QP value for the base layer was set to 44, 38, 32, 26, 20, 15 and 10, respectively. The coding efficiency of H.264/SVC is dependent on the quantization parameters of each layer. Packet traces (Network Abstraction Layer Units) of the H.264 bit streams were generated using BitStreamExtractor.
Fig. 5: Simulation topology
TABLE 2: Wireless channel configurations
Parameter Value
MAC type 802.11
Radio propagation Propagation/TwoRayGround
Interface queue Queue/DropTail
Routing DSDV
Antenna model Antenna/OmniAntenna
Data rate 11 Mbit/s
Basic rate 1 Mbit/s
Number of mobile nodes 3
Interface queue length 50
The NALUs are then prepared for transmission over the IP network (hinting, packetization). The resulting H.264 video trace files are hinted using MP4Box [13], which emulates the streaming of the *.h264 video over the network based on the RTP/UDP/IP protocol stack. Large NALUs are thus split through IP-layer fragmentation. The Real-time Transport Protocol (RTP) is used for the transfer of real-time data such as streamed video, running on top of an existing transport protocol, typically UDP (User Datagram Protocol). RTP provides real-time applications with end-to-end delivery services such as sequence numbers and timestamps, together with the types and sizes of the video frames and the number of UDP packets used to transmit each frame; these enable packet loss and reordering detection and end-to-end delay measurement.
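For illustration, the fixed 12-byte RTP header (RFC 3550) that carries the sequence numbers and timestamps mentioned above can be unpacked as follows; the helper name and returned fields are our own choice.

```python
import struct

def parse_rtp_header(pkt: bytes) -> dict:
    """Unpack the fixed 12-byte RTP header (RFC 3550):
    version/flags, marker + payload type, sequence number,
    timestamp and synchronization source identifier."""
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", pkt[:12])
    return {
        "version": b0 >> 6,            # should be 2
        "marker": (b1 >> 7) & 1,
        "payload_type": b1 & 0x7F,
        "sequence": seq,               # gaps here reveal packet loss
        "timestamp": ts,               # media clock, e.g. 90 kHz for video
        "ssrc": ssrc,
    }
```

A receiver detects loss and reordering by tracking gaps and inversions in the `sequence` field, and estimates delay variation from the `timestamp` field.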
We conduct the simulations of H.264/SVC video transmission over IEEE 802.11 [14] using NS-2 [15]. The wireless channel configuration is summarized in Table 2. The simulated scenario consists of three wireless nodes: one multimedia server and two clients, all within transmission range of each other. The multimedia server transmits H.264/SVC video and CBR traffic to Client 1, while Client 2 receives FTP and CBR traffic from the server, all simultaneously. Packet sizes were set to 1500 bytes. The network topology is depicted in Fig. 5. The background traffic generated at the server and accessed by the two clients while the video is being streamed increases the virtual collisions that occur at the server's MAC layer. All packets were assigned equal priority and scheduled from the same access point of the multimedia server. The experiment is designed to study the impact of competing background traffic with different sending rates on the streamed video quality. In order to overload the wireless channel, the CBR flows for the two clients are varied from 0.1 and 0.5 to 1 Mbit/s each, while the video sequences with different contents and different encoding QP values are streamed.
Ten different initial seeds for random number generation were chosen, and the results were averaged over these ten runs. After each simulation, the received trace file is generated. The received and the original NALU trace files are then combined and processed to generate the received NALU trace. The maximum playout buffer delay at the video client is set to 5 seconds. After further processing, the received NALU trace is passed through BitStreamExtractor, which generates the H.264 video; this is in turn decoded with the JSVM H264Decoder, thus obtaining an uncompressed YUV file. The reconstructed YUV file and the original one are compared with an objective video quality metric to compute the overall video quality.
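The sender/receiver trace comparison can be sketched as a simple set difference over NAL unit identifiers; the id-based trace format assumed here is a simplification of the actual JSVM trace files.

```python
def lost_nalus(sent_ids, received_ids):
    """Return the ids of NAL units present in the sender trace but
    missing from the receiver trace, in sending order."""
    received = set(received_ids)
    return [i for i in sent_ids if i not in received]

def loss_ratio(sent_ids, received_ids):
    """Fraction of transmitted NAL units that were lost in transit."""
    return len(lost_nalus(sent_ids, received_ids)) / len(sent_ids)
```

The list of lost units is what drives the bitstream repair step: those NALUs are removed from the original stream before it is re-extracted and decoded.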
Objective Video Quality Evaluation
Objective video quality algorithms are based on mathematical models that can predict multimedia quality by comparing a distorted signal against a reference, typically by modeling the human visual system. Some existing objective criteria are the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), SSIM (structural similarity) and VIF (Visual Information Fidelity). In this experiment, PSNR [16] is adopted as our objective metric, because it is the most widely used one. PSNR can be computed for both the luminance (Y-PSNR) and the chrominance (U-PSNR and V-PSNR) components of the video. The human eye is considered more sensitive to luminance (brightness) than to chrominance (colour); therefore, PSNR is usually evaluated only for the luminance (Y) component. The equation below gives the PSNR of the luminance component Y of the original image with respect to the degraded image D:
PSNR = 10 · log10 ( Vpeak^2 / [ (1 / (Ncol · Nrow)) · Σ_i Σ_j ( Y(i,j) - D(i,j) )^2 ] )   (3)
where Vpeak = 2^k - 1, k denotes the number of bits per pixel, Ncol represents the number of columns, and Nrow the number of rows in the image. PSNR computes the error between a reconstructed image and the original one; a larger PSNR value denotes better image quality.
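A direct NumPy implementation of Eq. (3) for the luminance plane might look as follows; the function name and the 8-bit-per-pixel default are our choices.

```python
import numpy as np

def psnr_y(original, degraded, bits_per_pixel=8):
    """Y-PSNR per Eq. (3): the peak value Vpeak = 2^k - 1 squared,
    over the mean squared error between the two luminance planes."""
    o = original.astype(float)
    d = degraded.astype(float)
    mse = np.mean((o - d) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: PSNR is unbounded
    vpeak = 2 ** bits_per_pixel - 1
    return 10 * np.log10(vpeak ** 2 / mse)
```

For a streamed sequence, this is evaluated frame by frame against the reference YUV file and then averaged over the sequence.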
Results and Discussion
Fig. 6 to Fig. 14 depict the results obtained from this experiment. Fig. 6 shows the quality comparison of the encoded-only video sequences for the three contents, encoded at different quantization parameter values. The results indicate that lower quantization parameters lead to better perceptual quality, reflected by higher PSNR values. Fig. 7 plots the PSNR versus frame number for the Foreman sequence encoded only at QP = 44 and QP = 10, and for Foreman encoded at QP = 10 and transmitted under a 1 Mbit/s background traffic level.
Fig. 6: Impact of quantization parameter on video quality
Fig. 7: Quality comparison for encoded-only and transmitted sequences

Fig. 8: Quality comparison for transmitted sequences, QP = 44

Fig. 9: Quality comparison for transmitted sequences, QP = 38

Fig. 10: Quality comparison for transmitted sequences, QP = 32

Fig. 11: Quality comparison for transmitted sequences, QP = 26

Fig. 12: Quality comparison for transmitted sequences, QP = 20

Fig. 13: Quality comparison for transmitted sequences, QP = 15

Fig. 14: Quality comparison for transmitted sequences, QP = 10
Analysis of the generated bit streams (Table 3) shows that the lower the quantization parameter, the larger the generated file size and consequently the higher the bit rate (204 kbit/s for Foreman at QP = 44 versus 6.53 Mbit/s at QP = 10; 122 kbit/s for News at QP = 44 versus 2.63 Mbit/s at QP = 10; 296 kbit/s for Coastguard at QP = 44 versus 8.30 Mbit/s at QP = 10). The QP value may, however, vary during the encoding process, depending on the position of each frame within the Group of Pictures.
TABLE 3: QP vs. bit rates
QP    Foreman [kbit/s]    News [kbit/s]    Coastguard [kbit/s]
44    204                 122              296
20    1440                645              2700
15    3240                1240             4990
10    6530                2630             8300
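The bit rates in Table 3 follow directly from file size and sequence duration; the conversion can be sketched as below, where the 255 000-byte example file size is illustrative, chosen to match the Foreman QP = 44 entry for a 10-second sequence.

```python
def bitrate_kbps(file_size_bytes: float, duration_s: float) -> float:
    """Average bit rate of an encoded file, in kbit/s."""
    return file_size_bytes * 8 / duration_s / 1000

# A hypothetical 255 000-byte file for a 10-second sequence
# corresponds to 204 kbit/s, as in the Foreman QP = 44 row.
```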
Fig. 8 to Fig. 14 plot the PSNR values for the three video sequences encoded at the seven quantization levels and transmitted from the same multimedia server under varying background traffic bit rate levels. Encoded-only videos generally have higher PSNR values than their transmitted counterparts. At higher quantization levels (QP = 44 to 32) and lower background bit rates, the PSNR values of the streamed video sequences remain the same as those of their encoded-only counterparts, meaning that no video packets were lost during transmission. However, at lower quantization levels and higher background traffic levels, the PSNR values of the streamed video decline sharply. Content-based analysis reveals that the video sequences can react differently to competition for channel bandwidth arising from background traffic of different bit rates. This can be attributed to the different spatio-temporal complexities of the sequences. Given no packet prioritization, contents with high bit rates (e.g. Coastguard) suffer higher PSNR degradation, caused by collision-induced video packet loss at the MAC layer of the streaming server, even at the same encoding quantization level.
Conclusion
This paper has presented a detailed video quality evaluation of the transmission of H.264/SVC video over IEEE 802.11 networks in the presence of background traffic. We considered a scenario where a wireless multimedia server transmits single-layer encoded H.264/SVC video and background traffic to one client, and two sets of background traffic to another client. We objectively evaluated the quality of the streamed video under background traffic with varying bit rates, using contents with different spatio-temporal information encoded at different quantization parameter levels. Results indicate that received video quality deteriorates with increasing background traffic and higher content bit rates when there is no packet differentiation at the MAC (Media Access Control) layer. Moreover, contents may be affected differently, depending on scene complexity and coding efficiency. For future work, we intend to extend this study to trade-offs in video quality optimization in the presence of background traffic, including packet prioritization and QoS mapping, and the use of IEEE 802.11e for SVC content streaming in IEEE 802.11 networks.
Acknowledgements
This work was supported by the COST IC1003 European Network on Quality of Experience in Multimedia Systems and Services (QUALINET); by the COST CZ LD12018 project "Modeling and verification of methods for Quality of Experience (QoE) assessment in multimedia systems" (MOVERIQ); by grant No. P102/10/1320 "Research and modeling of advanced methods of image quality evaluation" of the Grant Agency of the Czech Republic; and by the Student Grant Agency of the Czech Technical University in Prague, project SGS12/077/OHK3/1T/13 "Cross-Layer Quality Optimization in New Generation Heterogeneous Wireless Mobile Networks".
References
[1] Z. He, J. Cai, C. W. Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, 2002.
[2] C.-M. Chen, C.-W. Lin, H.-C. Wei, Y.-C. Chen, "Robust video streaming over wireless LANs using multiple description transcoding and prioritized retransmission," J. Visual Commun. Image Represent., vol. 18, no. 3, 2007.
[3] C. H. Foh, Y. Zhang, Z. Ni, J. Cai, K. N. Ngan, "Optimized cross-layer design for scalable video transmission over the IEEE 802.11e networks," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 12, 2007.
[4] A. Fiandrotti, D. Gallucci, E. Masala, E. Magli, "Traffic prioritization of H.264/SVC video over 802.11e ad hoc wireless networks," Proceedings of the 17th International Conference on Computer Communications and Networks, Virgin Islands, USA, 2008.
[5] H. Schwarz, D. Marpe, T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.
[6] J.-S. Lee, F. De Simone, T. Ebrahimi, "Subjective quality assessment of scalable video coding: A survey," 2011 Third International Workshop on Quality of Multimedia Experience (QoMEX), pp. 25-30, 7-9 Sept. 2011.
[7] T. Schierl, T. Stockhammer, T. Wiegand, "Mobile video transmission using Scalable Video Coding," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1204-1217, Sept. 2007.
[8] S. Wenger, Y.-K. Wang, T. Schierl, A. Eleftheriadis, "RTP payload format for SVC video," Internet Engineering Task Force (IETF), September 2009.
[9] Y.-K. Wang, M. M. Hannuksela, S. Pateux, A. Eleftheriadis, S. Wenger, "System and transport interface of SVC," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1149-1163, Sept. 2007.
[10] Video Trace Library, http://dbq.multimediatech.cz/ [online]
[11] ITU-T Rec. P.910, "Subjective video quality assessment methods for multimedia applications," Geneva, Sep. 1999.
[12] JSVM Software Manual, http://evalsvc.googlecode.com/files/SoftwareManual.doc [online]
[13] MP4Box, http://www.videohelp.com/tools/mp4box [online]
[14] IEEE Standard 802.11-2007, "Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications," June 2007.
[15] The Network Simulator - NS-2, http://www.isi.edu/nsnam/ns/ [online]
[16] Z. Wang, L. Lu, A. C. Bovik, "Video quality assessment based on structural distortion measurement," Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, 2004.