
Signal Processing: Image Communication 22 (2007) 69–85

www.elsevier.com/locate/image

Accurate packet-by-packet measurement and analysis of video streams across an Internet tight link

M. Paredes Farrera, M. Fleury, M. Ghanbari

Electronic Systems Engineering Department, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

Received 15 March 2006; received in revised form 3 November 2006; accepted 14 November 2006

Abstract

The response to a video stream across an Internet end-to-end path particularly depends on the performance at the path's tight link, which can be examined in a simple network testbed. A packet-by-packet (PbP) measurement methodology applied to tight link analysis requires a real-time operating system to gain the desired timing resolution during traffic generation experiments. If, as is common for other purposes, the analysis were simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be visible. An illustrative case study, based upon the H.263+ video codec, confirms the advantage of the PbP methodology in determining received video characteristics according to packetization scheme, inter-packet gap, router response, and background traffic. Tests show that routers become unreliable if the packet arrival rate passes a critical threshold, one consequence of which is that reported router processor load also becomes unreliable. Video stream application programmers should take steps to reduce packet rates, and aggregate packet rates may be reduced through network management. In the case study, a burst of just nine packets increased the probability of packet loss, while the video quality could be improved by packing at least two slices into a packet. The paper demonstrates that an appropriate packetization scheme has an important role in ensuring received video quality, but a physical testbed and a precise measurement methodology are needed to identify that scheme.

© 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.image.2006.11.004

Keywords: Video streaming; Packet-by-packet analysis; Router response

1. Introduction

The most recent Sprint Internet Protocol (IP) backbone survey [8] reported that 60% of traffic on some links is now generated by streaming or file sharing applications, as opposed to 30% by Web traffic. Current video applications include streaming of pre-encoded video, the exchange of personal video clips (peer-to-peer streaming), and the delivery of sports and news clips (possibly involving real-time (RT) generation of video). The Advanced Network and Services Surveyor Project [11] monitors the large-scale behavior of the Internet to determine its characteristics. A key finding [16] of this and other surveys is that most of the Internet core is relatively lightly loaded, with hotspots at intersections between networks and at access points.

This should not be surprising, as over-provisioning on packet networks is a common way [8] to protect against the failure of network elements and to support traffic growth. Therefore, finding likely causes of video quality degradation requires a careful examination of video stream behavior at the tight or bottleneck link¹, which normally occurs at network boundaries.

¹ Strictly, a bottleneck link also includes the possibility of a narrow link, one with minimum capacity on a network path. The link with minimum capacity is not always the same as the link with least available bandwidth on a network path [7].

In this study, IP video traffic is measured in isolation on a network testbed, with competing Internet traffic being represented as generated background traffic across a critical bottleneck or tight link. A tight link is the link with the least available bandwidth on an entire end-to-end path. In a study to test the ability of the STAB bandwidth probing tool [21] to locate thin links², the link most likely to be tight in terms of available bandwidth across a 15/16-hop Internet path was found to be located close to the edge of the end-to-end path, which may well be a general conclusion. Modeling a tight link in a network testbed says nothing about overall delay or variation of delay across an entire path. Nor does the model necessarily represent realistic background traffic. Measurements taken directly from the Internet are needed for this purpose, especially if unusual traffic events are of interest. However, the testbed approach is designed to stress the video stream as it passes through typical routers working at the limits of their performance range. The intention is to identify potential problems that a video stream will encounter, with a view to guiding the design of a streaming application. Many simulation studies of video stream congestion control, e.g. [22,26], use a similar simple network topology to the one in this study. Consequently, the testbed has also been used as a means of calibrating network simulators, although this topic is not pursued in this paper.

² A thin link has less available bandwidth than all those preceding it on the path.

As video and audio packets are often closely spaced, loss correlation is a more serious problem for video streaming than for other applications [33], for which losses may appear as essentially random [1]. Video packet spacing in time is typically closer than that of audio (averaging around 20 ms) and, unlike audio, video packets vary in length. For example, for a 30 frame/s stream, each frame must be delivered every 33 ms, and if individual Common Intermediate Format (CIF)-sized pictures are broken into 18 variable-sized slices, with one slice per IP packet, then the inter-packet gap (IPG) is 1.9 ms if packets are generated at equal intervals. Video bandwidths can clearly be much higher than those of other flows [25], which is a problem if the video traffic takes up a sizeable proportion of the bandwidth across a tight link. Video delivery should avoid regimes that result in significant packet losses at the router queues. However, some observers note [23] that streaming applications often send a bursty stream, either for reasons of coding efficiency or when a non-RT operating system (o.s.) falls behind its schedule and releases a packet burst. This paper assumes that a bursty stream is present at the tight link, possibly resulting from one session amongst a series of parallel sessions generated by a server behind a fast link. Other sessions may not generate bursty traffic, and burstiness may be reduced if the server lies behind a slow link. In most reasonable o.s./driver implementations, the driver is relatively immune from scheduling, implying that, if application scheduling is not applied, coding efficiency is a cause of packet bursts.
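As a quick check of the quoted figure, it is simple arithmetic from the stated frame rate and slice count:

IPG = (1/30 s) / 18 ≈ 1.85 ms ≈ 1.9 ms.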

Choice of metric is an important issue in video streaming research. For example, in [23] a standardized congestion control unit and a standardized set of reported metrics are proposed. In particular, per-packet or instantaneous bandwidth is carefully defined in [23], as the author considers that "per-packet bandwidth reporting is the most appropriate for adaptive streaming applications", because of its responsiveness to changes in available bandwidth. The work in [23] is of a theoretical nature, whereas the measurement methodology developed on this paper's network testbed could be directly transferred to a congestion control unit. TCP (not generally used for video streaming) adjusts its window size according to the packet loss rate (from dropped acknowledgements) and round-trip time (from packet timers), and it is likely that higher-end routers adopt a similar strategy for queue management. The Cisco 7500 series invokes input buffering upon finding that the output queue is congested. The lower-end Cisco router employed in this paper allows the value of metrics such as packet loss rate and CPU usage to be reported back to the user, and we have taken advantage of that in the experiments. In active queue management systems [10], other metrics are possible, such as TCP goodput, TCP and User Datagram Protocol (UDP) loss rate, queueing delay and consecutive packet loss.

In summary, the main objective of this paper is to provide a measurement and analysis methodology that will aid the design of video streaming applications. If analysis is only in terms of average packet rate per second, as might be used for network dimensioning and similar purposes, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the varying overheads would not be visible. In particular, the methodology is applied to the study of bottleneck links, and a case study on packetization schemes for an H.263+ [28] encoded video stream demonstrates the value of the approach. The peak signal-to-noise ratio (PSNR) of the delivered video stream is significantly improved if an appropriate packetization scheme is selected.

The remainder of this paper is organized as follows. Section 2 details the video streaming network testbed, the software tools employed, and the measurement methodology. Section 3 illustrates the need for a testbed by examining router response at a tight link. Section 4 applies a video stream across the congested tight link and identifies the role of appropriate packetization in improving the delivered video quality, given the likely router response. Finally, Section 5 presents some conclusions.

Fig. 1. Simple network testbed employed to model the effect of a tight link on an Internet path with Cisco routers. [Topology: Linux Sender → Linux Router → Cisco Router_A → Cisco Router_B → Linux Receiver, with 100 Mbit/s Ethernet links throughout except for the 2 Mbit/s serial link between the two Cisco routers.]

Table 1
Network component settings

Linux machines
CPU: Pentium-IV 1.7 GHz
NIC: Intel Pro 10/100
Queue policy: Fast FIFO
Queue length (QL): 100
MTU: 1500
OS: Linux kernel v. 2.4.9

Routers
Model: Cisco 2600
Software: Version 12.2 (13a) of Cisco IOS
Queue policy: FIFO
Queue length (QL): In 75, out 40
MTU: 1500

2. Traffic measurement methodology

2.1. Network testbed configuration

Fig. 1 shows the simple network testbed employed in the experiments. Clearly, the bottleneck link is located in the 2 Mb/s serial link between the two Cisco 2600 routers. Otherwise, 100 Mb/s fast Ethernet links connect the testbed components to ensure no other source of congestion. The sender machine hosts the traffic generator stg (Section 2.3), while the Linux router monitors and stores traffic data flowing onto the bottleneck link with tcpdump (Section 2.2). Likewise, the receiver monitors and stores traffic data arriving at the receiver. To aid replication of the setup, the configuration details are given in Table 1. Small-sized output queues are employed at routers to avoid delay to TCP packets, as TCP, of course the dominant Internet protocol, relies on round-trip times to govern flow control. Although the video stream generated in Section 4 is carried by UDP, the default buffer size settings of the Cisco router were initially retained, as these would be the likely sizes in a realistic Internet environment. In the interests of accurate scheduling of packets in time, the Linux sender o.s. kernel is run with the KURT RT patch, as further discussed in Section 2.3. Network planners [12] commonly recommend to clients a T1 or E1 link with a bandwidth of, respectively, 1.544 or 2.048 Mb/s between a LAN and the border gateway (or a satellite link, although with greater latency). Cost is also a significant consideration in the selection of a Cisco 2600 series router, and, hence, the same router is commonly found at the LAN edge in network plans [5]. A Cisco 2600 series router has a Motorola MPC860 40 MHz CPU with a 20 MHz internal bus clock and a default 4 MB DRAM packet memory [24].

2.2. Traffic monitoring

The monitoring tool employed was the well-known software utility tcpdump [27], layered on the libpcap monitoring library, which runs on an Ethernet interface set to promiscuous mode. Monitoring points were set up on the three Linux PCs in the testbed. The tcpdump program may in some circumstances [19] present 'bugs' and timing errors that will affect the accuracy (nearness to the true value) and precision (consistency of measurement) of timestamps. Example measures taken to avoid errors were:

• Placing tcpdump on a separate Linux router rather than the Linux sender, to avoid CPU overload of the sender machine, which would result in packet drops by the monitor process.
• Only taking relative time measurements, thus avoiding the need to synchronize clocks.
• Not using a high-speed link, which otherwise can also lead to timestamp repetitions.
• Monitoring the CPU load (Pentium-IV in Table 1) to avoid packet drops while monitoring with tcpdump.
• Making a sanity check to ensure that all packets sent could be accounted for.

Measurement errors under Linux can still occur if the time intervals involved are too brief. In order to establish confidence in the accuracy and precision of any timestamps, tests were carried out at the monitoring points in order to find the time range that curtailed errors in the measurements. Based on test results, we found that a safe range for the experiments was for time values greater than 90–100 ms. A DAG³ card [15] with a Global Positioning System (GPS) module to create timestamps is an alternative solution that avoids tcpdump's vagaries. However, a DAG card may not be a cost-effective solution if used as part of a congestion control unit.

³ DAG is not an acronym.

2.3. Traffic generator

In this paper's methodology, video and audio traffic patterns are classified into two types, constant bit-rate (CBR) and variable bit-rate (VBR), with CBR traffic being defined as "a traffic pattern with a steady bit rate during a given time interval" and VBR as "a traffic pattern with a changing bit rate during a given time interval". For the experiments, the traffic generator components were stg and rtg, respectively, for sending and receiving traffic. Both are part of the NCTUns [29] network simulator package. NCTUns is a simulator that employs protocol stack re-entry to allow an application to be accurately emulated. As the TCP/IP protocol stack is directly incorporated into NCTUns, stg and rtg are easily transferred to work in a real network environment rather than within a simulator. The generator was modified⁴ to work on a normal Linux system (as NCTUns originally ran on the OpenBSD o.s.). One can create packet-by-packet (PbP) traffic patterns under UDP by establishing the behavior of the packet length (PL) and IPG of every packet through an input trace file.
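To make the trace-driven generation concrete, a PbP trace is simply an ordered list of per-packet (PL, IPG) pairs. The following minimal sketch writes such a trace; the two-column text layout and function name are assumptions for illustration and are not necessarily the format that stg expects.

    # Minimal sketch of a packet-by-packet trace: one (packet length, inter-packet gap)
    # pair per packet. The column layout is assumed; the real stg input format may differ.

    def write_pbp_trace(path, packets):
        """packets: iterable of (pl_bytes, ipg_seconds) tuples, one per packet to send."""
        with open(path, "w") as f:
            for pl_bytes, ipg_seconds in packets:
                f.write(f"{pl_bytes} {ipg_seconds:.6f}\n")

    # Example: a 2-minute CBR pattern of 60-byte packets sent every 1 ms
    cbr = [(60, 0.001)] * int(120 / 0.001)
    # write_pbp_trace("cbr_60B_1ms.trace", cbr)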

A fundamental requirement of PbP analysis is the ability to create extremely precise and predictable traffic patterns. Generating packets with hard deadlines requires an RT o.s. Accordingly, the Linux kernel on the testbed machines was patched with the Kansas University RT (KURT) kernel [9]. The KURT kernel modification allows event scheduling with a resolution of tens of microseconds. KURT decreases kernel latency by running Linux itself as a background process controlled by a small RT executive (RTE). The desired accuracy was obtained by running stg as a normal Linux process rather than as a specifically RT process under the control of the RTE.

Precise event scheduling was established in order to perform reliable experiments. In live applications, the packet inter-arrival time (PIAT) may vary from the desired value because of application-level scheduling, prior network jitter on previous links, and smoothing by decoder buffers. The experiments represent a ground truth, without these effects included.

⁴ The modified version can be downloaded from: http://privatewww.essex.ac.uk/~mpared/perf-tools/srtg.tgz.


2.4. Traffic metrics

For the PbP video experiments, the three metrics selected were: PL, Packet Inter-Arrival Time (PIAT)⁵ and Packet Throughput (PT). Surprisingly, PL is not widely utilized as a metric in measurement studies and traffic analysis, although this metric provides an insight into common application traffic patterns. In an encoded video streaming session, the PL varies depending on the packetization scheme employed and the headers added by the protocol used to transmit the video session. Apart from IP and UDP headers, a Real-time Transport Protocol (RTP) or equivalent header [25] is added at the application layer. The PIAT is another important performance metric when observing packet spacing during a video streaming session. The PIAT metric is one of the most sensitive to network condition changes: transmission delays, queuing delays, packet loss, packets routed by different routes, fragmentation, and other hardware and software processes involved during packet transfer. Hence, it is not common to observe regular patterns for this metric. Finally, as mentioned in Section 1, it is important [23] to define PT carefully. It represents the throughput arising from one packet. For application-level studies the PT affects a router's response and, hence, is more relevant than the available bandwidth. The PT was calculated in the following way. If a pair of packets is observed, the first packet's length is divided by the time difference between the second and first packet, i.e. by the PIAT. This can be expressed for packet number n at arrival time t_n by Eq. (1).

⁵ Note that, as elsewhere in the literature, PIAT refers to the desired PIAT as generated at the video packet source, and is synonymous with IPG.

PT_n = PL_n / (t_{n+1} − t_n) = PL_n / PIAT_n,   (1)

in which the {t_i}, i = 1, 2, ..., n, are arrival times at the receiver. Again, although used in Paxson's well-known tcptrace tool [20], this metric is otherwise not common in traffic analysis.
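A compact way to compute the three metrics from a receiver-side trace, following Eq. (1), is sketched below; the (timestamp, length) record layout is an assumption, since in the experiments these values come from tcpdump.

    # Per-packet metrics following Eq. (1): PIAT_n = t_{n+1} - t_n and PT_n = PL_n / PIAT_n.
    # Input: a list of (arrival_time_seconds, packet_length_bytes) records.

    def per_packet_metrics(records):
        metrics = []
        for (t_n, pl_n), (t_next, _) in zip(records, records[1:]):
            piat = t_next - t_n                              # packet inter-arrival time (s)
            pt = pl_n / piat if piat > 0 else float("inf")   # Eq. (1), in bytes/s (x8 for bit/s)
            metrics.append({"PL": pl_n, "PIAT": piat, "PT": pt})
        return metrics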

2.5. Analysis tool

In order to analyze the tcpdump tracefiles, a specially designed tool called tcpflw was prepared. Tcpflw is applicable to UDP and not simply to TCP. Tcpflw categorizes traffic based on IP-flow characteristics, as recommended by the Internet Engineering Task Force (IETF) [34]. An IP flow is defined as a group of packets that share some or all of the following characteristics: source and destination address; source and destination ports; and protocol. Tcpflw can read tcpdump and ns-2 [4] tracefiles. Every flow is visualized linearly (by time) and by a frequency histogram. Second-order statistics are also obtained for every metric.
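The flow classification that tcpflw performs can be pictured as grouping packets on the IETF five-tuple; a minimal sketch follows, with the record field names assumed (tcpflw itself parses tcpdump and ns-2 tracefiles).

    # Group packet records into IP flows keyed by the five-tuple recommended by the IETF [34]:
    # source/destination address, source/destination port, and protocol.
    from collections import defaultdict

    def group_into_flows(packets):
        flows = defaultdict(list)
        for pkt in packets:
            key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
            flows[key].append(pkt)
        return flows  # each value is one flow's packet list, ready for PL/PIAT/PT analysis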

3. Testbed characterization

Two scenarios established, firstly, the accuracy of which the traffic generator was capable when generating packets and, secondly, the response to traffic of the Cisco routers on either side of the bottleneck link.

3.1. Traffic generator accuracy

The stg traffic generator was operated underUDP, and in tracefile mode. For this test, the twoCisco routers and the serial link of Fig. 1 wereremoved so that the Linux sender was connected tothe Linux router, which in turn was connected tothe Linux receiver over the 100Mb/s link. PIATmeasurements were compared on a normal Linuxkernel and then the KURT patched Linux kernel,becoming an RT kernel. The traffic pattern wasCBR. The PL was fixed at 60 byte (B). The trafficgenerator generated streams of 2-min duration (astream per data point in Fig. 2), with the sourcePIAT varying from 1� 10�4 to 1� 10�1 s. There-fore, each of the streams resulted in a minimum of1200 packets transmitted. (Note also that theestimated plot in Fig. 2 is the ideal measuredPIAT.) From Fig. 2, observe that the PIATmeasured for a normal kernel is a constant valueof 0.02 s for any value fed into the traffic generatorless than 0.02 s. This implies that even if the trafficgenerator is instructed to deliver packets with (say)a 0.01 PIAT it will only be able to send packets at0.02 s. The RT kernel improves the accuracy andstability of generated UDP CBR traffic. Detailedanalysis shows that, over acceptable PIAT measure-ments, the error ranged from 0.15% to 13% for theRT kernel, while for equivalent measurements withthe normal kernel the error is considerably greater.Hence, the RT kernel was employed for the videoexperiments. Further detailed analysis of the beha-vior of the RT kernel and stg can be found in [17].
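The percentage errors quoted above can be read as the relative deviation of the measured mean PIAT from the requested source PIAT; the exact definition is not stated in the paper, so this is one reasonable reading:

error = |mean(PIAT_measured) − PIAT_requested| / PIAT_requested × 100%.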

Fig. 2. PIAT replication by the traffic generator using a normal and an RT kernel for measured (m) and estimated (e) PIAT, with limits to the resolution of each kernel indicated.


3.2. Router response

The response of the two routers to injected traffic in the testbed network of Fig. 1 will affect the measurements. In the testbed, Router A works as a traffic shaper, receiving at most 100 Mb/s at its Ethernet interface and then reducing that traffic to 2 Mb/s at the serial interface. In order to reduce the rate, the router must drop any excessive packets, usually at its output queue, as this is where the bandwidth constriction occurs. The CPU processing load was recorded for both routers by issuing the 'show processes cpu' command in the router configuration and taking only the 'CPU utilization every minute' reading. The ideal setting for this analysis might be an average over a period shorter than 1 min, to increase the measurement resolution. However, more frequent timings actually put more stress upon the routers. The desired outcomes were: the traffic conditions under which the router becomes unstable; and the best packet size for Internet applications based on router response to UDP traffic. The following Ethernet frame sizes were generated: 65, 90, 130, 1200 and 1500 B. Thirty streams of 2-min duration were generated for each frame/packet size. Each stream had a constant PIAT. Then, the range 1 × 10⁻⁴ to 1 × 10⁻¹ s was divided into equal portions across the 30 streams for each frame/packet size. For clarity, the 30 data points are not marked on the plots of Fig. 3. In fact, measurements were taken of the packet rate by simply counting packets over the 2-min duration, a procedure that increases accuracy. The CPU load reading was taken over the middle 1 min, ensuring that sufficient packets had passed through the router's buffers in the initial 30 s. If measurement metrics other than packet rate are employed, a misleading impression results, as was analyzed in [18].

In Fig. 3, the processing load at the router sharply increases when the arrival rate is in the region of 4000 packet/s (PIAT = 0.00025 s), for all PLs⁶ from 65 to 1500 B. Based on this result, it appears that the CPU load is largely dependent upon the packet rate and not on the PL. After the 4000 packet/s break point the router behaves erratically. Apparently the CPU load decreases with a high packet arrival rate. However, this is not the case, as the router is under such stress that CPU performance reporting becomes erroneous. Other symptoms of this breakdown are reported 'failure on the serial link' and other alarms. Further characterization of the router behavior is restricted, as Cisco's IOS o.s. is proprietary software.

⁶ The number of data points for the PL of 500 B is reduced for compatibility with the later Fig. 10. No difference in behavior is masked by this change.
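Simple arithmetic on the stated sizes puts the break point in perspective: at the same 4000 packet/s threshold, the offered bit rate at the router's Ethernet interface differs by a factor of about 23 between the smallest and largest frames tested, yet the response is the same, underlining that packet rate rather than bit rate drives the CPU load:

4000 packet/s × 65 B × 8 bit/B ≈ 2.1 Mb/s,  whereas  4000 packet/s × 1500 B × 8 bit/B = 48 Mb/s.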

Fig. 3. Router A packet rate response with different PLs.

Fig. 4. Packet loss with increasing packet rate.

The queuing policy in Router A of Fig. 1 is First-In-First-Out (FIFO), or drop tail, which means that when the output queue is full the router will become busy discharging packets. Therefore, small packet bursts can trigger the same response as that created by a continuous rate of around 4000 packet/s and above. Packet loss is largely independent of CPU load since, as Fig. 4 illustrates, the loss rates increase linearly for a given PL. (At the resolution of Fig. 4, the plot for 1200 B is superimposed on that for a PL of 1500 B.)

It is likely that Cisco 2600 routers are provisioned to cope with Voice-over-IP (VoIP) traffic. However, the 65 B frame size plot in Fig. 3 has a smaller payload, discounting headers (IP/UDP/RTP: 20+8+12 = 40 B), than a typical VoIP payload (30 B). Therefore, in practice the smaller frame size is unlikely to occur, and certainly will not normally occur for video streams, except for fragmented packets or feedback messages.

In summary, the CPU load response of typical (Cisco 2600) Internet routers was measured to determine how bit rate, PL, and packet rate affect the router's response. In making these observations, no special weakness of Cisco 2600 routers is implied, as these routers are perfectly suitable for their tasks. The traffic characteristics determine the router's CPU load, and hence:

• The router CPU load response is largely related to the packet rate and not to the PL or bit rate, and it is this aggregate packet rate that should be checked in network management. For a given PL, packet loss is largely independent of CPU load, being linearly related to packet rate.
• The recorded CPU load response may be symptomatic of a general processing bottleneck, which may or may not be attributed to other sub-processors such as the serial interface sub-processor.
• In the experiments, after the 4000 packet/s point the router became unstable for the default router configuration in use. However, practical video streaming applications are unlikely to require a sustained rate of 4000 packet/s or above, although small packet bursts may approach this rate.
• The best traffic conditions were found when the PL was larger, up to the PL of standard Ethernet frames (1500 B), because larger packets require smaller packet rates to transmit the same data, which application programmers should bear in mind.

4. PbP analysis applied to video streaming

The packetization method used to stream video over the Internet plays a vital role in controlling packet loss and, hence, received video quality. This in turn will be affected by the likely router response. Some studies of packetization schemes for the H.263+ video codec, for example [14], tend to assume the one-slice-per-packet recommendation contained in RFC 2429 Section 3 [3]. Similarly, in [6] a single spatial row of macro-blocks or Group of Blocks (GOB) is assigned per packet, when the optional H.263 Annex K slice-structured mode is not applied. We have set a slice to correspond to a GOB, which is similar to the MPEG-2 definition of a slice. However, RFC 2429 points out the possibility of rectangular slices and other arrangements to aid error concealment [30]. In [30], all even GOBs and all odd GOBs are packed into two different packets (called slice interleaving) at QCIF resolution. We have assumed a simple (perhaps oversimplified) packetization strategy, but the findings could be equally well applied to more sophisticated strategies. Although application of multi-slice packetization may appear to be an intuitive improvement (as it reduces header overhead), because of the possibility of packet loss bursts there is uncertainty as to the relative advantages of one scheme or another. Although not explored in this paper, how to handle the case in which one slice (or two slices, and so on) exceeds the maximum frame size is an issue. The loss of part of a slice will nullify the successful reception of the other part. In [13], the burst length is also identified (for the H.264 codec) as a source of degradation, as much as the average packet loss rate.

In this case study, a VBR H.263+ coded video sequence represented the test video stream. Every CIF frame was split into the usual 18 macro-block row-wise slices, to prevent the propagation of channel errors within a picture by providing synchronization markers, and was then transmitted using one or two slices per packet. If slices were to be split between packets, then the presence of the slice header in one of the packets and the use of variable length coding would cause more data to be lost than is present in any single packet. The method of delivery was varied, either per-frame packet bursts or a uniform (constant) IPG.

4.1. Video characteristics

Table 2 shows the test video characteristics, which was an 'Interview' recording. This recording is a 'head and shoulders' video sequence in CIF format that results in suitable data for the desired packetization lengths without causing packet fragmentation. The frame rate was 30 frame/s, resulting in a 1-slice scheme generating a mean of 540 packet/s and a 2-slice scheme generating a mean of 270 packet/s. Although the mean rate is below the maximum rate in Fig. 3 of 4000 packet/s, nevertheless, because of the burstiness of the source, small packet bursts easily exceed that rate. For example, in Fig. 5, for frame 298 of the sequence, an instantaneous rate of 115,384 packet/s occurs. A 17-B header was also added to each packet to keep track of the frame sequence number, media type, frame number, packet number and timestamp. All these fields are used to reconstruct the video at the receiver side. (An RTP header, which serves a similar purpose although with reduced functionality, would be 12 B in size. Cisco 2600 series routers do not support discriminatory treatment of RTP against UDP, although higher-end Cisco routers do, as do some Ethernet drivers which perform traffic analysis.) The 10-frame refresh period implies an Intra (I) picture inserted into the Predicted (P) pictures. No B-pictures were used in this experiment.

Table 2
'Interview' encoded video stream characteristics

Average bit-rate (kb/s): 187
Frame size (CIF): 352 × 288
Frame rate (frame/s): 30
Video duration (s): 60
Intra refresh period (frames): 10

Fig. 5. Illustrative packet burst showing timings and packet lengths.

Figs. 6(a) and (b) show the PL frequency distributions for, respectively, the one- and two-slice schemes (as taken from encoder packet header information). (The 'Ethernet' bars are simply offset by 59 B, representing the extra UDP and frame header overhead.)

Two delivery techniques were applied in the experiments: (1) Uniform: an IPG of 1/540 s for all packets in a frame, and (2) Burst: an IPG of 1/30 s. In order to test video delivery under difficult conditions, for all experiments in this section we added background traffic at 1.8 Mb/s with a normal probability density function (pdf) for the PL (mean PL 1000 B, with a standard deviation of 100 B) and a constant IPG of 0.004444 s. In Figs. 7(a) and (b), observe the markedly different PL patterns between the two schemes. The larger packets (against the y-axis in Fig. 7(b)) are caused by the leading I-picture. The video statistics analyzed by picture type for the different experiments are shown in Table 3.
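A minimal sketch of how such per-packet schedules can be derived from per-frame slice sizes is given below. The function names, the trace representation, and the assumption that burst delivery sends a frame's packets back-to-back within the 1/30 s frame period are mine rather than taken from the paper; the header figures come from Section 4.1 and the 59 B offset noted above.

    # Sketch: turn per-frame slice sizes into a packet-by-packet (PL, IPG) schedule
    # for the two delivery techniques described in the text.

    APP_HDR = 17        # application header carried in each packet (B)
    WIRE_OVERHEAD = 42  # Ethernet(14) + IP(20) + UDP(8) headers (B); 17 + 42 = 59 B offset

    def packetize(frame_slices, slices_per_packet=2):
        """Group each frame's slice sizes (bytes) into packet payloads."""
        frames = []
        for slices in frame_slices:
            packets = []
            for i in range(0, len(slices), slices_per_packet):
                packets.append(sum(slices[i:i + slices_per_packet]) + APP_HDR)
            frames.append(packets)
        return frames

    def schedule(frames, mode, frame_period=1.0 / 30):
        """Return a list of (packet_length, ipg_seconds) pairs."""
        trace = []
        for packets in frames:
            if mode == "uniform":
                ipg = frame_period / len(packets)     # e.g. 1/540 s for 18 one-slice packets
                trace += [(pl, ipg) for pl in packets]
            elif mode == "burst":
                # assumed: a frame's packets go back-to-back, then the frame period elapses
                trace += [(pl, 0.0) for pl in packets[:-1]]
                trace.append((packets[-1], frame_period))
        return trace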

4.2. Measurement results

Statistics were collected at the network level, PbP with tcpdump and tcpflw, and at the video level, with the decoder and encoder information. It is of interest to observe how packet loss can affect objective video quality, the luminance peak signal-to-noise ratio (PSNR) taken on a frame-by-frame basis, comparing the source frame with the received decoded frame. Table 4 presents packet losses for the test Interview video analyzed by picture type. There was at the very least a twofold reduction in total packet losses when using the two-slice rather than the one-slice scheme. In part, this was due to the reduced header overhead, illustrated by the constant offset between the measured one- and two-slice scheme bit rates in Fig. 8 for the uniform delivery method. Table 4 also shows that the uniform method reduced packet losses, by 44% or 58%, depending on the packetization scheme. We postulate that this effect occurs due to router queue behavior when faced with a sudden rush of packets. Notice that the more important I-pictures are more favorably treated by the two-slice scheme.
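For reference, the per-frame luminance PSNR can be computed as in the following sketch; the NumPy arrays and the 8-bit peak value are assumptions, since the paper does not give an implementation.

    # Per-frame luminance PSNR between a source frame and the received decoded frame.
    import numpy as np

    def luma_psnr(source_y, decoded_y, peak=255.0):
        mse = np.mean((source_y.astype(np.float64) - decoded_y.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10.0 * np.log10(peak ** 2 / mse)

    # Frame-by-frame comparison, as plotted in Fig. 9:
    # psnr_curve = [luma_psnr(src, dec) for src, dec in zip(source_frames, decoded_frames)]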

Fig. 6. PL frequency distribution comparison for the (a) 1-slice and (b) 2-slice schemes, as at the encoder and as output to the network.

Now compare the best- and the worst-case performance for this video communication. Fig. 9 plots the PSNR on a frame-by-frame basis of the worst (one-slice burst) and best (two-slice uniform) cases in terms of total packet loss. The plot marked 'Source' is the PSNR of the source video clip without any loss but after passing through the codec. The best-case plot consistently tracks the source PSNR curve. The behavior of the one-slice burst PSNR plot is erratic and most of the time remains below the best-case plot, in some frames being 20 dB below the source PSNR. At these PSNR levels, the one-slice video would be unwatchable. Appropriate error resilience techniques for H.263 [32] streams were applied, namely selection of the H.263 Annex K slice-structured mode, the Annex R independent segment (slice) decoding mode, and the Annex N reference picture selection mode. However, it is possible that appropriate error concealment techniques (not present in H.263), when applied, would significantly improve the quality of the one-slice plot.

4.3. Further measurement results

To check the impact of changing the router queue length (QL) and of differing background traffic, a further set of experiments was conducted. Of practical necessity, the experimental setup varied, although the measurement methodology, equipment, and compressed video source remained the same. Linux kernel version 2.6.18 was installed, allowing timing by means of the Hrtimer from Linutronix [2], which is a successor to the UTIME facility employed by KURT. As in some of the experiments higher background traffic rates are generated, background traffic generation was delegated to a second Linux sender, allowing the original Linux sender to specialize in video traffic generation. The Cisco routers' o.s. was upgraded to IOS C2600-I-M, version 12.2(13a), release fc2.

Fig. 7. PL distribution in time for (a) 1-slice uniform and (b) burst delivery schemes.

Table 3
Slice structure characteristics by I- and P-pictures

                    1-Slice I   1-Slice P   2-Slice I   2-Slice P
Total slices (n)    3240        29,160      1620        14,580
Min. size (B)       159         6           345         13
Max. size (B)       750         178         1123        346
Mean size (B)       281.5       16.9        563.1       33.7
Std. dev. (B)       89.3        18.9        163.2       36.7
Median (B)          266         11          544         23

Fig. 8. Bandwidth comparison (measurements at 1 s intervals) for the 1-slice and 2-slice schemes.

Table 4
Packet loss numbers by slice scheme, delivery method, and picture type

                        1-Slice Burst   1-Slice Uniform   2-Slice Burst   2-Slice Uniform
                        I       P       I       P         I       P       I       P
Packet loss (PLoss)     538     7992    304     4456      259     2946    41      1255
PLoss (%)               16.7    27.4    9.4     15.3      16.0    20.2    2.5     8.6

Fig. 9. PSNR comparison for the worst- and best-case packet loss schemes, over the range of frame numbers 800–900.

Fig. 10. Router A packet rate response with PL = 500 B and differing QL.

Fig. 11. Packet loss with increasing packet rate and differing QL.

In Fig. 10, the same experiment as recorded in Fig. 3 was repeated⁷ but with a fixed PL of 500 B. This PL is close to the Maximum Transmission Unit (MTU) that must be supported by all routers without subsequent fragmentation. In Fig. 10, the default buffer size refers to the Cisco default setting of Table 1. The buffer size was then stepped at intervals of 100 packets until the pattern became apparent. As the buffer size is increased, the bit rate at which reporting becomes unstable (see Section 3.2) is lowered. The onset of this behavior also occurs at a lower recorded CPU load. We surmise that management of the buffer places a greater load on the CPU itself or on a sub-processor. Fig. 11 demonstrates that the packet loss rate is largely independent of buffer size and CPU load, as the QL only has a temporary effect in stemming packet losses. In Fig. 11, the resolution of the plot does not show small variations in loss numbers.

⁷ The number of data points is less than that of Fig. 3, but the essential response pattern is retained.

Table 5
Packet loss by queue length with 1.8 Mb/s normal pdf background traffic

QL (packets)                      75    100   200   300   400   500   600   700   800   900   1000
1-Slice burst PLoss (packets)     6845  5236  4011  4400  3894  3805  3609  3337  3387  2819  2938
2-Slice uniform PLoss (packets)   657   629   674   425   472   373   313   291   224   151   113

Table 6
Packet loss with normal and Pareto background traffic pdfs at various mean rates, with QL = 200

Background traffic               Normal 1.5 Mb/s   Normal 1.8 Mb/s   Normal 1.9 Mb/s   Pareto 1.5 Mb/s   Pareto 1.8 Mb/s   Pareto 1.9 Mb/s
1-Slice burst PLoss (packets)    1                 3880              6593              2584              8194              8096
2-Slice uniform PLoss (packets)  0                 608               1484              269               1688              1790

Table 5 records the packet losses recorded when altering both the input and output buffers (on both Cisco routers) to the given QL. Prior experiments established that altering the input buffer size (Table 1) alone did not impact on packet loss numbers. The same background traffic as in Section 4.2, 1.8 Mb/s with a normal pdf, was injected alongside the video stream. Setting the QLs to 75 packets equates to the experiments in Section 4.2, and it will be seen from Table 5 that the packet loss numbers are somewhat reduced compared with Table 4's figures, resulting from the changes described in the previous paragraph. A check with the input QL set to 75 and the output set to 100, exactly as in Table 1, did not appreciably alter the loss numbers. Clearly, when the QL is increased there is a decreasing trend in packet losses for both packetization methods.

In Table 6, for a QL of 200 packets, three different bit rates and two different background traffic densities are selected. In other experiments, the trend of Table 6's results was repeated for other QLs. When aggregated with the mean input video rate of 0.187 Mb/s (Table 2), mean background traffic of 1.8 Mb/s closely approaches the bottleneck link capacity of 2.0 Mb/s, whereas injecting background traffic of 1.5 Mb/s does not. As an alternative to a normal pdf of PLs, a Pareto pdf with shape factor a = 1.3 and location k = 1 was applied to the PIATs, with a linear scaling so that the mean corresponded to the desired mean bit rates. The intention of applying a Pareto pdf to PIATs was to judge the effect of a different packet arrival pattern upon the router. No claim is made that this distribution mimics the effect of typical Web server traffic, for which an on-off model with a Pareto distribution of burst length has been applied [2].
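A sketch of such a background-traffic pattern generator is given below: Pareto (shape a = 1.3, location k = 1) inter-packet gaps, linearly rescaled so that the mean gap yields a chosen mean bit rate for the given mean packet length. Rescaling by the empirical sample mean is an assumption, since the paper only states that a linear scaling to the desired mean bit rate was used; the function name is also mine, the actual generator being stg driven by a trace file.

    import numpy as np

    def pareto_ipgs(n_packets, mean_bitrate_bps, mean_pl_bytes=1000.0, shape=1.3, k=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Classical Pareto with scale k: k * (1 + Lomax sample)
        raw = k * (1.0 + rng.pareto(shape, size=n_packets))
        # Linear scaling: target mean IPG follows from the desired mean bit rate
        target_mean_ipg = (mean_pl_bytes * 8.0) / mean_bitrate_bps   # seconds per packet
        return raw * (target_mean_ipg / raw.mean())

    # e.g. inter-packet gaps for 1.8 Mb/s mean background traffic:
    # ipgs = pareto_ipgs(100000, 1.8e6)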

From Table 6, a normal pdf background at a mean rate of 1.5 Mb/s results in just one packet loss with one slice per packet. Packet losses for the normal pdf background at a mean rate of 1.8 Mb/s differ somewhat from those in Table 4, as is usual, due to system effects such as process scheduling. The main effect of introducing the other density is that packet losses are much greater, including those for a mean rate of 1.5 Mb/s, which indicates the burstiness of the background traffic source. Burstiness also affects the relative packet losses at rates of 1.8 and 1.9 Mb/s, which are similar in the presence of Pareto background traffic, and in fact, for 1-slice packetization and burst delivery, actually result in more losses at the lower cross-traffic rate.

The effect of different background traffic rates on the same sequence as in Fig. 9, with the same default buffer setting from Table 1, is shown in Fig. 12 for a 1-slice per packet burst delivery method. A rate of 1.5 Mb/s, normally distributed, does not stress the router, and consequently the PSNR is close to that of the original encoded video stream. The result of the changes noted in the first paragraph of this subsection is an improvement in the PSNR for the 1.8 Mbit/s background traffic rate. The results also differ because system 'noise' affecting the scheduling times of both video source and background traffic packets means that, unlike in a simulation, the same burst patterns are not repeated across successive runs. However, the PSNR still remains relatively low for much of the sample sequence at a rate of 1.8 Mb/s, and going beyond this rate to 1.9 Mb/s drastically reduces quality during this particular sequence (visual inspection showed that, coincidentally, degradation was particularly marked over these frames with this background rate). Fig. 13 illustrates the impact on the received PSNR of background traffic with the Pareto pdf at various input rates. As might be expected from the similarity in packet losses, there is no clear distinction between the PSNRs in the face of the two higher background rates. Comparing the effect of 1.8 Mb/s background traffic between the two background traffic pdfs, the PSNR is lower for a Pareto density background for this configuration.

Fig. 12. PSNR comparison for differing background traffic rates with a normal pdf, with 1-slice per packet and burst delivery.

Fig. 13. PSNR comparison for differing background traffic rates with a Pareto pdf, with 1-slice per packet and burst delivery.


4.4. Discussion

H.264/AVC is the ITU's most recent video codec, and its picture segmentation scheme (slicing) [31] builds upon the earlier H.263+ (and H.263++) standard. In H.264, a slice is normally formed from macro-blocks in raster scan order, without formal restriction on the number of macro-blocks in a slice. Additionally, flexible macro-block ordering (FMO), in the interests of error concealment, is possible. Slice interleaving is also possible. In [31], both FMO and slice interleaving are experimented with, although the impact on delay is not formally analyzed, and this might affect conversational applications such as videotelephony and video conferencing. Ideally, in H.264 a slice should match the MTU size, but the end-to-end MTU is very difficult to find [31] and in the case of wireless networks could be as low as 255 B.

5. Conclusion

This paper has presented a PbP measurement methodology, describing key metrics, packet capture and analysis tools, and a network testbed configuration intended to model tight link responses. While individual findings in this paper have been anticipated in other works, the whole has not been previously collected into a methodology for video stream measurement and analysis. The single message that emerges from this study is that selection of a packetization scheme has a considerable impact on delivered PSNR, which is best revealed by a physical testbed and a precise measurement methodology.

Tests indicate that routers can become unreliable if the packet arrival rate is too great. One consequence is that once the critical rate is reached, measurements of the throughput also become unreliable, as the processor workload is too great. There are practical implications as well, for video streaming application programmers, who should seek to reduce the output packet rate, and for traffic managers, who should take steps to avoid excessive packet rates, for example by setting up additional routers. The measurement accuracy of experiments is not assured if the o.s. of the host machine is unable to support packet generation with the desired resolution. This result has implications for those measurement studies conducted without an RT o.s.

In a case study, a burst pattern increased the probability of packet loss, even if the burst was short, with just 9 or 18 packets (depending on packetization method) along with any background traffic packets. If the analysis were simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be visible. The case study indicated that a two-slice packetization scheme results in a significant improvement in PSNR over the conventional one-slice scheme for compressed H.263+ video at the bit-rates tested. It also reinforced the need to avoid short frame bursts if consistently high-quality video is to be delivered. Extensions to multiple-slice packing remain to be explored.

References

[1] A.K. Aggrawala, D. Sanghi, Network dynamics: an experimental study of the Internet, in: IEEE Conference on Global Communication (GLOBECOM), 1992, pp. 782–786.
[2] P. Barford, M. Crovella, Generating representative Web workloads for network and server performance evaluation, in: ACM Sigmetrics/Performance, July 1998, pp. 151–160.
[3] C. Bormann, et al., RFC 2429: RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+), 1998.
[4] L. Breslau, K. Estrin, D. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, X. Ya, H. Yu, Advances in network simulation, IEEE Comput. 33 (5) (2000) 59–67.
[5] Cisco Systems, Inc., LAN Design Guide for the Midmarket, San Jose, CA, 2000.
[6] G. Cote, F. Kossentini, Optimal intra coding of macro-blocks for robust (H.263) video communication over the Internet, Image Commun. 15 (1) (1999) 25–34.
[7] C. Dovrolis, P. Ramanathan, D. Moore, Packet-dispersion techniques and a capacity-estimation methodology, IEEE/ACM Trans. Networking 12 (6) (2004) 963–977.
[8] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, C. Diot, Packet-level traffic measurements from the Sprint IP backbone, IEEE Network 17 (6) (2003) 6–17.
[9] R. Hill, B. Srinivasan, S. Pather, D. Niehaus, Temporal resolution and real-time extensions to Linux, Technical Report ITTC-FY98-TR-11510-03, University of Kansas, 1998.
[10] G. Iannacone, M. May, C. Diot, Aggregate traffic performance with active queue management and drop from tail, Comput. Commun. Rev. 31 (3) (2001) 4–13.
[11] S. Kalidindi, M.J. Zekauska, Surveyor: an infrastructure for Internet performance measurements, in: Proceedings of the INET Conference, June 1999.
[12] S. Kieffer, W. Spicer, A. Schmidt, S. Lyszyk, Planning a Data Center, Technical Report, Network System Architects, Inc., Denver, CO, 2003.
[13] Y.J. Liang, J.G. Apostolopoulos, B. Girod, Analysis of packet loss for compressed video: does burst-length matter?, ICASSP V (2001) 684–687.
[14] E. Masala, H. Yuang, K. Rose, J.C. De Martin, Rate-distortion optimized slicing, packetization and coding for error resilient video transmission, in: Data Compression Conference, 2004, pp. 182–191.
[15] J. Micheel, I. Graham, N. Brownlee, The Auckland data set: an access link observed, in: Proceedings of the 14th ITC Specialist Seminar, 2000.
[16] A. Odlyzko, Data networks are lightly utilized, and will stay that way, Technical Report, AT&T Labs, 1998.
[17] M. Paredes Farrera, M. Fleury, M. Ghanbari, Precision and accuracy of network traffic generators for packet-by-packet traffic analysis, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 32–37.
[18] M. Paredes Farrera, M. Fleury, M. Ghanbari, Router response to traffic at a bottleneck link, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 38–46.
[19] V. Paxson, Measurement and analysis of end-to-end Internet dynamics, Ph.D. Dissertation, University of California, Berkeley, 1997.
[20] V. Paxson, Automated packet trace analysis of TCP implementations, in: Proceedings of ACM SIGCOMM '97, France, September 1997, pp. 167–179.
[21] V.J. Rebeiro, R.H. Riedl, R.G. Baraniuk, Locating available bandwidth bottlenecks, IEEE Internet Comput. 8 (5) (2004) 34–41.
[22] R. Rejaie, M. Handley, D. Estrin, RAP: an end-to-end rate-based congestion control mechanism for realtime streams in the Internet, in: IEEE INFOCOM '99, vol. 3, 1999, pp. 1337–1345.
[23] R. Rejaie, On integration of congestion control with Internet streaming applications, in: Proceedings of the Packet Video Workshop, April 2003.
[24] G. Sackett, Cisco Router Handbook, second ed., McGraw-Hill, New York, 2000.
[25] H. Schulzrinne, IP networks, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 81–138.
[26] D. Sisalem, A. Wolisz, LDA+ TCP-friendly adaptation: a measurement and comparison study, in: 10th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2000.
[27] Tcpdump Manual Pages, available from <http://www.tcpdump.org/tcpdump_man.html>.
[28] D. Turaga, T. Chen, Fundamentals of video compression: H.263 as an example, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 3–33.
[29] S.Y. Wang, C.L. Chou, C.H. Huang, Z.M. Yang, C.C. Chiou, C.C. Lin, The design and implementation of the NCTUns 1.0 network simulator, Comput. Networks 42 (2) (2003) 175–197.
[30] S. Wenger, G. Cote, Using RFC 2429 and H.263+ at low to medium bit-rates for low-latency applications, in: Packet Video Workshop, New York, April 1999.
[31] S. Wenger, H.264/AVC over IP, IEEE Trans. Circuits Systems Video Technol. 13 (7) (July 2003) 645–655.
[32] S. Wenger, G. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Trans. Circuits Systems Video Technol. 8 (7) (November 1998) 867–877.
[33] M. Yajnik, J. Kurose, D. Towsley, Packet loss correlation in the MBone multicast network, in: Proceedings of the Global Internet Conference, 1996.
[34] T. Zseby, J. Quittek, Standardizing IP traffic flow measurement at the IETF, in: Proceedings of the Second SCAMPI Workshop, 2003.