
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Investigation into low latency live video streaming performance of WebRTC

JAKOB TIDESTRÖM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Investigation into low latency live video streaming performance of WebRTC

Author: Jakob Tideström

Supervisor: Jens Edlund

Examiner: Sten Ternström

Date: 27 March 2019


Abstract

As WebRTC is intended for peer-to-peer real time communications, it contains the capability for

streaming video at low latencies. This thesis leverages this ability to stream live video footage in

a client-server scenario. Using a local broadcaster, server, and client setup, a static video file is

streamed as live footage. The performance is compared with contemporary live streaming

techniques, HTTP Live Streaming and Dynamic Adaptive Streaming over HTTP, streaming the

same content. It is determined that WebRTC achieves lower latencies than both techniques.

However, without comparatively extensive fine tuning, the quality of the live feed suffers.


Sammanfattning

Since WebRTC is intended for peer-to-peer real time communication, it has the ability to stream video at low latency. This thesis uses that ability to stream live video in a client-server scenario. With a setup comprising a local broadcaster, a server, and a client, a static video file is streamed as live video. The performance is compared with how the contemporary live streaming techniques HTTP Live Streaming and Dynamic Adaptive Streaming over HTTP stream the same content. The conclusion is that WebRTC manages to achieve lower latency than both of the other techniques, but that without a relatively large amount of fine tuning, the quality of the stream suffers.


Table of Contents

1 Introduction
1.1 Objective
1.2 Delimitations
1.3 Structure
2 Background
2.1 Adaptive Bitrate Streaming
2.1.1 HTTP Live Streaming
2.1.2 Dynamic Adaptive Streaming over HTTP
2.2 WebRTC
3 Related Work
4 Method
4.1 Procedure
4.1.1 Browser and Video Player
4.1.2 Throttling
4.1.3 Bitrate
4.2 Measured Parameters
4.2.1 Latency
4.2.2 Bandwidth
4.2.3 Bitrate
4.2.4 Dropped Frames
4.3 Experimental Setup
4.3.1 Encoder Specifications
4.3.2 Server Specifications
4.3.3 Client Specifications
4.3.4 Source Video
4.3.5 Encoding
4.3.6 Streaming Media Server
4.3.7 Browsers
4.4 Experiments
4.4.1 Adaptive Bitrate Streaming
4.4.2 WebRTC
5 Results
5.1 WebRTC and ABR
5.1.1 Latency
5.1.2 Bandwidth
5.1.3 Bitrate
5.1.4 Dropped Frames
5.2 H.264 and VP8
5.2.1 Latency
5.2.2 Bandwidth
5.2.3 Bitrate
5.2.4 Dropped Frames
6 Discussion
6.1 WebRTC and ABR
6.2 H.264 and VP8
7 Conclusion
8 Future Work
Bibliography


Chapter 1

Introduction

A substantial segment of internet traffic is occupied by video streaming. This includes video on

demand where the user views prerecorded content at a time of their choosing, but also

streaming of live video. In both cases there is a need to provide media to viewers, in an

acceptable quality, on a variety of different platforms, without the viewer having to wait for the

content.

In the case of live streaming there is an additional factor that comes into play: latency. Latency

is defined as the time difference between an event being captured and the time at which it is

presented to the viewer. Latency might not be the most important factor for all live streams but

for some it is vitally important. Examples of these include sports where higher latencies make it

more likely that major incidents, such as a goal, get spoiled from outside sources, like social

media. It is also important in real time communication scenarios where latency impacts the

waiting time between responses; longer waits become increasingly jarring to the flow of conversation.

Currently, Adaptive Bitrate Streaming (ABR) is the primary technique used to stream multimedia

online, live or otherwise. However, the focus of ABR is the quality of experience, and latency

suffers as a result. There are parameters that may be tuned to achieve lower latencies but these

adjustments all come with consequences of their own.

On the other hand, the intended purpose of WebRTC is peer-to-peer real time communications

over the web. Therefore, the primary focus is on keeping the latencies low, at the cost of a

decrease in quality from not making absolutely sure everything is received.

1.1 Objective

The aim of this thesis is to investigate the performance of WebRTC when used for live video

streaming in a client-server scenario. The same video content will be live streamed to a viewer

using both WebRTC and contemporary ABR techniques, tuned for different levels of latency.

The performance will be analyzed to primarily determine how well the low latencies of WebRTC

can be leveraged in such a scenario. Secondarily, the viability of using WebRTC as an

alternative to ABR for live video streaming will also be looked into.

1.2 Delimitations

To make the findings of this thesis as widely applicable as possible, as well as to eliminate the

impact of in-house knowledge and personal skills, only existing WebRTC products will be

investigated.


There are currently four primary technologies for ABR available, but Adobe’s HTTP Dynamic Streaming and Microsoft’s Smooth Streaming both require plugins, Flash and Silverlight respectively, to run. Not only do these plugins bring potential security risks [1], but browsers

such as Chrome have discontinued their support, and they no longer work on all platforms [2],

[3]. As such, comparisons will only be made between the remaining technologies, Apple’s HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (MPEG-DASH).

1.3 Structure

The remainder of this thesis is structured as follows: first, the relevant technologies and how

they work are introduced. Next, highlights of other research done in relation to reducing the

latency of a stream are presented. Afterwards, the specific setup used in this thesis is specified

and the exact method of gathering data is elaborated upon, including the motivation behind the

choices. This is followed by the experiment results and a discussion. Conclusions that can be

drawn from the results, in terms of whether WebRTC is a viable alternative to ABR or not, are

examined next. Finally, avenues that remain to be investigated will be briefly discussed.

Page 10: Investigation into low latency live video streaming

3

Chapter 2

Background

2.1 Adaptive Bitrate Streaming

The concept behind ABR is to divide multimedia content into smaller parts, called segments.

Each segment is encoded to create multiple versions of the same segment, each version having

different properties, such as video resolution. Once a segment has been completely encoded

and is available for the client, it is advertised in what is called a manifest file.

The manifest file contains metadata about the multimedia content and provides the client with

information about, for example, in what resolutions, languages and subtitles the content is

available. Most importantly, the manifest file contains the location where the client can acquire

each segment.

When a client establishes a connection, it will request this manifest file. By analyzing factors

such as the device capabilities, the current network conditions and viewer preferences the client

determines which of the available configurations is the most suitable at the time. As the

aforementioned factors change, primarily the network conditions, the client can adjust which

segment is requested. This allows the client to adapt the quality of the stream to current

capabilities, providing the best possible experience regardless of circumstances.
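As a rough illustration of this selection step, the following sketch (not taken from any particular player; the variant structure and the 0.8 safety margin are assumptions for illustration) picks the highest advertised bitrate that fits within the measured throughput:

    // Hypothetical sketch of the client-side adaptation logic described above.
    // A real player would parse the variant list from the manifest file.
    interface Variant {
      resolution: string; // e.g. "1280x720"
      bitrate: number;    // bits per second, as advertised in the manifest
    }

    function pickVariant(variants: Variant[], measuredThroughput: number): Variant {
      // Keep a safety margin so a small throughput dip does not stall playback.
      const budget = measuredThroughput * 0.8;
      const sorted = [...variants].sort((a, b) => b.bitrate - a.bitrate);
      // Highest bitrate that fits the budget, falling back to the lowest variant.
      return sorted.find(v => v.bitrate <= budget) ?? sorted[sorted.length - 1];
    }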

One of the advantages that ABR has over its predecessors, Real-time Transport Protocol (RTP)

and Real-time Messaging Protocol (RTMP), is that it is built on top of HTTP. Since HTTP is the

foundation of the world wide web there is an extensive delivery network of servers and caches

already in place. This existing infrastructure can thus be adopted seamlessly for use with

delivering ABR segments. An additional advantage of this is that firewalls and routers are

already configured to allow HTTP traffic. Thus, ABR does not require any extra considerations

in this regard, unlike RTP and RTMP.

HTTP itself is a stateless protocol which means that the server does not need to retain any

information about previous requests. Combined with all the logic behind adapting the quality of

the stream being done by the client, the server can respond to each individual request

independently. This lowers the burden on the server, which in turn improves the scalability of

ABR as the number of clients increases.

HTTP in turn is built on the Transmission Control Protocol (TCP) which is not without its own

issues. TCP requires both parties to acknowledge the connection before the exchange of data

can begin, increasing the time required to do so. TCP also ensures that packets arrive in the

same order that they were sent so if a packet is dropped on the way, it will have to be sent

again. In addition to TCP packets thus requiring more metadata, there is also the potential of

increased bandwidth usage if the connection is poor.


Being an HTTP-based technology, it is important to note that ABR was not designed with low

latency in mind. There are tweaks one can make to an ABR stream to decrease the latency.

However, any such tweak will have an impact on other factors of the stream.

One such tweak is the duration of a segment. Due to the nature of segments, they become an

inherent source of latency. Regardless of the duration of a segment, by the time the segment is

completed, the first frame of the segment is already the same age as the duration of the

segment itself. Thus, a shorter segment duration directly decreases the latency. Shorter

segments also grant the viewer more flexibility in adapting the stream since ABR does so on a

per-segment basis.

However, decreasing the length of segments is not without its own drawbacks. Shorter segment

durations mean that the client will need to request segments from the server at a greater

frequency. This not only increases congestion on the server but also leads to more overhead

from the increased metadata required for TCP. Also, since each segment is advertised in the

manifest, shorter segments potentially mean that more segments have to be advertised,

increasing the size of the manifest file. A bigger manifest file increases the time it takes to

download it, which at the very least will increase startup latencies.

Further, decreasing the duration of a segment runs the risk of requiring more bandwidth to

transport the stream. This is because segments are themselves made up of a sequence of a

structure known as a group of pictures (GOP). In a GOP, only the first frame contains a full

image, which is also known as an intra coded frame (I-frame). This I-frame is then followed by a

combination of predictive frames (P-frame) and bi-directional frames (B-frame). Both frame

types contain only changes based on other frames, and thus require less information than an I-

frame. An I-frame indicates the beginning of a new GOP, meaning that each GOP is

independent of the others.

For ABR to work, each segment is also required to be independent. This requires the segment

duration to be divisible by the duration of the GOPs. This means that decreasing the segment

duration might necessitate shorter GOPs. In turn, this means that the video stream will contain

more I-frames, which require more bandwidth.
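To put rough numbers on this: at 30 frames per second, a 2-second GOP contains one I-frame per 60 frames, while a 1-second GOP doubles the I-frame frequency. Since an I-frame is typically several times larger than a P- or B-frame, shorter GOPs raise the overall bitrate accordingly; the exact increase depends on the content and the encoder.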

Another possible tweak to ABR latency is to decrease the length of the buffer. While this is something determined by the client, and thus not

always available for configuration, ABR stores a number of segments in advance inside a buffer.

The purpose of the buffer is to make ABR more resilient to network fluctuations: if the

connection deteriorates then there are still segments available for playback. This gives the client

time to either recover, or adapt to, the connection before interrupting the viewer's experience.

However, the downside of this is that the buffer further exacerbates the issue caused by the

segment duration. Each segment the client stores in a buffer multiplies the inherent latency that

segment duration already causes. This makes the buffer another factor that is intrinsically tied to

both latency and segment duration. The trade-off here is fairly simple, having a smaller buffer

decreases the latency but comes with a corresponding sensitivity to network fluctuations.
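To make the combined effect concrete, consider the simplification that playback only begins once the buffer holds B segments of duration d, starting from the oldest one. A segment's first frame is already d old when the segment is published, and waiting for the remaining B - 1 segments adds roughly (B - 1) · d more, so the latency floor is on the order of B · d before encoding and network delays are counted. Under that assumption, 10-second segments with a three-segment buffer imply at least around 30 seconds of latency, while 2-second segments with a single buffered segment imply around 2 seconds.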


2.1.1 HTTP Live Streaming

HLS is Apple’s implementation of ABR and the standard was initially introduced in 2009 [4]. It is the standard currently used by iOS and other Apple devices. Since their devices are a

significant segment of the ecosystem of the internet, other developers are required to also

support HLS lest they lose that chunk of the market. As such, it has emerged as a de facto standard for ABR.

Apple recently added support for the video codec HEVC to HLS [5]. However, even though this

codec claims superior compression over H.264, which previously was the only supported codec,

it is not a codec supported by WebRTC [6]. This means that for the purposes of this thesis, the

codec will be ignored so that the comparison will be more equal.

2.1.2 Dynamic Adaptive Streaming over HTTP

MPEG-DASH is the first international standard for ABR and was initially published in 2012 [7].

This standard is the result of the Moving Pictures Expert Group (MPEG) proposing the

development with the goal of unifying the disparate formats that were available at the time. The

proposal was made in 2009 and the resulting development was made in collaboration with other

groups and experts [8].

In contrast to the other ABR implementations, MPEG-DASH is agnostic in relation to the format

of the media content itself. Thus, any media format can be used in conjunction with MPEG-

DASH but no format is universally supported by all browsers. This in turn means that the media

may have to be encoded to multiple different formats to guarantee availability.


2.2 WebRTC

WebRTC is a project that was opened to the public by Google in June 2011 [9]. The goal of the

project is to enable Real Time Communication (RTC) browser applications. WebRTC provides a

set of protocols that allows the support of cross-browser communications. This

initiative is currently supported by companies such as Google, Mozilla and Opera [10].

WebRTC allows browsers to establish a connection directly with each other, without

communications necessarily going through a server. That said, there is still a need for peers to

discover each other, and negotiate a means of peer-to-peer communications. These

negotiations can in theory be done in any way deemed fit. However, in practice the peers will

probably be exchanging these messages via a signaling server.

Part of these negotiations is to determine how to circumvent the issues reintroduced by

WebRTC going back to using the User Datagram Protocol (UDP). The reason for using UDP is

that RTC is more sensitive to latency rather than packet loss. Since UDP makes no guarantees

of packets arriving in order, or at all, and sacrifices quality for performance, it is a good fit for low

latency applications.

However, losing a packet is still a loss in data and will lead to a degradation of the stream. UDP

does not resend lost packets, and implementing such a functionality would increase latency.

Instead, WebRTC can make use of something called Forward Error Correction (FEC). FEC

adds redundant data so that the receiver can detect errors and attempt to correct these without

having to request a retransmission of the data. The downside of FEC is that the number of

errors that can be detected is limited, and that it also increases the bandwidth required. However, it is

useful in situations where retransmissions are either impossible, or costly [11].
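The exact FEC scheme is negotiated between the peers and is out of scope here, but the underlying idea can be shown with a toy scheme (purely illustrative; the schemes WebRTC actually negotiates are considerably more involved): a single XOR parity packet over a group of equal-length packets lets the receiver reconstruct any one lost packet in the group.

    // Toy XOR-based FEC, for illustration only.
    function xorParity(packets: Uint8Array[]): Uint8Array {
      // Assumes all packets have equal length.
      const parity = new Uint8Array(packets[0].length);
      for (const p of packets) {
        for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
      }
      return parity;
    }

    // XOR-ing the parity with the surviving packets yields the lost packet.
    function recoverLost(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
      return xorParity([...survivors, parity]);
    }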

The underlying transport protocol for WebRTC is RTP [12], and as such it inherits the same

issues that RTP has. One such issue is the traversal of Network Address Translation (NAT) that

hides a number of local IP addresses behind one external IP address accessible to the rest of

the internet.

To solve this, WebRTC uses a technique called Interactive Connectivity Establishment (ICE).

The purpose of ICE is to find the most direct, and thus fastest, method of communication

between two peers. NAT presents a problem to achieving a direct connection and ICE attempts

to find a workaround for this.

The first issue of NAT traversal is that a client might only know their local IP address, not the

external IP address required for clients outside of the NAT to communicate with them. To

identify what their external IP address is, the first step of ICE is to use something called Session

Traversal Utilities for NAT (STUN). After sending a request to a STUN-server that is outside of

the NAT the client will receive a response containing the IP address the request to the STUN-

server originated from.


If a STUN-server is not enough and a direct connection cannot be established, there is also the

option of using the Traversal Using Relays around NAT (TURN) protocol. As the name implies,

clients can relay their traffic through a TURN-server instead of directly sending it to the peer.

While this essentially amounts to a detour, which slows down communications accordingly, it is

a necessary measure when peers cannot establish a direct connection [13].
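In the browser API, this part of the negotiation amounts to handing the peer connection a list of ICE servers to try; a minimal sketch (the server URLs and credentials below are placeholders):

    // The browser attempts direct and STUN-derived routes first and only
    // falls back to relaying through the TURN server if those fail.
    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: "stun:stun.example.org:3478" },
        {
          urls: "turn:turn.example.org:3478",
          username: "user",       // placeholder credentials
          credential: "secret",
        },
      ],
    });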

Once a peer connection has been established, the peers have two methods of communication.

The first method is via a media stream which is the built-in method for transferring video and

audio to the other peer [14]. The second method is by use of a data channel, which is a more

general component of WebRTC and allows for arbitrary data transfers between peers. This data

could be anything from text, a file or even media. Due to the arbitrary nature of the data

channel, sending and receiving must be manually implemented. In return it grants greater

control over the procedure.
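Both methods are exposed on the same peer connection object; a sketch of the receiving side (signaling omitted, and the handlers are illustrative):

    const pc = new RTCPeerConnection();

    // Method 1: the built-in media stream, attached to a <video> element.
    pc.ontrack = (event: RTCTrackEvent) => {
      const video = document.querySelector("video")!;
      video.srcObject = event.streams[0];
    };

    // Method 2: a data channel carrying arbitrary, manually handled data.
    pc.ondatachannel = (event: RTCDataChannelEvent) => {
      event.channel.onmessage = (msg) => {
        // Application-defined: could be text, file chunks, or media segments.
        console.log("received", msg.data);
      };
    };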


Chapter 3

Related Work

While WebRTC goes back to using UDP to achieve low latencies, other ways to achieve lower

latencies have also been explored.

Viswanathan Swaminathan and Sheng Wei were able to decouple the relationship between segment

duration and latency. They achieved this using HTTP chunked encoding which allows segments to

be further divided into fragments. This approach allows the client to request a segment before it is

finished and receive it as it is being transcoded. The client will receive a partial response, which it

can consume immediately, while the server continues pushing the fragments as they get published.

This way, the latency can be reduced to below the segment duration, without the corresponding increase in requests [15].
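On the client side, the effect of chunked encoding can be consumed with APIs that expose a response incrementally; a sketch using the browser fetch API (the URL handling and the hand-off to a decoder are assumptions, not details from the cited work):

    // Consume a segment while the server is still producing it: the response
    // body arrives as a stream of chunks rather than one complete download.
    async function consumeGrowingSegment(url: string, onChunk: (c: Uint8Array) => void) {
      const response = await fetch(url);
      const reader = response.body!.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;            // the server closed the chunked response
        if (value) onChunk(value);  // hand each fragment onwards immediately
      }
    }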

The team at Periscope also took this approach of using chunked encoding to improve on HLS itself.

They called this solution Low-Latency HLS and the result was more stable than normal HLS while

still allowing for standard HLS clients to partake [16].

Since the WebRTC data channel allows for arbitrary data to be sent there have been groups,

such as Shuai Zhao, Zhu Li, and Deep Medhi who have had success in combining ABR with

WebRTC. This is accomplished by using WebRTC as the transport protocol to send segments

to peers. The group specifically did so in a bid to avoid common problems with TCP, such as

TCP window collapse, and achieve a more stable transfer of data. Using WebRTC also allowed

them to implement a push-based solution where the client would not need to request each

segment individually [17].


Chapter 4

Method

4.1 Procedure

This section will cover the motivation and reasons behind the final structure and scope of this

thesis.

4.1.1 Browser and Video Player

In the pursuit of homogeneity in testing the disparate protocols, a number of web browser and

video player combinations were investigated. Those that had the relevant data readily available

were analysed in more detail. The combinations, and the protocols they support, are listed in table

1 below.

In this table, “Wowza" means the WebRTC player provided by Wowza itself. They provide players for the other protocols as well. However, these players are very basic, and provide no

additional statistics, and were thus ignored. “Eyevinn” is a player created by Eyevinn Technology [18].

Browser   Player         WebRTC   DASH   HLS
Chrome    dash.js        NS       W      NS
Chrome    hls.js         NS       NS     W
Chrome    Shaka Player   NS       W      NS
Chrome    Wowza          S        NS     NS
Chrome    Eyevinn        NS       W      W
Firefox   dash.js        NS       W      NS
Firefox   hls.js         NS       NS     W
Firefox   Shaka Player   NS       W      NS
Firefox   Wowza          W        NS     NS
Firefox   Eyevinn        NS       W      W
Edge      Shaka Player   NS       W      W

Table 1: Compatible combinations of browser and video player. “W” indicates that the protocol is supported and works, while “NS” means that the protocol is not supported. The sole “S” signifies that while the combination is supported, it had issues in this setup.

Since Wowza currently only supports WebRTC in Chrome and Firefox, WebRTC will be limited

to those browsers. However, for this particular setup, the Chrome implementation did not

function and was not able to receive video or audio. As the cause for this was not found, it had

to be excluded and Firefox was chosen for WebRTC by default.

Looking at ABR, there are only two players that support both DASH and HLS. The Eyevinn

player in either Chrome or Firefox, or the Shaka Player in Microsoft Edge. For the sake of

homogeneous comparisons, using the Eyevinn player in Firefox seems like the obvious choice.

However, looking at the backend of the Eyevinn player, it uses hls.js for HLS streams, and

Shaka Player for DASH streams. This presents a number of issues. The first is that there is no


guarantee that the metrics are calculated the same way in both players, which could lead to a

significant difference in output. The second is that hls.js also does not appear to provide all of

the metrics that are sought after, which Shaka Player does.

As such, the choice was made to use Shaka Player in Microsoft Edge instead, despite requiring

the use of a second browser. There is still the potential of this disparity causing differences, but

it was deemed to be a lesser risk than the use of two completely different players altogether.

4.1.2 Throttling

Originally, network throttling was planned to be included. This was for the purpose of

determining how well WebRTC is able to adapt to sudden changes in network quality, if at all.

Initially, various software solutions to this were investigated. This included software such as

NetLimiter and Charles Proxy. Unfortunately, no software was successfully able to throttle the

network in this setup. The only exception to this was Chrome DevTools, but due to the issues

covered in chapter 4.1.1 above, this was not a viable option.

While some hardware was also looked into, these options also had to be discarded due to

prohibitive costs or because they did not appear to be able to throttle the network in the manner

that was required. As such, throttling unfortunately had to be dropped from the experiments.

4.1.3 Bitrate

When it comes to streaming video, the setup process for WebRTC is less straightforward than

for ABR. Not all implementations of WebRTC support the same video profiles. A video profile

tells the application what features the decoder must support to view the stream

properly. Examples of this would be the ability to decode up to a certain resolution and frame

rate. This means that depending on the platform, the video is subjected to different constraints,

such as maximum resolution or framerate.

That streaming video is not the primary intended purpose of WebRTC is apparent in the

volatility of the setup process. While WebRTC can, overall, use the same source stream for its

output, some options for the source stream works better than others. As such, finding a

combination that works for the specific WebRTC setup in question requires more fine tuning

than ABR does.

An example of this fine tuning is the bitrate at which the video is streamed. Wowza itself

currently only supports a limit of 2 Mbps, which thus became the goal to reach. However, this

goal was met to varying degrees, depending on the codec. As the bitrate increased, playback

issues, such as stuttering, would occur more often. Therefore, the challenge was to find a

balance between a high bitrate and a good quality of experience.

The proper bitrate for each codec was determined through systematic trial and error. H.264

required the most work. Starting at a bitrate of 2 Mbps, the video stuttered to an unacceptable


degree. Decreasing the bitrate to 1.5 Mbps allowed the video to play with one or two stutters of approximately one second throughout the entire playback period. Increasing the bitrate from here quickly increased the amount of stuttering, so it was left at 1.5 Mbps.

WebRTC also has support for other video codecs, namely VP8 and VP9. To allow for a more

comprehensive investigation of the performance of WebRTC, these codecs were also

attempted. Fortunately, VP8 was able to perform acceptably, as outlined in the previous

paragraph, at 2 Mbps, requiring no further investigation. However, using the same procedure for VP9 produced less desirable results. Despite decreasing the bitrate to below 1 Mbps, at which

point the video quality began to suffer, it was not possible to avoid playback issues. Therefore,

VP9 was also discarded as an option for this setup.

4.2 Measured Parameters

As a result of these issues and compromises, the experiment parameters were determined as

the following:

4.2.1 Latency

Latency is the primary parameter of interest for this thesis. Generally speaking, latency is

merely the delay between two correlated points in a chain of events. As such, there are many

different ways to calculate latency and it may not always mean the same thing in every context.

However, for the purposes of this thesis, latency is the time difference between an event being

captured and when it is displayed to the viewer, as mentioned previously.

To standardize this measurement across technologies, an overlay of the current time is added

into the video itself. This means that no matter what may happen in the interim, once the frame

is displayed, it will be possible to determine when the frame was captured. While the clock of

the origin, the server, is readily available, the time displayed in the video is embedded and will

have to be read manually. As such, this method sacrifices granularity for fairness and control.

4.2.2 Bandwidth

Any computer or network is limited in the amount of data that can travel through it at any

one point in time. The number of transmissions, and the size of each transmission, will therefore

be a burden on any computer. The more bandwidth an individual connection consumes, the fewer

connections a server can serve simultaneously. Thus, this is an important metric to know in

evaluating the performance of each technology.

In terms of measurement, a tool called Wireshark will be used to capture all packets sent

between the server and the client. Wireshark can also produce graphs to visualize various

metrics of a connection. The raw data of the bandwidth graph provided by Wireshark will be

used to plot the graphs. This measurement can also be gathered by other means, often in the

video player itself. However, using an external tool will standardize how the bandwidth is

calculated, or estimated, across players.


4.2.3 Bitrate

Bitrate represents how many bits are processed in a period of time, generally one second. Thus,

a higher bitrate means that more data can be included, leading to higher quality video, or audio.

As such, this metric will be the primary indicator of the quality.

For ABR, the bitrates are readily available from the manifest, and the current bitrate can be read

from the segment chosen by the client. For WebRTC, this measurement is not as

straightforward. WebRTC’s getStats API exposes the number of bytes that have been received, as well as a timestamp representing when the data was sampled. Using the time differences

between samples, as well as the difference in bytes received, one can calculate the bitrate.
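A sketch of that calculation (the one-second sampling interval is an arbitrary choice, and older browsers expose mediaType instead of kind on the stats objects):

    // Estimate received video bitrate by sampling getStats() periodically and
    // dividing the byte delta by the time delta between samples.
    let lastBytes = 0;
    let lastTimestamp = 0;

    async function sampleBitrate(pc: RTCPeerConnection): Promise<number | null> {
      const report = await pc.getStats();
      let bitrate: number | null = null;
      report.forEach((stat) => {
        if (stat.type === "inbound-rtp" && stat.kind === "video") {
          if (lastTimestamp > 0) {
            const bits = (stat.bytesReceived - lastBytes) * 8;
            const seconds = (stat.timestamp - lastTimestamp) / 1000;
            bitrate = bits / seconds; // bits per second
          }
          lastBytes = stat.bytesReceived;
          lastTimestamp = stat.timestamp;
        }
      });
      return bitrate;
    }

    // Usage: setInterval(() => sampleBitrate(pc).then(console.log), 1000);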

4.2.4 Dropped Frames

Because of the GOP structure of video streams, most of the frames are dependent on the

preceding ones. Therefore, dropping a frame will cause any future frames to be applied to an

image for which it was not intended, impacting the quality of the stream. Fortunately, since each

GOP is independent to each other, this degradation will not be permanent. As such, the

occasional dropped frame, while not desired, is not fatal. However, if the connection is dropping

a lot of frames, there will be a noticeable drop in quality, making it a metric of interest.

The ABR client player keeps track of the number of dropped frames. While the WebRTC API

also exposes a framesDropped attribute, it is not defined the same as outlined here. Instead, it

appears to simply be the difference between the attributes framesReceived and

framesDecoded. Thus, this value varies wildly at any point in time, and also has no impact on

quality, rather than a stable number that only increases over time. Rather, the equivalent value

from the WebRTC API is packetsLost.


4.3 Experimental Setup

4.3.1 Encoder Specifications

The computer that will serve as the encoder of the source video is a laptop with the following specifications:

Operating System: 64-bit Windows 10 Home version 1709
Processor: Intel i5-7200U
RAM: 8.0 GB

Table 2: The specifications of the encoding computer.

4.3.2 Server Specifications

An ASUS ROG Strix GL502VM will be used as the server, with similar specifications to the encoder:

Operating System: 64-bit Windows 10 Home version 1709
Processor: Intel i7-6700HQ
RAM: 8.0 GB

Table 3: The specifications of the server.

4.3.3 Client Specifications

Finally, the client is a stationary Dell computer with the following specifications:

Operating System: 64-bit Windows 10 Pro version 1709
Processor: Intel i5-6500
RAM: 16.0 GB

Table 4: The specifications of the client computer.


4.3.4 Source Video

It is more important that what is being streamed is the same content across each protocol,

rather than exactly what content is being streamed. As such, a static video file is used as the

source but streamed as if it was being played live.

The video chosen is Big Buck Bunny. It is a short computer animated movie by the Blender

Foundation from 2008 [19]. It has since become a standard video sample for video related

experiments. The video is the 2D version running in 1080p at 30 frames per second.

4.3.5 Encoding

The encoder uses FFmpeg version 3.4.2 to encode the source video into a live output, which

will then be sent to the streaming media server on the server.

While MPEG-DASH is codec agnostic, HLS and the WebRTC media streams are not. However,

both of those technologies support the H.264 video codec [20], [21]. As such, to eliminate the

influence of the video codec, the video is encoded with the H.264 codec.

Unfortunately, there is no such overlap between HLS and WebRTC when it comes to the audio

codec [20], [21]. Consequently, the AAC audio codec will be used for MPEG-DASH and HLS

while WebRTC will use the Opus codec. However, these codec changes will be done by the transcoder that provides the outgoing streams. This means that the stream output by the encoder can be the same, regardless of how the outgoing streams will be viewed.

As such, the output stream from the encoder was the following:

ffmpeg.exe -re -i <filename>.mp4 -pix_fmt yuv420p -vsync 1 -sn
    -vf scale=1280:720,drawtext="fontsize=72:fontcolor=white:fontfile=/Windows/Fonts/arial.ttf:text='%{localtime\:%X}':boxcolor=0x000000AA:box=1:x=10:y=10"
    -threads 0 -vcodec libx264 -r 30 -g 60 -b:v 2572008 -bufsize 3611010 -maxrate 3215010
    -preset veryfast -profile:v baseline -tune film -sc_threshold 0
    -acodec aac -b:a 396000 -ac 2 -ar 48000
    -af aresample=async=1:min_hard_comp=0.100000:first_pts=0
    -map_metadata -1 -f flv rtmp://<IP>:<Port>/<Application>/<Stream Name>

In addition to setting the video and audio codec, and their bitrates, it also sets the GOP duration

to 2 seconds. This is a duration that is divisible by each of the intended segment durations,

which is a requirement in ABR. The command also embeds the current time inside of a

rectangle into the video itself, which will be used to measure end-to-end latency.

4.3.6 Streaming Media Server

The streaming media server that the server is running is Wowza Streaming Engine 4.7.4.01. It

receives an input stream from the encoder and uses a built-in transcoder to create the


necessary output streams, outlined in chapters 4.4.1 and 4.4.2 below. It also automatically

creates the required manifest files for both HLS and DASH for each stream.

4.3.7 Browsers

As outlined in chapter 4.1.1, the client will receive the relevant output stream(s), sent from the

streaming media server on the server, using two different browsers. ABR streams are received

by Shaka Player version 2.3.3 running on Microsoft Edge 41.16299.248.0. The WebRTC stream

is received by the WebRTC player provided by Wowza, running on 64-bit Mozilla Firefox

59.0.2.

4.4 Experiments

The encoder, server and client were all connected to the same local area network. This was done to minimize the influence of the network transport itself.

The experiments began when the encoder began encoding the source video, outputting it to the

server as a live-stream. The server provides the relevant output streams that the client can

connect to, using one of the protocols.

To avoid any edge cases of connecting to the stream too early, the client waited 30 seconds

before attempting to connect to the stream. The primary area of interest would be the viewing of

a longer, or continuous, existing stream. This could, for example, be a television broadcast. In

such a situation, those edge cases would be very rare, or even non-existent.

Measurements began once the video player began playback of the content, and continued

until three minutes had elapsed. The client automatically disconnected once this time elapsed,

which is before the stream itself terminated. As such, edge cases similar to that mentioned

previously are also avoided.

The clocks on both the encoder and client were synchronized so that the client could be recorded, using OBS version 21.1.0, and the system time used to determine the end-to-end latency. The latency

was recorded at five second intervals.

To lessen the impact of outlying results, each test was run three times. The average values of

these runs were what was then used.

4.4.1 Adaptive Bitrate Streaming

Since MPEG-DASH is left entirely up to the implementation, it provides no recommendations that can be used as a basis for deciding these parameters. However, this is not the case for HLS, where Apple provides some recommendations as to what they deem reasonable [20].

For segment length, Bitmovin has evaluated that for ABR, the optimal segment duration for

effective throughput lies at 2-3 seconds [22]. This segment duration corresponds with


recommendations seen elsewhere on the internet. At the upper end is the 10-second segment duration recommended by Apple as a good compromise between network overhead and latency [20].

As previously mentioned, the buffer also directly increases the latency and as such, the optimal

solution for low latency would be to have no buffer at all. However, it also provides stability to

the stream, which is vital, so it cannot be removed entirely. Thus, the smallest possible buffer

size would still be one segment long. While no source could be found, it would appear that

colloquially, a buffer of three segments is what is recommended. This also appears to correlate

with what Apple states, although the buffer size is never mentioned directly [20]. This is

probably because the buffer is controlled by the player and is not normally under the direct

control of the streamer.

However, the player chosen for ABR allows for control over the length of the buffer. This will be

used in an attempt to decrease the latency for ABR by buffering either three segments, or one

segment. Together with the previously outlined segment lengths, the ABR experiments will be

performed with the following settings for both HLS and DASH:

Scenario   Segment Duration (s)   Segments in Buffer
1          10                     3
2          2                      3
3          2                      1

Table 5: The settings used for each ABR scenario, attempting to achieve lower latencies.
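For reference, this buffer control in Shaka Player amounts to a configuration call along the following lines (a sketch using Shaka's documented streaming options; the values shown correspond to scenario 3, and exact behaviour may differ between player versions):

    // `player` is assumed to be an initialized shaka.Player instance.
    declare const player: { configure(config: object): void };

    player.configure({
      streaming: {
        bufferingGoal: 2,   // seconds of content to buffer ahead of playback
        rebufferingGoal: 2, // seconds required before playback (re)starts
      },
    });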

In addition, the following resolutions will be provided to the ABR clients, at the indicated bitrates.

While none of the lower resolutions are expected to be used, they are provided so as to

simulate a standard workload on the server, for a more realistic ABR result. The resolutions

were chosen because they represent HD ready, SD and LD respectively. Full HD was initially

included as well but this proved too heavy for the server to handle, and thus it had to be

dropped.

These resolutions conveniently correspond to the resolutions used by YouTube. The video bitrates were then based on the bitrates that YouTube uses for their live streaming at these same resolutions [23].

Resolution   Video Bitrate (kbps)   Audio Bitrate (kbps)
720p         2000                   192
480p         1250                   128
360p         700                    96

Table 6: The transcoding settings the server uses when creating streams for ABR.

4.4.2 WebRTC

Much of WebRTC is determined automatically and does not leave many parameters of interest

available for tweaking. While the data channel would allow for more granular control, the

streaming media server currently only provides the use of the media stream. Parameters that

are available for the media stream, however, include which codec to use and the bitrate of the video.

Both of these parameters were outlined back in chapter 4.1.3 and the codecs H.264 and VP8

will be used, at the bitrates determined therein. H.264 allows for a more homogeneous

comparison to the ABR protocols. As such, H.264 will be the only codec to be directly compared

to the ABR protocols. The performance of VP8 will then be compared to that of H.264 to give a

better picture of the internal performance of WebRTC itself.

Codec   Video Bitrate (kbps)   Audio Bitrate (kbps)
H.264   1500                   128
VP8     2000                   128

Table 7: The transcoding settings the server uses when creating streams for WebRTC.


Chapter 5

Results

5.1 WebRTC and ABR

5.1.1 Latency

Graph 1: Comparing the latencies between WebRTC and ABR for ABR scenario 1.


Graph 2: Comparing the latencies between WebRTC and ABR for ABR scenario 2.

Graph 3: The amount of time spent buffering by each ABR protocol for ABR scenario 2.


Graph 4: Comparing the latencies between WebRTC and DASH for ABR scenario 3.

Graph 5: The amount of time spent buffering by DASH for ABR scenario 3.


5.1.2 Bandwidth

Graph 6: Comparing bandwidth used between WebRTC and ABR for ABR scenario 1.

Graph 7: Comparing bandwidth used between WebRTC and ABR for ABR scenario 2.


Graph 8: Comparing bandwidth used between WebRTC and DASH for ABR scenario 3.

5.1.3 Bitrate

Graph 9: Comparing bitrate achieved between WebRTC and ABR.


5.1.4 Dropped Frames

Graph 10: Comparing the number of dropped frames between WebRTC and ABR for ABR scenario 1.

Graph 11: Comparing the number of dropped frames between WebRTC and ABR for ABR scenario 2.


Graph 12: Comparing the number of dropped frames between WebRTC and DASH for ABR scenario 3.

5.2 H.264 and VP8

5.2.1 Latency

Graph 13: Comparing the latencies for WebRTC using the H.264 and VP8 codecs.


5.2.2 Bandwidth

Graph 14: Comparing the bandwidth usage of WebRTC using the H.264 and VP8 codecs.

5.2.3 Bitrate

Graph 15: Comparing the bitrate WebRTC achieved using the H.264 and VP8 codecs.


5.2.4 Dropped Frames

Graph 16: Comparing the number of frames WebRTC dropped using the H.264 and VP8 codecs.


Chapter 6

Discussion

6.1 WebRTC and ABR

Looking at latency in graph 13, WebRTC definitively delivers. While the encoding and

transcoding add to the latency, WebRTC is able to keep the latency at a low 3 seconds almost throughout. If it ever stalls, it simply discards what it missed and continues from where the stream currently is, keeping the latency as low as possible.

However, even in an optimal setup such as this, ABR was not able to deliver a comparable

latency. As seen in graphs 2 and 4, DASH appears to hit a bottleneck at a latency of around 9

seconds. Looking at the same graphs, HLS was able to reach lower latencies for scenario 2, but

was not able to perform at all on scenario 3. This made it impossible to gather the necessary

data to determine if HLS would also have hit a bottleneck.

Even during these circumstances, it was still possible to see the consequences of attempting to

push ABR to latencies this low. For scenario 2, DASH had to stop playback and spend time

buffering early on before it could smoothly resume playback for the remainder of the time. This

was also the case when DASH was pushed further for scenario 3, where it now also had to

resort to dropping frames. This can be seen in graph 3, graph 5, and graph 12.

On the bandwidth front, WebRTC performs at a comparable level to ABR. Graph 6 shows that

while ABR is all or nothing, WebRTC requires constant bandwidth but is much more stable as a result. These spikes in the bandwidth usage of ABR make the usage seem higher at times, but as latency goes down, the usage comes more into line with WebRTC; this is most clearly seen in graphs 7 and 8. Even then, there is still a difference in favor of WebRTC.

However, as graph 9 shows, WebRTC was not able to deliver as high a bitrate as ABR, and the difference in bandwidth usage is approximately the same as the difference in bitrate.

Graph 15 shows that the performance of WebRTC unfortunately suffers from the congestion

control. It takes approximately 20 seconds to reach the sought-after bitrate while it is figuring out

the capacity of the connection. During this time, it can be observed that playback suffers

noticeably. The video never stalls entirely, so the effect is not visible in the data; however, the video still stutters visibly.

In general, WebRTC does not deliver on quality. As graphs 2 and 4 show, where ABR is able to

provide a stable playback throughout, WebRTC stalls quite often in comparison. Graphs 10

through 12 also show that WebRTC drops a massive number of frames, almost one frame per

second on average. While this does not always impact the stream noticeably, the particularly

egregious instances do correlate with the bigger spikes in latency.


As far as implementation goes, WebRTC can generally use the same input stream as ABR. However, its performance appears to be more volatile. As such, WebRTC would require more

fine tuning to deliver a more comparable quality of experience.

6.2 H.264 and VP8

When comparing these two video codecs for use with WebRTC, the picture is more straightforward.

Graph 13 shows that VP8 has approximately the same latency as H.264. However, VP8 is

much better at sticking to that low latency, while simultaneously dropping up to a quarter fewer

frames, as seen in graph 16.

In graph 14 it can be seen that VP8 does perform worse when it comes to bandwidth usage.

However, as with ABR, the higher bandwidth usage is again approximately the same as the

difference in bitrate. Therefore, if H.264 could deliver the same bitrate as VP8, the

difference should be moot.


Chapter 7

Conclusion

If latency is the only thing that matters, then WebRTC is absolutely viable. With the use of a

transcoder to change the audio codec, WebRTC could most likely simply be added on top of

existing infrastructure. Using VP8 as the video codec instead of H.264 is definitively the way to

go, delivering superior performance under the same circumstances. However, using WebRTC

as is, even with VP8, will negatively impact the quality of the stream. Therefore, more work will

be required to fine-tune the system and bring WebRTC up to par.

Unfortunately for WebRTC, due to the nature of the connection, it cannot leverage the existing

infrastructure of the internet like ABR can. Setting up a new infrastructure to achieve the same

reach as existing content delivery networks would be required. This will require more extensive

work but the potential for WebRTC to be an alternative in the future is definitively there.


Chapter 8

Future Work

As for future work, the biggest shortcoming of this thesis is its limited scope: the experiments were performed on a local network and involved only one connecting client.

Since an online stream will definitely be viewed by more than one client, determining the impact

of multiple peers on the performance of WebRTC is of utmost importance. While this

performance hit can be split across multiple servers, the limit for one individual server is very

important for the viability of widespread use of WebRTC.

Similarly, it is unlikely that all WebRTC streams will be viewed only in a relatively local area.

Therefore, how the distance between peers impacts performance, primarily latency, is also

important. WebRTC has a wide margin in this setup but unless it can maintain this advantage as

the scope increases, potentially to a global one, the appeal will suffer accordingly.

Finally, it is important to acknowledge that these experiments were all done using the built-in

media stream of WebRTC. Once implementations that use the data channel are more

widespread and available, investigating the performance of these is also of interest. This would

allow either for a more direct comparison with ABR, changing only the method of delivery, or for something entirely new.


Bibliography

[1] Oracle, “Oracle Security Blog”. [Online]. Available: https://blogs.oracle.com/oraclesecurity/. [Accessed: 10 October, 2018].

[2] Anthony Laforge, “Saying goodbye to Flash in Chrome”, The Keyword, 25 July, 2017. [Online]. Available: https://www.blog.google/products/chrome/saying-goodbye-flash-chrome/. [Accessed: 10 October, 2018].

[3] Benjamin Smedberg, “Firefox Roadmap for Flash End-of-Life”, moz://a, 25 July, 2017. [Online]. Available: https://blog.mozilla.org/futurereleases/2017/07/25/firefox-roadmap-flash-end-life/. [Accessed: 10 October, 2018].

[4] R. Pantos, “HTTP Live Streaming”, Internet Engineering Task Force, draft-pantos-http-live-streaming-00, May 1, 2009. [Online]. Available: https://tools.ietf.org/html/draft-pantos-http-live-streaming-00. [Accessed: 10 October, 2018].

[5] Athar Shah, “Introducing HEIF and HEVC”, WWDC 2017, June 7, 2017. [Online]. Available: https://developer.apple.com/videos/play/wwdc2017/503. [Accessed: 10 October, 2018].

[6] A. B. Roach, “WebRTC Video Processing and Codec Requirements”, Internet Engineering Task Force, RFC 7742, March 2016. [Online]. Available: https://www.rfc-editor.org/rfc/pdfrfc/rfc7742.txt.pdf. [Accessed: 10 October, 2018].

[7] International Organization for Standardization, “Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats”, ISO/IEC 23009-1:2012, April 2012. [Online]. Available: https://www.iso.org/standard/57623.html. [Accessed: 10 October, 2018].

[8] Iraj Sodagar, “MPEG-DASH: The Standard for Multimedia Streaming Over Internet”, The Moving Picture Experts Group, ISO/IEC 23009, April 2012. [Online]. Available: https://mpeg.chiariglione.org/standards/mpeg-dash. [Accessed: 10 October, 2018].

[9] Harald Alvestrand, “Google release of WebRTC source code”, 1 June, 2011. [Online]. Available: https://lists.w3.org/Archives/Public/public-webrtc/2011May/0022.html. [Accessed: 10 October, 2018].

[10] “WebRTC Home”. [Online]. Available: https://webrtc.org/. [Accessed: 10 October, 2018].

[11] J. Uberti, “WebRTC Forward Error Correction Requirements”, Internet Engineering Task Force, draft-ietf-rtcweb-fec-07, December 2017. [Online]. Available: https://tools.ietf.org/html/draft-ietf-rtcweb-fec-07#section-3.2. [Accessed: 10 October, 2018].

[12] C. Perkins, M. Westerlund, and J. Ott, “Web Real-Time Communication (WebRTC): Media Transport and Use of RTP”, Internet Engineering Task Force, draft-ietf-rtcweb-rtp-usage-26, March 2016. [Online]. Available: https://tools.ietf.org/html/draft-ietf-rtcweb-rtp-usage-26#section-4.1. [Accessed: 10 October, 2018].

[13] Jimmy Zöger and Marcus Wallstersson, “Peer Assisted Live Video Streaming in Web Browsers using WebRTC”, Software and Computer Systems, School of Information and Communication Technology, KTH Royal Institute of Technology, Stockholm, Sweden, 10 June, 2014.

[14] W3C WebRTC Working Group, “Media Capture and Streams”, World Wide Web Consortium. [Online]. Available: https://w3c.github.io/mediacapture-main/. [Accessed: 10 October, 2018].

[15] Viswanathan Swaminathan and Sheng Wei, “Low latency live video streaming using HTTP chunked encoding”, in 2011 IEEE 13th International Workshop on Multimedia Signal Processing, 17-19 October, 2011, Hangzhou, China. [Online]. Available: IEEE Xplore, https://ieeexplore.ieee.org/document/6093825. [Accessed: 10 October, 2018].

[16] Periscope engineering team, “Introducing LHLS Media Streaming”, 21 July, 2017. [Online]. Available: https://medium.com/@periscopecode/introducing-lhls-media-streaming-eb6212948bef. [Accessed: 10 October, 2018].

[17] Shuai Zhao, Zhu Li, and Deep Medhi, “Low delay MPEG DASH streaming over the WebRTC data channel”, in 2016 IEEE International Conference on Multimedia & Expo Workshops, 11-15 July, 2016, Seattle, USA. [Online]. Available: IEEE Xplore, https://ieeexplore.ieee.org/document/7574765. [Accessed: 10 October, 2018].

[18] Jonas Birmé, “HTML5 Player with support for HLS, MPEG-DASH and Smooth Streaming”, Eyevinn Technology. [Online]. Available: https://github.com/Eyevinn/html-player. [Accessed: 10 October, 2018].

[19] Blender Foundation, “Big Buck Bunny”. [Online]. Available: https://peach.blender.org/about/. [Accessed: 10 October, 2018].

[20] Apple Inc., “Frequently Asked Questions”. [Online]. Available: https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/FrequentlyAskedQuestions/FrequentlyAskedQuestions.html. [Accessed: 10 October, 2018].

[21] Wowza, “How to use WebRTC with Wowza Streaming Engine”. [Online]. Available: https://www.wowza.com/docs/how-to-use-webrtc-with-wowza-streaming-engine. [Accessed: 10 October, 2018].

[22] Stefan Lederer, “Optimal Adaptive Streaming Formats MPEG-DASH & HLS Segment Length”, Bitmovin, 9 April, 2015. [Online]. Available: https://bitmovin.com/mpeg-dash-hls-segment-length/. [Accessed: 10 October, 2018].

[23] “Live encoder settings, bitrates, and resolutions”, Youtube. [Online]. Available: https://support.google.com/youtube/answer/2853702. [Accessed: 10 October, 2018].


TRITA-EECS-EX-2019:79

www.kth.se