© Copyright by James David Wong, 1999

AN EXTENSIBLE FRAMEWORK FOR RTSP APPLICATIONS

BY

JAMES DAVID WONG

B.A., Rice University, 1994

THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

in the Graduate College of the University of Illinois at Urbana-Champaign, 1999

Urbana, Illinois

Abstract

With advances in processor performance, memory sizes, storage capacity and

telecommunications technologies, networked multimedia applications such as streaming

video have become increasingly popular. Early systems used proprietary protocols to

transport control information and multimedia data, hindering interoperation between

different implementations, or adapted existing protocols, sacrificing quality and

flexibility. The Real-Time Streaming Protocol (RTSP) addresses these concerns by

providing a standardized, open architecture for controlling playback and recording

functionality in multimedia systems. This thesis presents an object-oriented framework

for constructing applications that make use of RTSP, describes the integration of the

framework into a pair of multimedia applications, and documents its performance. The

presentation closes with a discussion of opportunities for future development of the

architecture.

For Julie

Acknowledgements

I am indebted to Professor Roy H. Campbell for his guidance on the long, sometimes

strange odyssey that resulted in this thesis.

I must also offer thanks to my friends and co-workers at Vosaic, particularly Zhigang

Chen, Drew MacGregor, See-Mong Tan, Chuck Thompson and Miguel Valdez. Their

encouragement, support, insight and advice have been invaluable.

Likewise, I am grateful to Joe Shidle, David Hyatt, Rebecca Hyatt, Dan Gaines, Tim

Fraser, Garry Sittler, Mei-Ling Tong and numerous others for helping me to maintain my

sense of purpose and perspective and for providing unending motivation.

Finally, heartfelt thanks to my parents, Richard and Neva Wong, and brother, William

Wong, without whom, none of this would be possible.

Table of Contents

Chapter 1 Introduction
    1.1 Multimedia on the Web
    1.2 Hardware Technology
    1.3 Software for Internet Multimedia
    1.4 The Framework
    1.5 Organization
Chapter 2 Background
    2.1 RTSP Evolution
    2.2 RTSP Features
        2.2.1 Setup and Control
        2.2.2 Timing and Synchronization
        2.2.3 Security
        2.2.4 Extensibility
        2.2.5 Flexibility
        2.2.6 Ease of Use
    2.3 RTSP Applications
        2.3.1 Video-on-Demand
        2.3.2 Live Broadcasts
        2.3.3 Near Video-on-Demand
        2.3.4 Virtual Presentations
        2.3.5 Conferencing and Telephony
        2.3.6 Distributed Digital Editing
    2.4 Related Technologies
        2.4.1 Hypertext Transport Protocol
        2.4.2 PNA
        2.4.3 H.323
        2.4.4 Session Initiation Protocol
        2.4.5 Real-Time Transport Protocol
        2.4.6 Video Datagram Protocol
        2.4.7 IP Multicast and the MBONE
        2.4.8 Session Description Protocol and Session Announcement Protocol
        2.4.9 Synchronized Multimedia Integration Language
    2.5 Other RTSP Implementations
        2.5.1 Real Networks Reference Implementation
        2.5.2 RealSystem G2
        2.5.3 CERN WrtpVoD
        2.5.4 IBM RTSP Toolkit
        2.5.5 Apple QuickTime
        2.5.6 Academic Implementations
Chapter 3 RTSP in Detail
    3.1 RTSP Resources
    3.2 RTSP Sessions
    3.3 RTSP Messages
        3.3.1 Requests
        3.3.2 Request Methods
        3.3.3 Responses
        3.3.4 Response Status Codes
        3.3.5 Headers
    3.4 In-Line Data
    3.5 Extension Mechanisms
Chapter 4 An Extensible Framework for RTSP Applications
    4.1 Implementation Language
        4.1.1 Exception Handling
        4.1.2 Run-time Type Information
    4.2 Support Libraries
        4.2.1 GPLib
        4.2.2 Connection
        4.2.3 OSLib
    4.3 Class Models
        4.3.1 Overall Class Model
        4.3.2 Request Class Model
        4.3.3 Exceptions
    4.4 Functional Model
    4.5 Dynamic Models
        4.5.1 RTSPStream
        4.5.2 RTSPInlineDataQueue
        4.5.3 RTSPMessageQueue
        4.5.4 RTSPReader
        4.5.5 RTSPDataReader
        4.5.6 RTSPMessageReader
        4.5.7 RTSPRequestFactory
        4.5.8 RTSPMissive
        4.5.9 RTSPInlineData
        4.5.10 RTSPMessage
        4.5.11 RTSPRequest
        4.5.12 RTSPAnnounceRequest
        4.5.13 RTSPDescribeRequest
        4.5.14 RTSPOptionsRequest
        4.5.15 RTSPPauseRequest
        4.5.16 RTSPPlayRequest
        4.5.17 RTSPRecordRequest
        4.5.18 RTSPSetupRequest
        4.5.19 RTSPTeardownRequest
        4.5.20 RTSPChannelListRequest
        4.5.21 RTSPUserListRequest
        4.5.22 RTSPGetUserDataRequest
        4.5.23 RTSPUpdateUserDataRequest
        4.5.24 RTSPResponse
        4.5.25 RTSPHeader
        4.5.26 RTSPTransportHeader
        4.5.27 RTSP Exception Subclasses
    4.6 Programming Interfaces
        4.6.1 RTSPStream Interface
        4.6.2 RTSPMessage Interface
        4.6.3 RTSPRequest Interface
        4.6.4 RTSPResponse Interface
        4.6.5 The RTSPHeader Interface
Chapter 5 Applications
    5.1 The Vosaic Reflector
        5.1.1 Role of RTSP
        5.1.2 Integration
    5.2 Vosaic IP Hoot
        5.2.1 Role of RTSP
        5.2.2 Integration
    5.3 Performance Data
        5.3.1 Test Environment
        5.3.2 Experiments and Results
        5.3.3 Analysis
Chapter 6 Framework Evolution
    6.1 Implementation Issues
        6.1.1 Use of Standard C++ Features
        6.1.2 Efficiency Concerns
        6.1.3 RTSPHeader Support
        6.1.4 Protocol Logic
    6.2 Architectural Enhancements
        6.2.1 Client Interface
        6.2.2 Server Interface
Chapter 7 Conclusions
References

List of Tables

Table 3.1: Response Classes
Table 5.1: Experimental Results

List of Figures

Figure 4.1: Overall Class Model
Figure 4.2: Request Hierarchy
Figure 4.3: Exception Hierarchy
Figure 4.4: Functional Model
Figure 4.5: RTSPStream Dynamic Model
Figure 4.6: RTSPInlineDataQueue Dynamic Model
Figure 4.7: RTSPMessageQueue Dynamic Model
Figure 4.8: RTSPReader Dynamic Model
Figure 4.9: RTSPDataReader Dynamic Model
Figure 4.10: RTSPMessageReader Dynamic Model
Figure 4.11: RTSPRequestFactory Dynamic Model
Figure 4.12: RTSPMissive Dynamic Model
Figure 4.13: RTSPInlineData Dynamic Model
Figure 4.14: RTSPMessage Dynamic Model
Figure 4.15: RTSPRequest Dynamic Model
Figure 4.16: RTSPAnnounceRequest Dynamic Model
Figure 4.17: RTSPDescribeRequest Dynamic Model
Figure 4.18: RTSPPlayRequest Dynamic Model
Figure 4.19: RTSPSetupRequest Dynamic Model
Figure 4.20: RTSPChannelListRequest Dynamic Model
Figure 4.21: RTSPResponse Dynamic Model
Figure 4.22: RTSPHeader Dynamic Model
Figure 4.23: RTSPTransportHeader Dynamic Model
Figure 4.24: Exception Class Dynamic Model
Figure 4.25: RTSPStream Interface
Figure 4.26: RTSPMessage Interface
Figure 4.27: RTSPRequest Interface
Figure 4.28: RTSPResponse Interface
Figure 4.29: RTSPHeader Interface
Figure 5.1: Reflector Main Loop
Figure 5.2: Reflector Class Model
Figure 5.3: Reflector Functional Model
Figure 5.4: IP Hoot Class Model
Figure 5.5: IP Hoot Functional Model
Figure 5.6: Test Request
Figure 5.7: Test Response
Figure 6.1: Revamped Client Interface
Figure 6.2: Revamped Server Interface

Chapter 1 Introduction

As improving hardware and software technology has made the Internet a more accessible

and rich environment, considerable effort has been expended on attempts to broaden the

scope of the World Wide Web to incorporate audio and video in addition to still images

and text. A number of software architectures and network protocols have been created to

address the difficulties that arise in attempting to do so. The Real Time Streaming

Protocol (RTSP) is one such protocol, an open standard intended to serve as a common

language for continuous media clients and servers; RTSP is a simple, extensible control

protocol for managing the playback and recording of multimedia data streams over

computer networks. This thesis presents the specification and design of an object-oriented

framework for constructing applications that make use of RTSP to provide access to

continuous media services. The framework described herein facilitates the creation of

applications that provide networked multimedia functionality by adopting the same

principles of simplicity and flexibility that underlie the protocol’s design and manifesting

them in the form of a collection of collaborating objects and classes that provide the

functionality necessary to implement RTSP.

1.1 Multimedia on the Web

The motivation for the RTSP framework, and for the protocol itself, lies in the growth of

the Internet’s popularity and usage during the 1990s. With the availability of

inexpensive, high-speed modems and the release of the NCSA Mosaic web browser,

usage of the Internet has exploded, particularly among groups that previously had neither

access to the Internet nor need for such access. The dramatic increase in the exposure and

use of the Internet has qualified it as a mass medium, analogous to television, radio and

newspapers, while the unique aspects of the Internet make it possible to provide targeted

content geared toward specific audiences to a degree never before possible.

As a natural consequence of more pervasive use of the Internet and the World Wide Web,

demand has grown for the kind of content users have become accustomed to seeing in

CD-ROM-based software titles: highly graphical, interactive multimedia productions.

Furthermore, the reach and scalability of the Internet have made it an attractive platform

for communications applications such as electronic webcasts and point-to-point and

multi-party conferencing. Implementing these applications in the Internet environment

presents a number of challenges: users must have access to sufficient bandwidth to

support the demands of media-rich applications; their computer systems must have

enough processing capability to decode compressed media streams in real time; and the

application software must include mechanisms for transmitting and decoding continuous

media data streams.

1.2 Hardware Technology

The first two challenges referenced above have been addressed by the

evolution of hardware technology. The amount of bandwidth available to end-users has

increased steadily since the popularization of the web, with modem bitrates increasing

from 14.4 kb/s to 56 kb/s, enough bandwidth to support high quality audio or low

resolution video with medium quality audio. Other connection technologies, such as

Integrated Services Digital Network (ISDN), cable modems and Asymmetric Digital

Subscriber Line (ADSL), have made headway as well, promising to bring additional

bandwidth to users and enable higher quality video in the not-too-distant future.

Likewise, processor performance has increased significantly from year to year. In 1995, a

Dell Dimension XPS personal computer achieved a score of 3.16 on the SPECint95

benchmark, a widely-used benchmark suite for evaluating computer systems’

performance on integer operations; in 1998, a Dell Precision Workstation 610 scored 19.0

on the same benchmark [1], a more than six-fold improvement. The infusion of new

technologies into the mainstream personal computer market, such as superscalar,

RISC-based processing and special-purpose instruction set enhancements like the MMX

multimedia extensions to the Intel Architecture, has also increased the amount of

computing power available to consumers. As a result, the computer systems available to

most users are more than capable of decoding the highly compressed audio and video

data streams generated by networked multimedia applications.

The pace of advancement in hardware performance shows no signs of slowing, either.

Products and technologies on the horizon, such as Very Long Instruction Word (VLIW)

computing and accelerated hardware implementations of multimedia algorithms, promise

to ensure that users will have enough processing power to handle the higher bitrate data

streams made possible by the new connection technologies described above.

1.3 Software for Internet Multimedia

The improvements in processing power and Internet connectivity have driven the

development of software architectures that take advantage of the available technology.

Various commercial software vendors have released proprietary software products that

provide differing levels of interactivity and richness. Among others, these include

Macromedia, whose Flash and Shockwave products enable the display of simple

animations and interactive presentations; RealNetworks, whose RealSystem is the

dominant platform for delivery of multimedia over the Internet; Microsoft, which offers a

variety of live and on-demand multimedia and conferencing applications; and Apple

Computer, which distributes software that enables QuickTime movies to be played over

the Internet. Typically, these products are integrated with users’ web browsers through a

plug-in mechanism defined by Netscape Communications Corporation in its Navigator

web browser, a commercial follow-on to the original Mosaic browser.

In their first incarnations, the software solutions for providing interactive multimedia

over the Internet have been, to varying degrees, dependent on proprietary transport and

control protocols. As a result of this dependence on proprietary techniques,

interoperability between different vendors’ implementations has been impossible. RTSP

and several related specifications seek to address the problem of interoperation by

providing a standard platform upon which multimedia applications can be built. These

specifications, which include RTSP for control over the playback of multimedia; the

Real-Time Transport Protocol (RTP) for delivery of continuous media data streams; and

the Session Description Protocol (SDP), which facilitates the communication of

information required to display a multimedia presentation, are open standards

administered by the Internet Engineering Task Force. As such, they enable any developer

to create clients and servers that work transparently with applications by other authors,

while still allowing for differentiation through documented extension mechanisms.

1.4 The Framework

The framework described in this thesis is intended to provide a portable, flexible

implementation of RTSP for use in a variety of applications. Although there are a number

of other implementations of the protocol that are available for development purposes,

none adequately capture both the simplicity and versatility of the specification. Some

offer flexibility, but are difficult to use and extend, while others offer ease of use, but

provide only limited functionality. This framework makes use of object-oriented

techniques to provide easy-to-use, high-level abstractions, while still providing the user

with the flexibility and access needed to implement complex, customized applications.

1.5 Organization

The remainder of the thesis is organized as follows. Chapter 2 introduces RTSP, covering

its design goals, evolution and features, and explores related protocols and technologies

as they pertain to RTSP. Chapter 3 follows with a more detailed discussion of the

protocol, providing an explanation of its message structure and semantics sufficient to

understand the workings of the framework. Chapter 4 then presents the design of the

framework itself, detailing the classes and objects that make up the library and the ways

in which they interact, and Chapter 5 elaborates on the design by illustrating how the

components of the framework are integrated into a pair of multimedia applications.

Chapter 5 closes with performance data illustrating that the framework’s message

processing architecture is robust and efficient enough to serve as the basis for heavy-duty

multimedia servers. Finally, Chapter 6 considers issues raised by the current design and

implementation of the framework and highlights directions future development might

take.

Chapter 2 Background

This chapter presents background information central to an understanding of RTSP’s role

in networked multimedia systems in general and the design of this RTSP framework in

particular. Section 2.1 discusses the origins, evolution and status of the RTSP

specification. Section 2.2 describes the functionality offered by the protocol, and section

2.3 outlines some of the applications for which RTSP was designed. Section 2.4 explores

related protocols and standards for networked multimedia, and the closing section of this

chapter discusses other implementations of RTSP that are available at the time of this

writing.

2.1 RTSP Evolution

RTSP is an open standard published by the Internet Engineering Task Force (IETF) in a

standards-track Request for Comments (RFC 2326) [2]. RTSP is currently classified as a

Proposed Standard. Its development followed from the proliferation of proprietary

protocols for control and transport of multimedia data over the Internet and the elusive

goal of interoperability. The intent was to develop a flexible standard that enabled not

just interoperation between similar products from different vendors, but the ability to use

the same tools, file formats and protocols for telephony and conferencing applications as

in video-on-demand and webcasting environments.

With these goals in mind, Anup Rao of Netscape Communications Corporation and Rob

Lanphier of RealNetworks, Inc. (then called Progressive Networks) submitted a draft

proposal to the IETF Multiparty Multimedia Session Control (MMUSIC) working group

in November of 1996. Their proposal outlined a simple protocol based on binary, non-

human-readable messages with support for requesting live or on-demand playback of

multimedia. Shortly thereafter, Henning Schulzrinne of Columbia University submitted a

counterproposal detailing a protocol making use of HTTP-like, textual messages and

incorporating more general extension mechanisms, a more abstract treatment of

multimedia transport protocols, and support for recording functionality, as well as

playback. This specification evolved in a series of Internet-Drafts released by the IETF,

and the effort culminated with the release of the RTSP RFC in April 1998. The final

specification is derived from Schulzrinne’s counterproposal, and contains contributions

from researchers and developers at Netscape, RealNetworks, Columbia University,

International Business Machines Corp., the French National Institute for Research in

Computer Science and Control (INRIA), and Microsoft Corporation, among others.

2.2 RTSP Features

This section contains a high-level discussion of some of the features and characteristics

of RTSP. Chapter 3 describes how these features are implemented by the protocol.

2.2.1 Setup and Control

At its core, RTSP is a protocol that enables applications to set up and control the

playback and recording of multimedia data over a computer network. Setup consists of

arriving at an agreement as to the kind of data that will be played or recorded and the

mechanism through which it will be transported, and control offers the end-user the

ability to interactively manage the flow of data to or from the multimedia server. The

capabilities of RTSP with respect to each of these dimensions of functionality are

described below.

The setup process begins with a simple exchange of the capabilities supported by the

client and server, including any non-standard extensions to the protocol that one party or

the other has implemented or requires. It continues with negotiation of the transport

mechanism to be used to carry the multimedia stream; this is done in advance of the

initiation of delivery of the stream in order to ensure that neither participant is presented

with a stream it cannot handle. Once an appropriate means of transport has been selected,

playback or recording can commence, under the user’s control.
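
The setup sequence described above might look as follows on the wire. This is a minimal sketch: the method names (OPTIONS, SETUP, PLAY) and the CSeq, Transport and Session headers come from the RTSP specification, but the helper function, the URL, the port numbers and the session identifier are illustrative assumptions rather than part of any particular implementation.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format an RTSP/1.0 request as text (hypothetical helper)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


URL = "rtsp://media.example.com/clip"  # placeholder address

# 1. Ask which methods the server supports.
options = rtsp_request("OPTIONS", URL, 1)

# 2. Negotiate the transport before any media flows.
setup = rtsp_request(
    "SETUP", URL + "/audio", 2,
    {"Transport": "RTP/AVP;unicast;client_port=4588-4589"},
)

# 3. Start playback; in practice the session identifier is taken
#    from the server's reply to SETUP (the value here is made up).
play = rtsp_request("PLAY", URL, 3, {"Session": "12345678"})
```

Each request carries an increasing CSeq value so that replies can be matched to the requests that prompted them.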

RTSP offers the user the ability to start, suspend and restart the transmission of the

multimedia stream as needed. In addition, RTSP-based applications can offer familiar,

VCR-like features, such as the ability to scan backwards or forwards through a

presentation and to seek to arbitrary points within a stored clip. RTSP also allows for

aggregate control over separate streams, so that presentations consisting of several

distinct tracks, such as a recording of a videoconference with separate audio and video

tracks, can be played and manipulated as a unified whole.

2.2.2 Timing and Synchronization

As described above, RTSP enables applications (and thus users) to begin playback at

arbitrary points within a multimedia clip or presentation; likewise, applications can

indicate that playback should stop at an arbitrary point. The desired starting and stopping

points can be specified in seconds, relative to the start of the presentation or, in the case

of archived recordings of live events, in wall clock time. In addition, clients can instruct

servers to begin or stop playback at a specified wall clock time. Thus, an RTSP client

could tell a server to play the third through sixth seconds of a stored video clip at some

moment in the future, provided the RTSP session is still active at the time.
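
As an illustration of these timing options, the following sketch formats the Range header that a PLAY request would carry. The "npt=" range syntax and the optional "time=" parameter follow the RTSP specification; the function names and the example values are assumptions made for this sketch.

```python
from datetime import datetime, timezone


def _seconds(value):
    """Render seconds with up to millisecond precision, no trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def range_header(start, end=None, at=None):
    """Build a Range header using normal play time (seconds from the start).

    `at` is an optional timezone-aware datetime naming the wall clock
    moment at which the server should carry out the request.
    """
    value = f"npt={_seconds(start)}-"
    if end is not None:
        value += _seconds(end)
    if at is not None:
        utc = at.astimezone(timezone.utc)
        value += ";time=" + utc.strftime("%Y%m%dT%H%M%SZ")
    return "Range: " + value
```

For example, range_header(3, 6) asks for the span from 3 to 6 seconds into the clip, and passing a datetime as `at` defers the operation to that moment.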

RTSP-based applications can also use Society of Motion Picture and Television

Engineers (SMPTE) timestamps to express offsets from the beginning of clips. SMPTE

allows for frame-level control over playback and recording, thereby enabling RTSP

clients and servers to be used to perform professional-quality distributed editing of

multimedia presentations.
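
A frame-accurate range can be sketched as follows. The example assumes the 25 frames-per-second SMPTE variant ("smpte-25" in the RTSP specification), which avoids the drop-frame arithmetic needed at 29.97 frames per second; the function names are hypothetical.

```python
FPS = 25  # assumption: the smpte-25 variant, no drop-frame correction


def smpte_25(frame_index):
    """Convert a zero-based frame count into hours:minutes:seconds:frames."""
    seconds, frames = divmod(frame_index, FPS)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def smpte_range(first_frame, last_frame):
    """Build a Range header covering the given frames."""
    return f"Range: smpte-25={smpte_25(first_frame)}-{smpte_25(last_frame)}"
```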

2.2.3 Security

RTSP provides flexible, open mechanisms for clients to interact in a secure manner with

RTSP servers. Authentication and encryption of client-server interactions are supported

through Internet standards, allowing implementations to provide as much or as little

security as is required for a particular application.

In addition, RTSP is designed to be friendly to the firewall and proxy software in place at

many companies and public institutions that provide Internet access to employees and

patrons. The protocol itself readily lends itself to handling by transport-level proxy

services, such as SOCKS [3], and its status as an Internet standard makes it easy for

vendors of packet-filtering firewalls to allow legitimate RTSP requests to pass

unhindered while blocking the packets of hackers and other intruders.

2.2.4 Extensibility

In addition to supporting basic multimedia delivery through the features described above,

RTSP was designed to accommodate unforeseen applications and usage scenarios

through a variety of extension mechanisms. As needed, RTSP implementations can

modify the behavior and semantics of the basic operations defined by the protocol; add

entirely new operations; or, in the event the current protocol is completely unsuitable for

a particular problem, but developers wish to maintain some degree of backwards

compatibility with older software, just about every aspect of the protocol may be

changed.

Of course, the utility of changing the behavior of the protocol is greatly reduced if doing

so results in an application that is unable to inter-operate with other implementations of

RTSP. To address this problem, RTSP requires that clients and servers support a standard

means of querying each other in order to determine which non-standard options and

enhancements the other supports.
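
One such mechanism in the RTSP specification is the Require header: a request lists the option tags it depends on, and a recipient that does not understand one of them answers with status 551 and names the offending tags in an Unsupported header. The sketch below shows the server side of that exchange; the set of supported tags and the function name are assumptions.

```python
SUPPORTED = {"play.basic"}  # hypothetical option tags this server implements


def check_require(headers, cseq):
    """Return a 551 reply if the request requires an unknown option.

    Returns None when every required option is supported, in which case
    the request would be processed normally.
    """
    required = [tag.strip()
                for tag in headers.get("Require", "").split(",")
                if tag.strip()]
    unsupported = [tag for tag in required if tag not in SUPPORTED]
    if not unsupported:
        return None
    return ("RTSP/1.0 551 Option not supported\r\n"
            f"CSeq: {cseq}\r\n"
            f"Unsupported: {', '.join(unsupported)}\r\n\r\n")
```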

2.2.5 Flexibility

By design, and unlike some multimedia control protocols that have preceded it, RTSP is

flexible in its support for alternative mechanisms for tasks not directly related to the

control of delivery of multimedia data. In particular, RTSP is agnostic with respect to

such decisions as the choice of transport protocol used to deliver audio and video data

and the representation format used to describe multimedia presentations. The protocol

allows clients and servers to state the formats and protocols they support and arrive at a

mutually acceptable decision.
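
For transport selection, the client can list several acceptable transports in order of preference in a single Transport header, and the server picks one it can honor. The following sketch assumes a simplified header with no quoted parameter values containing commas; the function name and the representation of the server's capabilities are illustrative.

```python
def choose_transport(transport_header, server_supports):
    """Pick the client's first transport option that the server supports.

    `server_supports` is a set of (protocol, delivery) pairs such as
    ("RTP/AVP", "unicast"). Returns the chosen option text, or None if
    nothing matches (a server would then reply 461 Unsupported transport).
    """
    for option in transport_header.split(","):
        params = [p.strip() for p in option.strip().split(";")]
        protocol = params[0]
        # The specification treats multicast as the default delivery mode.
        delivery = "unicast" if "unicast" in params else "multicast"
        if (protocol, delivery) in server_supports:
            return option.strip()
    return None
```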

2.2.6 Ease of Use

RTSP is designed to be easy to implement and use. It is text-based and human-readable,

making debugging clients and servers easier. Its structure is modeled closely after that of

HTTP [4] and MIME [5], allowing the reuse of existing code in new RTSP

implementations. This similarity to HTTP also allows RTSP-based applications to take

advantage of standard extensions to HTTP, such as digest access authentication [6] and

PICS [7, 8], a system for associating labels and ratings with content.
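
Because the message format mirrors HTTP, a reply can be taken apart with a few lines of string handling. The parser below is a minimal sketch that ignores header continuation lines and repeated headers; the function name is an assumption.

```python
def parse_response(text):
    """Split an RTSP reply into version, status code, reason, headers, body."""
    head, _, body = text.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body
```

A reply to an OPTIONS request, for instance, yields a "public" entry whose value lists the methods the server supports.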

2.3 RTSP Applications

This section describes some of the applications for which RTSP was designed and

provides some insight into how particular features of the protocol are utilized in various

settings.

2.3.1 Video-on-Demand

Video-on-demand is one of the core applications RTSP was designed to support. Through

its basic command set, RTSP allows for the creation of sophisticated streaming video

applications with support for many advanced features, including:

• VCR-style control over the delivery and playback of multimedia,

• the ability to start playback at an arbitrary point within a presentation,

• independence from specific transport mechanisms and media types,

• interoperation of clients and servers from different vendors,

• tight integration with web browsers, and

• pay-per-view and other logging and billing methods.

2.3.2 Live Broadcasts

RTSP provides many of the same benefits to live audio and video applications that it

brings to on-demand systems. Administrators can easily use the protocol’s authentication

features to construct secure commercial systems based on RTSP-aware clients and

servers, and its support for varying transport mechanisms makes it feasible to support

both small-scale events in which the data stream is unicast directly to clients and large-

scale, high-profile broadcasts supporting many thousands of clients via IP Multicast.

Broadcasts can also combine the distribution methodologies to take advantage of the

bandwidth efficiency of multicast transmission while allowing users whose access

providers don’t support multicast traffic to participate.

2.3.3 Near Video-on-Demand

In addition to simple live and on-demand applications, RTSP supports near-on-demand

delivery, an amalgamation of the two approaches discussed in [9]. In this usage scenario,

a multimedia presentation is multicast several times at staggered intervals. This allows

for the use of IP multicast so that bandwidth is utilized efficiently, while maintaining

some of the flexibility and convenience of on-demand services. The multicast addresses

used by the staggered multicasts can be determined dynamically by the RTSP server, so

clients automatically receive the most recently started signal.
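
The arithmetic behind staggered delivery can be sketched briefly. The function below assumes that transmissions start at a fixed interval and continue indefinitely; its name and parameters are illustrative and do not come from the protocol.

```python
def staggered_schedule(arrival, interval):
    """Locate a client within a series of staggered multicasts.

    `arrival` is the number of seconds since the first transmission began
    and `interval` is the number of seconds between successive starts.
    Returns (index of the most recently started transmission,
             seconds of that transmission already missed,
             seconds to wait for the next transmission to start).
    """
    index = int(arrival // interval)
    missed = arrival - index * interval
    wait_for_next = interval - missed if missed else 0
    return index, missed, wait_for_next
```

With starts every 300 seconds, a client arriving at 700 seconds can join the third transmission 100 seconds in, or wait 200 seconds for the next one to begin.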

2.3.4 Virtual Presentations

Moving beyond the realm of simple streaming media applications, RTSP facilitates the

creation of virtual presentations incorporating live, stored and interactive multimedia.

The protocol’s support for playback of arbitrary segments of clips and controlling the rate

of playback makes it easy to integrate several segments into a single presentation, and the

ability to seamlessly access multiple servers simplifies the creation of interactive works

incorporating disparate types of multimedia content.

2.3.5 Conferencing and Telephony

RTSP can also be utilized in network-based conferencing and telephony applications.

Although it does not provide the signaling functionality required of the protocols that

form the foundation of these applications – those functions are left to dedicated protocols

like H.323 [10] and SIP [11] – RTSP can be used to play prerecorded content into an

active call or to record a conference for later retrieval and playback. In addition, RTSP

can be used as the basis for IP-based voice mail and menu systems.

2.3.6 Distributed Digital Editing

RTSP’s explicit support for recording operations, together with the provisions it makes

for frame-level timing, make it possible to implement distributed digital editing systems

for multimedia. Such a system could make use of multiple networked playback and

recording devices coordinated by an RTSP-based software application.

2.4 Related Technologies

2.4.1 Hypertext Transfer Protocol

Developed in 1990 at the European Laboratory for Particle Physics (CERN) in Geneva,

Switzerland, the Hypertext Transfer Protocol (HTTP) provides access to a vast collection

of inter-linked Internet resources, primarily consisting of graphics and text [12], called

the World Wide Web (WWW, or web). Since the release of the National Center for

Supercomputing Applications (NCSA) web server software and graphical browser,

Mosaic, the web has experienced phenomenal rates of growth in size and usage. In

February 1994, the NCSA web server [13] handled one million requests per week; by

December of the same year, its load had grown to four million per week [12]. In

September 1998, some estimates placed the number of users of the web at 39 million

people [14].

As the web’s usage and accessibility grew, content providers sought to enrich their pages

by moving beyond hypertext and graphics and incorporating new media, including audio

and video, into their web sites. In November 1995, the web was estimated to contain over

eleven million distinct resources hosted on more than 225,000 servers [15]. At that point,

there were approximately 36,000 video files available on the web [16], and although

audio and video accounted for only 1% of the requests handled by NCSA’s server, they

were responsible for 28% of the bytes transferred.

Many early efforts aimed at integrating audio and video into the web made use of HTTP

to retrieve audio and video data in the same fashion as text and images. Several

difficulties arise with this approach. First, HTTP is oriented towards whole-file

downloads: the simplest way to access video via the web is to download an entire video,

storing it locally on the user’s hard disk before playback begins. Because of the large size

of files containing video data, however, this often results in a long wait between the time

a user decides to view a video and the time it actually begins playing. Various “fast start”

solutions have been adopted to address this issue [17], but they do not solve a more

fundamental problem: HTTP’s use of TCP for data transport. As described in Section

2.4.5 below, TCP is inappropriate for transport of audio and video data because it

introduces jitter, degrades picture quality by making sure each and every data packet is

delivered in order, and imposes inappropriate flow control on the data stream [18].

In addition, HTTP itself lacks features that are useful in the context of providing

multimedia services; the protocol includes a very limited command set and requires that

servers handle requests in a stateless fashion. As a consequence, there is no mechanism

for clients to retrieve descriptions of available resources in standard formats; the

transmission of a data stream cannot be paused and resumed in place at the user’s

request; the parameters used to encode and transmit the data stream cannot be changed on

the fly; and clients cannot exercise fine-grained, time-based control over the portions of the data

stream that they receive. Newer revisions of the HTTP specification provide support for

partial downloads [19], but at the level of byte ranges, which is inappropriate for

multimedia due to the fact that many encoding formats make it difficult or impossible to

map time offsets to byte ranges without processing and decoding the entire data file.
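
To see why byte ranges are a poor fit, consider a stream whose frames vary in size. The sketch below assumes a constant frame rate and a list of per-frame sizes that, in a real file, could only be obtained by parsing the encoded data; the function name and figures are hypothetical.

```python
def byte_offset_for_time(frame_sizes, fps, seconds):
    """Return the byte offset of the frame that plays at `seconds`.

    Because frame sizes vary, the offset is the sum of the sizes of all
    earlier frames; it cannot be computed from the time alone.
    """
    first_frame = int(seconds * fps)
    return sum(frame_sizes[:first_frame])
```

With frame sizes of 1000, 200, 300 and 1200 bytes at two frames per second, the frame shown at 1.5 seconds begins at byte 1500, a figure a client has no means of knowing in advance.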

As a result of these issues, HTTP enjoys a complementary relationship with RTSP. Web

pages served via HTTP can incorporate video accessed via RTSP, and HTTP can be used

to access meta-data such as session descriptions for use within RTSP-based applications.

HTTP is an Internet Engineering Task Force standard protocol described in full in a

series of IETF Request for Comments documents [4, 19].

2.4.2 PNA

PNA is the control and transport protocol used by the popular RealAudio, RealPlayer,

and RealServer streaming multimedia applications until the release of RealSystem G2,

comprising RealPlayer G2 and RealServer G2, in November 1998. PNA is a proprietary

protocol and as such, the full specification is not publicly available, although

RealNetworks has released enough information to enable firewall and proxy software authors

to incorporate support for recognizing and relaying PNA traffic. With the release of

RealSystem G2, PNA has been replaced by RTSP (for control traffic) and RTP (for

multimedia data), though the client and server components continue to support PNA for

the sake of interoperability with the installed base of servers and clients.

2.4.3 H.323

Approved in 1996 by Study Group 15 of the International Telecommunication Union,

ITU Recommendation H.323 [10] describes protocols and components for systems that

provide conferencing and telecommunications services over packet-based networks. The

H.323 Recommendation encompasses standards for call control, multimedia and

bandwidth management and interfaces between packet-based communications systems

and circuit-switched networks. H.323 is one of a series of recommendations, known as

H.32X, that propose standards for conferencing applications over a variety of networks,

notably ISDN and the standard voice telephone network, which are addressed by

Recommendations H.320 and H.324, respectively [20]. Intended to provide

interoperability among compliant products and devices, H.323 has garnered considerable

support among vendors in the computing and telecommunications industries. Among

others, Intel, Microsoft Corporation and Netscape Communications Corporation have

pledged to support the standard in their products.

An H.323-based conferencing and communications system comprises a collection of

components, each of which provides a distinct service to the other components of the

system. The types of components defined in the Recommendation are as follows:

Terminals, which constitute the end-user endpoints of H.323 conferences; Gateways,

which provide interfaces between H.323-based systems and other communications

architectures; Gatekeepers, which provide access control services; and Multipoint Control

Units, which work with Multipoint Processors and Multipoint Controllers to enable

conferences involving more than two parties. These components are logical, rather than

physical, constructs; a single piece of hardware might contain one or many H.323

components. In addition, not all environments require every type of component; in the

simplest case, that of a two-party, endpoint-to-endpoint call within a single local area

network, terminals may be all that is required.

In addition to describing the hardware components that make up an H.323 system, the

Recommendation specifies communications protocols and encoding mechanisms through

which the components interact. These include a number of other ITU recommendations,

including standards for audio and video encoding algorithms (H.261, H.263, G.711,

G.722, G.728, G.729, MPEG1, G.723.1, etc.) and protocols for the transmission of binary

data within conferences (T.120), packetization of media streams for transmission over

computer networks (H.225), capability negotiation and channel allocation (H.245) and

call signaling (Q.931). The Recommendation specifies the use of Internet standard

protocols, including RTP, RSVP and IP Multicast, to enable the transmission of audio

and video data over IP-based networks. In addition, H.323 defines a protocol,

Registration/Admission/Status (RAS), used to control allocation of network resources,

regulate access to local and remote systems, and perform address translation.

A more in-depth description of the elements of an H.323 system and the ways in which

those elements interact using the above-mentioned protocols and standards follows.

Terminals: Terminals represent the endpoints of an H.323 conference or call; they

provide for real-time, two-way communications between the user and other parties

involved in the call. The specification requires that all compliant Terminals support audio

encoded using one of the standard mechanisms; optionally, Terminals can also support

video and data conferencing. Encoded audio and video are packetized as dictated by

H.225 and transported via RTP, whereas data is encoded and transmitted as specified in

T.120. The encoding algorithms and data types to be included in a call are negotiated,

along with other considerations, such as the bitrates of the data streams generated by each

party, as described in H.245. Terminals must also support RAS in order to communicate

with Gatekeepers and may optionally incorporate a Multipoint Control Unit, described

below, in order to facilitate participation in multi-party conferences.

Gateways: H.323 Gateways enable H.323-enabled devices and software to communicate

with other ITU-compliant conferencing terminals residing on circuit switched networks

such as ISDN and the public telephone network. Providing this functionality entails

translating between different transmission formats, call control mechanisms, and

encoding formats, as well as establishing and maintaining connections on both of the

involved networks. Terminals communicate with Gateways using the H.245 and Q.931

call setup and signaling protocols.

Gatekeepers: H.323 systems can optionally include a Gatekeeper component that

provides call control services that help preserve the integrity and usability of the network

on which the system resides. The simplest of these services is translation of locally-

known aliases for H.323 endpoints and gateways into transport-level network addresses;

in systems including a Gateway, a Gatekeeper component is also used to translate

incoming E.164 (ISDN) addresses into the corresponding packet network addresses.

In addition to performing address translation, Gatekeepers control conferencing

endpoints’ access to the network. Because Terminals and other endpoints are required to

make use of a Gatekeeper if one is present in a system, Gatekeeper components can limit

the number of simultaneously active calls, thereby managing the amount of network

bandwidth consumed by conferencing applications. Furthermore, Gatekeepers can provide

call authorization functionality; for example, a Gatekeeper could enable or disable calls

to or from certain endpoints or restrict users’ ability to make calls to certain hours of the

day. The vendor of the hardware including the Gatekeeper determines the actual criteria

used to determine whether a call is allowed or disallowed.

Finally, a Gatekeeper component can also be used to facilitate the ad hoc creation of

multipoint conferences. When a point-to-point call is established between two parties, the

Gatekeeper for one of the parties can be configured to receive the H.245 signaling

information associated with the call. No processing of the H.245 data is required of the

Gatekeeper; it simply passes the data from one endpoint to the other. Then, when one of

the participants elects to expand the conference to include a third party, the Gatekeeper

directs the H.245 traffic to a Multipoint Controller as well, which then establishes the

parameters under which the expanded call will operate.

Gatekeepers provide their services to Terminals and other endpoints via the RAS

protocol.

Multipoint Control Unit: Like Terminals, Multipoint Control Units (MCUs) are

endpoints – entities which users can call directly. A Multipoint Control Unit consists of a

Multipoint Controller and some number of Multipoint Processors; these components

work together to enable conferences involving three or more parties. Multipoint Control

Units can be located within Gateways and Gatekeepers, but in such an instance

implement wholly separate functions that merely happen to be located in the same piece

of equipment.

H.323 supports two types of multipoint calls, referred to in the recommendation as

centralized and decentralized conferences. In centralized conferences, all the Terminals

or other endpoints involved transmit audio, video, data and control information directly to

a Multipoint Control Unit. The MCU is responsible for multiplexing the incoming audio

streams, enabling endpoints to select a video feed in which they’re interested, and

distributing the appropriate streams to the participants. In contrast, in decentralized

conferences, participants use IP Multicast or a comparable technology to simultaneously

deliver audio and video content to all of the participants in a call, while call signaling and

shared data such as whiteboard information are still processed in a centralized fashion by

a Multipoint Control Unit.

Multipoint Controller: A Multipoint Controller resides within a Multipoint Control Unit

and implements control functionality that enables conferences involving three or more

parties. Multipoint Controllers use H.245 to determine the capabilities of each endpoint

involved in the conference and inform the participants of the encoding formats and

communications parameters acceptable to the group as a whole; this set of capabilities is

revised as users join or leave the call.
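
Conceptually, the capability set acceptable to the group is the intersection of what each endpoint can handle. The sketch below models only that idea, not the H.245 messages themselves; the endpoint names and the function are assumptions.

```python
def common_capabilities(endpoints):
    """Return the formats that every endpoint in the conference supports.

    `endpoints` maps an endpoint name to the set of formats it can decode.
    """
    capability_sets = list(endpoints.values())
    if not capability_sets:
        return set()
    return set.intersection(*capability_sets)


conference = {
    "alice": {"G.711", "G.723.1", "H.261"},
    "bob": {"G.711", "H.261"},
}
shared = common_capabilities(conference)          # G.711 and H.261

conference["carol"] = {"G.711"}                   # a third party joins
shared_after_join = common_capabilities(conference)   # only G.711 remains

del conference["carol"]                           # the third party leaves
shared_after_leave = common_capabilities(conference)  # back to both formats
```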

Multipoint Processors: The remaining component of a Multipoint Control Unit, a

Multipoint Processor (MP) is responsible for transforming the audio and video data

streams generated by participants in a centralized multipoint conference into the

appropriate form and returning the resulting streams to the connected parties. The

processing performed by an MP typically consists of some combination of switching and

mixing of the incoming data, and can also entail converting audio or video data into

alternative formats for display or playback on terminals that can’t decode the primary

format in use within the call.

Although H.323 is designed for advanced communications services, it can also be used

for simple media playback applications like those at which RTSP is targeted; Microsoft

Corporation’s NetShow and Media Player streaming video applications use H.323 in

exactly this fashion. Using H.323 for simple media playback and recording operations is

not without disadvantages, however. The ITU Recommendation that defines H.323 is

long and complex, making the specification difficult to implement [21]; likewise, the

protocols that make up the specification are themselves complex, resulting in higher call

setup latencies than are possible with a more lightweight protocol like RTSP. Finally, the

complex nature of H.323 results in more complicated and larger implementations, which

makes them inappropriate for use in some applications, such as within a

downloadable Java applet.

2.4.4 Session Initiation Protocol

Like H.323, the Session Initiation Protocol (SIP) is a signaling protocol that enables the

creation of point-to-point and multi-party conferences and allows users to invite servers

and other users to participate in active calls [11]. SIP is designed as a more lightweight,

flexible and modular alternative to H.323, with a lineage consisting of standard Internet

protocols like HTTP, RTP and RTSP and without the baggage of the circuit-switched

ISDN protocols upon which H.323 is based. The creator of SIP provides a detailed

comparison of the two architectures in [21]; the protocol itself is a work in progress

described in an Internet-Draft [22]. A brief synopsis of the design and

underlying philosophy of SIP follows. A comprehensive description of an Internet

telecommunications architecture based on SIP and related protocols like RTP and RTSP

is presented in [23].

One of the primary objectives of the SIP design is simplicity: the protocol is intended to

be easy to parse and debug. This objective is accomplished by using much of the syntax

of HTTP, extending that protocol to allow bi-directional messaging. This enables the

reuse of existing code for parsing HTTP-style messages and the multitude of HTTP

extension mechanisms, and the fact that the protocol is text-based facilitates debugging

SIP-based applications. In contrast, H.323 messages are encoded in binary form using

relatively complicated ASN.1 packed encoding rules.

The simplicity of SIP’s design also extends to the semantics of the messages exchanged

by participants in a conference. Rather than requiring applications to utilize a Byzantine

collection of interrelated protocols in order to provide communications services, SIP is

based on a small number of orthogonal commands that can be composed to provide high-

level call signaling functionality.

SIP’s simplicity brings with it a measure of focus and modularity to the specification.

The protocol’s scope extends solely to call setup and control functions; the mechanisms

through which features like service discovery and quality-of-service are provided are left

unspecified, so any of a number of alternative approaches can be utilized. The protocol

leverages existing Internet infrastructure, such as the Domain Name System and

electronic mail address formats, where possible rather than invent new solutions to old

problems. Furthermore, the features provided by SIP are not interdependent; for example,

an application might use SIP’s call setup functionality to locate the target of a call and

then use H.323 to establish and maintain the call.

Extensibility and scalability are the other main goals of SIP’s design. Whereas H.323 can

be extended to support application- and vendor-specific functionality primarily through

nonstandardParam fields included in specific locations in its protocols’ grammars, SIP

builds on the mechanisms used in Internet protocols like HTTP and Simple Mail Transfer

Protocol (SMTP) [24] to provide significantly more flexibility. SIP’s architecture allows

clients to specify the exact features they require and for servers to accept or reject

requests based on their support for clients’ needs. In addition, SIP allows implementers to

add significant functionality to the protocol while preserving compatibility with existing

implementations: by default, SIP agents ignore unknown headers within requests and

reply messages, so older implementations can handle messages from applications

supporting extensions to the protocol transparently.

SIP’s extensibility also extends to the critical subject of audio and video encoding

formats. The primary mechanism through which information about encoding formats is

conveyed in SIP is SDP, which uses textual names to identify the codecs the participants

in a SIP session can understand. These names can be registered by individuals or groups

with the Internet Assigned Numbers Authority (IANA), which provides contact

information for the registrant to interested parties so that implementers can incorporate

support for any registered format in their applications. The H.323 Recommendation, on

the other hand, mandates that supported codecs must be centrally registered and

standardized with the ITU. As of July 1998, the only encoding formats approved for use

were ITU-developed, and many of them incorporated significant amounts of proprietary

intellectual property, making developing inexpensive H.323-based systems a challenging

proposition.

Finally, SIP improves upon H.323 in its utility for large-scale communications systems

by dint of its support for stateless and multicast signaling. Use of stateless call processing

enables SIP gateways and servers to handle a larger number of calls by reducing the

amount of memory and processing overhead associated with setting up and maintaining a

call. Multicast signaling allows SIP conferences to scale transparently from two to a

multitude of participants by removing the need for a central location at which all call

processing occurs.

Although it is possible to provide multimedia playback and recording services via SIP,

the signaling functionality that is the primary focus of the specification is superfluous in

these kinds of applications. RTSP, which is geared exclusively towards providing access

to stored and live multimedia data streams, is more appropriate. In fact, RTSP and SIP

are designed to complement each other nicely: SIP-aware conferencing systems can use

RTSP to enable features like voice mail, recording of conferences for future retrieval in

on-demand fashion, and playback of previously recorded material into an active call.

2.4.5 Real-Time Transport Protocol

One of the key challenges of providing advanced multimedia services over TCP/IP-based

Internets is delivering audio, video and other forms of data to viewers and listeners

quickly and efficiently, so that playback can begin promptly and proceed smoothly,

without delays or artifacts. The Real-Time Transport Protocol (RTP) is designed to

facilitate delivery of data with real-time characteristics, like audio and video, providing

functionality such as payload type identification, sequence numbering, source

identification, timestamping, and receiver-generated feedback for quality of service

monitoring. RTP is discussed at a high level in [9] and in detail in a standards track

Internet Engineering Task Force RFC [25].

Internet and web-based multimedia applications are well suited to a specialized protocol

like RTP, rather than the more common HTTP, described above, because of the network

performance requirements imposed by the characteristics of continuous media. In order to

play back high quality audio or video, multimedia-enabled clients and servers depend on

the network to provide predictable, if not necessarily minimal, delay. In addition, it is

typically unnecessary for every last bit of multimedia data streams to be delivered intact;

the delays and artifacts caused by retransmitting lost data packets are often more

noticeable than the effects of ignoring them altogether. For these reasons, reusing

existing protocols like HTTP, which make use of TCP as the underlying transport

mechanism, for multimedia data is usually inappropriate. Because it is designed to deliver

data packets reliably, TCP wastes time retransmitting dropped data packets, even when

the application would be better served by ignoring them. Furthermore, TCP provides

windowing and congestion control mechanisms that, though effective for many data

streams, can introduce latencies that obscure the temporal relationships between packets

in continuous media streams. Finally, TCP-based protocols cannot take advantage of IP

Multicast, described in section 2.4.7, which makes them ill-suited for large-scale

conferencing and live multimedia applications [9, 18, 26].

The origins of RTP can be found in earlier work on NVP [27] and PVP [28]. The protocol

is designed to serve as a common basis for a variety of real-time continuous media

services; RTP is application-independent, with application-specific profiles, or

specializations, providing additional functionality needed for particular real-time

applications. In a sense, the RTP specification is incomplete: in addition to the basic

protocol, an application must incorporate an appropriate profile and payload formats in

order to make effective use of RTP. The profile for audio and video conferences is

described in [29].

Because RTP is designed to be application-independent, implementations can be

packaged into reusable code libraries and incorporated directly into clients, servers, and

associated tools. Thus, protocol processing is performed at the application level, rather

than in a separate layer. This approach, called application level framing and integrated

layer processing, is described in [30]. In addition, the application-independence of RTP

enables the creation of generic tools for monitoring, tracing and providing quality-

of-service information about real-time traffic without regard for the specific application

that generated it.

RTP traffic is divided into data packets and control packets. Continuous media data is

carried in data packets, while information about the performance of the network and other

non-continuous data are communicated via the control mechanism. When, as is typically

the case in IP-based environments, both data and control information are carried in UDP

packets, sequential port numbers are used, with the lower, even-numbered port the target

for packets containing multimedia data and the higher, odd-numbered port used as the

destination for control packets.
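
The port convention described above can be sketched in a few lines. This is an illustrative helper rather than part of any particular implementation; the function name is an assumption, and the rounding of an odd port down to the next lower even port follows the recommendation in the RTP specification [25].

```python
from typing import Tuple


def rtp_rtcp_port_pair(base_port: int) -> Tuple[int, int]:
    """Return (data_port, control_port) for the pair that contains base_port.

    The data port is the even port of the pair and the control port is the
    odd port immediately above it. An odd base_port is rounded down.
    """
    if not 0 < base_port <= 65535:
        raise ValueError("port out of range")
    data_port = base_port & ~1      # clear the low bit: next lower even port
    if data_port == 0:
        raise ValueError("port 0 cannot be used for RTP data")
    return data_port, data_port + 1
```

For example, a client told to use port 5005 would, under this sketch, receive data on 5004 and control packets on 5005.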

RTP data packets begin with a number of fixed headers that provide information common

across the spectrum of applications supported by RTP. These include a sequence number,

payload type, timestamp, and identifiers specifying the source or sources of the data

contained in the packet. The timestamp is a thirty-two-bit value whose meaning is

dependent on the profile in use and the payload type; source identifiers are also thirty-two

bits in length, but do not correspond to a particular address format, such as four-byte IP

addresses. Rather, individual sources select random identifiers when they first generate

data to be transmitted via RTP, and conflicts are detected and eliminated as they occur.

The fixed header data is followed immediately by whatever profile-specific headers are

required by the profile for a particular application; these headers, if present, are then

followed by data of the variety specified in the payload type header field.
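
To make the layout of the fixed header concrete, the following sketch decodes it from a raw packet. The bit positions follow the RTP specification [25]; the class and function names are illustrative assumptions. The sketch does not interpret a header extension or strip padding, so when the extension flag is set the returned remainder still begins with the extension.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int                 # synchronization source identifier
    csrc: Tuple[int, ...]     # contributing source identifiers, if any


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Split a packet into its fixed header and the bytes that follow it."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed header")
    # Network byte order: two flag octets, 16-bit sequence number,
    # 32-bit timestamp, 32-bit source identifier.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end])
    header = RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc=csrc,
    )
    return header, packet[end:]
```

Everything after the returned header is profile-specific header data, if any, followed by the payload named by the payload type field.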

As described above, RTP control packets are used to convey information regarding the

level of service being provided to users by the network and real-time application. In

addition, RTP control packets can be used to communicate further information about the

party generating RTP traffic and to establish rough synchronization relative to wall clock

time among the participants in a live event or conference.

The format and communications parameters for RTP control packets are defined as a

subsidiary protocol within the RTP specification [25] called the RTP Control Protocol

(RTCP). RTCP packets are similar in structure to RTP data packets, but are usually

carried over a separate transport-level “connection.” The parties using an RTP-based

application periodically multicast RTCP packets to the other participants in order to

provide feedback about their state even when no real-time traffic is being generated.

Both producers and consumers of RTP data must distribute RTCP packets containing

sender or receiver reports. Receiver reports contain information useful to senders such as

the highest sequence number yet received; timestamps, which allow senders to compute

round-trip times; and a measure of jitter, or the variance in packet arrival times within a

data stream. Sender reports enable listeners to estimate the actual data rate of the real-

time data stream and establish a relationship between the timestamp values in the RTP

data packets they are receiving and wall clock time. Sender reports also contain

additional information identifying generators of RTP data above and beyond the thirty-

two-bit identifier included in RTP data packets. This information allows recipients to

provide readable names for senders to users and to maintain sender-specific information

when identifier conflicts occur.
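
The jitter figure carried in receiver reports can be illustrated with a short sketch. The RTP specification [25] defines it as a running estimate: for each packet the receiver compares the packet's transit time with that of the previous packet and folds the absolute difference into the estimate with a gain of 1/16. The class and method names below are assumptions made for illustration, and both arguments are assumed to be expressed in the same timestamp units.

```python
class JitterEstimator:
    """Running interarrival jitter estimate, one instance per source."""

    def __init__(self) -> None:
        self.jitter = 0.0
        self._previous_transit = None

    def update(self, arrival_time: float, rtp_timestamp: float) -> float:
        """Fold one received packet into the estimate and return it."""
        transit = arrival_time - rtp_timestamp
        if self._previous_transit is not None:
            difference = abs(transit - self._previous_transit)
            # Move the estimate one sixteenth of the way toward the
            # newly observed difference.
            self.jitter += (difference - self.jitter) / 16.0
        self._previous_transit = transit
        return self.jitter
```

A stream whose packets all experience the same transit delay yields an estimate of zero; the estimate rises only when the spacing of arrivals departs from the spacing of the timestamps.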

Participants in a conference or broadcast decide how often to send RTCP messages based

on the bandwidth consumed by the data they are sending or receiving, thereby limiting

the amount of control traffic to a known, fixed percentage of the overall load. A value of

5% is suggested in the RTP specification, but the establishment of a mandatory fraction is

left to the creators of profiles. So that newcomers to a session can quickly identify the

parties generating RTP traffic, senders of real-time data are collectively allocated a

quarter of the fraction of the load assigned to control traffic. The remaining three-quarters

are split evenly among all the receiving participants. The specification includes an

algorithm for determining the appropriate interval at which RTCP reports should be

generated based on the available bandwidth and the number of participants [25].
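
The allocation described above can be sketched as follows. This is a simplified, deterministic version written for illustration, not the algorithm from the specification itself: it omits the randomization of the interval and the special handling of a participant's first report. The function and parameter names are assumptions; the 5% control fraction and the one-quarter sender share come from the text above, and the five-second minimum interval is the value recommended in the RTP specification [25].

```python
def rtcp_interval(
    session_bandwidth: float,      # bytes per second for the whole session
    members: int,                  # all participants, senders included
    senders: int,                  # participants currently sending data
    we_sent: bool,                 # True if this participant is a sender
    average_rtcp_size: float,      # average RTCP packet size in bytes
    control_fraction: float = 0.05,
    sender_share: float = 0.25,
    minimum_interval: float = 5.0,
) -> float:
    """Return the number of seconds to wait between RTCP reports."""
    if session_bandwidth <= 0 or members < 1 or not 0 <= senders <= members:
        raise ValueError("inconsistent session parameters")
    control_bandwidth = session_bandwidth * control_fraction
    sharing = members
    # When senders are a small minority they share a reserved quarter of
    # the control bandwidth; receivers share the remaining three quarters.
    if 0 < senders <= members * sender_share:
        if we_sent:
            control_bandwidth *= sender_share
            sharing = senders
        else:
            control_bandwidth *= 1.0 - sender_share
            sharing = members - senders
    interval = average_rtcp_size * sharing / control_bandwidth
    return max(interval, minimum_interval)
```

With a 64,000 byte-per-second session, 1,000 participants, 10 senders and 100-byte reports, a receiver under this sketch would report roughly every 41 seconds, while each sender would be held to the five-second minimum.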

RTP and RTSP complement each other nicely, with the former providing transport

services for networked multimedia applications and the latter providing control

functionality for those same applications. This is somewhat unsurprising, considering that

both protocols are defined in standards-track Internet RFCs that share key contributors.

Although RTSP is designed to be neutral with respect to transport mechanisms, RTP is

the only multimedia transport protocol for which a means of specifying its use is

described in the RTSP specification [2].

2.4.6 Video Datagram Protocol

Like RTP, the Video Datagram Protocol (VDP) is a transport protocol geared to the

delivery of continuous media data over computer networks. Unlike RTP, VDP is

designed to take advantage of the point-to-point connections between clients and servers

in order to provide VCR-style control functionality, such as play, pause, rewind and fast-

forward commands, and to enable optimized data transmission appropriate to the levels

of network congestion and CPU utilization experienced by the participating systems.

VDP was developed by researchers at the University of Illinois in 1995 and 1996, and is

described in a paper presented at the Fourth International World Wide Web Conference

[18] and in a U.S. patent application [31]. Motivated by the difficulties posed by the use

of HTTP for the transmission of multimedia data in Internet applications, VDP is

designed to address the variability of Internet performance and client load, while

providing real-time, on-demand delivery of audio and video streams.

VDP is an asymmetric protocol involving two endpoints, a server and a client, which

communicate using two distinct data channels. The first is a reliable control channel used

by the client to convey user commands and connection management information to the

server; the second is an unreliable channel used for the transmission of multimedia data

from the server to the client and non-critical feedback in the opposite direction. This

feedback mechanism provides VDP-based systems with the ability to dynamically adjust

the data stream in order to accommodate changes in network performance and client CPU

utilization.

During playback, a VDP client measures performance by estimating packet round-trip

times and monitoring the rate at which the system is displaying received video frames

and the percentage of incoming data packets which are being lost in transit. In the event

the client system is unable to display the video stream due to insufficient CPU power or a

sufficiently high percentage of data packets are being lost, the VDP software generates a

feedback message which is sent from the client to the server over the unreliable data

channel. Upon receipt of such a message, the server addresses the situation by thinning

the data stream, reducing the frame rate of the video feed sent to the client and, as a result,

the amount of network bandwidth and processing power necessary to deliver and display

the content. Should performance continue to suffer, additional feedback messages are

generated that trigger further thinning of the data stream. Conversely, should the

measured levels of packet loss or CPU utilization drop to more acceptable levels, the

client generates a feedback message that instructs the server to restore the data being

removed from the data stream. In this fashion, VDP clients and servers adapt to changing

conditions by altering the data stream as needed to ensure that video quality degrades

gracefully in the presence of problematic phenomena.
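
The feedback loop described above might be sketched as follows. This is not VDP's actual algorithm or message format: the thresholds, the message names "thin" and "restore", and the multiplicative step are all hypothetical values chosen only to show the shape of the interaction between client measurements and server-side thinning.

```python
from dataclasses import dataclass, field
from typing import Optional


def client_feedback(
    loss_fraction: float,
    displayed_fps: float,
    received_fps: float,
    loss_high: float = 0.10,   # hypothetical: above this, ask for thinning
    loss_low: float = 0.02,    # hypothetical: below this, ask to restore
) -> Optional[str]:
    """Decide which feedback message, if any, the client should send."""
    cpu_bound = displayed_fps < 0.9 * received_fps  # hypothetical test
    if loss_fraction > loss_high or cpu_bound:
        return "thin"
    if loss_fraction < loss_low:
        return "restore"
    return None


@dataclass
class ThinningServer:
    full_fps: float
    minimum_fps: float = 1.0
    step: float = 0.75         # hypothetical reduction per feedback message
    current_fps: float = field(init=False)

    def __post_init__(self) -> None:
        self.current_fps = self.full_fps

    def on_feedback(self, message: str) -> float:
        """Adjust the transmitted frame rate in response to feedback."""
        if message == "thin":
            self.current_fps = max(self.minimum_fps, self.current_fps * self.step)
        elif message == "restore":
            self.current_fps = min(self.full_fps, self.current_fps / self.step)
        else:
            raise ValueError("unknown feedback message: " + message)
        return self.current_fps
```

Repeated "thin" messages lower the frame rate step by step, and "restore" messages raise it again, never beyond the original rate of the stream.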

In addition to enabling video clients and servers to adapt dynamically to difficult

situations, VDP incorporates a demand re-send algorithm that improves the quality of

video encoded in media formats that include inter-frame dependencies, such as MPEG.

Because the communications channel used by VDP for video data is unreliable, it is

possible that a packet containing frame data upon which subsequent or previous frames

depend could be lost. In this event, VDP clients can request that the lost packet be re-

sent; it is the responsibility of the client to maintain an internal data buffer sufficiently

large to ensure that the re-sent frame arrives in time to be displayed and to identify which

frames are important enough to re-send based on the media format in use.

The VCR-style control functions supported by VDP overlap significantly with the scope

of RTSP’s functionality. As RTSP is designed to function independently of a designer’s

choice of transport protocol, it would be possible to construct a system that utilized RTSP

to convey control information and VDP to carry audio and video data, but the value of

doing so might be somewhat limited. RTP is presently in more widespread use than VDP,

is implemented in a wider variety of applications and ancillary tools, and offers support

for a broader base of media encoding formats. VDP’s ability to allow applications to

adapt to changes in the environment is critical, however; similar functionality might be

incorporated into RTP-based applications through the RTCP receiver report mechanism.

Further information about VDP and related research, including discussion of a

comprehensive architecture for structuring streamed multimedia presentations and

enhancements to VDP supporting frame-level addressing and in-line hyperlinking, is

presented in [32].

2.4.7 IP Multicast and the MBONE

In the simplest of network-based, real-time multimedia distribution systems, user clients

contact a server to request access to the feed and the multimedia data stream is sent

directly to them over the network. If 1,000 clients are interested in a particular broadcast,

1,000 identical copies of the stream are generated at the server and distributed over the

network simultaneously. When many users wish to access a live multimedia

broadcast, it is therefore more efficient to generate a single data stream, which is then

replicated as needed when the network paths to recipients of the broadcast diverge [9].

The former approach is called unicasting; the latter, multicasting.

An early approach to multicasting over the Internet was ST-II [33]. ST-II utilized a

sender-oriented approach, in which each sender established a set of appropriate

connections based on the currently active recipients. Unfortunately, this approach is

unsuitable for large-scale distribution of streaming multimedia, because each recipient

must inform every potential sender of his participation in order to be added to the list of

active endpoints [9].

To address this problem, IP Multicast, defined as an Internet standard in [34] and

extended in [35], takes a receiver-oriented tack. Clients interested in receiving a multicast

advertise that fact using the Internet Group Management Protocol (IGMP). A multicast-

aware router on the client’s local network receives the advertisement, notes the client’s

interest, and uses the Distance Vector Multicast Routing Protocol (DVMRP) to

communicate the client’s interest to other multicast routers on the Internet. Multicast-

aware routers use DVMRP to ensure that multicast data is routed in such a way that it

reaches all interested parties without unnecessarily congesting networks that don’t

contain participating clients. Because the distribution and replication of multicast data is

handled transparently by cooperating routers, there is no need for senders to maintain a

record of all participating clients. As a consequence, the IP Multicast approach is far

more scalable and robust in environments with large numbers of listeners that can come

and go with relative frequency. However, IP Multicast’s reliance on router intelligence is

also a hindrance, as a significant fraction of Internet backbone routers are not multicast-

aware [9].
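
From an application's point of view, the receiver-oriented model amounts to a single socket option. The sketch below uses the standard Berkeley sockets interface as exposed by Python; joining the group is what prompts the host's operating system to advertise membership via IGMP. The function names and the example group address are assumptions made for illustration.

```python
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte group/interface structure used to join a group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Return a UDP socket that has joined the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The sender is never contacted; the join is handled by the local
    # host and the multicast routers.
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
    )
    return sock
```

A receiver would call, for instance, open_multicast_receiver("239.1.2.3", 5004) and then read datagrams from the returned socket; the sender needs no knowledge that this particular receiver exists.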

Because so much of the Internet does not support IP Multicast, the MBONE, or Multicast

Backbone, was created. The MBONE is an overlay network that connects islands of

networks that support multicast by encapsulating multicast packets in normal, unicast

UDP packets and transmitting them over the unsuspecting Internet. Described in depth in

[36, 37], the MBONE connects multicast routers using unicast tunnels over which

multicast data and DVMRP traffic are passed.

Although a control protocol like RTSP is not required in order to make use of IP

Multicast, there are a number of ways in which RTSP and IP Multicast can be used

effectively together. For example, using RTSP enables multicast sessions to be named via

a simple URL, which client applications can use to obtain a full specification of a

session’s parameters. This mechanism can also be used to control access to multicast

sessions, using RTSP to authenticate users and provide them with the encryption keys

necessary to decrypt a private multicast. In addition, an RTSP server providing on-

demand audio or video services can be used to play stored multimedia clips into an IP

Multicast session already in progress or to record its contents for later viewing. The

participants in such a conference can use RTSP clients as a virtual remote control,

stopping and starting playback or recording as needed.

2.4.8 Session Description Protocol and Session Announcement Protocol

Described fully in [38], the Session Description Protocol (SDP) is designed to enable the

announcement of the existence of multimedia broadcasts or conferences to clients and to

convey the information necessary for interested parties to participate in such a session.

An SDP description encapsulates such information as the name and purpose of a

multimedia session; the time or times during which the session will be active; the types of

media included in the session, and the formats in which they are encoded; and addressing

information such as URLs, ports and internet addresses that describe how clients should

obtain access to the session. In addition, descriptions can include additional information,

such as contact data for the individual responsible for administering the session and

details of the resource requirements of the broadcast or conference.

SDP is simply a data format, despite its name; no mechanism for transporting a session

description is described in the specification. Rather, SDP descriptions are carried over

other protocols, such as RTSP, SAP, and MIME-based electronic mail. Furthermore, SDP

session descriptions are text-based; this facilitates portability, the encapsulation of

descriptions within other text-based Internet protocols, and automated generation of

descriptions using scripting languages such as TCL and Perl.
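
Because a description is simply a sequence of text lines of the form type=value, with a one-character type, it can be processed with very little code. The sketch below separates session-level fields from per-media sections, each of which begins with an "m=" line. The function name, the dictionary layout and the sample description are assumptions made for illustration; the parser performs none of the validation required by the specification [38].

```python
from typing import Any, Dict, List, Tuple


def parse_sdp(text: str) -> Dict[str, Any]:
    """Split a session description into session-level and media-level fields."""
    session: List[Tuple[str, str]] = []
    media: List[Dict[str, Any]] = []
    current = session
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if separator != "=" or len(key) != 1:
            raise ValueError("malformed line: " + repr(line))
        if key == "m":
            # m=<media> <port> <transport> <format list>
            kind, port, transport, *formats = value.split()
            current = []
            media.append({
                "media": kind,
                "port": int(port.split("/")[0]),
                "transport": transport,
                "formats": formats,
                "fields": current,
            })
        else:
            current.append((key, value))
    return {"session": session, "media": media}


# Hypothetical description, used only to exercise the parser.
EXAMPLE = """v=0
o=alice 2890844526 2890842807 IN IP4 192.0.2.10
s=Example Seminar
c=IN IP4 239.1.2.3/127
t=0 0
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
a=recvonly
"""
```

Calling parse_sdp(EXAMPLE) yields five session-level fields and two media sections; the attribute line that follows the video entry is attached to that entry rather than to the session as a whole.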

The Session Announcement Protocol (SAP) is a simple protocol used to disseminate

announcements of multimedia sessions over the Internet. Session announcements take the

form of a single UDP packet containing a SAP header and a textual payload, which is a

single SDP session description. Announcements are multicast to a well-known multicast

address and port and can be received by any individual with multicast-capable hardware

and software. SAP is fully specified in an Internet-Draft [39] that has since expired, but

is expected to become the subject of an IETF RFC in the near future.

SDP is complementary to RTSP. An RTSP server can use SDP to provide a client with a

description of a particular multimedia resource in response to a DESCRIBE request, and

the client can then use the information contained in the description to set up appropriate

encoders and decoders to allow the user to participate in the session. Alternatively, a user

might receive an SDP session description via electronic mail or from some other source.

The description can be provided manually to client software, which can then use RTSP to

initiate transmission and playback of the session. In either case, the mechanism used to

communicate the characteristics of the broadcast or conference (SDP) is distinct from the

mechanism used to control the user’s participation in the session (RTSP). Because SAP

constitutes a mechanism for the distribution of session announcements containing SDP

descriptions to potential participants, RTSP and SAP are complementary as well.
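
The DESCRIBE exchange mentioned above can be illustrated with a pair of helper functions. RTSP messages are text-based and resemble HTTP: a request line, header lines and a blank line, optionally followed by a body whose length is given by the Content-Length header. The function names and the example URL are assumptions made for illustration, and the sketch omits error handling, authentication and the other headers defined by the specification [2].

```python
from typing import Dict, Tuple


def build_describe_request(url: str, cseq: int) -> bytes:
    """Return a DESCRIBE request asking for an SDP description of url."""
    lines = [
        "DESCRIBE %s RTSP/1.0" % url,
        "CSeq: %d" % cseq,
        "Accept: application/sdp",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_rtsp_response(data: bytes) -> Tuple[int, Dict[str, str], str]:
    """Return (status code, headers, body) for a complete response."""
    head, _, body = data.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    return int(code), headers, body[:length].decode("utf-8")
```

The body returned for a successful response is the session description itself, which the client can hand to whatever component configures its decoders.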

2.4.9 Synchronized Multimedia Integration Language

Synchronized Multimedia Integration Language, or SMIL, is an HTML-like language

designed to allow users to use a text editor to create streaming multimedia presentations

for playback over the WWW. Described in a recommendation of the World Wide Web

Consortium [40], SMIL enables users to assemble multimedia resources to form a

presentation, to describe how the presentation should be displayed on-screen, and to

associate hyperlinks with multimedia objects. The multimedia resources that make up a

SMIL-based presentation are specified as Uniform Resource Identifiers (URIs) and can

therefore include multimedia content accessible via RTSP as well as other access

protocols, like HTTP and FTP. A number of multimedia and network software companies

have promised support for SMIL in current and future products.
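
As an illustration of how such a presentation is put together, the sketch below generates a minimal SMIL document containing a video clip and an audio clip that play in parallel, with a hyperlink attached to the video. The element names are those of SMIL 1.0 [40]; the function name, the region name, the dimensions and the URLs passed in are assumptions made for illustration. A document like this would more commonly be written by hand in a text editor.

```python
import xml.etree.ElementTree as ET


def build_smil(video_url: str, audio_url: str, link_url: str) -> str:
    """Return a small SMIL presentation as a string."""
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    # Layout: one window containing a single region that fills it.
    ET.SubElement(layout, "root-layout", width="320", height="240")
    ET.SubElement(layout, "region", id="main", left="0", top="0",
                  width="320", height="240")
    body = ET.SubElement(smil, "body")
    parallel = ET.SubElement(body, "par")   # children are played together
    anchor = ET.SubElement(parallel, "a", href=link_url)
    ET.SubElement(anchor, "video", src=video_url, region="main")
    ET.SubElement(parallel, "audio", src=audio_url)
    return ET.tostring(smil, encoding="unicode")
```

Because each resource is named by a URI, the video source in such a document could be an rtsp:// address while the audio is fetched over HTTP.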

2.5 Other RTSP Implementations

2.5.1 Real Networks Reference Implementation

Real Networks Corporation provides a reference implementation of RTSP based on the

July 30, 1997 draft of the RTSP specification; it has not been updated to reflect the final

version of the standard. Released under the terms of the GNU General Public License,

documentation and source code for the reference implementation are available from Real

Networks’ web site [41]. The package includes basic client and server implementations,

as well as several sample applications, all written in the C programming language. The

sample applications support playback of audio files in several formats and incorporate a

simple implementation of RTP, a graphical client interface, and server management via a

configuration file.

The reference implementation is intended to serve as a test platform for other

implementations rather than as the basis for full-featured applications. As such, the

source code is not designed to be particularly flexible or extensible. In particular:

• the implementa
