© Copyright by James David Wong, 1999

choices.cs.illinois.edu/Papers/Theses/MS.Wong.1999.pdf

  • AN EXTENSIBLE FRAMEWORK FOR RTSP APPLICATIONS

    BY

    JAMES DAVID WONG

    B.A., Rice University, 1994

    THESIS

    Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

    in the Graduate College of the University of Illinois at Urbana-Champaign, 1999

    Urbana, Illinois

    Abstract

    With advances in processor performance, memory sizes, storage capacity and

    telecommunications technologies, networked multimedia applications such as streaming

    video have become increasingly popular. Early systems used proprietary protocols to

    transport control information and multimedia data, hindering interoperation between

    different implementations, or adapted existing protocols, sacrificing quality and

    flexibility. The Real-Time Streaming Protocol (RTSP) addresses these concerns by

    providing a standardized, open architecture for controlling playback and recording

    functionality in multimedia systems. This thesis presents an object-oriented framework

    for constructing applications that make use of RTSP, describes the integration of the

    framework into a pair of multimedia applications, and documents its performance. The

    presentation closes with a discussion of opportunities for future development of the

    architecture.

    For Julie

    Acknowledgements

    I am indebted to Professor Roy H. Campbell for his guidance on the long, sometimes

    strange odyssey that resulted in this thesis.

    I must also offer thanks to my friends and co-workers at Vosaic, particularly Zhigang

    Chen, Drew MacGregor, See-Mong Tan, Chuck Thompson and Miguel Valdez. Their

    encouragement, support, insight and advice have been invaluable.

    Likewise, I am grateful to Joe Shidle, David Hyatt, Rebecca Hyatt, Dan Gaines, Tim

    Fraser, Garry Sittler, Mei-Ling Tong and numerous others for helping me to maintain my

    sense of purpose and perspective and for providing unending motivation.

    Finally, heartfelt thanks to my parents, Richard and Neva Wong, and brother, William

    Wong, without whom, none of this would be possible.

    Table of Contents

    Chapter 1 Introduction
      1.1 Multimedia on the Web
      1.2 Hardware Technology
      1.3 Software for Internet Multimedia
      1.4 The Framework
      1.5 Organization

    Chapter 2 Background
      2.1 RTSP Evolution
      2.2 RTSP Features
        2.2.1 Setup and Control
        2.2.2 Timing and Synchronization
        2.2.3 Security
        2.2.4 Extensibility
        2.2.5 Flexibility
        2.2.6 Ease of Use
      2.3 RTSP Applications
        2.3.1 Video-on-Demand
        2.3.2 Live Broadcasts
        2.3.3 Near Video-on-Demand
        2.3.4 Virtual Presentations
        2.3.5 Conferencing and Telephony
        2.3.6 Distributed Digital Editing
      2.4 Related Technologies
        2.4.1 Hypertext Transport Protocol
        2.4.2 PNA
        2.4.3 H.323
        2.4.4 Session Initiation Protocol
        2.4.5 Real-Time Transport Protocol
        2.4.6 Video Datagram Protocol
        2.4.7 IP Multicast and the MBONE
        2.4.8 Session Description Protocol and Session Announcement Protocol
        2.4.9 Synchronized Multimedia Integration Language
      2.5 Other RTSP Implementations
        2.5.1 Real Networks Reference Implementation
        2.5.2 RealSystem G2
        2.5.3 CERN WrtpVoD
        2.5.4 IBM RTSP Toolkit
        2.5.5 Apple QuickTime
        2.5.6 Academic Implementations

    Chapter 3 RTSP in Detail
      3.1 RTSP Resources
      3.2 RTSP Sessions
      3.3 RTSP Messages
        3.3.1 Requests
        3.3.2 Request Methods
        3.3.3 Responses
        3.3.4 Response Status Codes
        3.3.5 Headers
      3.4 In-Line Data
      3.5 Extension Mechanisms

    Chapter 4 An Extensible Framework for RTSP Applications
      4.1 Implementation Language
        4.1.1 Exception Handling
        4.1.2 Run-time Type Information
      4.2 Support Libraries
        4.2.1 GPLib
        4.2.2 Connection
        4.2.3 OSLib
      4.3 Class Models
        4.3.1 Overall Class Model
        4.3.2 Request Class Model
        4.3.3 Exceptions
      4.4 Functional Model
      4.5 Dynamic Models
        4.5.1 RTSPStream
        4.5.2 RTSPInlineDataQueue
        4.5.3 RTSPMessageQueue
        4.5.4 RTSPReader
        4.5.5 RTSPDataReader
        4.5.6 RTSPMessageReader
        4.5.7 RTSPRequestFactory
        4.5.8 RTSPMissive
        4.5.9 RTSPInlineData
        4.5.10 RTSPMessage
        4.5.11 RTSPRequest
        4.5.12 RTSPAnnounceRequest
        4.5.13 RTSPDescribeRequest
        4.5.14 RTSPOptionsRequest
        4.5.15 RTSPPauseRequest
        4.5.16 RTSPPlayRequest
        4.5.17 RTSPRecordRequest
        4.5.18 RTSPSetupRequest
        4.5.19 RTSPTeardownRequest
        4.5.20 RTSPChannelListRequest
        4.5.21 RTSPUserListRequest
        4.5.22 RTSPGetUserDataRequest
        4.5.23 RTSPUpdateUserDataRequest
        4.5.24 RTSPResponse
        4.5.25 RTSPHeader
        4.5.26 RTSPTransportHeader
        4.5.27 RTSP Exception Subclasses
      4.6 Programming Interfaces
        4.6.1 RTSPStream Interface
        4.6.2 RTSPMessage Interface
        4.6.3 RTSPRequest Interface
        4.6.4 RTSPResponse Interface
        4.6.5 The RTSPHeader Interface

    Chapter 5 Applications
      5.1 The Vosaic Reflector
        5.1.1 Role of RTSP
        5.1.2 Integration
      5.2 Vosaic IP Hoot
        5.2.1 Role of RTSP
        5.2.2 Integration
      5.3 Performance Data
        5.3.1 Test Environment
        5.3.2 Experiments and Results
        5.3.3 Analysis

    Chapter 6 Framework Evolution
      6.1 Implementation Issues
        6.1.1 Use of Standard C++ Features
        6.1.2 Efficiency Concerns
        6.1.3 RTSPHeader Support
        6.1.4 Protocol Logic
      6.2 Architectural Enhancements
        6.2.1 Client Interface
        6.2.2 Server Interface

    Chapter 7 Conclusions

    References

    List of Tables

    Table 3.1: Response Classes

    Table 5.1: Experimental Results

    List of Figures

    Figure 4.1: Overall Class Model
    Figure 4.2: Request Hierarchy
    Figure 4.3: Exception Hierarchy
    Figure 4.4: Functional Model
    Figure 4.5: RTSPStream Dynamic Model
    Figure 4.6: RTSPInlineDataQueue Dynamic Model
    Figure 4.7: RTSPMessageQueue Dynamic Model
    Figure 4.8: RTSPReader Dynamic Model
    Figure 4.9: RTSPDataReader Dynamic Model
    Figure 4.10: RTSPMessageReader Dynamic Model
    Figure 4.11: RTSPRequestFactory Dynamic Model
    Figure 4.12: RTSPMissive Dynamic Model
    Figure 4.13: RTSPInlineData Dynamic Model
    Figure 4.14: RTSPMessage Dynamic Model
    Figure 4.15: RTSPRequest Dynamic Model
    Figure 4.16: RTSPAnnounceRequest Dynamic Model
    Figure 4.17: RTSPDescribeRequest Dynamic Model
    Figure 4.18: RTSPPlayRequest Dynamic Model
    Figure 4.19: RTSPSetupRequest Dynamic Model
    Figure 4.20: RTSPChannelListRequest Dynamic Model
    Figure 4.21: RTSPResponse Dynamic Model
    Figure 4.22: RTSPHeader Dynamic Model
    Figure 4.23: RTSPTransportHeader Dynamic Model
    Figure 4.24: Exception Class Dynamic Model
    Figure 4.25: RTSPStream Interface
    Figure 4.26: RTSPMessage Interface
    Figure 4.27: RTSPRequest Interface
    Figure 4.28: RTSPResponse Interface
    Figure 4.29: RTSPHeader Interface
    Figure 5.1: Reflector Main Loop
    Figure 5.2: Reflector Class Model
    Figure 5.3: Reflector Functional Model
    Figure 5.4: IP Hoot Class Model
    Figure 5.5: IP Hoot Functional Model
    Figure 5.6: Test Request
    Figure 5.7: Test Response
    Figure 6.1: Revamped Client Interface
    Figure 6.2: Revamped Server Interface

    Chapter 1 Introduction

    As improving hardware and software technology has made the Internet a more accessible

    and rich environment, considerable effort has been expended on attempts to broaden the

    scope of the World Wide Web to incorporate audio and video in addition to still images

    and text. A number of software architectures and network protocols have been created to

    address the difficulties that arise in attempting to do so. The Real Time Streaming

    Protocol (RTSP) is one such protocol, an open standard intended to serve as a common

    language for continuous media clients and servers; RTSP is a simple, extensible control

    protocol for managing the playback and recording of multimedia data streams over

    computer networks. This thesis presents the specification and design of an object-oriented

    framework for constructing applications that make use of RTSP to provide access to

    continuous media services. The framework described herein facilitates the creation of

    applications that provide networked multimedia functionality by adopting the same

    principles of simplicity and flexibility that underlie the protocol’s design and manifesting

    them in the form of a collection of collaborating objects and classes that provide the

    functionality necessary to implement RTSP.

    1.1 Multimedia on the Web

    The motivation for the RTSP framework, and for the protocol itself, lies in the growth of

    the Internet’s popularity and usage during the 1990s. With the availability of

    inexpensive, high-speed modems and the release of the NCSA Mosaic web browser,

    usage of the Internet has exploded, particularly among groups that previously had neither

    access to the Internet nor need for such access. The dramatic increase in the exposure and

    use of the Internet has qualified it as a mass medium, analogous to television, radio and

    newspapers, while the unique aspects of the Internet make it possible to provide targeted

    content geared toward specific audiences to a degree never before possible.

    As a natural consequence of more pervasive use of the Internet and the World Wide Web,

    demand has grown for the kind of content users have become accustomed to seeing in

    CD-ROM-based software titles: highly graphical, interactive multimedia productions.

    Furthermore, the reach and scalability of the Internet have made it an attractive platform

    for communications applications such as electronic webcasts and point-to-point and

    multi-party conferencing. Implementing these applications in the Internet environment

    presents a number of challenges: users must have access to sufficient bandwidth to

    support the demands of media-rich applications; their computer systems must have

    enough processing capability to decode compressed media streams in real time; and the

    application software must include mechanisms for transmitting and decoding continuous

    media data streams.

    1.2 Hardware Technology

    The first two challenges referenced above have been addressed by the

    evolution of hardware technology. The amount of bandwidth available to end-users has

    increased steadily since the popularization of the web, with modem bitrates increasing

    from 14.4 kb/s to 56 kb/s, enough bandwidth to support high quality audio or low

    resolution video with medium quality audio. Other connection technologies, such as

    Integrated Services Digital Network (ISDN), cable modems and Asymmetric Digital

    Subscriber Line (ADSL), have made headway as well, promising to bring additional

    bandwidth to users and enable higher quality video in the not-too-distant future.

    Likewise, processor performance has increased significantly from year to year. In 1995, a

    Dell Dimension XPS personal computer achieved a score of 3.16 on the SPECint95

    benchmark, a widely-used benchmark suite for evaluating computer systems’

    performance on integer operations; in 1998, a Dell Precision Workstation 610 scored 19.0

    on the same benchmark [1], a more than six-fold improvement. The infusion of new

    technologies into the mainstream personal computer market, such as superscalar, RISC-based

    processing and special-purpose instruction set enhancements like the MMX

    multimedia extensions to the Intel Architecture, has also increased the amount of

    computing power available to consumers. As a result, the computer systems available to

    most users are more than capable of decoding the highly compressed audio and video

    data streams generated by networked multimedia applications.

    The pace of advancement in hardware performance shows no signs of slowing, either.

    Products and technologies on the horizon, such as Very Long Instruction Word (VLIW)

    computing and accelerated hardware implementations of multimedia algorithms, promise

    to ensure that users will have enough processing power to handle the higher bitrate data

    streams made possible by the new connection technologies described above.

    1.3 Software for Internet Multimedia

    The improvements in processing power and Internet connectivity have driven the

    development of software architectures that take advantage of the available technology.

    Various commercial software vendors have released proprietary software products that

    provide differing levels of interactivity and richness. Among others, these include

    Macromedia, whose Flash and Shockwave products enable the display of simple

    animations and interactive presentations; RealNetworks, whose RealSystem is the

    dominant platform for delivery of multimedia over the Internet; Microsoft, which offers a

    variety of live and on-demand multimedia and conferencing applications; and Apple

    Computer, which distributes software that enables QuickTime movies to be played over

    the Internet. Typically, these products are integrated with users’ web browsers through a

    plug-in mechanism defined by Netscape Communications Corporation in its Navigator

    web browser, a commercial follow-on to the original Mosaic browser.

    In their first incarnations, the software solutions for providing interactive multimedia

    over the Internet have been, to varying degrees, dependent on proprietary transport and

    control protocols. As a result of this dependence on proprietary techniques,

    interoperability between different vendors’ implementations has been impossible. RTSP

    and several related specifications seek to address the problem of interoperation by

    providing a standard platform upon which multimedia applications can be built. These

    specifications, which include RTSP for control over the playback of multimedia; the

    Real-Time Transport Protocol (RTP) for delivery of continuous media data streams; and

    the Session Description Protocol (SDP), which facilitates the communication of

    information required to display a multimedia presentation, are open standards

    administered by the Internet Engineering Task Force. As such, they enable any developer

    to create clients and servers that work transparently with applications by other authors,

    while still allowing for differentiation through documented extension mechanisms.

    1.4 The Framework

    The framework described in this thesis is intended to provide a portable, flexible

    implementation of RTSP for use in a variety of applications. Although there are a number

    of other implementations of the protocol that are available for development purposes,

    none adequately capture both the simplicity and versatility of the specification. Some

    offer flexibility, but are difficult to use and extend, while others offer ease of use, but

    provide only limited functionality. This framework makes use of object-oriented

    techniques to provide easy-to-use, high-level abstractions, while still providing the user

    with the flexibility and access needed to implement complex, customized applications.

    1.5 Organization

    The remainder of the thesis is organized as follows. Chapter 2 introduces RTSP, covering

    its design goals, evolution and features, and explores related protocols and technologies

    as they pertain to RTSP. Chapter 3 follows with a more detailed discussion of the

    protocol, providing an explanation of its message structure and semantics sufficient to

    understand the workings of the framework. Chapter 4 then presents the design of the

    framework itself, detailing the classes and objects that make up the library and the ways

    in which they interact, and Chapter 5 elaborates on the design by illustrating how the

    components of the framework are integrated into a pair of multimedia applications.

    Chapter 5 closes with performance data illustrating that the framework’s message

    processing architecture is robust and efficient enough to serve as the basis for heavy-duty

    multimedia servers. Finally, Chapter 6 considers issues raised by the current design and

    implementation of the framework and highlights directions future development might

    take.

    Chapter 2 Background

    This chapter presents background information central to an understanding of RTSP’s role

    in networked multimedia systems in general and the design of this RTSP framework in

    particular. Section 2.1 discusses the origins, evolution and status of the RTSP

    specification. Section 2.2 describes the functionality offered by the protocol, and section

    2.3 outlines some of the applications for which RTSP was designed. Section 2.4 explores

    related protocols and standards for networked multimedia, and the closing section of this

    chapter discusses other implementations of RTSP that are available at the time of this

    writing.

    2.1 RTSP Evolution

    RTSP is an open standard published by the Internet Engineering Task Force (IETF) in a

    standards-track Request for Comments (RFC 2326) [2]. RTSP is currently classified as a

    Proposed Standard. Its development followed from the proliferation of proprietary

    protocols for control and transport of multimedia data over the Internet and the elusive

    goal of interoperability. The intent was to develop a flexible standard that enabled not

    just interoperation between similar products from different vendors, but the ability to use

    the same tools, file formats and protocols for telephony and conferencing applications as

    in video-on-demand and webcasting environments.

    With these goals in mind, Anup Rao of Netscape Communications Corporation and Rob

    Lanphier of RealNetworks, Inc. (then called Progressive Networks) submitted a draft

    proposal to the IETF Multiparty Multimedia Session Control (MMUSIC) working group

    in November of 1996. Their proposal outlined a simple protocol based on binary, non-

    human-readable messages with support for requesting live or on-demand playback of

    multimedia. Shortly thereafter, Henning Schulzrinne of Columbia University submitted a

    counterproposal detailing a protocol making use of HTTP-like, textual messages and

    incorporating more general extension mechanisms, a more abstract treatment of

    multimedia transport protocols, and support for recording functionality, as well as

    playback. This specification evolved in a series of Internet-Drafts released by the IETF,

    and the effort culminated with the release of the RTSP RFC in April 1998. The final

    specification is derived from Schulzrinne’s counterproposal, and contains contributions

    from researchers and developers at Netscape, RealNetworks, Columbia University,

    International Business Machines Corp., the French National Institute for Research in

    Computer Science and Control (INRIA), and Microsoft Corporation, among others.

    2.2 RTSP Features

    This section contains a high-level discussion of some of the features and characteristics

    of RTSP. Section 2.5.6 describes how these features are implemented by the protocol.

    2.2.1 Setup and Control

    At its core, RTSP is a protocol that enables applications to set up and control the

    playback and recording of multimedia data over a computer network. Setup consists of

    arriving at an agreement as to the kind of data that will be played or recorded and the

    mechanism through which it will be transported, and control offers the end-user the

    ability to interactively manage the flow of data to or from the multimedia server. The

    capabilities of RTSP with respect to each of these dimensions of functionality are

    described below.

    The setup process begins with a simple exchange of the capabilities supported by the

    client and server, including any non-standard extensions to the protocol that one party or

    the other has implemented or requires. It continues with negotiation of the transport

    mechanism to be used to carry the multimedia stream; this is done in advance of the

    initiation of delivery of the stream in order to ensure that neither participant is presented

    with a stream it cannot handle. Once an appropriate means of transport has been selected,

    playback or recording can commence, under the user’s control.
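
    The setup exchange described above can be sketched as a short sequence of text requests. This is a minimal illustration rather than the framework presented in this thesis: the method names and headers follow the RTSP specification [2], but the URL, port numbers and session identifier are invented for the example, and the helper `build_request` is hypothetical.

```python
# Illustrative only: the URL, ports, and session identifier are made up.
CRLF = "\r\n"


def build_request(method, url, cseq, headers=None):
    """Format an RTSP/1.0 request: request line, headers, then a blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return CRLF.join(lines) + CRLF + CRLF


URL = "rtsp://media.example.com/clip"  # hypothetical presentation URL

setup_sequence = [
    # 1. Ask the server which methods it supports.
    build_request("OPTIONS", URL, 1),
    # 2. Negotiate the transport before any media flows.
    build_request("SETUP", URL + "/audio", 2,
                  {"Transport": "RTP/AVP;unicast;client_port=4588-4589"}),
    # 3. Start playback within the session the server assigned (value assumed).
    build_request("PLAY", URL, 3, {"Session": "12345678"}),
]

for message in setup_sequence:
    print(message, end="")
```

    In a real exchange the client would wait for each response before sending the next request, since the session identifier used in PLAY is chosen by the server in its reply to SETUP.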

    RTSP offers the user the ability to start, suspend and restart the transmission of the

    multimedia stream as needed. In addition, RTSP-based applications can offer familiar,

    VCR-like features, such as the ability to scan backwards or forwards through a

    presentation and to seek to arbitrary points within a stored clip. RTSP also allows for

    aggregate control over separate streams, so that presentations consisting of several

    distinct tracks, such as a recording of a videoconference with separate audio and video

    tracks, can be played and manipulated as a unified whole.

    2.2.2 Timing and Synchronization

    As described above, RTSP enables applications (and thus users) to begin playback at

    arbitrary points within a multimedia clip or presentation; likewise, applications can

    indicate that playback should stop at an arbitrary point. The desired starting and stopping

    points can be specified in seconds, relative to the start of the presentation or, in the case

    of archived recordings of live events, in wall clock time. In addition, clients can instruct

    servers to begin or stop playback at a specified wall clock time. Thus, an RTSP client

    could tell a server to play the third through sixth seconds of a stored video clip at some

    moment in the future, provided the RTSP session is still active at the time.

    RTSP-based applications can also use Society of Motion Picture and Television

    Engineers (SMPTE) timestamps to express offsets from the beginning of clips. SMPTE

    allows for frame-level control over playback and recording, thereby enabling RTSP

    clients and servers to be used to perform professional-quality distributed editing of

    multimedia presentations.
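
    The three ways of expressing playback positions can be made concrete with a small formatter for the Range header. The sketch below assumes the header syntax given in the RTSP specification [2] (normal play time, SMPTE time codes, and an optional `time` parameter naming the wall clock moment at which the operation should take effect); the function names are hypothetical and the dates are arbitrary.

```python
from datetime import datetime


def npt_range(start=None, end=None):
    """Range in normal play time: seconds relative to the start of the clip."""
    def fmt(t):
        return "" if t is None else f"{t:g}"
    return f"npt={fmt(start)}-{fmt(end)}"


def smpte_range(start, end=None):
    """Range in SMPTE time code, given as (hours, minutes, seconds, frames)."""
    def fmt(tc):
        return "" if tc is None else ":".join(f"{part:02d}" for part in tc)
    return f"smpte={fmt(start)}-{fmt(end)}"


def range_header(spec, at_utc=None):
    """Build a Range header; `at_utc` (a UTC datetime) defers the operation."""
    value = spec
    if at_utc is not None:
        value += ";time=" + at_utc.strftime("%Y%m%dT%H%M%SZ")
    return "Range: " + value


# Play seconds 3 through 6 of a clip, starting at a chosen moment in the future.
deferred = range_header(npt_range(3, 6), datetime(1999, 12, 31, 23, 59, 0))
# Frame-accurate range for editing, expressed in SMPTE time code.
frame_accurate = range_header(smpte_range((10, 7, 0, 0), (10, 7, 33, 5)))
```

    The first header corresponds to the example in the text: the third through sixth seconds of a stored clip, to be played at a specified wall clock time.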

    2.2.3 Security

    RTSP provides flexible, open mechanisms for clients to interact in a secure manner with

    RTSP servers. Authentication and encryption of client-server interactions are supported

    through Internet standards, allowing implementations to provide as much or as little

    security as is required for a particular application.

    In addition, RTSP is designed to be friendly to the firewall and proxy software in place at

    many companies and public institutions that provide Internet access to employees and

    patrons. The protocol readily lends itself to handling by transport-level proxy

    services, such as SOCKS [3], and its status as an Internet standard makes it easy for

    vendors of packet-filtering firewalls to allow legitimate RTSP requests to pass

    unhindered while blocking the packets of hackers and other intruders.

    2.2.4 Extensibility

    In addition to supporting basic multimedia delivery through the features described above,

    RTSP was designed to accommodate unforeseen applications and usage scenarios

    through a variety of extension mechanisms. As needed, RTSP implementations can

    modify the behavior and semantics of the basic operations defined by the protocol; add

    entirely new operations; or, in the event the current protocol is completely unsuitable for

    a particular problem, but developers wish to maintain some degree of backwards

    compatibility with older software, just about every aspect of the protocol may be

    changed.

    Of course, the utility of changing the behavior of the protocol is greatly reduced if doing

    so results in an application that is unable to interoperate with other implementations of

    RTSP. To address this problem, RTSP requires that clients and servers support a standard

    means of feeling each other out in order to determine which non-standard options and

    enhancements the other supports.
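
    One way to picture this negotiation is a server-side check of the option tags a client declares it requires. The sketch assumes the Require and Unsupported headers and the 551 status code defined in the RTSP specification [2]; the option tags and the function name are hypothetical.

```python
SUPPORTED_OPTIONS = {"com.example.pause-on-idle"}  # hypothetical option tag


def check_required_options(request_headers, supported=SUPPORTED_OPTIONS):
    """Return (status, reason, extra_headers) for the Require header, if any."""
    required = [tag.strip()
                for tag in request_headers.get("Require", "").split(",")
                if tag.strip()]
    missing = [tag for tag in required if tag not in supported]
    if missing:
        # Tell the client exactly which extensions are not understood.
        return 551, "Option not supported", {"Unsupported": ", ".join(missing)}
    return 200, "OK", {}


accepted = check_required_options({"Require": "com.example.pause-on-idle"})
rejected = check_required_options({"Require": "com.example.fast-scan"})
```

    Because the rejection names the unsupported options, a client can fall back to standard behavior rather than abandon the session.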

    2.2.5 Flexibility

    By design, and unlike some multimedia control protocols that have preceded it, RTSP is

    flexible in its support for alternative mechanisms for tasks not directly related to the

    control of delivery of multimedia data. In particular, RTSP is agnostic with respect to

    such decisions as the choice of transport protocol used to deliver audio and video data

    and the representation format used to describe multimedia presentations. The protocol

    allows clients and servers to state the formats and protocols they support and arrive at a

    mutually acceptable decision.
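
    As an illustration of how such a decision might be reached, the sketch below picks the first transport offered by a client that the server also supports. It assumes the comma-separated, preference-ordered Transport header of the RTSP specification [2], in which multicast is the default delivery mode when none is stated; the parsing is deliberately simplified (it does not handle quoted parameter values containing commas) and the function names are hypothetical.

```python
def parse_transport(header_value):
    """Split a Transport header into candidate specs, in the client's order."""
    candidates = []
    for spec in header_value.split(","):
        parts = [p.strip() for p in spec.split(";") if p.strip()]
        if parts:
            candidates.append({"protocol": parts[0], "params": parts[1:]})
    return candidates


def choose_transport(header_value, server_supports):
    """Return the first candidate whose protocol and delivery mode are supported."""
    for cand in parse_transport(header_value):
        # Multicast is assumed when the candidate does not say "unicast".
        delivery = "unicast" if "unicast" in cand["params"] else "multicast"
        if (cand["protocol"], delivery) in server_supports:
            return cand
    return None


offer = "RTP/AVP;multicast;ttl=127, RTP/AVP;unicast;client_port=3456-3457"
chosen = choose_transport(offer, {("RTP/AVP", "unicast")})
```

    Here the client prefers multicast, but a server that supports only unicast RTP settles on the second candidate.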

    2.2.6 Ease of Use

    RTSP is designed to be easy to implement and use. It is text-based and human-readable,

    making debugging clients and servers easier. Its structure is modeled closely after that of

    HTTP [4] and MIME [5], allowing the reuse of existing code in new RTSP

    implementations. This similarity to HTTP also allows RTSP-based applications to take

    advantage of standard extensions to HTTP, such as digest access authentication [6] and

    PICS [7, 8], a system for associating labels and ratings with content.
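
    The practical benefit of the HTTP-like structure is that a few lines of generic code suffice to take a message apart. The following sketch is not the parser used by the framework described later; it is a minimal, assumed illustration that works equally well on HTTP and RTSP responses, and the sample response is invented.

```python
def parse_message(text):
    """Parse an HTTP-style message into (start_line, headers, body)."""
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


response = ("RTSP/1.0 200 OK\r\n"
            "CSeq: 2\r\n"
            "Session: 12345678\r\n"
            "\r\n")
start_line, headers, body = parse_message(response)
version, status, reason = start_line.split(" ", 2)
```

    Since the message is plain text, the same response can also be read directly in a network trace, which is what makes debugging comparatively easy.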

    2.3 RTSP Applications

    This section describes some of the applications for which RTSP was designed and

    provides some insight into how particular features of the protocol are utilized in various

    settings.

    2.3.1 Video-on-Demand

    Video-on-demand is one of the core applications RTSP was designed to support. Through

    its basic command set, RTSP allows for the creation of sophisticated streaming video

    applications with support for many advanced features, including:

    • VCR-style control over the delivery and playback of multimedia,

    • the ability to start playback at an arbitrary point within a presentation,

    • independence from specific transport mechanisms and media types,

    • interoperation of clients and servers from different vendors,

    • tight integration with web browsers, and

    • pay-per-view and other logging and billing methods.

    2.3.2 Live Broadcasts

    RTSP provides many of the same benefits to live audio and video applications that it

    brings to on-demand systems. Administrators can easily use the protocol’s authentication

    features to construct secure commercial systems based on RTSP-aware clients and

    servers, and its support for varying transport mechanisms makes it feasible to support

    both small-scale events in which the data stream is unicast directly to clients and large-

    scale, high-profile broadcasts supporting many thousands of clients via IP Multicast.

    Broadcasts can also combine the distribution methodologies to take advantage of the

    bandwidth efficiency of multicast transmission while allowing users whose access

    providers don’t support multicast traffic to participate.

    2.3.3 Near Video-on-Demand

    In addition to simple live and on-demand applications, RTSP supports near-on-demand

    delivery, an amalgamation of the two approaches discussed in [9]. In this usage scenario,

    a multimedia presentation is multicast several times at staggered intervals. This allows

    for the use of IP multicast so that bandwidth is utilized efficiently, while maintaining

    some of the flexibility and convenience of on-demand services. The multicast addresses

    used by the staggered multicasts can be determined dynamically by the RTSP server, so

    clients automatically receive the most recently started signal.
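
    The arithmetic behind staggered delivery is simple enough to sketch. The example assumes, purely for illustration, that showings start at a fixed interval and cycle through a fixed pool of multicast addresses; neither the addresses nor the function names come from the RTSP specification.

```python
def latest_started_channel(now, first_start, interval, channels):
    """Return the multicast address whose showing began most recently."""
    if now < first_start:
        return None  # nothing has started yet
    showing = int((now - first_start) // interval)
    return channels[showing % len(channels)]


def wait_for_next_start(now, first_start, interval):
    """Seconds until the next staggered showing begins."""
    if now < first_start:
        return first_start - now
    return interval - (now - first_start) % interval


channels = ["224.2.0.1", "224.2.0.2", "224.2.0.3"]  # made-up addresses
# A client arriving 1250 seconds in, with showings every 600 seconds:
current = latest_started_channel(1250, 0, 600, channels)
wait = wait_for_next_start(1250, 0, 600)
```

    With a ten-minute stagger, no client waits more than ten minutes for a showing to begin, while the server transmits only a handful of streams regardless of audience size.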

    2.3.4 Virtual Presentations

    Moving beyond the realm of simple streaming media applications, RTSP facilitates the

    creation of virtual presentations incorporating live, stored and interactive multimedia.

    The protocol’s support for playback of arbitrary segments of clips and controlling the rate

    of playback makes it easy to integrate several segments into a single presentation, and the

    ability to access multiple servers seamlessly simplifies the creation of interactive works

    incorporating disparate types of multimedia content.

    2.3.5 Conferencing and Telephony

    RTSP can also be utilized in network-based conferencing and telephony applications.

    Although it does not provide the signaling functionality required of the protocols that

    form the foundation of these applications – those functions are left to dedicated protocols

    like H.323 [10] and SIP [11] – RTSP can be used to play prerecorded content into an

    active call or to record a conference for later retrieval and playback. In addition, RTSP

    can be used as the basis for IP-based voice mail and menu systems.

    2.3.6 Distributed Digital Editing

    RTSP’s explicit support for recording operations, together with the provisions it makes

    for frame-level timing, makes it possible to implement distributed digital editing systems

    for multimedia. Such a system could make use of multiple networked playback and

    recording devices coordinated by an RTSP-based software application.

    2.4 Related Technologies

    2.4.1 Hypertext Transfer Protocol

    Developed in 1990 at the European Laboratory for Particle Physics (CERN) in Geneva,

    Switzerland, the Hypertext Transfer Protocol (HTTP) provides access to a vast collection

    of inter-linked Internet resources, primarily consisting of graphics and text [12], called

    the World Wide Web (WWW, or web). Since the release of the National Center for

    Supercomputing Applications (NCSA) web server software and graphical browser,

    Mosaic, the web has experienced phenomenal rates of growth in size and usage. In

    February 1994, the NCSA web server [13] handled one million requests per week; by

    December of the same year, its load had grown to four million per week [12]. In

    September 1998, some estimates placed the number of users of the web at 39 million

    people [14].

    As the web’s usage and accessibility grew, content providers sought to enrich their pages

    by moving beyond hypertext and graphics and incorporating new media, including audio

    and video, into their web sites. In November 1995, the web was estimated to contain over

    eleven million distinct resources hosted on more than 225,000 servers [15]. At that point,

    there were approximately 36,000 video files available on the web [16], and although

    audio and video accounted for only 1% of the requests handled by NCSA’s server, they

    were responsible for 28% of the bytes transferred.

    Many early efforts aimed at integrating audio and video into the web made use of HTTP

    to retrieve audio and video data in the same fashion as text and images. Several

    difficulties arise with this approach. First, HTTP is oriented towards whole-file

    downloads: the simplest way to access video via the web is to download an entire video,

    storing it locally on the user’s hard disk before playback begins. Because of the large size

    of files containing video data, however, this often results in a long wait between the time

    a user decides to view a video and the time it actually begins playing. Various “fast start”

    solutions have been adopted to address this issue [17], but they do not solve a more

    fundamental problem: HTTP’s use of TCP for data transport. As described in Section

    2.4.5 below, TCP is inappropriate for transport of audio and video data because it

    introduces jitter, degrades picture quality by making sure each and every data packet is

    delivered in order, and imposes inappropriate flow control on the data stream [18].

    In addition, HTTP itself lacks features that are useful in the context of providing

    multimedia services; the protocol includes a very limited command set and requires that

    servers handle requests in a stateless fashion. As a consequence, there is no mechanism

    for clients to retrieve descriptions of available resources in standard formats; the

    transmission of a data stream cannot be paused and resumed in place at the user’s

    request; the parameters used to encode and transmit the data stream cannot be changed on

    the fly; and clients cannot exercise fine-grained, time-based control over the portions of the data

    stream that they receive. Newer revisions of the HTTP specification provide support for

    partial downloads [19], but at the level of byte ranges, which is inappropriate for

    multimedia due to the fact that many encoding formats make it difficult or impossible to

    map time offsets to byte ranges without processing and decoding the entire data file.
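
    A small numerical example shows why byte ranges are a poor substitute for time ranges. The frame sizes below are invented to mimic a variable-bit-rate stream in which a few large frames are interleaved with small ones; the function name is hypothetical.

```python
def byte_offset_for_time(frame_sizes, frames_per_second, seconds):
    """Byte offset of the frame shown at `seconds`, found by summing frame sizes."""
    target_frame = int(seconds * frames_per_second)
    return sum(frame_sizes[:target_frame])


sizes = [9000, 1200, 1500, 8800, 1100, 1300]  # bytes per frame, made up
fps = 2
duration = len(sizes) / fps                   # 3.0 seconds

# Exact answer: requires knowing the size of every earlier frame.
exact = byte_offset_for_time(sizes, fps, 1.5)
# Naive answer: assume bytes are spread evenly over time.
estimate = int(sum(sizes) * 1.5 / duration)
```

    The exact offset (11700 bytes) and the proportional estimate (11450 bytes) disagree, so a client that guesses a byte range lands in the middle of a frame. Finding the right offset requires either an index or a scan of the file, which is work a time-based protocol leaves to the server.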

    As a result of these issues, HTTP enjoys a complementary relationship with RTSP. Web

    pages served via HTTP can incorporate video accessed via RTSP, and HTTP can be used

    to access meta-data such as session descriptions for use within RTSP-based applications.

    HTTP is an Internet Engineering Task Force standard protocol described in full in a

    series of IETF Request for Comments documents [4, 19].

    2.4.2 PNA

    PNA is the control and transport protocol used by the popular RealAudio, RealPlayer,

    and RealServer streaming multimedia applications until the release of RealSystem G2,

    comprising RealPlayer G2 and RealServer G2, in November 1998. PNA is a proprietary

    protocol and as such, the full specification is not publicly available, although

    RealNetworks has released enough information to enable firewall and proxy software authors

    to incorporate support for recognizing and relaying PNA traffic. With the release of

    RealSystem G2, PNA has been replaced by RTSP (for control traffic) and RTP (for

    multimedia data), though the client and server components continue to support PNA for

    the sake of interoperability with the installed base of servers and clients.

    2.4.3 H.323

    Approved in 1996 by Study Group 15 of the International Telecommunication Union,

    ITU Recommendation H.323 [10] describes protocols and components for systems that

    provide conferencing and telecommunications services over packet-based networks. The

    H.323 Recommendation encompasses standards for call control, multimedia and

    bandwidth management and interfaces between packet-based communications systems

    and circuit-switched networks. H.323 is one of a series of recommendations, known as

    H.32X, that propose standards for conferencing applications over a variety of networks,

    notably ISDN and the standard voice telephone network, which are addressed by

    Recommendations H.320 and H.324, respectively [20]. Intended to provide

    interoperability among compliant products and devices, H.323 has garnered considerable

    support among vendors in the computing and telecommunications industries. Among

    others, Intel, Microsoft Corporation and Netscape Communications Corporation have

    pledged to support the standard in their products.

    An H.323-based conferencing and communications system comprises a collection of

    components, each of which provides a distinct service to the other components of the

    system. The types of components defined in the Recommendation are as follows:

    Terminals, which constitute the end-user endpoints of H.323 conferences; Gateways,

    which provide interfaces between H.323-based systems and other communications

    architectures; Gatekeepers, which provide access control services; and Multipoint Control

    Units, which work with Multipoint Processors and Multipoint Controllers to enable

    conferences involving more than two parties. These components are logical, rather than

    physical, constructs; a single piece of hardware might contain one or many H.323

    components. In addition, not all environments require every type of component; in the

    simplest case, that of a two-party, endpoint-to-endpoint call within a single local area

    network, terminals may be all that is required.

    In addition to describing the hardware components that make up an H.323 system, the

    Recommendation specifies communications protocols and encoding mechanisms through

    which the components interact. These include a number of other ITU recommendations,

    including standards for audio and video encoding algorithms (H.261, H.263, G.711,

    G.722, G.728, G.729, MPEG1, G.723.1, etc.) and protocols for the transmission of binary

    data within conferences (T.120), packetization of media streams for transmission over

    computer networks (H.225), capability negotiation and channel allocation (H.245) and

    call signaling (Q.931). The Recommendation specifies the use of Internet standard

    protocols, including RTP, RSVP and IP Multicast, to enable the transmission of audio

    and video data over IP-based networks. In addition, H.323 defines a protocol,

    Registration/Admission/Status (RAS), used to control allocation of network resources,

    regulate access to local and remote systems, and perform address translation.

    A more in-depth description of the elements of an H.323 system and the ways in which

    those elements interact using the above-mentioned protocols and standards follows.

    Terminals: Terminals represent the endpoints of an H.323 conference or call; they

    provide for real-time, two-way communications between the user and other parties

    involved in the call. The specification requires that all compliant Terminals support audio

    encoded using one of the standard mechanisms; optionally, Terminals can also support

    video and data conferencing. Encoded audio and video are packetized as dictated by

    H.225 and transported via RTP, whereas data is encoded and transmitted as specified in

    T.120. The encoding algorithms and data types to be included in a call are negotiated,

    along with other considerations, such as the bitrates of the data streams generated by each

    party, as described in H.245. Terminals must also support RAS in order to communicate

    with Gatekeepers and may optionally incorporate a Multipoint Control Unit, described

    below, in order to facilitate participation in multi-party conferences.

    Gateways: H.323 Gateways enable H.323-enabled devices and software to communicate

    with other ITU-compliant conferencing terminals residing on circuit switched networks

    such as ISDN and the public telephone network. Providing this functionality entails

    translating between different transmission formats, call control mechanisms, and

    encoding formats, as well as establishing and maintaining connections on both of the

    involved networks. Terminals communicate with Gateways using the H.245 and Q.931

    call setup and signaling protocols.

    Gatekeepers: H.323 systems can optionally include a Gatekeeper component that

    provides call control services that help preserve the integrity and usability of the network

    on which the system resides. The simplest of these services is translation of locally-

    known aliases for H.323 endpoints and gateways into transport-level network addresses;

    in systems including a Gateway, a Gatekeeper component is also used to translate

    incoming E.164 (ISDN) addresses into the corresponding packet network addresses.

    In addition to performing address translation, Gatekeepers control conferencing

    endpoints’ access to the network. Because Terminals and other endpoints are required to

    make use of a Gatekeeper if one is present in a system, Gatekeeper components can limit

    the number of simultaneously active calls, thereby managing the amount of network

    bandwidth consumed by conferencing applications. Furthermore, Gatekeepers can provide

    call authorization functionality; for example, a Gatekeeper could enable or disable calls

    to or from certain endpoints or restrict users’ ability to make calls to certain hours of the

    day. The vendor of the hardware including the Gatekeeper determines the actual criteria

    used to determine whether a call is allowed or disallowed.
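
    The admission decisions described above can be modeled with a toy policy object. This is only a sketch of the idea: a real Gatekeeper exchanges RAS messages with endpoints, and the policy shown here (a call cap plus a list of blocked endpoints) is an assumed example of vendor-defined criteria, with hypothetical class and endpoint names.

```python
class Gatekeeper:
    """Toy admission control: a cap on active calls plus a block list."""

    def __init__(self, max_active_calls, blocked_endpoints=()):
        self.max_active_calls = max_active_calls
        self.blocked = set(blocked_endpoints)
        self.active_calls = set()

    def admit(self, caller, callee):
        """Return True and record the call if the policy allows it."""
        if caller in self.blocked or callee in self.blocked:
            return False  # call authorization
        if len(self.active_calls) >= self.max_active_calls:
            return False  # bandwidth management via a call limit
        self.active_calls.add((caller, callee))
        return True

    def release(self, caller, callee):
        """Forget a finished call so that its slot can be reused."""
        self.active_calls.discard((caller, callee))


gk = Gatekeeper(max_active_calls=1, blocked_endpoints={"lobby-phone"})
first = gk.admit("alice", "bob")
second = gk.admit("carol", "dave")   # refused: the cap is reached
gk.release("alice", "bob")
third = gk.admit("carol", "dave")    # admitted once a slot is free
```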

    Finally, a Gatekeeper component can also be used to facilitate the ad hoc creation of

    multipoint conferences. When a point-to-point call is established between two parties, the

    Gatekeeper for one of the parties can be configured to receive the H.245 signaling

    information associated with the call. No processing of the H.245 data is required of the

    Gatekeeper; it simply passes the data from one endpoint to the other. Then, when one of

    the participants elects to expand the conference to include a third party, the Gatekeeper

    directs the H.245 traffic to a Multipoint Controller as well, which then establishes the

    parameters under which the expanded call will operate.

    Gatekeepers provide their services to Terminals and other endpoints via the RAS

    protocol.

    Multipoint Control Unit: Like Terminals, Multipoint Control Units (MCUs) are

    endpoints – entities which users can call directly. A Multipoint Control Unit consists of a

    Multipoint Controller and some number of Multipoint Processors; these components

    work together to enable conferences involving three or more parties. Multipoint Control

    Units can be located within Gateways and Gatekeepers, but in such instances they

    implement wholly separate functions that merely happen to be located in the same piece

    of equipment.

    H.323 supports two types of multipoint calls, referred to in the recommendation as

    centralized and decentralized conferences. In centralized conferences, all the Terminals

    or other endpoints involved transmit audio, video, data and control information directly to

    a Multipoint Control Unit. The MCU is responsible for multiplexing the incoming audio

    streams, enabling endpoints to select a video feed in which they’re interested, and

    distributing the appropriate streams to the participants. In contrast, in decentralized

    conferences, participants use IP Multicast or a comparable technology to simultaneously

    deliver audio and video content to all of the participants in a call, while call signaling and

    shared data such as whiteboard information are still processed in a centralized fashion by

    a Multipoint Control Unit.

    Multipoint Controller: A Multipoint Controller resides within a Multipoint Control Unit

    and implements control functionality that enables conferences involving three or more

    parties. Multipoint Controllers use H.245 to determine the capabilities of each endpoint

    involved in the conference and inform the participants of the encoding formats and

    communications parameters acceptable to the group as a whole; this set of capabilities is

    revised as users join or leave the call.
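
    Conceptually, the set of capabilities acceptable to the group is the intersection of what each endpoint can handle. The sketch below illustrates only that idea; it is not the H.245 procedure itself, and the endpoint names and capability sets are invented.

```python
def common_capabilities(endpoints):
    """Intersect the capability sets of all endpoints currently in the call."""
    sets = list(endpoints.values())
    return set.intersection(*sets) if sets else set()


endpoints = {
    "terminal-a": {"G.711", "G.723.1", "H.261"},
    "terminal-b": {"G.711", "H.261", "H.263"},
}
two_party = common_capabilities(endpoints)

# A third, audio-only participant joins: the common set shrinks.
endpoints["terminal-c"] = {"G.711"}
three_party = common_capabilities(endpoints)

# When that participant leaves, the richer set becomes available again.
del endpoints["terminal-c"]
after_leave = common_capabilities(endpoints)
```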

    Multipoint Processors: The remaining component of a Multipoint Control Unit, a

    Multipoint Processor (MP) is responsible for transforming the audio and video data

    streams generated by participants in a centralized multipoint conference into the

    appropriate form and returning the resulting streams to the connected parties. The

    processing performed by an MP typically consists of some combination of switching and

    mixing of the incoming data, and can also entail converting audio or video data into

    alternative formats for display or playback on terminals that can’t decode the primary

    format in use within the call.

    Although H.323 is designed for advanced communications services, it can also be used

    for simple media playback applications like those at which RTSP is targeted; Microsoft

    Corporation’s NetShow and Media Player streaming video applications use H.323 in

    exactly this fashion. Using H.323 for simple media playback and recording operations is

    not without disadvantages, however. The ITU Recommendation that defines H.323 is

    long and complex, making the specification difficult to implement [21]; likewise, the

    protocols that make up the specification are themselves complex, resulting in higher call

    setup latencies than are possible with a more lightweight protocol like RTSP. Finally, the

    complex nature of H.323 results in more complicated and larger implementations, which

    makes them inappropriate for use in some applications, such as within a

    downloadable Java applet.

    2.4.4 Session Initiation Protocol

    Like H.323, the Session Initiation Protocol (SIP) is a signaling protocol that enables the

    creation of point-to-point and multi-party conferences and allows users to invite servers

    and other users to participate in active calls [11]. SIP is designed as a more lightweight,

    flexible and modular alternative to H.323, with a lineage consisting of standard Internet

    protocols like HTTP, RTP and RTSP and without the baggage of the circuit-switched

    ISDN protocols upon which H.323 is based. The creator of SIP provides a detailed

    comparison of the two architectures in [21]; the protocol itself is a work-in-progress

    described in an Internet-Draft [22]. A brief synopsis of the design and

    underlying philosophy of SIP follows. A comprehensive description of an Internet

    telecommunications architecture based on SIP and related protocols like RTP and RTSP

    is presented in [23].

    One of the primary objectives of the SIP design is simplicity: the protocol is intended to

    be easy to parse and debug. This objective is accomplished by using much of the syntax

    of HTTP, extending that protocol to allow bi-directional messaging. This enables the

    reuse of existing code for parsing HTTP-style messages and the multitude of HTTP

    extension mechanisms, and the fact that the protocol is text-based facilitates debugging

    SIP-based applications. In contrast, H.323 messages are encoded in binary form using

    relatively complicated ASN.1 packed encoding rules.

    The simplicity of SIP’s design also extends to the semantics of the messages exchanged

    by participants in a conference. Rather than requiring applications to utilize a Byzantine

    collection of interrelated protocols in order to provide communications services, SIP is

    based on a small number of orthogonal commands that can be composed to provide high-

    level call signaling functionality.

    SIP’s simplicity brings a measure of focus and modularity to the specification.

    The protocol’s scope extends solely to call setup and control functions; the mechanisms

    through which features like service discovery and quality-of-service are provided are left

    unspecified, so any of a number of alternative approaches can be utilized. The protocol

    leverages existing Internet infrastructure, such as the Domain Name System and

    electronic mail address formats, where possible rather than invent new solutions to old

    problems. Furthermore, the features provided by SIP are not interdependent; for example,

    an application might use SIP’s call setup functionality to locate the target of a call and

    then use H.323’s mechanisms to establish and maintain the call.

    Extensibility and scalability are the other main goals of SIP’s design. Whereas H.323 can

    be extended to support application- and vendor-specific functionality primarily through

    nonstandardParam fields included in specific locations in its protocols’ grammars, SIP

    builds on the mechanisms used in Internet protocols like HTTP and Simple Mail Transfer

    Protocol (SMTP) [24] to provide significantly more flexibility. SIP’s architecture allows

    clients to specify the exact features they require and for servers to accept or reject

    requests based on their support for clients’ needs. In addition, SIP allows implementers to

    add significant functionality to the protocol while preserving compatibility with existing

    implementations: by default, SIP agents ignore unknown headers within requests and

    reply messages, so older implementations can handle messages from applications

    supporting extensions to the protocol transparently.
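
    The compatibility rule amounts to a tolerant header loop. In the sketch below, an agent processes the headers it understands and silently skips the rest; the list of known headers is abbreviated, and the extension header and function name are hypothetical.

```python
KNOWN_HEADERS = {"to", "from", "call-id", "cseq"}  # abbreviated, for illustration


def handle_headers(headers, known=KNOWN_HEADERS):
    """Keep the headers this agent understands and silently skip the rest."""
    understood, ignored = {}, []
    for name, value in headers.items():
        if name.lower() in known:
            understood[name.lower()] = value
        else:
            ignored.append(name)  # unknown extension: not an error
    return understood, ignored


request_headers = {
    "To": "sip:bob@example.com",
    "From": "sip:alice@example.com",
    "CSeq": "1",
    "X-Example-Priority": "high",  # hypothetical extension header
}
understood, ignored = handle_headers(request_headers)
```

    An older agent therefore completes the request as usual, simply without the behavior the extension header would have enabled.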

    SIP’s extensibility also extends to the critical subject of audio and video encoding

    formats. The primary mechanism through which information about encoding formats is

    conveyed in SIP is SDP, which uses textual names to identify the codecs the participants

    in a SIP session can understand. These names can be registered by individuals or groups

    with the Internet Assigned Numbers Authority (IANA), which provides contact

    information for the registrant to interested parties so that implementers can incorporate

    support for any registered format in their applications. The H.323 Recommendation, on

    the other hand, mandates that supported codecs must be centrally registered and

    standardized with the ITU. As of July 1998, the only encoding formats approved for use

    were ITU-developed, and many of them incorporated significant amounts of proprietary

    intellectual property, making developing inexpensive H.323-based systems a challenging

    proposition.

    Finally, SIP improves upon H.323 in its utility for large-scale communications systems

    by dint of its support for stateless and multicast signaling. Use of stateless call processing

    enables SIP gateways and servers to handle a larger number of calls by reducing the

    amount of memory and processing overhead associated with setting up and maintaining a

    call. Multicast signaling allows SIP conferences to scale transparently from two to a

    multitude of participants by removing the need for a central location at which all call

    processing occurs.

    Although it is possible to provide multimedia playback and recording services via SIP,

    the signaling functionality that is the primary focus of the specification is superfluous in

    these kinds of applications. RTSP, which is geared exclusively towards providing access

    to stored and live multimedia data streams, is more appropriate. In fact, RTSP and SIP

    are designed to complement each other nicely: SIP-aware conferencing systems can use

    RTSP to enable features like voice mail, recording of conferences for future retrieval in

    on-demand fashion, and playback of previously recorded material into an active call.

    2.4.5 Real-Time Transport Protocol

    One of the key challenges of providing advanced multimedia services over TCP/IP-based

    Internets is delivering audio, video and other forms of data to viewers and listeners

    quickly and efficiently, so that playback can begin promptly and proceed smoothly,

    without delays or artifacts. The Real-Time Transport Protocol (RTP) is designed to

    facilitate delivery of data with real-time characteristics, like audio and video, providing

    functionality such as payload type identification, sequence numbering, source

    identification, timestamping, and receiver-generated feedback for quality of service

    monitoring. RTP is discussed at a high level in [9] and in detail in a standards track

    Internet Engineering Task Force RFC [25].

    Internet and web-based multimedia applications are well suited to a specialized protocol

    like RTP, rather than the more common HTTP, described above, because of the network

    performance requirements imposed by the characteristics of continuous media. In order to

    play back high quality audio or video, multimedia-enabled clients and servers depend on

    the network to provide predictable, if not necessarily minimal, delay. In addition, it is

    typically unnecessary for every last bit of multimedia data streams to be delivered intact;

    the delays and artifacts caused by retransmitting lost data packets are often more

    noticeable than the effects of ignoring them altogether. For these reasons, reusing

    existing protocols like HTTP, which make use of TCP as the underlying transport

    mechanism, for multimedia data is usually inappropriate. Because it is designed to deliver

    data packets reliably, TCP wastes time retransmitting dropped data packets, even when

    the application would be better served by ignoring them. Furthermore, TCP provides

    windowing and congestion control mechanisms that, though effective for many data

    streams, can introduce latencies that obscure the temporal relationships between packets

    in continuous media streams. Finally, TCP-based protocols cannot take advantage of IP

    Multicast, described in section 2.4.7, which makes them ill-suited for large-scale

    conferencing and live multimedia applications [9, 18, 26].

    The origins of RTP can be found in earlier work on NVP [27] and PVP [28]. The protocol

    is designed to serve as a common basis for a variety of real-time continuous media

    services; RTP is application-independent, with application-specific profiles, or

    specializations, providing additional functionality needed for particular real-time

    applications. In a sense, the RTP specification is incomplete: in addition to the basic

    protocol, an application must incorporate an appropriate profile and payload formats in

    order to make effective use of RTP. The profile for audio and video conferences is

    described in [29].

    Because RTP is designed to be application-independent, implementations can be

    packaged into reusable code libraries and incorporated directly into clients, servers, and

    associated tools. Thus, protocol processing is performed at the application level, rather

    than in a separate layer. This approach, called application level framing and integrated

    layer processing, is described in [30]. In addition, the application-independence of RTP

    also enables the creation of generic tools for monitoring, tracing and providing quality-

    of-service information about real-time traffic without regard for the specific application

    that generated it.

    RTP traffic is divided into data packets and control packets. Continuous media data is

    carried in data packets, while information about the performance of the network and other

    non-continuous data are communicated via the control mechanism. When, as is typically

    the case in IP-based environments, both data and control information are carried in UDP

    packets, sequential port numbers are used, with the lower, even-numbered port the target

    for packets containing multimedia data and the higher, odd-numbered port used as the

    destination for control packets.

    RTP data packets begin with a number of fixed headers that provide information common

    across the spectrum of applications supported by RTP. These include a sequence number,

    payload type, timestamp, and identifiers specifying the source or sources of the data

    contained in the packet. The timestamp is a thirty-two-bit value whose meaning is

    dependent on the profile in use and the payload type; source identifiers are also thirty-two

    bits in length, but do not correspond to a particular address format, such as four-byte IP

    addresses. Rather, individual sources select random identifiers when they first generate

    data to be transmitted via RTP, and conflicts are detected and eliminated as they occur.

    The fixed header data is followed immediately by whatever profile-specific headers are

    required by the profile for a particular application; these headers, if present, are then

    followed by data of the variety specified in the payload type header field.
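
    As a rough illustration of the fixed header described above, the following sketch decodes it using the field layout given in the RTP specification [25]. The function name and the dictionary keys are hypothetical; the optional header extension is not handled, and everything after the source identifiers is returned undecoded as the payload.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header and any contributing-source IDs."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit source ID.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F            # number of contributing sources
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the source identifier list")
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return {
        "version": first >> 6,
        "marker": second >> 7,
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
        "payload": packet[end:],         # profile-specific headers and data
    }
```

    Interpreting the timestamp and the payload is left to the caller, since both depend on the profile in use and the payload type.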

    As described above, RTP control packets are used to convey information regarding the

    level of service being provided to users by the network and real-time application. In

    addition, RTP control packets can be used to communicate further information about the

    party generating RTP traffic and to establish rough synchronization relative to wall clock

    time among the participants in a live event or conference.

    The format and communications parameters for RTP control packets are defined as a

    subsidiary protocol within the RTP specification [25] called the RTP Control Protocol

    (RTCP). RTCP packets are similar in structure to RTP data packets, but are usually

    carried over a separate transport-level “connection.” The parties using an RTP-based

    application periodically multicast RTCP packets to the other participants in order to

    provide feedback about their state even when no real-time traffic is being generated.

    Both producers and consumers of RTP data must distribute RTCP packets containing

    sender or receiver reports. Receiver reports contain information useful to senders such as

    the highest sequence number yet received; timestamps, which allow senders to compute

    round-trip times; and a measure of jitter, or the variance in packet arrival times within a

    data stream. Sender reports enable listeners to estimate the actual data rate of the real-

    time data stream and establish a relationship between the timestamp values in the RTP

    data packets they are receiving and wall clock time. Sender reports also contain

    additional information identifying generators of RTP data above and beyond the thirty-

    two-bit identifier included in RTP data packets. This information allows recipients to

    provide readable names for senders to users and to maintain sender-specific information

    when identifier conflicts occur.
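
    The jitter figure carried in receiver reports is a running estimate defined in the RTP specification [25]. The sketch below shows that update rule under the assumption that packet arrival times have already been converted to the same units as the RTP timestamp; the function name and calling convention are hypothetical.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """Return the updated jitter estimate and this packet's transit time."""
    transit = arrival - rtp_timestamp        # relative transit time
    if prev_transit is None:                 # first packet: nothing to compare
        return jitter, transit
    d = abs(transit - prev_transit)          # change in transit time
    jitter += (d - jitter) / 16.0            # smoothed with a gain of 1/16
    return jitter, transit
```

    A receiver would call this once per arriving data packet and place the current estimate in its next report.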

    Participants in a conference or broadcast decide how often to send RTCP messages based

    on the bandwidth consumed by the data they are sending or receiving, thereby limiting

    the amount of control traffic to a known, fixed percentage of the overall load. A value of

    5% is suggested in the RTP specification, but the establishment of a mandatory fraction is

    left to the creators of profiles. So that newcomers to a session can quickly identify the

    parties generating RTP traffic, senders of real-time data are collectively allocated a

    quarter of the fraction of the load assigned to control traffic. The remaining three-quarters

    are split evenly among all the receiving participants. The specification includes an

    algorithm for determining the appropriate interval at which RTCP reports should be

    generated based on the available bandwidth and the number of participants [25].
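
    The bandwidth split described above can be made concrete with a simplified calculation. This is a sketch rather than the algorithm from the specification, which adds refinements such as a minimum interval and randomization; the function name, the parameter names and the average packet size are assumptions made for illustration.

```python
def rtcp_interval(session_bw, avg_packet_bits, senders, receivers,
                  is_sender, control_fraction=0.05):
    """Return seconds between RTCP reports for one participant.

    session_bw is in bits per second; avg_packet_bits is the assumed
    average size of an RTCP packet in bits.
    """
    control_bw = session_bw * control_fraction   # bits/s reserved for RTCP
    if is_sender:
        share, group = 0.25 * control_bw, senders      # senders: one quarter
    else:
        share, group = 0.75 * control_bw, receivers    # receivers: the rest
    # Each member of the group receives an equal slice of the group's share.
    return avg_packet_bits * group / share
```

    For a 1 Mbit/s session with two senders, 100 receivers and 1,000-bit reports, each sender would report roughly every 0.16 seconds and each receiver roughly every 2.7 seconds under these assumptions.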

    RTP and RTSP complement each other nicely, with the former providing transport

    services for networked multimedia applications and the latter providing control

    functionality for those same applications. This is somewhat unsurprising, considering that

    both protocols are defined in standards-track Internet RFC’s that share key contributors.

    Although RTSP is designed to be neutral with respect to transport mechanisms, RTP is

    the only multimedia transport protocol for which a means of specifying its use is

    described in the RTSP specification [2].

    2.4.6 Video Datagram Protocol

    Like RTP, the Video Datagram Protocol (VDP) is a transport protocol geared to the

    delivery of continuous media data over computer networks. Unlike RTP, VDP is

    designed to take advantage of the point-to-point connections between clients and servers

    in order to provide VCR-style control functionality, such as play, pause, rewind and fast-

    forward commands, and to enable optimized data transmission appropriate to the levels

    of network congestion and CPU utilization experienced by the participating systems.

    VDP was developed by researchers at the University of Illinois in 1995 and 1996, and is

    described in a paper presented at the Fourth International World Wide Web Conference

    [18] and in a U.S. patent application [31]. Motivated by the difficulties posed by the use

    of HTTP for the transmission of multimedia data in Internet applications, VDP is

    designed to address the variability of Internet performance and client load, while

    providing real-time, on-demand delivery of audio and video streams.

    VDP is an asymmetric protocol involving two endpoints, a server and a client, which

    communicate using two distinct data channels. The first is a reliable control channel used

    by the client to convey user commands and connection management information to the

    server; the second is an unreliable channel used for the transmission of multimedia data

    from the server to the client and non-critical feedback in the opposite direction. This

    feedback mechanism provides VDP-based systems with the ability to dynamically adjust

    the data stream in order to accommodate changes in network performance and client CPU

    utilization.

    During playback, a VDP client measures performance by estimating packet round-trip

    times and monitoring the rate at which the system is displaying received video frames

    and the percentage of incoming data packets which are being lost in transit. In the event

    the client system is unable to display the video stream due to insufficient CPU power or a

    sufficiently high percentage of data packets are being lost, the VDP software generates a

    feedback message which is sent from the client to the server over the unreliable data

    channel. Upon receipt of such a message, the server addresses the situation by thinning

    the data stream, reducing the frame rate of the video feed sent to the client and, as a result,

    the amount of network bandwidth and processing power necessary to deliver and display

    the content. Should performance continue to suffer, additional feedback messages are

    generated that trigger further thinning of the data stream. Conversely, should the

    measured levels of packet loss or CPU utilization drop to more acceptable levels, the

    client generates a feedback message that instructs the server to restore the data being

    removed from the data stream. In this fashion, VDP clients and servers adapt to changing

    conditions by altering the data stream as needed to ensure that video quality degrades

    gracefully in the presence of problematic phenomena.
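
    The feedback loop described above can be sketched as follows. The message names, thresholds, number of thinning levels and class and function names are all hypothetical, since VDP's actual message format and tuning values are not given here; only the thin-on-trouble, restore-on-recovery behavior is taken from the description.

```python
class ThinningController:
    """Server-side state: how much of the stream is currently withheld."""

    def __init__(self, max_level=3):
        self.level = 0                # 0 means the full frame rate is sent
        self.max_level = max_level    # hypothetical number of thinning steps

    def on_feedback(self, message):
        """Adjust the thinning level in response to a client message."""
        if message == "thin":
            self.level = min(self.level + 1, self.max_level)
        elif message == "restore":
            self.level = max(self.level - 1, 0)
        return self.level


def client_feedback(loss_rate, displayed_fps, target_fps,
                    high_loss=0.10, low_loss=0.02):
    """Decide which feedback message, if any, the client should send."""
    # Thresholds are illustrative assumptions, not values from VDP.
    if loss_rate > high_loss or displayed_fps < 0.8 * target_fps:
        return "thin"
    if loss_rate < low_loss and displayed_fps >= target_fps:
        return "restore"
    return None
```

    Repeated "thin" messages step the level upward one notch at a time, mirroring the progressive thinning described in the text.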

    In addition to enabling video clients and servers to adapt dynamically to difficult

    situations, VDP incorporates a demand re-send algorithm that improves the quality of

    video encoded in media formats that include inter-frame dependencies, such as MPEG.

    Because the communications channel used by VDP for video data is unreliable, it is

    possible that a packet containing frame data upon which subsequent or previous frames

    depend could be lost. In this event, VDP clients can request that the lost packet be re-

    sent; it is the responsibility of the client to maintain an internal data buffer sufficiently

    large to ensure that the re-sent frame arrives in time to be displayed and to identify which

    frames are important enough to re-send based on the media format in use.
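
    The client-side decision described above might look like the following sketch for MPEG video, in which I and P frames serve as references for other frames while B frames do not. The function name, its parameters and the simple timing test are assumptions made for illustration and are not taken from the VDP description.

```python
# MPEG frame types that other frames depend upon.
REFERENCE_FRAME_TYPES = {"I", "P"}


def should_request_resend(frame_type, time_until_display, round_trip_time):
    """Re-request a lost frame only if it matters and can arrive in time."""
    is_reference = frame_type in REFERENCE_FRAME_TYPES
    can_arrive = time_until_display > round_trip_time   # buffer is deep enough
    return is_reference and can_arrive
```

    The timing test is the reason the client's buffer must be sufficiently large: with too little buffered data, a re-sent frame would arrive after its display time had passed.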

    The VCR-style control functions supported by VDP overlap significantly with the scope

    of RTSP’s functionality. As RTSP is designed to function independently of a designer’s

    choice of transport protocol, it would be possible to construct a system that utilized RTSP

    to convey control information and VDP to carry audio and video data, but the value of

    doing so might be somewhat limited. RTP is presently in more widespread use than VDP,

    is implemented in a wider variety of applications and ancillary tools, and offers support

    for a broader base of media encoding formats. VDP’s ability to allow applications to

    adapt to changes in the environment is critical, however; similar functionality might be

    incorporated into RTP-based applications through the RTCP receiver report mechanism.

    Further information about VDP and related research, including discussion of a

    comprehensive architecture for structuring streamed multimedia presentations and

    enhancements to VDP supporting frame-level addressing and in-line hyperlinking, is

    presented in [32].

    2.4.7 IP Multicast and the MBONE

    In the simplest of network-based, real-time multimedia distribution systems, user clients

    contact a server to request access to the feed and the multimedia data stream is sent

    directly to them over the network. If 1,000 clients are interested in a particular broadcast,

    1,000 identical copies of the stream are generated at the server and distributed over the

    network simultaneously. When many users wish to access a live multimedia

    broadcast, it is therefore more efficient to generate a single data stream, which is then

    replicated as needed when the network paths to recipients of the broadcast diverge [9].

    The former approach is called unicasting; the latter, multicasting.

    An early approach to multicasting over the Internet was ST-II [33]. ST-II utilized a

    sender-oriented approach, in which each sender established a set of appropriate

    connections based on the currently active recipients. Unfortunately, this approach is

    unsuitable for large-scale distribution of streaming multimedia, because each recipient

    must inform every potential sender of his participation in order to be added to the list of

    active endpoints [9].

    To address this problem, IP Multicast, defined as an Internet standard in [34] and

    extended in [35], takes a receiver-oriented tack. Clients interested in receiving a multicast

    advertise that fact using the Internet Group Management Protocol (IGMP). A multicast-

    aware router on the client’s local network receives the advertisement, notes the client’s

    interest, and uses the Distance Vector Multicast Routing Protocol (DVMRP) to

    communicate the client’s interest to other multicast routers on the Internet. Multicast-

    aware routers use DVMRP to ensure that multicast data is routed in such a way that it

    reaches all interested parties without unnecessarily congesting networks that don’t

    contain participating clients. Because the distribution and replication of multicast data is

    handled transparently by cooperating routers, there is no need for senders to maintain a

    record of all participating clients. As a consequence, the IP Multicast approach is far

    more scalable and robust in environments with large numbers of listeners that can come

    and go with relative frequency. However, IP Multicast’s reliance on router intelligence is

    also a hindrance, as a significant fraction of Internet backbone routers are not multicast-

    aware [9].
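
    From an application's point of view, the receiver-oriented model described above usually reduces to a single socket option on hosts with a BSD-style sockets interface; the operating system then issues the IGMP messages on the application's behalf. The sketch below assumes IPv4, and the helper names are hypothetical.

```python
import socket
import struct


def make_membership_request(group, interface="0.0.0.0"):
    """Pack the group address and local interface for the join option."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def join_group(sock, group, interface="0.0.0.0"):
    """Ask the operating system to join a multicast group on a UDP socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group, interface))
```

    The sender needs no corresponding step and keeps no list of receivers, which is the scalability advantage noted above.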

    Because so much of the Internet does not support IP Multicast, the MBONE, or Multicast

    Backbone, was created. The MBONE is an overlay network that connects islands of

    networks that support multicast by encapsulating multicast packets in normal, unicast

    UDP packets and transmitting them over the unsuspecting Internet. Described in depth in

    [36, 37], the MBONE connects multicast routers using unicast tunnels over which

    multicast data and DVMRP traffic are passed.

    Although a control protocol like RTSP is not required in order to make use of IP

    Multicast, there are a number of ways in which RTSP and IP Multicast can be used

    effectively together. For example, using RTSP enables multicast sessions to be named via

    a simple URL, which client applications can use to obtain a full specification of a

    session’s parameters. This mechanism can also be used to control access to multicast

    sessions, using RTSP to authenticate users and provide them with the encryption keys

    necessary to decrypt a private multicast. In addition, an RTSP server providing on-

    demand audio or video services can be used to play stored multimedia clips into an IP

    Multicast session already in progress or to record its contents for later viewing. The

    participants in such a conference can use RTSP clients as a virtual remote control,

    stopping and starting playback or recording as needed.

    2.4.8 Session Description Protocol and Session Announcement Protocol

    Described fully in [38], the Session Description Protocol (SDP) is designed to enable the

    announcement of the existence of multimedia broadcasts or conferences to clients and to

    convey the information necessary for interested parties to participate in such a session.

    An SDP description encapsulates such information as the name and purpose of a

    multimedia session; the time or times during which the session will be active; the types of

    media included in the session, and the formats in which they are encoded; and addressing

    information such as URL’s, ports and internet addresses that describe how clients should

    obtain access to the session. In addition, descriptions can include additional information,

    such as contact data for the individual responsible for administering the session and

    details of the resource requirements of the broadcast or conference.

    SDP is simply a data format, despite its name; no mechanism for transporting a session

    description is described in the specification. Rather, SDP descriptions are carried over

    other protocols, such as RTSP, SAP, and MIME-based electronic mail. Furthermore, SDP

    session descriptions are text-based; this facilitates portability, the encapsulation of

    descriptions within other text-based Internet protocols, and automated generation of

    descriptions using scripting languages such as TCL and Perl.
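
    Because descriptions are plain text, generating one takes only a few lines of code. The sketch below assembles a small description from the kinds of information listed above; the function name, its parameters and the fixed values in the origin line are assumptions made for illustration, and only a handful of SDP's fields are emitted.

```python
def build_sdp(name, info, email, origin_host, address, start, stop, media):
    """Assemble a minimal SDP description as CRLF-terminated text lines.

    media is a list of (kind, port, rtp_payload_type) tuples.
    """
    lines = [
        "v=0",                                  # protocol version
        "o=- 0 0 IN IP4 %s" % origin_host,      # origin (placeholder IDs)
        "s=%s" % name,                          # session name
        "i=%s" % info,                          # session purpose
        "e=%s" % email,                         # contact for the session
        "c=IN IP4 %s" % address,                # where the media is sent
        "t=%d %d" % (start, stop),              # active times
    ]
    for kind, port, payload_type in media:
        lines.append("m=%s %d RTP/AVP %d" % (kind, port, payload_type))
    return "\r\n".join(lines) + "\r\n"
```

    For example, passing the media list [("audio", 49170, 0), ("video", 51372, 31)] yields one "m=" line per stream, each naming the media type, port, transport and payload format.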

    The Session Announcement Protocol (SAP) is a simple protocol used to disseminate

    announcements of multimedia sessions over the Internet. Session announcements take the

    form of a single UDP packet containing a SAP header and a textual payload, which is a

    single SDP session description. Announcements are multicast to a well-known multicast

    address and port and can be received by any individual with multicast-capable hardware

    and software. SAP is fully specified in an Internet-Draft [39] which has expired, but

    should be the subject of an IETF RFC in the near future.

    SDP is complementary to RTSP. An RTSP server can use SDP to provide a client with a

    description of a particular multimedia resource in response to a DESCRIBE request, and

    the client can then use the information contained in the description to set up appropriate

    encoders and decoders to allow the user to participate in the session. Alternatively, a user

    might receive an SDP session description via electronic mail or from some other source.

    The description can be provided manually to client software, which can then use RTSP to

    initiate transmission and playback of the session. In either case, the mechanism used to

    communicate the characteristics of the broadcast or conference (SDP) is distinct from the

    mechanism used to control the user’s participation in the session (RTSP). Because SAP

    constitutes a mechanism for the distribution of session announcements containing SDP

    descriptions to potential participants, RTSP and SAP are complementary as well.
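
    The first step of the exchange described above, in which a client asks an RTSP server for an SDP description, can be sketched as a short text request. The function name is hypothetical and the URL in the usage note is a placeholder; a real client would also send the request over a connection and parse the reply.

```python
def build_describe(url, cseq):
    """Format an RTSP DESCRIBE request that asks for an SDP description."""
    return ("DESCRIBE %s RTSP/1.0\r\n"
            "CSeq: %d\r\n"
            "Accept: application/sdp\r\n"
            "\r\n") % (url, cseq)
```

    Calling build_describe("rtsp://server.example.com/clip", 1) produces a request whose Accept header tells the server that the client can interpret a description in SDP format.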

    2.4.9 Synchronized Multimedia Integration Language

    Synchronized Multimedia Integration Language, or SMIL, is an HTML-like language

    designed to allow users to use a text editor to create streaming multimedia presentations

    for playback over the WWW. Described in a recommendation of the World Wide Web

    Consortium [40], SMIL enables users to assemble multimedia resources to form a

    presentation, to describe how the presentation should be displayed on-screen, and to

    associate hyperlinks with multimedia objects. The multimedia resources that make up a

    SMIL-based presentation are specified as Uniform Resource Identifiers (URI’s) and can

    therefore include multimedia content accessible via RTSP as well as other access

    protocols, like HTTP and FTP. A number of multimedia and network software companies

    have promised support for SMIL in current and future products.

    2.5 Other RTSP Implementations

    2.5.1 Real Networks Reference Implementation

    Real Networks Corporation provides a reference implementation of RTSP based on the

    July 30, 1997 draft of the RTSP specification; it has not been updated to reflect the final

    version of the standard. Released under the terms of the GNU general public license,

    documentation and source code for the reference implementation are available from Real

    Networks’ web site [41]. The package includes basic client and server implementations,

    as well as several sample applications, all written in the C programming language. The

    sample applications support playback of audio files in several formats and incorporate a

    simple implementation of RTP, a graphical client interface, and server management via a

    configuration file.

    The reference implementation is intended to serve as a test platform for other

    implementations rather than as the basis for full-featured applications. As such, the

    source code is not designed to be particularly flexible or extensible. In particular:

    • the implementa