Multimedia Workstation Architecture with ATM Interconnect
Tomasz Solkowski
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering, University of Toronto
© Copyright by Tomasz Solkowski 1997
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Multimedia Workstation Architecture with ATM Interconnect
Tomasz Solkowski
Master of Applied Science, 1997
Department of Electrical and Computer Engineering University of Toronto
Abstract
This thesis describes a multimedia workstation architecture which uses an ATM switch, rather than a traditional system bus, for connecting both multimedia and non-multimedia peripherals. This architecture is intended to eliminate the performance bottlenecks which are present in a typical bus-based workstation during transfers of high bandwidth information among the internal workstation components and between the workstation and the external network.

The thesis discusses other recently developed multimedia workstation architectures and their advantages and shortcomings. The detailed structure of the proposed workstation architecture, including the CPU, memory, video display, disk, and ATM switch, is discussed.

To offer a performance comparison between the proposed architecture and a generic bus-based system, a computer simulation was created to show the delays for both types of interconnect when subject to typical multimedia data streams. The simulation shows that the performance of the ATM workstation can be up to nine times better than that of the bus-based workstation for typical high bandwidth multimedia loads. However, the performance of the proposed ATM architecture is highly dependent on the number and the pattern of connections among workstation peripherals through the ATM switch.
Acknowledgements
I would like to thank my supervisor, Professor Safwat G. Zaky, for his guidance and counselling throughout the last two years of my research. His help with technical and everyday problems, as well as his incredible attention to detail, have been invaluable in the process of writing this thesis.

I am very grateful to my family, Grazyna, Helena, and Andrzej, for their moral and financial support. The love and support of my fiancée, Ewa, cannot go unnoticed. Her unconditional understanding, constant belief in me, and constructive criticism have helped me tremendously in my academic efforts.

I must also thank the Information Technology Research Centre and the University of Toronto for their generous financial support.
Table of Contents
List of Figures
List of Tables
Glossary
Chapter 1 Introduction
    1.1. Motivation
    1.2. Objectives
    1.3. Outline
Chapter 2 Background
    2.1. Introduction
    2.2. Multimedia
        2.2.1. The definition
        2.2.2. Multimedia modes
        2.2.3. Multimedia workstation components
        2.2.4. Time constraints of multimedia traffic
    2.3. Asynchronous Transfer Mode (ATM)
        2.3.1. Description
        2.3.2. The ATM cell format
        2.3.3. Advantages of ATM
    2.4. Multimedia workstation architectures
        2.4.1. Introduction
        2.4.2. VuNet
            2.4.2.1. Architecture
            2.4.2.2. Implementation details
            2.4.2.3. Critique
        2.4.3. Netstation
            2.4.3.1. Architecture
            2.4.3.2. Implementation details
            2.4.3.3. Critique
        2.4.4. Desk Area Network (DAN)
            2.4.4.1. Architecture
            2.4.4.2. Implementation details
            2.4.4.3. Critique
    2.5. MB86680B, ATM switch element from Fujitsu
Chapter 3 Architecture
    3.1. Introduction
    3.2. Architecture rationale
    3.3. Architecture objectives
    3.4. General structure
        3.4.1. Internal ATM LAN
        3.4.2. Interconnect topology
        3.4.3. Interconnect flexibility
        3.4.4. Network interface
        3.4.5. Basic workstation configuration
        3.4.6. Control path structure
        3.4.7. Initial setup and typical usage scenarios
    3.5. The components and their implementations
        3.5.1. System board
        3.5.2. Disk storage node
        3.5.3. Live video processing node
    3.6. Comparison with other multimedia architectures
Chapter 4 Simulation
    4.1. Introduction
    4.2. ATM interconnect simulator
        4.2.1. Simulator components
        4.2.2. Simulator events
        4.2.3. Simulation metrics
        4.2.4. Simulation parameters
        4.2.5. Assumptions
        4.2.6. Limitations
    4.3. Bus simulator
        4.3.1. Simulator components
        4.3.2. Simulation metrics
        4.3.3. Assumptions
        4.3.4. Limitations
    4.4. Analysis and presentation of results
        A.5.1. Memory node
        A.5.2. Cell drop rate for the ATM system
    A.6. Browsing and two-way teleconferencing with high paging load
        A.6.1. Memory node
        A.6.2. Cell drop rate for the ATM system
List of Figures
Multimedia workstation architectures (a) using a bus, (b) using an ATM switch as an interconnect
OSI reference model
ATM cell structure
Fields in ATM cell header as defined by UNI 3.0
VuNet multimedia architecture
Netstation internal LAN
Netstation MOSAIC node
Typical configuration of DAN multimedia workstation
Functional diagram of MB86680B
Block diagram of Fujitsu ATM switch
General structure of the architecture
Connection of basic system components
Data flow in system with a) separate utility nodes and b) utility nodes embedded into the peripheral nodes
Control data flow during a cell loss event
ATM workstation with ATM circuitry integrated into mass storage devices
Configuration of a live video processing node
Relation between ATM simulator objects and modelled physical devices
Relation between the bus simulator objects and modelled physical devices
Uninterrupted transfer delays as recorded by the video node
Average teleconferencing delays as experienced by the video node for DMA size of a) 256 bytes b) 512 bytes
Maximum teleconferencing delays as experienced by the video node for DMA size of a) 256 bytes b) 512 bytes
a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging
a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.8 a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging rate
4.9 Cell drop rates for average paging load
4.10 a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.11 Cell drop rates for high paging load
List of Tables
2.1 Bandwidths of typical video streams with acceptable MOS values
2.2 Bandwidths of typical documents with acceptable MOS values
Glossary of Terms
AAL: ATM Adaptation Layer
ARM: Advanced RISC Machine
ATM: Asynchronous Transfer Mode
AVI: Audio Visual Interleaved (audio/video compression standard)
CBR: Constant Bit Rate
CLP: Cell Loss Priority
CODEC: Coder / Decoder
CPU: Central Processing Unit
DAN: Desk Area Network
DMA: Direct Memory Access
DSP: Digital Signal Processor
FDDI: Fiber Distributed Data Interface
FIQ: Fast Interrupt Request
GFC: Generic Flow Control
GNU: GNU's Not Unix!
HDTV: High Definition Television
HEC: Header Error Control
ISO: International Organization for Standardization
JPEG: Joint Photographic Experts Group (image compression standard)
Kbps: Kilobit per second
KBps: Kilobyte per second
LAN: Local Area Network
Mbps: Megabit per second
MBps: Megabyte per second
MAN: Metropolitan Area Network
MOS: Mean Opinion Score
MPEG: Moving Picture Experts Group (video compression standard)
NNI: Network-Network Interface (ATM standard)
NTSC: National Television Standards Committee
OSI: Open Systems Interconnection
PCI: Peripheral Component Interconnect
PT: Payload Type
RAM: Random Access Memory
RISC: Reduced Instruction Set Computer
ROM: Read Only Memory
SCSI: Small Computer System Interface
SONET: Synchronous Optical Network
UNI: User-Network Interface (ATM standard)
VBR: Variable Bit Rate
VCI: Virtual Channel Identifier
VPI: Virtual Path Identifier
WAN: Wide Area Network
WWW: World Wide Web
Chapter 1
Introduction
1.1 Motivation
In the last few years, an ever increasing demand for world-wide computer connectivity and multimedia representation of information has been observed. Multimedia products for microcomputers, like educational and entertainment software, training tools, and operating system environments, allow easier and more natural interaction with the machines. Adding music, voice, colour images, animation, and video to textual information can greatly improve the richness of available data. The shift towards multimedia representation of information was made possible by falling costs of powerful microprocessors, improvements in the cost and performance of optical storage media (CD-ROM), and advances in digital signal processing (DSP boards, video accelerators, sound and music boards).
At the same time, we are experiencing tremendous growth in the use of global and local computer networks. Many people now realize that access to global information is essential for successful research, business activities, and personal development. The 1995 World Almanac [9] says that "15 million people in the U.S. and 25 million world-wide access Internet regularly." Computer networks, being a huge global source of data, contain increasingly large amounts of multimedia information. Some of the most popular applications which make use of multimedia include teleconferencing, video on demand delivered through the network, WWW browsers, remote imaging tools in medicine, and Internet telephone and radio.
Combining the drive towards global connectivity with multimedia representation of data offers new challenges in the design of computer systems and networks. In the design of modern networks, one of the main objectives is to minimize the delays and the probability of information loss in handling high volumes of multimedia traffic. Asynchronous Transfer Mode (ATM) networks offer the required characteristics for low latency, high bandwidth communications; they are beginning to replace the existing networks, which were not designed with such high demands in mind.
On the other hand, computer hardware designers must ensure that the information exchange between the network and connected workstations is handled efficiently. Although well suited to handling real-time video, graphics, and sound locally, existing computer systems do not provide an adequate interface to high bandwidth networks. A new type of computer architecture is needed to seamlessly integrate multimedia workstations with very high-speed networks and to allow real-time processing of high-bandwidth information streams.
1.2 Objectives
Multimedia computer systems usually consist of a number of multimedia peripherals connected to the processor, memory, and storage devices using a system bus. They also provide a connection to external networks through a single centralized network interface. A typical configuration of such a workstation is illustrated in Figure 1.1a.
There are two major problems with this configuration. First, the system bus has a fixed bandwidth shared among all connected devices. Multimedia information utilizes a large percentage of this shared resource. If multimedia loads are allowed to occupy all available system bus bandwidth, then the information exchanged over the bus by other devices will experience significant delays. To avoid these extensive delays, the bandwidth allocated to multimedia data has to be limited, resulting in decreased quality of multimedia reception. Therefore, the system bus does not allow efficient coexistence of multimedia and non-multimedia data traffic.
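The contention on a shared bus can be illustrated with a rough back-of-envelope calculation. All stream figures and the bus capacity below are illustrative assumptions for the sketch, not measurements from this thesis:

```python
# Rough illustration of shared-bus saturation; all figures are
# illustrative assumptions, not measurements from this thesis.
BUS_CAPACITY_MBPS = 1056  # 32-bit, 33 MHz PCI: 132 MB/s = 1056 Mbit/s peak

streams_mbps = {
    "MPEG-1 video playback": 1.5,
    "CD-quality audio": 1.4,
    "two-way teleconferencing video": 2 * 4.0,
    "disk paging traffic": 200.0,
    "display refresh DMA": 500.0,
}

total = sum(streams_mbps.values())
share = 100 * total / BUS_CAPACITY_MBPS
print(f"Aggregate demand: {total:.1f} Mbit/s ({share:.0f}% of bus capacity)")
# As the aggregate approaches capacity, non-multimedia transfers queue
# behind the continuous streams, so the multimedia share must be capped.
```

Even with these modest assumed loads, the continuous streams consume a large fraction of the bus, leaving little headroom for bursty non-multimedia traffic.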
The second problem with the system bus configuration involves the transmission of high bandwidth data between the internal workstation components and the external network. The devices currently used as network interfaces are not fast enough to support continuous high bandwidth multimedia traffic. This in turn limits the features and characteristics of distributed multimedia applications, such as teleconferencing. Teleconferencing can only offer a low frame rate and small resolution of transmitted pictures in order to comply with the maximum throughput restrictions of current network interfaces.
This thesis proposes an architecture for a multimedia computer system which offers a solution to two specific performance bottlenecks involving transfers of high bandwidth information streams, such as multimedia and teleconferencing. The first bottleneck exists between the internal components of a computer system and the external network; the second, among the internal components themselves. The proposed architecture replaces the system bus with an ATM switch, as depicted in Figure 1.1b. The switch, due to its crossbar structure, allows simultaneous exchange of data through connections established between any pair of peripherals. Since the external network is viewed as one of the system peripherals, the need for a separate network interface is eliminated. The projected wide market acceptance of ATM influenced the choice of ATM as the protocol for the switch.
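The key property of the crossbar structure is that transfers between disjoint peripheral pairs can proceed concurrently, whereas a bus serializes them. A toy admissibility check captures the idea (node names are illustrative, and this is a simplification, not the thesis simulator):

```python
# Toy model: a crossbar admits any set of transfers whose sources are
# all distinct and whose destinations are all distinct; a shared bus
# would admit only one transfer at a time. Node names are illustrative.
def crossbar_admissible(transfers):
    """transfers: list of (source, destination) port pairs."""
    sources = [s for s, _ in transfers]
    dests = [d for _, d in transfers]
    return len(set(sources)) == len(sources) and len(set(dests)) == len(dests)

wanted = [("disk", "memory"), ("camera", "network"), ("network", "display")]
print(crossbar_admissible(wanted))                          # all three in parallel
print(crossbar_admissible(wanted + [("disk", "display")]))  # disk output port busy
```

Note that "network" may appear as both a source and a destination here, reflecting full-duplex switch ports.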
A system using an ATM switch as an interconnect for the internal components has many advantages over a bus-based system. It allows a high level of connectivity between the system components and the external ATM network and offers a high degree of parallelism in the system's internal data flow. It also reduces the management and control burden on the CPU because information transfers occur through pre-established connections that do not require CPU involvement. The scalability of bandwidth and of the number of component connections in the switch-based interconnect surpasses that of a bus.
This thesis offers a comparison between the new architecture and a generic bus-based system to evaluate the practicality and competitiveness of the proposed design. A computer simulation was created to show the delays and the available bandwidth for both types of interconnect when subject to typical multimedia and teleconferencing data streams. The thesis also provides insight into how the proposed architecture compares with other novel multimedia system architectures.
1.3 Outline
The second chapter of this thesis contains the background information needed to fully understand the trade-offs involved in the design of the proposed architecture. This chapter introduces basic multimedia concepts, summarizes the ATM protocol, describes the features of the ATM switch
Figure 1.1 Multimedia workstation architectures (a) using a bus, (b) using an ATM
switch as an interconnect
used as an interconnect, and outlines the features of other architectures intended to efficiently deal with multimedia and other high bandwidth data streams.

The details of the new architecture are presented in Chapter 3. The chapter begins with the main objectives of the design, followed by a description of its general structure and the implementation details of individual components. How this architecture may influence system performance is also discussed.
Chapter 4 presents the simulation environment used to compare a typical bus system, loosely based on the PCI standard, with the ATM system proposed in this thesis. The chapter outlines the assumptions and simplifications made in the simulation and gives the results of the simulation runs. It also offers an analysis of the results obtained. Appendix A contains the tabulated numerical results from all simulation runs.
The final chapter of the thesis points out the advantages and drawbacks of the proposed architecture in light of the simulation results. It also suggests possible research directions to further verify the usefulness of the new architecture.
Chapter 2
Background
2.1 Introduction
This chapter provides a short review of the material forming the background of this thesis. It assumes that the reader is already familiar with the ideas, technologies, and vocabulary in the related fields. It is not intended to give an exhaustive coverage of the material, but to provide a context for the work presented in this thesis.
The second section of this chapter gives a definition of multimedia as used in the context of computer systems, as opposed to the one used in a general sense. It describes the necessary properties that allow information streams to be classified as multimedia, and it gives practical examples of some multimedia streams. It also outlines the components that usually form a multimedia workstation, as well as their required physical properties, such as bandwidth and acceptable delays. These are the properties needed for seamless and acceptable presentation of the information to the end user.
The multimedia workstation architecture introduced in this thesis has an ATM (Asynchronous Transfer Mode) switch as its central component. Section three of this chapter briefly outlines the main concepts behind ATM. The outline concentrates on the aspects that directly relate to the thesis, namely the cell structure and the mechanics of cell transfer and processing.
Section four presents previous research dealing with multimedia computer architectures. Three different approaches are described, along with their advantages and shortcomings involving cost, performance, and scalability. This thesis proposes an architecture which, while incorporating the successful features of the above mentioned approaches, avoids their bottlenecks.
The chapter concludes with a section describing the operating features of the Fujitsu MB86680B ATM switch, which was selected as the interconnect for the thesis project and as a model for the performance simulation.
2.2 Multimedia
2.2.1 The definition
In order to establish the requirements for the multimedia architecture, the definition and characteristics of multimedia information should be considered first. According to the American Heritage Dictionary of the English Language, multimedia is "the combined use of several media, such as movies, slides, music, lighting, especially for the purpose of education and entertainment". However, while describing multimedia as a mix of at least two different media, this definition is not precise enough for the context of computer architecture and must be further refined.
Multimedia in a computer context entails the following four properties:
1. combination of several media
2. independence of combined media streams
3. computer integration
4. communication.
In a computer system context, the information must contain several continuous and discrete media to be called multimedia. Independence of media streams entails that their sources be independent of each other. As an example, text and video coming from independent sources (a computer and a video camera) are independent, while sound and video from the same teleconferencing equipment (the same video camera or recorder) are not.
Computer integration suggests that all independent media are seamlessly combined, controlled, and processed by a single computer system. This system must also give the end user control over all media streams with the same or equivalent functionality. The communication property entails that multimedia information be easily exchanged between various multimedia systems and easily transported over computer networks. The need comes from the observation that more and more varied media streams are stored and available on global networks. Based on those important requirements, Steinmetz and Nahrstedt [35] propose the following definition of a multimedia computer system:
A multimedia system is characterised by computer controlled, integrated production, manipulation, presentation, storage, and communication of independent information, which is encoded at least through a continuous (time dependent) and a discrete (time independent) medium.
2.2.2 Multimedia modes
For each independent medium in a multimedia information stream, one must specify precise timing rules to ensure acceptable quality when the information is presented to the end user. Each type of medium can be assigned to one of three transmission modes: asynchronous, synchronous, or isochronous, depending on its timing constraints. These constraints must be taken into account when developing a multimedia computer architecture.
Asynchronous transmission does not enforce any time constraints on the flow of information. Information is delivered on a "best effort" basis with no time guarantees. Examples of asynchronous media are e-mail messages and files to be retrieved in a non-interactive fashion. Synchronous transmission dictates a maximum delay for each unit of information in a stream. Such a limit is essential during the delivery of documents in an interactive user mode, or in the transmission of video frames in real time on systems with ample intermediate storage. Video frames which arrive too early are buffered in the intermediate storage so that they can be presented to the end user in the proper order. The buffering eliminates the need to enforce minimum delays on the information stream. The last group, the isochronous media, specifies both a minimum and a maximum delay for each media unit. A teleconferencing stream on a system with a small amount of intermediate storage would qualify as an isochronous information stream. Isochronous transmission is also referred to as real-time transmission.
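The three transmission modes above are fully determined by which per-unit delay bounds a stream specifies, which can be sketched as a small classifier (the class and stream names are illustrative, not from this thesis):

```python
# Sketch: classifying a media stream into the three transmission modes
# described above, from its per-unit delay bounds. Names and bounds
# are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stream:
    name: str
    min_delay_ms: Optional[float]  # lower bound on delivery delay, if any
    max_delay_ms: Optional[float]  # upper bound on delivery delay, if any

def transmission_mode(s: Stream) -> str:
    if s.max_delay_ms is None:
        return "asynchronous"   # best effort, no time guarantees
    if s.min_delay_ms is None:
        return "synchronous"    # max delay only; early units are buffered
    return "isochronous"        # both bounds: real-time transmission

print(transmission_mode(Stream("e-mail", None, None)))           # asynchronous
print(transmission_mode(Stream("buffered video", None, 150.0)))  # synchronous
print(transmission_mode(Stream("teleconference", 0.0, 150.0)))   # isochronous
```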
2.2.3 Multimedia workstation components
In view of the requirements imposed on a multimedia system by various media streams, a typical system should contain at least the following components [29]:
- a general-purpose microprocessor
- large primary storage (memory)
- huge permanent secondary storage (disks)
- a dedicated media processor
- graphics, video, and sound equipment
- communication adapters
The general-purpose processor is responsible for system management, operating system tasks, and standard (non-multimedia) data processing. The primary storage is needed for processing, copying, and temporary storage of multimedia information; multimedia objects tend to be of very large size. The secondary storage, in the form of hard disks, disk arrays, and read-only and rewritable optical disks, provides permanent archiving and distribution of multimedia data.
The need for real-time processing of isochronous data, which imposes a severe constraint on processing times, entails the use of secondary, dedicated real-time processors (DSPs) to ensure delay and delay jitter guarantees. Finally, the communication adapters are needed to provide connectivity to external networks at the speeds required by time-critical multimedia applications. Multimedia information is displayed and presented to the end user by means of audio and video adapters, speakers, monitors, and other multimedia input/output equipment.
2.2.4 Time constraints of multimedia traffic
In order to quantify the time values of the multimedia traffic constraints, one should define what initial criteria this traffic has to fulfill. Since all multimedia traffic eventually reaches the end user of the computer system, the end user should provide some way of expressing the perceived quality of the received information. A properly designed system should give a natural feel when working with information; the user should not perceive any degradations such as glitches, intolerable delays in teleconferencing, or lack of voice and video synchronization.
The industry criterion for estimating user perception is known as the Mean Opinion Score (MOS). It is intended to give an "objective comparison of subjective testing, such as user's perceived quality of network delay" [32]. The MOS ranges from 0 to 5.0, where 0 signifies the worst possible perception and 5.0 a perfectly natural perception of multimedia data. Various delays, delay jitter, synchronization errors, and bit rate errors (cell loss rate) directly influence the value of the MOS. For example, in teleconferencing, the larger the delay jitter introduced by the network and multimedia computer system, the more interrupted and unnatural the conversation between the end users will be. Hence, larger delay jitter contributes to a reduction of the MOS.
Radhika Roy summarizes the multimedia system requirements needed to achieve acceptable MOS values in [32]. He also discusses the minimum MOS required for natural audio and video communication. The MOS for audio should be between 4.0 and 5.0. At 3.5, conversation is still possible, albeit with easily detectable sound degradation. Human perception of video is more forgiving, with an acceptable MOS value as low as 3.5. The one-way end-to-end delay of an audio/video stream should be no more than 150 ms to satisfy the above-mentioned MOS, with 300 ms as the two-way (return) end-to-end delay. The value of the delay jitter should be as low as possible, but values around 250 µs are quite acceptable. Inter-media synchronization delays, for example lip-synchronization errors, should fall in the range of -20 to +40 ms to be unnoticeable to the end user. Interruptions in the receipt of continuous multimedia streams like video should be minimal, implying at most one interruption (one cell lost) in about 40 minutes. Roy's summary also quantifies acceptable system response time limits for user access in an interactive mode, such as browsing or retrieving documents. The host system should provide a response to a request in about 1-2 seconds for document retrieval, and about 0.5 second for browsing.
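Roy's limits lend themselves to a simple admission check. A minimal sketch follows (function and field names are mine; the threshold values are taken from the text above):

```python
# Roy's limits for natural audio/video communication (values from the text)
LIMITS = {
    "one_way_delay_ms": 150,      # maximum one-way end-to-end delay
    "round_trip_delay_ms": 300,   # maximum two-way (return) delay
    "jitter_us": 250,             # delay jitter around this value is acceptable
    "lip_sync_ms": (-20, 40),     # unnoticeable inter-media skew range
}

def violations(stream):
    """Return a list of the constraint names a measured stream violates."""
    bad = []
    if stream["one_way_delay_ms"] > LIMITS["one_way_delay_ms"]:
        bad.append("one_way_delay")
    if stream["round_trip_delay_ms"] > LIMITS["round_trip_delay_ms"]:
        bad.append("round_trip_delay")
    if stream["jitter_us"] > LIMITS["jitter_us"]:
        bad.append("jitter")
    lo, hi = LIMITS["lip_sync_ms"]
    if not lo <= stream["lip_sync_ms"] <= hi:
        bad.append("lip_sync")
    return bad

# a stream with 180 ms one-way delay fails only the delay constraint
print(violations({"one_way_delay_ms": 180, "round_trip_delay_ms": 290,
                  "jitter_us": 200, "lip_sync_ms": 10}))  # ['one_way_delay']
```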
Roy provides detailed bandwidth figures for various teleconferencing configurations with their appropriate MOS values. Selected system parameters that satisfy the acceptable MOS scores are presented in Tables 2.1 and 2.2. This data will be used in the performance simulation to specify the bandwidths of common multimedia streams.
2.3 Asynchronous Transfer Mode (ATM)
2.3.1 Description
ATM is a method of transporting, switching, and multiplexing information over networks [2, 5, 34]. Its features allow the high level of flexibility needed to deal with the variety of information types (both multimedia and traditional) exchanged over the networks. ATM is considered a connection-oriented network protocol, but due to its flexibility it supports both connection-oriented and connectionless services, as well as constant and variable bit rate operations (CBR and VBR). The ATM protocol was designed to provide good transport of services like voice, data, still images, video, multimedia, and real-time information over a single type of network, thus eliminating the need for separate, proprietary overlay networks for each of these services.
ATM is confined to the upper half of layer 1, basic functions of layer 2, and parts of the network and transport layers of the OSI (Open Systems Interconnection) network architecture model developed by the International Organization for Standardization (ISO). The structure of the OSI Reference Model is presented in Figure 2.1. ATM consists of two layers: the ATM layer and the ATM adaptation layer (AAL). The ATM layer is common to all the services, while the AAL is service dependent. The AAL adapts the information received from higher levels of the model to the requirements of the ATM layer.
To transport information over the network, ATM uses fixed-size packets called cells. Each cell is a collection of 53 octets, with 5 octets comprising a header and the remaining 48 octets forming the payload, as depicted in Figure 2.2. The actual payload is usually reduced to 44 octets due to the segmentation and reassembly control information added by the ATM adaptation layer.
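The cell arithmetic is easy to verify; a sketch (the 44-octet figure is the AAL-reduced payload mentioned above, and the 9.43% figure is the header overhead quoted later in this chapter):

```python
import math

CELL = 53          # octets per ATM cell
HEADER = 5         # octets of ATM-layer header
PAYLOAD = 48       # octets left for the AAL
AAL_PAYLOAD = 44   # octets left for user data after AAL SAR fields

# ATM-layer header overhead: 5/53 of every cell
print(round(HEADER / CELL * 100, 2))  # 9.43

def cells_needed(user_octets):
    """Cells required to carry a user data unit through the AAL."""
    return math.ceil(user_octets / AAL_PAYLOAD)

# a 1024-octet unit needs 24 cells, i.e. 24 * 53 = 1272 octets on the wire
print(cells_needed(1024))  # 24
```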
ATM uses labelled channel multiplexing to implement addressing. Each cell header contains a label called a connection identifier, which in turn consists of two subfields: the VCI (virtual channel identifier) and the VPI (virtual path identifier). The VCI and VPI uniquely identify the destination
[Table: columns are video data description; frames per second; bits per sample; transmission mode (CBR = constant bit rate, VBR = variable bit rate); and compressed bit rate in Mbit/s. Rows: standard TV quality (720x480 pixels, interlaced, MPEG-1); cable TV (360x480, non-interlaced, MPEG-1); and two low-rate video-conferencing formats (360x240, non-interlaced, MPEG-1). The numeric entries are not recoverable from the scanned copy.]

Table 2.1 Bandwidths of typical video streams with acceptable MOS values
Data type (all colour pages 24 bits/pixel)   Uncompressed object   Typical achievable   Peak bandwidth for object
                                             size (Mbit)           compression ratio    retrieval (Mbit/s)
8.5"x11" colour page, 200 pixels/inch        90                    10-20                4.5-2.3
8.5"x11" colour page, 400 pixels/inch        359                   10-20                18-9
8.5"x11" colour page, 800 pixels/inch        1436                  10-20                72-36
8.5"x11" colour page, 1600 pixels/inch       5744                  10-20                287-144

[The ASCII-text page row and the peak-bandwidth-for-browsing column are not recoverable from the scanned copy.]

Table 2.2 Bandwidths of typical documents with acceptable MOS values
Figure 2.1 OSI reference model (application, presentation, session, transport, network, data link, and physical layers at the originating, intermediate, and destination nodes, with the physical communication path between the nodes)
Figure 2.2 ATM cell structure (53 cell octets: the cell header followed by the cell payload, with the AAL header and AAL trailer occupying part of the payload)
of the cell. Since ATM serves only the role of a transport mechanism, it can support a wide range of both present and future network protocols such as Token Ring, Ethernet, or Fast Ethernet. ATM, in combination with SONET (the physical layer protocol), achieves a high bandwidth efficiency of around 80% (both layers 1 and 2 of the OSI model), which compares very well with other packet switching systems like FDDI (80% efficiency for the physical layer alone).
2.3.2 The ATM cell format
The 53 octets of an ATM cell are divided into two parts: the cell header and the payload. The 5-octet header contains the control information of the ATM layer. The header is further subdivided into six fields. The structure of the cell header is presented in Figure 2.3.
The UNI 3.0 (User-Network Interface) and the NNI (Network-Network Interface) standards developed by the ATM Forum specify the use of each field:
GFC: Generic Flow Control. This field can be used by the ATM customer to implement flow control or other local functions. Therefore, GFC has only local significance, and it is overwritten by ATM switches on the network side (public ATM switches). This field does not exist in the NNI standard.

VPI/VCI: Virtual Path Identifier/Virtual Channel Identifier. Both fields uniquely identify the destination of the cell and are used for routing the cell through the ATM network. The size of the VCI and VPI varies and is negotiated between the user and the network. The size of these fields is different in the NNI and UNI standards.

PT: Payload Type. This field indicates whether the cell carries user data or management and control information. Network congestion information can also be encoded here.

CLP: Cell Loss Priority. This single-bit field indicates the priority of a cell when congestion occurs and the network must discard some cells. A value of "0" marks the cell as being of higher priority, so it will be discarded only after all cells with a CLP of "1" have been discarded.

HEC: Header Error Control. The physical layer uses this field for detection and correction of errors that affect the cell header.
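The field layout can be made concrete by unpacking a header with bit operations; a sketch assuming the standard UNI bit positions (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PT, 1-bit CLP, 8-bit HEC):

```python
def parse_uni_header(h):
    """Split a 5-octet ATM UNI cell header into its six fields."""
    assert len(h) == 5
    return {
        "gfc": h[0] >> 4,                                          # 4 bits
        "vpi": ((h[0] & 0x0F) << 4) | (h[1] >> 4),                 # 8 bits
        "vci": ((h[1] & 0x0F) << 12) | (h[2] << 4) | (h[3] >> 4),  # 16 bits
        "pt":  (h[3] >> 1) & 0x07,                                 # 3 bits
        "clp": h[3] & 0x01,                                        # 1 bit
        "hec": h[4],                                               # 8 bits
    }

# a header carrying VPI = 1 and VCI = 5, with CLP = 0
print(parse_uni_header(bytes([0x00, 0x10, 0x00, 0x50, 0x00])))
# {'gfc': 0, 'vpi': 1, 'vci': 5, 'pt': 0, 'clp': 0, 'hec': 0}
```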
2.3.3 Advantages of ATM
The characteristics of ATM make it a good choice for integrating different services like voice,
data, and multimedia over a single network. This allows significant cost reduction in operation
Figure 2.3 Fields in ATM cell header as defined by UNI 3.0 (GFC, VPI, VCI, PT, CLP, and HEC laid out across the five header octets)
and maintenance of the network, since there is a single network with a single standardized interface for all services. In addition, the flexibility of ATM enables the introduction and implementation of new services with yet unknown characteristics, with minor or no modifications. Since ATM multiplexes and switches cells asynchronously, the allocation of network bandwidth is much more flexible than with synchronous protocols. Asynchronous multiplexing is possible because ATM is connection-oriented and because each ATM cell contains the VPI/VCI fields with information sufficient to reach its destination.

The ATM cell size was chosen as a trade-off between the small voice packets used in telephony today and the large packets preferred for data transmission in computer communication. Fixing the size of a cell greatly increases the speed of multiplexing and switching, and reduces the complexity of the buffers and queue management in the switching nodes. It is also much easier and cheaper to implement switching and multiplexing of fixed-size rather than variable-size data units. The fixed size of the header adds to the speed of the ATM switches, since only the header is processed and the payload simply follows its header after processing. Unfortunately, the cell headers add 9.43% overhead to the useful user information. The speed and flexibility of the ATM equipment translates into smaller delay and delay jitter in comparison to other packet switching technologies.
2.4 Multimedia Workstation Architectures
2.4.1 Introduction
This section outlines the most recent research in the field of multimedia computer architectures. The three projects described here are:

1. VuNet from the Telemedia Networks and Systems Group at the MIT Laboratory for Computer Science
2. Netstation from the University of Southern California/Information Sciences Institute
3. Desk Area Network (DAN) from the University of Cambridge Computer Laboratory.
Each description is divided into three subsections. The first subsection outlines the major
architecture objectives, while the second one gives the details of the implementation, the test
environments, and some test results. A discussion of the advantages and the shortcomings of the
architecture forms the third subsection.
2.4.2 VuNet
2.4.2.1 Architecture
The first workstation architecture [1] reviewed here is a network of general-purpose computers and various shared multimedia peripherals, as seen in Figure 2.4. When augmented with the multimedia peripherals, the general-purpose machines become multimedia workstations capable of displaying and processing high-bandwidth data streams. While presenting a new way of working with multimedia data, VuNet does not alter the internal structure of the general-purpose machines.
The designers of VuNet have chosen ATM as the network connecting the computers with the multimedia devices and other networks. The multimedia devices, like video compressors and decompressors, cameras, and displays, form separate nodes of this ATM network and connect to it directly through their own interfaces, as depicted in Figure 2.4. The main objective in the design of this system is to go beyond the ability of current systems, which only store or display multimedia data, and to allow real-time processing of multimedia by general-purpose computers. The applications running on the VuNet computers are able to directly receive the multimedia data and perform various tasks, such as stationary filtering, motion detection, and edge filtering.

The VuNet computers do not have any built-in multimedia devices. The creators of VuNet argue that workstations using built-in multimedia components have many more disadvantages and limitations when compared to the "universal" shared peripheral devices proposed in the VuNet project. Since the built-in components are connected to the computer I/O bus, the high-bandwidth multimedia traffic significantly reduces the bus bandwidth available for other data traffic. They are usually designed for a single application, like teleconferencing, and are tied to one particular hardware platform. On the other hand, the VuNet peripherals, hooked up directly to the gigabit ATM network, are universal and platform independent.
Figure 2.4 VuNet multimedia architecture (general-purpose computers, a multiprocessor node, and shared peripherals such as a compressor and decompressor node, a video camera node, and a video display node, each attached to the network through its own ATM interface)
In order to make their system flexible and portable, the VuNet developers utilized a software-intensive approach. They simplified the network hardware as much as possible and shifted most of the processing, including media data processing, network protocol tasks, and network control, to the processors of the general-purpose computers. Complex functionality of the architecture is then developed in software, which can be altered and augmented without any hardware changes. Due to this software approach, VuNet can be easily ported to other hardware platforms without any changes to the VuNet system itself. However, a software-intensive implementation carries with it some performance penalties, namely throughput loss and speed degradation. The developers' tests show that even with those penalties, the performance of VuNet is still adequate.
Since different types of multimedia information reach the application level in the workstation, the system must provide transparent data handling for all those types. The applications must be provided with similar interfaces to various data types and must be able to handle them in a similar manner. To implement transparent data handling, the VuNet system must provide graceful scaling and degradation of multimedia traffic in situations when the load reaches high levels. To achieve that, VuNet has the ability to control the sources of data and can adjust their data rates and burstiness on the fly to react to changes in the system load and available bandwidth.
As mentioned above, VuNet introduces "universal" peripherals which are connected directly to the ATM fabric and not to the I/O bus of the computer. This allows sharing of those peripherals among all computer clients and eliminates the need for hardware platform-dependent peripherals. There is also no need for re-design when new types of computers are connected to the system. Each of the "universal" peripherals constitutes a single node in the VuNet architecture and has a single function which it can perform with high efficiency. To implement more complex functionality, those peripherals are chained across the ATM interconnect.
2.4.2.2 Implementation details
All elements of the VuNet environment are connected together using ATM switches and links arranged in some configuration, such as a star, a ring, or other. The ATM switches, together with the peripherals and computers attached to their ports, form the nodes of the architecture. The switches contain four 700-Mbit/s ports with 64-cell input and 256-cell output buffers. Each switch has a 32- or 64-bit host interface. The links connecting all nodes of the VuNet provide 500-Mbit/s bandwidth. The ATM cell header is modified for the purpose of this project by adding three bytes, resulting in a single cell of 56 bytes. The cell then corresponds to seven aligned read operations on a 64-bit host interface bus.
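The alignment arithmetic is straightforward to check (the padding-overhead percentage below is my own calculation, not from the text):

```python
STANDARD_CELL = 53         # octets in a standard ATM cell
VUNET_CELL = 53 + 3        # VuNet adds three padding octets
BUS_WIDTH = 8              # 64-bit host interface = 8 octets per read

assert VUNET_CELL % BUS_WIDTH == 0        # 56 is a multiple of 8
print(VUNET_CELL // BUS_WIDTH)            # 7 aligned reads per cell
print(round(3 / STANDARD_CELL * 100, 1))  # padding adds about 5.7% per cell
```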
The interconnect implements only the basic functionality of transporting and routing the cells along established connections. More advanced functions like congestion control and multicasting have been moved to the client workstations and implemented in software. This simplifies the hardware of the architecture, but may also result in severely degraded performance if such advanced functions need to be provided. Connection setup and all network management control are implemented on the same interconnect as data transfers. The hosts establish and terminate connections by sending control cells to update the header lookup tables in the switches. The control cells are also used to determine the topology of the network by means of querying all nodes and links of the VuNet system. Again, all those functions are implemented in software, with hardware providing transparent data transport.
To estimate the performance of the architecture, a test environment was built, which included an Alpha workstation and a video camera node. The camera node generates a stream of data which is sent over the ATM interconnect to the workstation, where the data is reassembled from cells and processed by the host processor. It was found that both the sending and the receiving switches can communicate with their hosts only at 230 Mbit/s due to the bus arbitration times and bus grant latencies (the workstation used a TURBOchannel bus). However, the memory of the host systems can only receive data continuously at the rate of 65.5 Mbit/s. After adding the time needed by the host to reassemble the data from the cells, the final maximum sustained throughput directly to the application level is 42.2 Mbit/s. In most cases the workstation is the performance-limiting component. Only in the case of high-colour video is the camera node the bottleneck, due to the computational complexity of the colour mapping algorithm used.
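These measurements describe a receive pipeline whose throughput drops as each software stage is added. Assuming the stages are serialized (my assumption, not stated in the text), the implied cost of cell reassembly alone can be backed out from the cumulative rates, since per-bit processing times add:

```python
# measured cumulative rates (Mbit/s) as each stage is added, from the text
bus_rate = 230.0     # host interface alone (TURBOchannel limited)
memory_rate = 65.5   # plus continuous copy into host memory
app_rate = 42.2      # plus software cell reassembly, to the application

# if stages are serialized, per-bit times add; the standalone rate of the
# reassembly stage is the reciprocal of the difference of per-bit times
reassembly_rate = 1 / (1 / app_rate - 1 / memory_rate)
print(round(reassembly_rate, 1))  # 118.6 Mbit/s spent on reassembly alone
```

The point of the exercise: even though reassembly on its own could run at over 100 Mbit/s, stacking it behind the memory copy drags the end-to-end rate down to 42.2 Mbit/s, which is why the workstation is the limiting component.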
2.4.2.3 Critique
The VuNet architecture, though improving the present state of multimedia processing, has some drawbacks. First, the introduction of centralized peripherals might create congestion and delays in the VuNet system. If there were more workstations using the system (the test environment included only a single one), their access to a particular peripheral would have to be shared. To avoid long access delays, each shared peripheral should provide high processing speed and efficiency, thus increasing its cost. In many cases it would be more economical to provide inexpensive built-in multimedia peripherals for each computer, thus reducing the VuNet traffic. In addition, since all data to be processed in the "universal" peripherals has to be transported to them and back to the computer through VuNet, the data transfers consume the VuNet bandwidth available for other tasks and workstations. A similar situation arises when a number of simple peripherals are combined to perform complex tasks, as in chaining. The information has to traverse the whole chain of requested peripherals, each time consuming the bandwidth of the network.

Despite the above-mentioned drawbacks, the idea of "universal" peripherals is very appealing. It would be cost effective to have devices with standard ATM interfaces, and therefore independent of the hardware platform.
Future applications will be operating on the multimedia streams, rather than just presenting them to the user. Therefore, the workstation architecture should permit those applications high-bandwidth access to multimedia data. However, the software-intensive approach suggested by the MIT lab researchers offers only a medium-bandwidth solution. Delegating so many tasks to the workstation host processor burdens it to the point that the processor becomes the bottleneck of the system. Cell reassembly, multicasting, and protocol management, all executed in software, significantly reduce the CPU time available for regular data operations. Test results show that the video frames are transported to the application level at only 47 Mbit/s, while the host interface can send that data at 230 Mbit/s. In the case of a single stream of data this bandwidth is sufficient, but if more are present, the bandwidth can degrade even further.
The increased size of the ATM cell to 56 bytes, although convenient from the point of view of the host interface transfers, has two major disadvantages. First, it strays from the standard, making the system incompatible with mainstream ATM equipment. This raises the cost of the system implementation and decreases the network interoperability. Second, three additional empty bytes have to be sent over the VuNet network. This increases the total cell delivery time and the consumption of the link bandwidth. The operation of padding to the required aligned access width of the host interface could be done locally, without sending the extra bytes through the VuNet switching fabric.
The control mechanism of the VuNet system seems very economical. Since the same path is used for sending data and control information to the switches, links, and peripherals, there is no need for a separate control interface.
2.4.3 Netstation

2.4.3.1 Architecture
The designers of this architecture, from the Information Sciences Institute at the University of Southern California, view a workstation as a group of co-operatively connected subsystems [10]. They propose a LAN interconnect, internal to the workstation, connecting those peripheral subsystems, as seen in Figure 2.5. Netstation can be viewed as a heterogeneous message-passing multicomputer, with nodes communicating with each other over established point-to-point connections on the internal LAN. Each Netstation node, a so-called MOSAIC node, contains an internal LAN interface, a network protocol processor with its memory, and an interface to the attached peripheral. The diagram of the MOSAIC node is presented in Figure 2.6. The network interface in each node is very small, with the ability to execute the network protocols at a speed at least matching the speed at which the associated device can send or receive data.

To make all the internal LAN components easily accessible from outside of the workstation, the external and internal LANs of such a system are link-layer compatible. The only differences between the external and internal LANs are the latency of component access and the security access privileges. Such an architecture allows very fast communication between the workstations connected to the external LANs, and is therefore ideally suited for distributed, bandwidth-intensive applications.
Figure 2.5 Netstation internal LAN (MOSAIC nodes with links to other MOSAIC nodes or external networks)

Figure 2.6 Netstation MOSAIC node (the peripheral associated with the node, the packet interface, and the asynchronous router)
Using a point-to-point interconnect rather than a system-wide bus has many advantages. Only a single device can be the bus master at any given time, while point-to-point connections use dedicated channels to transport information. The number of channels that can be open simultaneously is limited only by the number of paths that can be routed through the interconnect switching fabric. Due to transmission line effects, a bus is restricted in size and in the number of devices connected to it, while the same restrictions are greatly reduced in a point-to-point connection with only a sender and a receiver present on the line. The aggregate bandwidth of a bus is constant. Hence, adding more devices reduces the bandwidth per device when the bus is highly utilized. In a point-to-point system, added devices increase the total available bandwidth of the interconnect. The connection of a new node to the Netstation increases the number of routes available through the interconnect, which at the same time increases the number of point-to-point data transfer channels. Since each channel has a constant bandwidth associated with it, the bandwidth of the newly introduced channels adds to the aggregate bandwidth of the interconnect. The proponents of the Netstation architecture suggest the replacement of the internal system bus with a point-to-point internal LAN that connects all peripherals, external networks, and system processors. However, they argue that such an interconnect should not be used for processor-memory traffic.
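The contrast can be sketched numerically (the 500 Mbit/s figures are illustrative, not taken from the Netstation design): the bus divides one fixed aggregate among all devices, while each point-to-point channel keeps its own dedicated bandwidth.

```python
def per_device_bandwidth(devices, bus_mbps=500, channel_mbps=500):
    """Compare per-device bandwidth under full load.

    A shared bus divides a fixed aggregate among all active devices;
    a point-to-point interconnect gives each added channel its own
    dedicated bandwidth (switch capacity permitting).
    """
    bus = bus_mbps / devices
    point_to_point = channel_mbps
    return bus, point_to_point

for n in (2, 8, 32):
    bus, p2p = per_device_bandwidth(n)
    print(f"{n:2d} devices: bus {bus:6.1f} Mbit/s each, "
          f"point-to-point {p2p} Mbit/s each")
```

With 32 devices the bus share falls below 16 Mbit/s while each dedicated channel keeps its full rate, which is the core of the Netstation argument.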
Today's workstations contain a single, distinct network interface and a system-wide bus for data delivery to the attached peripherals. Although such a setup works well with both local applications and those requiring small and medium external network bandwidth, it creates a bottleneck at access speeds approaching gigabits per second. Currently, the path of the data coming from the external LAN involves a number of copying and processing operations. First, on arrival of the packet, the network interface copies the received packet to the memory for protocol processing. The processor then executes suitable protocols to extract the application data from the packet. In addition, the data is further copied from the kernel space to the application space. Such operations require a lot of internal interconnect bandwidth, especially with the rate of incoming data reaching gigabits per second. To avoid burdening the processor and the internal interconnect, the Netstation system delegates network information processing to the individual devices that produce and receive network traffic. This can reduce the minimum speed requirements for the system processor, the memory, and the interconnect.
The internal LAN must offer very low latency and high reliability to be practical for connecting workstation peripherals. In order to achieve this goal, the LAN has a very fast routing mechanism that introduces minimum overhead while sending packets through the intermediate nodes. The packet size is variable and appropriate for each device, to minimize the processing overhead (i.e. video-frame-sized packets for teleconferencing equipment). The choice of the packet size reduces the time of packet assembly and reassembly. Since the external and internal LANs are link-layer compatible, no gateways or translations are necessary at their boundary.
2.4.3.2 Implementation details
There are currently two types of nodes developed for the purpose of the Netstation architecture: the processing node (MOSAIC-C) and the peripheral interface node. The MOSAIC-C nodes contain a 14-MIPS microprocessor, 64K of RAM, 2K of ROM, as well as eight channels of 0.64 Gbit/s each connecting the node to the internal LAN. The MOSAIC interface chips are used for connecting the peripheral devices to the internal LAN. In addition to the elements available in the MOSAIC-C nodes, the interface chips have 128K of external dual-ported RAM and an external peripheral bus. Both types of nodes can be programmed through the incoming channels to perform different network processing protocols, or to change the characteristics and objectives of the associated device. The Asynchronous Router redirects the packets not destined for its node using a cut-through technique to minimize the routing delays. This operation is performed without interrupting the node processor. When a node is the destination of the incoming packet, data is sent through the DMA channel to the packet interface. Data is further processed by the node processor, which has the ability to filter messages, execute appropriate protocols, and arrange data to be presented to the application layers or to the associated peripheral device. The propagation time through the node is 12.5 ns, with 25 ns for the routing decision to be made.
A typical data transfer on the Netstation LAN involves three stages: the data channel establishment, the information transfer, and the channel termination. During the first stage, the source MOSAIC node sends a message header to the destination node indicating the beginning of the information transfer. As the header passes through the intermediate MOSAIC nodes, it allocates a data transfer channel in each node. If a particular link between the nodes is busy, the header is stopped. Only after the link is freed is the header allowed to advance to the next node. When it reaches the final MOSAIC node, the whole data transfer channel has already been allocated. The information transfer starts and continues until a message terminating the channel reaches the destination node. When this message traverses the intermediate nodes, it frees all previously allocated data channel links.
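The three stages can be illustrated with a toy model (all names are hypothetical; a real node would hold the header and retry once a busy link is freed, whereas this sketch simply reports where the header stopped):

```python
def open_channel(path, busy_links):
    """Allocate links along path (a list of node names) hop by hop.

    Returns (claimed_links, blocking_link). A link already present in
    busy_links stops the header, so allocation halts at that hop.
    """
    claimed = []
    for a, b in zip(path, path[1:]):
        link = (a, b)
        if link in busy_links:
            return claimed, link        # header stopped at a busy link
        busy_links.add(link)            # channel segment allocated
        claimed.append(link)
    return claimed, None                # full channel established

def close_channel(claimed, busy_links):
    """The terminating message frees every previously allocated link."""
    for link in claimed:
        busy_links.discard(link)

busy = set()
claimed, blocked = open_channel(["src", "n1", "n2", "dst"], busy)
print(blocked is None, len(claimed))   # True 3 -- all three links held
close_channel(claimed, busy)
print(busy)                            # set() -- links released for reuse
```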
2.4.3.3 Critique
The Netstation architecture provides an appealing framework for high-bandwidth, communication-intensive multimedia applications. First, the internal and external LANs' link-layer equivalence eliminates the bottleneck of the single, centralized external network interface. The information flows without interruption through the boundary of a workstation, unless security or administrative restrictions apply. The internal devices can be transparently accessed by both internal and external nodes with appropriate security privileges. The performance of the interconnect is much higher than that of a system bus, due to the connection-based information channels and distributed network protocol processing.
However, there are some drawbacks in the presented solution. The workstation LAN is based on a connection of nodes with an unspecified, user-defined topology. Such a configuration introduces some unpredictability to the system; the distance and the number of intermediate nodes to be traversed between nodes vary according to their position and the topology. Therefore, channel delays may be different for different pairs of communicating nodes. Additionally, in case of failure of one of the intermediate nodes, some devices may be cut off from other workstation components. Also, since the node chips (MOSAIC chips) have a fixed number of ports, they cannot form a LAN as easily scalable as a central switch or a crossbar. The third problem involves bottlenecks in available bandwidth, which may arise when the traffic from a number of devices needs to traverse the same node; increased delay jitter may be the immediate result.
Netstation uses a proprietary routing mechanism, which decreases the delivery delays and minimizes the node processor overhead. However, not adhering to the standard network protocols forces a custom network interface implementation, which in turn may increase the overall cost of the workstation. The Netstation proponents argue that making the packet sizes variable and adjusted to the data unit size of the devices or peripherals will reduce the overhead in packet processing. However, such an approach makes the design of hardware suitable to perform various packet operations quite difficult and costly. The success of ATM is mainly due to its fixed-size cells, which minimize and simplify the processing hardware.
2.4.4 Desk Area Network (DAN)
2.4.4.1 Architecture
The originators of the DAN concept, the Computer Laboratory at the University of Cambridge, maintain that highly efficient network communication of workstations can be achieved by a careful design of the workstation architecture. The architecture must allow fast transfers of information from the network interface to the real data consumers. The DAN architecture [14, 16] combines aspects of both ATM LANs and multiprocessor interconnect networks. All elements connected to the DAN through an ATM switch fabric, which forms the central part of the interconnect, are considered to be integral parts of the workstation. A typical configuration of a DAN workstation is presented in Figure 2.7.
In a typical workstation, the functions of the network interface are to demultiplex the data incoming from the external network and to send it over the internal bus to the destination device inside the workstation (i.e. a memory, a secondary storage, or a frame buffer). While sending to the network, the data from the originating device has to be translated by the network interface to the form and protocol acceptable to the external network. Therefore, the main function of such an interface is to translate between the multiplexing techniques used on the workstation bus and the external network. DAN eliminates those translations by using the same data transport mechanism both inside the workstation and on the network. DAN is totally enclosed within the physical limits of the workstation, which greatly simplifies the control structure and the protocols on the workstation interconnect, making them faster and cheaper to implement. On the DAN, the functions of the network interface are confined to enforcing the security features, to protecting the access to the workstation components, as well as to the conversion of the signalling protocol between the implementations inside and outside the workstation; the implementations of the same signalling protocol may differ since the internal DAN protocols are simplified to achieve
[Figure 2.7 Typical configuration of DAN multimedia workstation: a CPU with first- and second-level caches, a synchronization node, a video frame store, main memory, secondary storage, a decompression node, and an ATM camera, all attached to a central ATM switch.]
higher speed.
Most of the high-level functions like queuing and scheduling algorithms for each device
connected to the DAN are delegated to the operating system. Those algorithms are tuned to
maximize performance based on the characteristics of each device. The operating system must be
aware of the devices and their characteristic traffic patterns to avoid congestion and to minimize
delays and contention. All components of the DAN execute the same single, distributed
operating system for a flawless and congestion-free operation. The system can, in fact, be
considered a highly asymmetric multiprocessor with the attached devices forming the nodes of
this machine, as seen in Figure 2.7. The following is a sample list of devices which can be
connected to the DAN:
A CPU node: used for general-purpose data processing, as well as the control functions of the DAN (switch connection setup, scheduling algorithms, queuing, resource allocation, synchronisation of multiple data streams).
A main memory node: the memory system of the DAN is divided by the switching fabric. The CPU with first- and second-level caches forms one node, while the main memory forms another. The second-level cache lines requested from the main memory have to be sent through the ATM switch fabric. The same is true for the cache lines written back to the main memory.
A secondary storage node: the main data store (or stores) for the DAN. It can consist of a number of disks and CD-ROMs.
A display node: contains the frame store for graphics manipulation and windowing functions, as well as the video handling hardware.
A camera node: the source of video data streams. The camera ideally generates ATM cells rather than the traditional frames of video information.
A LAN interface node: a very simple node. It implements only the routing functions and the security features of the system (if required).
An audio node: dedicated to the audio input and output.
A compression/decompression node: shared by all devices. Used for conserving the secondary storage (disk compression procedures) and for reducing the network bandwidth (video, audio, graphics compression methods like JPEG, MPEG, etc.).
2.4.4.2 Implementation details
The Cambridge Laboratory implemented a test DAN with a small number of devices attached. A
Fairisle switch fabric [23], created from 4x4 self-routing crossbar switching elements, constitutes
the heart of the system. Each crossbar element has four input and four output ports, each eight
bits wide. The fabric is clocked at 20 MHz, achieving a throughput of 160 Mbit/s per port. In order to implement
the self-routing, an 8-bit routing tag has to precede the ATM cell.
On the device side, dedicated port controllers provide the interface to the switching fabric
[15]. The functions of the port controllers also include buffering of the incoming cells from the
fabric, shaping the traffic going into the fabric, sensing the cell losses, and retransmitting the lost
cells back into the switching fabric. Each port controller has an ARM RISC processor and an
expansion bus to which the peripherals and devices are attached. The operation of the DAN
should be transparent to the applications running on the system. Therefore, the switching fabric
must be fully reliable on the hardware level; a cell must reach its intended destination or the
source (in this case the port controllers) must be informed.
Since all data paths on the DAN require an established connection, connection management is
needed. As mentioned earlier, the CPU node is responsible for most of the connection setup
operations, since it has the most processing power. However, there may be other devices capable
of the set-up and the termination of the data path connections. Hence, there are three possible
device types on the DAN: "dumb", intermediate, and "smart". "Dumb" devices implement only
the simplest functions for the data management and control in the form of programmable internal
registers, e.g. an audio recording node. "Smart" devices have substantial processing power and
can manage their own connections as well as establish connections for "dumb" and intermediate
devices. The intermediate devices fall somewhere between those two, having a varying amount
of processing power and the ability to perform some management functions independently.
The architecture proposes separate control and data paths. Such an implementation is a natural
requirement if very low latency of control commands is required. Since the main memory is
separated from the second-level cache by the switching fabric, each cache read request has to
occupy a whole cell and must be transmitted over the switching fabric. This significantly
increases the cache miss times and degrades the CPU performance.
The experimental setup of the DAN involves testing the main memory service times of second-level
cache misses. The time is measured from the moment the cache miss is detected until the time
the processor resumes execution. The designers tested three different memory server
configurations. The first one implemented memory request handling entirely in software, the
second one used the fast interrupt requests (FIQ) available on the ARM processors. The third one
was implemented as a dedicated hardware component. The mean service times varied
significantly from implementation to implementation. The mean service time ranges are
presented below:
Software: 373 - 421 µs
FIQ: 33.1 - 40.7 µs
Hardware: 8.4 µs.
For comparison, the mean memory service time on a typical workstation is on the order of
hundreds of nanoseconds.
The last architecture reviewed here is closest in concept to the one proposed in this thesis. The
summary of the mean memory service times shows that the separation of the main memory and
the second-level cache substantially degrades the performance of the CPU. The service times are
orders of magnitude larger than those found on a typical workstation utilizing a bus-based
memory system. The software implementation of memory handling is not acceptable in a
workstation because of its mean service time reaching hundreds of microseconds. Based on the
DAN experiments, the hardware implementation provides the performance closest to the one
found in a typical workstation.
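The size of this gap can be made concrete with a quick back-of-the-envelope comparison. The sketch below uses the service-time ranges quoted above; the 0.3 µs baseline is an assumption standing in for the "hundreds of nanoseconds" figure of a typical bus-based workstation, not a measured value.

```python
# Rough slowdown of each DAN memory-server variant relative to a typical
# bus-based workstation. DAN figures are from the text; TYPICAL_US is an
# assumed ~300 ns midpoint for "hundreds of nanoseconds".

TYPICAL_US = 0.3

dan_mean_service_us = {
    "software": (373 + 421) / 2,    # midpoint of the measured range
    "fiq":      (33.1 + 40.7) / 2,
    "hardware": 8.4,
}

for name, t in dan_mean_service_us.items():
    print(f"{name:8s}: {t:6.1f} us, ~{t / TYPICAL_US:.0f}x a bus-based service time")
```

Even the hardware implementation remains more than an order of magnitude above the bus-based baseline, which is consistent with the conclusion drawn in the text.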
Separating the control and data paths is necessary for low latency access between the main memory
and the second-level cache in this particular configuration. However, it adds to the cost of the final
system since it requires additional wiring, extra interfaces on each device on the DAN, as well as
the development of the control interconnect protocols.
The separation of some multimedia components (e.g. the compression/decompression node) may lead to
wasted bandwidth on the DAN when the data streams are passed from device to device for
various processing tasks. Also, since such components have to be shared, this introduces contention
and puts more demands on the performance of such a shared component.
2.5 MB86680B, ATM Switch Element From Fujitsu
At the heart of the architecture presented in this thesis is an ATM switch. An ATM switch is a
device used to deliver ATM cells arriving at its input ports to the appropriate output ports.
Commercially available ATM switches [13] designed by AT&T, Fujitsu, IGT, TranSwitch, and
TriQuint were considered for the role of the performance simulation model of an ATM switch.
After careful consideration, the MB86680B [26] from Fujitsu was selected. The MB86680B is
the only device which processes ATM cells as integral data units, detects the beginning and end
of cells, and switches entire cells rather than generic bit streams. Fujitsu's integrated circuit is a
self-routing design, allowing it to extract the cell destination address from the cell itself; the
switch does not have to be externally controlled during normal operation. The MB86680Bs
have built-in features for cascading them into various switching fabrics without any additional
hardware; this allows an increase in the number of available input and output ports.
The MB86680B from Fujitsu is implemented on a single piece of silicon. A simplified functional
diagram of the switch is sketched in Figure 2.8. The switch provides four input and four output
ports per chip. In addition to the standard input and output ports, it offers four expansion and
regeneration ports to allow easy implementation of matrix switching fabrics with more than four
input and output ports. Each port on the chip is eight bits wide and can be clocked at 25 MHz,
providing a total bandwidth of 200 Mbit/s per port. Each output port has a buffer with room for
75 cells, which can be further subdivided into high and low priority queues. The high priority
queue holds 25 cells and the low priority queue 50 cells. Additionally, the switch also supports
multicasting and provides information about discarded cells and queue overflow events. The
[Figure 2.8 Functional diagram of MB86680B: input registers feeding the switch core, with the input, output, expansion, and regeneration ports.]
four port interfaces (input, output, expansion, and regeneration) can be clocked independently for
greater flexibility with the attached devices. The pin-out of the MB86680B is presented in
Figure 2.9.
In order to implement routing, each incoming cell has to be preceded by a 24-bit routing tag. The
tag contains the address of the destination port on the switch. If the switching fabric forms a
matrix or another configuration with a number of switching elements, the tag specifies the route
through the fabric. The tag is also used to indicate the priority of the incoming cell and the
multicasting information.
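The idea of a self-routing tag prepended to each cell can be sketched in a few lines. Note that the field layout below (destination port, priority bit, multicast mask, and their bit positions) is invented purely for illustration; the MB86680B's actual tag encoding is defined in its datasheet and is not reproduced here.

```python
# Illustrative 24-bit routing tag builder. The field positions and widths
# are assumptions for demonstration and do NOT match the real MB86680B
# tag format; only the overall scheme (24-bit tag sent before the cell)
# comes from the text.

ATM_CELL_BYTES = 53   # standard ATM cell: 5-byte header + 48-byte payload

def build_tag(dest_port: int, high_priority: bool, mcast_mask: int = 0) -> bytes:
    """Pack destination port, priority, and a multicast port mask into 3 bytes."""
    assert 0 <= dest_port < 4 and 0 <= mcast_mask < 16
    tag = (dest_port << 20) | (int(high_priority) << 19) | (mcast_mask << 15)
    return tag.to_bytes(3, "big")          # 24 bits precede the cell

def frame_cell(tag: bytes, cell: bytes) -> bytes:
    """The routing tag travels immediately ahead of the 53-byte cell."""
    assert len(tag) == 3 and len(cell) == ATM_CELL_BYTES
    return tag + cell

framed = frame_cell(build_tag(2, True), bytes(ATM_CELL_BYTES))
print(len(framed))   # 56 bytes cross the fabric per cell
```

Because the tag is processed in hardware before the cell body arrives, the switch can set up the path with no per-cell CPU involvement, which is the property the text emphasizes.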
The beginning of a new cell is indicated using a separate start of cell (SOC) signal associated with
each input port. The routing tag of a cell arrives before the cell itself and is processed
immediately by the address filter. Based on the information in the tag, the address filter directs
the cell to the appropriate output port and to the appropriate high or low priority queue. Each
output port uses a fast input multiplexer to allow receiving of up to three cells simultaneously:
one from the input, one from the expansion, and one from the multicasting interfaces. While
sending cells from the output ports, the cells in the high priority queue are always sent before any
cells in the low priority queue.
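The output-port behaviour just described (a 25-cell high-priority queue, a 50-cell low-priority queue, strict priority on dequeue, and overflow reported rather than silently absorbed) can be modelled in a few lines. This is a simulation sketch, not the chip's actual logic; only the queue depths and the priority rule come from the text.

```python
# Hedged model of one MB86680B output port: two bounded FIFOs with strict
# priority dequeue and an overflow counter (the chip reports overflow
# events to the system; here they are simply counted).
from collections import deque

class OutputPort:
    HIGH_CAP, LOW_CAP = 25, 50      # queue depths quoted in the text

    def __init__(self):
        self.high = deque()
        self.low = deque()
        self.discarded = 0

    def enqueue(self, cell, high_priority: bool):
        q, cap = (self.high, self.HIGH_CAP) if high_priority else (self.low, self.LOW_CAP)
        if len(q) >= cap:
            self.discarded += 1     # queue overflow: cell dropped, event counted
        else:
            q.append(cell)

    def dequeue(self):
        """High-priority cells always leave before any low-priority cell."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None

port = OutputPort()
port.enqueue("low-1", high_priority=False)
port.enqueue("high-1", high_priority=True)
print(port.dequeue())   # high-1 precedes low-1
```

The strict-priority rule is also what produces the starvation risk discussed later in Section 3.4.3: a long burst of high-priority cells can overflow the low-priority queue.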
[Figure 2.9 Block diagram of Fujitsu ATM switch: the MB86680B ATM switching element with its input and output ports and clocks, a statistics serial daisy-chain highway (STATUS IN/STATUS OUT), and a switch and switch matrix initialization daisy chain (INITIALIZE IN/INITIALIZE OUT).]
Chapter 3
Architecture
3.1 Introduction
This chapter describes the details of the new multimedia architecture proposed in the thesis. The
following section begins with a discussion of the rationale behind the development of the
architecture. The shortcomings of current mainstream multimedia solutions are also mentioned.
Section three outlines the specific features needed in the new design and the major objectives it is
intended to achieve.
Section four concentrates on the general structure of the proposed architecture. It shows how this
particular design fulfils the objectives. It describes a minimum functional system configuration
and the topology of the data path. This section also discusses the implementation of control
mechanisms and the handling of high load situations which may occur during the workstation
operation. Typical data and control information flows as well as a workstation bootstrap
procedure are presented.
Section five of this chapter focuses on the individual components of the architecture. Major node
types which form the workstation are reviewed. The general structure, implementation details, and
some interface and performance issues for each of the nodes are discussed. The last section
compares the characteristics of the presented multimedia architecture with the features of the
previously mentioned multimedia workstation developments.
3.2 Architecture Rationale
There are three major reasons for developing this new architecture. The first follows from the
observation that people, both direct and indirect computer users, are very interested in
multimedia representation of information. The use of various media and audio-visualization
techniques makes the presentation of raw textual data much easier to digest and analyze
[7,17,19,28]. The presentation becomes attractive and interesting to the users. The second reason
is people's need to freely interact with information. Thus, the ability to alter the form of a
presentation of data on-the-fly can be advantageous in many situations. This is especially true
when the reaction time of the system is critical; an interactive heart visualization during an
operation provides a good example [31, 33]. An increased demand for human communication
across large distances is the third incentive for the creation of the new multimedia architecture.
The need for simultaneous teleconferencing and remote workgroup environments is suppressed
only by the high cost of the necessary equipment. Such environments allow browsing and editing
of documents and the visual exchange of ideas in groups connected by a computer network but
separated by large distances.
The three applications mentioned - multimedia representation of data, interactive access to
multimedia data, and multimedia communication - all require transporting, processing, and
displaying large amounts of data. In order to make them feasible, one must provide a
communication infrastructure and workstations which are able to efficiently handle high
bandwidth information streams. The communication infrastructure in the form of high speed
LANs, MANs, and WANs is quickly becoming a reality, with the increased corporate and
personal interest in advertising and information exchange over the Internet. Multimedia
workstations also have the necessary components for dealing with high bandwidth information,
including powerful and inexpensive processors, large and inexpensive storage, affordable
audio-video equipment, and ample amounts of fast RAM.
However, today's workstations lack the means for the efficient utilization of those powerful
components in handling large amounts of multimedia data. A typical workstation employs a
system bus for connecting all its devices and peripherals. The performance of current bus-based
systems, well suited for delivery and processing of non-multimedia data, suffers when faced with
the huge loads brought by multimedia. The system bus, due to its fixed bandwidth, becomes an
immediate bottleneck; the bandwidth available for each device and the speed of data
transmission become lower with every new device attached to the bus.
There are various attempts to remedy the bus bottleneck problem. Some of them employ a new
type of bus with improved throughput to satisfy the needs of the current generation of applications.
However, the total bandwidth of the bus remains fixed. When the next generation applications
arrive, their required bandwidths will most likely exceed the capacity of the new bus, making it
obsolete again. Others propose systems with a secondary bus for transporting and displaying
multimedia data [29]. However, if multimedia data is to be processed, it must be transferred to
the primary system bus for submission to the CPU. In such a case, the interface between the two
buses or again the primary system bus becomes a bottleneck. Solutions with secondary buses are
not only costly, but also make the processing of the multimedia information difficult. They are
suitable for first generation multimedia applications which do not require multimedia data
processing.
In order to provide good handling of high bandwidth information coming from the external
networks, workstations must provide a fast network interface. Typical bus-based systems employ
a single, centralized network interface which becomes another bottleneck, this time for the high
bandwidth data being delivered to the workstation from outside of its boundaries. Most solutions
to that problem incorporate special network access accelerators, but that shifts the bottleneck
from the workstation boundary to the system bus.
3.3 Architecture Objectives
The purpose of this thesis is to propose a multimedia workstation architecture that would allow
efficient processing and handling of both multimedia and non-multimedia data. In addition, the
architecture should allow a cost effective implementation and flexibility in connecting various
workstation peripherals. The cost aspect is very important, since it will help bring about quick
market acceptance. A successful architecture should also be flexible, allowing both current and
future peripherals to be attached without performance degradation. Both cost and flexibility
influence the lifetime of the new design.
As mentioned in the previous section, currently used workstations are not well suited to
efficiently handle both multimedia and non-multimedia information. Therefore, to compete in the
market, the new architecture must not only be efficient and flexible when dealing with
multimedia data, it must also match or even exceed the performance of current workstations in
general purpose data processing. It should eliminate the limitations and bottlenecks present in
bus-based systems.
The proposed architecture should replace the commonly used system bus with a new type of
interconnect, able to eliminate system bus limitations. The interconnect must tie the external
network and all devices and peripherals together, to form an integrated workstation system. To
achieve high bandwidth for all data transfers, the interconnect must employ two important
features. First, it must allow maximum parallelism in data transfers, to permit simultaneous data
exchange among several peripherals. Second, it must offer high interconnect clocking speeds.
Further, to add flexibility and avoid blocking of nodes, the system should permit easy
multicasting of data, so that any information transported over the internal network can be
replicated and sent to multiple destinations. This can facilitate second generation multimedia
applications which can both display and submit information for processing at the same time. For
example, a stream of video frames coming from the external network interface can be multicast
to the video node to be displayed on the monitor and to the CPU node where additional
information from the frames can be extracted. Both operations can be performed simultaneously.
On the other hand, the information coming from different sources and arriving at a single
destination node at the same time should also be handled properly. The internal network and the
interface of the receiving node must prevent both blocking of the node and any loss of
information.
The new workstation should provide efficient access to the external network, far exceeding the
capabilities of the single, centralized network interface of today's workstations. The topology of
the design should be as simple as possible to maintain low cost of the workstation and to
eliminate any performance bottlenecks on the data path between communicating devices and
peripherals. The design must also be easily scaleable to facilitate upgrades to the workstation and
any new devices that will be connected in the future, without sacrificing its performance and
flexibility. In addition, the control structure of the architecture must allow full control of the
system and peripherals under any load conditions. The control structure should also enable
scaling and downgrading of the traffic on the interconnect to avoid system lockups when the load
becomes unacceptably high. This should be done by reconfiguring the attached devices or their
interfaces by the operating system.
3.4 General Structure
To achieve the objectives outlined in the previous paragraphs, this thesis proposes a new
multimedia workstation architecture. The general structure of the architecture is presented in
Figure 3.1. A new system board is proposed, which in an innovative way connects all peripherals
typically incorporated into the system board: the CPU, the main memory, and the graphics
interface. These peripherals communicate with each other through a fast, direct access path,
allowing efficient general purpose data processing and display. To facilitate the information
exchange between system board components and external peripherals, an inexpensive,
single-chip ATM switch is built into the system board. It implements an internal ATM LAN,
providing high-speed, point-to-point connections to all workstation peripherals. The proposed
system board also allows merging of live video and computer graphics. To provide highly
efficient network access for all peripherals, this thesis proposes to eliminate a separate network
interface and integrate the internal ATM LAN with the external ATM network by means of the
ATM switch on the system board.
3.4.1 Internal ATM LAN
The proposed architecture uses an internal ATM LAN to connect all workstation peripherals and
devices, as presented in Figure 3.1. Each peripheral or device forms a separate node of this
internal network and connects to it directly through an optimized ATM interface. A single ATM
switch with a full internal crossbar structure serves as the central element of the interconnect and
binds all components together, forming an integral workstation.
Asynchronous Transfer Mode is selected as the protocol for the internal network, because it is
suitable for supporting both continuous (time dependent) and discrete (time independent) media
[24, 30]. Therefore, an ATM interconnect should perform well while transporting both general
[Figure 3.1 General structure of the architecture: the system board with the CPU, main memory, and graphics subsystem, the internal ATM LAN connecting storage and other peripherals, and a colour display.]
purpose and multimedia data. The high speed of the ATM LAN is achieved by hardware
processing of the routing information. As mentioned before, an ATM cell has a fixed size, which
makes it suitable for hardware processing.
The delays on the ATM LAN are also fairly small due to the small size of the ATM cells.
However, those delays determine the choice of peripherals that can successfully communicate
through the internal workstation LAN. The average delays through the interconnect should not
approach the access or response times of communicating peripherals. The acceptable access times
of the memory and the graphical interface are in the range of tens of nanoseconds [11]. Therefore, as seen
in Figure 3.1, the CPU-memory and CPU-frame buffer traffic is not routed through the ATM
switch, but rather through a fast direct access path.
3.4.2 Interconnect topology
The topology of the internal ATM LAN is simple and cost effective. The internal LAN
components and all peripheral interfaces are tied to a single ATM switch. They are all enclosed
within the physical workstation boundary. The small overall size of the internal ATM LAN makes
the distances between the peripherals short, which in turn allows high reliability of the internal
LAN. The LAN connections are implemented on the system board inside the workstation chassis,
where environmental factors can be easily controlled. Compared to an external LAN, the internal
LAN is more reliable and more immune to external interference and noise.
The preferred topology for the internal LAN is a star with the ATM switch as its centre and all
attached peripherals as its arms. Small distances and point-to-point connections in the star
topology allow a very high clocking speed of the peripheral channels. In such a configuration, the
transmission delay experienced by data transferred between any pair of communicating
peripherals is constant and equal to two cell transmission times. The delays on the internal ATM
LAN are expressed in cell transmission times since this makes the measurement independent of the
clock speed of the internal LAN. The cell transmission time is equal to the length of the ATM
cell divided by the clock speed of the interconnect and expresses the amount of time needed to
send a whole cell through a single hop of an ATM link.
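The definition of the cell transmission time can be worked through numerically. The sketch below assumes a standard 53-byte ATM cell and a byte-wide port clocked at 25 MHz (the MB86680B figures from Section 2.5); the function name is illustrative.

```python
# Cell transmission time: cell length divided by link speed.
# Assumes a standard 53-byte ATM cell; the port parameters below are the
# MB86680B values quoted in the text (8 bits wide at 25 MHz -> 200 Mbit/s).

CELL_BYTES = 53   # 5-byte header + 48-byte payload

def cell_transmission_time_us(port_width_bits: int, clock_mhz: float) -> float:
    """Time to push one whole cell through a single hop, in microseconds."""
    link_mbit_per_s = port_width_bits * clock_mhz   # e.g. 8 * 25 = 200 Mbit/s
    return (CELL_BYTES * 8) / link_mbit_per_s       # 424 bits / 200 Mbit/s

print(f"{cell_transmission_time_us(8, 25.0):.2f} us per cell per hop")   # 2.12 us
```

With the star topology above, the constant two-cell-time delay between any pair of peripherals therefore works out to roughly 4.2 µs at these clock rates.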
3.4.3 Interconnect flexibility
The presented interconnect is also easily scaleable. Adding peripherals to the system does not
degrade the overall performance. Attaching new peripherals to one or more ports of the central
ATM switch is all that is required. Since the switch has a crossbar structure, the bandwidth
available for each communication channel of the interconnect is unchanged. The utilized
bandwidth actually increases, because the bandwidth available on the port used to connect the
new device adds to the total system bandwidth. The only assumption made in this scenario is that
the switch has enough ports to connect all desired peripherals. Ideally, the interconnect crossbar
fabric consists of a single switch for cost and performance reasons. Since the number of
peripherals used by a multimedia workstation is usually small [29], a single switch can indeed
provide a sufficient number of ports.
Since the switch used as the fabric of the interconnect has a full crossbar structure, the system
enables maximum parallelism of data transfers. Any two devices which establish a connection
through the switch can communicate without interruption until the connection is terminated. The
system can support as many simultaneous, parallel connections as the number of ports on the
switch allows. The intrinsic properties of the Fujitsu MB86680B¹ ATM switch, used as an
instance of a switch in this thesis, permit easy multicasting of cells and handling of multiple data
streams destined for the same destination device. Those operations can be performed without any
involvement of the CPU. A cell destined for multiple devices contains the appropriate
multicasting information in its header, which specifies all required destination ports. In an
extreme case, a cell can be broadcast to the entire system when it is multicast to all output ports
of the switch.
The use of an ATM switch provides graceful handling in situations when multiple information
streams are being sent to the same device simultaneously. Those information streams arrive in
the switch where they are both directed to the same output port. Each output port of the switch
contains a considerable number of cell buffers which are further divided into high and low
priority queues. Therefore, the cells from both streams are combined in the output queues and are
sent to the destination device on a first-come-first-served basis. The only exception is when one
¹ Refer to Chapter 2, page 35.
of the streams is of higher priority than the other. Since it is buffered in the high priority queue,
each cell belonging to this stream has precedence over the low priority one. If a burst of the high
priority cells is fairly long, the low priority queue may overflow. However, the switch has the
ability to inform the operating system about this event. The operating system can react to it by
either asking the stream source device to resend the cells or by reconfiguring the device to
temporarily slow or stop the stream transmission.
3.4.4 Network interface
The new architecture completely changes the shape of a typical external network interface. To
take full advantage of the new architecture, however, the external network must be link-layer
compatible with the internal network. If both external and internal LANs use ATM as a transport
mode, the external network becomes de facto another peripheral of the workstation. The full
bandwidth of the switch port is dedicated to the network connection. The network information
streams can be multicast or merged with other streams transparently. Data format conversions are
minimized since there are only two formats present: a device specific format and an ATM cell
format. For example, an audio signal is recorded, sampled, and stored in an audio node memory,
packed into ATM cells and transported to the destination workstation, where it is extracted from
the cells in another audio node, and sent to the speakers through a digital/analog converter.
The reverse is also true; all internal workstation peripherals become part of the external network;
any authorized device on the network can establish a point-to-point connection with any
workstation peripheral. It can then directly access and control this peripheral without intervention
of the workstation CPU.
In many practical situations a need arises to increase the bandwidth of a computer's network
interface, as in the case of heavily loaded network file and application servers. This is typically
accomplished by adding an additional network interface device with a separate dedicated
network cable. Similar bandwidth scaling is possible in the ATM workstation without the use of
any additional hardware. The total bandwidth of the network interface can be selected as needed
by dedicating the required number of switch ports. Each port connects to the ATM network
through its own link. As a consequence, this setup also requires a dedicated port from the switch
on the external ATM network.
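Because each dedicated port contributes its full link rate, the aggregate network bandwidth scales linearly with the number of ports. A minimal sketch, assuming the MB86680B per-port rate of 200 Mbit/s from Section 2.5:

```python
# Aggregate external-network bandwidth obtained by dedicating several
# switch ports to the network link, as the text proposes. Assumes the
# MB86680B per-port rate of 200 Mbit/s; the function name is illustrative.

PORT_MBIT_S = 200

def network_interface_bandwidth(dedicated_ports: int) -> int:
    """Each dedicated port contributes its full link rate."""
    return dedicated_ports * PORT_MBIT_S

print(network_interface_bandwidth(2))   # 400 (Mbit/s) with two dedicated ports
```

The trade-off, noted in the text, is that every port dedicated to the network is also a port consumed on the external ATM switch and one fewer port available for internal peripherals.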
3.4.5 Basic workstation configuration
The basic ATM workstation configuration includes the following components: a CPU with
appropriate caches, a main memory, a frame buffer for colour display, and disks for permanent
storage. These components are essential, because they provide the basic functionality of the
workstation. The above components must be present even if the workstation is not involved in
the processing of multimedia information. As mentioned before, the maximum acceptable delay
for the CPU-memory traffic and the CPU-frame buffer traffic is much smaller than what the
ATM interconnect can provide. Therefore, both of these data paths by-pass the internal LAN. On
the other hand, the disks can be separated from the CPU and the memory because a typical disk
access time (5-20 ms) is considerably larger than the typical delay through the ATM interconnect
(5-300 µs, equivalent to around 2-150 cell transmission times over the interconnect).
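The parenthesized equivalence can be checked against the cell transmission time defined in Section 3.4.2. This sketch assumes a 53-byte cell on a 200 Mbit/s port, i.e. about 2.12 µs per cell time:

```python
# Converting the 5-300 us interconnect delay range into cell transmission
# times, assuming a 53-byte cell on a 200 Mbit/s link (2.12 us per cell).

CELL_TIME_US = (53 * 8) / 200    # 424 bits / 200 Mbit/s = 2.12 us

for delay_us in (5, 300):
    print(f"{delay_us:3d} us ~= {delay_us / CELL_TIME_US:5.1f} cell times")
# About 2.4 and about 141.5 cell times, consistent with the "around 2-150"
# range quoted in the text.
```

At three or more orders of magnitude below the 5-20 ms disk access time, even the upper end of this range leaves the disk, not the interconnect, as the dominant latency.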
A general topology of the basic system components is presented in Figure 3.2. The structure of
the system connecting the CPU, the memory, the ATM switch, and the frame buffer can be
proprietary to the system board manufacturer. This does not decrease the flexibility of the
architecture since those components traditionally reside on the system board. The designers of
the system board must provide ATM interfaces for each of the components as depicted on Figure
3.2. Such a setup allows easy upgrades in case the CPU or the memory technology changes. An
upgrade involves only an exchange of the system board without altering the rest of the
workstation since the board connects to the rest of the workstation through the standard ATM
interfaces.
Other components of the system, which expand the minimum functionality, include the
following nodes: live video processing, video camera, audio, additional storage and CPU,
network interfaces, and modem pools. All those peripherals include a fast ATM interface
designed and optimized for a particular peripheral. A single specialized peripheral node can also
combine a number of devices with very small bandwidth demands, like mice, keyboards,
digitizing tablets, and joysticks. It can facilitate their communication with other workstation
components without dedicating ATM ports to each of these devices.

Figure 3.2 Connection of basic system components
The proposed workstation architecture does not include any nodes that are shared among a
number of peripherals. To reduce the internal LAN traffic, each device contains full functionality
within its own node. For example, if the live video processing node supports the MPEG
compression standard, the MPEG compression/decompression circuitry should be a part of the
node. This also reduces the number of switch ports required in the workstation. Such a setup is
preferable from the performance point of view. However, in practical situations, the decision
whether to duplicate the functionality in a number of peripheral nodes and save a port on the
internal ATM switch, or employ a separate utility device shared among many peripherals, is
highly dependent on the cost of the dedicated utility devices. Figure 3.3 visually shows the
benefits of the chosen scheme.
3.4.6 Control path structure
Since the access time of connected peripherals is relatively long compared to the interconnect
delays, the control path of the system is implemented on the same internal LAN as the data path;
the control information is distributed along the same internal ATM interconnect as data. The
need for a separate control and management interface is eliminated, which significantly reduces
the overall complexity of the workstation architecture. Since the control and management
information is usually small, it is sent in a single ATM cell with higher priority than the data
cells. The whole payload section of the ATM cell is dedicated to the control information. Giving
higher priority to the control cells allows them to bypass the data cell queue buffers and reach the
destination with a minimum delay. The ATM interface of each connected peripheral decodes the
information in the cell, looks for any control information, and acts on it. Since control data is
located at the beginning of the cell payload section, the peripheral interface can perform the
requested action even before the whole control cell is fully received. This further reduces the
delivery time of the control and management information.
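The priority scheme described above can be sketched as a two-level queue at a switch output port: a control cell, when present, is always transmitted before any queued data cell. The class and member names below are illustrative, not taken from the thesis:

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// A switch output port with two queues: control cells always bypass the
// data-cell queue, so they reach the destination with minimum delay.
struct Cell {
    bool is_control;    // control/management cell vs. ordinary data cell
    std::uint32_t id;
};

class OutputPort {
public:
    void enqueue(const Cell& c) {
        (c.is_control ? control_q_ : data_q_).push_back(c);
    }
    // Next cell to put on the link: control queue first, then data queue.
    std::optional<Cell> dequeue() {
        std::deque<Cell>& q = control_q_.empty() ? data_q_ : control_q_;
        if (q.empty()) return std::nullopt;
        Cell c = q.front();
        q.pop_front();
        return c;
    }
private:
    std::deque<Cell> control_q_;  // reserved for control and management cells
    std::deque<Cell> data_q_;
};
```

A control cell enqueued behind a backlog of data cells is still the next one dequeued, which is the bypass behaviour the text relies on.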
One of the most important aspects of the control architecture involves handling of very high
system loads. In such situations the system may experience cell loss due to a buffer overflow in
the ATM switch. Output buffer overflow can occur when two or more data streams are
competing for the same output port and their combined data rate and burst length exceed the
sending capabilities of the output port. A loss of an ATM cell by the interconnect can only occur
intentionally. In addition, all devices allowed to drop cells intentionally are also required to
inform the source of the dropped cell and the workstation operating system about this event, so
the lost information can be retransmitted.

Figure 3.3 Data flow in a system with a) separate utility nodes and b) utility nodes embedded into the peripheral nodes
When a cell is lost in the output buffer, the switch automatically generates and sends the
appropriate information through its statistics interface. This information is then embedded into a
cell and sent with high priority to the node which runs the operating system, namely the
supervising CPU node. Upon receipt of this cell, the operating system sends a control cell to the
source device whose cell has been lost, and requests a retransmission, slowing, or stopping of the
cell stream. The originating device interface decodes the control cell and configures the attached
peripheral accordingly. Figure 3.4 provides a quick sketch of the path of the control information.
To avoid system lockup, the cell loss information is sent through the high priority queues,
therefore bypassing the overflowing switch buffers. The high priority queues are reserved for the
use of the operating system and other control and management information to minimize the
delays in the transport of the control cells. Hence, the total reaction time to a cell loss event is
determined mainly by the transmission time of the control cell through the ATM interconnect.
The delays introduced when the system is working with the external network, either as a cell
source or destination, are mostly determined by the external network characteristics. The network
delays are independent of the internal workstation architecture. Therefore, this topic is not
discussed further in the thesis. The only aspect of the thesis that is relevant to the network delay
issue is the additional delay added by the internal interconnect to the total network delay.
However, since the network is treated as a generic workstation peripheral, the delays introduced
by the interconnect are identical to those for any other peripheral attached to the workstation.
To improve performance of the system with general purpose data processing, the control and
resource management burden on the CPU, introduced by multimedia devices, is minimized. Since
information transfers inside the workstation are connection oriented, the CPU is only required for
the set-up and termination of those connections. In many cases, if the devices have enough
processing power, they can establish the connections themselves, only informing the CPU (the
operating system node) of this fact. The only remaining responsibility of the CPU is the
necessary scheduling and resource allocation of the interconnect, as well as traffic shaping and
cell loss handling.

Figure 3.4 Control data flow during a cell loss event (the switch statistics interface reports the cell loss; the operating system node replies with a stop, slow down, or retransmit control cell)
3.4.7 Initial setup and typical usage scenarios
Naturally, the workstation requires an operating system to tie all the components together into a
seamlessly operating workstation. To manage the components properly, the operating system
must be aware of all components of the workstation and their characteristics. The operating
system is responsible for scheduling the flow of data to avoid congestion and blocking, and it
should react to such situations promptly to restore normal system operation. The general
mechanism of congestion control was discussed in a previous section. The following paragraphs
describe how the operating system collects information about the system structure and how the
data connections are established.
The information about the system structure and the device characteristics should be obtained
during the operating system startup. First, the CPU initializes the system board components, then
sets up their respective ATM interfaces. In the next phase, it identifies the remaining workstation
components by broadcasting a special status query cell. The reply cells should provide detailed
information about all peripherals attached to the switch ports, their maximum and typical
bandwidth requirements, and the control options they provide. Since this query process can be
repeated during normal workstation operation, new peripherals can be dynamically added and
configured after the bootstrap procedure.
After the workstation is completely initialized and the operating system is in full control of the
workstation components, the regular data processing may be started. A typical connection
between peripherals can be established by a request from the connecting peripheral to the
operating system. A reply to this query will supply the VCI/VPI pair and a port number of the
switch to include in all cell headers. If a two-way connection must be established, both
connecting nodes receive appropriate replies from the operating system. The same procedure
applies to multicast and broadcast connections. Once connected, the involved peripherals
require no further intervention from the operating system, unless congestion or blocking occurs.
The connections with the external network differ slightly since the operating system node has to
establish a connection with the external ATM network first. Only after such a connection is
negotiated, the operating system provides the corresponding workstation peripheral with the
switch port address and a VCI/VPI pair. The procedure described above can be bypassed if the
originating node has enough processing power to perform an ATM connection session with the
external network by itself. In this case the node needs only to inform the operating system of the
connection and the required bandwidth, and to request the switch port address of the external
network node. If, on the other hand, some network entity is to contact an individual device of the
workstation, it has to contact the operating system node first to receive a connection grant. This
is necessary from a security point of view, so that any unauthorized access to the internal
peripherals can be stopped.
Some connections in the workstation have to be permanent to minimize access delays during
connection establishment. The connection between the memory and disk storage nodes is a good
example of such a permanent connection. Since the main memory and the disks are involved in
paging, any additional delays introduced during the connection setup unnecessarily degrade the
overall performance of the workstation memory system. The memory-disk storage connection
should be terminated only during the operating system shutdown or restart.
3.5 The Components And Their Implementations
3.5.1 System board
The system board contains three main components: the CPU with first and second level caches,
the main memory, and the graphics controller circuit. The connections of those components are
depicted on Figure 3.2. The CPU subsystem, with the first level cache and second level cache
controller integrated onto a single silicon chip, communicates with the rest of the workstation
through the input/output system agent. The I/O agent facilitates CPU access to the main memory,
the frame buffer, and directly to the ATM switch of the internal interconnect. The direct access to
the switch facilitates operating system control functions.
The system board allows two types of access to the main memory. The CPU uses the fast
connection via the I/O agent, while all other peripherals use the internal ATM LAN. A page fault
process exploits both of these connections. When the CPU performs a memory operation, the
appropriate request is passed to the memory controller through the I/O agent. The memory
controller processes the request and acts upon it appropriately. If the referenced data is in fact in
the main memory, it performs the requested action. If it is not, it informs the operating system of
a page fault. The CPU then sends a request cell to the storage node to retrieve a page through a
permanent ATM memory-disk connection. After issuing this request the CPU is free to continue
execution of other tasks. The page retrieval through the ATM interconnect is analogous to a
DMA transfer in the bus-based systems.
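The page-fault path above can be illustrated with a small model: a hit is served by the memory controller directly, while a miss records a page request that would travel to the disk node as a request cell over the permanent memory-disk connection. All names are hypothetical; this is a sketch of the described flow, not the thesis's implementation, and it simplifies by treating the requested page as resident immediately:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative model of the two-path memory access: resident pages are
// served directly; a miss emits a page request toward the disk node (in the
// real system, an ATM request cell), and the CPU is free to continue with
// other work while the page is fetched, analogous to a DMA transfer.
class MemoryController {
public:
    explicit MemoryController(std::uint32_t page_size) : page_size_(page_size) {}
    // Returns true on a hit; on a miss, records a page request for the disk
    // node and (as a simplification) marks the page resident right away.
    bool access(std::uint64_t addr) {
        std::uint64_t page = addr / page_size_;
        if (resident_.count(page)) return true;
        page_requests_.push_back(page);   // would be sent as an ATM request cell
        resident_.insert(page);           // page arrives later over the interconnect
        return false;
    }
    const std::vector<std::uint64_t>& page_requests() const { return page_requests_; }
private:
    std::uint32_t page_size_;
    std::unordered_set<std::uint64_t> resident_;
    std::vector<std::uint64_t> page_requests_;
};
```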
The memory controller and the ATM interface of the memory node can both be integrated onto a
single chip. However, this can reduce the flexibility of the system board; when the memory
technology changes and the memory interface must be replaced, the ATM interface will
unnecessarily be replaced with it.
The third main component of the system board, the graphics controller circuit, facilitates the
access to the colour display monitor. There are many graphics controller architectures for the
bus-based systems [11], and most of them can be easily implemented on the proposed system
board with only minor modifications. In the ATM workstation, this circuit is used exclusively for
the processing of the low bandwidth visual information associated with the operation of the
graphical user interface. As opposed to the main memory, this circuit can only be accessed
through the fast data path via the I/O agent. Assigning a separate ATM port to the graphics
controller would be a waste of the switch resources, because of the very low bandwidth needed
for the manipulation of the display frame buffer, which is estimated at around 200 KBps [18].
Because of such small bandwidth requirements of the CPU-frame buffer link, the fast data path
to the CPU can be shared with the main memory without introducing any bottlenecks to the
CPU-memory data traffic. High bandwidth video information, such as video-teleconferencing,
motion pictures, and other live video material, is processed in the live video node, described later
in the chapter. The system board incorporates a simple connector for the second external source
of video information as seen on Figure 3.2. The outputs from the live video connector and the
graphics controller circuit are merged on the system board to be displayed on a single video
screen.
3.5.2 Disk storage node
The disk storage node can be implemented in two different ways. For compatibility with the
existing disk standards, the ATM circuitry of this node can incorporate an appropriate interface
to the disk subsystem. As seen on Figure 3.2, the peripheral side of the storage node interface can
be compatible with SCSI, IDE, or any other disk protocol. However, such a setup is subject to all
limitations of the above mentioned protocols. On the other hand, the integration of the ATM
interface protocol within the disk controller is as practical and cost effective as for other disk
protocols. Figure 3.5 shows the configuration of the ATM workstation with various storage
nodes using the ATM interface protocol. In that case, the minimum data unit sent or received by
the disk controller is an ATM cell.
3.5.3 Live video processing node
The live video node is intended to process the high bandwidth visual information received by the
workstation, such as teleconferencing streams and motion pictures, and transmit the video
recorded locally to the external network. The general structure of this node is presented on Figure
3.6. The cost and complexity of the video node directly relate to the flexibility and processing
power of the video processing circuitry. This circuitry can take the form of a codec module to
support various video compression standards like MPEG-1, MPEG-2, or AVI, or it can contain a
signal processor with supporting devices to perform more complex operations [8]. The live video
node sends the processed signal to the special connector on the system board where it is
displayed on the main screen as seen on Figure 3.2. During video recording, the signal from the
video camera is compressed or processed as required, packed into ATM cells in the ATM
interface, and sent to the internal ATM network through an established connection. The
compression circuitry and the ATM interface may be built into the video camera itself, allowing
it to form a separate node on the internal ATM LAN [3].
Figure 3.5 ATM workstation with ATM circuitry integrated
into mass storage devices
Figure 3.6 Configuration of a live video
processing node
3.6 Comparison With Other Multimedia Architectures
The proposed architecture improves upon the design of the multimedia architectures presented in
the background chapter. This section provides a quick review of how it deals with the drawbacks
and shortcomings described earlier in the appropriate critique sections.

The shared, centralized peripherals in VuNet are eliminated. All functionality required by a
particular node is embedded into that node (compression, decompression, synchronization, etc.).
Similarly, by eliminating those shared components, the chaining of nodes with a single
function to perform complex tasks is no longer needed. The software intensive approach in the
workstation control is replaced by a decentralized hardware approach. Most functions, such as
multicasting, cell assembly and reassembly, and protocol management, are all executed in the
ATM hardware.
Compared to the unspecified topology of the Netstation architecture, the topology of the
proposed architecture is a star with a single ATM switch as the central element of the interconnect.
Therefore, the distances to all devices on the interconnect are constant. There is no possibility of
a node being cut off unless the central switch is damaged, which is equivalent to a malfunction of
the workstation as a whole. The number of ports on the central switch is selected by the
workstation designer and is not fixed at four, as in the case of the proprietary MOSAIC
nodes in the Netstation architecture.
Compared to the DAN architecture, the proposed architecture offers much better performance
of the memory system. The main memory is not separated from the CPU by the relatively slow
ATM interconnect. The control path is implemented on the same interconnect as the data path,
saving in the cost and complexity of the architecture. Unfortunately, the integration of the data
and control paths adds some overhead to the existing internal LAN traffic. As mentioned before,
the compression, decompression, synchronization, and other similar utility peripherals are
embedded into the nodes of devices which require those services, thus reducing the use of
bandwidth of the internal LAN.
There are reasons to believe that the implementation cost of an ATM workstation will be smaller
than the implementation cost of other architectures. While other architectures use proprietary
interconnect protocols and hardware, the proposed architecture uses the standard ATM protocol.
Therefore, off-the-shelf ATM components can be used in the interconnect and all peripheral
interfaces. Many devices traditionally used in computer workstations have been deemed obsolete,
including DMA controllers, network interfaces, and bus control circuits. The use of the same
interconnect for both data and control information further reduces the implementation cost,
because there is no need for separate control protocols and hardware.
Chapter 4
Simulation
4.1 Introduction
The previous chapter presented a detailed description of the proposed multimedia workstation
architecture. In order to validate the performance statements contained in the previous chapter, a
comparison between the bus-based and ATM-based architectures is needed. To accomplish this,
a system bus simulator and an ATM interconnect simulator were created.

This chapter contains the description of the two simulators and the analysis of results from the
simulation runs. The following two sections explain the structure and the features of the
simulators, the assumptions made in their design, and their limitations. These sections also
explain the choice of the performance metrics and the simulation factors.
Section four presents the simulation results and their analysis. The choice of the simulation
scenarios is justified. For each scenario, a number of graphs which represent the most significant
findings is included. A full listing of simulation results from all runs is offered in Appendix A.
The last section discusses the performance figures obtained from simulations and estimates the
impact the simulator assumptions and limitations have on the reliability of the results.
4.2 ATM Interconnect Simulator
4.2.1 Simulator components

Four devices of an ATM multimedia workstation are modelled in this simulator: a
main memory, a disk, a network connection, and a live-video device. A single ATM switch
modelled after the Fujitsu MB86680 (see Chapter 2, page 35) provides an interconnect for all
devices. An object oriented language was selected to implement the simulator since all the above
mentioned devices are easily mapped into separate software objects. The simulator was written in
GNU C++ and run under the UNIX operating system. There are six main objects in the
simulator: a memory node, a disk node, a network node, a video node, an ATM switch, and a
main simulator module. The relation between the simulator objects and the physical devices is
depicted on Figure 4.1. The main simulator module facilitates the connections and the data
exchange among all object nodes.
4.2.2 Simulator events

In order to minimize the time of the simulation runs, the simulator is event-driven. Events are
generated by all devices capable of transmitting and receiving data. The event queue
management and event execution is implemented by the main simulator module. This module is
also responsible for the initialization of all modelled objects and of the simulator as a whole.
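An event-driven simulator of this kind is typically built around a priority queue of timestamped events; executing an event advances the simulated clock and may schedule further events. A minimal sketch of such a kernel, with illustrative names rather than the thesis's actual module structure:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Minimal event-driven simulation kernel: events are popped in timestamp
// order and executed; each handler may schedule further events.
struct Event {
    std::uint64_t time;            // simulated clock cycle
    std::function<void()> action;  // what happens at that time
};
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

class Simulator {
public:
    void schedule(std::uint64_t t, std::function<void()> action) {
        queue_.push(Event{t, std::move(action)});
    }
    void run() {
        while (!queue_.empty()) {
            Event e = queue_.top();
            queue_.pop();
            now_ = e.time;         // advance the clock to the event time
            e.action();
        }
    }
    std::uint64_t now() const { return now_; }
private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> queue_;
    std::uint64_t now_ = 0;
};
```

The clock jumps directly from event to event, which is what makes this organization faster than stepping through every cycle.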
To approximate the real data traffic, the data transfer events must occur at random intervals with
probabilities determined by the characteristics of the data traffic load. A prime modulus
multiplicative linear congruential generator (PMMLCG) [21, 22] was used as a random event
generator. PMMLCGs generate random integers Z1, Z2, ... according to the following recursive
formula:

Zi = (a * Zi-1 + c) mod m

In this formula, a is the multiplier, c is the increment, Z0 is the seed or the starting value, and the
modulus m is the largest prime number less than 2^b, with b being the number of bits in a word
on the computer used for the simulation. The PMMLCG used in this simulator was of the form:

It was thoroughly tested and proved to produce a uniform and uncorrelated series of random
numbers, and is supported by the UNIX operating system [21]. Each device capable of
generating events uses its own random number seed. These seeds are carefully selected to avoid
correlation between the random number streams generated from each seed. To accomplish this,
a separate utility program was created which produced the seeds according to the procedure
described in [21].
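The exact constants of the thesis's generator do not survive in the text above, so the sketch below assumes the widely cited Lewis-Goodman-Miller parameters (a = 16807, m = 2^31 - 1, with c = 0, as in any multiplicative LCG) purely for illustration:

```cpp
#include <cstdint>

// Prime modulus multiplicative LCG: Z_i = (a * Z_{i-1}) mod m, c = 0.
// The constants a = 16807, m = 2^31 - 1 are the classic Lewis-Goodman-
// Miller choice (also used by std::minstd_rand0); they are assumed here,
// not taken from the thesis.
class Pmmlcg {
public:
    explicit Pmmlcg(std::uint64_t seed) : z_(seed % kM) {}
    std::uint32_t next() {
        z_ = (kA * z_) % kM;  // 64-bit product avoids intermediate overflow
        return static_cast<std::uint32_t>(z_);
    }
    // Uniform variate in (0, 1), as needed to draw random event intervals.
    double uniform() { return static_cast<double>(next()) / kM; }
private:
    static constexpr std::uint64_t kA = 16807;
    static constexpr std::uint64_t kM = 2147483647;  // 2^31 - 1, prime
    std::uint64_t z_;
};
```

Giving each traffic source its own carefully chosen seed, as the text describes, amounts to constructing one such object per device from a precomputed seed table.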
Figure 4.1 Relation between ATM simulator objects and modelled physical devices
4.2.3 Simulation metrics
The delay through the interconnect, expressed in clock cycles, was selected as the measure of
performance for the simulated architecture. The delay reflects the time spent by each ATM cell
in the ATM interconnect before being received by the destination device. The delay is calculated
as the number of cycles between the time when the cell is ready to be transmitted by the source
device till the time when the cell is fully received by the destination device, except when
calculating the delay for cells coming from the external network. Since there is no physical
device that acts as the network interface (see Chapter 3, page 44), the delay is measured from the
time a cell is fully received from the network by the internal ATM switch.
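Because only the minimum, maximum, and average delays are ultimately recorded (see the limitations in section 4.2.6), the per-cell bookkeeping for this metric reduces to a small running accumulator; a sketch with illustrative names:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Running min/max/average of per-cell delays, measured in clock cycles.
class DelayStats {
public:
    // delay = cycle at which the cell was fully received minus the cycle
    // at which it became ready to transmit at the source device.
    void record(std::uint64_t ready_cycle, std::uint64_t received_cycle) {
        std::uint64_t d = received_cycle - ready_cycle;
        min_ = std::min(min_, d);
        max_ = std::max(max_, d);
        sum_ += d;
        ++count_;
    }
    std::uint64_t min() const { return min_; }
    std::uint64_t max() const { return max_; }
    double average() const { return count_ ? static_cast<double>(sum_) / count_ : 0.0; }
private:
    std::uint64_t min_ = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max_ = 0;
    std::uint64_t sum_ = 0;
    std::uint64_t count_ = 0;
};
```

For cells arriving from the external network, `ready_cycle` would instead be the cycle at which the cell was fully received by the internal switch, matching the exception described above.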
4.2.4 Simulation parameters
Each device included in the simulator has an inherent data type, specific to the task it is intended
to perform. This subsection describes the parameters that characterize the device data types.
These parameters determine the behaviour of all simulated objects.

Memory node. Throughout the simulation, the memory is involved in the paging process with
the disk node. Therefore, the inherent data type for the memory node is a page. Two parameters, a
page size and a page fault probability, characterize the paging process. Common page sizes of
4K and 8K [18] and two page fault probabilities of 10^-4 and 10^-5 are used for all simulation
runs. These probability values are typical of average and heavy paging activity [18]. The page
fault probability together with the page size determine the data rate of communication between
the memory and the disk.

Disk node. The disk node uses the same data type as the memory node. Therefore, the
simulation parameters for these two are identical.
Video node. This node is involved in video-teleconferencing, during which the video frames
are constantly being exchanged between this node and the external network. Therefore, a video
frame is the most suitable data type for this node. Two parameters, video frame size and data
rate, describe the video-teleconferencing process. The video frame size determines how many
consecutive cells are used to send a single frame. The maximum frame size was adopted from
the experiments with the full motion picture transmission over a local area network reported by
[6]. The data rate, expressed in Mbps, reflects the amount of video data transmitted per unit
time, and is used to specify the quality of the teleconferencing connection. The minimum rate
used in all simulation runs, 9 Mbps, corresponds to a currently used standard TV-quality
teleconferencing. It is characterized by a 30 frames/s refresh rate, an interlaced 720 by 480
pixel resolution, and an 8-bit colour display. Table 2.1 in Chapter 2 lists the characteristics of
other currently used teleconferencing standards. The maximum rate used in the simulations,
200 Mbps, approximates the teleconferencing quality of the Grand Alliance HDTV System
Specification Version 2.0. It defines the picture as 720 by 1280 pixels at 60 frames/s, and
24-bit colour. The above mentioned data rates take into account an average MPEG
compression of the video data.

Network node. Since this node is involved in video-teleconferencing with the video node, the
same set of parameters is used to characterize its behaviour.
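As a rough sanity check of the two endpoint rates, they can be compared with the raw (uncompressed) bit rates implied by the stated formats; the implied compression ratios in the comments are back-of-the-envelope figures derived here, not values from the thesis:

```cpp
// Raw video bit rate in Mbps for a given picture format.
double raw_mbps(int width, int height, int bits_per_pixel, int frames_per_s) {
    return static_cast<double>(width) * height * bits_per_pixel * frames_per_s / 1e6;
}
// TV quality: 720 x 480, 8-bit colour, 30 frames/s -> ~83 Mbps raw, so the
// 9 Mbps figure implies roughly 9:1 average MPEG compression.
// Grand Alliance HDTV: 1280 x 720, 24-bit, 60 frames/s -> ~1327 Mbps raw,
// so 200 Mbps implies roughly 6.6:1 compression.
```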
4.2.5 Assumptions
+ The simulation assumes that the modelled system is already fully initialized. The goal of the
simulation is to observe the behaviour of the system under normal operating conditions. Since
the initialization phase does not have any influence on the performance of the system, it can be
ignored.
+ The clocking speed of the ATM switch is 33 MHz, higher than the 25 MHz specified by
Fujitsu. This clock frequency was selected so that the aggregate bandwidth of the switch is
identical to the total bandwidth of the simulated bus. The modelled bus is 32-bits wide and
runs at 33 MHz, providing a total bandwidth of 1056 Mbps; the switch has four 8-bit wide
ports, and provides an aggregate bandwidth of 1056 Mbps if running at 33 MHz.
+ The bandwidth of the connections among the simulated devices is equal to the maximum data
transmission and reception rate of the ATM switch ports, namely 264 Mbps. The intention of
this simulation is to measure the impact of the device parameters on system performance, not
the impact of the physical connections between the devices. In addition, point-to-point
connections can easily exceed the assumed bandwidth of 264 Mbps, as shown by examples of
the RAMBUS [4, 27] and RamLink [12, 20] memory interface interconnects.
+ All modelled devices are able to send and receive data at a rate equal to the interconnect
bandwidth, which is a realistic assumption since devices with rates in excess of 264 Mbps
are readily available on the market. As an example, the RAMBUS memory interface has a data
throughput of 500 MBps. A single SCSI-2 Ultra Wide hard disk controller can handle data
transfers at 320 Mbps. This assumption eliminates the case when the device refuses to receive
data from the ATM switch due to the inability to process the previously received cells. If that
were the case, the incoming cell would be dropped and retransmitted, resulting in increased
cell delays and data traffic. The increase would be proportional to the time required by the
device to resume the cell reception. To remedy the problem, an input buffer could be added to
the ATM interface of each device. The ATM interface would receive the cell from the switch
and allow the device to retrieve the data later.
+ The external network hardware is capable of generating a routing tag for the internal ATM
switch. If the external network hardware did not have this ability, a separate device would have
to be used instead. The addition of such a device would increase the cell delays associated with
the traffic coming from the external network. Since the delays incurred by the external network
are most likely higher than the delays produced by the routing tag generation device, the
impact of this device on the total cell delay would be very small. However, the addition of this
device would definitely affect the cost of the ATM system. When sending cells to the external
network, the routing tags are automatically stripped by the internal ATM switch, resulting in a
standard ATM cell.

+ The smallest data object manipulated by the simulator is a cell. A cell consists of 56 bytes,
which is composed of a standard ATM cell (53 bytes) and a 3-byte routing tag required by the
self-routing Fujitsu ATM switch.
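Two of the numeric assumptions above can be checked mechanically: the clock was chosen so that the bus and switch aggregate bandwidths match, and the simulator's cell is a standard 53-byte ATM cell plus a 3-byte routing tag. A small sketch with illustrative names:

```cpp
#include <cstdint>

// Aggregate bandwidth in Mbps: bits moved per cycle times clock in MHz.
constexpr double mbps(int width_bits, double clock_mhz, int ports = 1) {
    return width_bits * clock_mhz * ports;
}
// 32-bit bus at 33 MHz: 1056 Mbps. Four 8-bit ports at 33 MHz: also
// 1056 Mbps, so a single port sends and receives at 264 Mbps.

// Cell layout: 3-byte routing tag for the self-routing switch, then the
// standard 53-byte ATM cell (5-byte header + 48-byte payload).
struct SimCell {
    std::uint8_t routing_tag[3];
    std::uint8_t header[5];
    std::uint8_t payload[48];
};
static_assert(sizeof(SimCell) == 56, "routing tag + ATM cell = 56 bytes");
```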
4.2.6 Limitations
+ Only the data transfers are simulated. The control information flow is not implemented.
Therefore, if a cell is dropped in an ATM switch, only the statistics information is updated,
without the cell being retransmitted. Similarly, no data request cells are implemented. The data
transfer events are generated by a random event generator.

+ Each of the modelled devices can only generate cells to one destination in a single simulation
run.

+ The drop rate of the cells in the ATM switch is calculated on a per-output-port basis. It is not
possible to directly determine the number of dropped cells which originated from a particular
input port.

+ Only the minimum, maximum, and average delays of the cells are recorded.
4.3 Bus Simulator
The bus simulator is a modification of the ATM interconnect simulator. Therefore, all
information presented in the previous section applies to the bus simulator as well. The
differences between these two are outlined below.
4.3.1 Simulator components

This simulator models the same four devices as the ATM switch simulator: a main memory, a
disk, a live-video node, and a network interface. The relation between the simulator objects and
physical devices is depicted on Figure 4.2. The main simulator module provides the connection
between the simulated objects in the form of a system bus, and performs bus scheduling,
arbitration, and data transfers. The generic bus specification chosen for the simulation is loosely
based on the PCI standard. It is 32-bits wide and runs at 33 MHz, providing a total bandwidth of
1056 Mbps.
4.3.2 Simulation metrics
As in the case of the ATM simulator, the delay through the bus as experienced by the transferred
data was selected as the measure of performance for the system. For all devices, the delay is
calculated as the number of cycles from the time a data unit is ready to be transmitted in a source
device till the time it is fully received by a destination device.
4.3.3 Assumptions
+ The smallest data object manipulated by the simulator is a 32-bit word. However, the bus
system implements data transfers using a DMA technique. The size of the DMA block is a
simulation parameter. Two common DMA block sizes [36], 256 and 512 bytes, were used in
all simulations.
+ All devices are able to send and receive data at a rate of 264 Mbps. This value is selected to
allow fair comparison between the bus and ATM interconnects, and corresponds to the data
rate of any device in the ATM simulator.

Figure 4.2 Relation between the bus simulator objects and modelled physical devices
+ Each device bus interface is able to communicate with the bus at a rate equal to the throughput of the bus. To accommodate this, each interface has a data buffer. In the case of bus congestion, all data awaiting transmission in the device bus interface, and all new data generated by the device, is stored in this buffer. Similarly, data received from the bus is always stored in this buffer before the device retrieves it. It is assumed that no bus interface buffer will overflow. This assumption has minimal impact on the simulation results, since the data rates used in all simulations never approach the bus bandwidth; they are always in the range of the device data rate.
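The buffering and delay accounting described above can be sketched as a toy model. This is illustrative only; the function name, structure, and parameters are mine, not the thesis simulator's actual code:

```python
from collections import deque

def simulate_bus(arrivals, block_words, bus_words_per_cycle=1):
    """Toy single-bus FIFO model.

    arrivals: cycle numbers at which a DMA block becomes ready in
    some device's bus-interface buffer.
    Returns the delay (in cycles) of each block: time spent waiting
    in the buffer plus the block transfer time, matching the delay
    definition in Section 4.3.2.
    """
    transfer_cycles = block_words // bus_words_per_cycle
    queue = deque(sorted(arrivals))
    delays = []
    bus_free_at = 0
    while queue:
        ready = queue.popleft()
        start = max(ready, bus_free_at)    # wait if the bus is busy
        finish = start + transfer_cycles   # uninterrupted burst transfer
        delays.append(finish - ready)
        bus_free_at = finish
    return delays

# Two 256-byte blocks (64 words) ready at the same cycle:
# the second must wait for the first to finish.
print(simulate_bus([0, 0], block_words=64))  # [64, 128]
```

The model captures the key effect seen later in the results: under contention, waiting time, not transfer time, dominates the bus delay.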
4.3.4 Limitations
+ Each device can perform data transfers to only one destination during a simulation run.
+ Only the DMA burst mode is implemented.
+ The process of bus ownership transfer is not implemented.
4.4 Analysis And Presentation Of Results
All simulation runs are divided into four scenarios. This section provides a brief description of each scenario, followed by an analysis of the generated results.
4.4.1 Scenario 1 Uninterrupted unidirectional video transfer
4.4.1.1 Description
In this scenario a single stream of video frames is delivered from the network to the workstation. There is no other activity on the internal workstation interconnect. This scenario corresponds to the situation when a movie or any other high-bandwidth data stream is being sent to the end user from a network source. The workstation is used solely as a display for the incoming information. The delay on the interconnect is the sum of the time the data spends in the device buffer awaiting transmission and the transmission time through the interconnect. Since there are no queuing delays, this scenario shows the differences in intrinsic transmission delays for the two simulated interconnects.
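As a rough illustration of what "intrinsic transmission delay" means here, the time to move one data unit across each interconnect can be computed directly. The 53-byte cell size and the use of the full device/bus rates are my assumptions; the simulator's exact timing model may differ:

```python
# Illustrative no-queueing transmission times for one data unit.
CELL_BYTES = 53          # one ATM cell (48-byte payload + 5-byte header)
DMA_BYTES = 256          # one DMA block on the bus
DEVICE_RATE = 264e6      # bps, device rate used in the simulations
BUS_RATE = 1056e6        # bps, 32-bit bus at 33 MHz

cell_us = CELL_BYTES * 8 / DEVICE_RATE * 1e6
block_us = DMA_BYTES * 8 / BUS_RATE * 1e6
print(f"ATM cell: {cell_us:.2f} us, 256-byte DMA block: {block_us:.2f} us")
```

Both figures land in the same low-microsecond range, consistent with the two interconnects performing equivalently when no contention is present.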
4.4.1.2 Analysis
The average and the maximum delays experienced by the data are depicted in Figure 4.3. All data transfer rates and video frame sizes tested produce the same results for each of the simulated interconnects. The incoming information flows without any interruption and reaches its destination with a minimum delay. In this scenario, the delay observed is purely a transmission delay. The ATM cell delay includes the time needed to deliver a cell from the switch to the destination device.
4.4.2 Scenario 2 Two-way teleconferencing
4.4.2.1 Description
In this scenario, a live-video node sends a video-teleconferencing data stream to the external network. At the same time, a data stream with the same data rate and the same video frame size arrives from the network to be displayed by the workstation. This scenario shows the impact of interference on the average data delay.
4.4.2.2 Analysis
Figures 4.4a and 4.4b present the average delays observed for various teleconferencing data rates and video frame sizes. The ATM system shows identical and constant delays for all simulation parameters; therefore, a single line is sufficient to represent all results. The bus system shows a different behaviour. For small video frame sizes, the delays are very similar to the ones in Scenario 1 and increase very slowly with increasing data rates. As the data rate of the video stream increases, for larger video frames, a significant increase in the average delay for the bus system is observed. For 200 Mbps and a 20 Kbyte video frame, the average delay for the bus is approximately nine times longer than for the ATM system.

Graph 4.4b shows the average delays for a smaller DMA block size than graph 4.4a. The corresponding curves on graphs 4.4a and 4.4b have very similar shapes, except for the cases of small video frame sizes. Simulation results show that for increasing data rate and video frame size, the delays for both DMA block sizes converge. This is because the DMA block size and the corresponding block transmission time become insignificant compared to the time spent waiting for the bus to become available.

Figure 4.3 Uninterrupted transfer delays as recorded by the video node

Figure 4.4 Average teleconferencing delays as experienced by the video node for DMA sizes of a) 256 bytes and b) 512 bytes
Graphs 4.5a and 4.5b show the maximum delays of the teleconferencing traffic from the same simulation runs as Figures 4.4a and 4.4b. The graphs indicate very long waiting delays in the bus system. For a data rate of 200 Mbps, a 20 Kbyte video frame size, and a DMA size of 256 bytes, the maximum delay is 10,195 cycles for the bus system, compared to 57 cycles for the ATM system.
In the case of the ATM interconnect, the average and maximum delays are constant, as seen in Figures 4.4 and 4.5. This is because the switch provides a point-to-point path for the incoming and outgoing data streams. The streams do not compete for the same shared resource, namely the interconnect bandwidth, as in the case of the bus. One should note that even in the case of the highest traffic load simulated, 200 Mbps, the bus utilization is only 38%. Therefore, one can expect even longer bus delays for higher interconnect utilization.
Graphs 4.4 and 4.5 present only the delays for the video node, since the simulated system is symmetrical and the results for the network node are very similar.
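The 38% utilization figure quoted above can be reproduced directly, assuming the two 200 Mbps teleconferencing streams are the only bus traffic and control overhead is ignored:

```python
# Two 200 Mbps streams share the 1056 Mbps bus (32-bit x 33 MHz).
utilization = 2 * 200 / 1056
print(f"{utilization:.0%}")  # 38%
```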
4.4.3 Scenario 3 Two-way teleconferencing with page faults
4.4.3.1 Description
In this scenario, four data streams are delivered through the workstation interconnect. While the network and video nodes are engaged in video-teleconferencing, the memory and disk nodes are involved in paging. This scenario shows the impact of high-bandwidth multimedia information, in the form of video-teleconferencing, on the performance of the workstation paging system. It also examines the influence of paging activity on the delays of the teleconferencing connection.
4.4.3.2 Analysis
The teleconferencing streams experience delays very similar to the ones from Scenario 2, because the page faults do not constitute a significant bandwidth burden. However, the effect of the teleconferencing streams on the paging delays is worth noting. Figures 4.6a and 4.6b present the average and the maximum delays experienced by the memory node as a function of the teleconferencing data rate with a fixed paging rate. Only the figures for the 256 byte DMA block size are given, because the results for a DMA size of 512 bytes are very similar. The disk node results are also very similar to the memory results, because the simulation setup is symmetrical.

Figure 4.5 Maximum teleconferencing delays as experienced by the video node for DMA sizes of a) 256 bytes and b) 512 bytes
Figure 4.6a shows a steady increase in the average delay experienced by the bus system with the increase of the video data rate and frame size. This is because high-rate teleconferencing uses a significant portion of the shared bus bandwidth, thus increasing the waiting times of the pages before they can be transmitted over the bus. The average delays of the ATM interconnect remain constant, because the connections present in this scenario do not interfere with each other. Each device uses a separate path to its destination device, and each destination device is associated with a separate output port on the switch. For teleconferencing rates below 75 Mbps, the bus system outperforms ATM. This is because the pure transmission delay of the bus is smaller compared to the ATM interconnect. The crossover point falls between the data rates of 75 Mbps and 130 Mbps, depending on the size of the video frame. The maximum delays are significantly larger in the bus system for the whole range of simulated video data rates.
Figures 4.7a and 4.7b show the average and maximum delays analogous to those in Figures 4.6a and 4.6b, but for a heavy paging load. The delay increase with increasing teleconferencing rate observed for the bus system is even more pronounced. In most cases, the average and maximum delays are at least doubled compared to the average paging load. The bus outperforms the ATM system only for teleconferencing rates below 50 Mbps. For the higher paging load, the crossover point falls somewhere between the data rates of 30 Mbps and 50 Mbps. Some delay curves in Figures 4.6b and 4.7b show counterintuitive behaviour: for increasing data rate, a decrease in the maximum delay can be observed. This behaviour stems from the fact that the maximum delay is a random variable determined by the statistical pattern of interference among all data streams in scenario three. Regardless of this behaviour, the most important conclusion drawn from these figures is that the maximum delay in the bus system is more than one order of magnitude larger than in the ATM system.
Figure 4.6 a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging
Figure 4.7 a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.4.4 Scenario 4 Browsing and video transmission with page faults
4.4.4.1 Description
This scenario consists of three simultaneous operations. The first operation, browsing, involves a high-bandwidth document transfer from the external network directly to the workstation memory. The document is not sent in one long burst, but is subdivided into smaller data units. The size of the data units is treated as one of the simulation parameters. The second operation is the transmission of a video stream recorded at the workstation from the live-video node to the network. In addition to these two processes, page faults occur between the memory and the disk. This scenario is intended to show the consequences of sending two data streams to a single device, the memory. The memory node receives pages from the disk and the browsed document to be displayed. The video transmission is used to create interfering traffic.
4.4.4.2 Analysis
Figures 4.8a and 4.8b show the comparison of the ATM and the bus system with a high, fixed data rate of the interfering traffic and an average page fault probability. Figure 4.9 complements them with a graph depicting the effects of the simulation parameters on the cell drop rate. The graphs indicate that even though the delay through the ATM interconnect is still fairly constant for all data rates, a small percentage of cells is dropped at higher data rates. The drop rate will affect the cell delays, because these cells will have to be retransmitted. If the dropped cells originate from the external network, the increase in their delays can be significant. However, as seen in Figure 4.9, the drop rate never exceeds 0.35%, so the impact of the delay of retransmitted cells on the average cell delay should be negligible. The increase in the drop rates is due to the limited size of the output port buffer in the ATM switch. The buffer starts to overflow when it receives large amounts of data from more than one source. The steady increase in the average delays for the bus system, as observed in Figures 4.8a and 4.8b, is consistent with the results from previous scenarios. Larger data rates translate into longer delays through the interconnect. The ATM system outperforms the bus system in all but one case, when the document is subdivided into small 1 Kbyte data units.
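The overflow mechanism described above can be illustrated with a toy fluid model of one output port (not the thesis simulator; all rates and the buffer size below are arbitrary illustrative values):

```python
def drops_at_output_port(in_rates, out_rate, buffer_cells, cycles):
    """Toy fluid model of one ATM switch output port.

    in_rates: cells per cycle offered by each input stream,
    out_rate: cells per cycle the port can drain,
    buffer_cells: output buffer capacity.
    Returns the fraction of offered cells that are dropped.
    """
    queue = 0.0
    offered = dropped = 0.0
    for _ in range(cycles):
        arriving = sum(in_rates)
        offered += arriving
        queue += arriving
        if queue > buffer_cells:            # buffer overflow -> drop
            dropped += queue - buffer_cells
            queue = buffer_cells
        queue = max(0.0, queue - out_rate)  # drain at the port rate
    return dropped / offered

# Two streams whose combined rate slightly exceeds the port rate
# eventually fill the buffer and start dropping cells.
print(drops_at_output_port([0.6, 0.5], out_rate=1.0,
                           buffer_cells=32, cycles=10_000))
```

With a single stream below the port rate the same function returns zero, matching the lossless behaviour of the other nodes in this scenario.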
Figure 4.8 a) Average and b) maximum delays (browsing, paging, and video transmit) as experienced by the memory node with average paging rate
Figure 4.9 Cell drop rates for average paging load in the ATM system
Figures 4.10a and 4.10b present the average data delays for the high paging load, while Figure 4.11 shows the corresponding cell drop rates. In the case of a heavy paging load, the results are quite favourable for the bus system. As seen in Figure 4.10, the ATM interconnect experiences significantly longer delays compared to the bus system. The cell drop rates are also high, and reach up to 7%. Therefore, the impact of the delay of the retransmitted cells on the average cell delay can be considerable. However, the utilization of the bus is quite low, around 35%. One can expect that for higher utilization values, the bus delays will be comparable to the ATM interconnect delays. In its current form, the simulator is not capable of producing higher bus utilization, because of the small number of modelled devices.

It is worth noting that in the ATM system only the memory node, acting as a destination for two data streams, experiences larger delays and drop rates. The other nodes maintain a very good level of performance and lossless operation, as in the previous scenarios.
4.5 Discussion
4.5.1 Performance
The previous sections present a performance comparison between the proposed ATM interconnect and a generic bus. These interconnects were subjected to different loads in four different scenarios. The analysis of the results obtained from the simulation runs indicates that there are some cost/performance trade-offs that should be considered in the design of the ATM-based workstation.
The first scenario shows the base performance equivalence of a generic bus and the ATM interconnect. Both systems exhibit similar performance under identical load with no queuing delays or interfering traffic present, which serves as a basis for a comparison of the two systems under different load configurations.
The second scenario illustrates the improvement in performance resulting from using an ATM interconnect instead of a bus. A significant performance degradation in the form of longer delays can be observed in the bus system, with the ATM system delays remaining small and constant. This delay gap between the two simulated interconnects is especially visible for large data rates
Figure 4.10 a) Average and b) maximum delays as experienced by the memory node with high paging rate

Figure 4.11 Cell drop rates for high paging load in the ATM system
of the load. This behaviour is expected, because while in the ATM system the two teleconferencing streams are transmitted over separate point-to-point channels, the bus shows signs of their interference.
The impact of high-bandwidth data streams on the performance of the memory system is analyzed in the third scenario. The bus memory system exhibits better performance for low teleconferencing data rates, while the ATM interconnect is superior for high data rates. The delay of the ATM memory system is constant and independent of the load data rate, because the memory-disk connection is separate from the video-network connection. The performance of the bus memory system gets progressively worse with increasing teleconferencing data rate. At a crossover point, the bus delays become longer than for ATM. The crossover point depends on the paging load and the data rate of the teleconferencing connection, which determine the bus utilization. The crossover point is centered around the teleconferencing data rate of 100 Mbps, which corresponds to broadcast-quality NTSC video. For comparison, high-quality image transmission, such as studio-quality NTSC and high-definition video, requires bandwidths far surpassing the 100 Mbps mark [25]. Therefore, to obtain better performance of the memory system with high-quality images, the ATM workstation is preferred to the bus workstation. At the same time, the delays experienced by the teleconferencing streams are similar to the ones from the second scenario, where the ATM interconnect is superior to the bus interconnect for all load data rates.
The last scenario, in which memory is a destination device for two different data streams, exposes a cost/performance trade-off for the ATM interconnect. Here the performance of the bus system is far superior to the ATM system. The ATM switch output buffer overflows when two data streams compete for the same output port, which should be avoided in order to maintain good performance of the ATM interconnect. Dedicating a separate port to each of the competing data streams brings the delays back to the constant, data-rate-independent minimums observed in scenarios two and three; two switch ports should be dedicated to the memory node, one for the network-memory browsing connection, the other for the memory-disk paging.
When designing an ATM interconnect, the ports of the ATM switch must be carefully allocated, taking into account any data streams competing for the same output port and their respective characteristics. The allocation of additional ports increases the overall cost of the workstation, since an ATM switch with a larger number of ports may be needed to accommodate all necessary connections. If some connections can accept the performance penalty indicated by the graphs from scenario four, using only a single port can keep the workstation cost low.
4.5.2 Results reliability
There are three limitations of the simulator that have an impact on the reliability of the results obtained from the simulations. Each of them is discussed below in an attempt to estimate the degree of error they may introduce into the conclusions.
4.5.2.1 Control information
The first of these limitations is the lack of implementation of the control information commands. In the case of the ATM system, one has to deal with two types of control cells: connection setup/termination and read/write request cells. The impact of the connection setup/termination cells on the ATM system performance is negligible. The number of control cells per connection is usually limited to three: setup, connect, and connect acknowledgement cells [2]. Considering the large number of cells transferred during a connection, the delay increase is unnoticeable. The read/write request cells are used during the paging operations. One read/write cell is sent through the interconnect per disk page. This additional traffic is therefore less than 1.4% of the total paging traffic. There are no read/write request cells in the teleconferencing process, only the data and setup/termination cells described above.
The impact of the control information is higher in the case of the bus system. There are typically five words required to set up each DMA transfer [36]. The words contain the DMA destination address, the amount of data transferred, the direction of the transfer, the source data address, and the execution start command. Hence, the traffic increase can be estimated at less than 8% for a 256 byte DMA block and less than 4% for the 512 byte block. Since all control information is sent over the bus, the bus transfers from all sources are affected. The burden of the teleconferencing connection establishment cannot be easily calculated, because it depends on the type of the external network and the features of the physical network interface device used. However, as in the case of ATM, connection setup occurs infrequently and should have negligible impact on the simulation metrics.
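The quoted overhead bounds can be checked with simple arithmetic. The 48-byte cell payload, the 4 Kbyte page size, and the 32-bit bus word are my assumptions; the thesis does not spell out this arithmetic:

```python
# ATM side: one read/write request cell per disk page.
data_cells = -(-4096 // 48)          # ceil(4096 / 48) = 86 data cells
atm_overhead = 1 / (data_cells + 1)  # ~1.2%, under the quoted 1.4% bound
assert atm_overhead < 0.014

# Bus side: five DMA setup words per transferred block.
for block_bytes, quoted_bound in [(256, 0.08), (512, 0.04)]:
    words = block_bytes // 4         # 32-bit words per block
    assert 5 / words < quoted_bound  # 7.8% < 8%, 3.9% < 4%

print(f"ATM paging overhead: {atm_overhead:.2%}")
```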
4.5.2.2 DMA mode selection and bus ownership transfer
The simulator implements the DMA burst mode, but not the cycle-stealing mode. The main reason is that cycle stealing is advantageous only when the DMA transfers occur on the same data path as the CPU-memory traffic. Cycle stealing sends the contents of the DMA block one word at a time, separating each word transfer by several bus cycles, to allow CPU access to the memory. Burst mode sends the whole DMA block in one uninterrupted transfer, blocking CPU access to the bus and the memory for the duration of the transfer. Since the simulated bus connects only the peripheral devices and does not provide the CPU-memory connection, burst mode is preferable to cycle-stealing mode. In addition, cycle stealing brings a significant overhead associated with the bus ownership transfer. For every word transferred, two bus cycles are wasted on obtaining and releasing bus mastership. In burst mode, bus ownership changes occur only at the beginning and end of the DMA block transfer. The additional traffic due to the bus ownership change in burst mode can be estimated at 3.125% for a 256 byte DMA block, and 1.56% for a 512 byte block, with the assumption that a bus ownership transfer requires a single bus cycle to complete.
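The burst-mode overhead figures follow from the stated assumption of one cycle each to acquire and release the bus, i.e. two extra cycles per DMA block:

```python
# Bus-ownership overhead per DMA block in burst mode.
for block_bytes in (256, 512):
    words = block_bytes // 4   # 32-bit words per block
    overhead_pct = 2 / words * 100
    print(f"{block_bytes}-byte block: {overhead_pct}%")
    # 256-byte block: 3.125%, 512-byte block: 1.5625%, as quoted.
```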
4.5.2.3 Cell retransmission
In scenario four, cell drop rates of up to 7% can be observed. However, this drop rate is not reflected in the simulation results, because the simulator does not have the ability to request and retransmit the dropped cells. For each dropped cell, two control cells should be generated: one destined for the CPU, and one for the device from which the dropped cell originated⁴. Based on this information, there will be an increase in the delay for all cells present in the switch queues associated with the CPU and the originating device. Since the CPU node is not implemented, the additional delays have no impact on the obtained results. The additional delays incurred in the originating device's switch queue will be proportional to the drop rate of cells coming from this node. There will also be an increase in the data rate from the originating device, which in turn might further escalate the drop rate. However, the inclusion of the additional delays caused by cell retransmission in the simulation results would not change the conclusions drawn in scenario four. It would only widen the performance gap between the ATM system and the bus system.

⁴ Refer to Chapter 3, page 53.
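The claimed data-rate increase at the originating device can be bounded with a simple geometric-series argument (my illustration, not from the thesis): if a fraction p of cells is dropped and retransmitted cells may themselves be dropped, the expected number of copies sent per cell is 1 + p + p^2 + ... = 1/(1 - p):

```python
# Offered-rate inflation factor for the worst drop rates in scenario 4.
for p in (0.0035, 0.07):
    inflation = 1 / (1 - p)
    print(f"drop rate {p:.2%}: offered rate x {inflation:.4f}")
```

A 0.35% drop rate inflates the rate by well under 1%, supporting the "negligible" claim for average paging, while 7% inflates it by about 7.5%, consistent with the "considerable" impact noted for heavy paging.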
4.5.2.4 Conclusion
The review of the possible effects of the simulator limitations on the generated results indicates that those limitations do not change the inferences drawn. If the existing limitations were removed from the simulator, the resulting increase in traffic would reinforce the performance analysis statements and conclusions.
Chapter 5
Conclusions
5.1 Introduction
The proposed multimedia architecture was intended to fulfill two main goals. The first goal was to remove the constraints typically present in bus-based systems. The second one was to add new functionality which would allow a more efficient execution of the current and future generations of multimedia applications. The ideal system, while adding functionality, would also match or exceed the performance of a bus-based system without being difficult or costly to implement.

After describing the particulars of the ATM-based architecture in Chapter 3 and presenting the performance evaluation in Chapter 4, it can be concluded that the new architecture fulfils the above-mentioned goals.
5.2 Performance
Chapter 4 focuses on comparing the ATM interconnect central to the new architecture with a bus-based system loosely based on the PCI standard. The graphs from different practical usage scenarios indicate that the ATM-based solution exhibits superior performance in most tested cases. The data delays through the interconnect, selected as the performance metric, are significantly smaller than the corresponding ones for the bus. Only in the case of multiple devices sending data streams to a single device is the delay for the ATM interconnect larger, which can be remedied by allocating more ATM switch ports to these devices. However, even in this case, the performance of all other nodes of the ATM system is still superior to the bus-based system. Another important characteristic of the ATM system is that its performance is independent of the load data rate*, except when two data streams are directed to the same switch output port. The performance of the bus degrades with increasing data rate and data unit size.

* Refer to the discussion in Chapter 4, page 82.
5.3 Features
The ATM system makes new functionality available for the current and future generations of multimedia applications. It implements in hardware, through the built-in features of the ATM switch, the multicasting and broadcasting of information. All devices present in the system can be both transmitters and receivers of multimedia information. This not only allows the system to display the multimedia data, but also to process it in a variety of ways.

By making the internal cell-switching interconnect compatible with the ATM protocol, the need for a separate network interface has been eliminated. The internal interconnect becomes an extension of the external ATM network. The reverse is also true; the external network is treated in the same manner as any workstation component.
The scalability of the new architecture is also superior to that of the bus-based architectures. The addition of new devices to the system requires only an available port on the internal ATM switch. The added components do not degrade the overall performance of the interconnect, as in the case of a bus system. The total bandwidth of the interconnect can be easily increased without system redesign by using an ATM switch with a larger number of ports.
5.4 Future Work
This section proposes possible topics for further study of the properties and performance characteristics of the ATM multimedia system. The currently used simulator has many limitations and provides only a first-level approximation of the processes occurring in a real computer system. Therefore, a more detailed simulation is needed to give more accurate performance figures. It would include a larger number of devices communicating with each other over the ATM interconnect. These devices would be able to send data to multiple destinations
during the same simulation run. Including the control processes in this new simulation would also be essential.
A hardware implementation usually exposes a different set of architectural problems, which could lead to further modifications of the architecture. Therefore, building a hardware prototype of the ATM system can provide more insight than the detailed simulation. The prototype would not only allow further verification of the simulated results presented in Chapter 4, but also the testing of the behaviour of the ATM interconnect under a real workload.
Since the ATM architecture requires a very different approach to handling data and hardware, a new operating system would have to be developed. This operating system needs to be 'ATM-aware', because it would be required to directly interact with all peripherals and the external network using the ATM protocol. It would also be responsible for proper ATM interconnect scheduling, to ensure that all data streams can allocate the required bandwidths. The development of such an operating system would in turn permit the testing of the performance and system characteristics from a user perspective. Multimedia applications could be executed on the hardware prototype to validate the practicality of the ATM multimedia workstation.
References

[1] Adam, Joel F., Henry H. Houh, Michael Ismert, and David L. Tennenhouse [1994]. "Media-Intensive Data Communications in a 'Desk-Area' Network," IEEE Communications Magazine (August). (pp 60-67)

[2] ATM: User-Network Interface Specification, Version 3.0. Englewood Cliffs, NJ: Prentice Hall, 1993.
[3] Barham, P., M. Hayter, D. McAuley, and I. Pratt [1995]. "Devices on the Desk Area Network," IEEE Journal on Selected Areas in Communications 13:4 (May). (pp 722-732)

[4] Bursky, Dave [1992]. "Memory-CPU interface speeds up data transfers," Electronic Design (March 19). (pp 137-142)

[5] Cheung, Nim K. [1992]. "The Infrastructure for Gigabit Computer Networks," IEEE Communications Magazine (April). (pp 60-68)

[6] Chou, Chih-Che, and Kang G. Shin [1994]. "Statistical Real-Time Video Channels over a Multiaccess Network," High-Speed Networking and Multimedia Computing, SPIE Proceedings, volume 2188, San Jose, CA. (pp 86-96)

[7] Comerford, Richard [1996]. "Interactive media: an internet reality," IEEE Spectrum (April). (pp 29-32)

[8] "Digital video compression on personal computers: algorithms and technologies," Proceedings of SPIE, 7-8 February 1994, San Jose, California. Bellingham, Washington: SPIE, 1994.

[9] Famighetti, Robert (editor). The World Almanac and Book of Facts 1995. Mahwah, NJ: Funk & Wagnalls, 1995.
[10] Finn, Gregory G. [1991]. "An Integration of Network Communication with Workstation Architecture," ACM SIGCOMM, Computer Communications Review (October). (pp 18-29)

[11] Foley, J. D., A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice. New York: Addison-Wesley, 1992.

[12] Gjessing, Stein, David B. Gustavson, David V. James, Glen Stone, and Hans Wiggers [1992]. "A RAM link for high speed," IEEE Spectrum (October). (pp 52-53)

[13] Goldberg, Lee [1994]. "ATM switching: a brief introduction," Electronic Design (December 16). (pp 87-103)

[14] Hayter, Mark David [1993]. A Workstation Architecture to Support Multimedia. Ph.D. Thesis, St John's College, University of Cambridge (September).

[15] Hayter, Mark, and Richard Black [1992]. "Fairisle Port Controller Design and Ideas," ATM Document Collection 2 (Orange Book), Cambridge University. (p 23)

[16] Hayter, Mark, and Derek McAuley [1991]. "The Desk Area Network," ACM Operating Systems Review (October). (pp 14-21)

[17] Heath, Michael T., Allen D. Malony, and Diane T. Rover [1995]. "The Visual Display of Parallel Performance Data," IEEE Computer (November). (pp 21-28)

[18] Hennessy, J. L., and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1993.
[19] Hohnem, K. H., B. Pflesser, A. Pornmert, M. Riemer, T. Schiemam, R. Schuber, and U.
Tiede [1996]. "A 'Virtual Body' Mode1 for Surgical Education and Rehearsal," IEEE Computer
(January). @p 25-3 1 )
[20] IEEE Drap Standard for High-Bandwidrh Memory Interface Based on SC1 Signalling
Technologv (RamLink). P1596.4. New York: IEEE, 1995.
[2 11 Jain, Raj, The art of computer svstems performance ana[vsis. New York: Wiley, 1991.
[22] Law, Averill M.. and W. David Kelton. Simulation modeling di analwvsis. New York:
McGraw-Hill, 1992.
[23] Leslie, 1. M., and D. R. McAuley [1991]. "Fairisle: An ATM network for the local area,"
Proc. ACM SIGCQMM (Septernber). (pp 2 1 -35)
[24] Levy, Roger, and Hank Kafka [1992]. "Are YOU ready for ATM," Telephony (Novernber
30). (pp 32-35)
[25] Lyles, J. Bryan, and Daniel C. Swinehart [1992]. "The Emerging Gigabit Environment and
the Role of Local ATM," IEEE Commrtnications Magazine (Apnl). (pp 52-58)
[26] MB86680B ATM Swirch elment (SRE) Dura Shm. Fujitsu Limited. 1994.
[27] Ota, Hiroo [1992]. "Rarnbus vs synchronous DRAM," Ekctronic Engineering (November).
(pp 104- 105)
[28] Perry, Tekla S. [1996]. "The trials and travails of interactive TV," IEEE Spectntm (April).
(PP 22-28)
[29] Prabhat, K. Andleigh, and Kiran Thakrar, Multimedia *rems Design. Upper Saddle River:
Prentice Hall, 1996.
[30] Prycker, Martin de, Asynchronoiis rransfer mode: solution for broadband ISDN, 2nd ed.
New York: E. Horwood, 1993.
[3 11 Robb, R. A., D. P. Hanson, and J. J. Camp [1996]. "Cornputer-Aided Surgery Planning and
Rehearsal at Mayo Clinic," [EEE Cornputer (January). @p 39-47)
[32] Roy, Radhica R. [1994]. "Networking Constraints in Multimedia Conferencing And the
Role of ATM Networks." AT& T Technical Joiirnal (JulflAugust). (pp 97- 108)
[33] Soaliyappan, M., T. Poston. P. A. Heng. E. R. McVeigh, M. A. Guttman, and E. A.
Zerhouni [1996]. "Interactive Visualization for Rapid Noninvasive Cardiac Assessment," IEEE
Cornputer (January). (pp 55-6 1 )
[34] Spragins, John D.. Joseph L. Hammond, and Krzysnof Pawlikowski, Telecommunications:
Protocols and Design. New York: Addison-Wesley, 199 1.
[35] Steinmetz, Ral f, and Klara Nahrs tedt, Muitimediu: computing, commtrnicutions and
applications. Upper Saddle River: Prentice Hal 1, 1 995.
[36] Vranesic, Zvonko G., and Sa fivat G. Zaky, hlicrucompirter structures. New York: Saunders
College Publishing, 1989.
Appendix A
Tabulated Simulation Results
A.1 Uninterrupted Video Transfer

DMA size = 512 bytes and 256 bytes; video frame size = 1K, 5K, 10K, or 20K bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps.

[Table: Average and maximum delay at the video node (clock cycles). The delays are independent of data rate and frame size: ATM average 57, ATM maximum 64.5 and 128 (for the two DMA sizes), bus average 32.5, bus maximum 64.]
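The DMA block and frame sizes used throughout these tables map onto ATM cells in a fixed way. The sketch below (illustrative only, not part of the original simulations) assumes plain 53-byte ATM cells carrying 48 bytes of payload each and ignores adaptation-layer framing overhead:

```python
import math

ATM_CELL_SIZE = 53     # bytes per ATM cell, header included
ATM_CELL_PAYLOAD = 48  # payload bytes per cell

def cells_needed(nbytes: int) -> int:
    """Number of ATM cells required to carry nbytes of data."""
    return math.ceil(nbytes / ATM_CELL_PAYLOAD)

for dma in (256, 512):
    print(f"DMA block of {dma} bytes -> {cells_needed(dma)} cells")

for frame in (1024, 5120, 10240, 20480):  # 1K, 5K, 10K, 20K frames
    n = cells_needed(frame)
    overhead = n * ATM_CELL_SIZE / frame - 1
    print(f"{frame}-byte frame -> {n} cells ({overhead:.1%} cell overhead)")
```

Each 512-byte DMA block thus occupies 11 cells and each 256-byte block 6 cells, so per-block transfer work in the ATM system is constant, consistent with the flat 57-cycle delays tabulated above.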
A.2 Two-Way Uninterrupted Teleconferencing

DMA size = 512 bytes and 256 bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps; video frame sizes of 1K, 5K, 10K, and 20K bytes.

[Table: Average and maximum delay at the video node (clock cycles). ATM system: 57 cycles (average and maximum) for all frame sizes and data rates. Bus system: average delays grow from about 65 to about 520 cycles, and maximum delays from 384 to over 10,000 cycles, as frame size and data rate increase.]
A.3 Two-Way Teleconferencing With Average Paging Load

A.3.1 Video node

DMA size = 512 bytes and 256 bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps; video frame sizes of 1K, 5K, 10K, and 20K bytes.

[Table: Average and maximum delay at the video node (clock cycles). ATM system: 57 cycles in all cases. Bus system: average delays range from about 33 to about 513 cycles, and maximum delays from about 304 to over 10,000 cycles, increasing with frame size and data rate.]
A.3.2 Memory node

DMA size = 512 bytes and 256 bytes; page size = 4K; page fault probability = 10⁻⁴.

[Table: Average and maximum delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 113 cycles average in all cases. Bus system: average delays range from about 37 to about 329 cycles; maximum delays reach several thousand cycles.]
A.4 Two-Way Teleconferencing With Heavy Paging Load

A.4.1 Video node

DMA size = 256 bytes and 512 bytes; page size = 8K; page fault probability = 10⁻⁵.

[Table: Average and maximum delay at the video node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 57 cycles average in all cases. Bus system: average delays range from about 37 to about 355 cycles; maximum delays reach about 12,600 cycles for 20K frames.]
A.4.2 Memory node

DMA size = 256 bytes and 512 bytes; page size = 8K; page fault probability = 10⁻⁵.

[Table: Average and maximum delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 113 cycles average in all cases. Bus system: average delays range from about 65 to about 556 cycles; maximum delays reach over 10,000 cycles.]
A.5 Browsing And Two-Way Teleconferencing With Average Paging Load

A.5.1 Memory node

DMA size = 512 bytes; page size = 4K; page fault probability = 10⁻⁴; video data rate = 200 Mbps.

[Table: Average delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). ATM system: about 102–118 cycles for 1K and 5K data units. Bus system: delays grow with data unit size, from about 150 cycles for 5K units to about 526 cycles for 20K units.]

A.5.2 Cell drop rate for the ATM system

DMA size = 512 bytes; page size = 4K; page fault probability = 10⁻⁴; video data rate = 200 Mbps.

[Table: Cell drop rate (%) for the ATM system vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). Drop rates range from 0% to 0.33%, increasing with data unit size.]
A.6 Browsing And Two-Way Teleconferencing With High Paging Load

A.6.1 Memory node

DMA size = 512 bytes; page size = 8K; page fault probability = 10⁻⁵; video data rate = 200 Mbps.

[Table: Average delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). ATM system: about 140–294 cycles for 1K and 5K data units. Bus system: delays reach about 1,024 cycles for 20K data units.]

A.6.2 Cell drop rate for the ATM system

DMA size = 512 bytes; page size = 8K; page fault probability = 10⁻⁵; video data rate = 200 Mbps.

[Table: Cell drop rate (%) for the ATM system vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). Drop rates range from 0.12% to about 7.2%, increasing with data unit size.]