Multimedia Workstation Architecture with ATM Interconnect
Tomasz Solkowski
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering, University of Toronto
© Copyright by Tomasz Solkowski 1997
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Multimedia Workstation Architecture with ATM Interconnect
Tomasz Solkowski
Master of Applied Science, 1997
Department of Electrical and Computer Engineering University of Toronto
Abstract
This thesis describes a multimedia workstation architecture which uses an ATM switch, rather than a traditional system bus, for connecting both multimedia and non-multimedia peripherals. This architecture is intended to eliminate the performance bottlenecks which are present in a typical bus-based workstation during transfers of high bandwidth information among the internal workstation components and between the workstation and the external network.

The thesis discusses other recently developed multimedia workstation architectures and their advantages and shortcomings. The detailed structure of the proposed workstation architecture, including the CPU, memory, video display, disk, and ATM switch, is discussed.

To offer a performance comparison between the proposed architecture and a generic bus-based system, a computer simulation was created to show the delays for both types of interconnect when subject to typical multimedia data streams. The simulation shows that the performance of the ATM workstation can be up to nine times better than that of the bus-based workstation for typical high bandwidth multimedia loads. However, the performance of the proposed ATM architecture is highly dependent on the number and the pattern of connections among workstation peripherals through the ATM switch.
Acknowledgements
I would like to thank my supervisor, Professor Safwat G. Zaky, for his guidance and counselling throughout the last two years of my research. His help with technical and everyday problems, as well as his incredible attention to detail, have been invaluable in the process of writing this thesis.

I am very grateful to my family, Grazyna, Helena, and Andrzej, for their moral and financial support. The love and support of my fiancée, Ewa, cannot go unnoticed. Her unconditional understanding, constant belief in me, and constructive criticism have helped me tremendously in my academic efforts.

I must also thank the Information Technology Research Centre and the University of Toronto for their generous financial support.
Table of Contents
List of Figures
List of Tables
Glossary
Chapter 1 Introduction
    1.1. Motivation
    1.2. Objectives
    1.3. Outline
Chapter 2 Background
    2.1. Introduction
    2.2. Multimedia
        2.2.1. The definition
        2.2.2. Multimedia modes
        2.2.3. Multimedia workstation components
        2.2.4. Time constraints of multimedia traffic
    2.3. Asynchronous Transfer Mode (ATM)
        2.3.1. Description
        2.3.2. The ATM cell format
        2.3.3. Advantages of ATM
    2.4. Multimedia workstation architectures
        2.4.1. Introduction
        2.4.2. VuNet
            2.4.2.1. Architecture
            2.4.2.2. Implementation details
            2.4.2.3. Critique
        2.4.3. Netstation
            2.4.3.1. Architecture
            2.4.3.2. Implementation details
            2.4.3.3. Critique
        2.4.4. Desk Area Network (DAN)
            2.4.4.1. Architecture
            2.4.4.2. Implementation details
            2.4.4.3. Critique
    2.5. MB86680B, ATM switch element from Fujitsu
Chapter 3 Architecture
    3.1. Introduction
    3.2. Architecture rationale
    3.3. Architecture objectives
    3.4. General structure
        3.4.1. Internal ATM LAN
        3.4.2. Interconnect topology
        3.4.3. Interconnect flexibility
        3.4.4. Network interface
        3.4.5. Basic workstation configuration
        3.4.6. Control path structure
        3.4.7. Initial setup and typical usage scenarios
    3.5. The components and their implementations
        3.5.1. System board
        3.5.2. Disk storage node
        3.5.3. Live video processing node
    3.6. Comparison with other multimedia architectures
Chapter 4 Simulation
    4.1. Introduction
    4.2. ATM interconnect simulator
        4.2.1. Simulator components
        4.2.2. Simulator events
        4.2.3. Simulation metrics
        4.2.4. Simulation parameters
        4.2.5. Assumptions
        4.2.6. Limitations
    4.3. Bus simulator
        4.3.1. Simulator components
        4.3.2. Simulation metrics
        4.3.3. Assumptions
        4.3.4. Limitations
    4.4. Analysis and presentation of results
        A.5.1. Memory node
        A.5.2. Cell drop rate for the ATM system
    A.6. Browsing and two-way teleconferencing with high paging load
        A.6.1. Memory node
        A.6.2. Cell drop rate for the ATM system
List of Figures
Multimedia workstation architectures (a) using a bus, (b) using an ATM switch as an interconnect
OSI reference model
ATM cell structure
Fields in ATM cell header as defined by UNI 3.0
VuNet multimedia architecture
Netstation internal LAN
Netstation MOSAIC node
Typical configuration of DAN multimedia workstation
Functional diagram of MB86680B
Block diagram of Fujitsu ATM switch
General structure of the architecture
Connection of basic system components
Data flow in system with a) separate utility nodes and b) utility nodes embedded into the peripheral nodes
Control data flow during a cell loss event
ATM workstation with ATM circuitry integrated into mass storage devices
Configuration of a live video processing node
Relation between ATM simulator objects and modelled physical devices
Relation between the bus simulator objects and modelled physical devices
Uninterrupted transfer delays as recorded by the video node
Average teleconferencing delays as experienced by the video node for DMA size of a) 256 bytes b) 512 bytes
Maximum teleconferencing delays as experienced by the video node for DMA size of a) 256 bytes b) 512 bytes
a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging
a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.8 a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging rate
4.9 Cell drop rates for average paging load
4.10 a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.11 Cell drop rates for high paging load
List of Tables
2.1 Bandwidths of typical video streams with acceptable MOS values
2.2 Bandwidths of typical documents with acceptable MOS values
Glossary of Terms
AAL: ATM Adaptation Layer
ARM: Advanced RISC Machine
ATM: Asynchronous Transfer Mode
AVI: Audio Visual Interleaved (audio/video compression standard)
CBR: Constant Bit Rate
CLP: Cell Loss Priority
CODEC: Coder / Decoder
CPU: Central Processing Unit
DAN: Desk Area Network
DMA: Direct Memory Access
DSP: Digital Signal Processor
FDDI: Fiber Distributed Data Interface
FIQ: Fast Interrupt Request
GFC: Generic Flow Control
GNU: GNU's Not Unix!
HDTV: High Definition Television
HEC: Header Error Control
ISO: International Organization for Standardization
JPEG: Joint Photographic Experts Group (image compression standard)
Kbps: Kilobit per second
KBps: Kilobyte per second
LAN: Local Area Network
Mbps: Megabit per second
MBps: Megabyte per second
MAN: Metropolitan Area Network
MOS: Mean Opinion Score
MPEG: Moving Picture Experts Group (video compression standard)
NNI: Network-Network Interface (ATM standard)
NTSC: National Television Standards Committee
OSI: Open Systems Interconnection
PCI: Peripheral Component Interconnect
PT: Payload Type
RAM: Random Access Memory
RISC: Reduced Instruction Set Computer
ROM: Read Only Memory
SCSI: Small Computer System Interface
SONET: Synchronous Optical Network
UNI: User-Network Interface (ATM standard)
VBR: Variable Bit Rate
VCI: Virtual Channel Identifier
VPI: Virtual Path Identifier
WAN: Wide Area Network
WWW: World Wide Web
Chapter 1
Introduction
1.1 Motivation
In the last few years, an ever increasing demand for world-wide computer connectivity and multimedia representation of information has been observed. Multimedia products for microcomputers, like educational and entertainment software, training tools, and operating system environments, allow easier and more natural interaction with the machines. Adding music, voice, colour images, animation, and video to textual information can greatly improve the richness of available data. The shift towards multimedia representation of information was made possible by falling costs of powerful microprocessors, improvements in the cost and performance of optical storage media (CD-ROM), and advances in digital signal processing (DSP boards, video accelerators, sound and music boards).
At the same time, we are experiencing tremendous growth in the use of global and local computer networks. Many people now realize that access to global information is essential for successful research, business activities, and personal development. The 1995 World Almanac [9] says that "15 million people in the U.S. and 25 million world-wide access Internet regularly." Computer networks, being a huge global source of data, contain increasingly large amounts of multimedia information. Some of the most popular applications which make use of multimedia include teleconferencing, video on demand delivered through the network, WWW browsers, remote imaging tools in medicine, and Internet telephone and radio.
Combining the drive towards global connectivity with multimedia representation of data offers new challenges in the design of computer systems and networks. In the design of modern networks, one of the main objectives is to minimize the delays and the probability of information loss in handling high volumes of multimedia traffic. Asynchronous Transfer Mode (ATM) networks offer the required characteristics for low latency, high bandwidth communications; they are beginning to replace the existing networks, which were not designed with such high demands in mind.
On the other hand, computer hardware designers must ensure that the information exchange between the network and connected workstations is handled efficiently. Although well suited to handling real-time video, graphics, and sound locally, existing computer systems do not provide an adequate interface to high bandwidth networks. A new type of computer architecture is needed to seamlessly integrate multimedia workstations with very high-speed networks and to allow real-time processing of high-bandwidth information streams.
1.2 Objectives
Multimedia computer systems usually consist of a number of multimedia peripherals connected to the processor, memory, and storage devices using a system bus. They also provide a connection to external networks through a single centralized network interface. A typical configuration of such a workstation is illustrated in Figure 1.1a.
There are two major problems with this configuration. First, the system bus has a fixed bandwidth shared among all connected devices. Multimedia information utilizes a large percentage of this shared resource. If multimedia loads are allowed to occupy all available system bus bandwidth, then the information exchanged over the bus by other devices will experience significant delays. To avoid these extensive delays, the bandwidth allocated to multimedia data has to be limited, resulting in decreased quality of multimedia reception. Therefore, the system bus does not allow efficient coexistence of multimedia and non-multimedia data traffic.
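The contention on a shared bus can be illustrated with a rough back-of-envelope calculation. All stream figures and the bus capacity below are illustrative assumptions for the sketch, not measurements from this thesis:

```python
# Rough illustration of shared-bus saturation; all figures are
# illustrative assumptions, not measurements from this thesis.
BUS_CAPACITY_MBPS = 1056  # 32-bit, 33 MHz PCI: 132 MB/s = 1056 Mbit/s peak

streams_mbps = {
    "MPEG-1 video playback": 1.5,
    "CD-quality audio": 1.4,
    "two-way teleconferencing video": 2 * 4.0,
    "disk paging traffic": 200.0,
    "display refresh DMA": 500.0,
}

total = sum(streams_mbps.values())
share = 100 * total / BUS_CAPACITY_MBPS
print(f"Aggregate demand: {total:.1f} Mbit/s ({share:.0f}% of bus capacity)")
# As the aggregate approaches capacity, non-multimedia transfers queue
# behind the continuous streams, so the multimedia share must be capped.
```

Even with these modest assumed loads, the continuous streams consume a large fraction of the bus, leaving little headroom for bursty non-multimedia traffic.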
The second problem with the system bus configuration involves the transmission of high bandwidth data between the internal workstation components and the external network. The devices currently used as network interfaces are not fast enough to support continuous high bandwidth multimedia traffic. This in turn limits the features and characteristics of distributed multimedia applications, such as teleconferencing. Teleconferencing can only offer a low frame rate and small resolution of transmitted pictures in order to comply with the maximum throughput restrictions of current network interfaces.
This thesis proposes an architecture for a multimedia computer system which offers a solution to two specific performance bottlenecks involving transfers of high bandwidth information streams, such as multimedia and teleconferencing. The first bottleneck exists between the internal components of a computer system and the external network; the second, among the internal components themselves. The proposed architecture replaces the system bus with an ATM switch, as depicted in Figure 1.1b. The switch, due to its crossbar structure, allows simultaneous exchange of data through connections established between any pair of peripherals. Since the external network is viewed as one of the system peripherals, the need for a separate network interface is eliminated. The projected wide market acceptance of ATM influenced the choice of ATM as the protocol for the switch.
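The key property of the crossbar structure is that transfers between disjoint peripheral pairs can proceed concurrently, whereas a bus serializes them. A toy admissibility check captures the idea (node names are illustrative, and this is a simplification, not the thesis simulator):

```python
# Toy model: a crossbar admits any set of transfers whose sources are
# all distinct and whose destinations are all distinct; a shared bus
# would admit only one transfer at a time. Node names are illustrative.
def crossbar_admissible(transfers):
    """transfers: list of (source, destination) port pairs."""
    sources = [s for s, _ in transfers]
    dests = [d for _, d in transfers]
    return len(set(sources)) == len(sources) and len(set(dests)) == len(dests)

wanted = [("disk", "memory"), ("camera", "network"), ("network", "display")]
print(crossbar_admissible(wanted))                          # all three in parallel
print(crossbar_admissible(wanted + [("disk", "display")]))  # disk output port busy
```

Note that "network" may appear as both a source and a destination here, reflecting full-duplex switch ports.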
A system using an ATM switch as an interconnect for the internal components has many advantages over a bus-based system. It allows a high level of connectivity between the system components and the external ATM network and offers a high degree of parallelism in the system's internal data flow. It also reduces the management and control burden on the CPU because information transfers occur through pre-established connections that do not require CPU involvement. The scalability of bandwidth and of the number of component connections in the switch-based interconnect surpasses that of a bus.
This thesis offers a comparison between the new architecture and a generic bus-based system to evaluate the practicality and competitiveness of the proposed design. A computer simulation was created to show the delays and the available bandwidth for both types of interconnect when subject to typical multimedia and teleconferencing data streams. The thesis also provides insight into how the proposed architecture compares with other novel multimedia system architectures.
1.3 Outline
The second chapter of this thesis contains the background information needed to fully understand the trade-offs involved in the design of the proposed architecture. This chapter introduces basic multimedia concepts, summarizes the ATM protocol, describes the features of the ATM switch
Figure 1.1 Multimedia workstation architectures (a) using a bus, (b) using an ATM
switch as an interconnect
used as an interconnect, and outlines the features of other architectures intended to efficiently deal with multimedia and other high bandwidth data streams.

The details of the new architecture are presented in Chapter 3. The chapter begins with the main objectives of the design, followed by a description of its general structure and the implementation details of individual components. How this architecture may influence system performance is also discussed.
Chapter 4 presents the simulation environment used to compare a typical bus system, loosely based on the PCI standard, with the ATM system proposed in this thesis. The chapter outlines the assumptions and simplifications made in the simulation and gives the results of the simulation runs. It also offers an analysis of the results obtained. Appendix A contains the tabulated numerical results from all simulation runs.
The final chapter of the thesis points out the advantages and drawbacks of the proposed architecture in light of the simulation results. It also suggests possible research directions to further verify the usefulness of the new architecture.
Chapter 2
Background
2.1 Introduction
This chapter provides a short review of the material forming the background of this thesis. It assumes that the reader is already familiar with the ideas, technologies, and vocabulary in the related fields. It is not intended to give an exhaustive coverage of the material, but to provide a context for the work presented in this thesis.
The second section of this chapter gives a definition of multimedia as used in the context of computer systems, as opposed to the one used in a general sense. It describes the necessary properties that allow information streams to be classified as multimedia, and it gives practical examples of some multimedia streams. It also outlines the components that usually form a multimedia workstation, as well as their required physical properties, such as bandwidth and acceptable delays. These are the properties needed for seamless and acceptable presentation of the information to the end user.
The multimedia workstation architecture introduced in this thesis has an ATM (Asynchronous Transfer Mode) switch as its central component. Section three of this chapter briefly outlines the main concepts behind ATM. The outline concentrates on the aspects that directly relate to the thesis, namely the cell structure and the mechanics of cell transfer and processing.
Section four presents previous research dealing with multimedia computer architectures. Three different approaches are described, along with their advantages and shortcomings involving cost, performance, and scalability. This thesis proposes an architecture which, while incorporating the successful features of the above mentioned approaches, avoids their bottlenecks.
The chapter concludes with a section describing the operating features of the Fujitsu MB86680B ATM switch, which was selected as the interconnect for the thesis project and as a model for the performance simulation.
2.2 Multimedia
2.2.1 The definition
In order to establish the requirements for the multimedia architecture, the definition and characteristics of multimedia information should be considered first. According to the American Heritage Dictionary of the English Language, multimedia is "the combined use of several media, such as movies, slides, music, lighting, especially for the purpose of education and entertainment". However, while describing multimedia as a mix of at least two different media, this definition is not precise enough for the context of computer architecture and must be further refined.
Multimedia in a computer context entails the following four properties:
1. combination of several media
2. independence of combined media streams
3. computer integration
4. communication.
In a computer system context, the information must contain several continuous and discrete media to be called multimedia. Independence of media streams entails that their sources be independent of each other. As an example, text and video coming from independent sources (a computer and a video camera) are independent, while sound and video from the same teleconferencing equipment (the same video camera or recorder) are not.
Computer integration suggests that all independent media are seamlessly combined, controlled, and processed by a single computer system. This system must also give the end user control over all media streams with the same or equivalent functionality. The communication property entails that multimedia information be easily exchanged between various multimedia systems and easily transported over computer networks. The need comes from the observation that more and more varied media streams are stored and available on global networks. Based on those important requirements, Steinmetz and Nahrstedt [35] propose the following definition of a multimedia computer system:
A multimedia system is characterised by computer controlled, integrated production, manipulation, presentation, storage, and communication of independent information, which is encoded at least through a continuous (time dependent) and a discrete (time independent) medium.
2.2.2 Multimedia modes
For each independent medium in a multimedia information stream, one must specify precise timing rules to ensure acceptable quality when the information is presented to the end user. Each type of medium can be assigned to one of three transmission modes: asynchronous, synchronous, or isochronous, depending on its timing constraints. These constraints must be taken into account when developing a multimedia computer architecture.
Asynchronous transmission does not enforce any time constraints on the flow of information. Information is delivered on a "best effort" basis with no time guarantees. Examples of asynchronous media are e-mail messages and files to be retrieved in a non-interactive fashion. Synchronous transmission dictates a maximum delay for each unit of information in a stream. Such a limit is essential during the delivery of documents in an interactive user mode, or in the transmission of video frames in real time on systems with ample intermediate storage. Video frames which arrive too early are buffered in the intermediate storage so that they can be presented to the end user in the proper order. The buffering eliminates the need to enforce minimum delays on the information stream. The last group, the isochronous media, specifies both a minimum and a maximum delay for each media unit. A teleconferencing stream on a system with a small amount of intermediate storage would qualify as an isochronous information stream. Isochronous transmission is also referred to as real-time transmission.
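The three transmission modes above are fully determined by which per-unit delay bounds a stream specifies, which can be sketched as a small classifier (the class and stream names are illustrative, not from this thesis):

```python
# Sketch: classifying a media stream into the three transmission modes
# described above, from its per-unit delay bounds. Names and bounds
# are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stream:
    name: str
    min_delay_ms: Optional[float]  # lower bound on delivery delay, if any
    max_delay_ms: Optional[float]  # upper bound on delivery delay, if any

def transmission_mode(s: Stream) -> str:
    if s.max_delay_ms is None:
        return "asynchronous"   # best effort, no time guarantees
    if s.min_delay_ms is None:
        return "synchronous"    # max delay only; early units are buffered
    return "isochronous"        # both bounds: real-time transmission

print(transmission_mode(Stream("e-mail", None, None)))           # asynchronous
print(transmission_mode(Stream("buffered video", None, 150.0)))  # synchronous
print(transmission_mode(Stream("teleconference", 0.0, 150.0)))   # isochronous
```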
2.2.3 Multimedia workstation components
In view of the requirements imposed on a multimedia system by various media streams, a typical system should contain at least the following components [29]:
- a general-purpose microprocessor
- large primary storage (memory)
- huge permanent secondary storage (disks)
- a dedicated media processor
- graphics, video, and sound equipment
- communication adapters
The general-purpose processor is responsible for system management, operating system tasks, and standard (non-multimedia) data processing. The primary storage is needed for processing, copying, and temporary storage of multimedia information; multimedia objects tend to be of very large size. The secondary storage, in the form of hard disks, disk arrays, and read-only and rewritable optical disks, provides permanent archiving and distribution of multimedia data.
The need for real-time processing of isochronous data, which imposes a severe constraint on processing times, entails the use of secondary, dedicated real-time processors (DSPs) to ensure delay and delay jitter guarantees. Finally, the communication adapters are needed to provide connectivity to external networks at the speeds required by time-critical multimedia applications. Multimedia information is displayed and presented to the end user by means of audio and video adapters, speakers, monitors, and other multimedia input/output equipment.
2.2.4 Time constraints of multimedia traffic
In order to quantify the time values of the multimedia traffic constraints, one should define what initial criteria this traffic has to fulfill. Since all multimedia traffic eventually reaches the end user of the computer system, the end user should provide some way of expressing the perceived quality of the received information. A properly designed system should give a natural feel when working with information; the user should not perceive any degradations such as glitches, intolerable delays in teleconferencing, or lack of voice and video synchronization.
The industry criterion for estimating user perception is known as the Mean Opinion Score (MOS). It is intended to give an "objective comparison of subjective testing, such as user's perceived quality of network delay" [32]. The MOS ranges from 0 to 5.0, where 0 signifies the worst possible perception and 5.0 a perfectly natural perception of multimedia data. Various delays, delay jitter, synchronization errors, and bit rate errors (cell loss rate) directly influence the value of the MOS. For example, in teleconferencing, the larger the delay jitter introduced by the network and multimedia computer system, the more interrupted and unnatural the conversation between the end users will be. Hence, larger delay jitter contributes to a reduction of the MOS.
Radhika Roy summarizes the multimedia system requirements needed to achieve acceptable MOS values in [32]. He also discusses the minimum MOS required for natural audio and video communication. The MOS for audio should be between 4.0 and 5.0. At 3.5, conversation is still possible, albeit with easily detectable sound degradation. Human perception of video is more forgiving, with an acceptable MOS value as low as 3.5. The one-way end-to-end delay of an audio/video stream should be no more than 150 ms to satisfy the above-mentioned MOS, with 300 ms as the two-way (return) end-to-end delay. The value of the delay jitter should be as low as possible, but values around 250 µs are quite acceptable. Inter-media synchronization delays, for example lip-synchronization errors, should fall in the range of -20 to +40 ms to be unnoticeable to the end user. Interruptions in the receipt of continuous multimedia streams like video should be minimal, implying at most one interruption (one cell lost) in about 40 minutes. Roy's summary also quantifies acceptable system response time limits for user access in an interactive mode, such as browsing or retrieving documents. The host system should provide a response to a request in about 1-2 seconds for document retrieval, and about 0.5 second for browsing.
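Roy's limits lend themselves to a simple admission check. A minimal sketch follows (function and field names are mine; the threshold values are taken from the text above):

```python
# Roy's limits for natural audio/video communication (values from the text)
LIMITS = {
    "one_way_delay_ms": 150,      # maximum one-way end-to-end delay
    "round_trip_delay_ms": 300,   # maximum two-way (return) delay
    "jitter_us": 250,             # delay jitter around this value is acceptable
    "lip_sync_ms": (-20, 40),     # unnoticeable inter-media skew range
}

def violations(stream):
    """Return a list of the constraint names a measured stream violates."""
    bad = []
    if stream["one_way_delay_ms"] > LIMITS["one_way_delay_ms"]:
        bad.append("one_way_delay")
    if stream["round_trip_delay_ms"] > LIMITS["round_trip_delay_ms"]:
        bad.append("round_trip_delay")
    if stream["jitter_us"] > LIMITS["jitter_us"]:
        bad.append("jitter")
    lo, hi = LIMITS["lip_sync_ms"]
    if not lo <= stream["lip_sync_ms"] <= hi:
        bad.append("lip_sync")
    return bad

# a stream with 180 ms one-way delay fails only the delay constraint
print(violations({"one_way_delay_ms": 180, "round_trip_delay_ms": 290,
                  "jitter_us": 200, "lip_sync_ms": 10}))  # ['one_way_delay']
```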
Roy provides detailed bandwidth figures for various teleconferencing configurations with their appropriate MOS values. Selected system parameters that satisfy the acceptable MOS scores are presented in Tables 2.1 and 2.2. This data will be used in the performance simulation to specify the bandwidths of common multimedia streams.
2.3 Asynchronous Transfer Mode (ATM)
2.3.1 Description
ATM is a method of transporting, switching, and multiplexing information over networks [2, 5, 34]. Its features allow the high level of flexibility needed to deal with the variety of information types (both multimedia and traditional) exchanged over the networks. ATM is considered a connection-oriented network protocol, but due to its flexibility it supports both connection-oriented and connectionless services, as well as constant and variable bit rate operations (CBR and VBR). The ATM protocol was designed to provide good transport of services like voice, data, still images, video, multimedia, and real-time information over a single type of network, thus eliminating the need for separate, proprietary overlay networks for each of these services.
ATM is confined to the upper half of layer 1, basic functions of layer 2, and parts of the network and transport layers of the OSI (Open Systems Interconnection) network architecture model developed by the International Organization for Standardization (ISO). The structure of the OSI Reference Model is presented in Figure 2.1. ATM consists of two layers: the ATM layer and the ATM adaptation layer (AAL). The ATM layer is common to all the services, while the AAL is service dependent. The AAL adapts the information received from higher levels of the model to the requirements of the ATM layer.
To transport information over the network, ATM uses fixed-size packets called cells. Each cell is a collection of 53 octets, with 5 octets comprising a header and the remaining 48 octets forming the payload, as depicted in Figure 2.2. The actual payload is usually reduced to 44 octets due to the segmentation and reassembly control information added by the ATM adaptation layer.
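The cell arithmetic is easy to verify; a sketch (the 44-octet figure is the AAL-reduced payload mentioned above, and the 9.43% figure is the header overhead quoted later in this chapter):

```python
import math

CELL = 53          # octets per ATM cell
HEADER = 5         # octets of ATM-layer header
PAYLOAD = 48       # octets left for the AAL
AAL_PAYLOAD = 44   # octets left for user data after AAL SAR fields

# ATM-layer header overhead: 5/53 of every cell
print(round(HEADER / CELL * 100, 2))  # 9.43

def cells_needed(user_octets):
    """Cells required to carry a user data unit through the AAL."""
    return math.ceil(user_octets / AAL_PAYLOAD)

# a 1024-octet unit needs 24 cells, i.e. 24 * 53 = 1272 octets on the wire
print(cells_needed(1024))  # 24
```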
ATM uses labelled channel multiplexing to implement addressing. Each cell header contains a label called a connection identifier, which in turn consists of two subfields: the VCI (virtual channel identifier) and the VPI (virtual path identifier). The VCI and VPI uniquely identify the destination
[Table: columns are video data description; frames per second; bits per sample; transmission mode (CBR = constant bit rate, VBR = variable bit rate); and compressed bit rate in Mbit/s. Rows: standard TV quality (720x480 pixels, interlaced, MPEG-1); cable TV (360x480, non-interlaced, MPEG-1); and two low-rate video-conferencing formats (360x240, non-interlaced, MPEG-1). The numeric entries are not recoverable from the scanned copy.]

Table 2.1 Bandwidths of typical video streams with acceptable MOS values
Data type (all colour pages 24 bits/pixel)   Uncompressed object   Typical achievable   Peak bandwidth for object
                                             size (Mbit)           compression ratio    retrieval (Mbit/s)
8.5"x11" colour page, 200 pixels/inch        90                    10-20                4.5-2.3
8.5"x11" colour page, 400 pixels/inch        359                   10-20                18-9
8.5"x11" colour page, 800 pixels/inch        1436                  10-20                72-36
8.5"x11" colour page, 1600 pixels/inch       5744                  10-20                287-144

[The ASCII-text page row and the peak-bandwidth-for-browsing column are not recoverable from the scanned copy.]

Table 2.2 Bandwidths of typical documents with acceptable MOS values
Figure 2.1 OSI reference model (application, presentation, session, transport, network, data link, and physical layers at the originating, intermediate, and destination nodes, with the physical communication path between the nodes)
Figure 2.2 ATM cell structure (53 cell octets: the cell header followed by the cell payload, with the AAL header and AAL trailer occupying part of the payload)
of the cell. Since ATM serves only the role of a transport mechanism, it can support a wide range of both present and future network protocols such as Token Ring, Ethernet, or Fast Ethernet. ATM, in combination with SONET (the physical layer protocol), achieves a high bandwidth efficiency of around 80% (both layers 1 and 2 of the OSI model), which compares very well with other packet switching systems like FDDI (80% efficiency for the physical layer alone).
2.3.2 The ATM cell format
The 53 octets of an ATM cell are divided into two parts: the cell header and the payload. The 5-octet header contains the control information of the ATM layer. The header is further subdivided into six fields. The structure of the cell header is presented in Figure 2.3.
The UNI 3.0 (User-Network Interface) and the NNI (Network-Network Interface) standards developed by the ATM Forum specify the use of each field:
GFC: Generic Flow Control. This field can be used by the ATM customer to implement flow control or other local functions. Therefore, GFC has only local significance, and it is overwritten by ATM switches on the network side (public ATM switches). This field does not exist in the NNI standard.

VPI/VCI: Virtual Path Identifier/Virtual Channel Identifier. Both fields uniquely identify the destination of the cell and are used for routing the cell through the ATM network. The size of the VCI and VPI varies and is negotiated between the user and the network. The size of these fields is different in the NNI and UNI standards.

PT: Payload Type. This field indicates whether the cell carries user data or management and control information. Network congestion information can also be encoded here.

CLP: Cell Loss Priority. This single-bit field indicates the priority of a cell when congestion occurs and the network must discard some cells. A value of "0" marks the cell as being of higher priority, so it will be discarded only after all cells with a CLP of "1" have been discarded.

HEC: Header Error Control. The physical layer uses this field for detection and correction of errors that affect the cell header.
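The field layout can be made concrete by unpacking a header with bit operations; a sketch assuming the standard UNI bit positions (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PT, 1-bit CLP, 8-bit HEC):

```python
def parse_uni_header(h):
    """Split a 5-octet ATM UNI cell header into its six fields."""
    assert len(h) == 5
    return {
        "gfc": h[0] >> 4,                                          # 4 bits
        "vpi": ((h[0] & 0x0F) << 4) | (h[1] >> 4),                 # 8 bits
        "vci": ((h[1] & 0x0F) << 12) | (h[2] << 4) | (h[3] >> 4),  # 16 bits
        "pt":  (h[3] >> 1) & 0x07,                                 # 3 bits
        "clp": h[3] & 0x01,                                        # 1 bit
        "hec": h[4],                                               # 8 bits
    }

# a header carrying VPI = 1 and VCI = 5, with CLP = 0
print(parse_uni_header(bytes([0x00, 0x10, 0x00, 0x50, 0x00])))
# {'gfc': 0, 'vpi': 1, 'vci': 5, 'pt': 0, 'clp': 0, 'hec': 0}
```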
2.3.3 Advantages of ATM
The characteristics of ATM make it a good choice for integrating different services like voice,
data, and multimedia over a single network. This allows significant cost reduction in operation
Figure 2.3 Fields in ATM cell header as defined by UNI 3.0 (GFC, VPI, VCI, PT, CLP, and HEC laid out across the five header octets)
and maintenance of the network, since there is a single network with a single standardized interface for all services. In addition, the flexibility of ATM enables the introduction and implementation of new services with yet unknown characteristics, with minor or no modifications. Since ATM multiplexes and switches cells asynchronously, the allocation of network bandwidth is much more flexible than with synchronous protocols. Asynchronous multiplexing is possible because ATM is connection-oriented and because each ATM cell contains the VPI/VCI fields with information sufficient to reach its destination.

The ATM cell size was chosen as a trade-off between the small voice packets used in telephony today and the large packets preferred for data transmission in computer communication. Fixing the size of a cell greatly increases the speed of multiplexing and switching, and reduces the complexity of the buffers and queue management in the switching nodes. It is also much easier and cheaper to implement switching and multiplexing of fixed-size rather than variable-size data units. The fixed size of the header adds to the speed of the ATM switches, since only the header is processed and the payload simply follows its header after processing. Unfortunately, the cell headers add 9.43% overhead to the useful user information. The speed and flexibility of the ATM equipment translates into smaller delay and delay jitter in comparison to other packet switching technologies.
2.4 Multimedia Workstation Architectures
2.4.1 Introduction
This section outlines the most recent research in the field of multimedia computer architectures. The three projects described here are:

1. VuNet from the Telemedia Networks and Systems Group at the MIT Laboratory for Computer Science
2. Netstation from the University of Southern California/Information Sciences Institute
3. Desk Area Network (DAN) from the University of Cambridge Computer Laboratory.
Each description is divided into three subsections. The first subsection outlines the major
architecture objectives, while the second one gives the details of the implementation, the test
environments, and some test results. A discussion of the advantages and the shortcomings of the
architecture forms the third subsection.
2.4.2 VuNet
2.4.2.1 Architecture
The first workstation architecture [1] reviewed here is a network of general-purpose computers and various shared multimedia peripherals, as seen in Figure 2.4. When augmented with the multimedia peripherals, the general-purpose machines become multimedia workstations capable of displaying and processing high-bandwidth data streams. While presenting a new way of working with multimedia data, VuNet does not alter the internal structure of the general-purpose machines.
The designers of VuNet have chosen ATM as the network connecting the computers with the multimedia devices and other networks. The multimedia devices, like video compressors and decompressors, cameras, and displays, form separate nodes of this ATM network and connect to it directly through their own interfaces, as depicted in Figure 2.4. The main objective in the design of this system is to go beyond the ability of current systems, which only store or display multimedia data, and to allow real-time processing of multimedia by general-purpose computers. The applications running on the VuNet computers are able to directly receive the multimedia data and perform various tasks, such as stationary filtering, motion detection, and edge filtering.

The VuNet computers do not have any built-in multimedia devices. The creators of VuNet argue that workstations using built-in multimedia components have many more disadvantages and limitations when compared to the "universal" shared peripheral devices proposed in the VuNet project. Since the built-in components are connected to the computer I/O bus, the high-bandwidth multimedia traffic significantly reduces the bus bandwidth available for other data traffic. They are usually designed for a single application, like teleconferencing, and are tied to one particular hardware platform. On the other hand, the VuNet peripherals, hooked up directly to the gigabit ATM network, are universal and platform independent.
Figure 2.4 VuNet multimedia architecture (general-purpose computers, a multiprocessor node, and shared peripherals such as a compressor and decompressor node, a video camera node, and a video display node, each attached to the network through its own ATM interface)
In order to make their system flexible and portable, the VuNet developers utilized a software-intensive approach. They simplified the network hardware as much as possible and shifted most of the processing, including media data processing, network protocol tasks, and network control, to the processors of the general-purpose computers. Complex functionality of the architecture is then developed in software, which can be altered and augmented without any hardware changes. Due to this software approach, VuNet can be easily ported to other hardware platforms without any changes to the VuNet system itself. However, a software-intensive implementation carries with it some performance penalties, namely throughput loss and speed degradation. The developers' tests show that even with those penalties, the performance of VuNet is still adequate.
Since different types of multimedia information reach the application level in the workstation, the system must provide transparent data handling for all those types. The applications must be provided with similar interfaces to various data types and must be able to handle them in a similar manner. To implement transparent data handling, the VuNet system must provide graceful scaling and degradation of multimedia traffic in situations when the load reaches high levels. To achieve that, VuNet has the ability to control the sources of data and can adjust their data rates and burstiness on the fly to react to changes in the system load and available bandwidth.
As mentioned above, VuNet introduces "universal" peripherals which are connected directly to the ATM fabric and not to the I/O bus of the computer. This allows sharing of those peripherals among all computer clients and eliminates the need for hardware platform-dependent peripherals. There is also no need for re-design when new types of computers are connected to the system. Each of the "universal" peripherals constitutes a single node in the VuNet architecture and has a single function which it can perform with high efficiency. To implement more complex functionality, those peripherals are chained across the ATM interconnect.
2.4.2.2 Implementation details
All elements of the VuNet environment are connected together using ATM switches and links arranged in some configuration, such as a star, a ring, or other. The ATM switches, together with the peripherals and computers attached to their ports, form the nodes of the architecture. The switches contain four 700-Mbit/s ports with 64-cell input and 256-cell output buffers. Each switch has a 32- or 64-bit host interface. The links connecting all nodes of the VuNet provide 500-Mbit/s bandwidth. The ATM cell header is modified for the purpose of this project by adding three bytes, resulting in a single cell of 56 bytes. The cell then corresponds to seven aligned read operations on a 64-bit host interface bus.
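The alignment arithmetic is straightforward to check (the padding-overhead percentage below is my own calculation, not from the text):

```python
STANDARD_CELL = 53         # octets in a standard ATM cell
VUNET_CELL = 53 + 3        # VuNet adds three padding octets
BUS_WIDTH = 8              # 64-bit host interface = 8 octets per read

assert VUNET_CELL % BUS_WIDTH == 0        # 56 is a multiple of 8
print(VUNET_CELL // BUS_WIDTH)            # 7 aligned reads per cell
print(round(3 / STANDARD_CELL * 100, 1))  # padding adds about 5.7% per cell
```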
The interconnect implements only the basic functionality of transporting and routing the cells along established connections. More advanced functions like congestion control and multicasting have been moved to the client workstations and implemented in software. This simplifies the hardware of the architecture, but may also result in severely degraded performance if such advanced functions need to be provided. Connection setup and all network management control are implemented on the same interconnect as data transfers. The hosts establish and terminate connections by sending control cells to update the header lookup tables in the switches. The control cells are also used to determine the topology of the network by means of querying all nodes and links of the VuNet system. Again, all those functions are implemented in software, with hardware providing transparent data transport.
To estimate the performance of the architecture, a test environment was built, which included an Alpha workstation and a video camera node. The camera node generates a stream of data which is sent over the ATM interconnect to the workstation, where the data is reassembled from cells and processed by the host processor. It was found that both the sending and the receiving switches can communicate with their hosts only at 230 Mbit/s due to the bus arbitration times and bus grant latencies (the workstation used a TURBOchannel bus). However, the memory of the host systems can only receive data continuously at the rate of 65.5 Mbit/s. After adding the time needed by the host to reassemble the data from the cells, the final maximum sustained throughput directly to the application level is 42.2 Mbit/s. In most cases the workstation is the performance-limiting component. Only in the case of high-colour video is the camera node the bottleneck, due to the computational complexity of the colour mapping algorithm used.
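These measurements describe a receive pipeline whose throughput drops as each software stage is added. Assuming the stages are serialized (my assumption, not stated in the text), the implied cost of cell reassembly alone can be backed out from the cumulative rates, since per-bit processing times add:

```python
# measured cumulative rates (Mbit/s) as each stage is added, from the text
bus_rate = 230.0     # host interface alone (TURBOchannel limited)
memory_rate = 65.5   # plus continuous copy into host memory
app_rate = 42.2      # plus software cell reassembly, to the application

# if stages are serialized, per-bit times add; the standalone rate of the
# reassembly stage is the reciprocal of the difference of per-bit times
reassembly_rate = 1 / (1 / app_rate - 1 / memory_rate)
print(round(reassembly_rate, 1))  # 118.6 Mbit/s spent on reassembly alone
```

The point of the exercise: even though reassembly on its own could run at over 100 Mbit/s, stacking it behind the memory copy drags the end-to-end rate down to 42.2 Mbit/s, which is why the workstation is the limiting component.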
2.4.2.3 Critique
The VuNet architecture, though improving the present state of multimedia processing, has some drawbacks. First, the introduction of centralized peripherals might create congestion and delays in the VuNet system. If there were more workstations using the system (the test environment included only a single one), their access to a particular peripheral would have to be shared. To avoid long access delays, each shared peripheral should provide high processing speed and efficiency, thus increasing its cost. In many cases it would be more economical to provide inexpensive built-in multimedia peripherals for each computer, thus reducing the VuNet traffic. In addition, since all data to be processed in the "universal" peripherals has to be transported to them and back to the computer through VuNet, the data transfers consume the VuNet bandwidth available for other tasks and workstations. A similar situation arises when a number of simple peripherals are combined to perform complex tasks, as in chaining. The information has to traverse the whole chain of requested peripherals, each time consuming the bandwidth of the network.

Despite the above-mentioned drawbacks, the idea of "universal" peripherals is very appealing. It would be cost effective to have devices with standard ATM interfaces, and therefore independent of the hardware platform.
Future applications will be operating on the multimedia streams, rather than just presenting them to the user. Therefore, the workstation architecture should permit those applications high-bandwidth access to multimedia data. However, the software-intensive approach suggested by the MIT lab researchers offers only a medium-bandwidth solution. Delegating so many tasks to the workstation host processor burdens it to the point that the processor becomes the bottleneck of the system. Cell reassembly, multicasting, and protocol management, all executed in software, significantly reduce the CPU time available for regular data operations. Test results show that the video frames are transported to the application level at only 47 Mbit/s, while the host interface can send that data at 230 Mbit/s. In the case of a single stream of data this bandwidth is sufficient, but if more are present, the bandwidth can degrade even further.
The increased size of the ATM cell to 56 bytes, although convenient from the point of view of the host interface transfers, has two major disadvantages. First, it strays from the standard, making the system incompatible with mainstream ATM equipment. This raises the cost of the system implementation and decreases the network interoperability. Second, three additional empty bytes have to be sent over the VuNet network. This increases the total cell delivery time and the consumption of the link bandwidth. The operation of padding to the required aligned access width of the host interface could be done locally, without sending the extra bytes through the VuNet switching fabric.
The control mechanism of the VuNet system seems very economical. Since the same path is used for sending data and control information to the switches, links, and peripherals, there is no need for a separate control interface.
2.4.3 Netstation

2.4.3.1 Architecture
The designers of this architecture, from the Information Sciences Institute at the University of Southern California, view a workstation as a group of co-operatively connected subsystems [10]. They propose a LAN interconnect, internal to the workstation, connecting those peripheral subsystems, as seen in Figure 2.5. Netstation can be viewed as a heterogeneous message-passing multicomputer, with nodes communicating with each other over established point-to-point connections on the internal LAN. Each Netstation node, a so-called MOSAIC node, contains an internal LAN interface, a network protocol processor with its memory, and an interface to the attached peripheral. The diagram of the MOSAIC node is presented in Figure 2.6. The network interface in each node is very small, with the ability to execute the network protocols at a speed at least matching the speed at which the associated device can send or receive data.

To make all the internal LAN components easily accessible from outside of the workstation, the external and internal LANs of such a system are link-layer compatible. The only differences between the external and internal LANs are the latency of component access and the security access privileges. Such an architecture allows very fast communication between the workstations connected to the external LANs, and is therefore ideally suited for distributed, bandwidth-intensive applications.
Figure 2.5 Netstation internal LAN (MOSAIC nodes with links to other MOSAIC nodes or external networks)

Figure 2.6 Netstation MOSAIC node (the peripheral associated with the node, the packet interface, and the asynchronous router)
Using a point-to-point interconnect rather than a system-wide bus has many advantages. Only a single device can be the bus master at any given time, while point-to-point connections use dedicated channels to transport information. The number of channels that can be open simultaneously is limited only by the number of paths that can be routed through the interconnect switching fabric. Due to transmission line effects, a bus is restricted in size and in the number of devices connected to it, while the same restrictions are greatly reduced in a point-to-point connection with only a sender and a receiver present on the line. The aggregate bandwidth of a bus is constant. Hence, adding more devices reduces the bandwidth per device when the bus is highly utilized. In a point-to-point system, added devices increase the total available bandwidth of the interconnect. The connection of a new node to the Netstation increases the number of routes available through the interconnect, which at the same time increases the number of point-to-point data transfer channels. Since each channel has a constant bandwidth associated with it, the bandwidth of the newly introduced channels adds to the aggregate bandwidth of the interconnect. The proponents of the Netstation architecture suggest the replacement of the internal system bus with a point-to-point internal LAN that connects all peripherals, external networks, and system processors. However, they argue that such an interconnect should not be used for processor-memory traffic.
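The contrast can be sketched numerically (the 500 Mbit/s figures are illustrative, not taken from the Netstation design): the bus divides one fixed aggregate among all devices, while each point-to-point channel keeps its own dedicated bandwidth.

```python
def per_device_bandwidth(devices, bus_mbps=500, channel_mbps=500):
    """Compare per-device bandwidth under full load.

    A shared bus divides a fixed aggregate among all active devices;
    a point-to-point interconnect gives each added channel its own
    dedicated bandwidth (switch capacity permitting).
    """
    bus = bus_mbps / devices
    point_to_point = channel_mbps
    return bus, point_to_point

for n in (2, 8, 32):
    bus, p2p = per_device_bandwidth(n)
    print(f"{n:2d} devices: bus {bus:6.1f} Mbit/s each, "
          f"point-to-point {p2p} Mbit/s each")
```

With 32 devices the bus share falls below 16 Mbit/s while each dedicated channel keeps its full rate, which is the core of the Netstation argument.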
Today's workstations contain a single, distinct network interface and a system-wide bus for data delivery to the attached peripherals. Although such a setup works well with both local applications and those requiring small and medium external network bandwidth, it creates a bottleneck at access speeds approaching gigabits per second. Currently, the path of the data coming from the external LAN involves a number of copying and processing operations. First, on arrival of the packet, the network interface copies the received packet to the memory for protocol processing. The processor then executes suitable protocols to extract the application data from the packet. In addition, the data is further copied from the kernel space to the application space. Such operations require a lot of internal interconnect bandwidth, especially with the rate of incoming data reaching gigabits per second. To avoid burdening the processor and the internal interconnect, the Netstation system delegates network information processing to the individual devices that produce and receive network traffic. This can reduce the minimum speed requirements for the system processor, the memory, and the interconnect.
The internal LAN must offer very low latency and high reliability to be practical for connecting workstation peripherals. In order to achieve this goal, the LAN has a very fast routing mechanism that introduces minimum overhead while sending packets through the intermediate nodes. The packet size is variable and appropriate for each device, to minimize the processing overhead (i.e. video-frame-sized packets for teleconferencing equipment). The choice of the packet size reduces the time of packet assembly and reassembly. Since the external and internal LANs are link-layer compatible, no gateways or translations are necessary at their boundary.
2.4.3.2 Implementation details
There are currently two types of nodes developed for the purpose of the Netstation architecture: the processing node (MOSAIC-C) and the peripheral interface node. The MOSAIC-C nodes contain a 14-MIPS microprocessor, 64K of RAM, 2K of ROM, as well as eight channels of 0.64 Gbit/s each connecting the node to the internal LAN. The MOSAIC interface chips are used for connecting the peripheral devices to the internal LAN. In addition to the elements available in the MOSAIC-C nodes, the interface chips have 128K of external dual-ported RAM and an external peripheral bus. Both types of nodes can be programmed through the incoming channels to perform different network processing protocols, or to change the characteristics and objectives of the associated device. The Asynchronous Router redirects the packets not destined for its node using a cut-through technique to minimize the routing delays. This operation is performed without interrupting the node processor. When a node is the destination of the incoming packet, data is sent through the DMA channel to the packet interface. Data is further processed by the node processor, which has the ability to filter messages, execute appropriate protocols, and arrange data to be presented to the application layers or to the associated peripheral device. The propagation time through the node is 12.5 ns, with 25 ns for the routing decision to be made.
A typical data transfer on the Netstation LAN involves three stages: the data channel establishment, the information transfer, and the channel termination. During the first stage, the source MOSAIC node sends a message header to the destination node indicating the beginning of the information transfer. As the header passes through the intermediate MOSAIC nodes, it allocates a data transfer channel in each node. If a particular link between the nodes is busy, the header is stopped. Only after the link is freed is the header allowed to advance to the next node. When it reaches the final MOSAIC node, the whole data transfer channel has already been allocated. The information transfer starts and continues until a message terminating the channel reaches the destination node. When this message traverses the intermediate nodes, it frees all previously allocated data channel links.
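The three stages can be illustrated with a toy model (all names are hypothetical; a real node would hold the header and retry once a busy link is freed, whereas this sketch simply reports where the header stopped):

```python
def open_channel(path, busy_links):
    """Allocate links along path (a list of node names) hop by hop.

    Returns (claimed_links, blocking_link). A link already present in
    busy_links stops the header, so allocation halts at that hop.
    """
    claimed = []
    for a, b in zip(path, path[1:]):
        link = (a, b)
        if link in busy_links:
            return claimed, link        # header stopped at a busy link
        busy_links.add(link)            # channel segment allocated
        claimed.append(link)
    return claimed, None                # full channel established

def close_channel(claimed, busy_links):
    """The terminating message frees every previously allocated link."""
    for link in claimed:
        busy_links.discard(link)

busy = set()
claimed, blocked = open_channel(["src", "n1", "n2", "dst"], busy)
print(blocked is None, len(claimed))   # True 3 -- all three links held
close_channel(claimed, busy)
print(busy)                            # set() -- links released for reuse
```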
2.4.3.3 Critique
The Netstation architecture provides an appealing framework for high-bandwidth, communication-intensive multimedia applications. First, the internal and external LANs' link-layer equivalence eliminates the bottleneck of the single, centralized external network interface. The information flows without interruption through the boundary of a workstation, unless security or administrative restrictions apply. The internal devices can be transparently accessed by both internal and external nodes with appropriate security privileges. The performance of the interconnect is much higher than that of a system bus, due to the connection-based information channels and distributed network protocol processing.
However, there are some drawbacks in the presented solution. The workstation LAN is based on a connection of nodes with an unspecified, user-defined topology. Such a configuration introduces some unpredictability to the system; the distance and the number of intermediate nodes to be traversed between nodes vary according to their position and the topology. Therefore, channel delays may be different for different pairs of communicating nodes. Additionally, in case of failure of one of the intermediate nodes, some devices may be cut off from other workstation components. Also, since the node chips (MOSAIC chips) have a fixed number of ports, they cannot form a LAN as easily scalable as a central switch or a crossbar. The third problem involves bottlenecks in available bandwidth, which may arise when the traffic from a number of devices needs to traverse the same node; increased delay jitter may be the immediate result.
Netstation uses a proprietary routing mechanism, which decreases the delivery delays and minimizes the node processor overhead. However, not adhering to the standard network protocols forces a custom network interface implementation, which in turn may increase the overall cost of the workstation. The Netstation proponents argue that making the packet sizes variable and adjusted to the data unit size of the devices or peripherals will reduce the overhead in packet processing. However, such an approach makes the design of hardware suitable to perform various packet operations quite difficult and costly. The success of ATM is mainly due to its fixed-size cells, which minimize and simplify the processing hardware.
2.4.4 Desk Area Network (DAN)
2.4.4.1 Architecture
The originators of the DAN concept, the Computer Laboratory at the University of Cambridge, maintain that highly efficient network communication of workstations can be achieved by a careful design of the workstation architecture. The architecture must allow fast transfers of information from the network interface to the real data consumers. The DAN architecture [14, 16] combines aspects of both ATM LANs and multiprocessor interconnect networks. All elements connected to the DAN through an ATM switch fabric, which forms the central part of the interconnect, are considered to be integral parts of the workstation. A typical configuration of a DAN workstation is presented in Figure 2.7.
In a typical workstation, the functions of the network interface are to demultiplex the data incoming from the external network and to send it over the internal bus to the destination device inside the workstation (i.e. a memory, a secondary storage, or a frame buffer). While sending to the network, the data from the originating device has to be translated by the network interface to the form and protocol acceptable to the external network. Therefore, the main function of such an interface is to translate between the multiplexing techniques used on the workstation bus and the external network. DAN eliminates those translations by using the same data transport mechanism both inside the workstation and on the network. DAN is totally enclosed within the physical limits of the workstation, which greatly simplifies the control structure and the protocols on the workstation interconnect, making them faster and cheaper to implement. On the DAN, the functions of the network interface are confined to enforcing the security features, to protecting the access to the workstation components, as well as to the conversion of the signalling protocol between the implementations inside and outside the workstation; the implementations of the same signalling protocol may differ since the internal DAN protocols are simplified to achieve
[Figure 2.7 Typical configuration of DAN multimedia workstation: a CPU with first- and second-level caches, a synchronization node, a video frame store, main memory, secondary storage, a decompression node, and an ATM camera, all attached to a central ATM switch.]
higher speed.
Most of the high-level functions like queuing and scheduling algorithms for each device
connected to the DAN are delegated to the operating system. Those algorithms are tuned to
maximize performance based on the characteristics of each device. The operating system must be
aware of the devices and their characteristic traffic patterns to avoid congestion and to minimize
delays and contention. All components of the DAN execute the same single, distributed
operating system for a flawless and congestion-free operation. The system can, in fact, be
considered a highly asymmetric multiprocessor with the attached devices forming the nodes of
this machine, as seen in Figure 2.7. The following is a sample list of devices which can be
connected to the DAN:
A CPU node: used for general-purpose data processing, as well as the control functions of the DAN (switch connection setup, scheduling algorithms, queuing, resource allocation, synchronisation of multiple data streams).
A main memory node: the memory system of the DAN is divided by the switching fabric. The CPU with first- and second-level caches forms one node, while the main memory forms another. The second-level cache lines requested from the main memory have to be sent through the ATM switch fabric. The same is true for the cache lines written back to the main memory.
A secondary storage node: the main data store (or stores) for the DAN. It can consist of a number of disks and CD-ROMs.
A display node: contains the frame store for graphics manipulation and windowing functions, as well as the video handling hardware.
A camera node: the source of video data streams. The camera ideally generates ATM cells rather than the traditional frames of video information.
A LAN interface node: a very simple node. It implements only the routing functions and the security features of the system (if required).
An audio node: dedicated to the audio input and output.
A compression/decompression node: shared by all devices. Used for conserving the secondary storage (disk compression procedures) and for reducing the network bandwidth (video, audio, graphics compression methods like JPEG, MPEG, etc.).
2.4.4.2 Implementation details
The Cambridge Laboratory implemented a test DAN with a small number of devices attached. A
Fairisle switch fabric [23], created from 4x4 self-routing crossbar switching elements, constitutes
the heart of the system. Each crossbar element has four input and four output ports, each eight
bits wide. The fabric is clocked at 20 MHz, achieving a throughput of 160 Mbit/s per port. In order to implement
the self-routing, an 8-bit routing tag has to precede the ATM cell.
On the device side, dedicated port controllers provide the interface to the switching fabric
[15]. The functions of the port controllers also include buffering of the incoming cells from the
fabric, shaping the traffic going into the fabric, sensing the cell losses, and retransmitting the lost
cells back into the switching fabric. Each port controller has an ARM RISC processor and an
expansion bus to which the peripherals and devices are attached. The operation of the DAN
should be transparent to the applications running on the system. Therefore, the switching fabric
must be fully reliable on the hardware level; a cell must reach its intended destination or the
source (in this case the port controllers) must be informed.
Since all data paths on the DAN require an established connection, connection management is
needed. As mentioned earlier, the CPU node is responsible for most of the connection setup
operations, since it has the most processing power. However, there may be other devices capable
of the set-up and the termination of the data path connections. Hence, there are three possible
device types on the DAN: "dumb", intermediate, and "smart". "Dumb" devices implement only
the simplest functions for the data management and control in the form of programmable internal
registers, e.g. an audio recording node. "Smart" devices have substantial processing power and
can manage their own connections as well as establish connections for "dumb" and intermediate
devices. The intermediate devices fall somewhere between those two, having a varying amount
of processing power and the ability to perform some management functions independently.
The architecture proposes separate control and data paths. Such an implementation is a natural
requirement if very low latency of control commands is required. Since the main memory is
separated from the second-level cache by the switching fabric, each cache read request has to
occupy a whole cell and must be transmitted over the switching fabric. This significantly
increases the cache miss times and degrades the CPU performance.
The experimental setup of the DAN involves testing the main memory service times of second-level
cache misses. The time is measured from the moment the cache miss is detected until the time
the processor resumes execution. The designers tested three different memory server
configurations. The first one implemented memory request handling entirely in software, the
second one used the fast interrupt requests (FIQ) available on the ARM processors. The third one
was implemented as a dedicated hardware component. The mean service times varied
significantly from implementation to implementation. The mean service time ranges are
presented below:
Software: 373 - 421 µs
FIQ: 33.1 - 40.7 µs
Hardware: 8.4 µs.
For comparison, the mean memory service time on a typical workstation is on the order of
hundreds of nanoseconds.
The last architecture reviewed here is closest in concept to the one proposed in this thesis. The
summary of the mean memory service times shows that the separation of the main memory and
the second-level cache substantially degrades the performance of the CPU. The service times are
orders of magnitude larger than those found on a typical workstation utilizing a bus-based
memory system. The software implementation of memory handling is not acceptable in a
workstation because of its mean service time reaching hundreds of microseconds. Based on the
DAN experiments, the hardware implementation provides the performance closest to the one
found in a typical workstation.
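The size of this gap can be made concrete with a quick back-of-the-envelope comparison. The sketch below uses the service-time ranges quoted above; the 0.3 µs baseline is an assumption standing in for the "hundreds of nanoseconds" figure of a typical bus-based workstation, not a measured value.

```python
# Rough slowdown of each DAN memory-server variant relative to a typical
# bus-based workstation. DAN figures are from the text; TYPICAL_US is an
# assumed ~300 ns midpoint for "hundreds of nanoseconds".

TYPICAL_US = 0.3

dan_mean_service_us = {
    "software": (373 + 421) / 2,    # midpoint of the measured range
    "fiq":      (33.1 + 40.7) / 2,
    "hardware": 8.4,
}

for name, t in dan_mean_service_us.items():
    print(f"{name:8s}: {t:6.1f} us, ~{t / TYPICAL_US:.0f}x a bus-based service time")
```

Even the hardware implementation remains more than an order of magnitude above the bus-based baseline, which is consistent with the conclusion drawn in the text.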
Separating the control and data paths is necessary for low latency access between the main memory
and the second-level cache in this particular configuration. However, it adds to the cost of the final
system since it requires additional wiring, extra interfaces on each device on the DAN, as well as
the development of the control interconnect protocols.
The separation of some multimedia components (e.g. the compression/decompression node) may lead to
wasted bandwidth on the DAN when the data streams are passed from device to device for
various processing tasks. Also, since such components have to be shared, this introduces contention
and puts more demands on the performance of such a shared component.
2.5 MB86680B, ATM Switch Element From Fujitsu
At the heart of the architecture presented in this thesis is an ATM switch. An ATM switch is a
device used to deliver ATM cells arriving at its input ports to the appropriate output ports.
Commercially available ATM switches [13] designed by AT&T, Fujitsu, IGT, TranSwitch, and
TriQuint were considered for the role of the performance simulation model of an ATM switch.
After careful consideration, the MB86680B [26] from Fujitsu was selected. The MB86680B is
the only device which processes ATM cells as integral data units, detects the beginning and end
of cells, and switches entire cells rather than generic bit streams. Fujitsu's integrated circuit is a
self-routing design, allowing it to extract the cell destination address from the cell itself; the
switch does not have to be externally controlled during normal operation. The MB86680Bs
have built-in features for cascading them into various switching fabrics without any additional
hardware; this allows an increase in the number of available input and output ports.
The MB86680B from Fujitsu is implemented on a single piece of silicon. A simplified functional
diagram of the switch is sketched in Figure 2.8. The switch provides four input and four output
ports per chip. In addition to the standard input and output ports, it offers four expansion and
regeneration ports to allow easy implementation of matrix switching fabrics with more than four
input and output ports. Each port on the chip is eight bits wide and can be clocked at 25 MHz,
providing a total bandwidth of 200 Mbit/s per port. Each output port has a buffer with room for
75 cells, which can be further subdivided into high and low priority queues. The high priority
queue holds 25 cells and the low priority queue 50 cells. Additionally, the switch also supports
multicasting and provides information about discarded cells and queue overflow events. The
[Figure 2.8 Functional diagram of MB86680B: input registers feeding the switch core, with the input, output, expansion, and regeneration ports.]
four port interfaces (input, output, expansion, and regeneration) can be clocked independently for
greater flexibility with the attached devices. The pin-out of the MB86680B is presented in
Figure 2.9.
In order to implement routing, each incoming cell has to be preceded by a 24-bit routing tag. The
tag contains the address of the destination port on the switch. If the switching fabric forms a
matrix or another configuration with a number of switching elements, the tag specifies the route
through the fabric. The tag is also used to indicate the priority of the incoming cell and the
multicasting information.
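The idea of a self-routing tag prepended to each cell can be sketched in a few lines. Note that the field layout below (destination port, priority bit, multicast mask, and their bit positions) is invented purely for illustration; the MB86680B's actual tag encoding is defined in its datasheet and is not reproduced here.

```python
# Illustrative 24-bit routing tag builder. The field positions and widths
# are assumptions for demonstration and do NOT match the real MB86680B
# tag format; only the overall scheme (24-bit tag sent before the cell)
# comes from the text.

ATM_CELL_BYTES = 53   # standard ATM cell: 5-byte header + 48-byte payload

def build_tag(dest_port: int, high_priority: bool, mcast_mask: int = 0) -> bytes:
    """Pack destination port, priority, and a multicast port mask into 3 bytes."""
    assert 0 <= dest_port < 4 and 0 <= mcast_mask < 16
    tag = (dest_port << 20) | (int(high_priority) << 19) | (mcast_mask << 15)
    return tag.to_bytes(3, "big")          # 24 bits precede the cell

def frame_cell(tag: bytes, cell: bytes) -> bytes:
    """The routing tag travels immediately ahead of the 53-byte cell."""
    assert len(tag) == 3 and len(cell) == ATM_CELL_BYTES
    return tag + cell

framed = frame_cell(build_tag(2, True), bytes(ATM_CELL_BYTES))
print(len(framed))   # 56 bytes cross the fabric per cell
```

Because the tag is processed in hardware before the cell body arrives, the switch can set up the path with no per-cell CPU involvement, which is the property the text emphasizes.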
The beginning of a new cell is indicated using a separate start of cell (SOC) signal associated with
each input port. The routing tag of a cell arrives before the cell itself and is processed
immediately by the address filter. Based on the information in the tag, the address filter directs
the cell to the appropriate output port and to the appropriate high or low priority queue. Each
output port uses a fast input multiplexer to allow receiving of up to three cells simultaneously:
one from the input, one from the expansion, and one from the multicasting interfaces. While
sending cells from the output ports, the cells in the high priority queue are always sent before any
cells in the low priority queue.
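The output-port behaviour just described (a 25-cell high-priority queue, a 50-cell low-priority queue, strict priority on dequeue, and overflow reported rather than silently absorbed) can be modelled in a few lines. This is a simulation sketch, not the chip's actual logic; only the queue depths and the priority rule come from the text.

```python
# Hedged model of one MB86680B output port: two bounded FIFOs with strict
# priority dequeue and an overflow counter (the chip reports overflow
# events to the system; here they are simply counted).
from collections import deque

class OutputPort:
    HIGH_CAP, LOW_CAP = 25, 50      # queue depths quoted in the text

    def __init__(self):
        self.high = deque()
        self.low = deque()
        self.discarded = 0

    def enqueue(self, cell, high_priority: bool):
        q, cap = (self.high, self.HIGH_CAP) if high_priority else (self.low, self.LOW_CAP)
        if len(q) >= cap:
            self.discarded += 1     # queue overflow: cell dropped, event counted
        else:
            q.append(cell)

    def dequeue(self):
        """High-priority cells always leave before any low-priority cell."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None

port = OutputPort()
port.enqueue("low-1", high_priority=False)
port.enqueue("high-1", high_priority=True)
print(port.dequeue())   # high-1 precedes low-1
```

The strict-priority rule is also what produces the starvation risk discussed later in Section 3.4.3: a long burst of high-priority cells can overflow the low-priority queue.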
[Figure 2.9 Block diagram of Fujitsu ATM switch: the MB86680B ATM switching element with its input and output ports and clocks, a statistics serial daisy-chain highway (STATUS IN/STATUS OUT), and a switch and switch matrix initialization daisy chain (INITIALIZE IN/INITIALIZE OUT).]
Chapter 3
Architecture
3.1 Introduction
This chapter describes the details of the new multimedia architecture proposed in the thesis. The
following section begins with a discussion of the rationale behind the development of the
architecture. The shortcomings of current mainstream multimedia solutions are also mentioned.
Section three outlines the specific features needed in the new design and the major objectives it is
intended to achieve.
Section four concentrates on the general structure of the proposed architecture. It shows how this
particular design fulfils the objectives. It describes a minimum functional system configuration
and the topology of the data path. This section also discusses the implementation of control
mechanisms and the handling of high load situations which may occur during the workstation
operation. Typical data and control information flows as well as a workstation bootstrap
procedure are presented.
Section five of this chapter focuses on the individual components of the architecture. Major node
types which form the workstation are reviewed. The general structure, implementation details, and
some interface and performance issues for each of the nodes are discussed. The last section
compares the characteristics of the presented multimedia architecture with the features of the
previously mentioned multimedia workstation developments.
3.2 Architecture Rationale
There are three major reasons for developing this new architecture. The first follows from the
observation that people, both direct and indirect computer users, are very interested in
multimedia representation of information. The use of various media and audio-visualization
techniques makes the presentation of raw textual data much easier to digest and analyze
[7,17,19,28]. The presentation becomes attractive and interesting to the users. The second reason
is people's need to freely interact with information. Thus, the ability to alter the form of a
presentation of data on-the-fly can be advantageous in many situations. This is especially true
when the reaction time of the system is critical; an interactive heart visualization during an
operation provides a good example [31, 33]. An increased demand for human communication
across large distances is the third incentive for the creation of the new multimedia architecture.
The need for simultaneous teleconferencing and remote workgroup environments is suppressed
only by the high cost of the necessary equipment. Such environments allow browsing and editing
of documents and the visual exchange of ideas in groups connected by a computer network but
separated by large distances.
The three applications mentioned - multimedia representation of data, interactive access to
multimedia data, and multimedia communication - all require transporting, processing, and
displaying large amounts of data. In order to make them feasible, one must provide a
communication infrastructure and workstations which are able to efficiently handle high
bandwidth information streams. The communication infrastructure in the form of high speed
LANs, MANs, and WANs is quickly becoming a reality, with the increased corporate and
personal interest in advertising and information exchange over the Internet. Multimedia
workstations also have the necessary components for dealing with high bandwidth information,
including powerful and inexpensive processors, large and inexpensive storage, affordable
audio-video equipment, and ample amounts of fast RAM.
However, today's workstations lack the means for the efficient utilization of those powerful
components in handling large amounts of multimedia data. A typical workstation employs a
system bus for connecting all its devices and peripherals. The performance of current bus-based
systems, well suited for delivery and processing of non-multimedia data, suffers when faced with
the huge loads brought by multimedia. The system bus, due to its fixed bandwidth, becomes an
immediate bottleneck; the bandwidth available for each device and the speed of data
transmission become lower with every new device attached to the bus.
There are various attempts to remedy the bus bottleneck problem. Some of them employ a new
type of bus with improved throughput to satisfy the needs of the current generation of applications.
However, the total bandwidth of the bus remains fixed. When the next generation applications
arrive, their required bandwidths will most likely exceed the capacity of the new bus, making it
obsolete again. Others propose systems with a secondary bus for transporting and displaying
multimedia data [29]. However, if multimedia data is to be processed, it must be transferred to
the primary system bus for submission to the CPU. In such a case, the interface between the two
buses or again the primary system bus becomes a bottleneck. Solutions with secondary buses are
not only costly, but also make the processing of the multimedia information difficult. They are
suitable for first generation multimedia applications which do not require multimedia data
processing.
In order to provide good handling of high bandwidth information coming from the external
networks, workstations must provide a fast network interface. Typical bus-based systems employ
a single, centralized network interface which becomes another bottleneck, this time for the high
bandwidth data being delivered to the workstation from outside of its boundaries. Most solutions
to that problem incorporate special network access accelerators, but that shifts the bottleneck
from the workstation boundary to the system bus.
3.3 Architecture Objectives
The purpose of this thesis is to propose a multimedia workstation architecture that would allow
efficient processing and handling of both multimedia and non-multimedia data. In addition, the
architecture should allow a cost effective implementation and flexibility in connecting various
workstation peripherals. The cost aspect is very important, since it will help bring about quick
market acceptance. A successful architecture should also be flexible, allowing both current and
future peripherals to be attached without performance degradation. Both cost and flexibility
influence the lifetime of the new design.
As mentioned in the previous section, currently used workstations are not well suited to
efficiently handle both multimedia and non-multimedia information. Therefore, to compete in the
market, the new architecture must not only be efficient and flexible when dealing with
multimedia data, it must also match or even exceed the performance of current workstations in
general purpose data processing. It should eliminate the limitations and bottlenecks present in
bus-based systems.
The proposed architecture should replace the commonly used system bus with a new type of
interconnect, able to eliminate system bus limitations. The interconnect must tie the external
network and all devices and peripherals together, to form an integrated workstation system. To
achieve high bandwidth for all data transfers, the interconnect must employ two important
features. First, it must allow maximum parallelism in data transfers, to permit simultaneous data
exchange among several peripherals. Second, it must offer high interconnect clocking speeds.
Further, to add flexibility and avoid blocking of nodes, the system should permit easy
multicasting of data, so that any information transported over the internal network can be
replicated and sent to multiple destinations. This can facilitate second generation multimedia
applications which can both display and submit information for processing at the same time. For
example, a stream of video frames coming from the external network interface can be multicast
to the video node to be displayed on the monitor and to the CPU node where additional
information from the frames can be extracted. Both operations can be performed simultaneously.
On the other hand, the information coming from different sources and arriving at a single
destination node at the same time should also be handled properly. The internal network and the
interface of the receiving node must prevent both blocking of the node and any loss of
information.
The new workstation should provide efficient access to the external network, far exceeding the
capabilities of the single, centralized network interface of today's workstations. The topology of
the design should be as simple as possible to maintain low cost of the workstation and to
eliminate any performance bottlenecks on the data path between communicating devices and
peripherals. The design must also be easily scaleable to facilitate upgrades to the workstation and
any new devices that will be connected in the future, without sacrificing its performance and
flexibility. In addition, the control structure of the architecture must allow full control of the
system and peripherals under any load conditions. The control structure should also enable
scaling and downgrading of the traffic on the interconnect to avoid system lockups when the load
becomes unacceptably high. This should be done by reconfiguring the attached devices or their
interfaces by the operating system.
3.4 General Structure
To achieve the objectives outlined in the previous paragraphs, this thesis proposes a new
multimedia workstation architecture. The general structure of the architecture is presented in
Figure 3.1. A new system board is proposed, which in an innovative way connects all peripherals
typically incorporated into the system board: the CPU, the main memory, and the graphics
interface. These peripherals communicate with each other through a fast, direct access path,
allowing efficient general purpose data processing and display. To facilitate the information
exchange between system board components and external peripherals, an inexpensive,
single-chip ATM switch is built into the system board. It implements an internal ATM LAN,
providing high-speed, point-to-point connections to all workstation peripherals. The proposed
system board also allows merging of live video and computer graphics. To provide highly
efficient network access for all peripherals, this thesis proposes to eliminate a separate network
interface and integrate the internal ATM LAN with the external ATM network by means of the
ATM switch on the system board.
3.4.1 Internal ATM LAN
The proposed architecture uses an internal ATM LAN to connect all workstation peripherals and
devices, as presented in Figure 3.1. Each peripheral or device forms a separate node of this
internal network and connects to it directly through an optimized ATM interface. A single ATM
switch with a full internal crossbar structure serves as the central element of the interconnect and
binds all components together, forming an integral workstation.
Asynchronous Transfer Mode is selected as the protocol for the internal network, because it is
suitable for supporting both continuous (time dependent) and discrete (time independent) media
[24, 30]. Therefore, an ATM interconnect should perform well while transporting both general
[Figure 3.1 General structure of the architecture: the system board with the CPU, main memory, and graphics subsystem, the internal ATM LAN connecting storage and other peripherals, and a colour display.]
purpose and multimedia data. The high speed of the ATM LAN is achieved by hardware
processing of the routing information. As mentioned before, an ATM cell has a fixed size, which
makes it suitable for hardware processing.
The delays on the ATM LAN are also fairly small due to the small size of the ATM cells.
However, those delays determine the choice of peripherals that can successfully communicate
through the internal workstation LAN. The average delays through the interconnect should not
approach the access or response times of communicating peripherals. The acceptable access times
of the memory and the graphical interface are in the range of tens of nanoseconds [11]. Therefore, as seen
in Figure 3.1, the CPU-memory and CPU-frame buffer traffic is not routed through the ATM
switch, but rather through a fast direct access path.
3.4.2 Interconnect topology
The topology of the internal ATM LAN is simple and cost effective. The internal LAN
components and all peripheral interfaces are tied to a single ATM switch. They are all enclosed
within the physical workstation boundary. The small overall size of the internal ATM LAN makes
the distances between the peripherals short, which in turn allows high reliability of the internal
LAN. The LAN connections are implemented on the system board inside the workstation chassis,
where environmental factors can be easily controlled. Compared to an external LAN, the internal
LAN is more reliable and more immune to external interference and noise.
The preferred topology for the internal LAN is a star with the ATM switch as its centre and all
attached peripherals as its arms. Small distances and point-to-point connections in the star
topology allow a very high clocking speed of the peripheral channels. In such a configuration, the
transmission delay experienced by data transferred between any pair of communicating
peripherals is constant and equal to two cell transmission times. The delays on the internal ATM
LAN are expressed in cell transmission times since this makes the measurement independent of the
clock speed of the internal LAN. The cell transmission time is equal to the length of the ATM
cell divided by the clock speed of the interconnect and expresses the amount of time needed to
send a whole cell through a single hop of an ATM link.
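The definition of the cell transmission time can be worked through numerically. The sketch below assumes a standard 53-byte ATM cell and a byte-wide port clocked at 25 MHz (the MB86680B figures from Section 2.5); the function name is illustrative.

```python
# Cell transmission time: cell length divided by link speed.
# Assumes a standard 53-byte ATM cell; the port parameters below are the
# MB86680B values quoted in the text (8 bits wide at 25 MHz -> 200 Mbit/s).

CELL_BYTES = 53   # 5-byte header + 48-byte payload

def cell_transmission_time_us(port_width_bits: int, clock_mhz: float) -> float:
    """Time to push one whole cell through a single hop, in microseconds."""
    link_mbit_per_s = port_width_bits * clock_mhz   # e.g. 8 * 25 = 200 Mbit/s
    return (CELL_BYTES * 8) / link_mbit_per_s       # 424 bits / 200 Mbit/s

print(f"{cell_transmission_time_us(8, 25.0):.2f} us per cell per hop")   # 2.12 us
```

With the star topology above, the constant two-cell-time delay between any pair of peripherals therefore works out to roughly 4.2 µs at these clock rates.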
3.4.3 Interconnect flexibility
The presented interconnect is also easily scaleable. Adding peripherals to the system does not
degrade the overall performance. Attaching new peripherals to one or more ports of the central
ATM switch is all that is required. Since the switch has a crossbar structure, the bandwidth
available for each communication channel of the interconnect is unchanged. The utilized
bandwidth actually increases, because the bandwidth available on the port used to connect the
new device adds to the total system bandwidth. The only assumption made in this scenario is that
the switch has enough ports to connect all desired peripherals. Ideally, the interconnect crossbar
fabric consists of a single switch for cost and performance reasons. Since the number of
peripherals used by a multimedia workstation is usually small [29], a single switch can indeed
provide a sufficient number of ports.
Since the switch used as the fabric of the interconnect has a full crossbar structure, the system
enables maximum parallelism of data transfers. Any two devices which establish a connection
through the switch can communicate without interruption until the connection is terminated. The
system can support as many simultaneous, parallel connections as the number of ports on the
switch allows. The intrinsic properties of the Fujitsu MB86680B¹ ATM switch, used as an
instance of a switch in this thesis, permit easy multicasting of cells and handling of multiple data
streams destined for the same destination device. Those operations can be performed without any
involvement of the CPU. A cell destined for multiple devices contains the appropriate
multicasting information in its header, which specifies all required destination ports. In an
extreme case, a cell can be broadcast to the entire system when it is multicast to all output ports
of the switch.
The use of an ATM switch provides graceful handling in situations when multiple information
streams are being sent to the same device simultaneously. Those information streams arrive in
the switch where they are both directed to the same output port. Each output port of the switch
contains a considerable number of cell buffers which are further divided into high and low
priority queues. Therefore, the cells from both streams are combined in the output queues and are
sent to the destination device on a first-come-first-served basis. The only exception is when one
¹ Refer to Chapter 2, page 35.
of the streams is of higher priority than the other. Since it is buffered in the high priority queue,
each cell belonging to this stream has precedence over the low priority one. If a burst of the high
priority cells is fairly long, the low priority queue may overflow. However, the switch has the
ability to inform the operating system about this event. The operating system can react to it by
either asking the stream source device to resend the cells or by reconfiguring the device to
temporarily slow or stop the stream transmission.
3.4.4 Network interface
The new architecture completely changes the shape of a typical external network interface. To
take full advantage of the new architecture, however, the external network must be link-layer
compatible with the internal network. If both external and internal LANs use ATM as a transport
mode, the external network becomes de facto another peripheral of the workstation. The full
bandwidth of the switch port is dedicated to the network connection. The network information
streams can be multicast or merged with other streams transparently. Data format conversions are
minimized since there are only two formats present: a device specific format and an ATM cell
format. For example, an audio signal is recorded, sampled, and stored in an audio node memory,
packed into ATM cells and transported to the destination workstation, where it is extracted from
the cells in another audio node, and sent to the speakers through a digital/analog converter.
The reverse is also true; all internal workstation peripherals become part of the external network;
any authorized device on the network can establish a point-to-point connection with any
workstation peripheral. It can then directly access and control this peripheral without intervention
of the workstation CPU.
In many practical situations a need arises to increase the bandwidth of a computer's network
interface, as in the case of heavily loaded network file and application servers. This is typically
accomplished by adding an additional network interface device with a separate dedicated
network cable. Similar bandwidth scaling is possible in the ATM workstation without the use of
any additional hardware. The total bandwidth of the network interface can be selected as needed
by dedicating the required number of switch ports. Each port connects to the ATM network
through its own link. As a consequence, this setup also requires a dedicated port from the switch
on the external ATM network.
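Because each dedicated port contributes its full link rate, the aggregate network bandwidth scales linearly with the number of ports. A minimal sketch, assuming the MB86680B per-port rate of 200 Mbit/s from Section 2.5:

```python
# Aggregate external-network bandwidth obtained by dedicating several
# switch ports to the network link, as the text proposes. Assumes the
# MB86680B per-port rate of 200 Mbit/s; the function name is illustrative.

PORT_MBIT_S = 200

def network_interface_bandwidth(dedicated_ports: int) -> int:
    """Each dedicated port contributes its full link rate."""
    return dedicated_ports * PORT_MBIT_S

print(network_interface_bandwidth(2))   # 400 (Mbit/s) with two dedicated ports
```

The trade-off, noted in the text, is that every port dedicated to the network is also a port consumed on the external ATM switch and one fewer port available for internal peripherals.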
3.4.5 Basic workstation configuration
The basic ATM workstation configuration includes the following components: a CPU with
appropriate caches, a main memory, a frame buffer for colour display, and disks for permanent
storage. These components are essential, because they provide the basic functionality of the
workstation. The above components must be present even if the workstation is not involved in
the processing of multimedia information. As mentioned before, the maximum acceptable delay
for the CPU-memory traffic and the CPU-frame buffer traffic is much smaller than what the
ATM interconnect can provide. Therefore, both of these data paths by-pass the internal LAN. On
the other hand, the disks can be separated from the CPU and the memory because a typical disk
access time (5-20 ms) is considerably larger than the typical delay through the ATM interconnect
(5-300 µs, equivalent to around 2-150 cell transmission times over the interconnect).
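The parenthesized equivalence can be checked against the cell transmission time defined in Section 3.4.2. This sketch assumes a 53-byte cell on a 200 Mbit/s port, i.e. about 2.12 µs per cell time:

```python
# Converting the 5-300 us interconnect delay range into cell transmission
# times, assuming a 53-byte cell on a 200 Mbit/s link (2.12 us per cell).

CELL_TIME_US = (53 * 8) / 200    # 424 bits / 200 Mbit/s = 2.12 us

for delay_us in (5, 300):
    print(f"{delay_us:3d} us ~= {delay_us / CELL_TIME_US:5.1f} cell times")
# About 2.4 and about 141.5 cell times, consistent with the "around 2-150"
# range quoted in the text.
```

At three or more orders of magnitude below the 5-20 ms disk access time, even the upper end of this range leaves the disk, not the interconnect, as the dominant latency.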
A general topology of the basic system components is presented in Figure 3.2. The structure of
the system connecting the CPU, the memory, the ATM switch, and the frame buffer can be
proprietary to the system board manufacturer. This does not decrease the flexibility of the
architecture since those components traditionally reside on the system board. The designers of
the system board must provide ATM interfaces for each of the components as depicted on Figure
3.2. Such a setup allows easy upgrades in case the CPU or the memory technology changes. An
upgrade involves only an exchange of the system board without altering the rest of the
workstation since the board connects to the rest of the workstation through the standard ATM
interfaces.
Other components of the system, which expand the minimum functionality, include the
following nodes: live video processing, video camera, audio, additional storage and CPU,
network interfaces, and modem pools. All those peripherals include a fast ATM interface
designed and optimized for a particular peripheral. A single specialized peripheral node can also
combine a number of devices with very small bandwidth demands, like mice, keyboards,
digitizing tablets, and joysticks. It can facilitate their communication with other workstation
components without dedicating ATM ports to each of these devices.

Figure 3.2 Connection of basic system components
The proposed workstation architecture does not include any nodes that are shared among a
number of peripherals. To reduce the internal LAN traffic, each device contains full functionality
within its own node. For example, if the live video processing node supports the MPEG
compression standard, the MPEG compression/decompression circuitry should be a part of the
node. This also reduces the number of switch ports required in the workstation. Such a setup is
preferable from the performance point of view. However, in practical situations, the decision
whether to duplicate the functionality in a number of peripheral nodes and save a port on the
internal ATM switch, or employ a separate utility device shared among many peripherals, is
highly dependent on the cost of the dedicated utility devices. Figure 3.3 visually shows the
benefits of the chosen scheme.
3.4.6 Control path structure
Since the access time of connected peripherals is relatively long compared to the interconnect
delays, the control path of the system is implemented on the same internal LAN as the data path;
the control information is distributed along the same internal ATM interconnect as data. The
need for a separate control and management interface is eliminated, which significantly reduces
the overall complexity of the workstation architecture. Since the control and management
information is usually small, it is sent in a single ATM cell with higher priority than the data
cells. The whole payload section of the ATM cell is dedicated to the control information. Giving
higher priority to the control cells allows them to bypass the data cell queue buffers and reach the
destination with a minimum delay. The ATM interface of each connected peripheral decodes the
information in the cell, looks for any control information, and acts on it. Since control data is
located at the beginning of the cell payload section, the peripheral interface can perform the
requested action even before the whole control cell is fully received. This further reduces the
delivery time of the control and management information.
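The priority scheme described above can be sketched as a two-level queue at a switch output port: a control cell, when present, is always transmitted before any queued data cell. The class and member names below are illustrative, not taken from the thesis:

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// A switch output port with two queues: control cells always bypass the
// data-cell queue, so they reach the destination with minimum delay.
struct Cell {
    bool is_control;    // control/management cell vs. ordinary data cell
    std::uint32_t id;
};

class OutputPort {
public:
    void enqueue(const Cell& c) {
        (c.is_control ? control_q_ : data_q_).push_back(c);
    }
    // Next cell to put on the link: control queue first, then data queue.
    std::optional<Cell> dequeue() {
        std::deque<Cell>& q = control_q_.empty() ? data_q_ : control_q_;
        if (q.empty()) return std::nullopt;
        Cell c = q.front();
        q.pop_front();
        return c;
    }
private:
    std::deque<Cell> control_q_;  // reserved for control and management cells
    std::deque<Cell> data_q_;
};
```

A control cell enqueued behind a backlog of data cells is still the next one dequeued, which is the bypass behaviour the text relies on.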
One of the most important aspects of the control architecture involves handling of very high
system loads. In such situations the system may experience cell loss due to a buffer overflow in
the ATM switch. Output buffer overflow can occur when two or more data streams are
competing for the same output port and their combined data rate and burst length exceed the
sending capabilities of the output port. A loss of an ATM cell by the interconnect can only occur
intentionally. In addition, all devices allowed to drop cells intentionally are also required to
inform the source of the dropped cell and the workstation operating system about this event, so
the lost information can be retransmitted.

Figure 3.3 Data flow in a system with a) separate utility nodes and b) utility nodes embedded into the peripheral nodes
When a cell is lost in the output buffer, the switch automatically generates and sends the
appropriate information through its statistics interface. This information is then embedded into a
cell and sent with high priority to the node which runs the operating system, namely the
supervising CPU node. Upon receipt of this cell, the operating system sends a control cell to the
source device whose cell has been lost, and requests a retransmission, slowing, or stopping of the
cell stream. The originating device interface decodes the control cell and configures the attached
peripheral accordingly. Figure 3.4 provides a quick sketch of the path of the control information.
To avoid system lockup, the cell loss information is sent through the high priority queues,
therefore bypassing the overflowing switch buffers. The high priority queues are reserved for the
use of the operating system and other control and management information to minimize the
delays in the transport of the control cells. Hence, the total reaction time to a cell loss event is
determined mainly by the transmission time of the control cell through the ATM interconnect.
The delays introduced when the system is working with the external network, either as a cell
source or destination, are mostly determined by the external network characteristics. The network
delays are independent of the internal workstation architecture. Therefore, this topic is not
discussed further in the thesis. The only aspect of the thesis that is relevant to the network delay
issue is the additional delay added by the internal interconnect to the total network delay.
However, since the network is treated as a generic workstation peripheral, the delays introduced
by the interconnect are identical to those for any other peripheral attached to the workstation.
To improve performance of the system with general purpose data processing, the control and
resource management burden on the CPU, introduced by multimedia devices, is minimized. Since
information transfers inside the workstation are connection oriented, the CPU is only required for
the set-up and termination of those connections. In many cases, if the devices have enough
processing power, they can establish the connections themselves, only informing the CPU (the
operating system node) of this fact. The only remaining responsibility of the CPU is the
necessary scheduling and resource allocation of the interconnect, as well as traffic shaping and
cell loss handling.

Figure 3.4 Control data flow during a cell loss event (the switch statistics interface reports the cell loss; the operating system node replies with a stop, slow down, or retransmit control cell)
3.4.7 Initial setup and typical usage scenarios
Naturally, the workstation requires an operating system to tie all the components together into a
seamlessly operating workstation. To manage the components properly, the operating system
must be aware of all components of the workstation and their characteristics. The operating
system is responsible for scheduling the flow of data to avoid congestion and blocking, and it
should react to such situations promptly to restore normal system operation. The general
mechanism of congestion control was discussed in a previous section. The following paragraphs
describe how the operating system collects information about the system structure and how the
data connections are established.
The information about the system structure and the device characteristics should be obtained
during the operating system startup. First, the CPU initializes the system board components, then
sets up their respective ATM interfaces. In the next phase, it identifies the remaining workstation
components by broadcasting a special status query cell. The reply cells should provide detailed
information about all peripherals attached to the switch ports, their maximum and typical
bandwidth requirements, and the control options they provide. Since this query process can be
repeated during normal workstation operation, new peripherals can be dynamically added and
configured after the bootstrap procedure.
After the workstation is completely initialized and the operating system is in full control of the
workstation components, the regular data processing may be started. A typical connection
between peripherals can be established by a request from the connecting peripheral to the
operating system. A reply to this query will supply the VCI/VPI pair and a port number of the
switch to include in all cell headers. If a two-way connection must be established, both
connecting nodes receive appropriate replies from the operating system. The same procedure
applies to multicast and broadcast connections. Once connected, the involved peripherals
require no further intervention from the operating system, unless congestion or blocking occurs.
The connections with the external network differ slightly since the operating system node has to
establish a connection with the external ATM network first. Only after such a connection is
negotiated, the operating system provides the corresponding workstation peripheral with the
switch port address and a VCI/VPI pair. The procedure described above can be bypassed if the
originating node has enough processing power to perform an ATM connection session with the
external network by itself. In this case the node needs only to inform the operating system of the
connection and the required bandwidth, and to request the switch port address of the external
network node. If, on the other hand, some network entity is to contact an individual device of the
workstation, it has to contact the operating system node first to receive a connection grant. This
is necessary from a security point of view, so that any unauthorized access to the internal
peripherals can be stopped.
Some connections in the workstation have to be permanent to minimize access delays during
connection establishment. The connection between the memory and disk storage nodes is a good
example of such a permanent connection. Since the main memory and the disks are involved in
paging, any additional delays introduced during the connection setup unnecessarily degrade the
overall performance of the workstation memory system. The memory-disk storage connection
should be terminated only during the operating system shutdown or restart.
3.5 The Components And Their Implementations
3.5.1 System board
The system board contains three main components: the CPU with first and second level caches,
the main memory, and the graphics controller circuit. The connections of those components are
depicted on Figure 3.2. The CPU subsystem, with the first level cache and second level cache
controller integrated onto a single silicon chip, communicates with the rest of the workstation
through the input/output system agent. The I/O agent facilitates CPU access to the main memory,
the frame buffer, and directly to the ATM switch of the internal interconnect. The direct access to
the switch facilitates operating system control functions.
The system board allows two types of access to the main memory. The CPU uses the fast
connection via the I/O agent, while all other peripherals use the internal ATM LAN. A page fault
process exploits both of these connections. When the CPU performs a memory operation, the
appropriate request is passed to the memory controller through the I/O agent. The memory
controller processes the request and acts upon it appropriately. If the referenced data is in fact in
the main memory, it performs the requested action. If it is not, it informs the operating system of
a page fault. The CPU then sends a request cell to the storage node to retrieve a page through a
permanent ATM memory-disk connection. After issuing this request the CPU is free to continue
execution of other tasks. The page retrieval through the ATM interconnect is analogous to a
DMA transfer in the bus-based systems.
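The page-fault path above can be illustrated with a small model: a hit is served by the memory controller directly, while a miss records a page request that would travel to the disk node as a request cell over the permanent memory-disk connection. All names are hypothetical; this is a sketch of the described flow, not the thesis's implementation, and it simplifies by treating the requested page as resident immediately:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative model of the two-path memory access: resident pages are
// served directly; a miss emits a page request toward the disk node (in the
// real system, an ATM request cell), and the CPU is free to continue with
// other work while the page is fetched, analogous to a DMA transfer.
class MemoryController {
public:
    explicit MemoryController(std::uint32_t page_size) : page_size_(page_size) {}
    // Returns true on a hit; on a miss, records a page request for the disk
    // node and (as a simplification) marks the page resident right away.
    bool access(std::uint64_t addr) {
        std::uint64_t page = addr / page_size_;
        if (resident_.count(page)) return true;
        page_requests_.push_back(page);   // would be sent as an ATM request cell
        resident_.insert(page);           // page arrives later over the interconnect
        return false;
    }
    const std::vector<std::uint64_t>& page_requests() const { return page_requests_; }
private:
    std::uint32_t page_size_;
    std::unordered_set<std::uint64_t> resident_;
    std::vector<std::uint64_t> page_requests_;
};
```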
The memory controller and the ATM interface of the memory node can both be integrated onto a
single chip. However, this can reduce the flexibility of the system board; when the memory
technology changes and the memory interface must be replaced, the ATM interface will
unnecessarily be replaced with it.
The third main component of the system board, the graphics controller circuit, facilitates the
access to the colour display monitor. There are many graphics controller architectures for the
bus-based systems [11], and most of them can be easily implemented on the proposed system
board with only minor modifications. In the ATM workstation, this circuit is used exclusively for
the processing of the low bandwidth visual information associated with the operation of the
graphical user interface. As opposed to the main memory, this circuit can only be accessed
through the fast data path via the I/O agent. Assigning a separate ATM port to the graphics
controller would be a waste of the switch resources, because of the very low bandwidth needed
for the manipulation of the display frame buffer, which is estimated at around 200 KBps [18].
Because of such small bandwidth requirements of the CPU-frame buffer link, the fast data path
to the CPU can be shared with the main memory without introducing any bottlenecks to the
CPU-memory data traffic. High bandwidth video information, such as video-teleconferencing,
motion pictures, and other live video material, is processed in the live video node, described later
in the chapter. The system board incorporates a simple connector for the second external source
of video information as seen on Figure 3.2. The outputs from the live video connector and the
graphics controller circuit are merged on the system board to be displayed on a single video
screen.
3.5.2 Disk storage node
The disk storage node can be implemented in two different ways. For compatibility with the
existing disk standards, the ATM circuitry of this node can incorporate an appropriate interface
to the disk subsystem. As seen on Figure 3.2, the peripheral side of the storage node interface can
be compatible with SCSI, IDE, or any other disk protocol. However, such a setup is subject to all
limitations of the above mentioned protocols. On the other hand, the integration of the ATM
interface protocol within the disk controller is as practical and cost effective as for other disk
protocols. Figure 3.5 shows the configuration of the ATM workstation with various storage
nodes using the ATM interface protocol. In that case, the minimum data unit sent or received by
the disk controller is an ATM cell.
3.5.3 Live video processing node
The live video node is intended to process the high bandwidth visual information received by the
workstation, such as teleconferencing streams and motion pictures, and transmit the video
recorded locally to the external network. The general structure of this node is presented on Figure
3.6. The cost and complexity of the video node directly relate to the flexibility and processing
power of the video processing circuitry. This circuitry can take the form of a codec module to
support various video compression standards like MPEG-1, MPEG-2, or AVI, or it can contain a
signal processor with supporting devices to perform more complex operations [8]. The live video
node sends the processed signal to the special connector on the system board where it is
displayed on the main screen as seen on Figure 3.2. During video recording, the signal from the
video camera is compressed or processed as required, packed into ATM cells in the ATM
interface, and sent to the internal ATM network through an established connection. The
compression circuitry and the ATM interface may be built into the video camera itself, allowing
it to form a separate node on the internal ATM LAN [3].
Figure 3.5 ATM workstation with ATM circuitry integrated
into mass storage devices
Figure 3.6 Configuration of a live video
processing node
3.6 Comparison With Other Multimedia Architectures
The proposed architecture improves upon the design of the multimedia architectures presented in
the background chapter. This section provides a quick review of how it deals with the drawbacks
and shortcomings described earlier in the appropriate critique sections.

The shared, centralized peripherals in VuNet are eliminated. All functionality required by a
particular node is embedded into that node (compression, decompression, synchronization, etc.).
Similarly, by eliminating those shared components, the chaining of nodes with a single
function to perform complex tasks is no longer needed. The software intensive approach in the
workstation control is replaced by a decentralized hardware approach. Most functions, such as
multicasting, cell assembly and reassembly, and protocol management, are all executed in the
ATM hardware.
Compared to the unspecified topology of the Netstation architecture, the topology of the
proposed architecture is a star with a single ATM switch as the central element of the interconnect.
Therefore, the distances to all devices on the interconnect are constant. There is no possibility of
a node being cut off unless the central switch is damaged, which is equivalent to a malfunction of
the workstation as a whole. The number of ports on the central switch is selected by the
workstation designer and is not fixed at four, as in the case of the proprietary MOSAIC
nodes in the Netstation architecture.
Compared to the DAN architecture, the proposed architecture offers much better performance
of the memory system. The main memory is not separated from the CPU by the relatively slow
ATM interconnect. The control path is implemented on the same interconnect as the data path,
saving in the cost and complexity of the architecture. Unfortunately, the integration of the data
and control paths adds some overhead to the existing internal LAN traffic. As mentioned before,
the compression, decompression, synchronization, and other similar utility peripherals are
embedded into the nodes of devices which require those services, thus reducing the use of
bandwidth of the internal LAN.
There are reasons to believe that the implementation cost of an ATM workstation will be smaller
than the implementation cost of other architectures. While other architectures use proprietary
interconnect protocols and hardware, the proposed architecture uses the standard ATM protocol.
Therefore, off-the-shelf ATM components can be used in the interconnect and all peripheral
interfaces. Many devices traditionally used in computer workstations have been deemed obsolete,
including DMA controllers, network interfaces, and bus control circuits. The use of the same
interconnect for both data and control information further reduces the implementation cost,
because there is no need for separate control protocols and hardware.
Chapter 4
Simulation
4.1 Introduction
The previous chapter presented a detailed description of the proposed multimedia workstation
architecture. In order to validate the performance statements contained in the previous chapter, a
comparison between the bus-based and ATM-based architectures is needed. To accomplish this,
a system bus simulator and an ATM interconnect simulator were created.

This chapter contains the description of the two simulators and the analysis of results from the
simulation runs. The following two sections explain the structure and the features of the
simulators, the assumptions made in their design, and their limitations. These sections also
explain the choice of the performance metrics and the simulation factors.
Section four presents the simulation results and their analysis. The choice of the simulation
scenarios is justified. For each scenario, a number of graphs which represent the most significant
findings is included. A full listing of simulation results from all runs is offered in Appendix A.
The last section discusses the performance figures obtained from simulations and estimates the
impact the simulator assumptions and limitations have on the reliability of the results.
4.2 ATM Interconnect Simulator
4.2.1 Simulator components

Four devices of an ATM multimedia workstation are modelled in this simulator: a
main memory, a disk, a network connection, and a live-video device. A single ATM switch
modelled after the Fujitsu MB86680 (see Chapter 2, page 35) provides an interconnect for all
devices. An object oriented language was selected to implement the simulator since all the above
mentioned devices are easily mapped into separate software objects. The simulator was written in
GNU C++ and run under the UNIX operating system. There are six main objects in the
simulator: a memory node, a disk node, a network node, a video node, an ATM switch, and a
main simulator module. The relation between the simulator objects and the physical devices is
depicted on Figure 4.1. The main simulator module facilitates the connections and the data
exchange among all object nodes.
4.2.2 Simulator events

In order to minimize the time of the simulation runs, the simulator is event-driven. Events are
generated by all devices capable of transmitting and receiving data. The event queue
management and event execution is implemented by the main simulator module. This module is
also responsible for the initialization of all modelled objects and of the simulator as a whole.
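An event-driven simulator of this kind is typically built around a priority queue of timestamped events; executing an event advances the simulated clock and may schedule further events. A minimal sketch of such a kernel, with illustrative names rather than the thesis's actual module structure:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Minimal event-driven simulation kernel: events are popped in timestamp
// order and executed; each handler may schedule further events.
struct Event {
    std::uint64_t time;            // simulated clock cycle
    std::function<void()> action;  // what happens at that time
};
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

class Simulator {
public:
    void schedule(std::uint64_t t, std::function<void()> action) {
        queue_.push(Event{t, std::move(action)});
    }
    void run() {
        while (!queue_.empty()) {
            Event e = queue_.top();
            queue_.pop();
            now_ = e.time;         // advance the clock to the event time
            e.action();
        }
    }
    std::uint64_t now() const { return now_; }
private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> queue_;
    std::uint64_t now_ = 0;
};
```

The clock jumps directly from event to event, which is what makes this organization faster than stepping through every cycle.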
To approximate the real data traffic, the data transfer events must occur at random intervals with
probabilities determined by the characteristics of the data traffic load. A prime modulus
multiplicative linear congruential generator (PMMLCG) [21, 22] was used as a random event
generator. PMMLCGs generate random integers Z1, Z2, ... according to the following recursive
formula:

Zi = (a * Zi-1 + c) mod m

In this formula, a is the multiplier, c is the increment, Z0 is the seed or the starting value, and the
modulus m is the largest prime number less than 2^b, with b being the number of bits in a word
on the computer used for the simulation. The PMMLCG used in this simulator was of the form:

It was thoroughly tested and proved to produce a uniform and uncorrelated series of random
numbers, and is supported by the UNIX operating system [21]. Each device capable of
generating events uses its own random number seed. These seeds are carefully selected to avoid
correlation between the random number streams generated from each seed. To accomplish this,
a separate utility program was created which produced the seeds according to the procedure
described in [21].
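The exact constants of the thesis's generator do not survive in the text above, so the sketch below assumes the widely cited Lewis-Goodman-Miller parameters (a = 16807, m = 2^31 - 1, with c = 0, as in any multiplicative LCG) purely for illustration:

```cpp
#include <cstdint>

// Prime modulus multiplicative LCG: Z_i = (a * Z_{i-1}) mod m, c = 0.
// The constants a = 16807, m = 2^31 - 1 are the classic Lewis-Goodman-
// Miller choice (also used by std::minstd_rand0); they are assumed here,
// not taken from the thesis.
class Pmmlcg {
public:
    explicit Pmmlcg(std::uint64_t seed) : z_(seed % kM) {}
    std::uint32_t next() {
        z_ = (kA * z_) % kM;  // 64-bit product avoids intermediate overflow
        return static_cast<std::uint32_t>(z_);
    }
    // Uniform variate in (0, 1), as needed to draw random event intervals.
    double uniform() { return static_cast<double>(next()) / kM; }
private:
    static constexpr std::uint64_t kA = 16807;
    static constexpr std::uint64_t kM = 2147483647;  // 2^31 - 1, prime
    std::uint64_t z_;
};
```

Giving each traffic source its own carefully chosen seed, as the text describes, amounts to constructing one such object per device from a precomputed seed table.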
Figure 4.1 Relation between ATM simulator objects and modelled physical devices
4.2.3 Simulation metrics
The delay through the interconnect, expressed in clock cycles, was selected as the measure of
performance for the simulated architecture. The delay reflects the time spent by each ATM cell
in the ATM interconnect before being received by the destination device. The delay is calculated
as the number of cycles between the time when the cell is ready to be transmitted by the source
device till the time when the cell is fully received by the destination device, except when
calculating the delay for cells coming from the external network. Since there is no physical
device that acts as the network interface (see Chapter 3, page 44), the delay is measured from the
time a cell is fully received from the network by the internal ATM switch.
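Because only the minimum, maximum, and average delays are ultimately recorded (see the limitations in section 4.2.6), the per-cell bookkeeping for this metric reduces to a small running accumulator; a sketch with illustrative names:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Running min/max/average of per-cell delays, measured in clock cycles.
class DelayStats {
public:
    // delay = cycle at which the cell was fully received minus the cycle
    // at which it became ready to transmit at the source device.
    void record(std::uint64_t ready_cycle, std::uint64_t received_cycle) {
        std::uint64_t d = received_cycle - ready_cycle;
        min_ = std::min(min_, d);
        max_ = std::max(max_, d);
        sum_ += d;
        ++count_;
    }
    std::uint64_t min() const { return min_; }
    std::uint64_t max() const { return max_; }
    double average() const { return count_ ? static_cast<double>(sum_) / count_ : 0.0; }
private:
    std::uint64_t min_ = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max_ = 0;
    std::uint64_t sum_ = 0;
    std::uint64_t count_ = 0;
};
```

For cells arriving from the external network, `ready_cycle` would instead be the cycle at which the cell was fully received by the internal switch, matching the exception described above.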
4.2.4 Simulation parameters
Each device included in the simulator has an inherent data type, specific to the task it is intended
to perform. This subsection describes the parameters that characterize the device data types.
These parameters determine the behaviour of all simulated objects.

Memory node. Throughout the simulation, the memory is involved in the paging process with
the disk node. Therefore, the inherent data type for the memory node is a page. Two parameters, a
page size and a page fault probability, characterize the paging process. Common page sizes of
4K and 8K [18] and two page fault probabilities of 10^-4 and 10^-5 are used for all simulation
runs. These probability values are typical of average and heavy paging activity [18]. The page
fault probability together with the page size determine the data rate of communication between
the memory and the disk.

Disk node. The disk node uses the same data type as the memory node. Therefore, the
simulation parameters for these two are identical.
Video node. This node is involved in video-teleconferencing, during which the video frames
are constantly being exchanged between this node and the external network. Therefore, a video
frame is the most suitable data type for this node. Two parameters, video frame size and data
rate, describe the video-teleconferencing process. The video frame size determines how many
consecutive cells are used to send a single frame. The maximum frame size was adopted from
the experiments with the full motion picture transmission over a local area network reported by
[6]. The data rate, expressed in Mbps, reflects the amount of video data transmitted per unit
time, and is used to specify the quality of the teleconferencing connection. The minimum rate
used in all simulation runs, 9 Mbps, corresponds to a currently used standard TV-quality
teleconferencing. It is characterized by a 30 frames/s refresh rate, an interlaced 720 by 480
pixel resolution, and an 8-bit colour display. Table 2.1 in Chapter 2 lists the characteristics of
other currently used teleconferencing standards. The maximum rate used in the simulations,
200 Mbps, approximates the teleconferencing quality of the Grand Alliance HDTV System
Specification Version 2.0. It defines the picture as 720 by 1280 pixels at 60 frames/s, and
24-bit colour. The above mentioned data rates take into account an average MPEG
compression of the video data.

Network node. Since this node is involved in video-teleconferencing with the video node, the
same set of parameters is used to characterize its behaviour.
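As a rough sanity check of the two endpoint rates, they can be compared with the raw (uncompressed) bit rates implied by the stated formats; the implied compression ratios in the comments are back-of-the-envelope figures derived here, not values from the thesis:

```cpp
// Raw video bit rate in Mbps for a given picture format.
double raw_mbps(int width, int height, int bits_per_pixel, int frames_per_s) {
    return static_cast<double>(width) * height * bits_per_pixel * frames_per_s / 1e6;
}
// TV quality: 720 x 480, 8-bit colour, 30 frames/s -> ~83 Mbps raw, so the
// 9 Mbps figure implies roughly 9:1 average MPEG compression.
// Grand Alliance HDTV: 1280 x 720, 24-bit, 60 frames/s -> ~1327 Mbps raw,
// so 200 Mbps implies roughly 6.6:1 compression.
```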
4.2.5 Assumptions
+ The simulation assumes that the modelled system is already fully initialized. The goal of the
simulation is to observe the behaviour of the system under normal operating conditions. Since
the initialization phase does not have any influence on the performance of the system, it can be
ignored.
+ The clocking speed of the ATM switch is 33 MHz, higher than the 25 MHz specified by
Fujitsu. This clock frequency was selected so that the aggregate bandwidth of the switch is
identical to the total bandwidth of the simulated bus. The modelled bus is 32-bits wide and
runs at 33 MHz, providing a total bandwidth of 1056 Mbps; the switch has four 8-bit wide
ports, and provides an aggregate bandwidth of 1056 Mbps if running at 33 MHz.
+ The bandwidth of the connections among the simulated devices is equal to the maximum data
transmission and reception rate of the ATM switch ports, namely 264 Mbps. The intention of
this simulation is to measure the impact of the device parameters on system performance, not
the impact of the physical connections between the devices. In addition, point-to-point
connections can easily exceed the assumed bandwidth of 264 Mbps, as shown by examples of
the RAMBUS [4, 27] and RamLink [12, 20] memory interface interconnects.
+ All modelled devices are able to send and receive data at a rate equal to the interconnect
bandwidth, which is a realistic assumption since devices with rates in excess of 264 Mbps
are readily available on the market. As an example, the RAMBUS memory interface has a data
throughput of 500 MBps. A single SCSI-2 Ultra Wide hard disk controller can handle data
transfers at 320 Mbps. This assumption eliminates the case when the device refuses to receive
data from the ATM switch due to the inability to process the previously received cells. If that
were the case, the incoming cell would be dropped and retransmitted, resulting in increased
cell delays and data traffic. The increase would be proportional to the time required by the
device to resume the cell reception. To remedy the problem, an input buffer could be added to
the ATM interface of each device. The ATM interface would receive the cell from the switch
and allow the device to retrieve the data later.
+ The external network hardware is capable of generating a routing tag for the internal ATM
switch. If the external network hardware did not have this ability, a separate device would have
to be used instead. The addition of such a device would increase the cell delays associated with
the traffic coming from the external network. Since the delays incurred by the external network
are most likely higher than the delays produced by the routing tag generation device, the
impact of this device on the total cell delay would be very small. However, the addition of this
device would definitely affect the cost of the ATM system. When sending cells to the external
network, the routing tags are automatically stripped by the internal ATM switch, resulting in a
standard ATM cell.

+ The smallest data object manipulated by the simulator is a cell. A cell consists of 56 bytes,
which is composed of a standard ATM cell (53 bytes) and a 3-byte routing tag required by the
self-routing Fujitsu ATM switch.
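Two of the numeric assumptions above can be checked mechanically: the clock was chosen so that the bus and switch aggregate bandwidths match, and the simulator's cell is a standard 53-byte ATM cell plus a 3-byte routing tag. A small sketch with illustrative names:

```cpp
#include <cstdint>

// Aggregate bandwidth in Mbps: bits moved per cycle times clock in MHz.
constexpr double mbps(int width_bits, double clock_mhz, int ports = 1) {
    return width_bits * clock_mhz * ports;
}
// 32-bit bus at 33 MHz: 1056 Mbps. Four 8-bit ports at 33 MHz: also
// 1056 Mbps, so a single port sends and receives at 264 Mbps.

// Cell layout: 3-byte routing tag for the self-routing switch, then the
// standard 53-byte ATM cell (5-byte header + 48-byte payload).
struct SimCell {
    std::uint8_t routing_tag[3];
    std::uint8_t header[5];
    std::uint8_t payload[48];
};
static_assert(sizeof(SimCell) == 56, "routing tag + ATM cell = 56 bytes");
```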
4.2.6 Limitations
+ Only the data transfers are simulated. The control information flow is not implemented.
Therefore, if a cell is dropped in an ATM switch, only the statistics information is updated,
without the cell being retransmitted. Similarly, no data request cells are implemented. The data
transfer events are generated by a random event generator.

+ Each of the modelled devices can only generate cells to one destination in a single simulation
run.

+ The drop rate of the cells in the ATM switch is calculated on a per-output-port basis. It is not
possible to directly determine the number of dropped cells which originated from a particular
input port.

+ Only the minimum, maximum, and average delays of the cells are recorded.
4.3 Bus Simulator
The bus simulator is a modification of the ATM interconnect simulator. Therefore, all
information presented in the previous section applies to the bus simulator as well. The
differences between these two are outlined below.
4.3.1 Simulator components

This simulator models the same four devices as the ATM switch simulator: a main memory, a
disk, a live-video node, and a network interface. The relation between the simulator objects and
physical devices is depicted on Figure 4.2. The main simulator module provides the connection
between the simulated objects in the form of a system bus, and performs bus scheduling,
arbitration, and data transfers. The generic bus specification chosen for the simulation is loosely
based on the PCI standard. It is 32-bits wide and runs at 33 MHz, providing a total bandwidth of
1056 Mbps.
4.3.2 Simulation metrics
As in the case of the ATM simulator, the delay through the bus as experienced by the transferred
data was selected as the measure of performance for the system. For all devices, the delay is
calculated as the number of cycles from the time a data unit is ready to be transmitted in a source
device till the time it is fully received by a destination device.
4.3.3 Assumptions
+ The smallest data object manipulated by the simulator is a 32-bit word. However, the bus
system implements data transfers using a DMA technique. The size of the DMA block is a
simulation parameter. Two common DMA block sizes [36], 256 and 512 bytes, were used in
all simulations.
+ All devices are able to send and receive data at a rate of 264 Mbps. This value is selected to
allow fair comparison between the bus and ATM interconnects, and corresponds to the data
rate of any device in the ATM simulator.

Figure 4.2 Relation between the bus simulator objects and modelled physical devices
+ Each device bus interface is able to communicate with the bus at a rate equal to the throughput of the bus. To accommodate this, each interface has a data buffer. In the case of bus congestion, all data awaiting transmission in the device bus interface, and all new data generated by the device, is stored in this buffer. Similarly, data received from the bus is always stored in this buffer before the device retrieves it. It is assumed that no bus interface buffer will overflow. This assumption has minimal impact on the simulation results, since the data rates used in all simulations never approach the bus bandwidth; they are always in the range of the device data rate.
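The buffering and delay accounting described above can be sketched as a toy model. This is illustrative only; the function name, structure, and parameters are mine, not the thesis simulator's actual code:

```python
from collections import deque

def simulate_bus(arrivals, block_words, bus_words_per_cycle=1):
    """Toy single-bus FIFO model.

    arrivals: cycle numbers at which a DMA block becomes ready in
    some device's bus-interface buffer.
    Returns the delay (in cycles) of each block: time spent waiting
    in the buffer plus the block transfer time, matching the delay
    definition in Section 4.3.2.
    """
    transfer_cycles = block_words // bus_words_per_cycle
    queue = deque(sorted(arrivals))
    delays = []
    bus_free_at = 0
    while queue:
        ready = queue.popleft()
        start = max(ready, bus_free_at)    # wait if the bus is busy
        finish = start + transfer_cycles   # uninterrupted burst transfer
        delays.append(finish - ready)
        bus_free_at = finish
    return delays

# Two 256-byte blocks (64 words) ready at the same cycle:
# the second must wait for the first to finish.
print(simulate_bus([0, 0], block_words=64))  # [64, 128]
```

The model captures the key effect seen later in the results: under contention, waiting time, not transfer time, dominates the bus delay.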
4.3.4 Limitations
+ Each device can perform data transfers to only one destination during a simulation run.
+ Only the DMA burst mode is implemented.
+ The process of bus ownership transfer is not implemented.
4.4 Analysis And Presentation Of Results
All simulation runs are divided into four scenarios. This section provides a brief description of each scenario, followed by an analysis of the generated results.
4.4.1 Scenario 1 Uninterrupted unidirectional video transfer
4.4.1.1 Description
In this scenario a single stream of video frames is delivered from the network to the workstation. There is no other activity on the internal workstation interconnect. This scenario corresponds to the situation when a movie or any other high-bandwidth data stream is being sent to the end user from a network source. The workstation is used solely as a display for the incoming information. The delay on the interconnect is the sum of the time the data spends in the device buffer awaiting transmission and the transmission time through the interconnect. Since there are no queuing delays, this scenario shows the differences in intrinsic transmission delays for the two simulated interconnects.
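As a rough illustration of what "intrinsic transmission delay" means here, the time to move one data unit across each interconnect can be computed directly. The 53-byte cell size and the use of the full device/bus rates are my assumptions; the simulator's exact timing model may differ:

```python
# Illustrative no-queueing transmission times for one data unit.
CELL_BYTES = 53          # one ATM cell (48-byte payload + 5-byte header)
DMA_BYTES = 256          # one DMA block on the bus
DEVICE_RATE = 264e6      # bps, device rate used in the simulations
BUS_RATE = 1056e6        # bps, 32-bit bus at 33 MHz

cell_us = CELL_BYTES * 8 / DEVICE_RATE * 1e6
block_us = DMA_BYTES * 8 / BUS_RATE * 1e6
print(f"ATM cell: {cell_us:.2f} us, 256-byte DMA block: {block_us:.2f} us")
```

Both figures land in the same low-microsecond range, consistent with the two interconnects performing equivalently when no contention is present.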
4.4.1.2 Analysis
The average and the maximum delays experienced by the data are depicted in Figure 4.3. All data transfer rates and video frame sizes tested produce the same results for each of the simulated interconnects. The incoming information flows without any interruption and reaches its destination with a minimum delay. In this scenario, the delay observed is purely a transmission delay. The ATM cell delay includes the time needed to deliver a cell from the switch to the destination device.
4.4.2 Scenario 2 Two-way teleconferencing
4.4.2.1 Description
In this scenario, a live-video node sends a video-teleconferencing data stream to the external network. At the same time, a data stream with the same data rate and the same video frame size arrives from the network to be displayed by the workstation. This scenario shows the impact of interference on the average data delay.
4.4.2.2 Analysis
Figures 4.4a and 4.4b present the average delays observed for various teleconferencing data rates and video frame sizes. The ATM system shows identical and constant delays for all simulation parameters; therefore, a single line is sufficient to represent all results. The bus system shows a different behaviour. For small video frame sizes, the delays are very similar to the ones in Scenario 1 and increase very slowly with increasing data rates. As the data rate of the video stream increases, for larger video frames, a significant increase in the average delay for the bus system is observed. For 200 Mbps and a 20 Kbyte video frame, the average delay for the bus is approximately nine times longer than for the ATM system.

Graph 4.4b shows the average delays for a smaller DMA block size than graph 4.4a. The corresponding curves on graphs 4.4a and 4.4b have very similar shapes, except for the cases of small video frame sizes. Simulation results show that for increasing data rate and video frame size, the delays for both DMA block sizes converge. This is because the DMA block size and the corresponding block transmission time become insignificant compared to the time spent waiting for the bus to become available.

Figure 4.3 Uninterrupted transfer delays as recorded by the video node

Figure 4.4 Average teleconferencing delays as experienced by the video node for DMA sizes of a) 256 bytes and b) 512 bytes
Graphs 4.5a and 4.5b show the maximum delays of the teleconferencing traffic from the same simulation runs as Figures 4.4a and 4.4b. The graphs indicate very long waiting delays in the bus system. For a data rate of 200 Mbps, a 20 Kbyte video frame size, and a DMA size of 256 bytes, the maximum delay is 10,195 cycles for the bus system, compared to 57 cycles for the ATM system.
In the case of the ATM interconnect, the average and maximum delays are constant, as seen in Figures 4.4 and 4.5. This is because the switch provides a point-to-point path for the incoming and outgoing data streams. The streams do not compete for the same shared resource, namely the interconnect bandwidth, as in the case of the bus. One should note that even in the case of the highest traffic load simulated, 200 Mbps, the bus utilization is only 38%. Therefore, one can expect even longer bus delays for higher interconnect utilization.
Graphs 4.4 and 4.5 present only the delays for the video node, since the simulated system is symmetrical and the results for the network node are very similar.
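The 38% utilization figure quoted above can be reproduced directly, assuming the two 200 Mbps teleconferencing streams are the only bus traffic and control overhead is ignored:

```python
# Two 200 Mbps streams share the 1056 Mbps bus (32-bit x 33 MHz).
utilization = 2 * 200 / 1056
print(f"{utilization:.0%}")  # 38%
```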
4.4.3 Scenario 3 Two-way teleconferencing with page faults
4.4.3.1 Description
In this scenario, four data streams are delivered through the workstation interconnect. While the network and video nodes are engaged in video-teleconferencing, the memory and disk nodes are involved in paging. This scenario shows the impact of high-bandwidth multimedia information, in the form of video-teleconferencing, on the performance of the workstation paging system. It also examines the influence of paging activity on the delays of the teleconferencing connection.
4.4.3.2 Analysis
The teleconferencing streams experience delays very similar to the ones from Scenario 2, because the page faults do not constitute a significant bandwidth burden. However, the effect of the teleconferencing streams on the paging delays is worth noting. Figures 4.6a and 4.6b present the average and the maximum delays experienced by the memory node as a function of the teleconferencing data rate with a fixed paging rate. Only the figures for the 256 byte DMA block size are given, because the results for a DMA size of 512 bytes are very similar. The disk node results are also very similar to the memory results, because the simulation setup is symmetrical.

Figure 4.5 Maximum teleconferencing delays as experienced by the video node for DMA sizes of a) 256 bytes and b) 512 bytes
Figure 4.6a shows a steady increase in the average delay experienced by the bus system with the increase of the video data rate and frame size. This is because high-rate teleconferencing uses a significant portion of the shared bus bandwidth, thus increasing the waiting times of the pages before they can be transmitted over the bus. The average delays of the ATM interconnect remain constant, because the connections present in this scenario do not interfere with each other. Each device uses a separate path to its destination device, and each destination device is associated with a separate output port on the switch. For teleconferencing rates below 75 Mbps, the bus system outperforms ATM. This is because the pure transmission delay of the bus is smaller compared to the ATM interconnect. The crossover point falls between the data rates of 75 Mbps and 130 Mbps, depending on the size of the video frame. The maximum delays are significantly larger in the bus system for the whole range of simulated video data rates.
Figures 4.7a and 4.7b show the average and maximum delays analogous to those in Figures 4.6a and 4.6b, but for a heavy paging load. The delay increase with increasing teleconferencing rate observed for the bus system is even more pronounced. In most cases, the average and maximum delays are at least doubled compared to the average paging load. The bus outperforms the ATM system only for teleconferencing rates below 50 Mbps. For the higher paging load, the crossover point falls somewhere between the data rates of 30 Mbps and 50 Mbps. Some delay curves in Figures 4.6b and 4.7b show counterintuitive behaviour: for increasing data rate, a decrease in the maximum delay can be observed. This behaviour stems from the fact that the maximum delay is a random variable determined by the statistical pattern of interference among all data streams in scenario three. Regardless of this behaviour, the most important conclusion drawn from these figures is that the maximum delay in the bus system is more than one order of magnitude larger than in the ATM system.
Figure 4.6 a) Average and b) maximum teleconferencing delays as experienced by the memory node with average paging
Figure 4.7 a) Average and b) maximum teleconferencing delays as experienced by the memory node with high paging rate
4.4.4 Scenario 4 Browsing and video transmission with page faults
4.4.4.1 Description
This scenario consists of three simultaneous operations. The first operation, browsing, involves a high-bandwidth document transfer from the external network directly to the workstation memory. The document is not sent in one long burst, but is subdivided into smaller data units. The size of the data units is treated as one of the simulation parameters. The second operation is the transmission of a video stream recorded at the workstation from the live-video node to the network. In addition to these two processes, page faults occur between the memory and the disk. This scenario is intended to show the consequences of sending two data streams to a single device, the memory. The memory node receives pages from the disk and the browsed document to be displayed. The video transmission is used to create interfering traffic.
4.4.4.2 Analysis
Figures 4.8a and 4.8b show the comparison of the ATM and the bus system with a high, fixed data rate of the interfering traffic and an average page fault probability. Figure 4.9 complements them with a graph depicting the effects of the simulation parameters on the cell drop rate. The graphs indicate that even though the delay through the ATM interconnect is still fairly constant for all data rates, a small percentage of cells is dropped at higher data rates. The drop rate will affect the cell delays, because these cells will have to be retransmitted. If the dropped cells originate from the external network, the increase in their delays can be significant. However, as seen in Figure 4.9, the drop rate never exceeds 0.35%, so the impact of the delay of retransmitted cells on the average cell delay should be negligible. The increase in the drop rates is due to the limited size of the output port buffer in the ATM switch. The buffer starts to overflow when it receives large amounts of data from more than one source. The steady increase in the average delays for the bus system, as observed in Figures 4.8a and 4.8b, is consistent with the results from previous scenarios. Larger data rates translate into longer delays through the interconnect. The ATM system outperforms the bus system in all but one case, when the document is subdivided into small 1 Kbyte data units.
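The overflow mechanism described above can be illustrated with a toy fluid model of one output port (not the thesis simulator; all rates and the buffer size below are arbitrary illustrative values):

```python
def drops_at_output_port(in_rates, out_rate, buffer_cells, cycles):
    """Toy fluid model of one ATM switch output port.

    in_rates: cells per cycle offered by each input stream,
    out_rate: cells per cycle the port can drain,
    buffer_cells: output buffer capacity.
    Returns the fraction of offered cells that are dropped.
    """
    queue = 0.0
    offered = dropped = 0.0
    for _ in range(cycles):
        arriving = sum(in_rates)
        offered += arriving
        queue += arriving
        if queue > buffer_cells:            # buffer overflow -> drop
            dropped += queue - buffer_cells
            queue = buffer_cells
        queue = max(0.0, queue - out_rate)  # drain at the port rate
    return dropped / offered

# Two streams whose combined rate slightly exceeds the port rate
# eventually fill the buffer and start dropping cells.
print(drops_at_output_port([0.6, 0.5], out_rate=1.0,
                           buffer_cells=32, cycles=10_000))
```

With a single stream below the port rate the same function returns zero, matching the lossless behaviour of the other nodes in this scenario.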
Figure 4.8 a) Average and b) maximum delays (browsing, paging, and video transmit) as experienced by the memory node with average paging rate
Figure 4.9 Cell drop rates for average paging load in the ATM system
Figures 4.10a and 4.10b present the average data delays for the high paging load, while Figure 4.11 shows the corresponding cell drop rates. In the case of a heavy paging load, the results are quite favourable for the bus system. As seen in Figure 4.10, the ATM interconnect experiences significantly longer delays compared to the bus system. The cell drop rates are also high, and reach up to 7%. Therefore, the impact of the delay of the retransmitted cells on the average cell delay can be considerable. However, the utilization of the bus is quite low, around 35%. One can expect that for higher utilization values, the bus delays will be comparable to the ATM interconnect delays. In its current form, the simulator is not capable of producing higher bus utilization, because of the small number of modelled devices.

It is worth noting that in the ATM system only the memory node, acting as a destination for two data streams, experiences larger delays and drop rates. The other nodes maintain a very good level of performance and lossless operation, as in the previous scenarios.
4.5 Discussion
4.5.1 Performance
The previous sections present a performance comparison between the proposed ATM interconnect and a generic bus. These interconnects were subjected to different loads in four different scenarios. The analysis of the results obtained from the simulation runs indicates that there are some cost/performance trade-offs that should be considered in the design of the ATM-based workstation.
The first scenario shows the base performance equivalence of a generic bus and the ATM interconnect. Both systems exhibit similar performance under identical load with no queuing delays or interfering traffic present, which serves as a basis for a comparison of the two systems under different load configurations.
The second scenario illustrates the improvement in performance resulting from using an ATM interconnect instead of a bus. A significant performance degradation in the form of longer delays can be observed in the bus system, with the ATM system delays remaining small and constant. This delay gap between the two simulated interconnects is especially visible for large data rates
Figure 4.10 a) Average and b) maximum delays as experienced by the memory node with high paging rate

Figure 4.11 Cell drop rates for high paging load in the ATM system
of the load. This behaviour is expected, because while in the ATM system the two teleconferencing streams are transmitted over separate point-to-point channels, the bus shows signs of their interference.
The impact of high-bandwidth data streams on the performance of the memory system is analyzed in the third scenario. The bus memory system exhibits better performance for low teleconferencing data rates, while the ATM interconnect is superior for high data rates. The delay of the ATM memory system is constant and independent of the load data rate, because the memory-disk connection is separate from the video-network connection. The performance of the bus memory system gets progressively worse with increasing teleconferencing data rate. At a crossover point, the bus delays become longer than for ATM. The crossover point depends on the paging load and the data rate of the teleconferencing connection, which determine the bus utilization. The crossover point is centered around the teleconferencing data rate of 100 Mbps, which corresponds to broadcast-quality NTSC video. For comparison, high-quality image transmission, such as studio-quality NTSC and high-definition video, requires bandwidths far surpassing the 100 Mbps mark [25]. Therefore, to obtain better performance of the memory system with high-quality images, the ATM workstation is preferred to the bus workstation. At the same time, the delays experienced by the teleconferencing streams are similar to the ones from the second scenario, where the ATM interconnect is superior to the bus interconnect for all load data rates.
The last scenario, in which memory is a destination device for two different data streams, exposes a cost/performance trade-off for the ATM interconnect. Here the performance of the bus system is far superior to the ATM system. The ATM switch output buffer overflows when two data streams compete for the same output port, which should be avoided in order to maintain good performance of the ATM interconnect. Dedicating a separate port to each of the competing data streams brings the delays back to the constant, data-rate-independent minimums observed in scenarios two and three; two switch ports should be dedicated to the memory node, one for the network-memory browsing connection, the other for the memory-disk paging.
When designing an ATM interconnect, the ports of the ATM switch must be carefully allocated, taking into account any data streams competing for the same output port and their respective characteristics. The allocation of additional ports increases the overall cost of the workstation, since an ATM switch with a larger number of ports may be needed to accommodate all necessary connections. If some connections can accept the performance penalty indicated by the graphs from scenario four, using only a single port can keep the workstation cost low.
4.5.2 Results reliability
There are three limitations of the simulator that have an impact on the reliability of the results obtained from the simulations. Each of them is discussed below in an attempt to estimate the degree of error they may introduce into the conclusions.
4.5.2.1 Control information
The first of these limitations is the lack of implementation of the control information commands. In the case of the ATM system, one has to deal with two types of control cells: connection setup/termination and read/write request cells. The impact of the connection setup/termination cells on the ATM system performance is negligible. The number of control cells per connection is usually limited to three: setup, connect, and connect acknowledgement cells [2]. Considering the large number of cells transferred during a connection, the delay increase is unnoticeable. The read/write request cells are used during the paging operations. One read/write cell is sent through the interconnect per disk page. This additional traffic is therefore less than 1.4% of the total paging traffic. There are no read/write request cells in the teleconferencing process, only the data and setup/termination cells described above.
The impact of the control information is higher in the case of the bus system. There are typically five words required to set up each DMA transfer [36]. The words contain the DMA destination address, the amount of data transferred, the direction of the transfer, the source data address, and the execution start command. Hence, the traffic increase can be estimated at less than 8% for a 256 byte DMA block and less than 4% for the 512 byte block. Since all control information is sent over the bus, the bus transfers from all sources are affected. The burden of the teleconferencing connection establishment cannot be easily calculated, because it depends on the type of the external network and the features of the physical network interface device used. However, as in the case of ATM, connection setup occurs infrequently and should have negligible impact on the simulation metrics.
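The quoted overhead bounds can be checked with simple arithmetic. The 48-byte cell payload, the 4 Kbyte page size, and the 32-bit bus word are my assumptions; the thesis does not spell out this arithmetic:

```python
# ATM side: one read/write request cell per disk page.
data_cells = -(-4096 // 48)          # ceil(4096 / 48) = 86 data cells
atm_overhead = 1 / (data_cells + 1)  # ~1.2%, under the quoted 1.4% bound
assert atm_overhead < 0.014

# Bus side: five DMA setup words per transferred block.
for block_bytes, quoted_bound in [(256, 0.08), (512, 0.04)]:
    words = block_bytes // 4         # 32-bit words per block
    assert 5 / words < quoted_bound  # 7.8% < 8%, 3.9% < 4%

print(f"ATM paging overhead: {atm_overhead:.2%}")
```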
4.5.2.2 DMA mode selection and bus ownership transfer
The simulator implements the DMA burst mode, but not the cycle-stealing mode. The main reason is that cycle stealing is advantageous only when the DMA transfers occur on the same data path as the CPU-memory traffic. Cycle stealing sends the contents of the DMA block one word at a time, separating each word transfer by several bus cycles, to allow CPU access to the memory. Burst mode sends the whole DMA block in one uninterrupted transfer, blocking CPU access to the bus and the memory for the duration of the transfer. Since the simulated bus connects only the peripheral devices and does not provide the CPU-memory connection, burst mode is preferable to cycle-stealing mode. In addition, cycle stealing brings a significant overhead associated with the bus ownership transfer. For every word transferred, two bus cycles are wasted on obtaining and releasing bus mastership. In burst mode, bus ownership changes occur only at the beginning and end of the DMA block transfer. The additional traffic due to the bus ownership change in burst mode can be estimated at 3.125% for a 256 byte DMA block, and 1.56% for a 512 byte block, with the assumption that a bus ownership transfer requires a single bus cycle to complete.
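The burst-mode overhead figures follow from the stated assumption of one cycle each to acquire and release the bus, i.e. two extra cycles per DMA block:

```python
# Bus-ownership overhead per DMA block in burst mode.
for block_bytes in (256, 512):
    words = block_bytes // 4   # 32-bit words per block
    overhead_pct = 2 / words * 100
    print(f"{block_bytes}-byte block: {overhead_pct}%")
    # 256-byte block: 3.125%, 512-byte block: 1.5625%, as quoted.
```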
4.5.2.3 Cell retransmission
In scenario four, cell drop rates of up to 7% can be observed. However, this drop rate is not reflected in the simulation results, because the simulator does not have the ability to request and retransmit the dropped cells. For each dropped cell, two control cells should be generated: one destined for the CPU, and one for the device from which the dropped cell originated⁴. Based on this information, there will be an increase in the delay for all cells present in the switch queues associated with the CPU and the originating device. Since the CPU node is not implemented, the additional delays have no impact on the obtained results. The additional delays incurred in the originating device's switch queue will be proportional to the drop rate of cells coming from this node. There will also be an increase in the data rate from the originating device, which in turn might further escalate the drop rate. However, the inclusion of the additional delays caused by cell retransmission in the simulation results would not change the conclusions drawn in scenario four. It would only widen the performance gap between the ATM system and the bus system.

⁴ Refer to Chapter 3, page 53.
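The claimed data-rate increase at the originating device can be bounded with a simple geometric-series argument (my illustration, not from the thesis): if a fraction p of cells is dropped and retransmitted cells may themselves be dropped, the expected number of copies sent per cell is 1 + p + p^2 + ... = 1/(1 - p):

```python
# Offered-rate inflation factor for the worst drop rates in scenario 4.
for p in (0.0035, 0.07):
    inflation = 1 / (1 - p)
    print(f"drop rate {p:.2%}: offered rate x {inflation:.4f}")
```

A 0.35% drop rate inflates the rate by well under 1%, supporting the "negligible" claim for average paging, while 7% inflates it by about 7.5%, consistent with the "considerable" impact noted for heavy paging.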
4.5.2.4 Conclusion
The review of the possible effects of the simulator limitations on the generated results indicates that those limitations do not change the inferences drawn. If the existing limitations were removed from the simulator, the resulting increase in traffic would reinforce the performance analysis statements and conclusions.
Chapter 5
Conclusions
5.1 Introduction
The proposed multimedia architecture was intended to fulfill two main goals. The first goal was to remove the constraints typically present in bus-based systems. The second one was to add new functionality which would allow a more efficient execution of the current and future generations of multimedia applications. The ideal system, while adding functionality, would also match or exceed the performance of a bus-based system without being difficult or costly to implement.

After describing the particulars of the ATM-based architecture in Chapter 3 and presenting the performance evaluation in Chapter 4, it can be concluded that the new architecture fulfils the above-mentioned goals.
5.2 Performance
Chapter 4 focuses on comparing the ATM interconnect central to the new architecture with a bus-based system loosely based on the PCI standard. The graphs from different practical usage scenarios indicate that the ATM-based solution exhibits superior performance in most tested cases. The data delays through the interconnect, selected as the performance metric, are significantly smaller than the corresponding ones for the bus. Only in the case of multiple devices sending data streams to a single device is the delay for the ATM interconnect larger, which can be remedied by allocating more ATM switch ports to these devices. However, even in this case, the performance of all other nodes of the ATM system is still superior to the bus-based system. Another important characteristic of the ATM system is that its performance is independent of the load data rate*, except when two data streams are directed to the same switch output port. The performance of the bus degrades with increasing data rate and data unit size.

* Refer to the discussion in Chapter 4, page 82.
5.3 Features
The ATM system makes new functionality available for the current and future generations of multimedia applications. It implements in hardware, through the built-in features of the ATM switch, the multicasting and broadcasting of information. All devices present in the system can be both transmitters and receivers of multimedia information. This not only allows the system to display the multimedia data, but also to process it in a variety of ways.

By making the internal cell-switching interconnect compatible with the ATM protocol, the need for a separate network interface has been eliminated. The internal interconnect becomes an extension of the external ATM network. The reverse is also true; the external network is treated in the same manner as any workstation component.
The scalability of the new architecture is also superior to that of the bus-based architectures. The addition of new devices to the system requires only an available port on the internal ATM switch. The added components do not degrade the overall performance of the interconnect, as in the case of a bus system. The total bandwidth of the interconnect can be easily increased without system redesign by using an ATM switch with a larger number of ports.
5.4 Future Work
This section proposes possible topics for further study of the properties and performance characteristics of the ATM multimedia system. The currently used simulator has many limitations and provides only a first-level approximation of the processes occurring in a real computer system. Therefore, a more detailed simulation is needed to give more accurate performance figures. It would include a larger number of devices communicating with each other over the ATM interconnect. These devices would be able to send data to multiple destinations
during the same simulation run. Including the control processes in this new simulation would also be essential.
A hardware implementation usually exposes a different set of architectural problems, which could lead to further modifications of the architecture. Therefore, building a hardware prototype of the ATM system can provide more insight than the detailed simulation. The prototype would not only allow further verification of the simulated results presented in Chapter 4, but also the testing of the behaviour of the ATM interconnect under a real workload.
Since the ATM architecture requires a very different approach to handling data and hardware, a new operating system would have to be developed. This operating system needs to be 'ATM-aware', because it would be required to directly interact with all peripherals and the external network using the ATM protocol. It would also be responsible for proper ATM interconnect scheduling, to ensure that all data streams can allocate the required bandwidths. The development of such an operating system would in turn permit the testing of the performance and system characteristics from a user perspective. Multimedia applications could be executed on the hardware prototype to validate the practicality of the ATM multimedia workstation.
References

[1] Adam, Joel F., Henry H. Houh, Michael Ismert, and David L. Tennenhouse [1994]. "Media-Intensive Data Communications in a 'Desk-Area' Network," IEEE Communications Magazine (August). (pp 60-67)

[2] ATM: User-Network Interface Specification, Version 3.0. Englewood Cliffs, NJ: Prentice Hall, 1993.
[3] Barham, P., M. Hayter, D. McAuley, and I. Pratt [1995]. "Devices on the Desk Area Network," IEEE Journal on Selected Areas in Communications 13:4 (May). (pp 722-732)

[4] Bursky, Dave [1992]. "Memory-CPU interface speeds up data transfers," Electronic Design (March 19). (pp 137-142)

[5] Cheung, Nim K. [1992]. "The Infrastructure for Gigabit Computer Networks," IEEE Communications Magazine (April). (pp 60-68)

[6] Chou, Chih-Che, and Kang G. Shin [1994]. "Statistical Real-Time Video Channels over a Multiaccess Network," High-Speed Networking and Multimedia Computing, SPIE Proceedings, volume 2188, San Jose, CA. (pp 86-96)

[7] Comerford, Richard [1996]. "Interactive media: an internet reality," IEEE Spectrum (April). (pp 29-32)

[8] "Digital video compression on personal computers: algorithms and technologies," Proceedings of SPIE, 7-8 February 1994, San Jose, California. Bellingham, Washington: SPIE, 1994.

[9] Famighetti, Robert (editor). The World Almanac and Book of Facts 1995. Mahwah, NJ: Funk & Wagnalls, 1995.
[10] Finn, Gregory G. [1991]. "An Integration of Network Communication with Workstation Architecture," ACM SIGCOMM, Computer Communications Review (October). (pp 18-29)

[11] Foley, J. D., A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice. New York: Addison-Wesley, 1992.

[12] Gjessing, Stein, David B. Gustavson, David V. James, Glen Stone, and Hans Wiggers [1992]. "A RAM link for high speed," IEEE Spectrum (October). (pp 52-53)

[13] Goldberg, Lee [1994]. "ATM switching: a brief introduction," Electronic Design (December 16). (pp 87-103)

[14] Hayter, Mark David [1993]. A Workstation Architecture to Support Multimedia. Ph.D. Thesis, St John's College, University of Cambridge (September).

[15] Hayter, Mark, and Richard Black [1992]. "Fairisle Port Controller Design and Ideas," ATM Document Collection 2 (Orange Book), Cambridge University. (p 23)

[16] Hayter, Mark, and Derek McAuley [1991]. "The Desk Area Network," ACM Operating Systems Review (October). (pp 14-21)

[17] Heath, Michael T., Allen D. Malony, and Diane T. Rover [1995]. "The Visual Display of Parallel Performance Data," IEEE Computer (November). (pp 21-28)

[18] Hennessy, J. L., and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1993.
[19] Hohnem, K. H., B. Pflesser, A. Pornmert, M. Riemer, T. Schiemam, R. Schuber, and U.
Tiede [1996]. "A 'Virtual Body' Mode1 for Surgical Education and Rehearsal," IEEE Computer
(January). @p 25-3 1 )
[20] IEEE Drap Standard for High-Bandwidrh Memory Interface Based on SC1 Signalling
Technologv (RamLink). P1596.4. New York: IEEE, 1995.
[2 11 Jain, Raj, The art of computer svstems performance ana[vsis. New York: Wiley, 1991.
[22] Law, Averill M.. and W. David Kelton. Simulation modeling di analwvsis. New York:
McGraw-Hill, 1992.
[23] Leslie, 1. M., and D. R. McAuley [1991]. "Fairisle: An ATM network for the local area,"
Proc. ACM SIGCQMM (Septernber). (pp 2 1 -35)
[24] Levy, Roger, and Hank Kafka [1992]. "Are YOU ready for ATM," Telephony (Novernber
30). (pp 32-35)
[25] Lyles, J. Bryan, and Daniel C. Swinehart [1992]. "The Emerging Gigabit Environment and
the Role of Local ATM," IEEE Commrtnications Magazine (Apnl). (pp 52-58)
[26] MB86680B ATM Swirch elment (SRE) Dura Shm. Fujitsu Limited. 1994.
[27] Ota, Hiroo [1992]. "Rarnbus vs synchronous DRAM," Ekctronic Engineering (November).
(pp 104- 105)
[28] Perry, Tekla S. [1996]. "The trials and travails of interactive TV," IEEE Spectntm (April).
(PP 22-28)
[29] Prabhat, K. Andleigh, and Kiran Thakrar, Multimedia *rems Design. Upper Saddle River:
Prentice Hall, 1996.
[30] Prycker, Martin de, Asynchronoiis rransfer mode: solution for broadband ISDN, 2nd ed.
New York: E. Horwood, 1993.
[3 11 Robb, R. A., D. P. Hanson, and J. J. Camp [1996]. "Cornputer-Aided Surgery Planning and
Rehearsal at Mayo Clinic," [EEE Cornputer (January). @p 39-47)
[32] Roy, Radhica R. [1994]. "Networking Constraints in Multimedia Conferencing And the
Role of ATM Networks." AT& T Technical Joiirnal (JulflAugust). (pp 97- 108)
[33] Soaliyappan, M., T. Poston. P. A. Heng. E. R. McVeigh, M. A. Guttman, and E. A.
Zerhouni [1996]. "Interactive Visualization for Rapid Noninvasive Cardiac Assessment," IEEE
Cornputer (January). (pp 55-6 1 )
[34] Spragins, John D.. Joseph L. Hammond, and Krzysnof Pawlikowski, Telecommunications:
Protocols and Design. New York: Addison-Wesley, 199 1.
[35] Steinmetz, Ral f, and Klara Nahrs tedt, Muitimediu: computing, commtrnicutions and
applications. Upper Saddle River: Prentice Hal 1, 1 995.
[36] Vranesic, Zvonko G., and Sa fivat G. Zaky, hlicrucompirter structures. New York: Saunders
College Publishing, 1989.
Appendix A
Tabulated Simulation Results
A.1 Uninterrupted Video Transfer

DMA size = 512 bytes and 256 bytes; video frame size = 1K, 5K, 10K, or 20K bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps.

[Table: Average and maximum delay at the video node (clock cycles). The delays are independent of data rate and frame size: ATM average 57, ATM maximum 64.5 and 128 (for the two DMA sizes), bus average 32.5, bus maximum 64.]
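The DMA block and frame sizes used throughout these tables map onto ATM cells in a fixed way. The sketch below (illustrative only, not part of the original simulations) assumes plain 53-byte ATM cells carrying 48 bytes of payload each and ignores adaptation-layer framing overhead:

```python
import math

ATM_CELL_SIZE = 53     # bytes per ATM cell, header included
ATM_CELL_PAYLOAD = 48  # payload bytes per cell

def cells_needed(nbytes: int) -> int:
    """Number of ATM cells required to carry nbytes of data."""
    return math.ceil(nbytes / ATM_CELL_PAYLOAD)

for dma in (256, 512):
    print(f"DMA block of {dma} bytes -> {cells_needed(dma)} cells")

for frame in (1024, 5120, 10240, 20480):  # 1K, 5K, 10K, 20K frames
    n = cells_needed(frame)
    overhead = n * ATM_CELL_SIZE / frame - 1
    print(f"{frame}-byte frame -> {n} cells ({overhead:.1%} cell overhead)")
```

Each 512-byte DMA block thus occupies 11 cells and each 256-byte block 6 cells, so per-block transfer work in the ATM system is constant, consistent with the flat 57-cycle delays tabulated above.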
A.2 Two-Way Uninterrupted Teleconferencing

DMA size = 512 bytes and 256 bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps; video frame sizes of 1K, 5K, 10K, and 20K bytes.

[Table: Average and maximum delay at the video node (clock cycles). ATM system: 57 cycles (average and maximum) for all frame sizes and data rates. Bus system: average delays grow from about 65 to about 520 cycles, and maximum delays from 384 to over 10,000 cycles, as frame size and data rate increase.]
A.3 Two-Way Teleconferencing With Average Paging Load

A.3.1 Video node

DMA size = 512 bytes and 256 bytes; transfer data rates of 9, 50, 100, 150, and 200 Mbps; video frame sizes of 1K, 5K, 10K, and 20K bytes.

[Table: Average and maximum delay at the video node (clock cycles). ATM system: 57 cycles in all cases. Bus system: average delays range from about 33 to about 513 cycles, and maximum delays from about 304 to over 10,000 cycles, increasing with frame size and data rate.]
A.3.2 Memory node

DMA size = 512 bytes and 256 bytes; page size = 4K; page fault probability = 10⁻⁴.

[Table: Average and maximum delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 113 cycles average in all cases. Bus system: average delays range from about 37 to about 329 cycles; maximum delays reach several thousand cycles.]
A.4 Two-Way Teleconferencing With Heavy Paging Load

A.4.1 Video node

DMA size = 256 bytes and 512 bytes; page size = 8K; page fault probability = 10⁻⁵.

[Table: Average and maximum delay at the video node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 57 cycles average in all cases. Bus system: average delays range from about 37 to about 355 cycles; maximum delays reach about 12,600 cycles for 20K frames.]
A.4.2 Memory node

DMA size = 256 bytes and 512 bytes; page size = 8K; page fault probability = 10⁻⁵.

[Table: Average and maximum delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and video frame size (1K, 5K, 10K, 20K bytes). ATM system: 113 cycles average in all cases. Bus system: average delays range from about 65 to about 556 cycles; maximum delays reach over 10,000 cycles.]
A.5 Browsing And Two-Way Teleconferencing With Average Paging Load

A.5.1 Memory node

DMA size = 512 bytes; page size = 4K; page fault probability = 10⁻⁴; video data rate = 200 Mbps.

[Table: Average delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). ATM system: about 102–118 cycles for 1K and 5K data units. Bus system: delays grow with data unit size, from about 150 cycles for 5K units to about 526 cycles for 20K units.]

A.5.2 Cell drop rate for the ATM system

DMA size = 512 bytes; page size = 4K; page fault probability = 10⁻⁴; video data rate = 200 Mbps.

[Table: Cell drop rate (%) for the ATM system vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). Drop rates range from 0% to 0.33%, increasing with data unit size.]
A.6 Browsing And Two-Way Teleconferencing With High Paging Load

A.6.1 Memory node

DMA size = 512 bytes; page size = 8K; page fault probability = 10⁻⁵; video data rate = 200 Mbps.

[Table: Average delay at the memory node (clock cycles) vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). ATM system: about 140–294 cycles for 1K and 5K data units. Bus system: delays reach about 1,024 cycles for 20K data units.]

A.6.2 Cell drop rate for the ATM system

DMA size = 512 bytes; page size = 8K; page fault probability = 10⁻⁵; video data rate = 200 Mbps.

[Table: Cell drop rate (%) for the ATM system vs. transfer data rate (9, 50, 100, 150, and 200 Mbps) and data unit size (1K, 5K, 10K, 20K bytes). Drop rates range from 0.12% to about 7.2%, increasing with data unit size.]