Software Engineering of Distributed Systems University of Colorado Boulder ECEN5053


Software Engineering of Distributed Systems

University of Colorado

Boulder

ECEN5053

August 30, 2002 University of Colorado ECEN5053 Software Engineering of Distributed Systems Week 1 Introduction

Course Logistics

- Introductions
- http://ece.colorado.edu/~swengctf
- http://ece.colorado.edu/~swengctf/distributed
- Format
- Calendar
- Exams -- final exam only
- Homework -- in teams of 2 to 3
- Phone number for late arrival
- Contact information
- Text web site: www.cdk3.net -- see key pts.

Outline for this session

- Definition of distributed systems
- Purposes
- Demands/challenges
- Hardware concepts
- Software concepts
- An example model

Definition of a Distributed System

A distributed system is a collection of independent computers that appears to its users as a single coherent system.
-- Andrew Tanenbaum

A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages.
-- Coulouris et al. (your text)

This definition implies:
- concurrency of components
- lack of a global clock
- independent failures of components
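The second definition can be illustrated with a minimal sketch, assuming two components modeled as Python threads whose only contact is a pair of queues standing in for the network. The names `server`, `run_exchange`, `requests`, and `replies` are hypothetical and do not come from the course text.

```python
import queue
import threading


def server(requests: queue.Queue, replies: queue.Queue) -> None:
    """Component B: coordinates with component A only through messages."""
    while True:
        msg = requests.get()
        if msg is None:  # sentinel message: shut down
            break
        replies.put({"echo": msg, "length": len(msg)})


def run_exchange(messages):
    """Component A: sends each message and waits for the matching reply."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=server, args=(requests, replies))
    worker.start()
    results = []
    for m in messages:
        requests.put(m)                # send
        results.append(replies.get())  # block until the reply arrives
    requests.put(None)
    worker.join()
    return results
```

The two components run concurrently and neither reads the other's variables directly, which is the property the definition singles out.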

Alternative definition of a distributed system

“You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.”
-- Leslie Lamport

If true, implied characteristics?

- Computer heterogeneity & the user
- Communication paths from user’s perspective
- User interaction with system from various locations
- User interaction with applications
- Scalability
- Availability
- Addition or temporary removal of certain components

Examples?

- Internet -- not quite there; some Internet applications more so than others
- With some applications, the user must be very aware of which computer is being accessed -- and what else?

Timeline of what had to happen first

1945: mainframes
~1985: powerful microprocessors, high-speed networks

Necessary Developments

- Take an historical view
- 1945 - 1985
  - Computers are large & expensive
  - Most organizations had only a few
  - lacked a way to connect them
  - operated independently from one another
- By mid-80’s ...
  - powerful microprocessors with power of a then-contemporary mainframe
  - High-speed networks!
- Result: Easy to combine large numbers of computers via a high-speed network.

Purposes -- what problems are solved?

- Easily connect users to remote resources
- Share resources with remote users in a controlled way
- Hide the fact that the resources are physically distributed over a network -- transparency
- Should be an open system
  - Offers services by standard rules that describe the syntax and semantics of those services
- Should be scalable
  - size, geography, and administration

Purpose 1: Access and sharing remotely

- Why share?
  - economics
  - ease of collaboration -- virtual organizations
  - ease of info exchange
  - commerce
- Connectivity and sharing lead to security issues
  - Currently, inadequate protection

Purpose 2: Transparency

Transparency   Description -- Hide:
Access         differences in data representation & how resource is accessed
Location       where a resource is located
Migration      that a resource may move locations
Relocation     that a resource may be moved while in use
Replication    that a resource is replicated
Concurrency    that a resource may be shared by competitors
Failure        failure and recovery of a resource
Persistence    whether a sw resource is in memory or on disk

Degree of Transparency

- Hiding all distribution aspects is not always a good idea
- Some times desirable to remain fixed
- Messages between processes that are thousands of miles apart will take hundreds of milliseconds
- Trade-off between high degree of transparency and performance -- why?
- The degree of desirable transparency should be considered in context with other issues such as performance and cost
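A rough worked calculation helps explain why distance cannot be hidden completely. The sketch below assumes a signal speed of about 200,000 km/s (roughly two thirds of the speed of light, a common figure for optical fiber); the per-hop delay and hop count are hypothetical parameters, not measurements.

```python
MILES_TO_KM = 1.609344
# Assumption: signals in fiber travel at roughly 2/3 of the speed of light.
SIGNAL_SPEED_KM_PER_S = 200_000.0


def one_way_delay_ms(distance_miles: float) -> float:
    """Lower bound on one-way delay from propagation alone."""
    distance_km = distance_miles * MILES_TO_KM
    return distance_km / SIGNAL_SPEED_KM_PER_S * 1000.0


def round_trip_ms(distance_miles: float, per_hop_ms: float = 0.0, hops: int = 0) -> float:
    """Round trip including a hypothetical fixed delay at each switching hop."""
    one_way = one_way_delay_ms(distance_miles) + per_hop_ms * hops
    return 2 * one_way
```

For 5000 miles, propagation alone gives about 40 ms one way. With, say, 10 hops at 5 ms each, the round trip is about 180 ms, which is how real exchanges reach the hundreds of milliseconds mentioned above. No amount of transparency in software removes that floor.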

Purpose 3: Openness

- Offers services according to standard rules describing syntax and semantics of the services.
  - Rules are formalized in protocols
- Services generally specified through interfaces using Interface Definition Language (IDL)
  - specify syntax only
  - natural language used to describe semantics
  - allows arbitrary process that needs an interface to talk to another process that provides it
  - proper interfaces are complete and neutral
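The split between syntax and semantics can be sketched as follows, with a Python abstract base class playing the role of an IDL definition. The interface `NameService` and both implementations are hypothetical. The method signature is the machine-checkable syntax; the semantics live only in the docstring, in natural language.

```python
from abc import ABC, abstractmethod


class NameService(ABC):
    """Hypothetical interface standing in for an IDL definition."""

    @abstractmethod
    def lookup(self, name: str) -> str:
        """Semantics (natural language only): return the address bound to
        `name`; raise KeyError if no binding exists."""


class TableNameService(NameService):
    """One provider: exact-match lookup in a table."""

    def __init__(self, table):
        self._table = dict(table)

    def lookup(self, name: str) -> str:
        return self._table[name]


class CaseInsensitiveNameService(NameService):
    """A second, independently written provider of the same interface."""

    def __init__(self, table):
        self._table = {k.lower(): v for k, v in table.items()}

    def lookup(self, name: str) -> str:
        return self._table[name.lower()]


def client(service: NameService, name: str) -> str:
    """Any process that needs the interface can talk to any provider of it."""
    return service.lookup(name)
```

Nothing in the signature tells the client that the second provider ignores case. That behavior is a semantic detail, which is why a complete interface description needs more than syntax.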

Goals of Openness

- Interoperability and portability
  - completeness and neutrality are prerequisites
- Flexible
  - easy to configure the system out of different components from different developers
  - easy to add new components without impact
  - easy to replace existing ones without impact
  - i.e. extensible
  - easier said than done

Purpose 4: Flexibility -- Policy and Mechanism

- System must be organized as a collection of relatively small and easily replaceable or adaptable components
- Need for change: component does not provide optimal policy for a specific user or app
  - Example: differing caching policies
- Need to be able to separate policy & mechanism
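The caching example can be sketched as a cache whose mechanism (storing entries, bookkeeping, evicting) is fixed while the policy (which entry to evict) is a replaceable function. The class `Cache` and the policies `lru` and `lfu` are hypothetical illustrations, not components of any particular system.

```python
from typing import Callable, Dict, Hashable

# A policy picks the key to evict, given per-key bookkeeping.
Policy = Callable[[Dict[Hashable, dict]], Hashable]


def lru(meta):
    """Policy: evict the least recently used key."""
    return min(meta, key=lambda k: meta[k]["last_used"])


def lfu(meta):
    """Policy: evict the least frequently used key."""
    return min(meta, key=lambda k: meta[k]["hits"])


class Cache:
    """Mechanism: stores entries and bookkeeping; never decides whom to evict."""

    def __init__(self, capacity: int, policy: Policy):
        self.capacity = capacity
        self.policy = policy
        self.data = {}
        self.meta = {}
        self.clock = 0  # logical time, advanced on every access

    def _touch(self, key):
        self.clock += 1
        self.meta[key]["last_used"] = self.clock
        self.meta[key]["hits"] += 1

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = self.policy(self.meta)  # the only policy decision
            del self.data[victim]
            del self.meta[victim]
        if key not in self.meta:
            self.meta[key] = {"last_used": 0, "hits": 0}
        self.data[key] = value
        self._touch(key)
```

Swapping `lru` for `lfu` changes which entry is evicted without touching the `Cache` class, which is the separation the slide asks for.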

Purpose 5: Scalability Challenges -- Size

- Size
  - Limitations of centralized services, data, and algorithms -- become bottleneck
  - Unlimited processing power and storage cannot overcome communication limitations
  - Decentralization introduces some kinds of uncertainty

Purpose 6: Scalability Challenges -- Geography

- Existing distributed systems designed for LANs are based on synchronous communication
- Communication in WANs is inherently unreliable and almost always point-to-point
  - LANs provide reliable comm based on broadcasting -- WAN needs special location services
- Centralized components prevent geographic scale

Purpose 7: Scalability Challenges -- Administration

- How to scale across multiple independent administrative domains
- Conflicting policies
  - usage (payment)
  - management
  - security
    - protect against malice from the new domains
    - protect against malice from the distributed system -- e.g. downloaded programs

Scaling Techniques

- Scalability problems appear as performance ones
- hide communication latencies
  - avoid waiting for responses as much as possible
  - i.e. construct the requestor to use asynchronous comm as much as possible
  - reduce overall communication
- distribution -- spreading component parts across the system, e.g. DNS (see next slide)
- replication across the distributed system
  - increases availability (helps hide latency)
  - helps balance the load between components
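Hiding latency with asynchronous communication can be sketched with Python's `asyncio`. Here `remote_call` is a hypothetical stand-in for a network request, with `asyncio.sleep` simulating the latency; the sketch compares a requestor that waits for each reply in turn against one that issues all requests before waiting.

```python
import asyncio
import time


async def remote_call(name: str, latency_s: float) -> str:
    """Hypothetical remote request; the sleep stands in for network latency."""
    await asyncio.sleep(latency_s)
    return f"reply:{name}"


async def sequential(names, latency_s):
    """Synchronous style: wait for each reply before sending the next request."""
    return [await remote_call(n, latency_s) for n in names]


async def overlapped(names, latency_s):
    """Asynchronous style: issue every request, then collect the replies."""
    return await asyncio.gather(*(remote_call(n, latency_s) for n in names))


def timed(coro):
    """Run a coroutine to completion and report (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With four requests, the sequential version pays the latency four times while the overlapped version pays it roughly once. The replies are the same; only the waiting is restructured.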

Example: Dividing DNS name space into zones

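[Figure: the DNS name space drawn as a tree and divided into zones Z1, Z2, and Z3. The top level shows generic domains (int, com, edu, gov, mil, org, ...) and countries; lower levels include colorado, with cs, ece, ... beneath it.]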
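Dividing the name space into zones can be sketched as a set of `Zone` objects, each holding its own records and delegating subtrees to other zones. This is a simplified illustration rather than the DNS protocol itself; the zone layout, the `resolve` function, and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions.

```python
class Zone:
    """A part of the name space managed by one (hypothetical) name server."""

    def __init__(self, name: str):
        self.name = name
        self.records = {}      # fully qualified name -> address
        self.delegations = {}  # domain suffix -> Zone responsible for it


def resolve(root: Zone, fqdn: str):
    """Walk from the root zone, following delegations; return (address, path)."""
    zone, path = root, [root.name]
    while True:
        if fqdn in zone.records:
            return zone.records[fqdn], path
        matches = [s for s in zone.delegations
                   if fqdn == s or fqdn.endswith("." + s)]
        if not matches:
            raise KeyError(fqdn)
        zone = zone.delegations[max(matches, key=len)]  # most specific suffix
        path.append(zone.name)


def build_example() -> Zone:
    """Three zones: root delegates edu, which delegates colorado.edu."""
    root, edu, colorado = Zone("root"), Zone("edu"), Zone("colorado.edu")
    root.delegations["edu"] = edu
    edu.delegations["colorado.edu"] = colorado
    colorado.records["ece.colorado.edu"] = "192.0.2.10"  # made-up address
    colorado.records["cs.colorado.edu"] = "192.0.2.11"   # made-up address
    return root
```

No single zone holds the whole table, so the lookup work and the administrative responsibility are both spread across servers.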

Outline

- Definition
- Purposes
- Demands/challenges
- Hardware concepts
- Software concepts
- An example model

Hardware Concepts

- Introduction to
  - how distributed systems can be organized
  - how they are interconnected
  - how they communicate

                   Memory
Interconnection    Shared                  Private
bus-based          shared, bus-based       private, bus-based
switch-based       shared, switch-based    private, switch-based

Shared Memory & Private Memory

- Multiprocessors (not multicomputers)
  - Single physical address space shared by all CPUs
  - CPU A writes 37 to address 1000
  - CPU B then reads from address 1000 and gets 37
  - e.g., multiple processors on a board with shared memory
- Multicomputers
  - Every machine has its own private memory
  - CPU A writes 37 to its address 1000
  - CPU B reads from its address 1000 and gets whatever happens to be there; not affected by the other write
  - For example, PCs connected by a network
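The write-37 example can be mimicked in a few lines. This is a toy simulation, assuming a `Memory` class that maps addresses to values and a `CPU` class that reads and writes through whichever memory it was given; both names are hypothetical.

```python
class Memory:
    """A toy address space: address -> value."""

    def __init__(self):
        self.cells = {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        # 0 stands in for "whatever happens to be there".
        return self.cells.get(addr, 0)


class CPU:
    def __init__(self, name: str, memory: Memory):
        self.name = name
        self.memory = memory


def multiprocessor():
    """Two CPUs attached to one shared address space."""
    shared = Memory()
    return CPU("A", shared), CPU("B", shared)


def multicomputer():
    """Two machines, each with its own private memory."""
    return CPU("A", Memory()), CPU("B", Memory())
```

In the multiprocessor case a write by A to address 1000 is visible to B; in the multicomputer case B's address 1000 is a different cell, so B's read is unaffected.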

Bus-based & Switch-based

- Bus architecture of the interconnection network
  - single network, backplane, bus, cable or other medium that connects all the machines
  - For example, cable television
- Switched architecture
  - Individual wires from machine to machine with many different wiring patterns in use
  - Msgs move along wires with an explicit switching decision made at each step to route the message along one of the outgoing wires.
  - e.g., worldwide public telephone system

Divide & conquer -- select and explain

- Performance Impacts
  - bus, shared memory
  - switched, shared memory
  - not quite shared memory
  - homogeneous multicomputers
    - private memory, bus-based network
    - private memory, switch-based network
  - heterogeneous multicomputer systems

Performance Impacts--bus, shared memory

- Bus-based multiprocessor, shared memory
  - Coherent memory
  - Bus contention
  - If cache memory for each CPU has a high hit rate, bus traffic drops dramatically but introduces serious problem -- what is it?
  - Caching and memory coherence is an issue for distributed systems
  - Limited scalability

Performance impacts -- switched, shared memory

1. Divide memory into modules; connect them to CPU’s with a matrix of switches called a crossbar switch
   - Allows multiple CPU’s to access shared memory simultaneously
   - One still has to wait if both want to access same module
2. Network of switches to route any input to any output
   - May be several switching stages in-between
   - Need extremely fast switching to reduce latency = $$

Performance impacts--not quite shared memory

- Reduce cost of switching with hierarchical system
  - SOME memory associated with each CPU (not shared)
  - Access to own local memory is quick
  - Accessing anybody else’s memory is available but slower
- NUMA - “Non Uniform Memory Access”
  - better average access times than switched nw’s
  - what’s the problem?

Performance impacts-- homogeneous multicomputers (SANs)

- System of individual computers. Therefore...
  - Each CPU has direct connection to its own local memory
  - Challenges surround communication between the CPUs
  - Traffic volume will be orders of magnitude lower than when interconnection network is also used for CPU-to-memory traffic

Performance impacts - private memory, bus-based network (SANs)

- Processors connected thru shared multiaccess network such as Fast Ethernet
- Limited scalability -- performance degrades with 25-100 nodes depending on amt of communication

Performance impacts - private memory, switch-based network (SANs)

- Messages are routed through an interconnection network instead of broadcast as in bus-based
- Interconnection networks vary
  - Grid -- suitable to 2-dimensional problems
  - Hypercube -- n-dimensional cube
- MPPs - massively parallel processors (1000’s)
  - high-performance proprietary interconnection network designed for low latency, high bandwidth
- COWs - clusters of workstations
  - Std wkstns connected by off-the-shelf communication components; no special measures for high bandwidth or reliability --> ??
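The difference between the grid and hypercube topologies can be made concrete by counting hops. The sketch assumes the usual labeling conventions, which the slide does not spell out: hypercube nodes are numbered so that neighbors differ in exactly one bit, and grid nodes are (row, column) pairs without wraparound links. The function names are hypothetical.

```python
def hypercube_neighbors(node: int, dims: int):
    """Neighbors of a node in a dims-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(dims)]


def hypercube_hops(a: int, b: int) -> int:
    """Shortest path length: the number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


def grid_hops(a, b) -> int:
    """Shortest path length between (row, col) nodes on a grid, no wraparound."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```

With 16 nodes, the farthest pair on a 4 x 4 grid is 6 hops apart, while in a 4-dimensional hypercube no pair is more than 4 hops apart. The price is more links per node (4 in the hypercube, at most 4 and often fewer in the grid), and the gap widens as the node count grows.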

Performance impacts - heterogeneous multicomputer systems

- Most distributed systems are these
- Computers are heterogeneous w.r.t. processor type, memory size, I/O bandwidth, etc.
- Interconnection networks can be heterogeneous, too
- Many large-scale heterogeneous multicomputers lack a global system view
  - cannot assume same performance or services are available everywhere
- THEREFORE sophisticated software is needed
  - shield application developers from what is going on at hardware level (transparency)

Software Concepts

- Distributed systems software acts as resource manager(s) for the underlying hardware
- Hide intricacies and heterogeneity of underlying hardware
- The issues that this software faces are the core of distributed systems principles we will study this semester

When is a distributed system not a distributed system?

- Distributed operating system:
  - Not intended to handle a collection of independent computers
- Network operating system:
  - Does not provide a view of a single coherent system
- “true” distributed system
  - Goal: scalability and openness of network o.s. and transparency and ease of use of distributed o.s.
  - Additional layer called middleware

Various middleware models (paradigms)

- A particular paradigm is a set of decisions about how to describe distribution and communication
  - Distributed file systems
  - Remote procedure calls
  - Distributed objects
  - Distributed documents
- See table

Sample Paradigms

Paradigm                   Distribution / Communication
Distributed file system    Dist. xparency supp’d for traditional files
Remote proc calls          Network xparency; allows process to call procedure on remote machine
Distributed objects        meth. invocation: interface implementation on process’ mach. translates invoc into msg sent to remote object; reply msg --> return value
Distributed documents      Info org’d as docs; each doc somewhere in the world
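The distributed-objects entry describes a local interface implementation that turns a method invocation into a message and the reply message into a return value. A minimal sketch of that idea follows, assuming JSON as the message format and a plain function call in place of a real network; `RemoteCounter`, `skeleton`, and `Proxy` are hypothetical names, not any middleware's API.

```python
import json


class RemoteCounter:
    """The object that lives on the (notionally) remote machine."""

    def __init__(self):
        self.value = 0

    def add(self, amount: int) -> int:
        self.value += amount
        return self.value


def skeleton(obj, request_bytes: bytes) -> bytes:
    """Server side: unpack the request message, invoke, pack the reply."""
    request = json.loads(request_bytes)
    result = getattr(obj, request["method"])(*request["args"])
    return json.dumps({"result": result}).encode()


class Proxy:
    """Client side: looks like the object, but only sends messages."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, method):
        def invoke(*args):
            request = json.dumps({"method": method, "args": list(args)}).encode()
            reply = self._transport(request)    # message sent to remote object
            return json.loads(reply)["result"]  # reply message -> return value
        return invoke
```

A caller writes `proxy.add(5)` exactly as it would for a local object. Replacing the transport with a socket would change where the object lives without changing the calling code, which is the transparency the paradigm aims for.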

Each paradigm must address these issues:

- Communication
- Processes & their synchronization
- Processes & their interaction
- Naming
- Consistency and replication
- Fault tolerance
- Security

Software Engineering of Distributed Systems

- Requirements specification of these issues in distributed systems -- how to recognize, analyze, specify, trace, and manage
- Design -- how to choose, represent, and verify
- Implementation -- tools, language support
- Testing -- static and dynamic