Introduction & Background Lakshmish Ramaswamy. Why Distributed Systems? A collection of independent computers that appears to its users as a single coherent

Introduction & Background

Lakshmish Ramaswamy

Why Distributed Systems?

• A collection of independent computers that appears to its users as a single coherent system

• Reasons for distribution
– Distributed (and mobile) users
– Distributed data/information
– Distributed organizations
– Distributed resources

• Enabling technology
– Communications and networking

Distributed System Organization

A distributed system organized as middleware. Note that the middleware layer extends over multiple machines.

1.1

Design Goals

• Enable controlled resource sharing

• Transparency

• Openness

• Scalability

• Performance

• Failure resilience

• Security & privacy

Examples of Distributed Systems

• World Wide Web
– Information dissemination
– E-commerce

• Distributed file systems

• Distributed databases

• Web-farms

• P2P file sharing systems

• Ad-hoc networks

• Sensor networks

Middleware

• Layer on top of network OS services

• Hides heterogeneity

• Doesn’t manage individual nodes

• Provides a complete set of services

Client Server Model

• Earliest model
– Simple
– Still applicable in many scenarios

• Server
– Implements a specific service

• Client
– Requests the service

• Models of communication
– Connectionless
– Connection-oriented

Clients and Servers

General interaction between a client and a server.

1.25

Multitiered Architectures (2)

An example of a server acting as a client.

1-30

Modern Architectures

An example of horizontal distribution of a Web service.

1-31

• Vertical Distribution: Different components on different machines

• Horizontal Distribution: Each part operates on its own share of the complete data set

• Hybrid: Incorporates features of both vertical and horizontal distribution

Peer-to-Peer Architectures

• No distinction between client and server
– Nodes can act both as client and server

• Promotes interaction within social groups

• Provides better scalability

• File sharing has been the dominant application
– Napster, Gnutella, Kazaa

• Other applications are still in nascent stages

• Decentralized protocols

Network Protocols

Layers, interfaces, and protocols in the OSI model.

2-1

Functionalities of Layers

• Physical layer: Standardizes signaling interfaces

• Data link layer: Organizes bits into frames; detects and corrects transmission errors

• Network layer: Routing (Internet Protocol [IP])

• Transport layer: Reliability (retransmission, ordering of packets)

• Session layer: Dialog control and synchronization

• Presentation layer: Formats of messages and records

• Application layer: Specific to applications (HTTP, FTP)

Types of Communication

• Persistence
– Persistent communication: the message is stored until it is delivered to the receiver
– Transient communication: the message is stored only while the sending and receiving processes are alive

• Transport-level protocols provide transient communication

• Synchronicity
– Asynchronous: the sender continues immediately after sending the message
– Synchronous: the sender blocks until the message is stored in the receiver's local buffer, delivered to the receiver, or processed by the receiver

Message-Oriented Transient Communication - Berkeley Sockets

Communication pattern using TCP/IP sockets

• Interface for transport layer

• A communications end point
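The TCP/IP socket pattern named above can be sketched with Python's standard `socket` module. This is a minimal illustration, not the code behind the slide's figure: the helper names `serve_once`, `request`, and `demo` are made up for this sketch, and it assumes the loopback interface is available. The server follows socket, bind, listen, accept, receive, send, close; the client follows socket, connect, send, receive, close.

```python
import socket
import threading

def serve_once(server_sock):
    """Server side: accept one connection and reply with the upper-cased request."""
    conn, _addr = server_sock.accept()       # accept: block until a client connects
    with conn:                                # close the connection when done
        data = conn.recv(1024)                # receive the request
        conn.sendall(data.upper())            # send the reply

def request(port, payload):
    """Client side: connect, send the request, wait for the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)

def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server_sock.bind(("127.0.0.1", 0))        # bind; port 0 lets the OS pick a free port
    server_sock.listen(1)                     # listen
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    reply = request(port, b"hello")
    worker.join()
    server_sock.close()
    return reply
```

Calling `demo()` runs one request/reply exchange over a connection-oriented (TCP) socket on the local machine.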

Processes & Threads

• Virtual processors
– Created by the OS to execute a program

• A process is a program in execution
– Executed on one of the virtual processors

• Operating systems ensure that processes are independent and transparent
– Resource sharing is transparent

• Creating processes is costly

• Switching processes is costly too

Threads

• Similar to a process
– Perceived as the execution of (a part of) a program
– Information maintained for sharing the CPU is minimal

• The context of a thread is captured by the CPU context
– Possibly a little more information is needed for management (such as locks)

• Very little overhead
– Thread switching is cheap

• Can provide performance gains
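One common source of the performance gain is overlapping blocking operations. The sketch below is an illustration under assumptions: `time.sleep` stands in for a blocking I/O call, and the names `blocking_task` and `run_threaded` are hypothetical. With one thread per task, the waits overlap instead of adding up.

```python
import threading
import time

def blocking_task(results, index, delay):
    """Pretend to wait on I/O, then store a result in this task's own slot."""
    time.sleep(delay)                 # stand-in for a blocking call (assumption)
    results[index] = index * index

def run_threaded(n, delay):
    """Run n blocking tasks in n threads; return the results and elapsed seconds."""
    results = [None] * n
    threads = [
        threading.Thread(target=blocking_task, args=(results, i, delay))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for every thread to finish
    return results, time.perf_counter() - start
```

Running the same tasks one after another would take about `n * delay` seconds; the threaded version takes roughly `delay` because the threads wait concurrently.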

Names & Naming System

• Required for identifying entities, locating them, and communicating with them

• A name can be resolved to the entity it refers to

• A name is a string of bits used to refer to an entity

• An entity can be a resource, user, data item, or process

• Access Point – Host of another entity
– Name of an access point is its address

• A naming system resolves names

• The naming system in a distributed system can itself be distributed

Name Spaces

A general naming graph with a single root node.

• Names are usually organized as a directed graph

• Leaf node – Represents a named entity

• Directory node – Lists other names
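Name resolution over such a graph amounts to following one labeled edge per path component, starting at the root. The sketch below is illustrative only: `DirectoryNode`, `resolve`, and the example path are invented names, and leaf entities are represented by plain Python values.

```python
class DirectoryNode:
    """A directory node: maps edge labels to child nodes or leaf entities."""
    def __init__(self):
        self.entries = {}

def resolve(root, path):
    """Resolve a path such as '/home/alice/notes' by walking the naming graph."""
    node = root
    for label in (part for part in path.split("/") if part):
        if not isinstance(node, DirectoryNode):
            raise KeyError(f"{label!r}: parent is a leaf, not a directory")
        node = node.entries[label]    # KeyError if the label is unknown
    return node

# Build a tiny naming graph: root -> home -> alice -> notes (leaf entity).
root = DirectoryNode()
home = DirectoryNode()
alice = DirectoryNode()
root.entries["home"] = home
home.entries["alice"] = alice
alice.entries["notes"] = "entity: alice's notes file"
```

In a distributed naming system, each step of the loop could be a request to a different name server; the walk itself stays the same.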

Name Space Distribution

An example partitioning of the DNS name space, including Internet-accessible files, into three layers.

Importance of Clocks & Synchronization

• Avoiding simultaneous access of resources

• Processes may need to agree upon the ordering of events

• Synchronization and ordering are difficult in a distributed setting

• The notion of time is tricky in a distributed setting
– How to deal with clock drift?

• Logical clocks
– Agreement with regard to the ordering of events suffices

• Happens-before relation
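A well-known way to realize logical clocks that respect the happens-before relation is Lamport's scheme; the slides do not name a specific algorithm, so treat this as one possible illustration. The class name `LamportClock` and its method names are chosen for this sketch. Each process keeps a counter, increments it on every local event and send, and on receipt jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Per-process logical clock following Lamport's rules."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """On receipt, move past both the local time and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

If event a happens before event b, this scheme guarantees timestamp(a) < timestamp(b). The converse does not hold: a smaller timestamp alone does not prove that one event happened before the other.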

Mutual Exclusion

• Ensuring consistency of data sometimes needs exclusive access to data

• Critical regions for mutual exclusion

• When a process wants to read/update shared data structures it first enters a critical region

• Only one process is allowed in the critical region at a time

• Coordinator-based centralized algorithm

• Ricart and Agrawala’s algorithm

• Token ring algorithm
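The coordinator-based centralized algorithm can be sketched as a single object that grants the critical region to one process at a time and queues the rest. This is a simplified, single-machine model: message passing is replaced by method calls, and the names `Coordinator`, `request`, and `release` are assumptions of the sketch.

```python
from collections import deque

class Coordinator:
    """Central coordinator: at most one process holds the critical region."""
    def __init__(self):
        self.holder = None
        self.waiting = deque()        # requests queued in arrival order

    def request(self, pid):
        """Return True if permission is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """The holder leaves; grant the region to the next waiter, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the critical region")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder            # process granted next, or None
```

The scheme is simple and fair (first come, first served), but the coordinator is a single point of failure and a potential bottleneck, which is what motivates the distributed and token-based alternatives listed above.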

Transactions

• Protect data and allow processes to access and modify multiple data items as a single atomic operation
– If a process backs out halfway, everything is restored to its original state

• Originated in the business world
– Parties are free to negotiate and back out during negotiation
– No backing out after the contract is signed

• The initiator process announces the beginning of a transaction

• Processes create, update, and delete entries

• The initiator announces that it wants the others to “commit”
– The transaction is made permanent if everyone agrees
– Otherwise the transaction is aborted and all entries are restored

Transaction Primitives

Examples of primitives for transactions.

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

ACID Properties of Transactions

• Atomic – Happens indivisibly to the outside world

• Consistent – Does not violate system constraints

• Isolated – Concurrent transactions do not interfere with each other

• Durable – Changes are permanent when a transaction commits

How to Implement Transactions?

• Private workspace
– When a process starts a transaction, it gets a private workspace containing all the files it needs to use
– Operations are performed only on the private workspace
– The private workspace is written back on commit and discarded on abort
– Efficiency problem: copying everything is costly
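The private-workspace idea can be sketched with an in-memory dictionary standing in for the files. The class name `WorkspaceTransaction` and its methods are hypothetical; the comments map them to the primitives listed earlier. The deep copy made at the start is exactly the cost the last bullet warns about.

```python
import copy

class WorkspaceTransaction:
    """Transaction that operates on a private copy of a shared store."""
    def __init__(self, store):
        # BEGIN_TRANSACTION: take a private copy of everything (costly).
        self.store = store
        self.workspace = copy.deepcopy(store)

    def read(self, key):
        # READ: served from the private workspace only.
        return self.workspace[key]

    def write(self, key, value):
        # WRITE: changes stay invisible to others until commit.
        self.workspace[key] = value

    def commit(self):
        # END_TRANSACTION: write the workspace back to the shared store.
        self.store.clear()
        self.store.update(self.workspace)

    def abort(self):
        # ABORT_TRANSACTION: discard the workspace; the store was never touched.
        self.workspace = None
```

A transfer between two accounts, for instance, either updates both balances on `commit()` or leaves both untouched after `abort()`. This sketch ignores concurrent transactions, which need the concurrency control discussed later.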

Distributed Transactions

• A distributed transaction is a transaction in which the data is distributed

• Two-phase commit protocol

• Commit request phase
– Coordinator sends a query-to-commit message to all nodes
– Nodes place an entry into their undo and redo logs
– Nodes send agreement/abort messages

• Commit phase
– Coordinator places an entry into its log
– Coordinator sends commit/abort messages to all nodes
– Nodes send acknowledgements
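The two phases can be sketched as follows. This is a simplified model under stated assumptions: messages are plain method calls, there are no timeouts or crashes, and the names `Participant`, `prepare`, `finish`, and `two_phase_commit` are invented for the example.

```python
class Participant:
    """A node taking part in the transaction."""
    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree
        self.log = []
        self.state = "init"

    def prepare(self):
        """Phase 1: log undo/redo information, then vote."""
        self.log.append("undo/redo")
        return "agree" if self.will_agree else "abort"

    def finish(self, decision):
        """Phase 2: apply the coordinator's decision and acknowledge."""
        self.state = decision
        self.log.append(decision)
        return "ack"

def two_phase_commit(participants, coordinator_log):
    # Commit request phase: query every node and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "agree" for v in votes) else "abort"
    # Commit phase: log the decision first, then tell every node.
    coordinator_log.append(decision)
    acks = [p.finish(decision) for p in participants]
    if len(acks) != len(participants):
        raise RuntimeError("missing acknowledgements")
    return decision
```

A single "abort" vote is enough to abort the whole transaction. A real implementation also has to handle lost messages and coordinator failure, which this sketch leaves out.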

Concurrency Control

• Concurrent transactions are isolated
– Final result should be the same as if the transactions were executed one after another in some order

• Synchronization classification
– Locking
– Timestamps

• Two-phase locking
– Growing and shrinking phases
– A transaction acquires all its locks before releasing any of them

• Distributed 2PL
– Coordinator manages all lock operations
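The two-phase rule can be enforced with a small check: once a transaction has released any lock (shrinking phase), it may not acquire another. The sketch below uses exclusive locks only and a single in-process lock manager playing the role of the coordinator; the names `LockManager`, `Transaction2PL`, and `TwoPhaseLockingError` are hypothetical.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction tries to lock after it has started unlocking."""

class LockManager:
    """Central manager for exclusive locks: item -> owning transaction id."""
    def __init__(self):
        self.owners = {}

    def acquire(self, txn_id, item):
        owner = self.owners.get(item)
        if owner is not None and owner != txn_id:
            return False              # held by another transaction
        self.owners[item] = txn_id
        return True

    def release(self, txn_id, item):
        if self.owners.get(item) == txn_id:
            del self.owners[item]

class Transaction2PL:
    def __init__(self, txn_id, manager):
        self.txn_id = txn_id
        self.manager = manager
        self.shrinking = False        # becomes True at the first release

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError("cannot acquire locks in the shrinking phase")
        return self.manager.acquire(self.txn_id, item)

    def unlock(self, item):
        self.shrinking = True         # growing phase is over
        self.manager.release(self.txn_id, item)
```

A `lock()` call that returns False means the caller has to wait or retry; deadlock handling is outside the scope of this sketch.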

Replication

• Two primary reasons
– Improving reliability of the system
– Improving scalability and performance of the system

• Reliability
– Resilience to failures
– Protection against data corruption: Byzantine failures and quorum-based systems

• Scalability
– Scaling in numbers
– Geographical scaling

Problems of Replication

• Creating and maintaining replicas is not free

• Multiple copies lead to consistency problems
– What happens when one of the replicas gets modified?
– Modifications have to be carried out at all replicas
– How and when this is done determines the cost of replication

• WWW-based systems
– Browser and client-side caches
– May lead to stale pages
– TTL model, Update/Invalidate model
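The TTL and invalidate models can be contrasted in a small cache sketch. It is illustrative only: `TTLCache`, `fetch`, and the explicit `now` argument are assumptions of this example, with time passed in by the caller so the behavior is easy to follow.

```python
class TTLCache:
    """Client-side cache: entries expire after ttl time units or on invalidation."""
    def __init__(self, fetch, ttl):
        self.fetch = fetch            # function that contacts the origin server
        self.ttl = ttl
        self.entries = {}             # key -> (value, expiry time)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]           # still fresh according to the TTL
        value = self.fetch(key)       # expired or missing: go to the origin
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Invalidate model: the origin tells the cache to drop its copy."""
        self.entries.pop(key, None)
```

Under the TTL model a page changed at the origin can be served stale until its entry expires; an explicit `invalidate` removes the stale copy immediately at the cost of extra messages from the origin.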

Consistency Models

• Strict

• Sequential

• Linearizable

• Causal

• FIFO

• Weak

• Release

• Entry

Fault Tolerance & Dependability

• Availability
– Ready to be used IMMEDIATELY

• Reliability
– Runs continuously without FAILURE

• Safety
– When the system fails, nothing catastrophic happens

• Maintainability
– How easily a failed system can be repaired

• Failures can be malicious or non-malicious

Failure Masking

• Hiding failures from other processes

• Fault tolerance by redundancy

• Information redundancy – Error correcting codes

• Temporal redundancy – Transactions

• Physical redundancy – Multiple disks
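Physical redundancy is often combined with voting: the same request goes to several replicas and the majority answer wins, which masks a minority of faulty replies. The helper below is a minimal sketch of that idea; the name `majority_vote` is an assumption, and replies are assumed to be hashable values.

```python
from collections import Counter

def majority_vote(replies):
    """Return the value reported by a strict majority of replicas."""
    if not replies:
        raise ValueError("no replies to vote on")
    value, count = Counter(replies).most_common(1)[0]
    if count * 2 <= len(replies):
        raise ValueError("no majority among replies")
    return value
```

With three replicas, one wrong or corrupted reply is outvoted by the other two; if all three disagree, the failure cannot be masked and the function raises an error.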
