Chapter 16-2 Distributed System Structures
17.2 Silberschatz, Galvin and Gagne ©2005Operating System Concepts
Chapter 16 Distributed System Structures
Chapter 16.1
Background
Motivation
Types of Distributed Operating Systems
Network Structure
Chapter 16.2
Network Topology
Communication Structure
Communication Protocols
Robustness
Design Issues
Network Topology
When we speak of topology, we are speaking of physical connections.
Each of the types I will present differs in
installation cost (the cost of linking up the sites),
communication cost (the amount of time / money it takes to send a message from node A to node B), and
availability, essentially the ability to keep using the topology in the face of downed links or sites.
Some topologies have all nodes directly connected to every other node, some have only ‘some’ nodes directly connected and others indirectly connected, some topologies look like trees, stars, and rings.
Each has advantages and disadvantages.
Fully & Partially Connected Networks and Trees
Fully Connected: Here, every node is connected to every other node.
Adv: no switching or broadcasting is needed.
Dis: as the number of nodes increases, cost rises dramatically (the number of links grows as the square of the number of nodes)! Good for small networks.
Partially Connected:
Adv: Clearly, installation costs are lower since not all nodes are connected to every node – only some.
Dis: for nodes that wish to communicate and are not directly connected, messages must be routed through intermediate sites, which, of course, raises the communication cost.
Trees:
Adv: installation and communication costs are low.
Dis: the very nature of a tree implies that there is only one path to a node. If this path ‘goes down’ we have the network ‘partitioned.’ Partitioning refers to the situation where the network is broken into two (or more) subsystems that cannot communicate with each other.
Rings and Star Network Topologies
Rings:
Adv: higher degree of reliability,
Dis: but communication costs are high because a message may need to travel through a number of links before it arrives at its destination.
Adv: Better availability than the tree – not likely to result in a partition…
Adv: At least two links must go down for a partition to occur.
Star:
Failure of any link results in a partition, but a partition may be only a single site.
Adv: low communication costs, because every node is at most two links away from the target node, but
Dis: the central site is critical. If it goes down, the entire network is down.
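The trade-offs above can be checked with a few lines of code. This is only a sketch: the function names and the six-site example are assumptions made for illustration, not something from the textbook. It counts links as a stand-in for installation cost and tests whether losing any single link partitions the network.

```python
from collections import deque
from itertools import combinations


def fully_connected(n):
    """Every site has a direct link to every other site."""
    return set(combinations(range(n), 2))


def ring(n):
    """Each site is linked to its two neighbours."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def star(n):
    """Site 0 is the central site; all other sites link only to it."""
    return {(0, i) for i in range(1, n)}


def is_partitioned(n, links):
    """Return True if some site cannot be reached from site 0."""
    neighbours = {i: set() for i in range(n)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, queue = {0}, deque([0])
    while queue:
        site = queue.popleft()
        for nxt in neighbours[site] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) < n


def survives_any_single_link_failure(n, links):
    """True if no single downed link partitions the network."""
    return all(not is_partitioned(n, links - {link}) for link in links)


if __name__ == "__main__":
    n = 6
    for name, build in [("fully connected", fully_connected),
                        ("ring", ring), ("star", star)]:
        links = build(n)
        print(name, len(links), survives_any_single_link_failure(n, links))
```

With six sites the fully connected network needs 15 links, the ring 6, and the star 5; only the star is partitioned by a single link failure.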
Communication Structure
We need to look beyond the physical aspects of networking to the internal workings of communications.
While this might appear to have become a course in communications, an understanding of these topics is absolutely essential to understanding how distributed operating systems work.
So, we will look at common issues that a communications network must address:
Naming and name resolution
Routing Strategies
Packet Strategies
Connection Strategies, and
Contention
Naming and Name Resolution - DNS
How do two processes locate each other in order to communicate?
Processes need to be able to reference each other by a name.
So, within a computer system, each process has a process identifier.
Processes on remote systems are identified by a <host name, identifier> pair.
‘host name’ is unique within a network – usually alphanumeric; ‘identifier’ may be a process id or other number that is unique at that host site.
But computers like numbers, so we try to bind names to a host-id that describes the destination system to the networking hardware.
Nowadays we distribute names among systems on the network, and the network must use a protocol to distribute and receive the information.
We call this the domain-name system (DNS).
DNS Naming Resolution
DNS specifies the naming structure of the hosts as well as name-to-address resolution.
Hosts (on the Internet) are logically addressed with a multi-part name whose components are separated by periods.
Components run from more specific to more general. We know there are several popular domains: .com, .org, .mil, .gov,… and countries.
Each component has a name server, which is simply a process on a system. A name server accepts a name and returns the address of the name server responsible for that name.
Example: resolving broggio.cs.csuchico.edu
The location of the name server for domain .edu is known, and it is issued a request for the address of the name server for csuchico.edu.
The .edu name server returns the address of the host on which the csuchico.edu name server resides.
This name server is sent a request for the name server of cs.csuchico.edu. The address is returned.
Then a request to this name server for broggio.cs.csuchico.edu returns an Internet address (host-id) for that host, such as 137.62.37.20.
In practice, using local caches makes this process quick. The .edu name server might have csuchico.edu in its cache, inform the sending process that it can resolve two parts of the name, and return a pointer to the cs.csuchico.edu name server.
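The resolution steps above can be sketched as a walk from the most general component to the most specific one. This is a toy model, not real DNS: the `ns-...` labels are made-up placeholders for name-server addresses, and the tables are plain dictionaries rather than network queries. Only the host name and the example address 137.62.37.20 come from the slide.

```python
# Toy name-server tables. Each server maps a name either to the next
# server to ask or, for a host, to the final host-id.
# All "ns-..." labels are hypothetical placeholders, not real addresses.
NAME_SERVERS = {
    "ns-edu": {"csuchico.edu": "ns-csuchico"},
    "ns-csuchico": {"cs.csuchico.edu": "ns-cs"},
    "ns-cs": {"broggio.cs.csuchico.edu": "137.62.37.20"},
}
# The location of the top-level name server is known in advance.
TOP_LEVEL = {"edu": "ns-edu"}


def resolve(host_name, cache=None):
    """Return (host-id, number of name-server queries made)."""
    cache = {} if cache is None else cache
    if host_name in cache:                 # a local cache skips the walk
        return cache[host_name], 0
    parts = host_name.split(".")
    server = TOP_LEVEL[parts[-1]]
    queries = 0
    # Ask for "csuchico.edu", then "cs.csuchico.edu", then the full name.
    for i in range(len(parts) - 2, -1, -1):
        name = ".".join(parts[i:])
        server = NAME_SERVERS[server][name]
        queries += 1
    cache[host_name] = server
    return server, queries


if __name__ == "__main__":
    cache = {}
    print(resolve("broggio.cs.csuchico.edu", cache))  # three queries
    print(resolve("broggio.cs.csuchico.edu", cache))  # answered from the cache
```

The first lookup takes three queries; the second is answered from the local cache with none.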
Routing Strategies
How are messages sent? If there’s only one path, then the path is clear. But this is often not the case, and we have many options.
Normally, each site has a routing table, which points to sites ‘along the way’ that may be used in transmitting a message.
These tables are often updated, as sites go down and are changed.
Fixed routing: a specific path is fixed ahead of time;
generally a shortest path is preferred.
Cannot change the path despite potentially better ones.
If path goes down, communication is lost.
Virtual routing: a specific path is established for a session.
Later sessions would likely have a different path.
Since the path is fixed only for the duration of a session, different routes may be selected for different sessions.
This is a more reliable routing mechanism than fixed routing.
Dynamic routing: Path determined only when a message is sent.
Message may go from site to site to site usually advancing to site with least traffic.
Path may not be direct, but it’ll get there!
Unix uses fixed routing for simple networks; dynamic for the rest.
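The difference between fixed and dynamic routing comes down to when the path is chosen. The sketch below assumes a four-site network with made-up link costs and uses Dijkstra's shortest-path algorithm as the path chooser; neither the network nor the choice of algorithm comes from the slides. The fixed path is computed once ahead of time, while the dynamic path is recomputed when the message is sent, after a link has gone down.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over undirected {(a, b): cost} links; returns a site list or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        dist, site, path = heapq.heappop(heap)
        if site == dst:
            return path
        if dist > best.get(site, float("inf")):
            continue                      # stale heap entry
        for nxt, cost in graph.get(site, []):
            new = dist + cost
            if new < best.get(nxt, float("inf")):
                best[nxt] = new
                heapq.heappush(heap, (new, nxt, path + [nxt]))
    return None


def path_is_up(links, path):
    """True if every hop on the path is still a live link."""
    return all((a, b) in links or (b, a) in links
               for a, b in zip(path, path[1:]))


# Hypothetical network: costs stand in for delay or traffic.
links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}

fixed_path = shortest_path(links, "A", "D")    # fixed: chosen once, ahead of time
del links[("B", "D")]                          # a link goes down
fixed_ok = path_is_up(links, fixed_path)       # the fixed path cannot adapt
dynamic_path = shortest_path(links, "A", "D")  # dynamic: chosen when the message is sent
```

Virtual routing sits in between: it would call `shortest_path` once per session rather than once per message.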
Routing Strategies – Comparisons
Gateways: Sometimes we only need to know how to route to a gateway.
These usually connect a local network to other networks and the Internet.
So, here we might use a fixed route to the gateway, realizing that the gateway will use dynamic routing from there on out.
Router: a host computer with routing software or a special purpose device that has at least two network connections. (A simple PC can serve as a router)
Router has special software that enables it to access routing tables to decide whether a received message on one network needs to be passed to any other network connected to the router.
Router checks its tables to determine location of the destination host or at least of the network to which it will send the message toward the destination host.
(Gateways and routers are normally dedicated hardware devices that run code out of firmware.)
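A router's table check can be sketched with Python's standard `ipaddress` module. The table entries and interface names below are invented for illustration, and picking the most specific matching network (longest prefix) is a common convention assumed here, not something the slides specify. Anything that matches no attached network falls through to the gateway.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to forward).
ROUTING_TABLE = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),    # first attached LAN
    (ipaddress.ip_network("192.168.2.0/24"), "eth1"),    # second attached LAN
    (ipaddress.ip_network("0.0.0.0/0"), "gateway"),      # everything else
]


def next_hop(destination):
    """Pick the most specific table entry whose network contains the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if addr in net]
    net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop


if __name__ == "__main__":
    print(next_hop("192.168.2.17"))   # on an attached network
    print(next_hop("137.62.37.20"))   # unknown network, so hand it to the gateway
```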
So, without getting into too much detail, what is sent?
Packet Strategies
We know messages may be of variable size. But this can make things difficult.
In practice, messages are usually broken into fixed-length pieces called packets, frames, or datagrams, and these are transmitted.
What is sent, however, is determined by the connection strategy.
Let’s turn our attention to Connection Strategies…
Connection Strategies
Processes that need to communicate do so via communication sessions, which are established using a number of mechanisms, the most popular of which are:
Circuit Switching
Message Switching, and
Packet Switching
These are quite different and each may be appropriate under specific circumstances.
Circuit, Message, Packet Switching
Circuit switching uses a fixed physical link – as in our telephone system.
The connection exists for the length of the session, and no other process can use this particular connection.
Circuit switching requires a lot of setup and may incur a real waste of bandwidth during potentially idle periods. But it incurs less overhead per message sent.
Message switching uses a communications pathway that is temporary and only exists for the time it takes to transfer the message.
Physical links are dynamically established for the short transfer.
The message itself consists of a number of parts including source and destination addresses, error-correction codes, start and end of text indicators, and a number of other items used for management and control of the transmission.
Requires little setup time but incurs more overhead per message in establishing a communications pathway.
Packet switching is used to facilitate the transmission of message packets, likely over different routes.
The paths are dynamically determined (previously discussed), and the packets are reassembled at the destination address.
Lots of overhead here due to breaking up of messages, incorporation of management and control items in each packet (necessary to ensure packets are delivered and reassembled properly), and, of course, the cost of reassembly.
Packet switching makes the best use of available bandwidth and is the most commonly used (save the telephone) switching strategy for data transmission.
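The packet-switching overhead mentioned above (breaking up, control items, reassembly) can be made concrete. In this sketch the header is just a `(sequence number, total count)` pair, which is an assumption for illustration; real protocols carry more control information. The final packet may be shorter than the others because no padding is added.

```python
import random


def packetize(message: bytes, size: int):
    """Split a message into packets tagged (seq, total, payload)."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    if not packets:
        return b""
    total = packets[0][1]
    by_seq = {seq: payload for seq, _, payload in packets}
    if len(by_seq) != total:
        raise ValueError("missing packets")
    return b"".join(by_seq[seq] for seq in range(total))


if __name__ == "__main__":
    message = b"packets may take different routes"
    packets = packetize(message, 8)
    random.shuffle(packets)               # simulate out-of-order arrival
    print(reassemble(packets) == message)
```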
Contention
Realities of transmission imply that there is the real possibility that more than one source may transmit on common links at the same time.
This occurs often in a ring or multi-access bus topology.
Clearly, the transmissions can be garbled and sites must retransmit.
This problem must be addressed to avoid significantly degraded service.
Two techniques are in wide use:
CSMA/CD – Carrier Sense Multiple Access with Collision Detection, and
Token Passing
Both are used in different topologies and with good success.
Contention using CSMA/CD
In this approach, a node wishing to transmit ‘listens’ on the line for traffic.
If link is free, node (site) will start transmitting; otherwise it waits and listens before trying again.
But if two sites – detecting no network traffic – start to transmit at the same time, we have a collision.
If so, both sites must stop transmitting. Wait a while, then try again.
But if the system is loaded, performance may be seriously degraded!
This has been a successful contention strategy for quite some time.
Big advantage: we can add more hosts (nodes) to a network – as long as collisions don’t become too frequent.
All in all, this approach has become a standard one, and it is in widespread use.
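The ‘wait a while, then try again’ step can be simulated. The sketch below is a toy slotted model, not a faithful Ethernet implementation: the random wait grows with each collision (binary exponential backoff), which is an assumption layered on top of the slide, and `simulate_csma_cd` is a made-up name. Each station has one frame to send and all of them start at the same moment.

```python
import random


def simulate_csma_cd(stations, seed=0, max_slots=10_000):
    """Toy slotted model. Returns (senders in order of success, collision count)."""
    rng = random.Random(seed)
    wait = {s: 0 for s in stations}       # slots each pending station still waits
    collided = {s: 0 for s in stations}   # collisions each station has suffered
    delivered, collisions = [], 0
    for _ in range(max_slots):
        if not wait:                      # everyone has transmitted
            break
        ready = [s for s in wait if wait[s] == 0]
        for s in wait:                    # stations still backing off count down
            if wait[s] > 0:
                wait[s] -= 1
        if len(ready) == 1:               # a single sender on a free line: success
            delivered.append(ready[0])
            del wait[ready[0]]
        elif len(ready) > 1:              # simultaneous senders: collision
            collisions += 1
            for s in ready:
                collided[s] += 1
                window = 2 ** min(collided[s], 10)
                wait[s] = rng.randrange(window)   # random backoff, in slots
    return delivered, collisions


if __name__ == "__main__":
    order, collisions = simulate_csma_cd(["A", "B", "C", "D", "E"])
    print(order, collisions)
```

Adding more stations to the list shows the point made above: everything still gets through, but the collision count climbs.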
Contention using Token Passing
This approach is used a lot in a ring topology, where a token (small packet of information) circulates within the ring from node to node to node...
A site that wishes to transmit must wait for the token, remove it from the ring, transmit the data, and then retransmit the token to continue its journey around the ring.
Token Lost. If somehow a token gets lost (say, a site goes down while it holds the token), the network must first detect the loss and then generate a new token.
Election. It does this by declaring an election that results in one site that will generate and propagate a new token.
Advantages of token ring:
Ethernet (a multi-access bus architecture) can experience serious performance degradation if too many nodes are busy and transmitting.
In the ring approach, adding new sites may increase wait time, but it will normally not result in any serious performance losses
For networks with limited or modest transmission requirements, Ethernet is more efficient, because sites can transmit at any time.
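For comparison, token passing can be sketched in the same toy style. The names here are illustrative, and the rule of one frame per token visit is an assumption; the slides do not say how much a site may send while it holds the token. Because only the token holder transmits, there are no collisions, and a site waits at most one lap of the ring for its turn.

```python
from collections import deque


def token_ring(sites, pending, max_laps=100):
    """Pass the token around the ring; only the token holder may transmit.

    `pending` maps a site to the list of frames it wants to send.
    Returns (site, frame) pairs in the order they were put on the ring.
    """
    queues = {s: deque(pending.get(s, [])) for s in sites}
    sent = []
    for _ in range(max_laps):
        if not any(queues.values()):      # nothing left to send anywhere
            break
        for holder in sites:              # the token visits sites in ring order
            if queues[holder]:
                # one frame per token visit, then the token moves on
                sent.append((holder, queues[holder].popleft()))
    return sent


if __name__ == "__main__":
    print(token_ring(["A", "B", "C"], {"A": ["a1", "a2"], "C": ["c1"]}))
```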
Ethernet
Ethernet is a family of frame-based computer networking technologies for local area networks (LANs).
It defines a number of wiring and signaling standards for the Physical Layer of the OSI networking model, through means of network access at the Media Access Control (MAC)/Data Link Layer, and a common addressing format.
Ethernet is standardized as IEEE 802.3.
The combination of the twisted pair versions of Ethernet for connecting end systems to the network, along with the fiber optic versions for site backbones, is the most widespread wired LAN technology.
It has been in use from around 1980 to the present, largely replacing competing LAN standards such as token ring, FDDI, and ARCNET.
Communication Protocols
There are so very many activities involved in a communications network where so much communication is asynchronous.
Systems on a network agree on a set of protocols such that
each layer of the protocols or set of protocols has specific responsibilities in overall communication and
each layer of the protocol at the sending site communicates with its corresponding layer at the receiving site.
Again, each layer has things it expects to ‘get’ and things it will provide.
i.e., it looks for all kinds of communication parameters appropriate at that level of the protocol stack, so to speak.
Implementation:
Some of the layers are implemented in hardware (lower three levels)
Mid and upper layers are implemented in software.
Let’s look at the ISO network model – it is formal and provides a great framework for discussion – although the TCP/IP protocol suite has now largely replaced the ISO protocols in practice…
Layers in Hardware – ISO Model
The three lowest levels of the protocol stack are accommodated in hardware
Physical Layer – here we are concerned with agreement on electrical representations of bit stream signals consisting of 1s and 0s, and how the sites are able to interpret these.
All 1s and 0s are not ‘created equal.’ We have different code sets, not just the most familiar ones!
Data Link Layer – data link control is responsible for
handling packets and also for
detecting and recovering from errors that might have occurred in the physical layer.
Network Layer – This layer is responsible for
routing packets within the communications network, decoding addresses of incoming packets, and
maintaining routing tables, etc.
Routers play the key role here.
Layers in Software – ISO Model
Transport Layer – This layer is responsible for the
transfer of messages between clients,
partitioning the messages into packets,
maintaining packet order, controlling flow, and
generating physical addresses.
Session Layer –
Responsible for implementing sessions or
process-to-process communication protocols, such as communications via remote logins and for file and mail transfers.
Presentation Layer – This layer is responsible for
resolving format differences between various sites in the network, such as those requiring character conversions, handling half-duplex versus full-duplex modes, etc.
Application Layer – Responsible for
interacting directly with the user to accommodate file transfer, remote log-in protocols, email, etc.
ISO Protocol Stack
All these constitute what we refer to as the ISO Protocol Stack.
These represent a set of cooperating protocols such that each layer in the protocol stack communicates with its peer at the other end.
Each layer may modify the message – adding attributes, computing items, adding headers or trailers or other ‘indicators’, etc.
At the receiving end of a transmission, the data reaches the data link control and moves up through the protocol stack, where everything is acted upon by its respective layer - ultimately presented to the user at application level.
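The way each layer wraps the message on the way down and its peer unwraps it on the way up can be sketched as follows. The bracketed text headers are purely illustrative (real headers are binary and carry actual control fields), and the physical layer is left out because it only moves the resulting bits.

```python
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]


def send(message: str) -> str:
    """Walk down the stack: each layer wraps the data in its own header."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Walk up the stack: each layer strips the header added by its peer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = send("hello")
    print(wire)            # the data link header is outermost
    print(receive(wire))   # the original message comes back out
```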
TCP/IP Protocol Stack – the most widely adopted set of protocols.
TCP, as its name implies, is the transport protocol.
Almost all Internet sites now use this.
Has fewer layers because it combines some activities of the ISO layers.
More difficult to implement but more efficient than the ISO model.
The model identifies a number of protocols at the application level that are widely used today, including HTTP, FTP, Telnet, DNS, SMTP, and more.
The transport layer identifies both the unreliable, connectionless user datagram protocol (UDP) and the reliable, connection-oriented transmission control protocol (TCP). This corresponds to the Transport layer in the ISO model. See figure 16.9, p. 632 in your textbook.
IP (the Internet Protocol) is responsible for routing IP datagrams through the Internet. This corresponds to the Network layer in the ISO model.
In the TCP/IP protocol model, we do not formally identify a link or physical layer as we have in the ISO model; this model allows traffic to run across any physical network.
Robustness
We can have all kinds of hardware failure.
Such networks must be able to
Detect a failure
Reconfigure the network so that operation can proceed, and
Recover from the failure
Failures – Detection, Reconfiguration, Recovery.
Detecting the Failure
This is the easy part. Detecting a failure is easy; determining its cause is oftentimes difficult.
With no shared memory, however, we are not able to determine easily whether the failure is due to the link, a site, or a message loss. We simply get a failure and it is difficult to ascertain the exact cause.
We typically use handshaking to detect link and site failures.
Sites send ‘I am up’ messages periodically. If no message is received, it may be that:
– the link has failed, or
– the site is down.
The host can resend and wait, but it can only declare that a failure has occurred.
Now, the site might try sending a message over a different route, if available.
If after ‘some time’ the message is delivered and a response is received, then it is possible that the direct link is down.
If this approach does not result in reception at the target node using a different route, the sending site can only assert that one of the following has occurred:
– Receiving site is down
– Direct link (if available) from sender to receiver is down
– Alternate path from sender to receiver is down, or
– Message has been lost.
It is very difficult for the sending site to clearly determine the cause of the failure.
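The ‘I am up’ handshake can be sketched as a small timeout-based detector. The class name, the timeout value, and the use of explicit timestamps (instead of a real clock) are assumptions made for illustration. Note what the detector cannot do: it reports that a site is suspected, never whether the site, the link, or a lost message is to blame.

```python
class FailureDetector:
    """Tracks the last 'I am up' message seen from each site."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, site, now):
        """Record an 'I am up' message received from `site` at time `now`."""
        self.last_seen[site] = now

    def suspected(self, now):
        """Sites whose last message is older than the timeout.

        The cause (site down, link down, message lost) is unknown.
        """
        return sorted(site for site, seen in self.last_seen.items()
                      if now - seen > self.timeout)


if __name__ == "__main__":
    detector = FailureDetector(timeout=3)
    detector.heartbeat("A", now=0)
    detector.heartbeat("B", now=0)
    detector.heartbeat("A", now=4)        # B has gone quiet
    print(detector.suspected(now=5))      # B is suspected
    detector.heartbeat("B", now=6)        # B (or its link) is back
    print(detector.suspected(now=7))      # nobody is suspected
```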
Failures – Detection, Reconfiguration, Recovery.
Reconfiguring: site down or link down…
Procedures must be invoked to inform the network to reconfigure and take a node out or a link out so that normal operations may ensue.
The Data Link Layer is normally responsible for detection.
– Recall: it is responsible for handling packets and error recovery.
Direct Link Down – actions required:
If this is the case, the fact that this link is down must be broadcast.
Routing tables will need to be updated at the Network Layer.
It takes time for nodes on the net to modify their routing tables so that when packets are received from the Transport Layer, the Network Layer can route them correctly.
Site Down:
Here again, all sites in the system must be informed so that they will not try to use the services available at that downed site.
If the downed site had special functions on the net such as central coordination (see the book) for, say, deadlock detection, etc., then another site must be elected to become the new coordinator of this activity.
If the failed node is in a token ring topology, then we must build a new ring with the failed node taken logically out of the ring network.
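The routing-table update after a link failure can be sketched as a rebuild from the surviving links. The four-site network and the breadth-first rebuild are assumptions for illustration; a real network layer would exchange updates between routers rather than recompute from a global view.

```python
from collections import deque


def build_routing_table(links, source):
    """BFS from `source`; maps each reachable site to the first hop toward it."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    table, queue = {}, deque()
    for first in sorted(neighbours.get(source, ())):
        table[first] = first              # direct neighbours are their own first hop
        queue.append(first)
    while queue:
        site = queue.popleft()
        for nxt in sorted(neighbours[site]):
            if nxt != source and nxt not in table:
                table[nxt] = table[site]  # inherit the first hop used to reach `site`
                queue.append(nxt)
    return table


if __name__ == "__main__":
    links = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")}
    print(build_routing_table(links, "A"))   # C is reached via B
    links.discard(("B", "C"))                # the broadcast says link B-C is down
    print(build_routing_table(links, "A"))   # C is now reached via D
```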
Failures – Detection, Reconfiguration, Recovery.
Recovery from Failure – link and site
After repair, the link or site must be integrated into the network seamlessly, if possible.
If the link was down but is now repaired, and if the link was only between just two sites, repeated handshaking can accommodate notification. This is the rather simple case.
If a site has failed and has now recovered:
All other sites must receive this information to update their routing tables,
they must now ‘know’ that facilities at ‘that site’ are available again, and
they can perhaps ‘press on’ with undelivered messages, and more.
Design Issues
There are several key issues here.
I will divide them into three:
Transparency
Fault Tolerance
Scalability
Design Issues – Transparency
In a distributed system, the multiple sites and links should be totally transparent to a user.
Making this so has challenged the brightest designers.
A distributed system should appear to be a centralized system. Users should not be able to distinguish between local and remote services.
Another element of transparency is user mobility. Here, conceivably, a user could log onto any system and obtain his/her environment wherever logged on.
These would be nice – but are very difficult to obtain.
Design Issues – Fault Tolerance
Here, we must be mindful that sites can go down, links can go down, but the network must be maintained.
The question is simply how many faults and what kinds of faults can the network tolerate and still provide services?
Degradation in performance should be proportional, of course, to the magnitude of the faults. If a few faults grind the system to a halt, this system is not very fault tolerant.
Most commercial systems have only limited fault tolerance.
Many scientific systems have much more tolerance, such as redundant units (with voters) in space probes. Explain…
Fault tolerance can address processor issues, storage issues, link issues, and so very many factors.
Unfortunately, fault tolerant computing is difficult to accomplish.
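The ‘redundant units with voters’ idea can be shown in a few lines. This sketch assumes a strict-majority rule and a made-up `vote` function; it is not a description of any particular space-probe design. With three units, one faulty answer is outvoted; if no value has a majority, the voter refuses to answer.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of the redundant units."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many units disagree")
    return value


if __name__ == "__main__":
    print(vote([42, 42, 7]))    # one faulty unit is outvoted
```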
Design Issues – Scalability
A biggie!!
Network resources and their concomitant communications can become saturated with very heavy workloads.
Systems have bounded resources.
They are finite state machines with finite resources.
A scalable system should react more gracefully to increased load than a nonscalable one: its performance should degrade more moderately.
Its resources should also become saturated later than those of a system that is not scalable.
But there are bounds. Adding more and more sites to a network can bog down the best of systems.
Design Issues – Scalability - more
Scalability is not simply a matter of shifting a load from one component to another that might be serving as backup.
In truth, a distributed system must have spares for ensuring reliability and for handling peak loads gracefully.
A system design should have the potential for both fault tolerance and scalability. But a poor design can kill this potential.
Very large distributed systems are, for the most part, theoretical.
We look at and talk about scalability as a forerunner for large-scale distributed systems.
Design Issues – Scalability – still more
Principle of Central Control: central control schemes and central resources should not be used to build scalable systems.
Examples of centralized control include centralized authentication servers, central file services, etc. Each of these brings us to centralization – and we don’t want this.
We want a functionally symmetric configuration where
all components have an equal role in operation of the system and
each machine has some degree of autonomy.
But it is darn near impossible to adhere fully to this principle.
There are many examples where we simply cannot have this symmetric configuration.
Design Issues – Scalability – Clustering
One approach to symmetry and autonomy is clustering. Here the system is partitioned into collections of semi-autonomous clusters.
A cluster consists of a set of machines and a dedicated cluster server.
We want cross-cluster references to be very infrequent, and so each cluster should satisfy all requests of units in that cluster most of the time.
Clearly, this requires careful design and selection of machines with appropriate resources to meet these needs.
If the cluster server can do this, the cluster may be used as a modular building block to scale up the system.
This is easier said than done, for servers must operate efficiently during peak load and provide for all clients simultaneously.
Thus, a single-process server is not a good choice, since a disk request would block the entire service.
Assigning a process for each client is better, but frequent context switches can be a negative factor.
Then too, all server processes often need to share information.
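One common way around both problems is a pool of threads inside a single server process: a blocked request ties up only one worker, and the workers share state directly. This is a sketch under that assumption (the slide itself stops at processes), and `ClusterServer` with its string-uppercasing ‘work’ is invented for illustration. The lock protects the information the workers share.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ClusterServer:
    """Serves many clients from a fixed pool of worker threads."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()     # guards state shared by all workers
        self.requests_served = 0

    def _handle(self, client, request):
        reply = f"{client}:{request.upper()}"   # stand-in for real work, e.g. a disk read
        with self._lock:
            self.requests_served += 1
        return reply

    def submit(self, client, request):
        """Queue a request; returns a Future holding the reply."""
        return self._pool.submit(self._handle, client, request)

    def shutdown(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    server = ClusterServer()
    futures = [server.submit(f"c{i}", "read") for i in range(20)]
    replies = [f.result() for f in futures]
    server.shutdown()
    print(replies[0], server.requests_served)
```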
End Chapter 16.2