MMCA 404E04 DISTRIBUTED COMPUTING, by Dr. S. SUBASREE, Associate Professor, SASTRA University


Page 1:

MMCA 404E04

DISTRIBUTED COMPUTING

By

Dr. S. SUBASREE, Associate Professor

SASTRA University

Page 2:

UNIT 2: Communications in Distributed Systems

Remote procedure call – Interprocess communication – Introduction to RMI – Process management – Threads – Processor allocation algorithms – Group communication – Fault tolerance – Real-time distributed systems.

Page 3:

Distributed Computing

Page 4:

Remote Procedure Call

Although the client/server model is convenient, it suffers from being built around I/O.

A process must explicitly send and receive data, and while it does so the CPU is kept idle.

To overcome this problem, a technique called Remote Procedure Call (RPC) was introduced.

In a nutshell, RPC allows programs to call procedures located on other machines.

Page 5:

Remote Procedure Call

When a process on machine A calls a procedure on machine B, the calling process on A is suspended, and execution of the called procedure takes place on B.

Information is transported from the caller to the callee in the parameters; the callee processes it, and the result comes back in the procedure return.

No message passing or I/O is visible to the user. This method is called Remote Procedure Call, or RPC.

Page 6:

Basic Operations of RPC

For example, consider count = read(fd, buf, nbytes), where fd is an integer, buf is an array of characters, and nbytes is another integer.

If the call is made from the main program, the caller pushes the parameters onto the stack in order (i.e., LIFO).

After read has finished running, it puts the return value in a register, removes the return address, and transfers control back to the caller.

The caller then removes the parameters from the stack, returning the stack to its original state.
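The read call above can be exercised from Python, whose os module exposes a thin wrapper over the same system call; the file contents here are invented purely for illustration:

```python
import os
import tempfile

# Create a small file so there is something to read.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, RPC")
os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file

nbytes = 5
buf = os.read(fd, nbytes)     # like count = read(fd, buf, nbytes)
count = len(buf)              # number of bytes actually read

os.close(fd)
os.remove(path)

print(count, buf)
```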

Page 7:

Basic Operations of RPC

Page 8:

Remote Procedure Call

• Problem in the client-server model: communication is based on I/O
• This has been overcome by using RPC

[Figure: RPC between client and server. On the client machine, the client calls its client stub, which packs the parameters and traps to the kernel; the message is transported over the network to the server machine's kernel, where the server stub unpacks the parameters and calls the server. The server's result is packed by the server stub, returned over the network, and unpacked by the client stub before the call returns to the client.]
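The pack/unpack path in the figure can be sketched in Python. All function names and the message format below are invented for illustration, and an ordinary function call stands in for the kernel and network transport:

```python
import json

def server_add(a, b):
    # The remote procedure on the server machine.
    return a + b

def server_stub(message):
    # Unpack the parameters, call the server, pack the result.
    request = json.loads(message)
    result = server_add(*request["params"])
    return json.dumps({"result": result})

def client_stub(a, b):
    # Pack the parameters into a message; here the "network"
    # is just a function call to the server stub.
    message = json.dumps({"proc": "add", "params": [a, b]})
    reply = server_stub(message)
    return json.loads(reply)["result"]   # unpack the result

total = client_stub(3, 4)
print(total)
```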

Page 9:

Remote Procedure Call


Page 11:

Parameter Passing

When a client wants the server to do some work for it, it must first pack the parameters.

This packing is done by the client stub. The process is not as simple as it sounds: the stub has to collect the relevant parameters and then pack them into a message.

This process is called parameter marshaling.
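Marshaling can be illustrated with Python's struct module, which lays parameters out as raw bytes in an agreed wire format. The format (two 32-bit integers plus a length-prefixed string) is an assumption made up for this sketch:

```python
import struct

def marshal(fd, nbytes, name):
    # Pack two 32-bit ints and a length-prefixed UTF-8 string
    # into one network-order byte message.
    data = name.encode("utf-8")
    return struct.pack("!iiI", fd, nbytes, len(data)) + data

def unmarshal(message):
    # Reverse the packing on the receiving side.
    fd, nbytes, length = struct.unpack("!iiI", message[:12])
    name = message[12:12 + length].decode("utf-8")
    return fd, nbytes, name

wire = marshal(3, 1024, "report.txt")
params = unmarshal(wire)
print(params)
```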

Page 12:

Remote Procedure Call

Page 13:

Remote Procedure Call

• Idea behind RPC
• Transparency achieved through client/server stubs
• 100% transparency cannot be achieved (identical machine types pose no problem)
• Parameter passing
• Client and server may run on different machine types
• For example, IBM mainframes use EBCDIC while IBM PCs use ASCII
• Little-endian vs. big-endian problem
• The Intel 486 numbers its bytes from right to left (little-endian)
• The Sun SPARC numbers them from left to right (big-endian)
• A single standard representation can be inefficient
• Static binding is a serious problem; dynamic binding is needed
• Powerful mechanisms are required to achieve these things
• The critical path must be kept efficient
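The byte-order problem is easy to demonstrate: the same 32-bit integer produces different byte sequences under the two orderings, which is why RPC systems must agree on a wire representation:

```python
import struct

value = 1
little = struct.pack("<I", value)  # Intel-style: least significant byte first
big    = struct.pack(">I", value)  # SPARC-style: most significant byte first

print(little.hex(), big.hex())

# A receiver that assumes the wrong order misreads the number entirely.
misread = struct.unpack(">I", little)[0]
print(misread)
```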

Page 14:

– The main purpose of RPC is to hide the communication by making remote procedure calls look like local ones.

– The problem comes when errors occur: the differences between local and remote calls are then not always easy to mask.

– Five different classes of failure can occur in RPC:

• The client is unable to locate the server
• The request message from the client to the server is lost
• The reply message from the server to the client is lost
• The server crashes after receiving the request
• The client crashes after sending the request

RPC Semantics in the Presence of Failures

Page 15:

• Client cannot locate the server
• Returning -1 is a problem: -1 could also be a legitimate result
• A new error type or a new signal type is needed

• Lost request messages
• If the timer expires before a reply arrives, retransmit the request
• If there is still no reply after repeated retransmissions, conclude that the server cannot be located

• Lost reply messages
• These are more difficult to deal with
• The client cannot be sure why there was no answer:
• the request or the reply may have been lost
• the server may be slow
• the server may have crashed
• For non-idempotent requests, use sequence numbers
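The timer-and-retransmit rule for lost requests can be sketched as follows. The flaky transport is simulated, and the retry limit is an arbitrary choice for illustration:

```python
class LostMessage(Exception):
    """Stands in for a request timer expiring with no reply."""

def make_flaky_server(drop_first_n):
    # Simulate a transport that loses the first few requests.
    state = {"calls": 0}
    def send(request):
        state["calls"] += 1
        if state["calls"] <= drop_first_n:
            raise LostMessage()
        return "reply to " + request
    return send

def rpc_call(send, request, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return send(request)   # original transmission or retransmission
        except LostMessage:
            continue               # timer expired: retransmit
    # Repeated losses: conclude the server cannot be located.
    raise RuntimeError("cannot locate server")

send = make_flaky_server(drop_first_n=2)
reply = rpc_call(send, "read block 7")
print(reply)
```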

RPC Semantics in the Presence of Failures

Page 16:

• Server crashes
• In the first diagram, the request arrives, is executed, and a reply is sent
• In the second, the request arrives and is executed, but the server crashes before sending the reply
• In the third, the server crashes before the request is even executed

• The crucial distinction: crash after execution versus crash before execution

• At-least-once semantics: wait until the server reboots and retry, so the operation is performed at least once

• At-most-once semantics: give up immediately, so the operation is performed at most once

• No guarantee: nothing is promised either way
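At-most-once behaviour in the presence of retransmissions is typically enforced with sequence numbers: the server remembers which requests it has already executed and replays the saved reply instead of re-executing. A minimal sketch, with all names and the "debit" operation invented for illustration:

```python
class AtMostOnceServer:
    def __init__(self):
        self.seen = {}        # sequence number -> cached reply
        self.executions = 0   # how many times the operation actually ran

    def handle(self, seq, amount):
        if seq in self.seen:
            # Duplicate (a retransmission): replay the reply, do not re-execute.
            return self.seen[seq]
        self.executions += 1  # the operation runs at most once per sequence number
        reply = f"debited {amount}"
        self.seen[seq] = reply
        return reply

server = AtMostOnceServer()
first  = server.handle(seq=1, amount=100)
second = server.handle(seq=1, amount=100)  # client retransmitted after a lost reply
print(first, second, server.executions)
```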

RPC Semantics in the Presence of Failures

Page 17:

• Server crashed

RPC Semantics in the Presence of Failures

Page 18:

• Client crashes

• An orphan is a computation left running after the crash, with no parent waiting for its result

• The problems with orphans are:

– wasted CPU cycles

– locked files

– when the client reboots and issues the RPC again, a reply from the orphan coming back later creates confusion

• Different solutions:
• Extermination (uses a log): the orphan is explicitly killed; the drawbacks are that keeping the log wastes storage, and grand-orphans (orphans created by orphans) are a problem
• Reincarnation (uses numbered epochs): after a reboot, a message is broadcast declaring a new epoch, and all remote computations from earlier epochs are killed
• Gentle reincarnation: only computations without an owner are killed
• Expiration: each RPC is given a standard amount of time to complete

RPC Semantics in the Presence of Failures

Page 19:

Remote Object Invocation

• RPC is unable to provide the expected transparency in distributed communication systems
• Hence, Remote Method Invocation (RMI) was introduced
• The key feature of this mechanism is that an object can easily be replaced without changing its interface
• Hence, the user can reconfigure methods and objects
• Examples: RMI, CORBA, and DCOM
• An object encapsulates data, called its state, and the operations on those data, called methods
• Methods are made available through an interface
• Either transient objects or persistent objects can be used
• Parameter passing must be defined
• Static invocation vs. dynamic invocation
• Language-level invocation (compile time) vs. method invocation (run time)

Page 20:

Distributed Objects

[Figure 2-16: the client-side proxy acts like a client stub; the server-side skeleton acts like a server stub.]

Page 21:

Binding a Client to an Object

Implicit binding:

Distr_object* obj_ref;      // Declare a systemwide object reference
obj_ref = …;                // Initialize the reference to a distributed object
obj_ref->do_something();    // Implicitly bind and invoke a method

Explicit binding:

Distr_object* obj_ref;      // Declare a systemwide object reference
Local_object* obj_ptr;      // Declare a pointer to local objects
obj_ref = …;                // Initialize the reference to a distributed object
obj_ptr = bind(obj_ref);    // Explicitly bind and obtain a pointer to the local proxy
obj_ptr->do_something();    // Invoke a method on the local proxy

Page 22:

Parameter Passing

[Figure 2-18]

Page 23:

Interprocess Communication

• Processes are concurrent if they exist at the same time

• Some processes are designed to operate concurrently and co-operatively

• Independent vs Cooperating processes

• Why let processes cooperate?

• Information sharing

• Computation speedup

• Convenience

• Two fundamental models

• Message Passing

• Shared Memory

Page 24:

Resources

• Resources may be:

• Hardware (CPU, I/O, memory, devices)

• Data Structures (variables, queues, buffers)

• Are they re-usable?

• Re-usable: not destroyed by being used

• Consumable: e.g., a message or a signal

• Serially reusable: can only be used by one process at a time

Page 25:

Middleware layers

[Figure: Middleware layers. Applications and services sit above RMI and RPC, which sit above the request-reply protocol, which uses marshalling and external data representation over UDP and TCP. This chapter covers the middleware layers.]

Page 26:

Sockets and ports

[Figure: A client socket, bound to any port on a machine with Internet address 138.37.94.248, sends a message to a server socket bound to an agreed port on Internet address 138.37.88.249; other ports on the server remain available.]

Page 27:

Basics of Java RMI

• Java RMI is language-specific
• It can provide more advanced features such as serialization and security
• It has a network-based registry
• The RMI compiler, called rmic, is part of the JDK


Page 29:

Process Management

Threads

Page 30:

Process Management

This chapter deals with multiple processes.

It is possible to have multiple threads of control within a process, sharing one address space but running in quasi-parallel.

For example, consider a file server that occasionally has to block waiting for the disk.

If the server has multiple threads of control, a second thread can run while the first one is sleeping.

The net result is higher throughput and better performance.

Page 31:

Process Management

This cannot be achieved by creating two independent server processes, because they must share a common buffer cache, which requires them to be in the same address space.

Such sharing of one address space by multiple threads of control is not available in traditional single-threaded operating systems.

Page 32:

Threads

In the diagram above, the first figure shows three processes. Each process has its own program counter, its own stack, its own register set, and its own address space.

The processes have nothing to do with each other, except that they may be able to communicate through the system's interprocess communication primitives, such as semaphores, monitors, or messages.

The second figure shows a single process, but this process contains three threads of control, also called lightweight processes.

Each thread runs sequentially and has its own program counter and stack to keep track of where it is.

The threads share the CPU: first one thread runs, then another, as in timesharing.

Page 33:

Threads

On a multiprocessor, the threads can actually run in parallel.

Threads can create child threads and can block waiting for system calls to complete, just like regular processes.

Different threads in a process are not as independent as different processes.

All the threads in a process share the same address space, and therefore the same global variables.

The problem is that one thread can read a variable while another writes it, or a third wipes it out and reuses it, so access must be carefully coordinated.

If multiple threads are used in a single process, they also share the same set of open files, child processes, timers, signals, and so on.

Page 34:

Threads

Multiple processes can be used when the jobs are unrelated.

If the jobs are related and must cooperate closely, threads within a single process should be used.

Page 35:

States of Threads

The states of a thread are:

Running: the thread currently has the CPU and is active.

Blocked: the thread is waiting for another thread to unblock it (for example, on a semaphore).

Ready: the thread is scheduled to run, and will run as soon as its turn comes up.

Terminated: the thread has exited, but has not yet been collected by its parent.

Page 36:

Thread models and its usage

There are three models:

Dispatcher/worker model

Here, the dispatcher thread reads incoming work requests from the system mailbox.

After examining a request, it chooses an idle worker thread and hands it the request, waking the sleeping worker.

When the worker wakes up, it checks whether the request can be satisfied from the shared block cache, to which all threads have access. If not, it sends a message to the disk to get the needed block and goes to sleep awaiting completion of the disk operation.

Page 37:

Thread models and its usage

The scheduler is then invoked and another thread is started: the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.
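The dispatcher/worker arrangement maps naturally onto Python threads, with a queue standing in for the system mailbox and the hand-off to workers. The request names below are invented for illustration:

```python
import queue
import threading

mailbox = queue.Queue()   # the "system mailbox" of incoming requests
results = queue.Queue()

def worker():
    while True:
        request = mailbox.get()   # sleep until work is handed over
        if request is None:       # shutdown marker
            break
        results.put(f"served {request}")

# The dispatcher here is trivial: it just feeds requests to the worker pool.
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for req in ["req-1", "req-2", "req-3", "req-4"]:
    mailbox.put(req)
for _ in workers:
    mailbox.put(None)             # one shutdown marker per worker
for w in workers:
    w.join()

served = sorted(results.get() for _ in range(4))
print(served)
```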

Page 38:

Thread models and its usage

The second model is the team model.

Here all the threads are equal, and each gets and processes its own requests. There is no dispatcher.

Sometimes work comes in that a thread cannot handle, especially if each thread is specialized to handle a particular kind of work.

In this case, a job queue can be maintained, with pending work kept in it.

With this organisation, a thread should check the job queue before looking in the system mailbox.

Page 39:

Thread models and its usage


Team model

Page 40:

Thread models and its usage

The third model is

Pipeline model

Page 41:

Thread models and its usage

Pipeline model

In this model the first thread generates some data and passes them on to the next thread for processing.

The data continue from thread to thread, with processing going on at each step.

Although this is not appropriate for file servers, for other problems, such as producer-consumer, it may be a good choice.

Threads are also useful on clients. For example, if a client wants a file to be replicated on multiple servers, it can have one thread talk to each server. Another use of client threads is handling signals, such as interrupts from the keyboard (DEL or BREAK).

One thread can be dedicated to waiting for signals.
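The pipeline model can be sketched with two threads chained by queues; the stages (doubling, then formatting) are arbitrary examples chosen only to show data flowing stage to stage:

```python
import queue
import threading

stage1_out = queue.Queue()
stage2_out = queue.Queue()
DONE = object()   # sentinel marking the end of the stream

def stage1():
    for n in [1, 2, 3]:
        stage1_out.put(n * 2)   # first thread generates data...
    stage1_out.put(DONE)

def stage2():
    while True:
        item = stage1_out.get() # ...and passes it to the next thread
        if item is DONE:
            stage2_out.put(DONE)
            break
        stage2_out.put(f"item:{item}")

threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

out = []
while True:
    item = stage2_out.get()
    if item is DONE:
        break
    out.append(item)
print(out)
```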

Page 42:

Design issues for Thread packages

The set of library calls relating to threads that is made available to the user is called a thread package.

The first issue is static versus dynamic threads.

With static threads, the choice of how many threads there will be is made when the program is written or when it is compiled.

Each thread is allocated a fixed stack. This is simple but inflexible.

A more general approach is for a process to start out with one thread, create more threads as needed, and have threads exit when finished.

There are two ways a thread can be terminated: it can exit voluntarily when it finishes its job, or it can be killed from outside.

Since threads share memory, access to that memory must be coordinated.

Page 43:

Design issues for Thread packages

Access to shared memory can be controlled by using critical regions.

Critical regions can be implemented with semaphores, monitors, and similar constructs.

Thread packages commonly provide the mutex, a kind of watered-down semaphore.

A mutex is always in one of two states: locked or unlocked.

Another feature found in thread packages is the condition variable.

A mutex is for short-term locking; a condition variable is for long-term waiting until a resource becomes available.

A problem arises when a thread locks a mutex to gain entry to a critical region and, inside the region, examines the tables and discovers that some resource it needs is busy.

If it simply locks a second mutex associated with that resource, the outer mutex remains locked, so the thread holding the resource can never enter the critical region to release it: deadlock.
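This deadlock is exactly what condition variables prevent: waiting on a condition variable atomically releases the mutex while the thread sleeps, so the thread holding the resource can still enter the critical region and signal. A minimal sketch, with the "resource" reduced to a boolean flag:

```python
import threading

lock = threading.Lock()
resource_free = threading.Condition(lock)
busy = True
log = []

def consumer():
    with lock:                   # enter the critical region
        while busy:              # resource not available yet
            resource_free.wait() # releases the lock while waiting: no deadlock
        log.append("consumer got resource")

def releaser():
    global busy
    with lock:                   # possible only because wait() released the lock
        busy = False
        log.append("releaser freed resource")
        resource_free.notify()

t = threading.Thread(target=consumer)
t.start()
r = threading.Thread(target=releaser)
r.start()
t.join()
r.join()
print(log)
```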

Page 44:

System model

Processes run on processors.

If there is only one processor, there is no question about where processes run.

But a distributed system has many processors and many processes, so a model of how processing is organized is needed.

Three different models are used in distributed processing:

The workstation model

The processor pool model

The hybrid model

Page 45:

System model

The workstation model

This is a straightforward model: the system consists of workstations scattered throughout a building or campus and connected by a high-speed LAN.

Some of them may be in offices, dedicated to a single user; others may be in public areas and have different users during the course of the day.

Some workstations have local disks and others do not.

The latter are called diskless workstations and the former diskful workstations.

If the workstations are diskless, the file system must be implemented by one or more remote file servers.

Files are accessed through read and write operations by means of request and reply messages.

Page 46:

System model

The workstation model

The advantages of diskless workstations are: file servers are fast; an update on the file server is immediately reflected on all machines, instead of having to be installed on each machine; and backup and hardware maintenance are simpler.


Page 48:

System model

The workstation model

The advantages of diskless workstations are:

Equipping a large number of workstations with small, slow disks is typically much more expensive than having one or two fast file servers.

Diskless workstations are easy to maintain.

Diskless workstations provide symmetry and flexibility.

Page 49:

System model

The workstation model

When workstations have private disks, the disks can be used in one of the following ways:

Paging and temporary files.

Paging, temporary files and system binaries

Paging, temporary files, system binaries and file caching.

Complete local file system.

Page 50:

System model

Using Idle workstations

Even when workstations are identified as idle or underutilized, exploiting them raises problems; identifying an idle machine is itself a problem, and much research has gone into it. Three questions arise:

How is an idle workstation found?

How can a remote process be run transparently?

What happens if the machine’s owner comes back?

Page 51:

System model

How is an idle workstation found?

A registry-based algorithm can be used for finding and using idle workstations.

Page 52:

System model

How can a remote process be run transparently?

If a process from a home machine is running on a remote machine and wants to perform a READ operation, and the file lives on the home machine or a file server, the file's data must be shipped between machines, which is a problem.

Some system calls that query the state of the machine must be executed on the machine on which the process is running: asking for the machine's name and network address, asking how much free memory it has, and so on.

System calls involving time are another problem, because the clocks on different machines may not be synchronized.

Making programs run on remote machines as though they were running on their home machines is possible, but it is a complex and tricky business.

Page 53:

System model

What happens if the machine’s owner comes back?

If other people can run processes on your workstation at the same time that you are trying to use it, there goes your guaranteed response.

One solution is to kill the intruding process, but then all its work is lost.

Instead, we can give a warning: send the process a signal so it can shut down gracefully.

If it has not exited within a few seconds, it is terminated.

Of course, the program must be written to expect and handle this signal.

Another solution is to migrate the process to its home machine or to another idle machine, but migration is a very difficult process.

Page 54:

System model

The Processor Pool Model

What if 10 or 100 times as many CPUs as there are active users are needed? Hunting for idle workstations and allocating jobs to them does not work at that scale.

One solution is to give everyone a personal multiprocessor, but this is an inefficient design.

The alternative approach is the processor pool model.

In this model there is a rack full of CPUs in the machine room, which can be dynamically allocated to users on demand.

The users get high-performance graphics terminals, such as X terminals.

Page 55:

System model

The Processor Pool Model

Page 56:

System model

The Processor Pool Model

Just as the file system is centralized in a small number of file servers to gain economies of scale, the same thing can be done for compute servers.

The advantages of putting the CPUs in racks are:

Power supply and packaging costs are reduced.

More computing power is obtained for a given amount of money.

Cheap X terminals (or diskless workstations) can be used.

The model allows easy incremental growth: if the required computing power increases by 10%, simply add 10% more processors to the pool.

Depending on demand, CPUs are assigned to users' requests; once the processing is completed, the CPUs return to the pool for use by other users. There is no ownership: everyone has equal rights to the pool.

Page 57:

System model

The Processor Pool Model

Queueing theory can be used to analyze this centralized arrangement.

A queueing system is one in which users generate random requests for work from a server.

When the server is busy, the users queue for service and are processed in turn. The basic model is as follows:

Page 58:

System model

The Processor Pool Model

The total input rate is λ requests per second, and the rate at which the server can process requests is μ.

For stable operation we must have μ > λ.

The mean time between issuing a request and getting a complete response, T, is related to λ and μ by the formula T = 1/(μ − λ).
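A quick numeric check of the formula, with rates chosen arbitrarily for illustration:

```python
# Server handles mu = 10 requests/s; requests arrive at lam = 8 requests/s.
mu, lam = 10.0, 8.0
assert mu > lam            # stability condition
T = 1.0 / (mu - lam)       # mean response time: T = 1/(mu - lam)
print(T)                   # queueing delay pushes T well above the 0.1 s service time
```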

Page 59:

System model

A Hybrid Model

A possible compromise is to provide each user with a personal workstation and to have a processor pool in addition.

Although this is more expensive than either a pure workstation model or a pure processor pool model, it combines the advantages of both.

Interactive work is done on the workstations, giving guaranteed response.

Lengthy computations and other noninteractive processes run on the processor pool; heavy computing in general goes there.

This model provides fast interactive response, efficient use of resources, and a simple design.

Page 60:

PROCESSOR ALLOCATION

Page 61:

Processor Allocation

A distributed system consists of multiple processors, which may be organized as a collection of workstations, as a public processor pool, or as some combination of the two.

Whatever the model, an algorithm is needed for deciding which process should be run on which machine.

For the workstation model, the question is when to run a process locally and when to look for an idle workstation.

For the processor pool model, a decision must be made for every new process.

Page 62:

Processor Allocation

Allocation Models

Processor allocation strategies can be classified into two broad categories.

Nonmigratory model: when a process is created, a decision is made about where to put it. Once placed on a machine, the process stays there until it terminates. It may not move, no matter how badly overloaded its machine becomes and no matter how many other machines are idle.

Migratory model: a process can be moved even after it has started execution. This allows better load balancing, but migratory algorithms are more complex and have a major impact on system design.

Typical optimization goals are maximizing CPU utilization and minimizing mean response time.


Page 64:

Processor Allocation

Design Issues for Processor Allocation Algorithms

Five different issues arise:

Deterministic Versus Heuristic algorithms

Centralized Vs Distributed algorithms

Optimal Vs Suboptimal algorithms

Local Vs global algorithms

Sender initiated Vs Receiver initiated algorithms

Page 65:

Processor Allocation

Deterministic versus heuristic algorithms

A deterministic algorithm is appropriate when everything about process behaviour is known in advance.

Suppose we had the complete list of all processes, their computing requirements, their file requirements, their communication requirements, and so on. Then a perfect assignment could be made.

This type of algorithm is best suited to applications such as airline reservations, banking, and insurance, where the workload is known in advance: essentially the same work is carried out day after day.

With a heuristic algorithm, by contrast, the load on the system is unpredictable.

Requests for work depend on who is doing what, and can change dramatically from hour to hour, or even from minute to minute.

Here processor allocation must be done in an ad hoc way; this is the heuristic approach.

Page 66:

Processor Allocation

Centralized versus distributed algorithms

In the centralized approach, collecting all the information in one place allows a better decision to be made, but the scheme is less robust and can put a heavy load on the central machine.

Decentralized algorithms spread the responsibility across the system and are generally preferable, but some centralized algorithms have been proposed for lack of suitable decentralized ones.

The third issue is whether we are trying to find the optimal allocation or merely an acceptable one.

Optimal solutions can be obtained in both centralized and decentralized systems, but they are more expensive to compute than suboptimal ones.

Page 67:

Processor Allocation

The fourth issue relates to what is often called the transfer policy.

When a process is about to be created, a decision has to be made whether or not it can run on the machine where it is being generated.

If that machine is too busy, the new process must be transferred somewhere else. The choice is whether or not to base the transfer decision entirely on local information. One school of thought advocates a simple local algorithm: if the machine's load is below a threshold, run the new process locally; otherwise, transfer it.

The other school holds that a decision based on local information alone is too crude, and argues for taking global state into account.
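The local threshold rule can be written down directly; the threshold value and the load numbers below are arbitrary choices for illustration:

```python
THRESHOLD = 4   # maximum local load (e.g., run-queue length) before transferring

def place_process(local_load):
    """Local transfer policy: keep the new process if the machine is not too busy."""
    return "run locally" if local_load < THRESHOLD else "transfer elsewhere"

decisions = [place_process(load) for load in [1, 3, 4, 7]]
print(decisions)
```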

Page 68

Processor Allocation

The local algorithms are simple, but may be far from optimal, whereas global ones may only give a slightly better result at much higher cost.

The last problem is location policy.

Once the transfer policy decides to get rid of a process, the location policy has to figure out where to send it. The location policy cannot be purely local.

It needs information about the load elsewhere to make an intelligent decision.

Either the sender or the receiver can initiate the search. For example,

Page 69

Processor Allocation

Page 70

Processor Allocation

In the first diagram, an overloaded machine sends out requests for help to other machines; it is trying to offload its new process onto some other machine.

The sender takes the initiative in locating more CPU cycles to process the request.

In the other figure, a machine that is idle or underloaded announces to the other machines that it has little to do and is prepared to take on extra work.

This machine is searching for a machine that is willing to give it some work to do.

In both cases, different strategies are possible for whom to probe, how long to continue probing, and what to do with the results.


Page 72

Processor Allocation

Implementation issues for Processor Allocation Algorithms

The first problem is measuring the load on each machine, so that it can be classified as underloaded or overloaded.

Measuring load is not as simple as it sounds. One approach is to use the number of processes on the machine as its load.

But even an idle machine has some processes running, such as the window manager, mail and news daemons, and so on, and these are counted in the load.

A second approach is to count only processes that are running or ready. But even this is misleading: a process may sleep and wake up periodically just to check some status, loading a little data each time (a small load), and then go back to sleep.

What really matters is the fraction of time the CPU is busy. A better approach is to set up a timer and periodically interrupt the CPU to record its state.

CPU utilization is then estimated from the fraction of timer interrupts that find the CPU busy.
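The sampling scheme can be sketched in a few lines. This is a simulation, not real kernel code: `busy_at` is a hypothetical stand-in for the check a timer interrupt handler would perform.

```python
# Estimate CPU utilization by periodic sampling, as a timer interrupt would.
# busy_at(t) is a stand-in for "was the CPU busy at tick t?" -- in a real
# kernel this check runs inside the timer interrupt handler.

def estimate_utilization(busy_at, ticks):
    """Fraction of timer interrupts that found the CPU busy."""
    busy = sum(1 for t in range(ticks) if busy_at(t))
    return busy / ticks

# A CPU that is busy 3 ticks out of every 4:
util = estimate_utilization(lambda t: t % 4 != 0, 1000)
print(util)  # 0.75
```

Note that this estimate is biased exactly as the slide warns: ticks that arrive while interrupts are disabled are never observed.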

Page 73

Processor Allocation

Implementation issues for Processor Allocation Algorithms

If the kernel is running critical code, it often disables all interrupts, including the timer interrupt.

If the timer goes off while the kernel is active, the interrupt is delayed, which biases the measurement.

Another problem is accounting for overhead: the CPU time, memory usage, network bandwidth, and so on consumed by the allocation algorithm itself.

A suitable algorithm must take these costs into account.

The next problem is the complexity of the algorithm.

Three simple algorithms illustrate how the machine to run a particular process can be chosen.

Page 74

Processor Allocation

Implementation issues for Processor Allocation Algorithms

Algorithm 1: Pick a machine at random and just send the new process there. If the receiving machine is itself overloaded, it too picks a random machine and forwards the process. This continues until some machine is willing to take the process or a hop counter is exceeded, after which no more forwarding is allowed.

Algorithm 2: Pick a machine at random and probe it, asking whether it is overloaded or underloaded. If it is underloaded, it gets the new process; otherwise a new probe is tried. This continues until a suitable machine is found or a probe limit is exceeded.

Algorithm 3: Probe k machines to determine their exact loads. The process is then sent to the machine with the smallest load.
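A minimal sketch of Algorithm 1, random forwarding with a hop counter; the machine names, loads, and threshold below are invented for illustration.

```python
import random

# Sketch of Algorithm 1: forward a new process to randomly chosen machines
# until one is willing to take it or the hop counter is exceeded.

def place_process(loads, threshold, max_hops, rng):
    """Return the machine chosen for a new process, or None if every
    probed machine was overloaded and the hop limit ran out."""
    machines = list(loads)
    for _ in range(max_hops):
        m = rng.choice(machines)
        if loads[m] < threshold:      # underloaded machine accepts
            return m
        # overloaded: it forwards the process to another random machine
    return None                       # hop counter exceeded

rng = random.Random(42)
loads = {"A": 9, "B": 2, "C": 8, "D": 1}
chosen = place_process(loads, threshold=5, max_hops=4, rng=rng)
print(chosen)
```

Algorithm 2 differs only in that the probe asks before sending the process, and Algorithm 3 probes k machines and takes the minimum load.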

Page 75

Processor Allocation

Example Processor Allocation Algorithms A Graph-Theoretic Deterministic Algorithm

Page 76

Processor Allocation

Allocating nine processes to three processors

In the first diagram, A, E, and G are on one processor; B, F, and H on a second; and C, D, and I on a third.

The total network traffic is the sum of the weights of the arcs intersected by the dotted cut lines.

In the first diagram the total cost is 30 units; in the second diagram it is 28 units.

Assume that both partitions meet all the memory and CPU constraints.

The goal of this method is to find the partition that minimizes network traffic while meeting all the constraints.
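The cut cost of a candidate partition can be computed directly; the arcs, weights, and assignments below are illustrative, not the ones in the figure.

```python
# Cost of a partition = total weight of arcs whose endpoints are assigned
# to different processors (i.e., arcs crossed by the cut lines).

def cut_cost(arcs, assignment):
    """arcs: {(p, q): weight}; assignment: process -> processor."""
    return sum(w for (p, q), w in arcs.items()
               if assignment[p] != assignment[q])

arcs = {("A", "B"): 3, ("B", "C"): 2, ("A", "D"): 4, ("C", "D"): 1}
split1 = {"A": 0, "B": 0, "C": 1, "D": 1}   # cuts B-C (2) and A-D (4)
split2 = {"A": 0, "B": 1, "C": 1, "D": 0}   # cuts A-B (3) and C-D (1)
print(cut_cost(arcs, split1), cut_cost(arcs, split2))  # 6 4
```

The algorithm's job is to search the space of assignments for the one with the smallest cut cost that still satisfies the memory and CPU constraints.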

Page 77

Processor Allocation

A Centralized algorithm

The graph-theoretic algorithm has limited applicability because it requires complete information in advance.

A heuristic algorithm, by contrast, requires no advance information.

One such algorithm is known as the up-down algorithm. A coordinator maintains a usage table with one entry per personal workstation, all initially zero.

When a significant event occurs, a message is sent to the coordinator to update the table. Allocation decisions are based on the table; they are made when a processor becomes free or when the clock ticks.

Page 78

Processor Allocation

A Centralized algorithm

Unlike an algorithm aimed purely at maximizing CPU utilization, in which one user could take over all the machines by promising to keep them busy, this algorithm is designed to give each workstation owner a fair share of the computing power.

When a process is created, the creating machine asks the coordinator to allocate a processor. If one is available, it is granted; otherwise the request is noted and temporarily denied.

While a workstation owner is running processes on other people's machines, it accumulates penalty points, a fixed number per second, which are added to its usage table entry. While it has unsatisfied requests pending, penalty points are subtracted from its entry.

When no requests are pending and no processors are being used, the entry decays toward zero.

Page 79

Processor Allocation

Example Processor Allocation Algorithms

Page 80

Processor Allocation

A Centralized algorithm

Usage table entries can be positive, zero, or negative. A positive score indicates that the workstation is a net user of system resources, a negative score means that it needs resources, and zero is neutral.

When a processor becomes free, the pending request whose owner has the lowest score wins.

Thus a workstation that has consumed little capacity is given preference over one that has been a heavy user.
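The bookkeeping can be sketched as a toy usage table; the workstation names and the point value (one point per tick) are invented for illustration.

```python
# Sketch of the up-down usage table: points accumulate while an owner runs
# processes on other machines, are subtracted while its requests go
# unsatisfied, and the pending request with the LOWEST score wins a free CPU.

class UsageTable:
    def __init__(self, stations):
        self.score = {s: 0 for s in stations}

    def tick(self, using_remote, waiting):
        for s in using_remote:
            self.score[s] += 1        # net user of resources: score goes up
        for s in waiting:
            self.score[s] -= 1        # unsatisfied request: score goes down

    def winner(self, pending):
        return min(pending, key=lambda s: self.score[s])

t = UsageTable(["W1", "W2", "W3"])
for _ in range(3):                    # W1 uses a remote CPU; W3 is waiting
    t.tick(using_remote=["W1"], waiting=["W3"])
print(t.score, t.winner(["W1", "W3"]))
```

After three ticks W3 has the lowest score, so its pending request wins the next free processor, which is exactly the fairness behavior described above.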

Page 81

Processor Allocation

A Hierarchical algorithm

The drawback of the centralized up-down algorithm is that it is not suited to large systems.

The central node becomes a bottleneck, and it is also a single point of failure.

These problems can be addressed with a hierarchical algorithm. The workstations are divided into groups; for each group of k workers, one manager machine (the "department head") is assigned the task of keeping track of who is busy and who is idle.

If the system is large, the managers themselves are supervised by "deans", and so on up the hierarchy; each machine needs to communicate only with its immediate superior and subordinates. But what happens if a department head (or a higher-level manager) crashes?

Page 82

Processor Allocation

A Hierarchical algorithm

Page 83

Processor Allocation

A Hierarchical algorithm

The answer is to promote one of the crashed manager's direct subordinates to fill in for the boss.

This scheme works well as long as crashes are occasional. When a manager receives a request and has too few processors available, it passes the request upward in the tree to its boss. If the boss cannot handle it either, the request continues to propagate upward until it reaches a level that has enough available workers at its disposal.

At that level, the manager splits the request into parts and parcels them out among the managers below it.

The allocated processors are then marked busy, and the number of processors actually allocated is reported back up the tree.

Page 84

Processor Allocation

A Sender – Initiated Distributed Heuristic algorithm

The algorithms described above are all centralized or semicentralized; fully distributed algorithms also exist.

In one sender-initiated algorithm, when a process is created, the machine on which it originates sends probe messages to a randomly chosen machine, asking if its load is below some threshold value.

If so, the process is sent there. If not, another machine is chosen for probing. Probing does not go on forever: if no suitable host is found within N probes, the algorithm terminates and the process runs on the originating machine.

Page 85

Processor Allocation

A Sender – Initiated Distributed Heuristic algorithm

This model behaves well and is stable under a wide range of parameters, including different threshold values, transfer costs, and probe limits.

Under heavy load, however, all machines will constantly send probes to other machines in a futile attempt to find one willing to accept more work.

Few processes will actually be off-loaded, but considerable overhead is incurred in the attempt.

Page 86

Processor Allocation

A Receiver – Initiated Distributed Heuristic algorithm

In this algorithm whenever a process finishes, the system checks to see if it has enough work. If not it picks some machine at random and asks it for work.

If that machine has nothing to offer, a second and then a third machine is asked.

If no work is found with N probes, the receiver temporarily stops asking, does any work it has queued up, and tries again when the next process finishes.

If no work is available, the machine goes idle. After some fixed time interval, it begins probing again.

An advantage of this algorithm is that it does not put extra load on the system at critical times.
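One probing round on the receiver side can be sketched as follows; the machine names and queue contents are invented for illustration.

```python
import random

# Sketch of receiver-initiated probing: an underloaded machine asks up to
# n_probes randomly chosen machines for work before giving up temporarily.

def ask_for_work(queues, self_id, n_probes, rng):
    """Return (donor, process) if some probed machine offered work,
    else None (the receiver temporarily stops asking)."""
    others = [m for m in queues if m != self_id]
    for _ in range(n_probes):
        m = rng.choice(others)
        if queues[m]:                 # that machine has work to offer
            return m, queues[m].pop(0)
    return None

queues = {"A": ["p1", "p2"], "B": [], "C": ["p3"], "idle": []}
result = ask_for_work(queues, "idle", n_probes=3, rng=random.Random(1))
print(result)
```

The probes here are sent only when the receiver is idle anyway, which is why this scheme adds no extra load at critical (heavily loaded) times.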

Page 87

Processor Allocation: A Bidding algorithm

In this algorithm, each processor advertises its approximate price by putting it in a publicly readable file.

This price is not guaranteed, but gives an indication of what the service is worth.

Different processors may have different prices, depending on their speed, memory size, presence of floating point hardware, and other features.

An indication of the service provided, such as expected response time, can also be published.

When a process wants to start up a child process, it goes around and checks out who is currently offering the service that it needs.

It then determines the set of processors whose services it can afford. From this set it computes the best candidate, where "best" may mean cheapest, fastest, or best price/performance, depending on the application.

Page 88

Processor Allocation: A Bidding algorithm

It then generates a bid and sends it to its first choice. The bid may be lower or higher than the advertised price.

Processors collect all the bids sent to them and make a choice, typically picking the highest bid.

The winner and the losers are informed, and the winning process is executed.

The published price of the server is then updated to reflect the new going rate.
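One round of bid collection might look like this sketch; the processor names, clients, and bid amounts are invented for illustration.

```python
# Sketch of one round of the bidding algorithm: each processor collects the
# bids sent to it and awards itself to the highest bidder.

def run_auction(bids):
    """bids: processor -> list of (client, amount).
    Returns {processor: winning_client}."""
    winners = {}
    for proc, offers in bids.items():
        if offers:
            client, _ = max(offers, key=lambda o: o[1])
            winners[proc] = client
    return winners

bids = {
    "fast-cpu": [("job1", 12), ("job2", 15)],   # job2 outbids job1
    "cheap-cpu": [("job3", 4)],
}
print(run_auction(bids))  # {'fast-cpu': 'job2', 'cheap-cpu': 'job3'}
```

A fuller implementation would also notify the losers and update each processor's published price toward the winning bid.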

Page 89

Group Communication

• RPC / RMI involves communication between two parties, a client and a server

• Sometimes communication among multiple processes is needed

• For example, a group of file servers cooperating to offer a single, fault-tolerant file service

• The group communication concept was proposed to meet this requirement

• Group communication is essentially multicast communication

• One-to-many communication

• Groups are dynamic: members can be added and deleted

• A process can be a member of multiple groups

• Efficient mechanisms are needed to manage groups

Page 90

Group Communication

• Implementation methods:

• Unicasting

• Multicasting

• Broadcasting

[Figure: point-to-point communication — one sender S to one receiver R — versus one-to-many communication, in which a single send from S reaches many receivers R]

Page 91

Design Issues

• Nine design issues affect the performance of the group communication approach:

• Closed Groups vs Open Groups

• Peer vs Hierarchical Groups

• Group Membership

• Group Addressing

• Send/Receive Primitives

• Atomicity

• Message Ordering

• Overlapping Groups

• Scalability

Page 92

Design Issues: Closed Groups versus Open Groups

• Closed groups are used for parallel processing and games

• Open groups are used for replicated servers

[Figure: in a closed group, a process that is not a member of the group is not allowed to send to it; in an open group, it is allowed]

Page 93

Design Issues: Peer groups versus Hierarchical groups

• Peer groups

• symmetric: no single point of failure

• but decision making is complicated

• Hierarchical groups

• a coordinator must be chosen from among the co-workers

• the coordinator decides how to distribute the workers' loads

• loss of the coordinator brings the entire group to a halt

Page 94

Design Issues

[Figure: a peer group, in which all members are equal, versus a hierarchical group, in which a coordinator directs the workers]

Page 95

Group Membership

• Group Membership

• Some method is needed to create and delete groups, and to join and leave them

• One approach is to use a group server

• A group server is easy to implement, but it is a single point of failure

• Another approach is to manage membership in a distributed way

• where membership changes are communicated to all group members

• here synchronization becomes the serious problem

Page 96

Group Membership

• Group Addressing

Page 97

Group Addressing

• Each group can be given a unique address

• Alternatively, an explicit list of all destinations can be used, but this is not transparent

• Predicate addressing: the address includes a Boolean expression

• The predicate may refer to the receiver's machine number, its local variables, and so on

• If the Boolean expression evaluates to TRUE, the message is accepted; otherwise it is rejected

• Example: send a message only to those machines that have at least 4 MB of free memory
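The free-memory example can be sketched directly; the machine attribute name (`free_mem_mb`) is hypothetical.

```python
# Sketch of predicate addressing: the message carries a Boolean predicate,
# and each receiving machine evaluates it against its own local state.

def deliver(message, machines):
    """Return the ids of machines whose state satisfies the predicate
    carried in the message."""
    return [m["id"] for m in machines if message["predicate"](m)]

machines = [
    {"id": 1, "free_mem_mb": 2},
    {"id": 2, "free_mem_mb": 6},
    {"id": 3, "free_mem_mb": 4},
]
# "Send only to machines with at least 4 MB of free memory":
msg = {"data": "start job", "predicate": lambda m: m["free_mem_mb"] >= 4}
print(deliver(msg, machines))  # [2, 3]
```

In a real system the predicate would be evaluated independently at each receiver rather than by a central dispatcher as in this simulation.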

Page 98

Send and Receive Primitives

• Ordinary RPC cannot be used for group communication, because a single call may produce n different replies

• How would a procedure call handle n replies?

• Point-to-point communication uses the ordinary send and receive primitives

• Group communication needs its own send and receive primitives

• for example, group_send() and group_receive()

Page 99

Atomicity

• Most group communication systems are designed so that when a message is sent to a group, it arrives correctly at all members of the group, or at none of them.

• The property of all-or-nothing delivery is called atomicity or atomic broadcast.

• Atomicity is desirable because it makes programming distributed systems much easier.

• When a process sends a message to the group, it does not have to worry about what to do if some members do not get it.

• For example, in a replicated distributed database system, suppose that a process sends a message to all the database machines to create a new record, and later sends a second message to update it.

• If some of the members miss the message creating the record, they will not be able to perform the update.

Page 100

Atomicity

• If some of the members miss the message creating the record, they will not be able to perform the update, and the database becomes inconsistent.

• The system must therefore either deliver a group message to every member or report a failure.

• Atomic broadcast is not so simple to implement, however.

• Simply sending the message and collecting acknowledgements works only as long as machines never crash.

• There is, however, a simple algorithm that implements atomic broadcast even in the face of crashes:

• The sender starts by sending the message to all members of the group.

• Timers are set and retransmissions are sent where necessary.

• When a process receives a message, it checks whether it has seen the message before. If not, it itself sends the message to all the group members; otherwise it discards the duplicate.

• No matter how many machines crash or how many packets are lost, eventually all the surviving processes will get the message.
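The flooding idea can be sketched as a small simulation; the process count, crash set, and message id are invented for illustration.

```python
# Sketch of the simple atomic broadcast: every process that sees a message
# for the first time re-sends it to the whole group, so the message survives
# crashes of processes that have already forwarded it.

def atomic_broadcast(n, sender, crashed, msg_id):
    """Flood msg_id from sender; `crashed` processes drop everything.
    Returns the set of live processes that end up with the message."""
    seen = set()
    queue = [(sender, msg_id)]
    while queue:
        proc, m = queue.pop(0)
        if proc in crashed or proc in seen:
            continue                      # dead, or a duplicate: discard
        seen.add(proc)                    # first sighting: accept ...
        for other in range(n):            # ... and re-send to everyone
            if other != proc:
                queue.append((other, m))
    return seen

# 5 processes with process 2 crashed: all survivors still get the message.
print(sorted(atomic_broadcast(5, sender=0, crashed={2}, msg_id="m1")))
```

The simulation omits the timers and retransmissions that handle lost packets, but it shows why re-sending on first receipt gives all-or-nothing delivery among the survivors.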

Page 101

Message Ordering

• Message ordering

• Suppose process 0 sends to 1, 3, and 4 while process 4 simultaneously sends to 0, 1, and 3: the messages may arrive in different orders at different members

• Global time ordering solves this, but it is a serious problem and difficult to implement

• A weaker alternative is consistent time ordering: all members receive the messages in the same order, even if it is not the true sending order

[Figure: processes 0 and 4 multicasting to the group simultaneously; without an ordering guarantee, different members can observe the messages in different orders]

Page 102

Overlapping Groups

• A well-defined ordering scheme is required when groups overlap

[Figures: four processes A–D spanning two overlapping groups; and multicasting in an internetwork of four LANs (LAN1–LAN4) connected by four gateways (G1–G4)]

Page 103

Scalability

• Many algorithms work well as long as groups have only a few members, but what happens when there are tens, hundreds, or even thousands of members per group?

• What happens when the system is so large that it no longer fits on a single LAN, so multiple LANs and gateways are required?

• What happens when the groups are spread over several continents?

• The presence of gateways can affect many properties of the implementation.

• In particular, multicasting becomes more complicated.

• The diagram above shows four LANs and four gateways, arranged to provide protection against the failure of any single gateway.

• Suppose a machine on LAN2 performs a multicast. The packet is received by gateways G1 and G3; unless a gateway filters it, each forwards a copy onto the adjacent LANs, whose gateways forward it again in turn — including back toward LAN2 — producing exponential growth in the number of packets.

• Scalability is thus a problem when multiple networks are connected by gateways.

Page 104

Group Communication in ISIS

• A toolkit for building distributed applications

• Based on synchrony, i.e., strictly sequential event ordering

[Figure: delivery of messages M1 and M2 to processes A–D in a synchronous system, under loose synchrony, and under virtual synchrony — in the last case M1 arrives before M2 at some processes and after it at others]

Page 105

• Synchronous system

• Loosely synchronous system (loose synchrony)

• Virtually synchronous system (virtual synchrony)

• Causally related events - the content or behavior of the second message might have been influenced in some way by the first message

• Concurrent events - Two events that are unrelated.

• What virtual synchrony means is that if two messages are causally related, all processes must receive them in the same (correct) order

• However, if they are concurrent, no guarantees are made, and the system is free to deliver them in a different order to different processes if this is easier.

Group Communication in ISIS

Page 106

• Communication Primitives in ISIS

• ABCAST - provides loosely synchronous delivery, used for transferring data

» It works rather like a two-phase commit protocol

» Each receiver assigns a timestamp to the message; the sender picks the largest one and commits the message with it

» It is complex and expensive

• CBCAST - provides virtually synchronous delivery, also for transferring data

» Only causally related messages are guaranteed to be delivered in order

• GBCAST - used to manage group membership

Group Communication in ISIS

Page 107

• Communication Primitives in ISIS CBCAST

Group Communication in ISIS

Page 108

CBCAST in ISIS

[Figure: CBCAST in ISIS — the vector carried in a message sent by process 0, and the state of the vectors at the other machines, which determines whether each machine accepts the message or delays it]
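The accept-or-delay decision follows the CBCAST delivery rule: a message from sender i carrying vector V is accepted at a machine with local vector L only if it is the next message expected from i and the machine has already seen everything the sender had seen. A minimal sketch, with illustrative vectors:

```python
# Sketch of the CBCAST delivery test. V is the vector in the message,
# L is the receiving machine's local vector, `sender` is the sender's index.

def can_accept(V, L, sender):
    if V[sender] != L[sender] + 1:         # must be the next msg from sender
        return False
    # receiver must already have seen every message the sender had seen
    return all(V[k] <= L[k] for k in range(len(V)) if k != sender)

V = [5, 2, 4]                              # vector in a message from process 0
print(can_accept(V, [4, 2, 4], sender=0))  # True  -> accept
print(can_accept(V, [4, 1, 4], sender=0))  # False -> delay (a message from
                                           # process 1 is still missing)
```

A delayed message is buffered and re-tested as the local vector advances, which is what guarantees causal delivery order.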

Page 109

Fault Tolerance Systems

• A system is said to fail when it does not meet its specification

• An efficient mechanism is needed to handle failures of systems

• Various kinds of faults

• Component faults

• System failures

• Synchronous vs Asynchronous Systems

• Use of redundancy

• Fault Tolerance using Active Replication

• Fault Tolerance using Primary backup

• Agreement in Faulty Systems

Page 110

Component Faults

• A computer system can fail due to a fault in a processor, memory, an I/O device, a cable, or software

• Various types of faults:

• transient faults: occur once and then disappear; if the operation is repeated, the fault goes away; e.g., a bit error during a microwave transmission

• intermittent faults: keep reappearing; e.g., a loose contact on a connector

• permanent faults: e.g., burnt-out chips, software bugs, a crashed disk head

• Mean time to failure = 1/p, where p is the probability of a fault occurring in a given unit of time

Page 111

System Failures

• There are two types of system/processor faults

• Fail-silent faults

• the faulty processor stops and does not respond to subsequent input or produce further output

• except perhaps to announce that it is no longer functioning

• also called fail-stop faults

• Byzantine faults

• the faulty processor continues to run, issuing wrong answers to questions

• often caused by undetected software faults

• this is the dangerous kind, and the one that needs the most attention

• a synchronous communication model is needed for fault tolerance

Page 112

Use of Redundancy

• Redundancy is the general approach to fault tolerance

• Replication is needed

• Information redundancy

• extra bits are added to allow recovery

• e.g. Hamming code

• Time redundancy

• an action is performed and then, if need be, performed again

• e.g. atomic transactions (abort and redone)

• Physical redundancy

• extra equipment is added

• e.g. extra processors added

Page 113

Fault Tolerance Using Active Replication

[Figure: Triple Modular Redundancy — three stages of triplicated elements (A1–A3, B1–B3, C1–C3) with triplicated voters (V1–V9) between the stages]
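The heart of TMR is the voter, which passes on the majority of the three replicated outputs and so masks a single faulty element. A minimal sketch, with illustrative values:

```python
# One voter stage in triple modular redundancy: any two outputs that agree
# determine the value passed to the next stage.

def vote(a, b, c):
    """Majority of three values (any two that agree win)."""
    return a if a == b or a == c else b

# Element A2 is faulty and outputs 9 instead of 7; the voter masks it:
outputs = [7, 9, 7]                     # A1, A2 (faulty), A3
print(vote(*outputs))  # 7
```

Triplicating the voters themselves, as in the figure, prevents a single voter failure from corrupting all three inputs of the next stage.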

Page 114

Fault Tolerance Using Primary Backup

Steps between client, primary, and backup:

1. Client sends the request to the primary
2. Primary does the work
3. Primary sends the update to the backup
4. Backup does the work
5. Backup sends an ack to the primary
6. Primary sends the reply to the client

Simple primary-backup protocol on a write operation

Page 115

• At any one instant, one server is the primary and does all the work.

• If the primary fails, the backup takes over.

• Backup generators, co-pilots, and vice presidents are everyday examples.

• Compared to active replication, primary backup has two advantages:

– it is simpler during normal operation

– it needs only two machines, one primary and one backup

– In the protocol above, the client sends its request to the primary, which does the work and then sends the update to the backup; when the backup has finished the update, it sends an ack to the primary, which then replies to the client.

– If the primary crashes before doing the work, the client times out and retries; eventually the backup takes over and the job gets done.

Primary Backup

Page 116

• If the primary crashes after the work has been done but before sending the update, then when the backup takes over and the request arrives again, the work is done a second time.

• If the work has side effects, this repetition can be a problem.

• If the primary crashes after step 4 but before step 6, the work ends up being done three times: once by the primary, once by the backup as a result of step 3, and once more after the backup becomes the primary and the retried request arrives.

• Primary failure is detected by having the backup periodically ask the primary "Are you alive?"

• If the primary fails to respond, the backup takes over.

• But what happens if the primary has not crashed and is merely slow because of load? There is no mechanism to distinguish a slow processor from a dead one.

Primary Backup

Page 117

• How a distributed agreement algorithm must work depends on:

– Are messages delivered reliably all the time?

– Can processes crash, and if so, are crashes fail-silent or Byzantine?

– Is the system synchronous or asynchronous?

– The example below assumes that communication is perfect but the processors are not.

– That example is the Byzantine generals problem.

Agreement in Faulty Systems

Page 118

Byzantine generals problem

• It is used to reach agreement in the presence of faulty processors
• Every processor reports its value to the others
• Each processor collects the values it receives into a vector
• Each processor then passes its vector to every other processor
• With n processors of which m are faulty, agreement requires n = 3m + 1 in total, i.e., agreement can be achieved only if 2m + 1 processors (more than two-thirds) work correctly

[Figure: the Byzantine generals problem with n = 4 and m = 1 (processor 3 faulty) — (a) each processor announces its value, with processor 3 sending inconsistent values x, y, z; (b) the vectors each processor assembles; (c) the vectors each processor receives from the others, from which the non-faulty processors 1, 2, and 4 agree by majority on the values of processors 1, 2, and 4]
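The final majority step can be sketched over the vectors from the figure's n = 4, m = 1 example; positions where the faulty processor's random values prevent a majority come out as unknown.

```python
from collections import Counter

# Component-wise majority over the vectors a processor received; a position
# with no majority value (the faulty processor's slot) yields None.

def majority_vector(vectors):
    result = []
    for column in zip(*vectors):
        value, count = Counter(column).most_common(1)[0]
        result.append(value if count > len(vectors) / 2 else None)
    return result

# Vectors as collected by processor 1; processor 3's values x, y, z differ:
got = [(1, 2, "y", 4), (1, 2, "x", 4), (1, 2, "z", 4)]
print(majority_vector(got))  # [1, 2, None, 4]
```

Every non-faulty processor computes the same result for the non-faulty positions, which is exactly the agreement the algorithm guarantees when n ≥ 3m + 1.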

Page 119

Real-time Distributed Systems

• Correctness depends on meeting deadlines

• A real-time system need not be fast or especially efficient, merely predictable

• It must be much more highly structured than a general-purpose distributed system

• The system is activated by external stimuli, which may be periodic, aperiodic, or sporadic:

• Periodic - occurring at regular intervals, e.g., a new video frame arriving every fraction of a second

• Aperiodic - recurring, but not at regular intervals, e.g., aircraft arriving at an airport

• Sporadic - unexpected, e.g., a device overheating

Page 120

Real-Time Distributed Systems

[Figure: a distributed real-time system — external devices (sensors and actuators) attached to computers (C) that communicate over a network]

• Soft Real-Time

• Missing an occasional deadline is all right

• e.g., a telephone switch may misroute one call

• Hard Real-Time

• Missing even an occasional deadline is unacceptable

• e.g., brakes, flight controllers

Page 121

Hard Real Time vs. Soft Real Time

• Critical Real-time systems such as nuclear power plants or Embedded Systems for Airplanes are Hard Real Time

• Non-critical real time systems where missed transactions only degrade system quality are Soft Real Time

Page 122

Hard deadline

Page 123

Soft deadline

Page 124

Design Issues

• Clock Synchronization

• Event-Triggered vs Time-Triggered systems

• event-triggered systems: when a significant event happens, it is detected by a sensor, which causes an interrupt; the system is interrupt driven, but it can fail under heavy load (an event shower)

• time-triggered systems: a clock interrupt occurs every T units of time

• Predictability - the system's behavior must be predictable

• Fault Tolerance

• Language Support

• minimum complexity with maximum speed

• low memory and CPU cost for loading and executing code

• the maximum execution time of every task must be computable at compile time, which rules out unbounded constructs such as general while loops and recursion

Page 125

Real-time Communication

• Communication in a real-time distributed system differs from communication in an ordinary distributed system

• Achieving predictability is the key: communication between processors must be predictable

• Token ring LAN

• it can support multiple priority classes

• i.e., high-priority packets are sent before low-priority packets

• a processor with a packet to send must wait for the token; with k machines on the ring and one n-byte packet per token capture, even a highest-priority packet may have to wait up to kn byte times

• even with this bound, the token ring is not an efficient technique

Page 126

Real-time Communication

• TDMA (Time Division Multiple Access) is the alternate approach to Token Ring

• Fixed-size frames and contains n slots

• Each slot is assigned to one processor

• Each processor uses its slot to transmit a packet when its turn comes

• Hence, no collision, bounded delay, guaranteed bandwidth and predictable one.
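The bounded delay can be sketched by computing when a processor's next slot begins; the slot counts, slot length, and time units below are invented for illustration.

```python
# Sketch of TDMA timing: with n_slots per frame and a fixed slot length,
# processor `proc` may transmit only during its own slot, so the worst-case
# wait before sending is bounded by one frame length.

def next_slot_start(now, proc, n_slots, slot_len):
    """Earliest time >= now at which processor `proc` may start sending."""
    frame_len = n_slots * slot_len
    start = (now // frame_len) * frame_len + proc * slot_len
    if start < now:
        start += frame_len            # missed this frame's slot: wait a frame
    return start

# 4 processors, slot length 10: processor 2 owns [20, 30) of every frame.
print(next_slot_start(now=5, proc=2, n_slots=4, slot_len=10))   # 20
print(next_slot_start(now=35, proc=2, n_slots=4, slot_len=10))  # 60
```

Because the wait is at most one frame, the delay is bounded and the per-processor bandwidth is guaranteed, which is what makes TDMA predictable.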

Page 127

[Figure: Two consecutive TDMA frames, each divided into slots, with one slot reserved for processor 0. In the second frame processor 0 did not use its slot. A packet's delay is measured from the start of its slot.]
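The frame/slot arithmetic behind this scheme is easy to sketch; the function name and integer time units below are assumptions for illustration, not part of any standard:

```python
def next_slot_start(now: int, processor: int, n_slots: int, slot_time: int) -> int:
    """Earliest time >= now at which `processor` may begin transmitting,
    given TDMA frames of n_slots fixed-size slots.  A processor that
    misses the start of its slot must wait for the next frame, so the
    access delay is bounded by one frame time."""
    frame_time = n_slots * slot_time
    frame_start = (now // frame_time) * frame_time
    slot_start = frame_start + processor * slot_time
    if slot_start < now:          # our slot already began in this frame
        slot_start += frame_time  # wait for the next frame
    return slot_start


# 4 slots of 10 time units each: processor 2 owns [20, 30) in each frame.
print(next_slot_start(5, 2, 4, 10))   # slot still ahead in this frame
print(next_slot_start(25, 2, 4, 10))  # slot missed; wait for next frame
```

The bounded wait (at most one frame) is exactly the predictability property the slide claims.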

Page 128:

Real-time Communication: Time-Triggered Protocol

• TTP is a real-time communication protocol
• It is used in MARS (MAintainable Real-time System)
• Each MARS node consists of a CPU connected to two independent TDMA broadcast networks
• Maintaining clock synchronization is a serious problem

Page 129:

The Time-Triggered Protocol

[Figure: TTP packet layout — a control portion (with Mode and ACK fields) followed by Data and CRC. The Mode field is used to initialize and to change the mode; no sender, receiver, length, or type information is included because all of this is known implicitly; the CRC checks the sender's global state as well as the packet bits.]

Page 130:

Real-time Scheduling

• Hard real time vs soft real time
  • Hard real time: must guarantee that all deadlines are met
  • Soft real time: missing an occasional deadline is acceptable
• Preemptive vs nonpreemptive scheduling
  • Preemptive: allows a task to be suspended temporarily when a higher-priority task arrives
  • Nonpreemptive: runs each task to completion
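The preemptive case can be illustrated with a small unit-time simulation. This is a generic sketch (the function name and tuple layout are assumptions for illustration), not any particular scheduler's implementation:

```python
def preemptive_priority(tasks):
    """Unit-time simulation of preemptive priority scheduling.
    tasks: list of (name, arrival, burst, priority); a lower number
    means a higher priority.  At every time unit the highest-priority
    ready task runs, so a newly arrived urgent task suspends the
    currently running one.  Returns {name: completion_time}."""
    arrival = {n: a for n, a, b, p in tasks}
    prio = {n: p for n, a, b, p in tasks}
    remaining = {n: b for n, a, b, p in tasks}
    time, finish = 0, {}
    while remaining:
        ready = [n for n in remaining if arrival[n] <= time]
        if not ready:            # CPU idle until the next arrival
            time += 1
            continue
        n = min(ready, key=lambda t: prio[t])
        remaining[n] -= 1        # run the chosen task for one time unit
        time += 1
        if remaining[n] == 0:
            del remaining[n]
            finish[n] = time
    return finish


# B arrives at t=2 with higher priority and suspends A until t=4.
print(preemptive_priority([("A", 0, 5, 2), ("B", 2, 2, 1)]))
```

Under a nonpreemptive policy, A would instead run to completion first and B would finish much later, which is exactly the distinction the slide draws.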

Page 131:

Real-time Scheduling

• Dynamic vs static
  • Dynamic: scheduling decisions are made during execution; a good fit with an event-triggered design
  • Static: scheduling decisions are made before execution; a good fit with a time-triggered design
• Centralized vs decentralized
  • Centralized: one machine collects all the information and makes all the decisions
  • Decentralized: each processor makes its own decisions

Page 132:

Real-Time Scheduling

• Dynamic Scheduling: Rate Monotonic (RM) vs Earliest Deadline First (EDF)
  • RM: tasks are assigned fixed priorities, and the task with the smallest period receives the highest priority
  • EDF: priorities are assigned dynamically and are inversely proportional to the absolute deadlines of the active jobs
• Criteria for comparing the two: implementation complexity, run-time overhead, resulting jitter, efficiency in handling aperiodic activities, and CPU utilization
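The classic utilization-based schedulability tests for the two policies can be sketched briefly, assuming periodic tasks whose deadlines equal their periods:

```python
def utilization(tasks):
    """tasks: list of (computation_time, period) for periodic tasks."""
    return sum(c / p for c, p in tasks)

def rm_schedulable(tasks):
    """Liu & Layland sufficient (not necessary) test for Rate Monotonic:
    a task set is guaranteed schedulable if U <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_schedulable(tasks):
    """For EDF with deadlines equal to periods, U <= 1 is both
    necessary and sufficient."""
    return utilization(tasks) <= 1.0
```

For example, two tasks with (computation, period) of (3, 4) and (2, 8) have U = 1.0: EDF can schedule them, but the RM bound for n = 2 (about 0.83) cannot guarantee it. This is the CPU-utilization trade-off listed among the comparison criteria above.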

Page 133:

Static Scheduling

[Figure: A static scheduling example — a stimulus arriving at node A triggers tasks 1 through 10, arranged in a precedence graph, and the final task produces the response at node B.]
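A static scheduler can precompute an execution order for such a precedence graph before run time. A minimal sketch using Kahn's topological sort, with a hypothetical graph chosen in the spirit of the figure (the exact edges are assumptions):

```python
from collections import deque

def static_schedule(preds):
    """Compute one valid execution order before run time (Kahn's
    topological sort).  preds maps each task to the set of tasks that
    must finish before it may start."""
    indegree = {t: len(p) for t, p in preds.items()}
    succ = {t: [] for t in preds}
    for t, ps in preds.items():
        for p in ps:
            succ[p].append(t)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in sorted(succ[t]):   # release tasks whose predecessors are done
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


# Hypothetical precedence graph: the stimulus task 1 fans out and
# everything joins again before the response-producing task 10.
GRAPH = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {3},
         6: {4, 5}, 7: {6}, 8: {6}, 9: {7}, 10: {8, 9}}
print(static_schedule(GRAPH))
```

Because the whole order is fixed at design time, run-time behavior is fully predictable, which is why static scheduling pairs naturally with a time-triggered design.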