Resource Management
Every distributed system consists of a number of resources interconnected by a network.
The set of available resources in a distributed system appears to the user as a single virtual system.
A resource manager schedules the processes in a distributed system so that resource usage, response time, network congestion, and scheduling overhead are optimized.
Scheduling of processes can be done in three ways:
– Task assignment approach
– Load balancing approach
– Load sharing approach
Desirable Features of Scheduling Algorithm
No a priori knowledge about the processes
– Otherwise users have to specify the resource requirements of processes on submission for execution
Dynamic in nature
– Process assignment based on the current load of the system, not on a fixed static policy
– Preemptive process migration facility
Quick decision-making capability
– Use heuristic methods requiring less computational effort
Balanced system performance and scheduling overhead
– Tradeoff: global state information enables better decisions but incurs higher cost of collecting and processing that information in a timely manner
Stability
– Processor thrashing happens when scheduling decisions are taken independently or on the basis of obsolete data, and processes in transit to a lightly loaded node are not taken into account.
Scalability
– Probe only m out of N nodes when selecting a host.
Fault tolerance
– Should have decentralized decision-making capability and consider only available nodes for scheduling
Fairness of service
– Use load sharing rather than load balancing. A node shares its resources as long as its users are not significantly affected.
Task Assignment Approach
Find an optimal assignment for tasks of an individual process.
Assumptions
– Process is split into cohesive tasks with minimum coupling
– Amount of computation required by each task and speed of each processor are known
– Cost of processing each task on every node is known
– IPC costs between every pair of tasks are known
– Resource requirements, available resources, and precedence relationships among tasks are known
– Reassignment of tasks is not possible
Task Assignment Approach
Goals
– Minimization of IPC
– Quick turnaround time
– High degree of parallelism
– Efficient utilization of system resources
Drawbacks
– Characteristics of all processes should be known in advance
– Does not take care of the dynamically changing state of the system
Serial assignment
Task  Node
t1    n1
t2    n1
t3    n1
t4    n2
t5    n2
t6    n2

Optimal assignment
Task  Node
t1    n1
t2    n1
t3    n1
t4    n1
t5    n1
t6    n2
Inter-task communication costs
      t1   t2   t3   t4   t5   t6
t1     0    6    4    0    0   12
t2     6    0    8   12    3    0
t3     4    8    0    0   11    0
t4     0   12    0    0    5    0
t5     0    3   11    5    0    0
t6    12    0    0    0    0    0

Execution costs
        Nodes
Tasks   n1   n2
t1       5   10
t2       2    ∞
t3       4    4
t4       6    3
t5       5    2
t6       ∞    4
Serial assignment execution cost
= t11 + t21 + t31 + t42 + t52 + t62 = 5+2+4+3+2+4 = 20
Serial assignment communication cost
= c14 + c15 + c16 + c24 + c25 + c26 + c34 + c35 + c36
= 0+0+12+12+3+0+0+11+0 = 38
Serial assignment total cost = 20+38 = 58
Optimal assignment execution cost
= t11 + t21 + t31 + t41 + t51 + t62 = 5+2+4+6+5+4 = 26
Optimal assignment communication cost
= c16 + c26 + c36 + c46 + c56 = 12+0+0+0+0 = 12
Optimal assignment total cost = 26+12 = 38
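The cost calculation above can be checked with a short sketch (Python, using the execution and communication costs from the tables; `math.inf` marks a placement the tables forbid):

```python
import math

# Execution cost of each task on each node (from the execution costs table).
exec_cost = {
    't1': {'n1': 5, 'n2': 10},
    't2': {'n1': 2, 'n2': math.inf},
    't3': {'n1': 4, 'n2': 4},
    't4': {'n1': 6, 'n2': 3},
    't5': {'n1': 5, 'n2': 2},
    't6': {'n1': math.inf, 'n2': 4},
}

# Symmetric inter-task communication costs (zero entries omitted).
comm_cost = {
    ('t1', 't2'): 6, ('t1', 't3'): 4, ('t1', 't6'): 12,
    ('t2', 't3'): 8, ('t2', 't4'): 12, ('t2', 't5'): 3,
    ('t3', 't5'): 11, ('t4', 't5'): 5,
}

def total_cost(assignment):
    """Execution cost of every task plus the IPC cost of every
    pair of tasks placed on different nodes."""
    cost = sum(exec_cost[t][n] for t, n in assignment.items())
    for (a, b), c in comm_cost.items():
        if assignment[a] != assignment[b]:
            cost += c
    return cost

serial = {'t1': 'n1', 't2': 'n1', 't3': 'n1',
          't4': 'n2', 't5': 'n2', 't6': 'n2'}
optimal = {'t1': 'n1', 't2': 'n1', 't3': 'n1',
           't4': 'n1', 't5': 'n1', 't6': 'n2'}

print(total_cost(serial))   # 58
print(total_cost(optimal))  # 38
```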
Cut set
– A set of edges of a graph which, if removed (or "cut"), disconnects the graph (i.e., forms a disconnected graph).
Minimum cut set
– A cut is minimum if the size of the cut is not larger than the size of any other cut.
Optimal Assignment using Minimal Cutset
[Figure: task graph for tasks t1–t6 with edges labeled by communication costs and tasks labeled by their execution costs; the minimum cost cut separates t6 (assigned to n2) from t1–t5 (assigned to n1).]
Load Balancing Approach
Balance the total system load by transparently transferring the workload from heavily loaded nodes to lightly loaded nodes to maximize the total system throughput.
Classification of Load Balancing Algorithms
– Static: Deterministic or Probabilistic
– Dynamic: Centralized or Distributed; distributed algorithms are further Cooperative or Non-cooperative
Load Balancing Algorithms
Static
– Decisions are hard-coded into the algorithm with a priori knowledge of the system.
– Use the average behavior of the system, ignoring its current state.
– Simpler
Dynamic
– React to the system state, which changes dynamically.
– Have greater performance benefits
Static Algorithms
Deterministic
– Uses information about the properties of the nodes and the characteristics of the processes to be scheduled to allocate processes to nodes, e.g. the task assignment approach.
– Difficult to implement but efficient
Probabilistic
– Uses information regarding static attributes of the system, such as the number of nodes, the processing capability of each node, and the network topology, to allocate nodes to processes.
– Easier to implement but gives poor performance
Dynamic Algorithms
Centralized
– A single, centralized server node collects system state information and is responsible for scheduling.
– Efficient, as it knows both the load at each node and the number of processes requiring service.
– k+1 replications of the server survive k faults, but raise consistency issues.
– Alternatively, use reinstantiation: k entities monitor a single server to detect its failure; on failure a new server is brought up, causing occasional delays in service.
Distributed
– Uses k physically distributed entities that work as local controllers, where each is responsible for making scheduling decisions for the processes of a predetermined set of nodes.
– In a fully distributed system, each node acts as a local controller.
Dynamic Algorithms
Cooperative
– Distributed entities cooperate with each other to make scheduling decisions.
– More complex and involve larger overhead.
Non-cooperative
– Individual entities act as autonomous entities and make scheduling decisions independently.
– Less stable.
Load Balancing Algorithm
All processes are distributed among nodes of the system to equalize workload among nodes.
Design issues
– Load estimation policy
– Process transfer policy
– State information exchange policy
– Location policy
– Priority assignment policy
– Migration limiting policy
Load Estimation Policy
How to estimate the workload of a particular node in the system?
– Total number of processes on the node (remaining service time estimated via memoryless, past-repeats, or distribution assumptions)
– Resource demands of these processes
– Instruction mixes of these processes
– Architecture and speed of the node's processor
The above methods are not suitable because of the existence of daemon processes.
Thus, use CPU utilization, measured by periodically observing the CPU state.
Process Transfer Policy
Whether to execute a process remotely or locally?
How to decide whether a node is lightly or heavily loaded?
Threshold value – a limiting value on a node's workload
Static policy
– Each node has a predefined threshold value depending on its processing capability
Dynamic policy
– The threshold value is calculated from the average workload of all nodes and a predefined constant
A node should accept a remote process if doing so does not significantly affect service to its local processes, and it should transfer a local process elsewhere if that improves the performance of the rest of its processes.
A single threshold value may lead to useless process transfers and may result in process thrashing.
In the double-threshold policy, new processes are accepted only in the underloaded region.
[Figure: the single-threshold policy divides a node's load into overloaded and underloaded regions at one threshold; the double-threshold policy uses a low mark and a high mark, giving overloaded, normal, and underloaded regions.]
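As a sketch, the double-threshold decision reduces to a simple classification (the mark values and load units here are hypothetical):

```python
def load_region(load, low_mark, high_mark):
    """Classify a node's load under the double-threshold policy."""
    if load < low_mark:
        return 'underloaded'   # may accept new or remote processes
    if load > high_mark:
        return 'overloaded'    # should transfer processes away
    return 'normal'            # neither accepts remote work nor transfers

print(load_region(2, 3, 7))  # underloaded
print(load_region(5, 3, 7))  # normal
print(load_region(9, 3, 7))  # overloaded
```

The normal band between the two marks is what prevents the thrashing a single threshold can cause: a node hovering around one threshold would otherwise flip between accepting and transferring on every small load change.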
Location Policy
To which node should a selected process be sent?
Threshold
– Probe nodes randomly to check whether the probed node is able to receive the process
– If yes, the process is transferred; otherwise another node is selected at random. This continues until the probe limit is reached, after which the process is executed at the originating node.
Shortest
– L distinct nodes are chosen at random and polled to determine their loads. The process is transferred to the node having the minimum load, unless that node's workload value prohibits it from accepting the process.
– If none of the polled nodes accepts the process, it is executed at the originating node.
– Probing is discontinued whenever a node with zero load is encountered.
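A minimal sketch of the shortest location policy, assuming a hypothetical `loads` map and acceptance threshold; probing stops early if a zero-load node is found:

```python
import random

def shortest(loads, source, poll_limit=3, threshold=4, rng=random):
    """Poll up to poll_limit randomly chosen nodes; return the
    least-loaded polled node if its load permits acceptance,
    otherwise the originating node itself."""
    others = [n for n in loads if n != source]
    best, best_load = source, float('inf')
    for n in rng.sample(others, min(poll_limit, len(others))):
        if loads[n] < best_load:
            best, best_load = n, loads[n]
        if loads[n] == 0:        # zero load: stop probing immediately
            break
    return best if best_load < threshold else source

# Node 'b' is the least loaded and under the threshold, so it wins.
print(shortest({'a': 5, 'b': 2, 'c': 9}, 'a', poll_limit=2))  # b
```

When every polled node is too loaded to accept, the process simply runs where it originated, matching the fallback rule above.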
Bidding
– Each node can act as a manager (sending a process) and as a contractor (receiving a process)
– Managers broadcast a request for bids, contractors respond with bids (prices based on the capacity of the contractor node), and the manager selects the best offer
– The winning contractor is notified and asked whether it accepts the bid; if the bid is rejected, bidding starts again
– Full autonomy for nodes (contractors/managers)
– Big communication overhead
– Difficult to decide a good pricing policy
Pairing
– Reduces the variance of load only between pairs of nodes
– Two nodes that differ greatly in load are temporarily paired with each other and migration starts
– The pair is broken as soon as the migration is over
– Each node asks some randomly chosen node to form a pair
– On receiving a rejection, it randomly selects another node for pairing
– A node only tries to find a partner if it has at least two processes
State Information Exchange Policy
How to exchange load information among nodes?
Periodic broadcast
– Each node broadcasts its state information after every time period T
– Heavy network traffic and poor scalability
Broadcast when state changes
– Avoids fruitless messages by broadcasting state only on the arrival or departure of a process
– Optimize by broadcasting only when the state switches to the underloaded or overloaded region
On-demand exchange
– A node broadcasts a State-Information-Request message when its state switches from the normal region to either the underloaded or the overloaded region.
– Other nodes that can cooperate with it in the load-balancing process reply with their own state information.
Exchange by polling
– Random polling is used rather than broadcasting
– Polling stops on finding a suitable partner or when the poll limit is reached
Priority Assignment Policy
How to set the priority of execution of local and remote processes?
Selfish
– Local processes are given higher priority than remote processes.
Altruistic
– Remote processes are given higher priority than local processes.
Intermediate
– When the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority; otherwise, remote processes are given higher priority.
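The three policies can be sketched as one selection function (names are illustrative):

```python
def higher_priority(policy, n_local, n_remote):
    """Return which class of processes runs at higher priority
    under the given priority assignment policy."""
    if policy == 'selfish':
        return 'local'
    if policy == 'altruistic':
        return 'remote'
    # Intermediate: the majority class wins; ties favour local processes.
    return 'local' if n_local >= n_remote else 'remote'

print(higher_priority('intermediate', 3, 3))  # local
print(higher_priority('intermediate', 1, 4))  # remote
```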
Migration Limiting Policy
How many times can a process migrate?
Uncontrolled
– A process may be migrated any number of times.
– Causes instability
Controlled
– Use a migration count to limit the number of times a process can migrate
– Irrevocable migration policy: migration count fixed at 1
– Migration count k: allows multiple migrations so processes adapt to the dynamically changing states of nodes
Load Balancing vs. Load Sharing
Load-balancing algorithms attempt to equalize loads among nodes.
Load-balancing algorithms transfer tasks at a higher rate, thus incurring higher overhead; a further large overhead is generated by gathering exact state information.
Perfect load balance is not achievable, since the number of processes at a node is always fluctuating and a temporal imbalance among the nodes exists at every moment.
Load sharing is much simpler than load balancing, since it only attempts to ensure that no node is idle while a heavily loaded node exists.
Load Sharing Approach
Load estimation policy
– Simply count the total number of processes (or use CPU utilization)
Process transfer policy
– All-or-nothing strategy with threshold value 1: a node can receive a process only when it has no process, and sends a process when it has more than one
– Threshold value set to 2 to avoid a node being idle while waiting to acquire a process
Priority assignment policy and migration limiting policy are the same as in load balancing.
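A sketch of the all-or-nothing transfer decision with the threshold of 2 described above (the queue length counts resident processes; names are illustrative):

```python
def transfer_action(queue_len, threshold=2):
    """All-or-nothing load-sharing transfer policy. Below the
    threshold a node anticipates idleness and tries to acquire a
    process; above it, the node tries to send one away."""
    if queue_len < threshold:
        return 'acquire'
    if queue_len > threshold:
        return 'send'
    return 'none'

print(transfer_action(0))  # acquire -- node would otherwise go idle
print(transfer_action(5))  # send
```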
Location policy
Sender-initiated location policy
– Heavily loaded nodes search for lightly loaded nodes
– When a node becomes overloaded, it either broadcasts or randomly probes other nodes to find one that is able to receive remote processes
– Initiated on process arrival
Receiver-initiated location policy
– Lightly loaded nodes search for heavily loaded nodes
– When a node becomes underloaded, it either broadcasts or randomly probes other nodes, indicating its willingness to receive remote processes
– Initiated on process departure
– Requires preemptive process migration
State information exchange policy
Broadcast when state changes
– In the sender-initiated (receiver-initiated) location policy, a node broadcasts a State-Information-Request when it becomes overloaded (underloaded)
Poll when state changes
– Randomly asks different nodes for state information until it finds an appropriate one or the poll limit is reached
Process Management
Process allocation
– Which process should be assigned to which processor
Process migration
– Movement of a process from its current location to a new processor
Threads
– Fine-grain parallelism
Process Migration
The load-balancing (or load-sharing) policy determines:
– whether a process needs to be migrated from one node to another
– which process needs to be migrated
– which node the process is to be moved to
The process migration mechanism deals with the actual transfer of the process.
Migration can be
– Non-preemptive
– Preemptive
Process migration involves 3 steps:
– Selection of the process that should be migrated
– Selection of the destination node
– Actual transfer of the selected process
Desirable Features of Good Process Migration Mechanism
Transparency
– Object access level: access to objects (such as files and devices) by the process can be done in a location-independent manner.
– System call and inter-process communication level: system calls should be location-independent, and redirection of messages should be transparent.
Minimal residual dependencies
– A migrated process should not depend on the node it migrated from; otherwise the previous node remains loaded, and the process fails if the previous node fails.
Minimal interference
– With process execution, by minimizing the freezing time
Efficiency
– Minimize the time required to migrate a process, the cost of locating an object, and the cost of supporting remote execution after migration
Robustness
– Failure of any node other than the current one should not affect the execution of the process
Communication between co-processes of a job
– Communication between co-processes should be possible directly, irrespective of their locations
Process Migration Mechanism
Sub-activities
– Freezing the process on its source node and restarting it at the destination node
– Moving the process's address space
– Forwarding messages meant for the migrant process
– Handling communication between cooperating processes that are separated
Freezing and Restarting Mechanism
Take a snapshot of the process's state on its source node and reinstate it on the destination node.
Immediate or delayed blocking of the process, depending on the kind of system call being executed
Fast and slow I/O operations
– Wait for the completion of fast I/O operations
– Resume slow I/O operations at the destination
Information about open files – names, identifiers, access modes, current positions of file pointers, etc.
– System-wide unique identifiers
– Identify files by their pathnames
• Use of links to remote files
• Reconstruct the complete pathname
– Do not transfer replicated files, and use local files as far as possible
Reinstating the process on its destination node
– The process is given a temporary identifier at the destination node
– On completing migration, the new copy's identifier is changed to the original, the new copy is unfrozen, and the old copy is deleted
Address Space Transfer Mechanism
Information to be transferred
– Process's state
• Contents of registers, program counter, I/O buffers, pending interrupt signals
– Process's address space
• Code, data, and stack
Address space transfer mechanisms
– Total freezing
– Pretransferring
– Transfer on reference
Total Freezing
Process’s execution is stopped while transferring the address space
Simplest but slowest
[Figure: total freezing – after the migration decision, execution is suspended on the source node, the whole address space is transferred, and execution resumes on the destination node; the freezing time spans the entire transfer.]
Pretransferring (Precopying)
The address space is transferred while the process is still running on the source node.
After the initial transfer, the pages modified in the meantime are transferred again.
Freezing time is reduced, but total migration time may increase due to the possibility of redundant transfers of the same pages.
[Figure: pretransferring – the address space is transferred while execution continues on the source node; execution is suspended only briefly at the end (a short freezing time) before it resumes on the destination node.]
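The redundant-transfer effect of precopying can be illustrated with a toy simulation (the dirty rate and stopping rule here are hypothetical, not from any particular system):

```python
def precopy_rounds(n_pages, dirty_rate, stop_at, max_rounds=10):
    """Simulate precopying: round 1 sends every page; each later
    round resends the pages dirtied during the previous round.
    The process is frozen once the dirty set is small enough (or
    the round budget runs out), and the remainder is sent then.
    Returns the total number of page transfers."""
    transferred = 0
    to_send = n_pages
    rounds = 0
    while rounds < max_rounds:
        transferred += to_send
        rounds += 1
        to_send = int(to_send * dirty_rate)  # pages dirtied meanwhile
        if to_send <= stop_at:
            break
    return transferred + to_send  # final dirty set sent while frozen

print(precopy_rounds(100, 0.5, 10))  # 193
```

Total freezing would move exactly 100 pages here; precopying moves 193, trading extra (redundant) transfers for a much shorter freezing time.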
Transfer on Reference
The process starts executing at the destination before its address space is migrated.
Pages are fetched from the source node as they are referenced.
Freezing time is very low, but the process continues to impose a load on the source node, and failure of the source node results in failure of the process.
[Figure: transfer on reference – execution is suspended only momentarily at the migration decision and resumes on the destination node, with pages transferred on demand from the source node as they are referenced.]
Message Forwarding Mechanism
Three types of messages:
– Type 1: received at the source node after the process's execution has stopped there but before it has restarted on the destination node
– Type 2: received at the source node after execution has started on the destination node
– Type 3: sent to the migrant process after it has started execution on the destination node
Mechanism of resending the message
– Messages of types 1 and 2 are returned to the sender or dropped
– The sender retries after locating the new node
– Type 3 messages are sent directly to the new node
– This message-forwarding mechanism is not transparent
Origin site mechanism
– Each site keeps information about the current locations of all processes created on it
– All messages are sent to the origin site
– The origin site forwards messages to the process's current location
– If the origin site fails, the forwarding mechanism fails
– Continuous load on the origin site
Link traversal mechanism
– A message queue is created at the origin for type 1 messages
– All messages in the queue are sent to the destination as part of the migration procedure
– To redirect messages of types 2 and 3, a link to the destination is left on the source node on migration
– The process cannot be located if any node in the chain of links fails
Link update mechanism
– Processes communicate over location-independent link mechanisms
– During the transfer phase, the source node sends link-update messages to the kernels controlling all of the migrant process's communication partners
Co-processes Handling
Need to provide efficient communication between a parent process and its sub-processes.
Mechanisms for handling co-processes
– Disallow separation of co-processes
• Disallow migration of processes that wait for their children to complete
• Ensure that when a parent process migrates, all its child processes migrate with it
– Home node (origin site)
• Communication between a parent process and its children takes place via the home node
Process Migration in Heterogeneous System
Data translation is required if the source and destination CPU data formats differ.
Use an external data representation (XDR).
Serializing: converting a particular machine format into the XDR format
Example – floating-point representation (exponent and mantissa)
Handling Exponent & Mantissa
The number of bits used for the exponent and the mantissa varies across processors.
The XDR should have at least as many bits in the exponent (mantissa) as the longest exponent (mantissa) of any processor.
Alternatively, restrict the size of the numbers used by processes, or restrict migration to only those nodes whose representation sizes are at least as large as that of the source node's processor.
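The representation-size rule can be sketched as a simple check on (exponent bits, mantissa bits) pairs; for reference, IEEE 754 single precision is (8, 23) and double precision is (11, 52):

```python
def migration_allowed(src_fp, dst_fp):
    """src_fp and dst_fp are (exponent_bits, mantissa_bits) pairs.
    Migration is allowed only if the destination's representation
    is at least as wide in both fields, so no range (exponent) or
    precision (mantissa) is lost in translation."""
    return dst_fp[0] >= src_fp[0] and dst_fp[1] >= src_fp[1]

print(migration_allowed((8, 23), (11, 52)))  # True: single fits in double
print(migration_allowed((11, 52), (8, 23)))  # False: would lose precision
```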
Advantages of Process Migration
Reducing the average response time of processes by balancing workload
Higher throughput
Speeding up individual jobs by concurrent execution
Utilizing resources effectively
Reducing network traffic by migrating processes closer to the resources they use
Improving system reliability by migrating critical processes to a more reliable node
Improving system security by migrating sensitive processes to a more secure node
Threads
In certain cases, a single application may need to run several tasks at the same time.
Create a new process for each task
– Time consuming
Use a single process with multiple threads
Processes
Processes are independent execution units that contain their own state information, use their own address spaces, and interact with each other only via IPC.
– Inter-process communication is expensive
– Context switching is expensive
– Secure: one process cannot corrupt another process
Threads
A single process might contain multiple threads.
All threads of a process share:
– Memory (program code and global data)
– Open file/socket descriptors
– Signal handlers and signal dispositions
– Working environment (current directory, user ID)
Each thread has its own:
– Thread ID (integer)
– Stack, registers, program counter
Heavyweight process = process; lightweight process = thread
Why Threads?
Threads can be created more easily than new processes.
All threads within a process share the same state and the same memory space, and thus can communicate with each other directly.
A context switch between threads is cheaper than the operating system's context switch between processes.
Resource sharing can be achieved more efficiently.
They allow parallelism to be combined with sequential execution and blocking system calls.
There is no protection between the threads of a process, as they share the same address space.
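The shared address space, and the resulting need for explicit synchronization, can be demonstrated directly (Python sketch):

```python
import threading

counter = 0                 # global data, shared by all threads
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # without the lock, concurrent read-modify-write
            counter += 1    # increments could be lost (no protection!)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000 -- all four threads updated the same variable
```

The same program written with processes would need explicit IPC to share `counter`; with threads the sharing is free, but so is the ability of one thread to corrupt another's data.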
Server Models
Single-thread process
– Blocking system calls with no parallelism
Finite-state machine
– Parallelism with non-blocking system calls
Group of threads
– Parallelism with blocking system calls
– Server built with a single dispatcher thread and multiple worker threads
Models for Organizing Threads
– Dispatcher–worker model: a dispatcher thread receives each incoming request and hands it to one of several worker threads.
– Team model: each type of request is handled by a different thread.
– Pipeline model: the output generated by the first thread is used for processing by the second thread, and so on.
Threads Design Issues
Thread creation
– Static
– Dynamic
Thread termination
Thread synchronization
– Mutex variable: a binary semaphore that is either in the locked or the unlocked state
– Condition variables (wait and signal) associated with a mutex variable
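A minimal sketch of a mutex with an associated condition variable, using Python's `threading.Condition` (which bundles the two):

```python
import threading

items = []
cond = threading.Condition()   # a mutex plus wait/signal operations

def consumer(out):
    with cond:                 # acquire the mutex
        while not items:       # wait() releases the mutex until signalled
            cond.wait()
        out.append(items.pop())

def producer(value):
    with cond:
        items.append(value)
        cond.notify()          # signal: wake one waiting thread

out = []
c = threading.Thread(target=consumer, args=(out,))
c.start()
producer(42)
c.join()
print(out)  # [42]
```

The `while` (rather than `if`) around `wait()` is the standard idiom: the condition must be rechecked after waking, since another thread may have consumed the item first.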
Thread scheduling
– Preemptive or non-preemptive, with a priority assignment facility
– Flexibility to vary the quantum size dynamically
– Handoff scheduling
• Directly switch the CPU to a thread specified by the currently running thread
– Affinity scheduling
• A thread is scheduled on the CPU it last ran on, in the hope that part of its address space is still in that CPU's cache
Signal handling
– A signal is handled properly by creating a separate exception-handler thread in each process
– Signals are prevented from getting lost by assigning each thread its own private global variables for signaling exception conditions
Implementing a Threads Package
Implemented in user space
– Managed by a runtime system
– Kernel is not aware of the existence of threads
Implemented in the kernel
– Single-level scheduling
[Figure: user-level vs. kernel-level implementation – in the user-level case, processes and their threads run above a runtime system (in user space) that maintains thread status information, while the kernel maintains only process status information; in the kernel-level case, the kernel itself maintains thread status information.]
User-level vs. Kernel-level
Advantages of user-level implementation
– A threads package can be implemented on top of an existing OS that does not support threads
– Users can use customized scheduling algorithms
– Context switching is faster, as it is done by the runtime system without involving the kernel
– The scalability of the kernel-level approach is poor by comparison
Disadvantages
– When a thread blocks, all threads of its process are stopped, and the kernel schedules another process to run
– A round-robin policy to time-share CPU cycles among threads on a quantum-by-quantum basis is not possible, as there are no clock interrupts within a single process
DCE Threads
Threads standard – POSIX (Portable Operating System Interface for Computer Environments) Pthreads
Implementations of POSIX threads
– DCE threads
– GNU threads
DCE provides a set of user-level library procedures for the creation, termination, and synchronization of threads.
Thread Management in DCE
Create
– Creates a new thread in the address space of the calling thread
Exit
– Terminates the calling thread
Join
– The calling thread blocks until the thread specified in the join call terminates
Detach
– Used by a parent to disown a child thread
Cancel
– Used by a thread to kill another thread
Real-Time Distributed Systems
Real-time system: any computer system whose correct behavior depends not only on the results of its computations, but also on the time at which the results are produced. It aims at minimizing response time.
Real-time systems require timely responses to events
– even under failure conditions
– even under extreme load conditions
Soft vs. Hard Real-Time Systems
Soft real-time system – computes the output response as fast as possible, but there are no specific deadlines that must be met. If a deadline is missed, the system still works but with degraded performance. Examples: games, washing machines, camcorders.
Hard real-time system – the output response must be computed by a specified deadline or the system fails. Examples: missile guidance systems, nuclear power stations, car engine control systems.
[Figure: profit vs. time plots for the two classes, with release time and deadline marked on the time axis – in a soft real-time system, the profit of a result degrades gradually after the deadline; in a hard real-time system, it drops to a negative value immediately at the deadline.]
Real-time system
– When an external device generates a stimulus for the computer, the computer must respond to it before a certain deadline.
– Stimuli can be periodic, aperiodic, or sporadic.
[Figure: a distributed real-time computer system – several external devices, each connected through sensors and actuators to a computer C, with the computers interconnected by a network.]
Design Issues
Clock Synchronization
Event-triggered vs. time-triggered systems
– Event-triggered systems
• When a significant event happens, it is detected by a sensor and interrupts the attached CPU
• Can fail when many events happen at the same time (an event shower)
– Time-triggered systems
• A clock interrupt occurs every T time units, and the system examines its inputs at these fixed instants
Fault tolerance
– The system should be able to cope with the maximum number of faults and the maximum load at the same time.
Predictability
– The system should meet all of its deadlines, even at peak load
– The worst-case execution time of each of the system calls must be calculable
Language support
– Requires specialized real-time languages
– The maximum execution time of every task must be computable at compile time; e.g., such a language cannot support unbounded while loops or recursion
– A variable clock, containing the current time, should be available
Real-time communication
– Needs predictability and determinism
– The network should be able to give a worst-case bound on packet transmission time in advance
– Use a token ring LAN rather than Ethernet
– TDMA (Time Division Multiple Access)
• Fixed-size frames, each containing n slots
• A processor uses its slot to transmit a packet when its turn comes
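TDMA's predictability follows from the fixed frame structure: given synchronized time, each processor can compute exactly when its next slot begins. A sketch (frame and slot sizes are hypothetical):

```python
def next_slot_start(now, processor, n_slots, slot_len):
    """Time at which `processor`'s next TDMA slot begins, assuming
    frames of n_slots fixed-size slots of slot_len time units each,
    with processor i owning slot i of every frame."""
    frame_len = n_slots * slot_len
    slot_offset = processor * slot_len
    frame_start = (now // frame_len) * frame_len
    start = frame_start + slot_offset
    if start < now:          # this frame's slot already began: wait
        start += frame_len   # for the same slot in the next frame
    return start

# 8 slots of 10 units: processor 2 owns [20, 30) of every 80-unit frame.
print(next_slot_start(0, 2, 8, 10))   # 20
print(next_slot_start(25, 2, 8, 10))  # 100 -- slot 2 already started
```

The second call shows the "packet delay after start of its slot" from the frame diagram: a packet that becomes ready mid-slot waits a bounded time (at most one frame) for the next opportunity, which is exactly the worst-case transmission bound real-time communication requires.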
[Figure: successive TDMA frames, each with a slot reserved for every processor – a packet sent in processor 0's slot may incur a delay after the start of its slot, and a processor may leave its slot unused in a frame.]
Time-Triggered Protocol (TTP)
– Assumes all clocks are synchronized
– Every node maintains the global state of the system: the current mode, the global time, and a bit map giving the current system membership
[Figure: TTP packet layout – start of packet, a control field (used to initialize and to change mode), Mode, ACK, Data, and a CRC that checks the sender's global state along with the packet bits.]
Real-Time Scheduling
– Hard real time vs. soft real time
– Preemptive vs. non-preemptive scheduling
– Dynamic vs. static
– Centralized vs. decentralized
Dynamic Scheduling
Rate monotonic algorithm
– Each task is assigned a priority proportional to its execution frequency (i.e., the task with the shortest period gets the highest priority)
Earliest deadline first
– Ready tasks are sorted by deadline, and the task with the closest deadline is run first
Least laxity
– Chooses the task with the least laxity, where laxity is the time remaining to the deadline minus the remaining computation time