Distributed Systems

Highly Concurrent and Fault-Tolerant h-out of-k Mutual ExclusionUsing Cohorts Coteriesfor Distributed Systems

Distributed SystemsA distributed system consists of interconnected, autonomous nodes which communicate with each other by passing messages.

CS vs Mutual ExclusionA node in the system may need to enter the critical section (CS) occasionally to access a shared resource, such as a shared file or a shared table, etc.How to control the nodes so that the shared resource is accessed by at most one node at a time is called the mutual exclusion problem.

K-Mutual ExclusionIf there are k, k1, identical copies of shared resources, such as a k-user software license, then there can be at most k nodes accessing the resources at a time.This raises the k-mutual exclusion problem.

h-out of k-mutual exclusionOn some occasions, a node may require to access h (1hk) copies out of the k shared resources at a time; for example, a node may need h disks from a pool of k disks to proceed.How to control the nodes to acquire the desired number of resources with the total number of resources accessed concurrently not exceeding k is called the h-out of k-mutual exclusion problem or the h-out of-k resource allocation problem [10].

Related WorkThere are four distributed h-out of-k mutual exclusion algorithms proposed in the literature [2, 5, 9. 10]. The first algorithm using request broadcast is proposed by Raynal in [10], and three algorithms using k-arbiters, (h, k)-arbiters, and k-coteries are later proposed in [2], [9], and [5], respectively.

Jiangs AlgorithmAmong the four algorithms, only Jiangs algorithm using k-coteries is fault-tolerant.It can tolerate node and/or network link failures even when the failures lead to network partitioning.Furthermore, it is shown in [5] to have lower message cost than others.

Basic Idea of Jiangs Alg.The basic idea of the algorithm is simple: a node should select h mutually disjoint sets and collect permissions from all the nodes of the h sets to enter CS for accessing h resources.To render the algorithm fault-tolerant, a node is demanded to repeatedly reselect h mutually disjoint sets for gathering incremental permissions when a node fails to gather enough permissions to enter CS after a time-out period.

Drawbacks of Jiangs Alg.First, it does not specify explicitly how a node can efficiently select and reselect h mutually disjoint sets.Second, when there is contention, a low-priority node always yields its gathered permissions to high-priority nodes, which causes higher message overhead and may prohibit nodes from entering CS concurrently.

Overview of the Proposed Alg.In this paper, we proposed another h-out of-k mutual exclusion algorithm using a specific k-coterie cohorts coterie to eliminate the drawbacks of Jiangs algorithm. Constant message cost in the best caseA candidate to achieve the highest availability, the probability that a node can gather enough permissions to enter CS in an error-prone environment, among all the algorithms using k-coteries.

k-Coterie

A k-coterie [3] is a collection of sets (called quorums) satisfying the following properties:

1. Intersection Property: There are at most k pairwise disjoint quorums.

2. Non-intersection Property: For any h (< k) pairwise disjoint quorums Q1,...,Qh, there exists a quorum Qh+1 such that Q1,...,Qh+1 are pairwise disjoint.

3. Minimality Property: Any quorum is not a super set of another quorum.

Cohorts Structure

A cohorts structure Coh(k, m)((C1,...,Cm), m(k, is a list of sets, where each set Ci is called a cohort. The cohorts structure Coh(k, m) should observe the following three properties:

P1.(C1( = k.

P2.(i: 1< i ( m : (Ci ( > 2k(2, for k>1 ( (Ci (>1, for k=1).

P3.(i, j: 1(i, j(m, i(j: Ci(Cj=(.

Quorum under Coh(k, m)

A set Q is said to be a quorum under Coh(k, m) if some cohort Ci in Coh(k, m) is Q's primary cohort, and each cohort Cj, j > i, is Q's supporting cohort, where a cohort C is Q's primary cohort if (Q(C(=(C(((k(1) (i.e., Q contains exactly all except k(1 members of C), and a cohort C is Q's supporting cohort if (Q(C(=1 (i.e., Q contains exactly one member of C).

An Example

For example, the following sets are quorums under Coh(2, 2)(({1, 2}, {3, 4, 5}): Q1={3, 4}, Q2={3, 5}, Q3={4, 5}, Q4={1, 3}, Q5={1, 4}, Q6={1, 5}, Q7={2, 3}, Q8={2, 4} and Q9={2, 5}.

Quorums Q1,...,Q3 take {3, 4, 5} as their primary cohort and no supporting cohort is needed, and quorums Q4,...,Q9 take {1, 2} as their primary cohort and {3, 4, 5} as their supporting cohort. It is easy to check that these nine sets constitute a 2-coterie.

Domination

Let C and D be two distinct k-coteries. C is said to dominate D if and only if every quorum in D is a super set of some quorum in C (i.e.,(Q, (Q(: Q(D, Q((C: Q((Q).

Obviously, the dominating one (C) has more chances than the dominated one (D) to have available quorums in an error-prone environment, where a quorum is said to be available if all of its members (nodes) are up.

ND k-coteriesSince an available quorum implies an available entry to CS, we should always concentrate on ND (nondominated) k-coteries that no other k-coterie can dominate.The algorithm using ND k-coteries, for example the proposed algorithm, is a candidate to achieve the highest availability.

The quorum construction procedure

FunctionGet_Quorum(h,k:Integer;(C1,...,Cm):Cohorts Structure):Set;

Var R, S: Set;

Var g: Integer;

g = h; //g: Storing the number of primary cohorts needed

R = (; //R: The set of replying nodes that will be returned

For (i =m,...,2 ) Do

S=Probe(Ci, g);

If (S( = (Ci(((k(1)+(g(1)

Then {R=R(S; g=g(1; If g=0 Then Return R;}

Else If (S(=g Then R=R(S;

EndFor

S=Probe(C1, g); //C1 is the primary cohort of g quorums

R=R(S;

Return R;

End Get_Quorum

Probe(Ci, g)

The function Probe(Ci, g) evoked in Get_Quorum performs the task of requesting all the nodes in set Ci for their exclusive permissions.

A node can only reply to one requesting node at a time to grant its permission. After a network turn around time, Probe(Ci, g) returns a set S of replying nodes of Ci for the following three cases:

Case 1 for Probe(Ci, g) to return

If i>1 and there are more than (Ci(((k(1)+(g(1) replying nodes, the returning set will be a set of (Ci(((k(1)+(g(1) replying nodes.

For this case, Ci can be the primary cohort of one quorum, and be the supporting cohorts of g(1 quorums concurrently.


If i>1 and there are more than g but less than (Ci(((k(1)+(g(1) replying nodes, the returning set will be a set of g replying nodes. (Note that (Ci(((k(1)+(g(1)>g because (Ci( > 2k(2 for k>1, or (Ci( > 1 for k=1.)

For this case, Ci can only be the supporting cohorts of g quorums.


If i=1 and there are more than g replying nodes, the returning set will be a set of g replying nodes. Because (C1(=k, only one node can make C1 the primary cohort of a quorum.

Thus, g replying nodes can make C1 be the primary cohorts of g quorums for this case

Pre-release

It is noted that Probe(Ci, g) will postpone the return if none of the three above-mentioned cases stands.

Furthermore, Probe(Ci, g) will execute the pre-release action before returning R; that is, it will send messages to the nodes in Ci R to release its permissions in advance.

The pre-release action plays an important role in the proposed algorithm. As we will show later, the action can allow more nodes in CS concurrently and is related to the deadlock-free property.

Maekawas Alg. (1983)uses six types of messages:REQUESTLOCKEDFAILEDRELEASEINQUIRERELINQUISH

Differences of Ours and MaekawasOur mechanism does not use FAILED message.Our mechanism uses an extra PRE-RELEASE message.Our mechanism sends INQUIRE message conditionally (instead of insistently) only when there is possibility of deadlock.

Conflict resolution mechanism #1

When a node u wants to enter CS to access h resources, it should invoke Get_Quorum and waits for Get_Quorum to return. The function Get_Quorum then invokes Probe(Ci, g) for i=m, m-1,,m(, where m(=m-1 or m-2 or 1. Function Probe(Ci, g) on behalf of node u probes all the nodes in set Ci for their permissions (locks) by sending timestamped REQUEST messages. The timestamp is a pair (u, s), where u is the node ID of the requester and s is the sequence number of the message. A REQUEST of timestamp (u, s) is assumed to precede (to have higher priority than) another REQUEST of timestamp (u(, s() if (s


On receiving a REQUEST from node u, a node v checks it is currently locked for another REQUEST. If not so, v marks itself locked, set u as the locker, records the number h of resources that u requests, and sends a LOCKED message to u. On the other hand, if v is locked for a REQUEST from another node w (i.e., w is the locker), the REQUEST from node u is inserted into R-QUEUE, a local priority request queue. It is noted that a node maintains two local priority queues: R-QUEUE and P-QUEUE to store REQUEST and PRE-RELEASE messages, respectively. After inserting us REQUEST into R-QUEUE, node v then checks if the conflict condition holds. We said that the conflict condition holds if h1++hx + hw > k for REQUEST messages R1,,Rx in R-QUEUE and P-QUEUE preceding Rw (REQUEST from node w), where h1,,hx, and hw are the numbers of requested resources for request messages R1,,Rx. It is noted that node w and the nodes sending the messages R1,,Rx are called conflicting nodes. If conflict condition holds, node v then sends an INQUIRE message to node w. Note that it is not necessary to send the INQUIRY message if an INQUIRY has already sent to w and w has not yet sent RELINQUISH or RELEASE (we will explain the two messages later).


On receiving a PRE-RELEASE message from node u, node v inserts the message into P-QUEUE. It then marks itself unlocked if R-QUEUE is empty; otherwise, it removes from R-QUEUE the node w, sets w as locker, and sends w a LOCKED message, where w is the node at the front of R-QUEUE.


When node w receives an INQUIRE message from node v, it replies a RELINQUISH message to cancel its lock if it is not in CS.

Otherwise, it replies a RELEASE message, but only after it exits CS. If an INQUIRE message has arrived after w has sent a RELEASE message, it is simply ignored.

Conflict resolution mechanism #5On receiving a RELINQUISH message form w (w must be the locker), node v swaps w with u, sets u as the locker, and sends a LOCKED message to u, where u is the node at the front of R-QUEUE.


As we have shown, Get_Quorum evoked by node u will eventually return a set R, which is the union of h pairwise disjoint cohorts quorums. Node u can enter CS and access h resources if node u has effective permissions of all nodes in R. Node u is said to have an effective permission of node v if u has received LOCKED from v and does not sent corresponding RELINQUISH or RELEASE to v. If u does not have effective permissions of all nodes in R, i.e., u has sent RELINQUISH messages to some nodes v1,..,vi of R, 1(i((R(, then u must wait for LOCKED messages from v1,..,vi to enter CS.


After existing CS, node u should send RELEASE message to all the nodes to which u has sent REQUEST. To be more precise, if node u invokes Get_Quorum and Get_Quorum invokes Probe(Ci, g) for i=m, m-1,,m(, then u should send RELEASE message to all nodes in Cm,,Cm(.


On receiving a RELEASE message from node u, node v removes us PRE-RELEASE message from P-QUEUE if us PRE-RELEASE message is in P-QUEUE. Then, node v marks itself unlocked if R-QUEUE is empty; otherwise, it removes from R-QUEUE the node w, sets w as locker, and sends w a LOCKED message, where w is the node at the front of R-QUEUE.

Comparisions

Algorithm

Message complexity

Quorum reselection

k-concurrency

Fault-Tolerance

Raynals algorithm [10]

between 2(n(1) and 3(n(1)

no

yes

no

The algorithm using k-arbiters [2]

between 3q to (3h+3)q, where q=(k+1)(nk/(k+1) for the (k+1)-cube arbiter and q=

for the uniform k-arbiter

no

yes

no

The algorithm using (h,k)-arbiters [9]

between 3q to (3h+3)q, where q=(k+2(h)(n(k+1(h)/(k+1) for the (k+1)-cube (h,k)-arbiter and q=

for the uniform (h,k)-arbiter

no

yes

no

Jiangs algorithm [5]

between 3h(q and 6e(n, where q is the quorum size of the k-coterie used, and 0 2k(2

no

yes

yes (maybe of the highest availability)

*n stands for the number of nodes, and h stands for the number of requested resources.

_1040976876.unknown

_1040976908.unknown

ConclusionThe proposed algorithm becomes a k-mutual exclusion algorithm for k>h=1, and becomes a mutual exclusion algorithm for k=h=1. It is resilient to node and/or link failures and has constant message cost in the best case. Furthermore, it is a candidate to achieve the highest availability among all the algorithms using k-coteries since the cohorts coterie is ND.It has the k-concurrency property to allow more nodes in CS concurrently.However, its mutual exclusion delay may be long since it probe nodes cohorts by cohorts instead of probing all nodes in a quorum at once.

Documents

Distributed Systems