Agreement under faulty interfaces

ELSEVIER Information Processing Letters 65 (1998) 125-129


Pallab Dasgupta *

Department of Computer Science and Engineering, Indian Institute of Technology, 721302 Kharagpur, India

Received 23 September 1997; revised 10 November 1997. Communicated by T. Asano

Abstract

In this paper we study the problem of achieving Byzantine agreement among a set of processors, where the processors are computationally sound but their interfaces with the communication channels may be faulty. We consider three types of faults, namely message corruption, message loss, and spurious message generation. We present the following results for this model: (i) If all three types of faults are present, then the problem is equivalent to the classical Byzantine generals problem. (ii) If only message corruption can occur, agreement becomes trivial and can be achieved in one round. (iii) If spurious message generation is ruled out, that is, when interfaces may fault only when sensitized, agreement is possible irrespective of the ratio of the number of processors having faulty interfaces to the total number of processors. © 1998 Published by Elsevier Science B.V.

Keywords: Algorithms; Byzantine agreement; Byzantine generals problem; Distributed systems; Message corruption; Message loss; Spurious message generation

1. Introduction

The Byzantine generals problem [5] is one of the fundamental problems of reaching mutual agreement in a distributed system. The problem is defined as follows:

• There is a set of generals (processors) camped outside an enemy city. The generals are located at geographically distant positions from each other and can communicate through reliable messengers (network channels).

• The generals must come to a common decision on whether to attack or to retreat.

• Some of the generals are traitors (faulty processors) and may communicate arbitrary decisions to different generals.

* Email: [email protected].

Several protocols are known for solving the problem in its core form and also for solving different variants of the problem [1-6].

In this paper we look at a variant of the Byzantine agreement problem, where the processors representing the traitors are sound, but their interfaces with the communication network may be faulty, causing them to send erroneous messages through the communication network. Our model may be briefly described as follows:

• Each processor has one or more interfaces to the network. These interfaces are analogous to the network interface cards of computers.

• In order to communicate a message, the source processor passes the message to the appropriate network interface of the source processor. The destination processor receives the message from the appropriate network interface of the destination processor.

• A network channel receives a message from the network interface of the sender processor and delivers it to the network interface of the destination processor. As in the original Byzantine generals problem, we assume that all network channels are reliable.

• One or more interfaces of a processor may be faulty. An interface faults if it receives a message (say 1) from its processor and sends a different message (say 0) through the channel, or loses the message and sends none. It also faults if it receives some message (or no message) from the channel and reports a different message to its processor.

• The processors are themselves reliable. A processor with one or more faulty interfaces is called a traitor.

We categorize the types of faults that are possible in this model as follows:

m/m' fault: The interface receives a message m (either from the channel or from the processor) and communicates a different message m' to the other side.

m/φ fault: The interface receives a message m (either from the channel or from the processor) and loses it.

φ/m' fault: The interface generates a spurious message m' without receiving any message.

In this paper, we analyze the Byzantine agreement problem under these types of faults and present the following results.

• If all three types of faults are possible, then the agreement problem reduces to the classical Byzantine generals problem, and therefore at least 3k + 1 processors are required for agreement, where k denotes the number of faulty processors. This result is fairly straightforward.

• Curiously, if only m/m' faults are possible, then the agreement problem becomes trivial. We present a protocol which achieves agreement in one round. This shows that the loss of messages, or the generation of spurious messages, causes the main difficulty in agreement.

• If only m/m' and m/φ faults are possible, then agreement is possible irrespective of the ratio of the number of processors having faulty interfaces to the total number of processors. We present a protocol for this model which achieves agreement in k + 1 rounds. The practicality of this model lies in the fact that network interfaces often fault only when they are sensitized, and therefore may not generate messages on their own.

Throughout this paper we assume that interprocessor communication is synchronous. All protocols presented in this paper are synchronous.

The paper is organized as follows. In Section 2 we observe that the problem of agreement under all three types of interface faults reduces to the classical Byzantine generals problem. Section 3 shows that in the absence of m/φ and φ/m' faults, the agreement problem becomes trivial. In Section 4 we study the problem of agreement under m/m' and m/φ faults and present an agreement protocol for the model.
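The three fault types of the model can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: it assumes binary messages (0 or 1), represents "no message" (φ) by None, and the names Fault and pass_through are hypothetical.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    NONE = "none"
    CORRUPT = "m/m'"      # a message goes in, a different one comes out
    LOSS = "m/phi"        # a message goes in, nothing comes out
    SPURIOUS = "phi/m'"   # nothing goes in, a message comes out


def pass_through(msg: Optional[int], fault: Fault) -> Optional[int]:
    """Return what leaves the interface when msg (or None) enters it."""
    if fault is Fault.CORRUPT and msg is not None:
        return 1 - msg            # flip the bit: m becomes m'
    if fault is Fault.LOSS and msg is not None:
        return None               # the message is lost
    if fault is Fault.SPURIOUS and msg is None:
        return 1                  # a message appears from nowhere
    return msg                    # otherwise the interface behaves correctly


# CORRUPT and LOSS act only when the interface is sensitized by a message;
# with no input they leave the silence untouched.
print(pass_through(None, Fault.CORRUPT), pass_through(None, Fault.LOSS))
```

In this sketch only the SPURIOUS fault can turn silence into a message, which is the property that the model without φ/m' faults relies on.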

2. Agreement under m/m', m/φ and φ/m' faults

It is fairly easy to see that if m/m', m/φ and φ/m' faults are all possible, then the agreement problem becomes equivalent to the original Byzantine generals problem. Let us examine the ways in which a processor may fault in the original Byzantine problem, and observe possible equivalent situations in our model:

(1) A traitor receives a message and communicates some other message. A similar situation can occur in our model if, say, the interface used by the traitor while communicating the message has an m/m' fault.

(2) A traitor receives a message and communicates nothing. This may happen if the interface through which the traitor intended to send has an m/φ fault.

(3) A traitor receives no message in a round, but communicates a message to some other processor. This is possible if, say, an interface of the traitor has a φ/m' fault.

Through these observations it is easy to see that agreement under m/m', m/φ and φ/m' faults is as difficult as the original Byzantine agreement problem. The reverse, that all faults possible in our model are covered by the original model, is easier to see, since in the original model the processor itself can be faulty. We do not elaborate on this any further, but conclude with the following theorem, whose proof is now obvious.

Theorem 1. If m/m', m/φ and φ/m' faults are all possible, then agreement is possible in k + 1 rounds among at least 3k + 1 processors, where k denotes the number of processors with one or more faulty interfaces.

Proof. Follows from the equivalence with the original Byzantine agreement problem. Agreement can be reached in k + 1 rounds using the oral message protocol of Lamport et al. [5]. □

3. The importance of m/φ and φ/m' faults

The above result shows that if the processors are themselves correct, then the main difficulty in achieving agreement lies in the presence of m/φ and φ/m' faults. It may be interesting to also observe that in the absence of φ/m' and m/φ faults, the processors with faulty interfaces also reach the same consensus.

In this section we observe that in our model, the agreement problem becomes trivial if we rule out m/φ and φ/m' faults. The absence of these two types of faults implies that whenever a processor A attempts to send a message m to a processor B, the processor B is certain to receive some message m', where m' may be the same as m or may be a corrupted version of m in case an m/m' fault has occurred in A's or B's interface. This feature allows us to develop the following protocol, which achieves agreement in one round.

Protocol for m/m'-only model.

1. One general (the initiator) decides whether to attack or retreat.

1.1. If the decision is to retreat, the general remains silent.

1.2. If the decision is to attack, the general sends a message to all other generals.

2. If a general (other than the initiator) receives any message in the first round it decides to attack, otherwise it decides to retreat.

Theorem 2. The protocol for the m/m'-only model achieves agreement in one round.

Proof. Suppose the initiator decides to retreat. Since the processors (including the initiator) are correct, the initiator does not attempt to send any message to any other processor. Since φ/m' faults are ruled out, none of the other processors receives any message from the initiator and therefore all of them decide to retreat. Now, let us assume that the initiator decides to attack. Since the processors (including the initiator) are correct, the initiator attempts to send a message to every other processor. Since m/φ faults are ruled out, each of these other processors receives some message (possibly a corrupted one) from the initiator, and decides to attack. □

4. Agreement under m/m' and m/φ faults

In this section, we study the agreement problem under m/m' and m/φ faults, that is, we consider cases where φ/m' faults are not possible. We feel that this model is worth investigating, since network interfaces often fault only when sensitized, that is, when an attempt is made to send messages through them. We present a protocol which achieves agreement in at most k + 1 rounds, where k denotes the number of processors with faulty interfaces. In our protocol, the decision to retreat is modeled by silence and the decision to attack is communicated by sending a message. Early stopping conditions are also incorporated. The protocol among n generals is recursively described by the following algorithm.

Algorithm M(0, n).

1. One general (we call him the commander) communicates a message to every other general if it has decided to attack. Otherwise it remains silent.

2. Each of the other generals, Gi, acts as follows. If Gi has already decided, then it ignores all messages. If Gi has not yet decided, then it decides to attack if it receives any message from the commander, and decides to retreat otherwise.
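Algorithm M(0, n) is the same one-round exchange as the protocol of Section 3. The following sketch (an illustration, not the author's code) simulates that single round with general 0 as the commander. The function name one_round and the parameters corrupted and lost, which list the receivers whose message suffers an m/m' or an m/φ fault, are assumptions made for the example. It suggests why one round suffices under m/m' faults alone, and why message loss calls for more than one round.

```python
ATTACK, RETREAT = "attack", "retreat"


def one_round(n, decision, corrupted=frozenset(), lost=frozenset()):
    """Simulate M(0, n); return the decision of each general 1..n-1."""
    decisions = {}
    for g in range(1, n):
        received = None                  # None stands for "no message"
        if decision == ATTACK:           # a retreating commander stays silent
            received = 1
            if g in lost:
                received = None          # m/phi fault: nothing arrives
            elif g in corrupted:
                received = 0             # m/m' fault: altered, yet still a message
        # Any message at all, corrupted or not, is read as "attack".
        decisions[g] = ATTACK if received is not None else RETREAT
    return decisions


# Corruption alone cannot break agreement among the receivers.
print(one_round(4, ATTACK, corrupted={1, 3}))
# A single lost message leaves general 2 disagreeing with the others.
print(one_round(4, ATTACK, lost={2}))
```

If the loss in the second call happens at the commander's interface, general 2 has all correct interfaces and still decides differently from generals 1 and 3, so a single round is not enough once m/φ faults are allowed.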

Algorithm M(k, n), k > 0.

1. One general (we call him the commander) communicates a message to every other general if it has decided to attack. Otherwise it remains silent.

2. Each of the other generals, Gi, acts as follows. If Gi has already decided, then it ignores all messages. If Gi has not yet decided, then it decides to attack if it receives any message from the commander, and remains undecided otherwise. General Gi now acts as the commander in Algorithm M(k - 1, n - 1) among the other n - 2 generals.

As in Lamport's algorithm [5], the protocol starts when the initiator takes a decision on whether to attack or retreat, and initiates the protocol by acting as the commander in Algorithm M(k, n). The following results establish that in the presence of m/m' and m/φ faults only, Algorithm M(k, n) achieves Byzantine agreement in a cluster of n processors, among which at most k processors have faulty interfaces. In other words, Byzantine agreement is possible in this model irrespective of the fraction of processors that have faulty interfaces.

Lemma 3. If the initiator of Algorithm M(k, n) decides to retreat, then all other processors (including those having faulty interfaces) agree to retreat in round k + 1.

Proof. If the initiator decides to retreat, then it sends no message in Algorithm M(k, n). Since φ/m' faults are ruled out, none of the processors receives any message, and therefore none sends any. As a result, in round k + 1 (when M(0, n - k) is executed), all processors (including the ones with faulty interfaces) decide to retreat. □

In the proposed algorithm, except for the initiator, a processor sends out messages only if it receives a message in the previous round. Thus, except for the messages sent out by the initiator, each message sent out by a processor is causally preceded by the receipt of some message by that processor.

Definition 4 (Causal precedence). If a processor sends out a message m' on receiving a message m, then we say that m' is causally preceded by m, and denote the relation by m → m'. We further say that causal precedence is transitive, and call all messages which causally precede a message m the ancestors of m. We call the set of processors consisting of the sender of m and the senders of all its ancestors the sender-set of m.

Lemma 5. If the first processor with all correct interfaces to reach a decision to attack reaches this decision in round j, where j ≤ k, then by the end of round j + 1, all processors with correct interfaces agree to attack.

Proof. As soon as a processor with correct interfaces receives a message m, it decides to attack, and in the next round it communicates messages to each processor which is not in the sender-set of m. If P is the first processor with correct interfaces to receive a message m (and decide to attack), then obviously none of the processors in the sender-set of m has all correct interfaces. Therefore, in the next round P sends messages to all processors with correct interfaces, and each of them decides to attack. □

Lemma 6. If no processor with all correct interfaces reaches a decision to attack by round k, then each processor with all correct interfaces will decide to retreat in round k + 1.

Proof. We will show that if none of the processors with all correct interfaces receives a message (and decides to attack) by round k, then none of them can receive a message in round k + 1, and therefore all of them decide to retreat. The sender-set of a message received in round k + 1 has k + 1 processors, at least one of which must have all correct interfaces. That processor must have received a message by round k. This is a contradiction, since we are given that none of the processors with all correct interfaces has received a message by round k. □

Theorem 7. If only m/m' and m/φ faults are possible, then it is possible to reach Byzantine agreement in a cluster of n processors of which at most k are faulty, irrespective of the ratio of k to n. Agreement can be reached in at most k + 1 rounds.

Proof. We will show that Algorithm M(k, n) achieves this agreement. If n - k < 1, the proof is obvious. Otherwise, let us first consider the cases where the initiator decides to retreat. Then by Lemma 3, all processors agree to retreat in round k + 1. Now let us consider the cases where the initiator decides to attack. Two cases are possible depending on whether the interfaces of the initiator are all correct or not. We treat each of these cases separately.

Case 1: Initiator is correct. If all the interfaces of the initiator are correct and the initiator decides to attack, then it successfully sends a message to all other processors in the first round. As a result all processors with correct interfaces receive the message and decide to attack. Thus all loyal generals agree to attack in the first round.

Case 2: Initiator has faulty interfaces. If the initiator has one or more faulty interfaces and the initiator decides to attack, then it may succeed in sending messages to some processors and fail to reach others. In this case we need to prove that by the end of the last round (that is, round k + 1), processors with all correct interfaces reach a common decision. By Lemma 5, if a processor with all correct interfaces receives a message by round j (j ≤ k), then by the end of round j + 1, processors with all correct interfaces reach a common decision to attack. On the other hand, by Lemma 6, if no processor with all correct interfaces receives any message by round k, then processors with all correct interfaces reach a common decision to retreat. Therefore, even if the initiator has one or more faulty interfaces, processors with all correct interfaces reach a common decision. □
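The argument can be checked empirically with a small simulation. The sketch below is not the author's implementation: it flattens the recursion of Algorithm M(k, n) into k + 1 synchronous rounds in which a processor that first receives a message relays it once, in the next round, to every processor outside the sender-set (the behaviour used in the proof of Lemma 5). It assumes one interface per processor and that a faulty interface loses each message passing through it with probability p_loss; corruption is not modelled because any message counts as "attack". The names run_m, correct_agree, faulty and p_loss are hypothetical.

```python
import random

ATTACK, RETREAT = "attack", "retreat"


def run_m(n, k, faulty, initiator_decision, rng, p_loss=0.5):
    """Run k + 1 rounds with processor 0 as initiator; return all decisions."""
    def delivered(src, dst):
        # Only the interface of a faulty processor may lose a message (m/phi).
        for p in (src, dst):
            if p in faulty and rng.random() < p_loss:
                return False
        return True

    decision = {p: None for p in range(n)}
    decision[0] = initiator_decision
    # senders maps each processor that sends in this round to the sender-set
    # of the message it sends (itself plus the senders of all ancestors).
    senders = {0: frozenset({0})} if initiator_decision == ATTACK else {}

    for _ in range(k + 1):
        first_receipt = {}
        for src, sender_set in senders.items():
            for dst in range(n):
                if dst in sender_set or decision[dst] is not None:
                    continue              # decided generals ignore messages
                if delivered(src, dst):
                    first_receipt.setdefault(dst, sender_set | {dst})
        for dst in first_receipt:
            decision[dst] = ATTACK        # any message is read as "attack"
        senders = first_receipt           # new deciders relay in the next round
    # Silence through round k + 1 is read as "retreat".
    return {p: d or RETREAT for p, d in decision.items()}


def correct_agree(n, k, faulty, initiator_decision, rng):
    """True if all processors outside `faulty` reach the same decision."""
    result = run_m(n, k, faulty, initiator_decision, rng)
    return len({result[p] for p in range(n) if p not in faulty}) <= 1


# Four of six processors are faulty, including the initiator.
rng = random.Random(1)
print(all(correct_agree(6, 4, {0, 1, 2, 4}, ATTACK, rng) for _ in range(1000)))
```

The sender-sets are kept by the simulator as bookkeeping rather than carried inside the messages, since message contents may be corrupted in this model.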

References

[1] D. Dolev, The Byzantine generals strike again, J. Algorithms 3 (1982) 14-30.

[2] D. Dolev et al., An efficient algorithm for Byzantine agreement without authentication, Inform. and Control 3 (1983) 257-274.

[3] D. Dolev, H.R. Strong, Authenticated algorithms for Byzantine agreement, SIAM J. Comput. 12 (4) (1983) 656-666.

[4] D. Dolev, R. Reischuk, H.R. Strong, Early stopping in Byzantine agreement, J. ACM 37 (4) (1990) 720-741.

[5] L. Lamport, R. Shostak, M. Pease, The Byzantine generals problem, ACM Trans. Programming Languages Systems 4 (3) (1982) 382-401.

[6] L. Lamport, The weak Byzantine generals problem, J. ACM 30 (4) (1983) 668-676.

[7] M. Pease, R. Shostak, L. Lamport, Reaching agreement in the presence of faults, J. ACM (April 1980).
