
The Byzantine Generals Problem

A paper by: Leslie Lamport, Robert Shostak, and Marshall Pease.

Summary by: Roman Kaplan.

Every computer system must cope with malfunctioning components, and a malfunction does not necessarily mean that the component has stopped working: a faulty component may also send conflicting information to different parts of the system. The situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. To succeed, all generals must decide upon a common plan of action. However, there may be traitors among the generals who want to prevent the loyal generals from reaching agreement. The generals communicate only by messenger. Even though message delivery is reliable, a traitor can confuse his fellow generals by sending them misleading information, causing the loyal generals to decide upon different plans of action and consequently leading to a system failure. Our objective is to find an algorithm that ensures that all loyal generals decide upon the same plan of action, meaning they reach an agreement. We will show that using oral messages, agreement can be achieved if and only if more than two-thirds of the generals are loyal. The proof is based on the fact that with three generals, the two loyal ones cannot reach an agreement in the presence of a single traitor. If we add the assumption that messages cannot be forged, the problem is solvable for any number of generals and possible traitors. Later we will discuss the application of these results to reliable computer systems.

1. Introduction

Reliable systems must cope with multiple malfunctioning components. As noted above, such components may send conflicting information to different parts of the system. To make the subject more engaging, the problem is expressed abstractly as an army dealing with treacherous generals rather than a failing CPU. We present the problem as the Byzantine army besieging a city, with each division commanded by its own general. Any of the generals may be a traitor, whose interests differ from those of his loyal fellows. The generals communicate only by messenger, and the loyal ones use this communication to reach a unanimous plan of action. Possible plans of action are "retreat", "attack", etc. The traitorous generals' main objective is to prevent the loyal ones from reaching a consensus. The loyal generals are aware that there might be traitors among them, so they need an algorithm that ensures they reach a consensus. We now define the results required of the loyal generals' algorithm. The first is that all loyal generals must reach an agreement regarding their plan of action, regardless of the traitors' actions. The second is that all loyal generals adopt a "good" plan. While it is hard to define what a good plan is, an example removes some of the obscurity: if every loyal general's opinion is to retreat, then after sharing this information with every other general it is not acceptable for them to decide to attack. The algorithm must therefore be robust.

We can rephrase the condition: the decision that the loyal generals take must not be greatly influenced by the traitors. To prevent a small number of traitors from interfering, the algorithm uses a majority vote. In its simple form, every loyal general sends his opinion by messenger to the rest of the generals and chooses his plan of action according to the majority of the opinions he has received. This simple form may work in some cases, but it is not fail-proof: it relies on the assumption that all loyal generals receive the same values, even from traitorous generals. This assumption is necessary because a small number of traitors can sway the loyal generals' majority vote if the loyal generals are almost equally divided between different opinions. Hence a loyal general cannot simply use the values he received directly from the other generals, since traitors can send conflicting values to different generals. This limitation must be handled with care: once values are not taken directly from their senders, the value used for some general in the majority calculation might differ from the one he sent, even though he is loyal. Therefore, we add a requirement regarding every loyal general: if a specific general is loyal, then the value he sent must be used as his value by every other loyal general. Up to this point the problem was presented as a number of generals who need to reach an agreement, and we found that the conditions for agreement can be phrased in terms of how a single general sends his value to the others. We will therefore treat the problem as a commanding general sending an order to his lieutenants, which leads to the following problem:

Byzantine Generals Problem. A commanding general must send an order to his n − 1 lieutenant generals such that:

IC1. All loyal lieutenants obey the same order.

IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.

The above conditions are called the interactive consistency conditions. The first has already appeared above in several variations, but the second is less obvious. If the commander is loyal, then his order should be obeyed by all loyal lieutenants, so in this case IC1 follows from IC2. If he is a traitor, he may send conflicting information to his lieutenants, forcing them to use a non-trivial algorithm to reach an agreement.
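The two conditions can be written down as a small check on the outcome of a single run. The following is a minimal sketch; the function name `satisfies_ic` and the way a run is represented (a dictionary of final decisions plus a set of loyal lieutenants) are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, Optional, Set


def satisfies_ic(decisions: Dict[int, str], loyal: Set[int],
                 commander_loyal: bool, commander_order: Optional[str]) -> bool:
    """Check IC1 and IC2 for one run (hypothetical representation).

    decisions maps each lieutenant id to the order he finally obeys,
    loyal is the set of loyal lieutenant ids.
    """
    loyal_decisions = {decisions[i] for i in loyal}
    # IC1: all loyal lieutenants obey the same order.
    ic1 = len(loyal_decisions) <= 1
    # IC2: if the commander is loyal, every loyal lieutenant obeys his order.
    ic2 = (not commander_loyal) or loyal_decisions <= {commander_order}
    return ic1 and ic2
```

Note that the decisions of traitorous lieutenants are ignored, and that IC2 places no constraint when the commander is a traitor.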

2. Impossibility results

In this section the generals can send only oral messages; later we will extend the discussion to written messages. The model thus has two types of messages, oral and written. The content of an oral message is completely under the control of the sender, which means that a traitor can send any message he desires, in contrast to written (signed) messages, which cannot be forged. The use of oral messages makes the problem harder than it may seem. We will show that no solution exists unless more than two-thirds of the generals are loyal. This impossibility result holds only for oral messages; in Section 4 we will discuss the case of written, signed messages, for which it does not hold. The impossibility result is based on a very simple case with three generals (a commander and two lieutenants), one of whom is a traitor. To simplify the analysis we assume two possible messages: "attack" and "retreat". We will show two different scenarios; one lieutenant, whom we call L1, is loyal in both. In the first scenario the commander is loyal but the second lieutenant is a traitor. In the second scenario the commander is the traitor. In both cases L1 receives conflicting information, which makes it impossible for him to decide without violating one of the interactive consistency conditions. Each scenario is presented in its own figure. Figure 1 shows that L1 has received conflicting messages. In order to satisfy IC2, L1 must obey the order to attack. We will now examine Figure 2, in which the commander is a traitor.

In Figure 2 the commander is a traitor and sends conflicting information to the lieutenants. L1 receives the same messages as in Figure 1, and he cannot tell what message the commander sent to L2, so this scenario looks exactly the same to L1. If he obeys the order to attack, then IC1 is violated in Figure 2 (by the symmetric argument, L2 obeys the order to retreat), and if he decides to retreat, then IC2 is not satisfied in Figure 1. L1 has no way to distinguish between the two scenarios if the traitor lies consistently. We therefore conclude that no solution for three generals works in the presence of a single traitor. This result will be used as the basis for proving that no solution can cope with m traitors when there are fewer than 3m+1 generals. We first assume that such a solution exists, then reduce the case of 3m or fewer generals to the case of three generals, and thereby obtain a solution for three generals in the presence of a single traitor, which we know is impossible.

Figure 1: Lieutenant 2 is a traitor. The commander sends "attack" to both lieutenants; Lieutenant 2 tells Lieutenant 1 "Commander said 'retreat'".

Figure 2: The commander is a traitor. The commander sends "attack" to Lieutenant 1 and "retreat" to Lieutenant 2; Lieutenant 2 tells Lieutenant 1 "Commander said 'retreat'".

In order to distinguish between the two groups of generals, we refer to the larger group (3m or fewer) as Albanian generals and to the smaller group (three generals) as Byzantine generals. Let us assume that a solution exists for the Albanian generals. Each of the Byzantine generals simulates approximately one-third of the Albanian generals, so that each Byzantine general simulates at most m Albanian generals. The Byzantine commander simulates the Albanian commander and at most m-1 Albanian lieutenants. Since only one Byzantine general can be a traitor and he simulates at most m Albanian generals, at most m of the Albanian generals are traitors. The assumed solution for the Albanian generals satisfies IC1 and IC2, and from it we obtain a solution for the three Byzantine generals: each Byzantine lieutenant obeys the order obeyed by the Albanian lieutenants he simulates. IC1 holds for the Albanian generals, and the two loyal Byzantine generals simulate only loyal Albanian generals, who all obey the same order; thus these two Byzantine generals obey the same order, and IC1 holds for the Byzantine generals. IC2 also holds for the Albanian generals, so if the Albanian commander is loyal, all loyal Albanian lieutenants obey the order he sends; then a loyal Byzantine lieutenant, who simulates loyal Albanian lieutenants, obeys the order of the loyal Byzantine commander, and therefore IC2 holds.
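The three-general argument can be illustrated by brute force. The sketch below is not the proof itself: it assumes, as a simplification not stated in the text, that both lieutenants apply the same deterministic rule to the pair (order received from the commander, order reported by the other lieutenant) after a single exchange. It enumerates every such rule and keeps those that satisfy IC2 in Figure 1, IC2 in the mirror image of Figure 1 (loyal commander orders "retreat", Lieutenant 1 is the traitor), and IC1 in Figure 2. The name `surviving_rules` is hypothetical.

```python
from itertools import product

ORDERS = ("attack", "retreat")
# A view is (order received from the commander, order the other lieutenant reports).
VIEWS = list(product(ORDERS, repeat=2))


def surviving_rules():
    """Return all decision rules that satisfy the conditions in all three scenarios."""
    ok = []
    for outputs in product(ORDERS, repeat=len(VIEWS)):
        rule = dict(zip(VIEWS, outputs))
        # Figure 1: loyal commander orders "attack", traitorous L2 reports "retreat" to L1.
        ic2_fig1 = rule[("attack", "retreat")] == "attack"
        # Mirror image: loyal commander orders "retreat", traitorous L1 reports "attack" to L2.
        ic2_mirror = rule[("retreat", "attack")] == "retreat"
        # Figure 2: traitorous commander sends "attack" to L1 and "retreat" to L2;
        # both lieutenants are loyal, report honestly, and must agree.
        ic1_fig2 = rule[("attack", "retreat")] == rule[("retreat", "attack")]
        if ic2_fig1 and ic2_mirror and ic1_fig2:
            ok.append(rule)
    return ok


print(len(surviving_rules()))  # 0: no rule satisfies all three requirements
```

The two IC2 requirements force different outputs on the two views that Figure 2 requires to be equal, which is exactly the conflict described for L1 above.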

Up to now we have required the Byzantine generals to reach an exact agreement, but we will now show that reaching even an approximate agreement is just as hard. Assume that the generals are trying to agree on an approximate time of attack instead of a precise battle plan. The commander sends an order with the desired time of attack, and we adapt the interactive consistency conditions to the case of approximate agreement. The new conditions are called the approximate interactive consistency conditions:

APIC1. All loyal lieutenants attack within 10 minutes of one another.

APIC2. If the commanding general is loyal, then every loyal lieutenant attacks within 10 minutes of the time given in the commander's order.

Note: the times at which the orders are sent and received are irrelevant; only the attack time given in the order matters.

We will prove that this problem is unsolvable under exactly the same conditions as the Byzantine Generals Problem: whenever one-third or more of the generals are traitors. The proof is by contradiction. We first assume that a solution to the approximate agreement problem exists for three generals in the presence of a single traitor. From it we construct a solution to the original Byzantine Generals Problem for three generals, which we know is impossible, so the assumed solution cannot exist. To simplify the analysis we assume the commander only wishes to send an "attack" or a "retreat" order. He does so by sending a message with an attack time, assigning a specific time to each order: an attack time of 1:00 means he orders an attack, and an attack time of 2:00 means he orders a retreat. Each lieutenant executes the following procedure to obtain his order:

1. After receiving the attack time from the commander, a lieutenant does one of the following:

a. If the time is 1:10 or earlier, then attack.

b. If the time is 1:50 or later, then retreat.

c. Otherwise, continue to step 2.

2. Ask the other lieutenant what decision he reached in step 1.

a. If the other lieutenant reached a decision, then make the same decision he did.

b. Otherwise, retreat.
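The procedure above can be sketched in a few lines. The representation of times as minutes after 1:00 (so 1:10 is 10 and 1:50 is 50) and the function names `step1` and `decide` are assumptions made for this illustration.

```python
from typing import Optional

ATTACK, RETREAT = "attack", "retreat"


def step1(minutes_past_one: int) -> Optional[str]:
    """Step 1: decide from the attack time alone, or return None to defer to step 2."""
    if minutes_past_one <= 10:   # 1:10 or earlier
        return ATTACK
    if minutes_past_one >= 50:   # 1:50 or later
        return RETREAT
    return None                  # step 1c: continue to step 2


def decide(own_time: int, other_step1: Optional[str]) -> str:
    """Full procedure for one lieutenant, given the other lieutenant's step 1 result."""
    own = step1(own_time)
    if own is not None:
        return own
    # Step 2: copy the other lieutenant's step 1 decision if he made one, else retreat.
    return other_step1 if other_step1 is not None else RETREAT
```

For example, `decide(12, step1(5))` copies the other lieutenant's step 1 decision and returns "attack", while `decide(30, step1(35))` falls through to step 2b and returns "retreat".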

Using the APIC conditions and the assumed solution we will show that the IC conditions hold in the case of three generals and a single traitor. If the commander is loyal, then by APIC2 every loyal lieutenant obtains the correct order in step 1, so IC2 is satisfied. In this case IC1 follows from IC2, so it remains to prove IC1 when the commander is not loyal. If the commander is not loyal, then both lieutenants are loyal, and by APIC1 their attack times are within 10 minutes of one another. Hence, if one lieutenant decides to attack in step 1, the other cannot decide to retreat in the same step. So either both reach the same decision in step 1, or at least one of them reaches step 2, where he decides in sub-step a or b. If a lieutenant makes his decision in step 2.a, his decision is the same as the other lieutenant's. If a lieutenant reaches step 2.b and decides to retreat, then the other lieutenant made no decision in step 1 either, so he also reaches step 2.b and retreats. In every case both lieutenants reach the same decision, so IC1 holds. Thus the assumed solution satisfying the APIC conditions, together with the above reduction, yields a solution to the original Byzantine Generals Problem with three generals in the presence of one traitor, which is impossible. Therefore, no solution satisfies the APIC conditions for three generals with one traitor. The same method as before, in which a single general simulates at most m generals, shows that no solution can cope with one-third or more of the generals being traitors. The proof is the same as for exact agreement. We have found that with fewer than 3m+1 generals no solution can cope with m traitors, whether we want exact or approximate agreement.

3. A solution with oral messages

Up to now we have discussed only impossibility results. We will now present a solution in the form of an algorithm executed by the loyal generals, together with the conditions under which it works. A solution to the Byzantine Generals Problem exists in the presence of at most m traitors when there are at least 3m+1 generals. The algorithm involves sending messages from one general to others (all or some), so we must specify exactly how the message system works. The messages are the "oral messages" introduced at the beginning of Section 2. The following assumptions define the message system:

A1. Every message that is sent is delivered correctly.

A2. The receiver of the message knows who sent it.

A3. The absence of a message can be detected.

These assumptions limit the amount of harm a traitor can do. The first prevents him from interfering with the messages of other generals. The second prevents him from sending spurious messages on behalf of other generals. The third prevents him from sabotaging the army by sending no information at all and thus keeping the loyal generals from reaching a decision. Although we discuss only generals, our main interest is computer systems and the practical implementation of the algorithm, which will be explained in further detail in Section 6. The algorithms in this section and in the following one require a direct communication link between every two generals. The algorithms in Section 5 do not have this requirement: not every two generals need a direct connection, but we will present conditions on the communication paths under which the algorithms work correctly. As stated before, generals can detect the absence of a message. Since every lieutenant must obtain some order even when the sender sends nothing, we define "retreat" as the default value in that case.

The solution to the Byzantine Generals Problem takes the form of an inductive algorithm called the Oral Message algorithm, OM(m), which solves the problem in the presence of at most m traitors when there are at least 3m+1 generals. The algorithm is defined for all nonnegative integers m and, for convenience, is described in terms of a lieutenant "obtaining a value" rather than "obeying an order". We assume the existence of a majority function, defined for any number of arguments, such that if a majority of the values $v_i$ equal $v$, then $\mathrm{majority}(v_1, v_2, \ldots, v_{n-1})$ equals $v$. This property of the majority function is the only one necessary for the correctness of the algorithm.
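One possible function with the required property is sketched below. Returning "retreat" when no value has a strict majority is an assumption of this sketch (the text only fixes the result when a majority exists), as is the parameter name `default`.

```python
from collections import Counter


def majority(values, default="retreat"):
    """Return the value held by a strict majority of `values`, else `default`."""
    if not values:
        return default
    value, count = Counter(values).most_common(1)[0]
    # Only a strict majority counts; ties fall back to the default value.
    return value if 2 * count > len(values) else default
```

With this definition, `majority(["retreat", "attack", "attack"])` is "attack", and an evenly split input such as `majority(["attack", "retreat"])` yields the default.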

Let us describe the OM(m) algorithm:

Algorithm OM(0):

1) The commander sends his value to every lieutenant.

2) Each lieutenant uses the value he receives from the commander.

Algorithm OM(m) (m > 0):

1) The commander sends his value to every lieutenant.

2) Every lieutenant acts as the commander in OM(m-1) to disseminate the value he obtained from the commander to each of the n-2 other lieutenants. If he received no value, he uses "retreat". At the end of this step each lieutenant has a vector V containing:

a. The value he obtained from the commander.

b. The value disseminated by every other lieutenant.

3) Every lieutenant uses the value majority(V).

We will now explain how the algorithm works, first in general terms and later using examples with specific parameters. In the first step of the algorithm the commander sends his order (if he is loyal) or possibly conflicting orders (if he is a traitor) to every lieutenant. In the next step each lieutenant sends the value he received from the commander to the other lieutenants, acting himself as the commander in this step. In this way the lieutenants keep sharing the values they obtained with the other lieutenants for m steps.
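The recursive structure of OM(m) can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the paper's own code: the function `om`, the `traitors` set and the `lie(sender, receiver, value)` callback that models what a traitor says are all hypothetical names, and ties in `majority` default to "retreat".

```python
from collections import Counter


def majority(values, default="retreat"):
    """Strict majority of `values`, or `default` if there is none (an assumption)."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


def om(m, commander, lieutenants, value, traitors, lie):
    """Simulate OM(m) and return {lieutenant: value he obtains}.

    lie(sender, receiver, value) is what a traitorous sender tells receiver.
    Entries computed for traitorous lieutenants are meaningless.
    """
    # Step 1: the commander sends a value to every lieutenant.
    received = {
        i: lie(commander, i, value) if commander in traitors else value
        for i in lieutenants
    }
    if m == 0:
        return received  # OM(0): each lieutenant uses the value he received
    # Step 2: each lieutenant j relays his value to the others via OM(m-1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors, lie)
    # Step 3: majority over the lieutenant's own value and the relayed ones.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# n = 4, m = 1, loyal commander 0 orders "attack", Lieutenant 1 always relays "retreat".
result = om(1, 0, [1, 2, 3], "attack", {1}, lambda s, r, v: "retreat")
print(result[2], result[3])  # attack attack
```

The sketch recomputes every sub-execution sequentially instead of exchanging real messages, which keeps it short but hides the message prefixes that an actual implementation would need.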

Let us examine an example of the algorithm's execution in the case m = 1, n = 4. Figure 3 illustrates the situation when Lieutenant 1 is the traitor. In OM(1) the commander sends "attack" to every lieutenant. In the next step each lieutenant sends the value he obtained to the other lieutenants: L2 and L3 send the value "attack". Since L1 is a traitor, he sends the value "retreat" to L2 and L3. Now L2 and L3 have each obtained the values "retreat", "attack" and "attack", which results in the value majority("retreat", "attack", "attack") = "attack". The commander is loyal and both loyal lieutenants obey his order, which means that condition IC2 holds and thus IC1 as well. We will now examine a more complicated example, the case m = 2, n = 7. In this case the number of messages sent is substantially larger than in the previous case. Figure 4 shows the values sent by the commander to his lieutenants in OM(2).

Figure 3: Execution of OM(1) when Lieutenant 1 is a traitor. The commander sends "attack" to Lieutenants 1, 2 and 3; Lieutenant 1 relays "retreat", while Lieutenants 2 and 3 relay "attack".

Figure 4: Execution of OM(2) when the commander and Lieutenant 6 are traitors. The commander sends "attack" to three of the lieutenants, "retreat" to two, and "x" (d.c.) to one.

Since the execution of the algorithm has more than two steps, there must be a way to distinguish among the messages from the different steps. The problem is solved if every lieutenant i prefixes his index number, i, to the value he sends in step 2. With this method every lieutenant knows which lieutenants have already passed a message forward, so he sends it only to those who have not yet received it. Let us now review the algorithm's execution in the case of Figure 4. Figure 5 presents the path by which each message reached Lieutenant 1 from every other lieutenant in the three steps of the algorithm's execution. The values sent in every message are summarized in Figure 6.

In Figure 5, every message originates at the commander as he begins the execution. In OM(2) the commander sends his order directly to L1, but as the algorithm progresses each message passes through an increasing number of intermediaries. The messages sent in OM(0) have two intermediaries, and the index closest to L1 is the index of the lieutenant who actually passed the message to L1. Figure 6 contains the values sent in every message, but with a different structure: the messages are organized in rows, where every row holds the messages sent to L1 by the last lieutenant index in Figure 5.

Figure 6: The messages Lieutenant 1 has received in the example presented in Figure 4 ('a' represents "attack" and 'r' represents "retreat").

OM(2): a

OM(1): 2r, 3a, 4r, 5a, 6a

OM(0):
2{ 3a, 4r, 5a, 6r }
3{ 2r, 4r, 5a, 6a }
4{ 2r, 3a, 5a, 6r }
5{ 2r, 3a, 4r, 5a, 6a }
6{ 2a, 3r, 4a, 5r }

Figure 7: All loyal lieutenants' decisions after obtaining the values in step OM(1).

L1: majority(a, r, a, r, a, a) = "attack"
L2: majority(r, r, a, r, a, r) = "retreat"
L3: majority(a, r, a, r, a, a) = "attack"
L4: majority(r, r, a, r, a, r) = "retreat"
L5: majority(a, r, a, r, a, a) = "attack"

Not all loyal lieutenants choose the same action.

Let us now look at how the algorithm works in this complicated and extreme example. The first message each lieutenant receives comes from the commander in step OM(2); for Lieutenant 1 it is shown in the corresponding row of Figure 6. If the commander is loyal then L1 must obey the order, but L1 does not know whether the commander is loyal. In step OM(1) L1 receives the orders relayed by the rest of the lieutenants. Knowing that there are at most two traitors and seeing the conflicting values he received, he may suspect that the commander is a traitor (although this is not part of the algorithm, whose purpose is not to find the traitors).

From L2: majority(r, r, r, a) = "retreat"
From L3: majority(a, a, a, r) = "attack"
From L4: majority(r, r, r, a) = "retreat"
From L5: majority(a, a, a, r) = "attack"
From L6: majority(a, r, a, r) = "-" (do nothing)

All lieutenants' final decision: majority(a, r, a, r, a, -) = "attack"

Note: Lieutenants 2-5 have obtained the value "attack" as the value of L1.

Figure 8: The decision Lieutenant 1 has made after obtaining the values in step OM(0) and the decision made by every loyal Lieutenant.

Figure 7 summarizes the decision every loyal lieutenant would make after obtaining the information from all lieutenants in step OM(1). As we can see, the loyal lieutenants do not all make the same decision at this step. Step OM(0) gives L1 the information he needs to make a decision that agrees with that of the other loyal lieutenants. After obtaining the values from all lieutenants in step OM(0), L1 can calculate the decision to be made. Figure 8 shows the value L1 uses for every lieutenant, computed from the values obtained in OM(0). Every loyal lieutenant performs the same computation as L1 in this step and obtains the same value for every lieutenant. Every lieutenant's final decision is also presented in Figure 8, since the values the others use for each lieutenant are obtained in the same manner as L1 obtained his. At this point all loyal lieutenants use the same values and the same majority function, resulting in the same final decision.

Now a little about the algorithm's message complexity. Step OM(m) is the first step and is executed only by the commander, so it is executed once and sends n-1 messages. Step OM(m-1) is executed by every lieutenant, so it has n-1 executions, which results in sending $(n-1)(n-2)$ messages. Every execution of step OM(m-1) invokes n-2 executions of OM(m-2), which leads to $(n-1)(n-2)(n-3)$ messages being sent, and so on. With each lieutenant prefixing his index to every message he sends, step OM(m-k) invokes n-k-1 executions of OM(m-k-1). Therefore, algorithm OM(m) sends on the order of

$$(n-1) \cdots (n-m) = \prod_{i=1}^{m} (n-i) = O(n^m)$$

messages. For n=3m+1, the number of messages is exponential in n. Let us now prove the correctness of the algorithm. In order to show correctness for every m we first prove the following lemma:

LEMMA 1: For any m and k, Algorithm OM(m) satisfies IC2 if there are more than 2k+m generals and at most k traitors.

PROOF: Condition IC2 only specifies what must happen when the commanding general is loyal, so we assume the commander is loyal. The proof is by induction on m. Since every message sent is delivered correctly (assumption A1), OM(0) satisfies IC2 when the commander is loyal, so the lemma is true for m=0. We now assume the lemma holds for m-1, m>0, and prove it for m. At the beginning of the algorithm (step OM(m)) the loyal commander sends his value v to his n-1 lieutenants. In the next step every loyal lieutenant applies OM(m-1) with n-1 generals. We know that there are more than 2k+m generals, so $n > 2k + m \Rightarrow n - 1 > 2k + (m - 1)$. By the induction hypothesis, the lemma holds for m-1, so we can conclude that every loyal lieutenant uses the value $v_j = v$ for each loyal lieutenant j. Since $n - 1 - (m - 1) - k > k$, the loyal lieutenants form a majority of the n-1 lieutenants, and therefore the majority of the values every loyal lieutenant holds for the lieutenants is v. This value is calculated in step (3) of the algorithm, proving IC2 and the lemma. ∎

The following theorem states that algorithm OM(m) solves the Byzantine Generals Problem.

THEOREM 1: For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.

PROOF: By induction on m. If m=0 there are no traitors, so all lieutenants obtain the same value from the commander and use it; OM(0) therefore satisfies IC1 and IC2. We now assume the theorem holds for OM(m-1) and show that it holds for OM(m), m>0. The proof splits into two cases: the first in which the commander is loyal and the second in which he is a traitor. If the commander is loyal, we take k=m in Lemma 1 and find that OM(m) satisfies IC2, and IC1 follows from IC2 in this case. Now suppose the commander is a traitor. He is one of the at most m traitors, so there are at most m-1 traitorous lieutenants. In step (2), each loyal Lieutenant j applies OM(m-1) with the n-2 other lieutenants acting as his lieutenants. Since there are more than 3m generals, there are more than 3m-1 lieutenants: $n > 3m \Rightarrow n - 1 > 3m - 1 > 3(m - 1)$. Therefore, we can apply the induction hypothesis to conclude that OM(m-1) satisfies conditions IC1 and IC2. Hence, for each j, any two loyal lieutenants get the same value for $v_j$ in step (3): this follows from IC2 if one of the two lieutenants is Lieutenant j, and from IC1 otherwise. Thus, every two loyal lieutenants use the same values, calculated in OM(m-1), for all the other lieutenants and therefore obtain the same majority value in step (3), proving IC1.

4. A solution with signed messages

The fact that traitors can lie and manipulate the data they send is what makes the Byzantine Generals Problem harder than it may seem; if we manage to restrict their ability to lie, the problem becomes easier. One way to do this is to use unforgeable signed messages as the communication method among the generals. Let us phrase the assumption formally and add it to our previous assumptions, A1-A3:

A4. (a) A loyal general's signature cannot be forged, and any alteration of the contents of his signed message can be detected.

(b) Anyone can verify the authenticity of a general's signature.

The new assumption refers only to a loyal general's signature, which means that a traitor's signature can be forged by another traitor. This does not affect the loyal lieutenants: it merely allows the traitors to collude, and the algorithm still reaches a correct final decision. With signed messages, our previous proof that the Byzantine Generals Problem has no solution for three generals in the presence of a single traitor no longer applies, and we will show an algorithm that solves the problem for m traitors and any number of generals. Note that the problem is meaningless if there are fewer than m+2 generals: a single loyal general can choose any order to follow and still satisfy conditions IC1 and IC2. The proposed algorithm, called SM(m), is similar to OM but relies on assumption A4. It starts like OM, with the commander sending his order to his lieutenants, except that the order is now signed. Every lieutenant receives the order, adds his signature and passes it on to the rest of the lieutenants. In general, when a lieutenant receives a signed order he adds his signature and sends it to those whose signatures are not included in that order. One may see technical difficulties in receiving a single unforgeable order and sending it to multiple recipients, but this need not concern us: we can assume that the commander sends a stack of copies of his signed order to every lieutenant, and the lieutenant simply adds his signature to them and passes them on. Alternatively, we can assume that each lieutenant can copy signed orders (it does not matter how), add his signature and distribute them. Algorithm SM assumes the existence of a function choice, which is applied to a set of orders to obtain a single order. The only requirements we make of this function are:

1. If the set V consists of the single element v, then choice(V) = v.

2. choice(∅) = RETREAT, where ∅ is the empty set.

The value RETREAT is an arbitrary default order and can be replaced by any other order.
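A function meeting the two requirements might look as follows. The rule used when the set holds more than one order (taking the smallest order in sort order) is an arbitrary assumption of this sketch; any deterministic rule shared by all lieutenants would satisfy the requirements equally well.

```python
RETREAT = "retreat"


def choice(orders):
    """Pick a single order from a set of orders, deterministically."""
    if not orders:
        return RETREAT               # requirement 2: the empty set gives the default
    if len(orders) == 1:
        return next(iter(orders))    # requirement 1: a single element is returned as is
    # Arbitrary but deterministic tie-breaking rule (an assumption of this sketch).
    return min(orders)
```

Because the result depends only on the contents of the set, two lieutenants holding equal sets always obtain the same order.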

In the following algorithm, a message containing the value x signed by General i is denoted by x:i. The value x:i signed by General j is denoted by x:i:j. The commander is General 0, and the rest are his lieutenants. In this algorithm, every lieutenant maintains a set containing the properly signed orders he has received so far. This set is denoted $V_i$ for Lieutenant i. The set contains only the orders the lieutenant has received, not the messages, since there may be many different messages with the same order. If the commander is loyal, the set $V_i$ should never contain more than a single element.

Let us now describe algorithm SM(m) explicitly:

• We initialize $V_i = \emptyset$.

1) The commander signs and sends his value to every lieutenant.

2) For each i:

a. If Lieutenant i receives a message of the form v:0 from the commander and he has not yet received any order, then:

i. He lets $V_i$ equal {v}.

ii. He sends the message v:0:i to every other lieutenant.

b. If Lieutenant i receives a message of the form $v:0:j_1:\ldots:j_k$ and v is not in the set $V_i$, then:

i. He adds v to $V_i$.

ii. If k<m, then he sends the message $v:0:j_1:\ldots:j_k:i$ to every lieutenant other than $j_1, \ldots, j_k$.

3) For each i: When Lieutenant i will receive no more messages, he obeys the order choice($V_i$).
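The steps of SM(m) can be sketched as a simulation over a message queue. Several simplifications here are assumptions of the sketch rather than part of the algorithm: signatures are modelled as a tuple of signer indices instead of real cryptography, traitorous lieutenants simply stay silent, a traitorous commander is modelled only by the per-lieutenant orders in `commander_orders`, and `choice` breaks ties by taking the smallest order. The names `sm` and `commander_orders` are hypothetical.

```python
from collections import deque

RETREAT = "retreat"


def choice(orders):
    """Deterministic choice; the tie-breaking rule is an arbitrary assumption."""
    return min(orders) if orders else RETREAT


def sm(m, n, commander_orders, traitors=frozenset()):
    """Simulate SM(m) with General 0 as commander and Lieutenants 1..n-1.

    commander_orders[i] is the signed order v:0 sent to Lieutenant i.
    Returns {loyal lieutenant: order he obeys}.
    """
    lieutenants = range(1, n)
    V = {i: set() for i in lieutenants}
    # Step 1: queue entries are (recipient, order, chain of signers).
    queue = deque((i, v, (0,)) for i, v in commander_orders.items())
    while queue:
        i, v, chain = queue.popleft()
        if i in traitors or v in V[i]:
            continue              # silent traitor, or order already in V_i
        if len(chain) == 1 and V[i]:
            continue              # step 2a requires that no order was received yet
        V[i].add(v)
        k = len(chain) - 1        # number of lieutenant signatures so far
        if k == 0 or k < m:       # forwarding in steps 2a.ii and 2b.ii
            for j in lieutenants:
                if j != i and j not in chain:
                    queue.append((j, v, chain + (i,)))
    # Step 3: no more messages will arrive; apply choice to each V_i.
    return {i: choice(V[i]) for i in lieutenants if i not in traitors}


# Three generals, traitorous commander sends "attack" to L1 and "retreat" to L2.
print(sm(1, 3, {1: "attack", 2: "retreat"}, traitors={0}))  # {1: 'attack', 2: 'attack'}
```

In the example both lieutenants end with the same set of orders, so they obey the same order regardless of which tie-breaking rule `choice` uses.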

Let us now clarify a few points regarding the algorithm. First, in step 2) every lieutenant ignores any message containing an order v that is already in $V_i$. Second, in step 3) each lieutenant decides which order to obey only once he will receive no more messages. A lieutenant can determine when this is the case, as can be shown by induction on k: for each sequence of lieutenants $j_1, \ldots, j_k$ with k ≤ m, a lieutenant can receive at most one message of the form $v:0:j_1:\ldots:j_k$ in step 2). We can require that Lieutenant $j_k$ either send such a message or else send a message reporting that he will not send one. It is then easy to decide whether all messages have been received. Assumption A3 covers the case of a traitor who does not play along and sends no message at all, since the absence of a message can be detected. A different approach is to use a time-out to detect the absence of messages. This will be further discussed in Section 6.

Third, every lieutenant considers only messages of the proper form and discards any message with an illegal structure (the valid structure is $v:0:j_1:\ldots:j_k$). If we use the method of sending enough copies of the same message so that the recipient does not have to copy it himself, then a lieutenant who discards a message with an illegal structure actually discards all of its copies. The number of copies of a single message that should exist if it was signed by k lieutenants is:

$(n-k-2)(n-k-3)\cdots(n-m-2)$.
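As a quick check of this count, the product can be evaluated directly. The function name `copies_needed` is hypothetical, and the sketch assumes 0 ≤ k ≤ m and n > m + 2 so that every factor is positive.

```python
from math import prod


def copies_needed(n, m, k):
    """Evaluate (n-k-2)(n-k-3)...(n-m-2) for a message signed by k lieutenants."""
    return prod(n - j - 2 for j in range(k, m + 1))


# Example: n = 7 generals, SM(3), message already signed by k = 1 lieutenant.
print(copies_needed(7, 3, 1))  # (4)(3)(2) = 24
```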

Figure 9 illustrates algorithm SM(1) in the case of three generals in which the commander is a traitor. The upper half shows the messages sent in step a. of the algorithm, and the lower half the messages sent in step b. The sets $V_i$ obtained by the lieutenants are $V_1 = \{a, r\}$ and $V_2 = \{r, a\}$. Since $V_1 = V_2$ and both use the same choice function, Lieutenants 1 and 2 will decide to follow the same order, choice({a, r}). In this case, unlike in the examples of the OM algorithm, both loyal lieutenants can tell that the commander is the traitor: by assumption A4 no message can be forged, and Lieutenants 1 and 2 have received different orders carrying the same signature.

Figure 10 illustrates SM(2) in the case where the commander and Lieutenant 3 are traitors. The upper half shows the messages sent by the commander; although he is a traitor, he sent the same values to the loyal lieutenants. The value he sent to the traitor makes no difference. In the lower half we see the next step of the algorithm, after which both loyal lieutenants have obtained $V_1 = V_2 = \{a, r\}$. This leads both of them to the same decision, choice({a, r}). In algorithm SM(m), the signature added by each lieutenant to every message he sends serves to acknowledge the receipt of the message. In the special case when the lieutenant is the m-th to add his signature to the order he relays, his signature is unnecessary: the recipient of the message will not pass it forward, and he knows who sent the message even without the signature. Hence, the lieutenants do not need to add their signatures in the case of SM(1).

Let us now prove the correctness of the algorithm:

THEOREM 2: For any m, Algorithm SM(m) solves the Byzantine Generals Problem if there are at most m traitors.

PROOF: As before, we need to show that IC1 and IC2 hold. Let us start with IC2. Since the commander is loyal he sends the same value v, with his signature, v:0, to every lieutenant. All loyal lieutenants will obtain the same order v, and since traitors cannot forge the commander's signature (or anyone else's, for that matter), no lieutenant will receive a message carrying a different order u that appears to be signed by the commander. Hence, every loyal Lieutenant i will obtain the same set $V_i$, which consists only of the single order v, so they all make the same decision using the choice function. Since all loyal lieutenants obey the order v sent by the commander, we have proved IC2. We will now prove IC1 when the commander is a traitor (IC2 covers the case in which he is loyal). Consider two loyal lieutenants, i and j. They will follow the same order in step 3) if they have obtained the same sets of orders, meaning $V_i = V_j$. Therefore, in order to prove IC1 it suffices to prove that if i puts an order v into $V_i$ in step 2), then j must put the same order v into $V_j$.

Lieutenant j will put order v into $V_j$ if he receives a properly signed message with the value v. If Lieutenant i received the order v in step 2)a., then he sends it to j in step 2)a.ii. and by A1, j will receive it. If i adds the order to $V_i$ in step 2)b., then he must have received a first message of the form $v:0:j_1:\ldots:j_k$. If j's signature appears in the message, then j has already received the order v. If not, we consider two cases:

1. k<m: In this case, i sends the message $v:0:j_1:\ldots:j_k:i$ to j, which means j receives the order v.

2. k=m: There are at most m-1 traitors among the lieutenants (the commander is the m-th traitor), which means that at least one of Lieutenants $j_1, \ldots, j_m$ is loyal. Since he is loyal he must have sent the value v to Lieutenant j when he first received it, and by A1, j has received the value.

We see that in every case both Lieutenants i and j receive v, which establishes IC1. Together with IC2, this proves the correctness of algorithm SM(m).

5. MISSING COMMUNICATION PATHS

In previous sections we have assumed a clique topology, meaning there is a communication path between every two generals. This section discusses the case of missing paths. If every general is represented by a node and every bidirectional communication path by an edge, we obtain a simple, undirected graph G describing the network. We will extend algorithms OM(m) and SM(m), which assumed the network graph to be completely connected, to more general graphs. Before presenting the more general OM algorithm we need the following definition, in which two generals are called neighbours if there is a direct communication path between them:

DEFINITION 1:

a) A set of nodes $\{i_1, \ldots, i_p\}$ is said to be a regular set of neighbours of a node i if:

i. each $i_j$ is a neighbour of i, and

ii. for any general k different from i, there exist paths $\gamma_{j,k}$ from $i_j$ to k not passing through i such that any two different paths $\gamma_{j,k}$ have no node in common other than k.

b) The graph G is said to be p-regular if every node has a regular set of neighbours consisting of p distinct nodes.

We will further explain the term 'regular set'. In this type of set, the paths from the different neighbours of i to k have no node in common other than k. Figure 11 presents a graph with a node that has a regular set of size 4. If we remove a node n from a p-regular graph, there are two possible scenarios for any remaining node i. In the first, n is a neighbour of i (one of the red nodes in Figure 11); in the worst case n belongs to the regular set of i, and after its removal p-1 nodes remain in the regular set of i. In the second, n is not a neighbour of i; in the worst case n lies on one of the paths leaving a node in the regular set of i, and after discarding that node p-1 nodes remain in the regular set of i. In both scenarios every node keeps a regular set of size p-1, so removing a node from a p-regular graph leaves it (p-1)-regular.

Figure 12 shows an example of a 3-regular graph, in which every node has a regular set of size three. Figure 13 shows an example of a graph that is not 3-regular, since the central node has no regular set of neighbours containing three nodes. Note that a 3-regular graph has at least four nodes and, in general, a p-regular graph has at least p+1 nodes. We now extend OM(m) so that it solves the Byzantine Generals Problem when there are missing communication paths. The requirement on the generals' network graph G is that it be 3m-regular, which, as just noted, means it contains at least 3m+1 nodes. For all positive integers m and p, we define the algorithm OM(m, p) as follows when the graph G of generals is p-regular; if G is not p-regular, OM(m, p) is not defined. The definition is by induction on m, similar to that of OM(m):

Algorithm OM(m, p):

0) Choose a regular set N of neighbours of the commander consisting of p lieutenants.

1) The commander sends his value to every lieutenant in N.

2) For each i in N, let $v_i$ be the value Lieutenant i receives from the commander, or else RETREAT if he receives no value. Lieutenant i sends $v_i$ to every other lieutenant k as follows:

a. If m=1, then by sending the value along the path $\gamma_{j,k}$ (where $i = i_j$) whose existence is guaranteed by part a)ii. of Definition 1.

b. If m>1, then by acting as the commander in the algorithm OM(m-1, p-1), with the graph of generals obtained by removing the original commander from G.

3) For each k, and each i in N with $i \neq k$, let $v_i$ be the value Lieutenant k received from Lieutenant i in step 2), or RETREAT if he received no value. Lieutenant k uses the value majority($v_{i_1}, \ldots, v_{i_p}$), where $N = \{i_1, \ldots, i_p\}$.

Figure 12: 3-regular graph

Figure 13: A graph that is not 3-regular

As we have previously seen, removing a single node from a p-regular graph will leave the graph (p-1)-regular, which means we can apply the algorithm OM(m-1, p-1) in step 2)b. on a lawful graph. Let us now prove that OM(m, 3m) solves the Byzantine Generals Problem if there are at most m traitors. The proof is similar to the proof of the algorithm OM(m) and will not be fully presented. It begins with the following extension of Lemma 1:

LEMMA 2: For any m>0 and any $p \geq 2k + m$, Algorithm OM(m, p) satisfies IC2 if there are at most k traitors.

PROOF: We first examine the case m=1. Each lieutenant obtains the value majority($v_1, \ldots, v_p$), where any two values $v_i, v_j$ he received were sent to him along different paths. Since there are at most k traitors and $p \geq 2k + 1$, more than half of the paths consist entirely of loyal lieutenants. IC2 only concerns a loyal commander (if he is a traitor there is nothing to prove), so more than half of the values $v_1, \ldots, v_p$ each lieutenant received are equal to the value sent by the commander, proving IC2. Now assume the lemma holds for m-1, with m>1, and prove it for m. Since the commander is loyal, all p lieutenants in N obtain the correct value. We have $p \geq 2k + m > 2k$, which means that the majority of these lieutenants are loyal. By the induction hypothesis, each of the loyal lieutenants sends the correct value to every loyal lieutenant. Hence, each loyal lieutenant obtains a majority of correct values, resulting in the correct order in step 3). Next we present a theorem from which the correctness of algorithm OM(m, 3m) is immediate.

THEOREM 3: For any m>0 and any $p \geq 3m$, Algorithm OM(m, p) solves the Byzantine Generals Problem if there are at most m traitors.

PROOF: Using Lemma 2 with k=m we obtain that OM(m, p) satisfies IC2, and hence also IC1 when the commander is loyal. It remains to prove IC1 in the case that the commander is a traitor. To do so we need to show that every loyal lieutenant obtains the same set of values $v_i$ in step 3). If m=1, the commander is the only traitor and all the lieutenants are loyal, which means they all obtain the same value $v_i$ from each Lieutenant i in N. Consequently all loyal lieutenants compute the same majority and obtain the same order. If m>1, we know that $p \geq 3m \Rightarrow p - 1 \geq 3(m - 1)$. Assume that OM(m-1, p-1) satisfies Theorem 3, and prove correctness for OM(m, p). Since the commander is a traitor, at most m-1 of the lieutenants are traitors, so by the induction hypothesis every loyal lieutenant obtains the same value for each lieutenant in N. Therefore, every loyal lieutenant has the same input for the same decision function, and all loyal lieutenants arrive at the same decision. ∎

We require 3m-regularity from the graph G, which is a strong connectivity hypothesis. In fact, if there are only 3m+1 generals, then G must be a complete graph and Algorithm OM(m, 3m) reduces to Algorithm OM(m). The weakest connectivity hypothesis under which the Byzantine Generals Problem is solvable is that the subgraph formed by the loyal generals is connected, since a traitor serving as the only intermediary between two parts of G can simply block the messages and disconnect them. Under this assumption we will show that SM(n-2) is a solution to the Byzantine Generals Problem, where n is the number of generals, regardless of the number of traitors. A few modifications must be made to match the new setting, in which a general cannot send messages directly to every other general. The commander sends his signed order only to his neighbouring lieutenants, and each lieutenant, in step 2)b., sends his message to every neighbouring lieutenant that is not among the $j_r$.

In order to continue with the presentation of the more general solution we first define the diameter of a graph, which is the smallest number d such that any two nodes are connected by a path containing at most d arcs. If the graph is not connected, the diameter is ∞.
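The definition of the diameter can be made concrete with a breadth-first search from every node. The sketch below is illustrative only: the adjacency-dictionary representation and the function name `diameter` are assumptions made here, not part of the original.

```python
from collections import deque
import math


def diameter(adj):
    """Diameter of an undirected graph given as {node: iterable of neighbours}.

    Returns math.inf if the graph is not connected.
    """
    worst = 0
    for src in adj:
        # Breadth-first search gives shortest path lengths (in arcs) from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return math.inf  # some node is unreachable from src
        worst = max(worst, max(dist.values()))
    return worst


# Three loyal generals connected in a line: 0 - 1 - 2.
line = {0: [1], 1: [0, 2], 2: [1]}
print(diameter(line))  # 2
```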

THEOREM 4: For any m and d, if there are at most m traitors and the subgraph of loyal generals has diameter d, then Algorithm SM(m+d-1) (with the above modification) solves the Byzantine Generals Problem.

PROOF: We first consider the case of a loyal commander. In this case, the commander's message passes through at most d-1 loyal lieutenants before it reaches every loyal lieutenant. Since all the lieutenants on the message's path are loyal it is relayed correctly at every step, and by assumption A4 we know that a traitor cannot forge a different order. This proves IC2. To prove IC1 it remains to consider the case of a traitorous commander. We need to show that any order received by a loyal Lieutenant i is also received by a loyal Lieutenant j. Suppose i receives an order $v:0:j_1:\ldots:j_k$ not signed by j. If k<m then i will send the message to every neighbour that has not yet received the order, and it will be relayed to j within d-1 more steps. If $k \geq m$ then one of the first m signers must be loyal (there are at most m-1 traitorous lieutenants) and must have sent it to all of his neighbours, which will relay it so that it reaches j within at most d-1 steps. ∎

COROLLARY: If the graph of loyal generals is connected, then SM(n-2) (as modified above) solves the Byzantine Generals Problem for n generals.

PROOF: Let d be the diameter of the graph of loyal generals. Since the diameter of a connected graph is smaller than its number of nodes, there are more than d loyal generals and therefore fewer than n-d traitors. If we choose $m = n - d - 1$ we obtain $m = n - d - 1 \Rightarrow n - 2 = m + d - 1$, and the corollary follows from Theorem 4.

We have assumed that the loyal generals graph is connected, but using Theorem 4 we can show that even if the graph is not connected and there are at most m traitors, then SM(m+d-1) has the following properties:

1) Any two loyal lieutenants connected by a path of length at most d passing through only loyal lieutenants will obey the same order.

2) If the commander is loyal, then any loyal lieutenant connected to him by a path of length at most m+d passing only through loyal lieutenants will obey his order.

6. RELIABLE SYSTEMS

Up to this point we have mostly discussed the theoretical aspects of the Byzantine Generals Problem. In this section we explain the practical implementation of the algorithms presented in previous sections. The main purpose of this paper is to promote the notion of reliable systems. Reliability in computer systems is mainly achieved through redundancy: several replicas of the system's processor compute the same output, and a majority vote over their results yields a single value. The "processor" here is not necessarily a single chip; it may very well be a whole database system with multiple computers and processing units. Where the vote is taken does not matter; it can be computed within the system or by the recipient of the raw outputs himself.

Majority voting assumes that every nonfaulty processor produces the same output, which is true as long as they all use the same input. However, since the input may come from different components, it is enough that a fraction of them, or even a single one, malfunctions for the input to differ between processors, resulting in different outputs. A faulty component is not the only way to get conflicting inputs. Without synchronization, processors that read a value while it is changing can obtain different inputs, since the reads do not happen at exactly the same time. This particular problem can be solved by synchronization, using a mutual exclusion mechanism (mutex, semaphore, etc.). For majority voting to yield a reliable system, the following two conditions must be satisfied:

1) All nonfaulty processors must use the same input value (so they produce the same output).

2) If the input unit is nonfaulty, then all nonfaulty processors use the value it provides as input (so they produce the correct output).
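The majority vote over replicated processors mentioned above can be sketched as follows. The function name `majority_vote` and the `default` fallback for the case with no strict majority are assumptions made here for illustration.

```python
from collections import Counter


def majority_vote(outputs, default=None):
    """Return the value produced by more than half of the replicas, else default."""
    if not outputs:
        return default
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else default


# Three replicas computed the same function; one of them is faulty.
print(majority_vote([42, 42, 7]))  # 42
```

The vote masks the faulty replica only because the two nonfaulty replicas used the same input, which is exactly what condition 1) demands.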

Since the "lieutenants" are processors, the "commander" is the input unit and "loyal" means nonfaulty, the above conditions are simply the interactive consistency conditions IC1 and IC2. This raises a question similar to the one we have discussed so far: "How can we ensure that all processors receive the same input?" We can try to connect all processors to the same wire, but a faulty input unit can send a borderline input value that is interpreted as '0' by some processors and as '1' by others. Since the input unit is also a system of some type, the subject of reliability comes up once again. If the input unit is faulty then its output is meaningless, and all that a Byzantine Generals solution can do is ensure that all processors use the same input. If the input is important and might be faulty, it can be replicated using redundant input units. Although redundancy and reliability are connected, using many copies of the same data is not enough to obtain reliability: we still need to ensure that the nonfaulty processors use the redundant data to produce the same output.

Let us now handle the case of a nonfaulty input unit whose value is constantly changing, so that processors sampling it at different times obtain different input values. Despite the lack of synchronization in obtaining the input, we still want all processors to use the same input value. If the functions majority and choice are taken to be median functions, the value used will lie within the range of values obtained from the input unit. As long as the values produced by the input unit are in a reasonable range, all nonfaulty processors will use a reasonable value.

All of the above solutions were stated mostly in terms of Byzantine generals rather than in terms of computing systems. We will now examine the application of these solutions to reliable computer systems. The difficulty is not implementing a "general's" algorithm on a processor; it is implementing a message passing system that meets assumptions A1-A3 for the Oral Messages algorithm and A1-A4 for the Signed Messages algorithm. We will consider each assumption in order:

A1. This assumption states that every message sent by a nonfaulty processor is delivered correctly. In real systems, communication lines can fail. The oral messages algorithms with parameter m work correctly in the presence of at most m faulty processors. Therefore, if we treat the failure of a communication line attached to a processor as a failure of that processor (making it a "traitor"), the algorithms continue to work correctly as long as the total number of failures is at most m. Now consider the signed messages algorithms. If we assume that a signed message cannot be forged, even as a result of a communication line failure (which is essentially assumption A4, and which we will see is very reasonable), then a communication line failure only reduces the connectivity of the graph of generals, so algorithm SM(m) still works correctly and Theorem 4 remains valid. This is true since a nonfaulty processor will simply ignore messages that are invalid due to a communication line failure, which is equivalent to a missing communication path in the Byzantine Generals Problem.

A2. According to this assumption, each receiver of a message knows who sent it. Most importantly, a faulty processor cannot impersonate a nonfaulty one. In practice this means that interprocessor communication is done over fixed lines rather than through a message switching network, because a switching network would reopen the Byzantine Generals Problem: we would have to consider faulty communication nodes. In the case of signed messages, assumption A2 is not necessary, since A4 prevents a processor from forging another processor's message and thus from impersonating it.

A3. This assumption requires that the absence of a message can be detected. Since the absence of a message can only be detected by its failure to arrive within some fixed length of time, we must use a time-out convention to satisfy A3. The use of a time-out to satisfy A3 requires two assumptions:

1. There is a fixed maximum time needed for the generation and transmission of a message.

2. The sender and receiver have clocks that are synchronized to within some maximum error.

The first assumption is clearly needed, since each recipient of a message must know how long he has to wait for it. The second assumption is less obvious, but this assumption or an equivalent one is necessary in order to solve the Byzantine Generals Problem. To see why, consider algorithms in which the generals take action only in the following circumstances:

• At some fixed initial time (the same for all generals).

• Upon the receipt of a message.

• When a randomly chosen length of time has elapsed. (I.e., a general can set a timer to a random value and act when the timer goes off.)

These three circumstances yield the most general class of algorithms we can envision that does not allow the construction of synchronized clocks. No algorithm of this class can solve the Byzantine Generals Problem if messages can be transmitted arbitrarily quickly, that is, if there is only an upper bound on the message transmission delay and no lower bound. Moreover, no solution exists even if the traitors simply fail to send messages instead of sending false ones. The proof of these claims is beyond the scope of this paper. We will now show more formally how the two assumptions above let us detect unsent messages. Let µ be the maximum message generation and transmission delay (the first assumption), and let τ be the maximal difference between the clocks of any two nonfaulty processors (the second assumption; clock synchronization is discussed below). If a nonfaulty processor begins to generate a message at time T on its clock, then the message arrives at most µ later, and the receiver's clock may differ from the sender's by at most τ. Therefore, the recipient will receive the message no later than time $T + \mu + \tau$ on his clock. Hence, if the receiver has not received the message by that time, he may assume that it was not sent. If it arrives later, the sender must be faulty, so the correctness of the algorithm does not depend upon the message being sent. By fixing the time at which the input processor sends its value, each processor can calculate until what time, on its own clock, it must wait for each message. Take the signed messages case for example, and assume that the commander starts the execution at time $T_0$ on his own clock. A processor that waits for a message with k signatures has to wait for k processors (the commander included) to generate and send their messages, which means it will receive the message no later than time $T_0 + k(\mu + \tau)$ on its clock.
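The time-out rule derived above amounts to a one-line computation. In the sketch below, the function names `deadline` and `arrived_in_time` are hypothetical, and all times are assumed to be readings of the receiver's own clock in the same unit.

```python
def deadline(t0, k, mu, tau):
    """Latest local time T0 + k(mu + tau) by which a message with k signatures arrives."""
    return t0 + k * (mu + tau)


def arrived_in_time(arrival, t0, k, mu, tau):
    """arrival is the receiver's clock reading, or None if nothing has arrived."""
    return arrival is not None and arrival <= deadline(t0, k, mu, tau)


# Commander starts at T0 = 100, mu = 10, tau = 2 (arbitrary time units).
print(deadline(100, 3, 10, 2))  # 100 + 3 * 12 = 136
```

A receiver that sees `arrived_in_time` return false may treat the message as never sent, which is how the time-out convention satisfies A3.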

We now turn to the subject of clock synchronization. Due to physical differences, no two clocks run at exactly the same rate, no matter how accurately they were initially synchronized. The clocks will eventually drift arbitrarily far apart unless they are periodically resynchronized to keep their difference within the maximal bound, and this must be done even in the presence of faulty processors. This problem is as difficult as the Byzantine Generals Problem itself, and its solutions are closely related to our Byzantine Generals solutions.

A4. According to this assumption, each processor should be able to sign its messages in such a way that a nonfaulty processor's signature cannot be forged. When we were speaking of generals, we treated the signature as a physical object whose forgery is impossible under the assumed circumstances. But now we are interested in processors and computer systems, and in this case the signature is a piece of redundant information $S_i(M)$ generated by process i from a data item M. A message signed by i consists of the pair $(M, S_i(M))$. We will rephrase parts (a) and (b) of A4 in terms of computer systems:

(a) If processor i is nonfaulty, then no faulty processor can generate $S_i(M)$.

(b) Given M and X, any process can determine if X equals $S_i(M)$.

Since the signature is just a sequence of bits and since every processor can generate any data item, we can never ensure that a faulty processor will not forge the signature, which means that property (a) can never be guaranteed. However, we can make the probability of a faulty processor violating property (a) as small as we wish, thus making the system as reliable as we wish. The way to do this depends upon the type of faults we expect to encounter. There are two cases of interest:

1. Random Malfunction: To make the probability that a random malfunction in a processor generates a correct signature as low as possible, we must make the signature a randomizing one. The goal is that the probability of forging a signature equals the probability of producing the same string of bits by random choice. Let us describe one method of doing so. Assume that the messages are encoded as integers less than P, where P is a power of two. We choose a random odd number $K_i$ smaller than P and take $S_i(M)$ to be $M \cdot K_i \bmod P$. Let $K_i^{-1}$ be the unique number less than P such that $K_i \cdot K_i^{-1} \equiv 1 \pmod{P}$ (it exists because we have chosen $K_i$ to be odd). A process can then check whether $X = S_i(M)$ by testing that $M \equiv X \cdot K_i^{-1} \pmod{P}$. If a processor does not have $K_i$ in its memory, the probability that it generates the correct signature $M \cdot K_i$ for a single (nonzero) message M should equal the probability of doing so by random choice, which is $1/P$ (choosing a random number among P possibilities). However, if $K_i$ can be obtained by a simple procedure, then the probability of forging a message can be larger, since a faulty processor j may substitute $K_i$ for $K_j$ when trying to compute $S_j(M)$ and thereby accidentally compute $S_i(M)$.

2. Malicious Intelligence: If the processor considered "faulty" is in fact a working processor that is deliberately trying to harm the system, guided by human intelligence or by any other malicious means, then the construction of the signature function $S_i$ becomes a cryptographic problem.
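The randomizing signature of case 1 can be tried out with a short Python sketch. It is illustrative only and protects against random malfunction, not against the malicious behaviour of case 2. The value of `P` and the helper names `make_key`, `sign` and `verify` are assumptions made here; the modular inverse uses the built-in `pow` with a negative exponent, which requires Python 3.8 or later.

```python
import secrets

P = 2 ** 32  # messages are integers less than P, a power of two (assumed size)


def make_key(p=P):
    """Choose a random odd number K_i smaller than p."""
    return secrets.randbelow(p // 2) * 2 + 1


def sign(message, key, p=P):
    """S_i(M) = M * K_i mod P."""
    return (message * key) % p


def verify(message, signature, key_inverse, p=P):
    """Check X = S_i(M) by testing M == X * K_i^{-1} (mod P)."""
    return message % p == (signature * key_inverse) % p


key = make_key()
key_inverse = pow(key, -1, P)  # exists because key is odd and P is a power of two

signature = sign(123456, key)
print(verify(123456, signature, key_inverse))            # True
print(verify(123456, (signature + 1) % P, key_inverse))  # False
```

Verification needs only the inverse of the key, so a checking processor does not have to hold $K_i$ itself.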

We note that it is easy to generate the signature $S_i(M)$ once a processor has received a message containing it. Therefore, it is important that the same message is never signed twice. This means that, when using SM(m) repeatedly to distribute a sequence of values, sequence numbers should be appended to the values to guarantee uniqueness.

7. CONCLUSION

This paper has discussed ways to deal with a number of faulty components in a system under various assumptions. In order to arouse more interest in the subject, the problem was phrased in terms of generals in the field of combat instead of electronic components in a circuit, and was called the Byzantine Generals Problem. Several solutions to the problem were presented, each under different hypotheses, and we have seen how they contribute to the implementation of reliable systems. These solutions are expensive in both the amount of time and the number of messages required. Both algorithms OM(m) and SM(m) require message paths of length up to m+1, which means that each lieutenant may have to wait for a message to be relayed via m other lieutenants. Fischer and Lynch have shown that any solution that can cope with m traitors must require paths of length m+1, which means our solutions are optimal in that respect. If the graph is not completely connected and its diameter is d, our algorithms require message paths of length up to m+d. This may also be optimal. The number of messages sent by algorithms OM(m) and SM(m) is up to $\prod_{i=1}^{m+1} (n - i)$, but it can be reduced by combining messages. The amount of information transferred can also be reduced, though this has not been studied in detail. However, a large number of messages is still to be expected. Achieving reliability when arbitrary malfunctions may occur is difficult and expensive. We can reduce the cost by making assumptions about the type of failure that may occur, such as assuming that a computer may fail to respond but will never respond incorrectly. However, when extremely high reliability is required, we cannot afford to make such assumptions, and the full expense of a Byzantine Generals solution is required.
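To get a feeling for how quickly the message count grows, the product (n-1)(n-2)...(n-m-1) can be evaluated for small systems. The function name `max_messages` is hypothetical; the sketch only evaluates the formula and does not simulate the algorithms.

```python
from math import prod


def max_messages(n, m):
    """Evaluate (n-1)(n-2)...(n-m-1), the product of (n - i) for i = 1..m+1."""
    return prod(n - i for i in range(1, m + 2))


print(max_messages(4, 1))  # 3 * 2 = 6
print(max_messages(7, 2))  # 6 * 5 * 4 = 120
```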
