Roberto Baldoni Università di Roma “La Sapienza” Retirement Seminar for Professor Santosh Shrivastava 8 th of September 2011, Newcastle, UK The Price of Mastering Churn in Distributed Systems Roberto Baldoni, “The price of mastering churn in a distributed system” 1

Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems


DESCRIPTION

A new challenge is emerging due to the advent of new classes of applications and technologies such as smart environments, sensor networks, mobile systems, peer-to-peer systems, cloud computing, etc. In these settings, the underlying distributed system cannot be fully managed; it needs some degree of self-management that depends on the specific application domain. However, it is possible to delineate some common consequences of such self-management: first, there is no entity that can always ensure the validity of the system assumptions during the entire computation and, second, no one knows accurately who joins and who leaves the system at any time, which introduces a kind of unpredictability in the system composition (this phenomenon of arrival and departure of processes in a system is also known as churn). As a consequence, distributed computing abstractions have to deal not only with asynchrony and failures, but also with this dynamic dimension, where a process that does not crash can leave the system at any time, implying that membership can fully change several times during the same computation. Hence, the abstractions for reliable distributed computing have to be reconsidered to take into account this new “adversary” setting. This self-defined and continuously evolving distributed system, which we will call a dynamic distributed system in the following, makes abstractions more difficult to understand and master than in distributed systems where the set of processes is fixed and known by all participants. The churn notion thus becomes a system parameter whose aim is to make tractable systems whose composition evolves over time. The presentation analyzes the issues in building a regular register in an environment that considers crash and Byzantine failures. This presentation was delivered during the Retirement Seminar for Professor Santosh Shrivastava that took place in Newcastle (UK) in September 2011.


Page 1: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Roberto Baldoni
Università di Roma “La Sapienza”

Retirement Seminar for Professor Santosh Shrivastava
8th of September 2011, Newcastle, UK

The Price of Mastering Churn in Distributed Systems

Page 2: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Santosh reminds me … a set of acronyms

MIDAS (2001)

EUCOSM (2003) LUCID (2004) MAGNET (2005) VIRTUE (2007) SEGOVIA (2009)


Large and promising IP rejected --too many Chinese!

FET IP - Very strong consortium - rejected. Reason: «very nice projects, however it wants to provide a real software platform for pooling together on-demand resources in a multi-tenant environment resistant to byzantine attack… in FET program we do not fund engineering work»

Just below the bar!

Page 3: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Outline
• Dynamic Distributed Systems
• System Model with Churn
• Regular Registers
• Other interesting Abstractions
• Conclusion

Page 4: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Advent of Complex Distributed Applications

• Peer-to-peer
• Sensor Networks
• Mobile networks
• Cloud computing federations
• Internet supercomputing
• Smart environments

Page 5: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Managed vs. Unmanaged distributed applications (i)

Managed Distributed Application
• Existence of a manager that can control the entities comprising or running the application
• The manager guarantees a suitable environment for a duration of time sufficient for a distributed system to behave correctly wrt its system model assumptions, e.g.,
  • Providing needed/sufficient/appropriate entities to enable correct behavior of the application (global application view)
  • Providing operational guarantees of QoS and the necessary degree of synchrony in the underlying distributed platform

Page 6: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Managed Distributed Applications: Consequences

Main characteristics: a predefined setting, i.e.,
• The application knows, directly or indirectly, the set of processes that will participate in the computation
• The application knows if it can exploit synchrony assumptions
• The system can be carefully and "centrally" configured through an appropriate tuning phase in order to get the best performance

The application cycle is: design, deployment optimization, configuration, final deployment, operation

Managed Distributed Applications run on top of a Distributed System that is piecewise static wrt time

[Figure: timeline of a piecewise-static system; the number of entities changes in steps over time: N, N-1, N-2, N+3]

Page 7: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Managed vs. Unmanaged distributed applications (ii)

Unmanaged Distributed Applications
• No assumption of a manager or access to equivalent management facilities
• Each process autonomously decides when to locally run a component of a distributed application: (a) joining and (b) leaving the system
• The system and/or its components do not start with a known and pre-defined setting
• “Nice” manageable system model assumptions either cannot be guaranteed or do not last for long

Page 8: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Unmanaged distributed applications: Consequences

• Autonomic/autonomous behavior of entities
• Self-defined, self-instantiating (& self*?) and perpetually evolving distributed system
• It is impossible to know the set of processes participating in the computation because it changes dynamically and can potentially grow without bounds
  • E.g., the system could cease to exist when no process is active, and at other times the system may be made of thousands of active processes

. . . Dynamic Distributed System

Page 9: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Spectrum of Possible System Models

[Figure: spectrum of the world from orderly to chaotic. Labels: air traffic control, mobile ad-hoc systems, cloud computing, peer-to-peer]

Page 10: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Uncertainty in Dynamic Distributed Systems

Static Distributed Systems:
• Lack of temporal knowledge
• Failures
• Unknown communication delays

Dynamic Distributed Systems:
• Same issues as in static distributed systems, plus
• Non-monotonic and unknown size of the system
• Potentially changing properties of the “universe”
• Unclear notions of efficiency, effectiveness, scalability

• Solid theoretical foundations
• Precise problem specifications
• Rigorously correct solutions

Page 11: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

System Model with Churn


Page 12: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

System Model with Churn

• The distributed system is dynamic
  • In each run, infinitely many processes can arrive and depart from the system, but at any point in time the number of processes is finite (Infinite Arrival Model)
• Processes participate in a distributed computation running on top of the distributed system
  • Processes of the distributed system decide at will to join and leave the distributed computation (i.e., the computation is affected by continuous churn)
  • No process is guaranteed to participate forever in the distributed computation
• Each process has a unique identifier
• Processes can crash, and this can be seen as a leave of the process

Page 13: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Abstractions

• Shared Memory
  • Registers
  • Sets
• One-shot problem
  • Interval valid queries
• Agreement Problem
  • Leader Election

Page 14: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

[Figure: layered view of a system under churn. Labels: Churn, Distributed System, Distributed Computation, Connectivity Protocol, Communication Protocols, Abstraction]

For simplicity we assume N processes are in the distributed computation at any given time
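The constant-size assumption can be illustrated with a minimal sketch. It is not taken from the talk: the function name `churn_step`, the choice of which processes leave, and the values N = 10 and c = 0.2 are assumptions made only for illustration. In each time unit a fraction c of the N processes is replaced by processes with fresh, never reused identifiers.

```python
import itertools


def churn_step(members, fresh_ids, c):
    """Replace c * N members with brand-new processes; N stays constant."""
    n = len(members)
    k = int(c * n)                    # processes refreshed in this time unit
    leaving = sorted(members)[:k]     # assumption: lowest identifiers leave first
    joining = [next(fresh_ids) for _ in range(k)]
    return (members - set(leaving)) | set(joining)


fresh_ids = itertools.count(10)       # unique identifiers, never reused
members = set(range(10))              # N = 10 processes at time 0
history = [members]
for _ in range(5):
    members = churn_step(members, fresh_ids, c=0.2)
    history.append(members)
```

In this run the size of the computation never changes, yet after five time units none of the initial processes is left, which is the "membership can fully change" situation the abstractions have to survive.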

Page 15: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Object Abstraction: The Regular Register

A register is a shared variable accessed by processes through read and write operations


Page 16: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Regular Register Architecture at node i

[Figure: protocol stack at node i, spanning the system and the computation. Labels: Connectivity Layer, Point-to-Point Link, Broadcast, Regular Register (REG) with operations read(), write(v) and join()]

Point-to-Point Link: if pi invokes the send(m) operation to pj at time t, then pj will receive m by time t + δ′ if it has not left the system by that time

Broadcast: if pi invokes the broadcast(m) operation at time t and does not leave the system by time t + δ, then all the processes that are in the system at time t and do not leave the system by time t + δ will deliver m by time t + δ

(δ′ and δ are the known delay bounds of the point-to-point and broadcast primitives.)

Regular Register:
• (liveness) If a process invokes a read or a write operation and does not leave the system, it eventually returns from that operation
• (safety) A read operation returns the last value written or a value written by a concurrent write
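The safety property can be made concrete with a small checker. This is a sketch under assumptions, not part of the talk: operations are modelled as (start, end, value) intervals, writes are assumed to be issued sequentially by a single writer, and the names `Op`, `allowed_values` and `is_regular` are hypothetical.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    start: float   # invocation time
    end: float     # response time
    value: int     # value written, or value returned by the read


def allowed_values(read: Op, writes: List[Op], initial: int = 0) -> set:
    """Values a regular register may return for this read."""
    before = [w for w in writes if w.end < read.start]
    concurrent = [w for w in writes
                  if w.start <= read.end and w.end >= read.start]
    # Last value written before the read, or the initial value if none.
    last = max(before, key=lambda w: w.end).value if before else initial
    return {last} | {w.value for w in concurrent}


def is_regular(reads: List[Op], writes: List[Op], initial: int = 0) -> bool:
    return all(r.value in allowed_values(r, writes, initial) for r in reads)


writes = [Op(0, 2, 1), Op(5, 8, 2)]
ok_reads = [Op(3, 4, 1), Op(6, 7, 1), Op(6, 7, 2)]
stale_read = [Op(9, 10, 1)]
```

A read that overlaps the second write may return either 1 or 2, while a read that starts after that write has completed must return 2, so `stale_read` violates regularity.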

Page 17: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Regular Register: write()

• The writer process pw wants to write the value v
• pw sends a broadcast message (WRITE, v, sn)
• … in the meanwhile, processes join and leave the computation

OBS. Only processes that belong to the computation when pw starts the write and that remain in the computation for the whole duration of the write will maintain the updated copy of the register

Active processes keep the state of the computation

[Figure: the distributed system, in which a subset of processes, including the writer pw, participates in the register computation]

Page 18: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Processes in the distributed computation vs Active Processes

[Figure: number of processes (#processes) over time t. Labels: N, Churn, A(t), Correctness bound. Joining processes = leaving processes]

Page 19: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Processes in the distributed computation vs Active Processes

[Figure: number of processes (#processes) over time t. Labels: N, Churn, A(t), Correctness bound. Joining processes = leaving processes]

Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.

Page 20: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Processes in the distributed computation vs Active Processes

[Figure: number of processes (#processes) over time t. Labels: N, Churn, A(t), Correctness bound, Liveness and Safety issues. Joining processes = leaving processes]

Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.

Page 21: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

An Algorithm in Synchronous System

Assumption
• There is a bound δ such that any message sent (broadcast) at time τ ≥ t is received (delivered) by time τ + δ by the processes that are in the system during the interval [τ, τ + δ].
• A process remains in the system for at least 3δ

Algorithm
• Read: local
• Write: global
• Join: global
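The "read local, write global, join global" structure can be sketched as follows. This is only an illustration and not the algorithm as presented: message delivery is modelled as direct method calls, the waiting phases (δ and 2δ) appear only as comments, and the names `Process`, `write` and `join` are hypothetical.

```python
class Process:
    """One replica of the register; None plays the role of the undefined value."""

    def __init__(self, pid, value=None, sn=-1, active=False):
        self.pid = pid
        self.value, self.sn = value, sn
        self.active = active

    def read(self):
        # Read is local: return the local copy without sending any message.
        return self.value

    def on_write(self, value, sn):
        # Keep the copy with the highest sequence number.
        if sn > self.sn:
            self.value, self.sn = value, sn

    def on_inquiry(self):
        # Only active processes answer an INQUIRY with a REPLY.
        return (self.value, self.sn) if self.active else None


def write(writer, processes, value):
    # Write is global: broadcast WRITE(value, sn); the writer then waits δ.
    sn = writer.sn + 1
    for p in processes:
        p.on_write(value, sn)


def join(joiner, processes):
    # First waiting phase (δ): a concurrent WRITE may already have arrived.
    if joiner.value is None:
        # Broadcast INQUIRY, wait 2δ, then adopt the reply with the highest sn.
        replies = [r for r in (p.on_inquiry() for p in processes) if r is not None]
        for value, sn in replies:
            joiner.on_write(value, sn)
    joiner.active = True


servers = [Process(i, value=0, sn=0, active=True) for i in range(3)]
write(servers[0], servers, 1)
newcomer = Process(3)
join(newcomer, servers)
```

After the join, `newcomer.read()` returns the last written value 1 without any further communication, which is why reads can stay local in the synchronous setting.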

Page 22: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Synchronous System Safety: case registeri ≠ ⊥

[Figure: time diagram with processes pi, pj, ph, pk. pi runs join() while a concurrent write(1) broadcasts WRITE(1, 1); the local copies change from 0 to 1. Legend: Join, Write, Reply]

pi has received a WRITE(<val, sn>) message during the first waiting phase and has updated registeri accordingly

• the write operation lasts δ time
• the join operation lasts at least 3δ time
• a write message takes at most δ time to be delivered

Then the join and the write are concurrent and the join terminates with the last value written

Page 23: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Synchronous System Safety: case registeri = ⊥

[Figure: time diagram with processes pi, pj, ph, pk, all local copies holding 0. pi runs join(), broadcasts INQUIRY(i) and receives REPLY(h, 0, 0). Legend: Join, Write, Reply]

If no write is concurrent with the join operation and c < 1/(3δ), then there always exists an active process that replies with the last written value

Page 24: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Synchronous System Safety: case registeri = ⊥

[Figure: time diagram with processes pi, pj, ph, pk. pi runs join() concurrently with write(1); it broadcasts INQUIRY(i) and receives both REPLY(h, 0, 0) and WRITE(1, 1), while the local copies change from 0 to 1]

pi can receive both WRITE(<val, sn>) messages and REPLY(<j, val, sn>) messages. According to the values received at time τ + 2δ, pi will update registeri to the value written by a concurrent write, or to the value written before the concurrent writes

If pi receives the write before the reply, pi does not overwrite the value, and then any following read will return the last value written.

Page 25: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Synchronous System

Termination. If a process invokes the join() operation and does not leave the system for at least 3δ time units, or invokes the read() operation, or invokes the write() operation and does not leave the system for at least δ time units, it terminates the invoked operation.

Safety. Let [τ, τ + 3δ] be any interval of the computation. If (c × n) in [τ, τ + 3δ] is less than n/(3δ) (i.e., c < 1/(3δ)), a read() operation returns the last value written before the read invocation, or a value written by a write operation concurrent with it.
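The arithmetic behind the bound c < 1/(3δ) can be sketched as follows. The function names and the sample values are assumptions for illustration: if c × n processes are refreshed in each time unit, then at most 3δ × c × n processes are replaced during a window of length 3δ, so at least n(1 − 3δc) processes stay in the computation throughout the window.

```python
def survivors_lower_bound(n, c, delta):
    """Lower bound on the processes that stay through any window of length 3*delta."""
    return n * (1 - 3 * delta * c)


def churn_is_tolerable(c, delta):
    """True when the churn rate is strictly below 1 / (3 * delta)."""
    return c < 1 / (3 * delta)


# Hypothetical figures: 100 processes, delta = 10 time units.
slow = survivors_lower_bound(100, 0.01, 10)    # about 70 processes survive
fast = survivors_lower_bound(100, 0.05, 10)    # negative: the bound guarantees no survivor
```

With δ = 10 the threshold is 1/30 per time unit: a churn of 1% leaves roughly 70 long-lived processes in every window, while 5% gives no guarantee that anyone holding the register survives a join.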

Page 26: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Horizontal Quorums for Register Persistence

[Figure: successive snapshots of the set of processes in the computation: {1, 5, 9, 3}, {1, 5, 9, 8}, {1, 5, 7, 8}, {2, 5, 7, 8}; process 2 is shown joining. Legend: active process, non-active process]

Page 27: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Horizontal Quorums for Register Persistence

[Figure: successive snapshots of the set of processes in the computation: {1, 5, 9, 3}, {1, 5, 9, 8}, {1, 5, 7, 8}, {2, 5, 7, 8}, {2, 6, 7, 8}, {2, 6, 7, 3}; processes 2 and 3 are shown joining, and two intervals of length 3δ are marked. Legend: active process, non-active process]

The register persistence is preserved iff the churn is below a given bound that depends on the protocol implementation

Page 28: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Eventually Synchronous System

Assumption
• There exists a time t after which there is a bound δ such that any message sent (broadcast) at time τ ≥ t is received (delivered) by time τ + δ by the processes that are in the system during the interval [τ, τ + δ].
• There exists a time t after which c < 1/(3δ).
• A process remains in the system for at least 3δ

Algorithm
• Read: global
• Write: global
• Join: global

Page 29: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Vertical Quorums for Register Validity in Asynchronous Periods

Validity of the read: during asynchrony periods, to be sure to read the last written value you need to read/write registers from a majority of processes in the system (you no longer have the guarantee that messages are delivered within a known bound)

Termination. Let us assume that |A(t)| > n/2 (i.e., a majority of processes is active at any time). If a process invokes join(), read() or write() and does not leave the system, it terminates its operation.

Safety. Let us assume that |A(t)| > n/2. A read operation returns the last value written before the read invocation, or a value written by a write operation concurrent with it.
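The reason a majority is needed can be checked with a small sketch (the helper names are hypothetical and the process sets are tiny on purpose): any two majorities of the same set share at least one process, so a read quorum always meets the latest write quorum, whereas smaller quorums may miss each other.

```python
from itertools import combinations


def majorities(processes):
    """All subsets containing strictly more than half of the processes."""
    members = list(processes)
    size = len(members) // 2 + 1
    return [set(q) for q in combinations(members, size)]


def all_intersect(quorums):
    """True if every pair of quorums has at least one process in common."""
    return all(a & b for a, b in combinations(quorums, 2))


majority_quorums = majorities(range(5))                     # quorums of size 3 out of 5
half_quorums = [set(q) for q in combinations(range(4), 2)]  # size 2 out of 4, not majorities
```

With five processes every pair of three-process quorums overlaps; with quorums of exactly half the processes, {0, 1} and {2, 3} are disjoint, so a read could miss the last write.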

Page 30: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Asynchronous System

There is no bound on message transfer delays

Theorem: It is not possible to implement a regular register in a fully asynchronous dynamic system.

The result is similar to the one of [Attiya, Bar-Noy, Dolev, JACM 95] when considering a static system with any number of process failures

Page 31: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Regular Register with Byzantine Failures


Page 32: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Regular Register with Byzantine Failures

• Composed of an arbitrarily large set of clients c1 ... cm
• Dynamic: servers may join and leave (infinite arrival model)
• Join_System() operation: connects new processes to the system
• Leave_System() operation: passive leave

[Figure: layered architecture. Labels: Connection Layer (e.g. Overlay Management Protocol); (Authenticated) Communication Layer (Best-effort Semantics); Distributed Computation (i.e. Regular Register)]

Page 33: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Computation Model

• Clients are correct
• No information about register state
• Clients trigger read() and write() operations

[Figure: clients issuing Write(v) and Read()]

Page 34: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Computation Model

• Initially n servers are part of the register computation
• Up to f Byzantine failures (f < n/3)
• Servers locally maintain a copy of the register value
• Alternating periods of churn and stability
  • No stable processes
  • In churn periods, cn servers of the server set are refreshed in each time unit (c ∈ [0, 1])

[Figure: clients issuing Write(v) and Read() on a set of servers; most copies hold v, some are marked x; a new server invokes Join_Server()]

Page 35: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Requirements

Write Persistency: Servers maintain the last value written by a write operation despite server departures

Byzantine Resiliency: There are always at least f+1 servers maintaining the same value

Read Validity: Any read() operation returns the last value written by a completed write() or a value concurrently written
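Why f+1 matching copies matter can be sketched as follows. This is an illustration under assumptions and not the protocol from the talk: replies are modelled as (value, sn) pairs indexed by a hypothetical server identifier, and `select_value` is an invented helper. Since at most f servers are Byzantine, a pair reported by at least f+1 distinct servers is vouched for by at least one correct server.

```python
from collections import Counter


def select_value(replies, f):
    """Return the (value, sn) pair with the highest sn reported by >= f+1 servers.

    replies maps a server identifier to the (value, sn) pair it reported.
    Returns None when no pair is vouched for by f+1 servers.
    """
    counts = Counter(replies.values())
    vouched = [pair for pair, k in counts.items() if k >= f + 1]
    return max(vouched, key=lambda pair: pair[1]) if vouched else None


# Hypothetical run with f = 1: server 3 is Byzantine and reports a forged pair.
replies = {1: ("v", 3), 2: ("v", 3), 3: ("x", 9), 4: ("v", 3)}
chosen = select_value(replies, f=1)
```

The forged pair carries the highest sequence number but is reported only once, so it is discarded. If churn removes too many correct copies, no pair reaches f+1 and the function returns None, which corresponds to the persistency problem discussed on the following slides.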

Page 36: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Issues in read() operations

[Figure: server copies over time at instants t1, t2, …, ti, …, tk; copies hold v or are marked x, and a value y appears. Axis: time]

Page 37: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Validity Bound

Consider a generic protocol P = {AJS, AR, AW} implementing a regular register such that
1) every operation eventually terminates, and
2) there exists a period of churn longer than the longest operation issued on the register

Theorem: Let AJS, AR and AW be the algorithms implementing respectively the join_Server(), read() and write() operations. Let tj, tr and tw be the maximum time intervals needed by these algorithms to terminate the operation. If

c ≥ min{ (n − 3f)/(n tr), (n − 3f)/(n (tj + tw)) }

then it is not possible to ensure both write persistency and read validity
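The threshold in the theorem is easy to evaluate numerically. The sketch below only computes the quantity min{(n − 3f)/(n tr), (n − 3f)/(n (tj + tw))}; the function name and the sample parameters are assumptions for illustration.

```python
def churn_threshold(n, f, t_r, t_j, t_w):
    """Churn rate at which write persistency and read validity cannot both hold."""
    return min((n - 3 * f) / (n * t_r),
               (n - 3 * f) / (n * (t_j + t_w)))


# Hypothetical parameters: 10 servers, 1 Byzantine, read lasts 2 time units,
# join lasts 3 and write lasts 4.
threshold = churn_threshold(n=10, f=1, t_r=2, t_j=3, t_w=4)
```

Here the read term is 7/20 = 0.35 and the join-plus-write term is 7/70 = 0.1, so the slower join-plus-write path determines the tolerable churn. Longer operations or more Byzantine servers lower the threshold.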

Page 38: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Validity Bound in a synchronous system

TimelyBroadcastDelivery (TBDel): There exists a known and finite bound δ such that every message broadcast at some time t is delivered by time t + δ.

TimelyChannelDelivery (TCDel): There exists a known and finite bound δ′ < δ such that every message sent at some time t is delivered by time t + δ′.

Page 39: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Pictorial Related Work and summary of results for Regular Register

[Figure: design space with three axes. System model: asynchronous, eventually synchronous, synchronous. Failure model: crash, byzantine. Churn model: static, quiescent, continuous. Works placed in the space: Aguilera et al. PODC 2010; Baldoni et al. ICDCS 2009; Baldoni et al. PODC 2011]

Page 40: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Pictorial Related Work and summary of results for Regular Register

Columns: No Churn | Quiescent Churn | Continuous Churn

• Synch, crash: BFT papers; Baldoni et al. ICDCS 2009
• Synch, byzantine: Baldoni et al. PODC 2011 (ba)
• Event Synch, crash: Baldoni et al. ICDCS 2009
• Event Synch, byzantine: Open Problem
• Asynch, crash: Aguilera et al. 2009; Impossible
• Asynch, byzantine: Open Problem

Page 41: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Other Abstractions we faced

Set object (Europar 2010, EWDC 2011)
• More complex semantics than that of registers
• The set contains all its history

Main result: It is not possible to implement a set object in an eventually synchronous distributed system prone to continuous churn if:
a) Processes have only finite memory space for local computation
b) Accesses to the set are continuous
c) There are no stable processes participating in the set computation

k-bounded set in an eventually synchronous distributed system

Page 42: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Other Abstractions we faced

Leader Election (EDCC 2010)
• There is a bounded set of (good) processes that get into the computation and remain forever (no one knows who they are)
• Churn is continuous
• Communication is synchronous with finite losses and unknown maximum transfer delay

Risk: electing an infinite sequence of processes that leave the system (bad processes)

Main result: «under these assumptions we can implement leader election»

Page 43: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

done in 2 Steps

The HB* Oracle
• Provides a list of processes deemed to be up (alive list). The list aims to:
  • Put good processes on the top of the list
  • Stabilize the position of a good process in the list

protocol
• Takes the list provided by the HB* protocol and outputs the leader

[Figure: the HB* module passes the alive list to the protocol that outputs the leader. Communication labels: unicast, multicast, send/receive, multicast/receive]

Page 44: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Conclusion

• Dynamic Distributed Systems are everywhere
  • Most of today's systems are unmanaged to some extent
  • Some of the functionality has to be autonomic and not rely on a manager
• Dynamic Distributed Systems are unquestionably more complex than static ones
  • This leads to more complex solutions to solve the same problem
• Scalability and dynamicity are not synonymous
• Understanding how to implement abstractions in an efficient way that is well suited to dynamic distributed systems is still an open and fascinating problem

Page 45: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

One slide to remember


Page 46: Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

One slide to remember

[Figure: number of processes (#processes) over time t. Labels: N, Churn, A(t), Correctness bound, Liveness and Safety issues. Joining processes = leaving processes]

Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.