http://parasol.tamu.edu Partial Synchrony: Realizing an Ideal Srikanth Sastry Parasol Lab, Texas A&M University


http://parasol.tamu.edu

Partial Synchrony: Realizing an Ideal

Srikanth Sastry Parasol Lab, Texas A&M University


Outline

• Classic Partial Synchrony
• Empirical Systems
• Problem Statement and Methodology
• Preliminary Results
  • The Celeration Problem
  • Fair Schedulers
• Future Work


Partial Synchrony

• Temporal guarantees on computation and communication
  • Guarantees themselves are incomplete
  • Knowledge is incomplete
• Introduced to circumvent the FLP impossibility
  • Formalizes the notion of ‘somewhat timely’
• Classic model (ParSync) [DLS1988]
  • (Eventual) Reliable communication
  • (Eventual, Unknown) Bound on message delay
  • (Eventual, Unknown) Bound on relative process speeds


Closer Look At ParSync

Each pair lists a ParSync assumption first, then the corresponding behavior of empirical systems:

• Reliable message delivery
  • Unreliable message delivery
• Unbounded-size messages delivered in bounded time
  • Larger messages experience greater delays
• Arbitrary number of messages received per step
  • Fixed number of messages per step
• Agnostic to absolute process speeds
  • Aware of and affected by absolute process speeds
• Agnostic to channel capacity
  • Sensitive to channel capacity
• Non-blocking communication
  • Blocking communication


Characterizing Empirical Systems

Computation
• Processes take atomic steps
  • Receive at most one message
  • Make a state transition
  • Send at most one message
• Processes can crash
• Processes execute at finite rate
• Processes have bounded relative execution rate

Communication
• Fair lossy
• Detectable message corruption
• Some infinite subset of timely messages
• Message delay proportional to message size
• Timely messages not too sparse
• Bounded FIFO delivery of messages


Problem Statement and Methodology

• Problem:

Construct ParSync on top of empirical systems

• Methodology
  ‒ Step 1. Encapsulate the underlying synchronism
  ‒ Step 2. Construct reliable channels
  ‒ Step 3. Construct a fair distributed scheduler
  ‒ Step 4. Construct timely channels


Methodology

Empirical Distributed Systems

Encapsulate Synchronism

Distributed Fair Scheduler

Reliable Channels

ParSync Environment


Encapsulating Synchronism: Failure Detection Oracles

• A system service that can be queried for (potentially unreliable) information about process crashes [CT96]

  • False negatives (crash, but no suspicion)
  • False positives (suspicion, but no crash)
• ◊P – Eventually Perfect Failure Detector
  • No false negatives
  • Finitely many false positives
• Strong Completeness
  • Every crashed process is eventually and permanently suspected
• Eventual Strong Accuracy
  • Every correct process is eventually and permanently trusted


Implementation Challenges

• Absolute Process Speeds [SPW2009]

• Message Loss [SP2007]

• Bounded Channel Capacity [SP2007]

• Size-Sensitive Message Delay [SP2007]


Absolute Process Speeds: Celeration

• Every crash-fault detector… uses some kind of timeout mechanism… which requires some way to measure time.
• But which time base should be measured?
  • Real Time ≈ Ticks of a physical clock
  • Action Time ≈ Steps of an executing process
• Negative result: Neither time base is sufficient for crash-fault detection in celerating environments.
• Celerating processes can:
  • Accelerate (e.g., via hardware upgrades)
  • Decelerate (e.g., via increased loads)


Acceleration and Action Time

[Figure: transmission and processing delays shown against real time and action time.]


Deceleration and Real Time

[Figure: transmission and processing delays shown against real time and action time.]


The Celeration Problem

[Figure: transmission and processing delays.]

Action time diverges for acceleration

Real time diverges for deceleration
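A small numeric sketch may make the divergence concrete. It assumes, purely for illustration, that process P runs at a constant rate of `steps_per_tick` steps per real-time unit. The names `Bounds`, `processing_steps`, `rtt_in_action_time`, and `rtt_in_real_time` are hypothetical; Δ (bound on message delay, in real-time units), Φ (bound on relative process speeds), and c (number of local actions) are the quantities used in the round-trip bound later in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    delta: float  # Δ: bound on one-way message delay, in real-time units
    phi: float    # Φ: bound on steps of P per step of Q
    c: int        # number of local actions, scheduled round-robin


def processing_steps(b: Bounds) -> float:
    """Processing bound in steps of P: Φ·c at Q for the ping, c at P for the ack."""
    return b.phi * b.c + b.c


def rtt_in_action_time(b: Bounds, steps_per_tick: float) -> float:
    """Worst-case RTT counted in steps of P (assumes a constant step rate)."""
    # Transmission is bounded in real time, so its step count grows with speed.
    return 2 * b.delta * steps_per_tick + processing_steps(b)


def rtt_in_real_time(b: Bounds, steps_per_tick: float) -> float:
    """Worst-case RTT counted in real-time units (assumes a constant step rate)."""
    # Processing is bounded in steps, so its duration grows as P slows down.
    return 2 * b.delta + processing_steps(b) / steps_per_tick


if __name__ == "__main__":
    b = Bounds(delta=10, phi=2, c=3)
    for rate in (1, 8, 64):            # acceleration
        print("rate", rate, "RTT in steps:", rtt_in_action_time(b, rate))
    for rate in (1, 0.125, 0.015625):  # deceleration
        print("rate", rate, "RTT in ticks:", rtt_in_real_time(b, rate))
```

Under this simplified model, no fixed step count covers the RTT once P accelerates enough, and no fixed tick count covers it once P decelerates enough.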


Significance of Celeration

• Existing crash-detection mechanisms based only on real-time or action-time clocks are actually broken!

• Our positive result: We construct a new bichronal timeout mechanism that is immune to process celeration


System Model

• Temporal Assumptions
  • Δ - unknown upper bound on absolute message delay
  • Φ - unknown bound on relative process speeds
• Reliability Assumptions
  • Reliable communication: no message loss/corruption
  • Unreliable computation: processes may crash


Δ, Φ and Processing Delays

• Suppose P sends a ping to Q
  • Ping will be delivered within Δ real-time units
  • But when will the ping actually be received?
• Depends on the local processing delay at Q
• Q takes (at least) 1 step for every Φ steps at P
  • If Q has c local actions executed in round-robin order, then Q executes all such actions within Φ·c steps at P
• So, processing delay at Q is at most Φ·c steps at P


Ping-Ack ◊P Implementations

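• Adaptive timeout values should
  • Exceed RTT after finitely-many false positives
  • Converge to a constant timeout value (if efficient)
  • Guarantee accuracy forever thereafter

Round-Trip Time (RTT) ≤ (2Δ + Φ·c + c)

[Figure: P sends PING to Q (transmission ≤ Δ), Q processes the ping (≤ Φ·c), Q returns ACK to P (transmission ≤ Δ), and P processes the ack (≤ c).]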


Celeration and False Positives

[Figure: RTT in action-time units plotted against RTT in real-time units, with axis marks at 2Δ and Φ·c and the worst-case RTT indicated. Acceleration and deceleration each lead to an unbounded RTT.]


A New Timeout Mechanism

• Timeouts in ParSync are inherently bichronal
  • Transmission delays are bounded in real-time units
  • Processing delays are bounded in action-time units
• We define Bichronal Clocks for bichronal time
  • Model as an ordered pair <real-time, action-time>
  • Measure both time components concurrently
  • Expire only after both components expire!
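The bichronal clock described above can be sketched as a pair of countdown counters. This is a minimal illustration, not the authors' implementation: the class name `BichronalClock` and its methods are hypothetical, and the caller is assumed to deliver real ticks (from a physical clock) and action steps (from the process's own step loop).

```python
class BichronalClock:
    """Timer over the ordered pair <real-time, action-time>."""

    def __init__(self) -> None:
        self.real_left = 0
        self.action_left = 0
        self.running = False

    def start(self, real: int, action: int) -> None:
        """Arm both components at once; they are measured concurrently."""
        self.real_left = real
        self.action_left = action
        self.running = True

    def on_real_tick(self) -> None:
        """Call once per tick of the physical clock."""
        if self.real_left > 0:
            self.real_left -= 1

    def on_action_step(self) -> None:
        """Call once per step taken by the local process."""
        if self.action_left > 0:
            self.action_left -= 1

    def expired(self) -> bool:
        """True only after BOTH components have run out."""
        return self.running and self.real_left == 0 and self.action_left == 0


if __name__ == "__main__":
    clock = BichronalClock()
    clock.start(real=5, action=8)
    for _ in range(5):
        clock.on_real_tick()
    print(clock.expired())  # real component done, action component still pending
    for _ in range(8):
        clock.on_action_step()
    print(clock.expired())  # both components done
```

Because expiry waits for the slower of the two components, neither a burst of steps nor a stretch of idle real time can end the timeout early.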


Bichronal Clock Expiry

• Clock.Start (Real=5, Action=8)

[Figure: real ticks and action ticks on an axis from 0 to 12, with marks at 5, 8, and 11. Expiry at bichronal time (5,11).]


Celeration-Immune ◊P

• Adaptive ping-ack protocol
  • Start bichronal clock after sending ping
  • Run bichronal clock 4 consecutive times
  • Timeout if no ack received by 4th expiry
• Upon receiving any ack
  • Trust sending process
  • Adapt bichronal values after false-positive mistakes
    • Increase both real-time and action-time components!
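The protocol steps above can be sketched as a small state machine for monitoring one remote process. This is an illustrative sketch under stated assumptions: the class `PingAckMonitor` and its method names are hypothetical, ticks and steps are delivered by the caller, and the increment of 1 after a false positive is an assumption (the slide only says that both components increase).

```python
class PingAckMonitor:
    """Monitors one remote process with a bichronal timeout."""

    RUNS_PER_PING = 4  # from the slide: run the clock 4 consecutive times

    def __init__(self, real: int = 1, action: int = 1) -> None:
        self.real = real        # current real-time component of the timeout
        self.action = action    # current action-time component of the timeout
        self.real_left = 0
        self.action_left = 0
        self.runs_done = 0
        self.awaiting_ack = False
        self.suspected = False

    def _restart_clock(self) -> None:
        self.real_left, self.action_left = self.real, self.action

    def send_ping(self) -> None:
        """Call right after the ping is sent: start the first clock run."""
        self.runs_done = 0
        self.awaiting_ack = True
        self._restart_clock()

    def on_real_tick(self) -> None:
        self.real_left = max(0, self.real_left - 1)
        self._check_expiry()

    def on_action_step(self) -> None:
        self.action_left = max(0, self.action_left - 1)
        self._check_expiry()

    def _check_expiry(self) -> None:
        # A run expires only when both components have run out.
        if not self.awaiting_ack or self.real_left or self.action_left:
            return
        self.runs_done += 1
        if self.runs_done == self.RUNS_PER_PING:
            self.suspected = True      # timeout: no ack by the 4th expiry
            self.awaiting_ack = False
        else:
            self._restart_clock()

    def on_ack(self) -> None:
        if self.suspected:
            # False positive: increase BOTH components (step of 1 is an assumption).
            self.real += 1
            self.action += 1
        self.suspected = False         # any ack restores trust
        self.awaiting_ack = False
```

In this sketch a late ack both restores trust and lengthens the next timeout, so each false positive makes the next one less likely.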


◊P – Strong Completeness

• Requirement
  • Suspect crashed processes permanently
• This one is easy
  • Crashed processes stop sending acks
  • Bichronal timer eventually expires 4th time
  • Permanent suspicion after final ack


◊P – Eventual Strong Accuracy

• Requirement
  • Trust correct processes eventually and permanently
• After finitely many false-positive mistakes
  • Bichronal values exceed <real=Δ, action=Φ·c>

[Figure: P sends PING to Q (≤ Δ), Q processes the ping (≤ Φ·c), and Q returns ACK to P (≤ Δ).]


◊P – Eventual Strong Accuracy

[Figure: ping-ack exchange between P and Q, divided into four phases.]

              Transmit Ping   Process Ping   Transmit Ack   Process Ack
Real Time     Δ               Unbounded      Δ              Unbounded
Action Time   Unbounded       Φ·c            Unbounded      c
Bichronal     max(Δ, Φ·c)     max(Δ, Φ·c)    max(Δ, Φ·c)    max(Δ, c)


Take Home Lessons

• Diagnosed the celeration problem
  • Many existing ◊P implementations are actually broken
• Defined bichronal clocks
  • Effective timeout mechanism for celerating environments
• Implemented celeration-immune ◊P
  • Ping-ack implementation based on bichronal clocks
• Practical advantages
  • Performance: Reduces ◊P mistakes during system volatility
  • Portable: Easy to incorporate into existing implementations


Why ◊P ?

• Strong enough to implement
  • Fair distributed schedulers [SP2008], [PSS2008]
  • Quiescent reliable communication [ACT2004]
• In fact, it is the weakest such failure detector!
  • For fair distributed schedulers [SPW2009]
  • For quiescent reliable communication [ACT2004]


Checkpoint

Empirical Distributed Systems

Eventually Perfect Failure Detector ◊P

Distributed Fair Scheduler

Reliable Channels

ParSync Environment

Celeration [IPDPS 2009]

Message Loss [ISPA 2007]

Crash Quiescence [DISC 2009]

[ACT 2004]


Dining Philosophers As Schedulers

• Arbitrary graph topology
  • Nodes = processes (diners)
  • Edges = potential conflicts
• Diners cycle among three states: Thinking, Hungry, Eating
• Constraints
  – Thinking may last forever
  – Eating must be finite for correct diners


Dining Specifications

• Wait Freedom
  • Progress despite crashes
• Eventual Weak Exclusion ◊WX
  • Eventually live neighbors never eat together
• Eventual fairness
  • Eventually a hungry process is never overtaken more than k times

• ◊P is sufficient to implement wait-free dining under ◊WX [PSS2008] [SP2007]

• But is ◊P necessary? In other words, is it the weakest?


Utility of ◊WX: Duty Cycle Scheduling


Methodology To Show ◊P Is The Weakest Failure Detector

• Based on definitions in [CT 96] and [CHT 96]
• Suppose a weaker failure detector D could solve dining under ◊WX
• If we can implement ◊P using a black-box solution to dining under ◊WX
• Then we can use D to implement ◊P
• Contradiction!
• Hence, ◊P is the weakest.


Construction

• Given two processes X and Y
• Y has two witnesses detecting X's liveness
• Subject and witness compete in a dining instance
• Each subject-witness pair throttles the other
• Careful hand-off of eating sessions

[Figure: subjects S0 and S1 at process X and witnesses W0 and W1 at process Y, paired in dining instances Dining0 and Dining1.]


Witness Actions

• W_i becomes hungry
• Upon eating
  – Trusts X if trust bit is true
  – Else, suspects X
  – Resets trust bit to false
  – Triggers W_{1-i} to become hungry
  – Exits eating
• Upon receiving a ping from S_x
  – Set trust bit to true
  – Send an ack to S_x
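The witness actions can be sketched as two event handlers. This is a hedged illustration only: the class `Witness`, its method names, the per-witness trust bit, and the initial output (suspecting X) are assumptions, and the dining black box that decides when a witness eats is left outside the sketch.

```python
class Witness:
    """Event handlers for one witness W_i at process Y (illustrative sketch)."""

    def __init__(self, index: int) -> None:
        self.index = index       # i in W_i, either 0 or 1
        self.trust_bit = False
        self.suspects_x = True   # initial output is an assumption

    def on_ping(self) -> str:
        """A ping arrived from the subject: record it and answer with an ack."""
        self.trust_bit = True
        return "ack"

    def on_eat(self) -> int:
        """Called when the dining instance lets this witness eat.

        Returns the index of the other witness, which should now become hungry.
        The caller is expected to exit eating right after this handler returns.
        """
        # Trust X only if a ping arrived since the last eating session.
        self.suspects_x = not self.trust_bit
        self.trust_bit = False
        return 1 - self.index


if __name__ == "__main__":
    w0 = Witness(0)
    w0.on_ping()
    print(w0.on_eat(), w0.suspects_x)  # hand-off target and current output
    print(w0.on_eat(), w0.suspects_x)  # no ping since last session
```

The trust bit acts as a one-session memory: a witness trusts X exactly when a ping arrived between two of its consecutive eating sessions.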

[Figure: subjects S0 and S1 at X and witnesses W0 and W1 at Y, paired in dining instances DX0 and DX1. Legend: Thinking, Hungry, Eating.]


Subject Actions

• S_x becomes hungry
• Upon eating
  – Waits until S_{1-x} exits
  – Sends ping to W_x
  – Waits for ack
  – Upon receiving ack
    – Triggers S_{1-x} to become hungry
    – Waits until S_{1-x} is eating
    – Exits eating

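[Figure: subjects S0 and S1 at X and witnesses W0 and W1 at Y, paired in dining instances DX0 and DX1, exchanging PING and ACK messages. Legend: Thinking, Hungry, Eating.]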


Witness Actions – Timeline

[Figure: timeline of witnesses Y.w0 and Y.w1 over numbered steps 1 to 4, with Enable arrows between the two witnesses. Legend: Thinking, Hungry, Eating.]


Subject Action - Timeline

[Figure: timeline of subjects X.s0 and X.s1 over numbered steps, exchanging PING and ACK messages with witnesses Y.w0 and Y.w1. Legend: Thinking, Hungry, Eating, Trigger.]


Eventual Strong Accuracy

[Figure: timeline of subjects X.s0 and X.s1 over numbered steps, exchanging PING and ACK messages with witnesses Y.w0 and Y.w1; each witness outputs "Trust X" in its eating sessions. Legend: Thinking, Hungry, Eating, Trigger.]


Strong Completeness

[Figure: timeline of subjects X.s0 and X.s1 and witnesses Y.w0 and Y.w1 in which X crashes. After the last PING and ACK exchanges, each witness changes its output from "Trust X" to "Suspect X" and continues to suspect X. Legend: Thinking, Hungry, Eating, Trigger.]


Take Home Lesson

• ◊P is the ‘weakest’ failure detector to implement Wait-free dining under ◊WX

• ◊P and Wait-free dining under ◊WX encapsulate equivalent synchronism in the underlying system


Checkpoint

Empirical Distributed Systems

Eventually Perfect Failure Detector ◊P

Dining under ◊WX Reliable Channels

ParSync Environment

Celeration [IPDPS 2009]

Message Loss [ISPA 2007]

Crash Quiescence [DISC 2009]

[ACT 2004]

Necessity [SPAA 2009]

Sufficiency [ICDCN 2008]


Next Steps

• Implement ‘timely’ systems using ◊P
• Challenge
  • Failure detectors provide no real-time guarantees!
    • “In Search of Lost Time” [CBHW2008]
  • So, failure detectors do not encapsulate temporal guarantees
    • Synchronous System and P [CBGS2000]
  • Then, what do failure detectors encapsulate?
    • Our assertion: Fairness
      • Theta [WS2009]
      • Asynchronous Bounded Cycle [RS2008]


Big Picture

Empirical Distributed Systems

Eventually Perfect Failure Detector ◊P

Dining under ◊WX Reliable Channels

ParSync Environment

Celeration [IPDPS 2009]

Message Loss [ISPA 2007]

Crash Quiescence [DISC 2009]

[ACT 2004]

Necessity [SPAA 2009]

Sufficiency [ICDCN 2008]

[On Going]


Future Work

• Extend the results to:
  ‒ Other models of partial synchrony
    • Abstract MAC Layer
  ‒ Other fault models
    • Crash-recover faults
    • Transient faults
  ‒ Other kinds of networks
    • VANETs, MANETs
    • Anonymous Networks


http://parasol.tamu.edu

Questions?

Thank You!

Srikanth Sastry

[email protected]