PASC fault tolerance

DESCRIPTION

A new generic and rigorous approach to tolerating data corruptions. Presentation of the paper "Practical Hardening of Crash-Tolerant Systems", published at USENIX ATC 2012. See the video at http://bit.ly/LNc5mc

Taming Data Corruptions in Distributed Systems

Marco Serafini (Yahoo! Research BCN)

Infrastructure dependability

o Service availability, data durability
o In presence of hardware faults
o Current approaches tolerate crashes

Crashes

o Assumptions
    o A server (process) suddenly stops
    o Until then, only correct steps

[Figure: timeline of a process that ends in a crash]

Data corruptions

o What if there are data corruptions?
    o The state of a process may be corrupted
    o The process may make incorrect steps before stopping

[Figure: timeline of a process affected by data corruptions]


NOT COVERED!

Sources of data corruptions

o Commodity disks are known to be unreliable
    o Faulty firmware, bad sectors etc.
o RAM: ECC errors are frequent
    o Production machines only see detected errors
    o Coverage not known
o Interconnects and CPUs also fail
    o Faulty drivers or bit flips

A horror story

An 8-hour system-wide outage due to a single hardware fault

What happened?

o Quoted from the Amazon service health dashboard
    o “A handful of messages had a single bit corrupted”
    o “The message was still intelligible, but the system state information was incorrect”
    o “We used MD5 checksums throughout the system (but not) for this particular internal state information”
    o “(The corruption) spread throughout the system causing the symptoms described above”

Error propagation

[Figure: Process i (state u, v) and Process j (state x, y), each running event handling on an input message min; Process i sends an output message mout.]

Common practice

o Manual placement of ad-hoc error detection checks
    o Application knowledge
    o Time consuming
o Hard to structure without fault model
o No error isolation guarantee

Research: Byzantine faults

o Byzantine model
    o Faulty nodes controlled by an adversary
    o Worst-case model

[Figure: timeline of a process affected by a Byzantine fault]

Byzantine fault model

o Black-box model of faulty processes: adversarial
o Hardening for error isolation [Nysiad NSDI 2008]
    o Based on state machine replication
    o Replication and performance costs

[Figure: a client and a set of servers; agreement on requests]

Byzantine faults

o Byzantine hardening covers attacks and bugs…
o … assuming, e.g., design diversity of replicas
    o Impractical in most systems: no real adoption

[Figure labels: Attacks, Security, Bugs, V & V, Data corruptions]

ASC Hardening

A new approach to error isolation

[Figure: Process i (state u, v) and Process j (state x, y), each running event handling on an input message min; Process i sends an output message mout.]

1. General model of process behavior
2. Arbitrary State Corruption (ASC) fault model
3. Guarantee error isolation through hardening


With M. Correia, D. Ferro and F. Junqueira
2012 USENIX Annual Technical Conference

Process and fault models

Defining Arbitrary State Corruptions

Process model

Upon receive message <REQ, r> do
    if v > 5 then
        u = r + v + 5;
    else
        u = r + v;
    v = u;
    send <WRITE, v> to process p

[Figure labels: min, 1) Event dispatching, 2) Event handling, 3) Message sending, mout, State]
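The REQ handler from this slide can be written out as a small runnable sketch. The names `State`, `Message` and `handle` are assumptions made for illustration and are not part of the PASC library; the destination process p is omitted, so the handler just returns the messages it would send.

```python
from dataclasses import dataclass
from typing import List, Tuple

Message = Tuple[str, int]  # (type, payload), e.g. ("REQ", 3); hypothetical encoding


@dataclass
class State:
    """Local process state used by the handler on the slide."""
    u: int = 0
    v: int = 0


def handle(state: State, m_in: Message) -> List[Message]:
    """Process one input message and return the output messages."""
    kind, r = m_in
    out: List[Message] = []
    # 1) Event dispatching: pick the handler from the message type.
    if kind == "REQ":
        # 2) Event handling: update the local state.
        if state.v > 5:
            state.u = r + state.v + 5
        else:
            state.u = r + state.v
        state.v = state.u
        # 3) Message sending: emit <WRITE, v>.
        out.append(("WRITE", state.v))
    return out


state = State(u=0, v=5)
first = handle(state, ("REQ", 3))   # v is not > 5, so u = 3 + 5 = 8
second = handle(state, ("REQ", 1))  # v is now 8 > 5, so u = 1 + 8 + 5 = 14
```

Keeping all three steps inside one function mirrors the model on the slide: one input message in, one state update, zero or more messages out.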

ASC fault model

o An Arbitrary State Corruption can make a process
    o Crash
    o Assign an arbitrary value to any variable
    o Start the execution from an arbitrary instruction

[Figure: example fault. Before: v = 5, z = 10, PC = 20. After: v = 12, z = 7, PC = 320.]
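The three effects of an ASC fault can be mimicked by a toy fault injector. This is only a sketch under stated assumptions: the function name `asc_fault`, the crash probability, the per-variable corruption probability and the value range are all invented for illustration, and the program counter is modeled as an ordinary entry named "PC".

```python
import random
from typing import Dict, Optional

State = Dict[str, int]  # variable name -> value; "PC" stands for the program counter


def asc_fault(state: State, rng: random.Random,
              crash_probability: float = 0.25,
              max_value: int = 1000) -> Optional[State]:
    """Return a corrupted copy of `state`, or None to model a crash.

    All probabilities and ranges here are illustrative assumptions.
    """
    if rng.random() < crash_probability:
        return None  # the process simply stops
    corrupted = dict(state)
    for name in corrupted:
        if rng.random() < 0.5:
            # Arbitrary value for a variable; for "PC" this models resuming
            # execution from an arbitrary instruction.
            corrupted[name] = rng.randrange(max_value)
    return corrupted


before = {"v": 5, "z": 10, "PC": 20}
after = asc_fault(before, random.Random(0))  # `before` itself is left untouched
```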

Fault frequency

o One fault for every processed input message


Fault diversity

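o A corrupted variable is different from its replica
o Only holds immediately after the fault
    o Can be invalidated if instructions modify the variable

[Figure: example fault with original and replica values. Original before the fault: v = 5, z = 10, PC = 20. Original after the fault: v = 12, z = 7, PC = 320. Replica values shown: 5, 10, 5, 41.]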
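One way diversity could be lost after the fault is sketched below. It assumes a single corrupted variable v and a hypothetical instruction `u = v`; the values and the helper `diverging_variables` are illustrative and are not taken from the paper. If the new value of u is computed once from the corrupted v and then copied to the replica, u and its replica agree even though both are wrong. If each copy computes u from its own v, the two copies still differ.

```python
from typing import Dict, List

State = Dict[str, int]


def diverging_variables(original: State, replica: State) -> List[str]:
    """Names of the variables whose original and replica values differ."""
    return sorted(name for name in original if original[name] != replica[name])


# State right after a fault hit v in the original copy (illustrative values).
original = {"u": 0, "v": 12}
replica = {"u": 0, "v": 5}

# Naive update: u is computed from the corrupted v and copied to the replica.
naive_original, naive_replica = dict(original), dict(replica)
naive_original["u"] = naive_original["v"]
naive_replica["u"] = naive_original["u"]

# Diversity-preserving update: each copy computes u from its own v.
kept_original, kept_replica = dict(original), dict(replica)
kept_original["u"] = kept_original["v"]
kept_replica["u"] = kept_replica["v"]

naive_diff = diverging_variables(naive_original, naive_replica)  # ["v"]
kept_diff = diverging_variables(kept_original, kept_replica)     # ["u", "v"]
```

In the naive case a check that looks only at u would pass, although u holds a value derived from corrupted state.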

Error propagation

o Fault diversity does not hold
o Hardening preserves diversity

[Figure: variables u and v, original versus replica, asking whether fault diversity still holds]

ASC hardening

From ASC faults to crashes and message omissions

From ASC to crashes

o Transparent: to the hardened process
o Local: no process replication on multiple machines
o Untrusted: can have faults while executing hardening

[Figure: a hardening runtime around the process: input message min, event handling, variables u and v, output message mout]

PASC runtime

[Figure: PASC runtime. Labels: event handlers EH1, EH2, EH3; process state; replica state; PASC checks; PASC library; user-defined; transparent.]

github.com/yahoo/pasc
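The general idea of local hardening with a state replica can be illustrated with a much simplified sketch. It is not the PASC algorithm and does not use the PASC library API: the names `hardened_step`, `CorruptionDetected` and `req_handler` are hypothetical, and the sketch ignores faults that strike the checks themselves, which the slides say the real hardening must tolerate. The handler runs once on the process state and once on the replica state; any divergence is turned into a crash, and no message is sent.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Message = Tuple[str, int]
Handler = Callable[[State, Message], List[Message]]


class CorruptionDetected(Exception):
    """Raised to turn a detected corruption into a crash."""


def hardened_step(handler: Handler, original: State, replica: State,
                  m_in: Message) -> List[Message]:
    """Run `handler` on both state copies and compare the results."""
    if original != replica:
        raise CorruptionDetected("state and replica differ before handling")
    out_original = handler(original, m_in)
    out_replica = handler(replica, m_in)
    if original != replica or out_original != out_replica:
        raise CorruptionDetected("state or output differs after handling")
    return out_original


def req_handler(state: State, m_in: Message) -> List[Message]:
    """The REQ handler from the process model, on a dict-based state."""
    kind, r = m_in
    if kind != "REQ":
        return []
    u = r + state["v"] + 5 if state["v"] > 5 else r + state["v"]
    state["u"] = u
    state["v"] = u
    return [("WRITE", u)]


state = {"u": 0, "v": 5}
replica = dict(state)
outputs = hardened_step(req_handler, state, replica, ("REQ", 3))  # [("WRITE", 8)]
```

Raising an exception instead of returning an error value keeps a corrupted message from leaving the process, which is the error isolation property the slides aim for.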

Evaluation

Hardening an echo server

o Little computation, network bound, no overhead
o PBFT is a reference (Nysiad not available)

Hardening State Machine Replication

+70 %, -15 %

Zookeeper (core)

Memory overhead

Scalability

o SimpleKV: eventually consistent store, no replication
o Scales similarly with hardening
o No server “wasted” for replication

[Figure: max. throughput (kops/sec, axis from 0 to 100000) versus number of servers (1, 3, 5, 7) for PASC sKV and Unprot. sKV]

PASC fault coverage

o Injected random bit flips in Paxos
    o Code corruptions: bytecode and binary code
    o State corruptions: pointers and primitive values

                Code corruptions     State corruptions
                Unprot    PASC       Unprot    PASC
Undet.               3       0           93       0
Det.                 -       1            -     330
Crash             1640    1663         2301    2066
Not manif.        1213    1193         2843    2841
Total             2856    2856         5237    5237
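The undetected-corruption rates implied by the table can be worked out with a few lines of arithmetic. The dictionary layout and the function name `undetected_rate` are assumptions for illustration; the counts are copied from the table, and each rate is taken relative to the stated total number of injections.

```python
# Counts copied from the fault-coverage table.
TOTALS = {"code": 2856, "state": 5237}
UNDETECTED = {
    ("code", "Unprot"): 3,
    ("code", "PASC"): 0,
    ("state", "Unprot"): 93,
    ("state", "PASC"): 0,
}


def undetected_rate(kind: str, system: str) -> float:
    """Fraction of injected faults that led to an undetected corruption."""
    return UNDETECTED[(kind, system)] / TOTALS[kind]


state_unprot_pct = round(100 * undetected_rate("state", "Unprot"), 2)  # 1.78
code_unprot_pct = round(100 * undetected_rate("code", "Unprot"), 2)    # 0.11
pasc_pct = [100 * undetected_rate(k, "PASC") for k in ("code", "state")]  # [0.0, 0.0]
```

In these experiments roughly 1.78 % of the state corruptions went undetected in the unprotected system, and none did with PASC.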

Wrap up

o Hardware data corruptions are a real danger
o Proposed new systematic approach
    o BFT not realistic
    o Ad-hoc approaches are not systematic
o Hardening algorithm for error isolation
    o Local: does not require replication
    o Efficient: PASC-Paxos has up to 70% more throughput than PBFT
    o High fault coverage

Directions

o Systematic protection of Yahoo! infrastructure against data corruptions
o ASC just scratched the surface: some to-dos
    o Reduce memory footprint
    o Support for external memory (disks/SSDs)
    o Hardening of legacy code
    o Theoretical foundations

Thank you

serafini@yahoo-inc.com