A Language Support for Exhaustive Fault-Injection in Message-Passing System Models
Masaya Suzuki & Takuo Watanabe
Department of Computer Science, Tokyo Institute of Technology
MOD*2014, Bertinoro


About This Work

• Proposes Sandal, a modeling language for describing fault-prone distributed systems.
  - Sandal provides a fixed set of features for describing faults and fault-handling actions
    • timeout, message loss, shutdown
• Talk Outline
  - Background
  - Modeling Faults and Fault-Handling Actions
  - Language Features of Sandal
  - Case Study: 2PC
  - Final Stuff

Research Background (1): Adaptive Distributed Systems

• Concurrent Context-Oriented Programming
• "A Reflective Approach to Actor-Based Concurrent Context-Oriented Systems" [Watanabe & Takeno, COP 2014]
  - asynchronous context manipulation using reflection
    • optimistic and pessimistic synchronization
• ⇒ Verification of context manipulation mechanism

[Figure: actors O (observer), A, and B; labels: context change info., cross-context message]

Research Background (2): Modeling Human-Made Faults

• Verifying workflows including recovery processes of human-made faults
• "A Model-Checking Based Approach to Robustness Analysis of Procedures under Human-Made Faults" [Nagatou & Watanabe, APBPM 2014]
  - Modeling a system as a set of concurrent processes
  - Injecting possible human-made fault actions into the model
    • cf. HAZOP
  - Model-checking the fault-injected model
  - Applications
    • Blood Testing, Radar Data Processing, etc.
• ⇒ Modular fault description mechanism

Modular Description of Self-* Behaviors

• Generally, modeling/specification languages need good modularization mechanisms for describing/specifying self-* behaviors and/or non-functional behaviors such as:
  - Faults, Fault Handling Actions
  - (Dynamic) Adaptation / Evolution / Self-Updating
  - Context-Aware / Context-Oriented Behaviors
  - Resource Aware Actions
  - (Application-Aware) Synchronizations
  - Security / Safety Related Behaviors
• cf. Advanced Modularization Mechanisms in Programming Languages: AOP, FOP, COP, etc.

Motivation: Modeling a Faulty System

• From experience building a complex service on a distributed system: testing alone is not satisfactory for some fault-prone environments
• We tried to borrow the idea of SFI (software fault injection) to describe an abstract model of the service that can be model checked.

Describing Faults (1)

• A simple timeout action for a message reception (in Promela)
  - Note: Promela's timeout primitive cannot be used for this purpose.

The original model (w/o timeout):

    ch ? var;
    if
    :: var == Done -> ...
    :: ...
    fi

A model with timeout action:

    bool recv_timeout = false;
    if
    :: ch ? var;
    :: recv_timeout = true;
    fi;
    if
    :: var == Done -> ...
    :: ...
    fi
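The timeout model above replaces a blocking receive with a nondeterministic choice between receiving and giving up. As a minimal sketch of that idea, the following Python enumerates both outcomes the way a model checker would explore them. The names `Outcome` and `recv_with_timeout` are illustrative only and are not part of Promela or Sandal.

```python
from typing import List, NamedTuple, Optional


class Outcome(NamedTuple):
    """One possible result of a receive that may time out."""
    recv_timeout: bool
    var: Optional[str]      # received value, or None when the receive timed out
    remaining: List[str]    # channel contents after the step


def recv_with_timeout(channel: List[str]) -> List[Outcome]:
    """Return every outcome of `if :: ch ? var :: recv_timeout = true fi`.

    The timeout branch is always enabled; the receive branch is enabled
    only when the channel holds a message.
    """
    outcomes = [Outcome(True, None, list(channel))]   # timeout branch
    if channel:                                       # receive branch
        outcomes.append(Outcome(False, channel[0], list(channel[1:])))
    return outcomes


for o in recv_with_timeout(["Done"]):
    print(o)
```

Note that the timeout branch is possible even when a message is waiting, which is what makes the injected fault exhaustive rather than triggered only by an empty channel.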

Describing Faults (2)

• Unexpected termination actions (the `if :: true; false :: true fi` fragments, highlighted on the original slide) must be inserted wherever needed.

    proctype Arbiter() {
      mtype resp;
      if :: true; false :: true fi;
      worker1_recv ! Ready;
      if :: true; false :: true fi;
      worker2_recv ! Ready;
      if :: true; false :: true fi;
      worker1_send ? resp;
      if :: true; false :: true fi;
      if
      :: resp == NotReady ->
         if :: true; false :: true fi;
         all_ready = false
      :: else
      fi;
      if :: true; false :: true fi;
      worker2_send ? resp;
      if :: true; false :: true fi;
      if
      :: resp == NotReady ->
         if :: true; false :: true fi;
         all_ready = false
      :: else
      fi;
      determined = true;
      if :: true; false :: true fi;
      if
      :: all_ready ->
         if :: true; false :: true fi;
         worker1_recv ! Commit;
         if :: true; false :: true fi;
         worker2_recv ! Commit
      :: else ->
         if :: true; false :: true fi;
         worker1_recv ! Abort;
         if :: true; false :: true fi;
         worker2_recv ! Abort
      fi
    }

    proctype Worker1() {
      mtype resp;
      if :: true; false :: true fi;
      worker1_recv ? resp;
      if :: true; false :: true fi;
      if
      :: worker1_ready = true;
         if :: true; false :: true fi;
         worker1_send ! Ready
      :: worker1_ready = false;
         if :: true; false :: true fi;
         worker1_send ! NotReady
      fi;
      if :: true; false :: true fi;
      worker1_recv ? worker1_resp
    }

    proctype Worker2() { ... }
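The insertion shown in this model is mechanical, so in principle it can be automated. The following Python is a minimal sketch of such weaving, assuming the process body is a flat list of Promela statements; nested `if` branches, as in `Arbiter`, would need a real parser. `inject_termination` is a hypothetical helper, not part of Spin or Sandal.

```python
from typing import List

# Nondeterministic choice: either block forever on `false` (modelling
# unexpected termination) or continue with the next statement.
TERMINATION_CHOICE = "if :: true; false :: true fi"


def inject_termination(statements: List[str]) -> List[str]:
    """Insert a termination choice before every statement."""
    woven: List[str] = []
    for stmt in statements:
        woven.append(TERMINATION_CHOICE)
        woven.append(stmt)
    return woven


body = ["worker1_recv ! Ready", "worker2_recv ! Ready", "worker1_send ? resp"]
print(";\n".join(inject_termination(body)))
```

Keeping the fault fragment in one constant also avoids the copy-and-paste slips that manual insertion invites.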

Need for Modular Description Mechanism

• Manually inserting faults and fault-handling actions into a model is itself fault-prone.
• A modeling language should have features that support modular descriptions of faults and fault-handling actions.

Current Contribution

• We designed and implemented Sandal, a modeling language for describing fault-prone distributed systems.
• Some case studies, including the two-phase commit (2PC) protocol, show the effectiveness of the language features of Sandal.

Sandal

• A process-oriented modeling language with features for describing typical faults:
  - unexpected process termination
  - timeout in message reception
  - random loss of messages
• Language Processor (translator to NuSMV)
  - Source code: https://github.com/draftcode/sandal
  - You need
    • Go (http://golang.org) to build the translator
    • NuSMV (http://nusmv.fbk.eu) to verify translated models

Target Systems


Example

    data Message { Ping, Pong }

    proc PingProc(ch_send channel { Message }, ch_recv channel { Message }) {
      for {
        var msg Message
        send(ch_send, Ping)
        recv(ch_recv, msg)
      }
    }
    ...
    init {
      P0_0: PingProc(ping_to_pong_0, pong_to_ping_0),
      P1_0: PongProc(pong_to_ping_0, ping_to_pong_0),
      ping_to_pong_0: channel { Message },
      pong_to_ping_0: channel { Message }
    }

Unexpected Process Termination & Random Loss of Messages (1)

• @shutdown
  - specifies that the process may terminate unexpectedly
• @drop
  - specifies that the channel may lose messages

    init {
      P0_0: PingProc(ping_to_pong_0, pong_to_ping_0) @shutdown,
      P1_0: PongProc(pong_to_ping_0, ping_to_pong_0) @shutdown,
      ping_to_pong_0: channel { Message } @drop,
      pong_to_ping_0: channel { Message } @drop,
    }
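To give a feel for what "exhaustive" means for these two annotations, the following Python sketch enumerates the behaviors they add. It assumes only what the slide states: a `@drop` channel may lose any message, and a `@shutdown` process may stop at any point. The exact semantics in Sandal's translator may differ, and `drop_scenarios` and `shutdown_scenarios` are illustrative names, not Sandal features.

```python
from itertools import product
from typing import Iterator, List, Tuple


def drop_scenarios(messages: List[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every sequence a lossy channel may deliver.

    Each message is independently delivered or dropped, order preserved.
    """
    for keep in product([True, False], repeat=len(messages)):
        yield tuple(m for m, k in zip(messages, keep) if k)


def shutdown_scenarios(steps: List[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every prefix of `steps`: the process may stop before any step."""
    for n in range(len(steps) + 1):
        yield tuple(steps[:n])


print(list(drop_scenarios(["Ping", "Ping"])))
print(list(shutdown_scenarios(["send Ping", "recv Pong"])))
```

The number of drop scenarios doubles with every message, which is why this kind of exploration is left to a model checker rather than to hand-written test cases.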

Unexpected Process Termination & Random Loss of Messages (2)

[Figure panels: Unexpected Termination; Random Loss of Messages]

Timeout (Nonblock) Message Reception


var result bool = timeout_recv(ch, v)
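The boolean result lets the receiving process choose a fault-handling action. The following Python sketch shows that usage pattern under simplifying assumptions: the receive here is deterministic (it fails only when the queue is empty), whereas a model checker would also explore a timeout with a message pending, and falling back to `"Abort"` is an illustrative choice, not something the slide specifies. This `timeout_recv` returns a pair because Python has no out-parameter like `v`.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def timeout_recv(ch: Deque[str]) -> Tuple[bool, Optional[str]]:
    """Non-blocking receive: (True, message) if one is queued, else (False, None)."""
    if ch:
        return True, ch.popleft()
    return False, None


def worker_step(ch: Deque[str]) -> str:
    """Fault-handling pattern: fall back to Abort when the reception times out."""
    ok, msg = timeout_recv(ch)
    if ok and msg is not None:
        return msg
    return "Abort"


print(worker_step(deque(["Commit"])))
print(worker_step(deque()))
```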

Case Study: Two Phase Commit Protocol


Case Study: Experimental Result

Property to be checked:

(arbiter.determined ∧ ¬arbiter.all_ready → (¬worker1.resp = Commit ∧ ¬worker2.resp = Commit))

Result:

                     Time       LOC      Memory
No Fault             0.96 sec   51       26.4 MB
With Timeout         2.88 sec   51 (6)   21.8 MB
With Message Loss    2.11 sec   51 (8)   11.9 MB
With Termination     0.51 sec   51 (6)   17.1 MB

Environment: Arch Linux (Kernel 3.12.7), Intel Core i7-3370K @ 3.50GHz, 16GB Memory, NuSMV 2.5.4 (CUDD 2.4.1, MiniSat2-070721), Spin 6.2.5
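Reading the checked property as an implication (once the arbiter has determined the outcome and not all workers were ready, no worker holds `Commit`), the following Python sketch checks it over every final state of a hand-written abstraction of 2PC with message loss. This is not the authors' Sandal or NuSMV model. Treating a lost vote as NotReady, and leaving a worker's response as `None` when the decision message is dropped, are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Iterator

State = Dict[str, object]


def final_states() -> Iterator[State]:
    """Enumerate final states of an abstract 2PC run with message loss.

    Each worker votes ready or not; each vote and each decision message
    may be lost. A lost vote is treated by the arbiter as NotReady.
    """
    for v1, v2, lost1, lost2, drop1, drop2 in product([True, False], repeat=6):
        all_ready = (v1 and not lost1) and (v2 and not lost2)
        decision = "Commit" if all_ready else "Abort"
        yield {
            "determined": True,
            "all_ready": all_ready,
            "worker1_resp": None if drop1 else decision,
            "worker2_resp": None if drop2 else decision,
        }


def property_holds(s: State) -> bool:
    """determined and not all_ready implies neither worker's resp is Commit."""
    premise = bool(s["determined"]) and not s["all_ready"]
    conclusion = s["worker1_resp"] != "Commit" and s["worker2_resp"] != "Commit"
    return (not premise) or conclusion


print(all(property_holds(s) for s in final_states()))
```

A model checker performs the same kind of check, but over all reachable intermediate states and interleavings rather than only the final ones.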

Comparison (1): Time & Memory Footprint

Memory footprint:

                     Sandal     Spin       NuSMV
No Fault             20.8 MB    128 MB     6.42 MB
With Timeout         21.2 MB    128 MB     6.64 MB
With Message Loss    25.2 MB    128 MB     6.82 MB
With Termination     12.7 MB    128 MB     6.57 MB

Time:

                     Sandal     Spin       NuSMV
No Fault             0.42 sec   0.87 sec   0.016 sec
With Timeout         0.50 sec   0.89 sec   0.018 sec
With Message Loss    0.95 sec   0.88 sec   0.025 sec
With Termination     0.21 sec   0.95 sec   0.015 sec

Comparison (2): Size of Models

• (n): number of lines modified or added relative to the "No Fault" version

LOC (Diff)           Sandal     Promela    NuSMV
No Fault             45         28         178 (58)
With Timeout         48 (5)     37 (13)    180 (6)
With Message Loss    45 (2)     34 (10)    182 (14)
With Termination     45 (6)     41 (21)    179 (23)

Related Work

• Automatic fault-injection tools targeted to models
  - MODIFI [Svenningsson et al, 2010]
  - FSAP/NuSMV-SA [Bozzano et al, 2003]
    • both are for hardware faults
    • modularization problem
• Model-checking message-based distributed systems
  - Rebeca [Sirjani et al, 2004]
• AOP for modeling languages
  - Aspect-Oriented Promela [Ohno & Kishi, 2008]
  - Moxa [Yamada & Watanabe, 2005]
    • Aspect-Oriented Extension of JML

Future Work

• Optimizing the Translator
  - Abstraction Refinement, K-Induction, etc.
• AOP/FOP version of Sandal
  - Model-level separation of concerns (parameterization?)
• Probabilistic Models for Faulty Behaviors
• Verifying Multi-Level Models of Self-* Systems
• Compositional Construction of Actor-Based Group-Wide Reflection [Watanabe, 2013]
  - Self-* actions vs. base-level actions

[Figure labels: a group of objects; meta-group]

Summary

• We propose a modeling language Sandal that provides features for describing faults and fault-handling actions
  - timeout, random loss of messages, unexpected termination
• Case study (2PC protocol) shows the effectiveness of the language features.