A Language Support for Exhaustive Fault-Injection in Message-Passing System Models
Masaya Suzuki & Takuo Watanabe
Department of Computer Science
Tokyo Institute of Technology
MOD*2014, Bertinoro
About This Work
• Proposes Sandal, a modeling language for describing fault-prone distributed systems.
  - Sandal provides a fixed set of features for describing faults and fault-handling actions
    • timeout, message loss, shutdown
• Talk Outline
  - Background
  - Modeling Faults and Fault-Handling Actions
  - Language Features of Sandal
  - Case Study: 2PC
  - Final Stuff
Research Background (1): Adaptive Distributed Systems
• Concurrent Context-Oriented Programming
• "A Reflective Approach to Actor-Based Concurrent Context-Oriented Systems" [Watanabe & Takeno, COP 2014]
  - asynchronous context manipulation using reflection
    • optimistic and pessimistic synchronization
• ⇒ Verification of context manipulation mechanism
[Figure: observer O and actors A and B; labels "context change info." and "cross-context message"]
Research Background (2): Modeling Human-Made Faults
• Verifying workflows that include recovery processes for human-made faults
• "A Model-Checking Based Approach to Robustness Analysis of Procedures under Human-Made Faults" [Nagatou & Watanabe, APBPM 2014]
  - Modeling a system as a set of concurrent processes
  - Injecting possible human-made fault actions into the model
    • cf. HAZOP
  - Model-checking the fault-injected model
  - Applications
    • Blood Testing, Radar Data Processing, etc.
• ⇒ Modular fault description mechanism
Modular Description of Self-* Behaviors
• In general, modeling and specification languages need good modularization mechanisms for describing self-* behaviors and/or non-functional behaviors such as:
  - Faults and Fault-Handling Actions
  - (Dynamic) Adaptation / Evolution / Self-Updating
  - Context-Aware / Context-Oriented Behaviors
  - Resource-Aware Actions
  - (Application-Aware) Synchronizations
  - Security / Safety-Related Behaviors
• cf. Advanced Modularization Mechanisms in Programming Languages: AOP, FOP, COP, etc.
Motivation: Modeling a Faulty System
• From our experience building a complex service on a distributed system: testing alone is not satisfactory in some fault-prone environments.
• We borrowed the idea of SFI (software fault injection) to describe an abstract model of the service, which is then model checked.
Describing Faults (1)
• A simple timeout action for a message reception (in Promela)
  - Note: Promela's timeout primitive cannot be used for this purpose. It becomes executable only when no other statement in the entire system is executable, so it acts as a global deadlock escape rather than a timeout on one particular receive.

The original model (w/o timeout):

    ch ? var;
    if
    :: var == Done -> ...
    :: ...
    fi

A model with timeout action:

    bool recv_timeout = false;
    if
    :: ch ? var
    :: recv_timeout = true
    fi;
    if
    :: var == Done -> ...
    :: ...
    fi
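To make the semantics of the timeout model concrete, here is a minimal Python sketch (not Spin code) of the outcomes of the "if :: ch ? var :: recv_timeout = true fi" fragment. Encoding the channel as a tuple and the name receive_with_timeout are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# One outcome: (timed_out, received_value, channel_contents_afterwards).
Outcome = Tuple[bool, Optional[str], Tuple[str, ...]]

def receive_with_timeout(channel: Tuple[str, ...]) -> List[Outcome]:
    """Outcomes of:  if :: ch ? var :: recv_timeout = true fi
    The receive branch is enabled only if the channel holds a message.
    The timeout branch is always enabled, so an exhaustive search must
    also explore it when a message is already waiting."""
    outcomes: List[Outcome] = []
    if channel:
        # Receive branch: take the head of the FIFO channel.
        outcomes.append((False, channel[0], channel[1:]))
    # Timeout branch: nothing is consumed and var is not assigned.
    outcomes.append((True, None, channel))
    return outcomes

print(receive_with_timeout(("Done",)))  # [(False, 'Done', ()), (True, None, ('Done',))]
print(receive_with_timeout(()))         # [(True, None, ())]
```

Each receive with a timeout therefore at least doubles the branching the model checker explores whenever a message is available.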
Describing Faults (2)
• Unexpected termination actions (the "if :: true; false :: true fi" statements, highlighted on the original slide) must be inserted wherever needed.
  - The first branch blocks forever on false, which models the process halting. The second branch simply continues.

    proctype Arbiter() {
      mtype resp;
      if :: true; false :: true fi;
      worker1_recv ! Ready;
      if :: true; false :: true fi;
      worker2_recv ! Ready;
      if :: true; false :: true fi;
      worker1_send ? resp;
      if :: true; false :: true fi;
      if
      :: resp == NotReady ->
           if :: true; false :: true fi;
           all_ready = false
      :: else
      fi;
      if :: true; false :: true fi;
      worker2_send ? resp;
      if :: true; false :: true fi;
      if
      :: resp == NotReady ->
           if :: true; false :: true fi;
           all_ready = false
      :: else
      fi;
      determined = true;
      if :: true; false :: true fi;
      if
      :: all_ready ->
           if :: true; false :: true fi;
           worker1_recv ! Commit;
           if :: true; false :: true fi;
           worker2_recv ! Commit
      :: else ->
           if :: true; false :: true fi;
           worker1_recv ! Abort;
           if :: true; false :: true fi;
           worker2_recv ! Abort
      fi
    }

    proctype Worker1() {
      mtype resp;
      if :: true; false :: true fi;
      worker1_recv ? resp;
      if :: true; false :: true fi;
      if
      :: worker1_ready = true;
         if :: true; false :: true fi;
         worker1_send ! Ready
      :: worker1_ready = false;
         if :: true; false :: true fi;
         worker1_send ! NotReady
      fi;
      if :: true; false :: true fi;
      worker1_recv ? worker1_resp
    }

    proctype Worker2() { ... }
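The insertion above is purely mechanical, which is why it can be automated. The following Python sketch shows a hypothetical source-to-source pass (weave_shutdown is not a Sandal or Spin API) that puts the halting idiom before every statement of a straight-line Promela body. Nested if/do blocks, as in the Arbiter, would need a pass over a syntax tree instead of a flat list of statements.

```python
from typing import List

# Promela idiom from the slide: the first branch blocks forever on `false`
# (the process stops); the second branch continues normally.
HALT_POINT = "if :: true; false :: true fi"

def weave_shutdown(statements: List[str]) -> str:
    """Hypothetical weaving pass: insert a halt point before every
    statement of a straight-line Promela body and join with ';'."""
    woven: List[str] = []
    for stmt in statements:
        woven.append(HALT_POINT)
        woven.append(stmt)
    return ";\n".join(woven)

print(weave_shutdown(["worker1_recv ? resp", "worker1_send ! Ready"]))
```

The modeler states the fault once, and the tool reproduces the error-prone repetition.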
Need for Modular Description Mechanism
• Manually inserting faults and fault-handling actions into a model is itself fault-prone.
• A modeling language should have features that support modular descriptions of faults and fault-handling actions.
Current Contribution
• We designed and implemented Sandal, a modeling language for describing fault-prone distributed systems.
• Case studies, including the two-phase commit (2PC) protocol, show the effectiveness of Sandal's language features.
Sandal
• A process-oriented modeling language with features for describing typical faults:
  - unexpected process termination
  - timeout in message reception
  - random loss of messages
• Language processor (translator to NuSMV)
  - Source code: https://github.com/draftcode/sandal
  - You need:
    • Go (http://golang.org) to build the translator
    • NuSMV (http://nusmv.fbk.eu) to verify translated models
Example
    data Message { Ping, Pong }

    proc PingProc(ch_send channel { Message }, ch_recv channel { Message }) {
      for {
        var msg Message
        send(ch_send, Ping)
        recv(ch_recv, msg)
      }
    }

    ...

    init {
      P0_0: PingProc(ping_to_pong_0, pong_to_ping_0),
      P1_0: PongProc(pong_to_ping_0, ping_to_pong_0),
      ping_to_pong_0: channel { Message },
      pong_to_ping_0: channel { Message }
    }
Unexpected Process Termination & Random Loss of Messages (1)
• @shutdown
  - specifies that the process may terminate unexpectedly
• @drop
  - specifies that the channel may lose messages

    init {
      P0_0: PingProc(ping_to_pong_0, pong_to_ping_0) @shutdown,
      P1_0: PongProc(pong_to_ping_0, ping_to_pong_0) @shutdown,
      ping_to_pong_0: channel { Message } @drop,
      pong_to_ping_0: channel { Message } @drop,
    }
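The following is a minimal explicit-state sketch in Python (a breadth-first search over reachable states) of what exhaustive exploration under @shutdown and @drop means for the ping-pong example. It is not Sandal's NuSMV translation. All of the following are assumptions:
- PongProc, elided above, receives a Ping and then sends a Pong.
- Each channel buffers at most one message.
- A dropped message is simply never enqueued.
- A shut-down process takes no further steps.

```python
from collections import deque
from typing import Iterator, NamedTuple, Set

class State(NamedTuple):
    ping_pc: int      # 0: about to send Ping, 1: waiting for Pong
    pong_pc: int      # 0: waiting for Ping, 1: about to send Pong
    to_pong: int      # messages in ping_to_pong (capacity 1, an assumption)
    to_ping: int      # messages in pong_to_ping (capacity 1, an assumption)
    ping_alive: bool
    pong_alive: bool

def comm_steps(s: State, drop: bool) -> Iterator[State]:
    """Ordinary send/receive successors; with drop=True a send may lose its message."""
    if s.ping_alive:
        if s.ping_pc == 0 and s.to_pong == 0:
            yield s._replace(ping_pc=1, to_pong=1)
            if drop:
                yield s._replace(ping_pc=1)  # Ping sent but lost (@drop)
        elif s.ping_pc == 1 and s.to_ping == 1:
            yield s._replace(ping_pc=0, to_ping=0)
    if s.pong_alive:
        if s.pong_pc == 0 and s.to_pong == 1:
            yield s._replace(pong_pc=1, to_pong=0)
        elif s.pong_pc == 1 and s.to_ping == 0:
            yield s._replace(pong_pc=0, to_ping=1)
            if drop:
                yield s._replace(pong_pc=0)  # Pong sent but lost (@drop)

def successors(s: State, drop: bool, shutdown: bool) -> Iterator[State]:
    yield from comm_steps(s, drop)
    if shutdown:  # @shutdown: any live process may stop at any point
        if s.ping_alive:
            yield s._replace(ping_alive=False)
        if s.pong_alive:
            yield s._replace(pong_alive=False)

def find_stuck(drop: bool, shutdown: bool) -> Set[State]:
    """Explore all reachable states breadth-first; return those where some
    process is still alive but no send or receive is enabled."""
    init = State(0, 0, 0, 0, True, True)
    seen = {init}
    frontier = deque([init])
    stuck: Set[State] = set()
    while frontier:
        s = frontier.popleft()
        if (s.ping_alive or s.pong_alive) and next(comm_steps(s, drop), None) is None:
            stuck.add(s)
        for t in successors(s, drop, shutdown):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return stuck

for drop, shutdown in [(False, False), (True, False), (False, True)]:
    print(f"drop={drop} shutdown={shutdown}: {len(find_stuck(drop, shutdown))} stuck state(s)")
```

With both annotations off, no stuck state is reachable. Enabling either one exposes states where a live process waits forever, which is the kind of behavior the annotations are meant to surface.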
Unexpected Process Termination & Random Loss of Messages (2)
[Figure panels: "Unexpected Termination" and "Random Loss of Messages"]
Case Study: Experimental Result
Property to be checked:

$$(\texttt{arbiter.determined} \land \lnot\,\texttt{arbiter.all\_ready}) \rightarrow (\lnot(\texttt{worker1.resp} = \texttt{Commit}) \land \lnot(\texttt{worker2.resp} = \texttt{Commit}))$$

Result:

                       Time       LOC      Memory
    No Fault           0.96 sec   51       26.4 MB
    With Timeout       2.88 sec   51 (6)   21.8 MB
    With Message Loss  2.11 sec   51 (8)   11.9 MB
    With Termination   0.51 sec   51 (6)   17.1 MB

Environment: Arch Linux (kernel 3.12.7), Intel Core i7-3770K @ 3.50 GHz, 16 GB memory, NuSMV 2.5.4 (CUDD 2.4.1, MiniSat2-070721), Spin 6.2.5
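The property can be illustrated with a brute-force enumeration in Python rather than NuSMV. This is a deliberately simplified sketch, and none of its names come from the Sandal model:
- The two votes are chosen nondeterministically, as in Worker1 above.
- The arbiter may terminate after sending 0, 1 or 2 decision messages.
- A hypothetical premature flag injects a bug (deciding after worker1's vote only), so the check has something to find.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple

class Final(NamedTuple):
    determined: bool
    all_ready: bool
    resp: Tuple[Optional[str], Optional[str]]  # what worker1 / worker2 received

def run_2pc(votes: Tuple[str, str], sent: int, premature: bool = False) -> Final:
    """One run of a simplified arbiter: collect both votes, decide, then send
    the decision to worker1 and worker2. `sent` (0..2) models an unexpected
    arbiter termination after that many decision messages."""
    all_ready = True
    decision: Optional[str] = None
    for i, vote in enumerate(votes):
        if vote == "NotReady":
            all_ready = False
        if premature and i == 0:
            # Hypothetical bug: decide after hearing only from worker1.
            decision = "Commit" if all_ready else "Abort"
    if decision is None:
        decision = "Commit" if all_ready else "Abort"
    resp = (decision if sent > 0 else None, decision if sent > 1 else None)
    return Final(determined=True, all_ready=all_ready, resp=resp)

def holds(s: Final) -> bool:
    """(determined and not all_ready) implies that no worker received Commit."""
    if s.determined and not s.all_ready:
        return "Commit" not in s.resp
    return True

def counterexamples(premature: bool = False) -> List[Tuple[Tuple[str, str], int]]:
    """Enumerate every vote combination and termination point; return violations."""
    bad = []
    for votes in product(["Ready", "NotReady"], repeat=2):
        for sent in range(3):
            if not holds(run_2pc(votes, sent, premature)):
                bad.append((votes, sent))
    return bad

print(counterexamples())                # []
print(counterexamples(premature=True))  # [(('Ready', 'NotReady'), 1), (('Ready', 'NotReady'), 2)]
```

The correct arbiter satisfies the property under every combination of votes and termination points. The premature variant is caught as soon as at least one Commit message has been sent.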
Comparison (1): Time & Memory Footprint
Memory footprint:

                       Sandal    Spin     NuSMV
    No Fault           20.8 MB   128 MB   6.42 MB
    With Timeout       21.2 MB   128 MB   6.64 MB
    With Message Loss  25.2 MB   128 MB   6.82 MB
    With Termination   12.7 MB   128 MB   6.57 MB

Time:

                       Sandal     Spin       NuSMV
    No Fault           0.42 sec   0.87 sec   0.016 sec
    With Timeout       0.50 sec   0.89 sec   0.018 sec
    With Message Loss  0.95 sec   0.88 sec   0.025 sec
    With Termination   0.21 sec   0.95 sec   0.015 sec
Comparison (2): Size of Models
• (n): number of lines modified or added relative to the "No Fault" version

    LOC (Diff)         Sandal    Promela   NuSMV
    No Fault           45        28        178 (58)
    With Timeout       48 (5)    37 (13)   180 (6)
    With Message Loss  45 (2)    34 (10)   182 (14)
    With Termination   45 (6)    41 (21)   179 (23)
Related Work
• Automatic fault-injection tools targeted at models
  - MODIFI [Svenningsson et al., 2010]
  - FSAP/NuSMV-SA [Bozzano et al., 2003]
    • both are for hardware faults
    • modularization problem
• Model-checking message-based distributed systems
  - Rebeca [Sirjani et al., 2004]
• AOP for modeling languages
  - Aspect-Oriented Promela [Ohno & Kishi, 2008]
  - Moxa [Yamada & Watanabe, 2005]
    • Aspect-Oriented Extension of JML
Future Work
• Optimizing the Translator
  - Abstraction Refinement, K-Induction, etc.
• AOP/FOP version of Sandal
  - Model-level separation of concerns (parameterization?)
• Probabilistic Models for Faulty Behaviors
• Verifying Multi-Level Models of Self-* Systems
• Compositional Construction of Actor-Based Group-Wide Reflection [Watanabe, 2013]
  - Self-* actions vs. base-level actions
[Figure: a group of objects and its meta-group]