Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
From Experimental Assessmentof Fault-Tolerant Systems
to Dependability Benchmarking
Workshop on Fault-Tolerant Parallel and Distributed Systems — April 19, 2002
Int. Parallel & Distributed Processing Symp. (IPDPS-2002) — Fort Lauderdale, FL, USA
Jean Arlat
DBench
Dependability BenchmarkingWork partially supported by project IST 2000-25425
Dependability Assessment
ObjectivesEvaluation of Dependability Measures (Reliability, Availability, etc.)Verification of Properties
Nominal ServiceService in presence of Faults
Characterization of Behavior in Presence of FaultsFailure modesEfficiency of fault tolerance
Methods and TechniquesAxiomatic
Stochastic processesModel checking Empirical
Field measurement Robustness testing Fault injection
Fault Tolerance Validation
Validation of fault tolerancewrt specific inputs it is designed to deal with: the faults
FT mechanisms = humanartefacts (not perfect)
Calibration of models
Formal approaches limits
Threats = rare event
Impact on dependabilitymeasures
Estimation of FT efficiency
Experimental approaches
Controlled experiments
Fault Tolerance (FT)Dependability
Fault Injection
Fault Injection
Test and evaluation of fault-tolerant systems& FT mechanisms
Explicit characterization of faulty behaviors
Target System
Activity
Faults
InputError
Processing
Valid
Invalid
Output
8
Calibration & Validation
Analytical & Experimental Evaluations
Target System 1
5
Fault Injection
Experiments
6
7
Constructionof Empirical & Physical
Models
Estimation of FTMs
Coverage Parameters
Processing
by Stochastic Processes
3
4Evaluation of
Dependability Measures
2Construction ofAxiomatic Models
Fault Injection as A Design Aid
—4 C T,1 N
1 2 3
4
μ μ
2
4 +H 4 CT N 3 +H 3 CT N
4 ( C—
T,2 + C—
T,3 ) N
N3 C—
T N
Model
10
10
10
/μ
+1
+2
+3
10-4 10-3 10-2
NAC Std - AMp V2
NAC Std - AMp V1
NAC Std - AMp V2.5
NAC Duplex - AMp V2.5
MTFF networkMTFF node
NAC/AMp
spare node
Token ring
Host Host Host Host
Target Architecture
NAC/AMp NAC/AMp NAC/AMp
Coverage factors— — —
NAC Std - AMp V 1
NAC Std - AMp V 2
NAC Std -AMp V 2.3
NAC Duplex - AMp V 2.5
CT CT,1 CT,2 CT,3
79.08% 2.32% 11.77% 6.83%
85,02% 8.73% 2.80% 3,45%
90,32% 7.79% 1,05% 0,84%
99.55% 0.32% 0.00% 0.12%
Target system
FaultInjection
[ESPRIT Project Delta-4]
(Software) Component-based Development
Integration of previously developed components (commercial or not) OTS = either COTS or OSSMain advantages
productivity and time to market incorporation of technology advances compatibility with industry standards quality (widely deployed components)
Applications Requiring High-level of Dependability
Lack of observability and controlability : ? Dependability
Global cost (development, validation, usage, etc.) ?
Fault Injection-basedDependability Characterization of COTS SW
�
�
�
��
�
0% 10% 20% 30% 40% 50%
SunOS 5.5
SunOS 4.13
QNX 4.24
QNX 4.22
OSF-1 4.0
OSF-1 3.2
NetBSD
Lynx
Linux
Irix 6.2
Irix 5.3
HPUX 10.20
HPUX 9.05
FreeBSD
AIX
Abort % Silent % Restart %
�Catastrophic
Normalized failure rate (%)
AIX
FreeBSD
HP-UX B.10.20
Linux
LynxOS
QNX 4.24
SunOS 5.5
NetBSD
Irix 6.2
Irix 5.3
HP-UX B.9.05
OSF-1 3.2
OSF-1 4.0
QNX 4.22
SunOS 4.13
Abort
SilentRestart
Catastrophic
0 10 20 30 40 50
9
15 « COTS » OSs[Koopman & DeVale 99 (FTCS-29)]
Invalid parameters in system calls at POSIX Interface
Ext. faults
0
10
20
30
40
50
60
70
80
90
ErroneousResult
Appl.Hang
SystemHang
KernelDebug.
Exception ErrorStatus
NoObs.
Ob
serv
atio
n % Int. faults
Chorus microkernel[Arlat et al. 2002 (ToC Feb.)]
Bit flips- on parameters of kernel calls (ext.) - in kernel memory space (int.)
MAFALDA
Faul InjectionWell-Accepted by Industry as a Whole
ProviderIBM, Intel, Sun MicroSystems,…
IntegratorAnsaldo Segnalamento Ferroviario, Astrium,DaimlerChrysler, Saab Ericsson Space, Siemens,Technicatome, THALES, Volvo,…
StakeholderElectricité de France, ESA, NASA (JPL),…
ConsultantCritical Software, Cigital,…
Dependability Benchmarking
DependabilityAssessment
Performance Benchmarking
DependabilityBenchmarking +
Naive View … :-)
More Realistic View
DependabilityBenchmarking
Agreement/AcceptanceUsefulnessFairnessUsabilityPortability
etc.
DependabilityAssessment
Performance Benchmarking
Desired Properties
Dependability Benchmarking Framework
Analysis
Experimentation
Target System
FaultloadWorkload
Experimental Features& Measures
AComprehensive
Measures
Modeling
B
DBench
Dependability Benchmarking
Challenges
?Real faultsReal faultsReal faultsReal Faults
Real faultsReal faultsReal faultsFault Injection
Techniques
Real faultsReal faultsReal faults
Faultload(s)for
Dependability Benchmarking
• Agreement/Acceptance
• Representativeness
• Usefulness,
• Portability,…
Properties Dimensions
• Faultload
• Workload
• Measurements
• Measures
• Representativeness• Faultload
The Fault Injection Techniques
Logical &Information
Physical
MEAN
SimulationModel
Prototype/Real System
ModèleSimulation-based
Software-Implemented
Physical
Heavy-ions FIST,…EM perturbations TU ViennaPins MESSALINE, Scorpion, DEFOR, RIFLE, AFIT, ...
TARGET SYSTEM
Built-in test devices (SCIFI) FIMBUL
system DEPEND, REACT, ...RT Level ASPHALT, ...Logical Gate Zycad, Technost, ...Switch FOCUS, ...
Wide Range MEFISTO, VERIFY,...
communication FIMD debugger FIESTA task FIATexecutive Ballista, (DE)FINE, MAFALDA,memory DEF.I, SOFIT, ...instr. set FERRARIprocessor Xception, …
Compile-timesoftware mutation
SESAME, ...
Target System Levels & Fault Pathology
M
i
d
d
l
e
w
a
r
e
K
e
r
n
e
l
-
d
e
v
i
c
e
L
o
g
i
c
R
T
L
A
l
g
o
r
i
t
h
m
i
c
A
p
p
l
i
c
a
t
i
o
n
O
p
e
r
a
t
i
o
n
X
X
X
O
O
O
O
OOO
X: Fault locations O: Observation locations
Target System Levels & Fault Pathology
M
i
d
d
l
e
w
a
r
e
K
e
r
n
e
l
-
d
e
v
i
c
e
L
o
g
i
c
R
T
L
A
l
g
o
r
i
t
h
m
i
c
A
p
p
l
i
c
a
t
i
o
n
O
p
e
r
a
t
i
o
n
X
X
X
O
O
O
O
OOO
X
O
O
O
O
O
OO
O
X: Fault locations O: Observation locations
Target System Levels & Fault Pathology
M
i
d
d
l
e
w
a
r
e
K
e
r
n
e
l
-
d
e
v
i
c
e
L
o
g
i
c
R
T
L
A
l
g
o
r
i
t
h
m
i
c
A
p
p
l
i
c
a
t
i
o
n
O
p
e
r
a
t
i
o
n
X
X
XO
O
O
OOO
X
O
O
OX
O
O
OO
O
X: Fault locations O: Observation locations
Target System Levels — Ref. & Obs. dist.
dr1
dr2 do1
do2Dist. to fault ref. Dist. to observation
M
i
d
d
l
e
w
a
r
e
K
e
r
n
e
l
-
d
e
v
i
c
e
L
o
g
i
c
R
T
L
A
l
g
o
r
i
t
h
m
i
c
A
p
p
l
i
c
a
t
i
o
n
O
p
e
r
a
t
i
o
n
X
X
X
OOO
Mutation vs. Real Software Faults
10%
30%60%
5%
13%
82%
Sets of Errors Provoked => 395 distinct errors
140
<— Mutation Data Base —>
2272 rec => 348 Distinct Errors
208 47
<— Real Fault Data Base —>
1450 rec => 255 Distinct Errors
ImmediatePropagatedOther errorsCommon errors
7%8%
85%
Impact of the Mutation Experiments(wrt Real Faults)
2272
Not found in error set created by real faults
Critical software from civil nuclear field - 12 programming faults
[Daran & Thévenod-Fosse 1996 — ISSTA’96]
SWIFI vs. Software Faults
SW Fault Classification (ODC)AssignmentCheckingInterfaceTimingAlgorithmFunction
Can be (easily) emulated by SWIFI
-> Main open issues are related to fault-trigerring conditions?
SWIFI Bit-flips vs. SEUs
Computerized system (80C51 controller)Activity: 6x6 matrix multiplication
3%
46% 51%
Sequence lossErroneous resultTolerated
SWIFI Bit-flips
1.5%
44% 54.5%
SEUs Radiation
12245
[Velazco et al. 2000 — IEEE ToNS Dec. 2000]
Several FI techniques
MARS fault-tolerantdistributed system(prior version of TTP architecture)
Heavy-ion radiation
Electromagnetic interferences
Pin-level (forcing)
[Karlsson et al. 1998 — DCCA-5]
Scan Chain-Implemented Fault Injectionvs. Simulation
32-bit Processor (Saab Ericsson Space)Control program
Exceptions16%
Control Flow 3%
Other 1%Failure 4%
Overwritten
61% Latent15%
Exceptions23%
Control Flow 1%
Other 0%Failure 6%
Overwritten
42%
Latent28%
SCIFI Simulation (VHDL)
[Folkesson et al. 1998 — FTCS-28]
33405 7990
Comprehensive & Coordinated Study
Physical Faults (VHDL Simulation and SWIFI)
Software FaultsApplication (Mutation, Controlled-SWIFI)OS (Bit-flips on system call parameters, Invalid system call parameters,Bit-flips on internal function calls, real faults)
Operator & Maintenance Administrator Faults(emulation scripts, real faults from field data and interviews)
Software Faults in OSs
Target:Linux OSScheduling component
Invalid APIparameter
Bit-flippedAPI parameter
Bit-flippedinternal function
parameter
507 1890 552
More information
DBench - Dependability Benchmarking[IST Project 2000-25425]
-> http://www.laas.fr/dbench/
IFIP WG 10.4 - SIGDeBSpecial Interest Group on Dependability Benchmarking
-> http://www.dependability.org/wg10.4/SIGDeB