View
212
Download
0
Embed Size (px)
Citation preview
1 of 141/14
Design Optimization of Time- and Cost-Constrained
Fault-Tolerant Distributed Embedded Systems
Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo PengEmbedded Systems Lab (ESLAB)
Linköping University, Sweden
2 of 142/14
Motivation
Hard real-time applications Timing constraints Cost constraints
Online preemptive Flexible
Off-line non-preemptive
Predictable
vs.
Faults Predictable Transient Intermittent
Hardware solutions MARS, TTA, X-by-Wire
Permanent faults Costly for transient
faults
Software solutions Re-execution/rollback
recovery Checkpointing/rollback
recovery Replication, primary-backup…
vs. Software solutions Re-execution/rollback
recovery Checkpointing/rollback
recovery Replication, primary-backup…
3 of 143/14
Outline
Motivation
System architecture and fault-model Fault-tolerance techniques
Problem formulation Motivational examples
Tabu-search optimization strategy
Experimental results
Contributions and Message
4 of 144/14
Processes: Static cyclic scheduling
Fault-Tolerant Time-Triggered Systems
...
Transient faults
Processes: Re-execution and replication
S1 S3 S2 S4 S1 S3 S2 S4
TDMA RoundCycle of two rounds
Slot
Time Triggered Protocol (TTP) Bus access scheme:
time-division multiple-access (TDMA) Schedule table located in each TTP
controller: message descriptor list (MEDL)
Messages: Static schedule table
Messages: Fault-tolerant protocol
5 of 145/14
Fault-Tolerant Techniques
P1 P1 P1
Re-execution
N1
P1
P1
P1
Replication
N1
N2
N3
P1
P1
N1
N2
P1
Re-executed replicas
2
6 of 146/14
Problem Formulation
Given Fault model
Number of transient faults in the system period System architecture Application
WCETs, message sizes, periods, deadlines
Application: set of process graphs Architecture: time-triggered system
...
Fault-model: transient faults Determine Schedulable and fault-tolerant design
implementation Fault-tolerance policy assignment Mapping of processes and messages Schedule tables for processes and messages
7 of 147/14
Static Scheduling [Kandasamy et al. 03]
P1
N1: S1
N2: S11
N3: S14
P2 P4
P5
P3
m1
m2
2
P2
P4P3
P5
P1
m1
m2
N1 N2 N3S11
S12
S13
P1
P1
N2
S1
S2
S3
P2
P2
S4
P3
S5
S6
S7 S8 S10
S9
P4
P3 P4
P3 P4 P4
N1
S14
S15 S18
P5 P5
N3
Root schedules
P1
N1: S2
N2: S12
N3: S14
P2 P4
P5
P3
m1
m2
P1
Contingencyschedules
S1 S11
S2 S12
P2
Contingency schedules
Transparentre-execution
Recoveryslack
8 of 148/14
Re-execution vs. Replication
N1 N2
P1
P3
P2
m11P1
P2P3
N1 N2
40504060
5070
N1
N2
TTP
P1 P2
S1S2
P3
Met
A1
N1
N2
TTP
P1 P2 P3
S1S2
Missed
P1N1
N2
TTP
P1
P2
P2
P3
P3
S1S2 m 1m 1 m 2m 2
Deadline
Met
P1 P3P2
m1 m2A2
Replication is better
P1
S1
N1
N2
TTP
P1
S2
P2
P2
P3
P3
Deadline
Missed
m 1m 1
Re-execution is better
9 of 149/14
P1N1
N2
TTP
P2
P3
S1S2
P4
m 2
Missed
Fault-Tolerant Policy Assignment
P1P2P3
N1 N2
40506060
8080
P4 4050
1
N1 N2P1
P4P2
P3
m1
m2
m3
P1N1
N2
TTP
P2
P3
S1S2 m 2
P4
N1
N2
P1
P3
S1S2
P4P2
P1
m 1m 1TTP m 2m 2
P2
m 3m 3
P3
P4
Missed
P1N1
N2
TTP
P2
P3
S1S2 m 2
P4No fault-
tolerance:application crashes
N1
N2
P1
P3
S1S2
P4P2
P1
m 2m 1TTP
Met
Optimization of fault-tolerance
policy assignment
Deadline
10 of 1410/14
Mapping and Fault-Tolerance
P1
P4
P2 P3
m1 m2
m3 m4
P1P2P3P4
N1 N2
40 X606040
70
X70
1
N1 N2
P1N1
N2
TTP
P2
P3
S1S2 m 2
P4
m 4
Best mapping without considering fault-tolerance
Deadline
Missed
P1N1
N2
TTP
P2
P3
S1S2
P4
m 4m 2
P1N1
N2
TTP
P2 P3
S1S2 m 4m 2
P4
Deadline
Met
Simultaneous mapping and fault-tolerance
11 of 1411/14
Optimization Strategy
Design optimization: Fault-tolerance policy assignment Mapping of processes and messages
Root schedules
Three tabu-search optimization algorithms: 1. Mapping and Fault-Tolerance Policy assignment (MRX)
Re-execution, replication or both
2. Mapping and only Re-Execution (MX)3. Mapping and only Replication (MR)
Tabu-search
List scheduling
12 of 1412/14
MRX Tabu-Search Example
P1P2P3
N1 N2
40506060
7575
P4 4050
1
N1 N2P1
P4P2
P3
m1
m2
m3
N1
N2
TTP
P1
P3
S1S2 S2
P4
m2
P2
1101Wait0021TabuP4P3P2P1
1101Wait0021TabuP4P3P2P1 Current
solution
Designtransformations
N1
N2
TTP
P1 P3
S1S2
P4P2
m1
1101Wait0021TabuP4P3P2P1
1101Wait0021TabuP4P3P2P1
Tabu move &worse than best-so-far
S1
N1
N2
P1
P3
S1S2
P4P2
P1
m2
m1
TTP
1200Wait0012TabuP4P3P2P1
1200Wait0012TabuP4P3P2P1
Tabu move &better than best-so-far
N1
N2
TTP
P1 P3
S1S2 S2
P4
m2
P2
1101Wait0021TabuP4P3P2P1
1101Wait0021TabuP4P3P2P1
Non-tabu &worse than best-so-far
N1
N2
TTP
P1
P3
S1S2 S2
P4
m2
P2 P3
1101Wait0021TabuP4P3P2P1
1101Wait0021TabuP4P3P2P1
Non-tabu &worse than best-so-far
N1
N2
P1
P3
S1S2
P4P2
P1
m2
m1
TTP
1200Wait0012TabuP4P3P2P1
1200Wait0012TabuP4P3P2P1 Current
solution
Designtransformations
13 of 1413/14
80
20
Experimental Results
0
10
30
40
50
60
70
90
100
20 40 60 80 100
80Mapping and replication (MR)
20Mapping and re-execution (MX)
Mapping and policy assignment (MRX)
Number of processes
Avgera
ge %
devia
tion
fro
m M
RX
Schedulability improvement under resource constraints
Case study Vehicle cruise controller MRX: schedulable fault-
tolerant application with 65% overhead
14 of 1414/14
Contributions and Message
Contributions Combined re-execution and replication Optimization algorithms for fault-tolerance policy
assignment Efficient contingency schedule generation
Optimization of fault-tolerance policy assignment needed for cost-effective fault tolerance