Page 1: Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems

Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng
Embedded Systems Lab (ESLAB), Linköping University, Sweden


Page 2: Motivation

Hard real-time applications: timing constraints, cost constraints.

Scheduling: online preemptive (flexible) vs. off-line non-preemptive (predictable).

Faults: transient, intermittent.

Hardware solutions (MARS, TTA, X-by-Wire): target permanent faults; costly for transient faults.

vs.

Software solutions: re-execution/rollback recovery, checkpointing/rollback recovery, replication, primary-backup, …

Page 3: Outline

Motivation
System architecture and fault-model
Fault-tolerance techniques
Problem formulation
Motivational examples
Tabu-search optimization strategy
Experimental results
Contributions and Message

Page 4: Fault-Tolerant Time-Triggered Systems

Processes: static cyclic scheduling.
Messages: static schedule table.

Transient faults
Processes: re-execution and replication.
Messages: fault-tolerant protocol.

Time Triggered Protocol (TTP)
Bus access scheme: time-division multiple-access (TDMA).
Schedule table located in each TTP controller: message descriptor list (MEDL).

[Figure: TDMA bus cycle of two rounds; each round is a sequence of slots labeled S1 to S4.]
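The TDMA bus access scheme can be illustrated with a small lookup that answers "which slot is active at time t?". This is only a sketch: the node names, the slot lengths and the `slot_at` helper are hypothetical and are not taken from the slides or from the TTP specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    node: str    # node that owns the slot (hypothetical names)
    length: int  # slot length in abstract time units (assumed values)


def slot_at(round_slots, t):
    """Return (round_index, slot) for the slot active on the bus at time t >= 0."""
    round_length = sum(s.length for s in round_slots)
    round_index, offset = divmod(t, round_length)
    for slot in round_slots:
        if offset < slot.length:
            return round_index, slot
        offset -= slot.length
    raise AssertionError("unreachable: offset always falls inside the round")


# One TDMA round with four slots; the round repeats forever.
ROUND = [Slot("N1", 10), Slot("N2", 10), Slot("N3", 5), Slot("N4", 5)]

if __name__ == "__main__":
    for t in (0, 12, 29, 30):
        rnd, slot = slot_at(ROUND, t)
        print(f"t={t}: round {rnd}, slot of {slot.node}")
```

Because the round is fixed off-line, every node can compute the owner of the bus from the time alone, which is what makes the communication predictable.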

Page 5: Fault-Tolerant Techniques

[Figure: three policies for process P1, drawn for 2 faults. Re-execution: P1 runs on node N1 and is re-executed there. Replication: P1 runs on nodes N1, N2 and N3. Re-executed replicas: P1 runs on nodes N1 and N2, and one replica is re-executed.]
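The three policies in the figure can be related by a simple count. The sketch below assumes that tolerating k transient faults takes k + 1 executions of a process in total, and that each replica contributes one of them; that rule and the function name `reexecutions_needed` are assumptions made for illustration, not statements from the slides.

```python
def reexecutions_needed(k, replicas):
    """Re-executions required on top of `replicas` copies to tolerate k faults.

    Assumption: k + 1 executions in total are enough, and every replica
    accounts for exactly one execution.
    """
    if k < 0 or not 1 <= replicas <= k + 1:
        raise ValueError("need k >= 0 and 1 <= replicas <= k + 1")
    return (k + 1) - replicas


if __name__ == "__main__":
    k = 2  # number of faults used in the figure
    for name, replicas in [("re-execution", 1),
                           ("replication", 3),
                           ("re-executed replicas", 2)]:
        print(f"{name}: {replicas} replica(s), "
              f"{reexecutions_needed(k, replicas)} re-execution(s)")
```

Under this assumption, pure re-execution and pure replication are the two extremes (one replica, or k + 1 replicas), and re-executed replicas cover everything in between.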

Page 6: Problem Formulation

Given
Application: set of process graphs (WCETs, message sizes, periods, deadlines).
Architecture: time-triggered system.
Fault-model: transient faults (number of transient faults in the system period).

Determine
Schedulable and fault-tolerant design implementation:
Fault-tolerance policy assignment
Mapping of processes and messages
Schedule tables for processes and messages

Page 7: Static Scheduling [Kandasamy et al. 03]

[Figure: application of processes P1 to P5 with messages m1 and m2, scheduled on nodes N1, N2 and N3 for 2 faults. Each node has a root schedule and a tree of contingency schedules, labeled S1 through S18 in the figure. Two schedule sets are highlighted: N1: S1, N2: S11, N3: S14 and N1: S2, N2: S12, N3: S14. Annotations: transparent re-execution, recovery slack.]

Page 8: Re-execution vs. Replication

WCETs of the processes on each node:

      N1   N2
P1    40   50
P2    40   60
P3    50   70

[Figure: schedules on nodes N1 and N2 and the TTP bus (slots S1, S2) for two applications, A1 and A2, each with processes P1, P2 and P3, for 1 fault.]

A1 (message m1): with re-execution the deadline is met; with replication it is missed. Re-execution is better.
A2 (messages m1, m2): with replication the deadline is met; with re-execution it is missed. Replication is better.
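Why neither policy wins in general can be seen from a back-of-the-envelope calculation for a single process. The sketch below uses 40 and 50 purely as illustrative execution times, and it ignores recovery overhead and bus communication, both of which are simplifying assumptions; the function names are hypothetical.

```python
def worst_case_reexecution(wcet, k):
    """Finish time on one node if the process is hit by k faults and
    re-executed each time (recovery overhead ignored)."""
    return (k + 1) * wcet


def worst_case_replication(wcets_on_nodes):
    """Finish time of the slowest replica when all replicas start
    together on otherwise idle nodes."""
    return max(wcets_on_nodes)


if __name__ == "__main__":
    k = 1
    print("re-execution:", worst_case_reexecution(40, k))
    print("replication: ", worst_case_replication([40, 50]))
```

Replication finishes earlier for the isolated process, but it occupies a second node and duplicates the process's messages on the bus, while re-execution keeps the other node free and stretches the schedule on one node. Which effect dominates depends on the application, as the two cases on this slide show.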

Page 9: Fault-Tolerant Policy Assignment

[Figure: application of processes P1 to P4 with messages m1, m2 and m3, scheduled on nodes N1 and N2 and the TTP bus (slots S1, S2), for 1 fault.]

WCETs of the processes on each node:

      N1   N2
P1    40   50
P2    60   60
P3    80   80
P4    40   50

No fault-tolerance: application crashes.
A single policy for all processes (only re-execution, or only replication): deadline missed.
Optimization of fault-tolerance policy assignment: deadline met.

Page 10: Mapping and Fault-Tolerance

[Figure: application of processes P1 to P4 with messages m1 to m4, scheduled on nodes N1 and N2 and the TTP bus (slots S1, S2), for 1 fault.]

WCETs of the processes on each node (X: the process cannot be mapped on that node):

      N1   N2
P1    40   X
P2    60   60
P3    40   70
P4    X    70

Best mapping without considering fault-tolerance: once fault tolerance is applied, the deadline is missed.
Simultaneous mapping and fault-tolerance: deadline met.

Page 11: Optimization Strategy

Design optimization:
Fault-tolerance policy assignment and mapping of processes and messages: tabu-search.
Root schedules: list scheduling.

Three tabu-search optimization algorithms:
1. Mapping and Fault-Tolerance Policy assignment (MRX): re-execution, replication or both.
2. Mapping and only Re-Execution (MX).
3. Mapping and only Replication (MR).
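List scheduling, named above as the way root schedules are built, can be sketched in a few lines. This is a generic textbook version and not the authors' scheduler: it assumes the mapping is already fixed, ignores bus messages and fault tolerance, and uses a hypothetical four-process graph with made-up execution times and priorities.

```python
def list_schedule(wcet, mapping, preds, priority):
    """Non-preemptive list scheduling of a process graph on pre-mapped nodes.

    wcet[p]     execution time of process p
    mapping[p]  node on which p runs
    preds[p]    processes that must finish before p starts
    priority[p] larger value is scheduled first among ready processes
    Returns {process: (start, finish)}.
    """
    node_free = {}          # node -> time at which it becomes idle
    schedule = {}           # process -> (start, finish)
    remaining = set(wcet)
    while remaining:
        ready = [p for p in remaining
                 if all(q in schedule for q in preds.get(p, ()))]
        if not ready:
            raise ValueError("process graph contains a cycle")
        # Highest priority first; the name breaks ties deterministically.
        p = min(ready, key=lambda x: (-priority[x], x))
        node = mapping[p]
        start = max([node_free.get(node, 0)]
                    + [schedule[q][1] for q in preds.get(p, ())])
        schedule[p] = (start, start + wcet[p])
        node_free[node] = start + wcet[p]
        remaining.remove(p)
    return schedule


# Hypothetical example: P1 precedes P2 and P3, which both precede P4.
WCET = {"P1": 40, "P2": 60, "P3": 80, "P4": 40}
MAPPING = {"P1": "N1", "P2": "N1", "P3": "N2", "P4": "N1"}
PREDS = {"P2": ["P1"], "P3": ["P1"], "P4": ["P2", "P3"]}
PRIORITY = {"P1": 4, "P2": 2, "P3": 3, "P4": 1}

if __name__ == "__main__":
    for proc, (start, finish) in sorted(list_schedule(WCET, MAPPING, PREDS, PRIORITY).items()):
        print(proc, start, finish)
```

In an optimization loop of the kind outlined on this slide, a scheduler like this would be called once per candidate mapping and policy assignment, and the resulting schedule length would serve as the cost of that candidate.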

Page 12: MRX Tabu-Search Example

[Figure: application of processes P1 to P4 with messages m1, m2 and m3, scheduled on nodes N1 and N2 and the TTP bus (slots S1, S2), for 1 fault.]

WCETs of the processes on each node:

      N1   N2
P1    40   50
P2    60   60
P3    75   75
P4    40   50

Current solution, with the Tabu and Wait counters of each process:

        P1   P2   P3   P4
Tabu     1    2    0    0
Wait     1    0    1    1

Design transformations evaluated from the current solution (one schedule each in the figure):
Tabu move and worse than best-so-far
Tabu move and better than best-so-far
Non-tabu and worse than best-so-far
Non-tabu and worse than best-so-far

New current solution (the tabu move that is better than best-so-far), with updated counters:

        P1   P2   P3   P4
Tabu     2    1    0    0
Wait     0    0    2    1
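The move labels in this example (tabu or non-tabu, better or worse than best-so-far) suggest the usual tabu-search selection rule with an aspiration criterion. The sketch below is a generic version of that rule, not the authors' implementation: the candidate costs, the tenure of 2 and the function names are hypothetical, the starting Tabu counters are one reading of the slide's table, and the Wait counters are left out.

```python
def select_move(candidates, tabu, best_so_far):
    """Pick one move from (process, cost) pairs; lower cost is better.

    A move is admissible if its process is not tabu, or if it is tabu but
    its cost beats the best solution seen so far (aspiration criterion).
    """
    admissible = [(p, c) for p, c in candidates
                  if tabu.get(p, 0) == 0 or c < best_so_far]
    pool = admissible if admissible else candidates  # fallback: everything is tabu
    return min(pool, key=lambda pc: pc[1])


def update_tabu(tabu, moved, tenure):
    """Make the moved process tabu for `tenure` iterations and age the rest."""
    return {p: (tenure if p == moved else max(0, t - 1))
            for p, t in tabu.items()}


if __name__ == "__main__":
    tabu = {"P1": 1, "P2": 2, "P3": 0, "P4": 0}
    best_so_far = 210                      # hypothetical cost
    candidates = [("P2", 250), ("P1", 190),
                  ("P3", 230), ("P4", 240)]  # hypothetical costs
    process, cost = select_move(candidates, tabu, best_so_far)
    print("selected move on", process, "with cost", cost)
    print(update_tabu(tabu, process, tenure=2))
```

With these made-up costs the move on P1 is tabu but beats best-so-far, so it is accepted. Setting its counter to the tenure and decrementing the others gives Tabu values of 2, 1, 0, 0 for P1 to P4.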

Page 13: Experimental Results

[Figure: average % deviation from MRX (y axis, 0 to 100) versus number of processes (x axis, 20 to 100) for mapping and replication (MR), mapping and re-execution (MX), and mapping and policy assignment (MRX).]

Schedulability improvement under resource constraints.

Case study: vehicle cruise controller. MRX: schedulable fault-tolerant application with 65% overhead.
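The plot's y axis, average % deviation from MRX, can be read as a relative difference averaged over a set of test applications. The sketch below assumes the compared quantity is a per-application cost such as schedule length, which the slide does not state; all numbers are hypothetical and are not results from the presentation.

```python
def percent_deviation(value, reference):
    """Relative difference of `value` from `reference`, in percent."""
    return 100.0 * (value - reference) / reference


def average_percent_deviation(values, references):
    """Mean of the per-application percentage deviations."""
    pairs = list(zip(values, references))
    if not pairs:
        raise ValueError("no applications to compare")
    return sum(percent_deviation(v, r) for v, r in pairs) / len(pairs)


if __name__ == "__main__":
    mrx_cost = [200, 300]  # hypothetical costs obtained with MRX
    mx_cost = [220, 330]   # hypothetical costs obtained with MX
    print(average_percent_deviation(mx_cost, mrx_cost))
```

Under this reading, MRX is the baseline at 0%, and a higher curve for MX or MR means that restricting the policy to a single technique costs more.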

Page 14: Contributions and Message

Contributions:
Combined re-execution and replication
Optimization algorithms for fault-tolerance policy assignment
Efficient contingency schedule generation

Message: optimization of fault-tolerance policy assignment is needed for cost-effective fault tolerance.