DEPENDABLE PROCESSOR DESIGN

Matteo Carminati
Politecnico di Milano - October 31st, 2012

Partially inspired by P. Harrod (ARM) presentation at the Test Spring School 2012 - Annecy (France)


OUTLINE

What?

• Problem Statement

• Preliminary Definitions

Where?

• Interested Fields

• Standards

Why?

• Pursued Objectives

• State of the Art

How?

• Innovative Solutions

PROBLEM STATEMENT

Guarantee that a system will:

• Match specifications

• Fulfill requirements

• Meet constraints

• Provide real-time response

even when faults occur!

We want the system to be dependable.


DEPENDABILITY

“It is that property of a computer system such that reliance can justifiably be placed on the service it delivers.” J.C. Laprie [6]

Dependability is an abstraction comprising a plethora of quantities:

• RELIABILITY

• AVAILABILITY

• SAFETY

• INTEGRITY

• MAINTAINABILITY

• TESTABILITY

RELIABILITY

Probability that the system will operate correctly in a specified operating environment up until time t.

R(t) = P(not failed during [0, t])
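As an illustration of R(t), the sketch below assumes a constant failure rate λ (the exponential failure law). This is a common modeling assumption, not something stated in the slides; under it, R(t) = exp(-λt). The function name `reliability` and the sample numbers are hypothetical.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t): probability of no failure during [0, t].

    Assumes a constant failure rate (exponential failure law).
    """
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    lam = 1e-5  # assumed rate: on average one failure every 100,000 hours
    for t in (0, 1_000, 10_000, 100_000):
        print(f"R({t} h) = {reliability(lam, t):.4f}")
```

Under this assumption R(t) starts at 1 and decays monotonically, so a longer mission time always lowers the reliability figure for the same component.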

AVAILABILITY

Probability that the system will be operational at time t.

A(t) = P(not failed at time t)
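A(t) is often summarized by its long-run value. The sketch below assumes a repairable system that alternates between operating periods of mean length MTTF and repair periods of mean length MTTR; this model and the names `steady_state_availability` and `downtime_hours_per_year` are illustrative assumptions, not part of the slides.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is operational.

    Assumes alternating up (mean MTTF) and down (mean MTTR) periods.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(availability: float,
                            hours_per_year: float = 8760.0) -> float:
    """Expected yearly downtime implied by a given availability."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    a = steady_state_availability(mttf_hours=999.0, mttr_hours=1.0)
    print(f"A = {a:.4f}, downtime = {downtime_hours_per_year(a):.2f} h/year")
```

The two quantities differ in an important way: a system can have high availability and still low reliability if it fails often but is repaired quickly.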

SAFETY

The absence of undesired and unplanned events that result in a specific level of loss (i.e., accidents).

INTEGRITY

The absence of improper system state alterations.

MAINTAINABILITY

Probability that a failed system is repaired within time t.

M(t) = P(repaired during [0, t])

TESTABILITY

The ability to test for certain attributes within a system.

Related to maintainability: it is important to minimize the time required to identify and locate specific problems.

These quantities have different, sometimes contradictory, goals: the best trade-off among them must be sought when designing a new electronic system.

ROBUSTNESS

Ability of a system to continue functioning despite the presence of faults, even if the system performance may be altered (always in a safe way), until the faults are corrected.

FUNCTIONAL SAFETY

Absence of unreasonable risk due to hazards caused by malfunctioning behavior of electronic systems.

WHAT IS A FAULT?

[Figure: timeline along t with the phases Fault Free, Latency, Outage, Fault Free and the events Fault, Error, Detection, Repair, Recovery]

FAULT: a defect within the system

ERROR: a deviation from the required operation

FAILURE: the system fails to perform its required function

FAULT TAXONOMY

FAULT

• SYSTEMATIC: HW, SW

• RANDOM: HW
    • PERMANENT (hard): shorts, stuck-at, stuck-open
    • INTERMITTENT
    • TRANSIENT (soft): SEE, SBU, MBU, SET

WHAT IS AN ACCIDENT?

[Figure: several Fault → Error → Failure chains lead to a Hazard, which may lead to an Accident]

HAZARD: a state of the system that in certain environmental situations may lead to an accident

SAFETY INTEGRITY LEVEL

The relative level of risk reduction provided by a safety function.

SIL - IEC 61508

SFF             HFT 0    HFT 1    HFT 2
<60%            -        SIL 1    SIL 2
>60% && <90%    SIL 1    SIL 2    SIL 3
>90% && <99%    SIL 2    SIL 3    SIL 4
>99%            SIL 3    SIL 4    SIL 4

Safe Failure Fraction (SFF): ratio between the sum of safe failures plus detected dangerous failures and the sum of safe failures plus all dangerous failures.

Hardware Fault Tolerance (HFT): an HFT of N means that N+1 faults could cause a loss of the safety function.

From [1].
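The SFF/HFT table on this slide can be read as a simple lookup. The sketch below encodes the values as they appear on the slide (as far as can be told, they match the IEC 61508-2 table for complex, Type B elements). The name `max_sil` is hypothetical, and treating each band's lower bound as inclusive is an assumption, since the slide uses strict inequalities on both sides.

```python
from typing import Optional

# Rows: SFF bands by lower bound; columns: HFT 0, 1, 2.
# None stands for "-" in the table (no SIL can be claimed).
# Lower bounds are treated as inclusive: an assumption of this sketch.
_SIL_TABLE = [
    (0.00, (None, 1, 2)),  # SFF < 60%
    (0.60, (1, 2, 3)),     # 60% <= SFF < 90%
    (0.90, (2, 3, 4)),     # 90% <= SFF < 99%
    (0.99, (3, 4, 4)),     # SFF >= 99%
]


def max_sil(sff: float, hft: int) -> Optional[int]:
    """Highest SIL claimable for a given SFF (0..1) and HFT (0, 1 or 2)."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("SFF must be a fraction between 0 and 1")
    if hft not in (0, 1, 2):
        raise ValueError("this table only covers HFT 0, 1 and 2")
    row = _SIL_TABLE[0][1]
    for lower_bound, sils in _SIL_TABLE:
        if sff >= lower_bound:
            row = sils  # keep the last band whose lower bound is reached
    return row[hft]


if __name__ == "__main__":
    for sff, hft in ((0.50, 0), (0.95, 1), (0.995, 0)):
        print(f"SFF={sff:.3f}, HFT={hft} -> SIL {max_sil(sff, hft)}")
```

The lookup makes the trade-off visible: the same SIL can be reached either by raising the SFF (better diagnostics) or by raising the HFT (more redundancy).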

FAULT-RELATED PROPERTIES

• Fault Ignore: the fault needs to be neither detected nor mitigated.

• Fault Detection: the result can be incorrect, but the fault must be identified.

• Fault Tolerance: the fault must be mitigated and the provided result must be correct.

• Fault Diagnosis: the result must be correct and the faulty unit must be identified.


CRITICAL SYSTEMS - STANDARDS

• MISSION
    Aerospace - DO-178B/DO-254
    Railway - EN 50128

• SAFETY
    Nuclear power stations - IEC 60880
    Medical devices - IEC 60601
    Automotive - ISO 26262

• BUSINESS
    Account management
    Transaction systems

ISO 26262: FOCUS

• Braking: ABS, anti-skid, ...

• Engine management, power train

• Driver assistance, lane departure

• Passenger safety, air bags

• Electric/hybrid energy system

From [1].

ISO 26262: FOCUS

REQUIREMENTS

• Architecture compliance: measures to achieve system safety in case of random HW failures.

• Process compliance: guidelines for designing processes and HW/SW architectures to avoid systematic failures.

ISO 26262: FOCUS

ASIL: AUTOMOTIVE SAFETY INTEGRITY LEVEL

IEC 61508    ISO 26262    Application Example
SIL 4        -            Railway signal control
SIL 3        ASIL D       Brake-by-wire, EPS, ...
SIL 2        ASIL C       Battery management
             ASIL B       Rear lights
SIL 1        ASIL A       Automotive dashboard

Example: ASIL D means that >99% of faults must be detected and that the probability of violating a safety goal due to random HW failures shall be less than 10 FIT (1 FIT = 1 failure in 1 billion hours).
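To give a feeling for the 10 FIT figure, the sketch below converts FIT into a failure rate per hour and into a probability of failure over an operating period. It assumes a constant failure rate; the function names, the 10,000-hour operating life, and the fleet size are illustrative assumptions, not values from the slides.

```python
import math

HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_rate_per_hour(fit: float) -> float:
    """Convert a FIT value into failures per hour."""
    return fit / HOURS_PER_FIT


def failure_probability(fit: float, hours: float) -> float:
    """P(at least one failure within `hours`), assuming a constant rate."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -math.expm1(-fit_to_rate_per_hour(fit) * hours)


def expected_failures(fit: float, units: int, hours: float) -> float:
    """Expected number of failures across a fleet of identical units."""
    return fit_to_rate_per_hour(fit) * units * hours


if __name__ == "__main__":
    life = 10_000  # assumed operating hours of one unit
    print(f"rate      = {fit_to_rate_per_hour(10):.1e} per hour")
    print(f"P(fail)   = {failure_probability(10, life):.2e} per unit")
    print(f"fleet sum = {expected_failures(10, 1_000_000, life):.0f} failures")
```

Even a 10 FIT budget adds up over a large fleet, which is one reason the target applies to the whole safety goal rather than to a single component.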

ISO 26262: FOCUS

HOW TO ACHIEVE THE ASIL

• Setting up functional safety management

• Defining safety goal

• Improving the process and improving the product in order to:
    • Avoid systematic failures
    • Detect/tolerate HW random failures
    • Avoid/detect dependent failures


NON-CRITICAL SYSTEMS

• Domestic appliance

• Entertainment devices

• Distribution networks

• Wellness

TRADE-OFF among Dependability, Performance, and Power

GOALS

Design dependable processors to:

• Reduce the number of hazards and accidents

• Increase system safety

• Meet standards

STATE OF THE ART

ARCHITECTURAL APPROACH

• 1oo1 (A): Rs = Ra

• 2oo2 (A, B): Rs = 1 - (1 - Ra) x (1 - Rb)

• 1oo2 (A, B): Rs = Ra x Rb

• 2oo3 (A, B, C with voting): Rs = 1 - (1 - RaRb) x (1 - RbRc) x (1 - RaRc)
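Redundancy formulas of this kind can be checked numerically. The sketch below assumes independent channels and enumerates every working/failed combination, summing the probability of those in which at least k channels work. The name `k_of_n_reliability` and the sample reliabilities are hypothetical. Which MooN label corresponds to which k depends on whether one counts the channels needed to keep operating or the channels needed to trip, so the code uses plain k and n.

```python
from itertools import product
from typing import Sequence


def k_of_n_reliability(k: int, channel_reliabilities: Sequence[float]) -> float:
    """Probability that at least k of the independent channels work."""
    n = len(channel_reliabilities)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the number of channels")
    total = 0.0
    # Each state tuple marks every channel as working (True) or failed (False).
    for states in product((True, False), repeat=n):
        if sum(states) >= k:
            p = 1.0
            for works, r in zip(states, channel_reliabilities):
                p *= r if works else (1.0 - r)
            total += p
    return total


if __name__ == "__main__":
    ra = rb = rc = 0.9  # assumed channel reliabilities
    print("1 of 1:", k_of_n_reliability(1, [ra]))
    print("1 of 2:", k_of_n_reliability(1, [ra, rb]))      # 1 - (1-Ra)(1-Rb)
    print("2 of 2:", k_of_n_reliability(2, [ra, rb]))      # Ra * Rb
    print("2 of 3:", k_of_n_reliability(2, [ra, rb, rc]))
```

With Ra = Rb = Rc = 0.9 the enumeration gives 0.972 for 2 of 3, whereas the product form 1 - (1 - RaRb)(1 - RbRc)(1 - RaRc) evaluates to about 0.993 for the same inputs. The product form treats the three channel pairs as independent although they share channels, so it is probably best read as an optimistic approximation.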

STATE OF THE ART

DIVERSITY

“different solutions satisfying the same requirement with the aim of independence” - ISO 26262

• Reduces HW systematic failures

• Prevents, reduces, or detects common cause failures, replacing the need for complex measures

DUAL CORE LOCK-STEP

Homogeneous Redundancy

[Figure: CPU master, CPU checker, comparator (COMP), SW]

Advantages:

• High diagnostic coverage

• Negligible SW overhead

Drawbacks:

• Significant HW overhead

• Significant power consumption increase

• Susceptible to common-cause and HW systematic failures

• Poor diagnostic info and availability

Achieves ASIL D

Example: Freescale MPC5746M
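The lock-step principle can be sketched in software: two copies execute the same step on the same input and a comparator checks their outputs every cycle. Everything below is a toy model and not the MPC5746M mechanism; `alu_step`, `run_lockstep`, and the injected bit flip are hypothetical names and assumptions.

```python
from typing import Callable, Iterable, Optional


def alu_step(x: int) -> int:
    """Toy stand-in for the computation both cores execute each cycle."""
    return (x * 3 + 1) & 0xFFFF


def run_lockstep(step: Callable[[int], int],
                 inputs: Iterable[int],
                 fault_at: Optional[int] = None,
                 hit_both: bool = False,
                 fault_mask: int = 0x1) -> Optional[int]:
    """Return the cycle at which the comparator sees a mismatch, else None."""
    for cycle, value in enumerate(inputs):
        master_out = step(value)
        checker_out = step(value)
        if cycle == fault_at:
            master_out ^= fault_mask       # injected transient bit flip
            if hit_both:
                checker_out ^= fault_mask  # same fault in both cores
        if master_out != checker_out:
            return cycle                   # comparator raises an error
    return None


if __name__ == "__main__":
    print(run_lockstep(alu_step, range(10)))                            # no fault
    print(run_lockstep(alu_step, range(10), fault_at=4))                # detected
    print(run_lockstep(alu_step, range(10), fault_at=4, hit_both=True)) # missed
```

The last call illustrates two drawbacks listed on the slide: a fault that hits both cores identically escapes the comparator, and a detected mismatch does not say which core is the faulty one.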

CHALLENGE & RESPONSE

SW cross-exchange between 2 independent units

[Figure: CPU main (SW1) and CPU secondary (SW2) connected by a serial interface]

Advantages:

• Common-cause failures detection

• HW/SW systematic failures detection

Drawbacks:

• Significant HW and power consumption increase

• Significant SW and performance overhead

• Poor transient fault coverage, reusability, and availability

• Long error detection latency

Achieves ASIL C
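A minimal sketch of one challenge/response round follows, assuming the secondary unit issues a random challenge and checks the main unit's answer against its own, independently written computation. The response function, the modulus 251, and all names are hypothetical; a real exchange over the serial interface would also need timeouts and message framing, which are omitted here.

```python
import random
from typing import Callable


def sw1_response(challenge: int) -> int:
    """Main CPU (SW1): closed-form computation of the answer."""
    return (challenge * challenge + 7) % 251


def sw2_expected(challenge: int) -> int:
    """Secondary CPU (SW2): same function, coded differently on purpose."""
    acc = 0
    for _ in range(challenge):  # builds challenge * challenge by repeated addition
        acc = (acc + challenge) % 251
    return (acc + 7) % 251


def monitor_round(rng: random.Random,
                  respond: Callable[[int], int] = sw1_response) -> bool:
    """One round: True if the main CPU's answer matches the expectation."""
    challenge = rng.randrange(1, 251)
    return respond(challenge) == sw2_expected(challenge)


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(monitor_round(rng) for _ in range(20)))            # healthy unit
    print(monitor_round(rng, lambda c: sw1_response(c) ^ 1))     # faulty unit
```

Because a fault is only noticed at the next exchange, the detection latency depends on how often rounds are run, which matches the latency drawback listed on the slide.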

E-GAS CONCEPT

SW diversified redundancy with 2 independent units

[Figure: CPU main running SW1 and SW2, with a MISR]

Advantages:

• Low HW overhead

• SW systematic failures detection

Drawbacks:

• Significant SW and performance overhead

• Poor transient fault coverage, reusability, and availability

• Susceptible to common-cause failures

• Long error detection latency

Achieves ASIL B

HARDENED BY-DESIGN

Each processor functional unit is independently hardened

[Figure: hardened CPU running SW]

Advantages:

• Low HW overhead

• Low performance overhead

• Optimized solution

Drawbacks:

• Significant design overhead

• Need to know the processor internal description

• Very low reusability, very specific solution

Example: LEON3 FT

TIGHTLY COUPLED 2 CORE

Asymmetric Redundancy

[Figure: CPU master running SW, connected through a CPU interface to an optimized supervisor (checker)]

Advantages:

• Low HW and power consumption overhead

• Negligible SW overhead

• Common-cause and HW systematic failures detection

• Short error detection latency, good availability

Drawbacks:

• Very detailed analysis required

• CPU interface needed

Achieves ASIL D

YOGITECH’S FAULT-ROBUST

• The supervisor is designed exploiting a white-box approach

• Meets IEC 61508 and ISO 26262 requirements

• One main supervisor for the CPU and a set of remote supervisors, one for each specific region of the system

• Hardware-centric approach

[Figure: CPU master running SW, connected through the CPU interface to the main supervisor; remote supervisors attached via robustnet]

From [4].

MAIN SUPERVISOR

The CPU Checking Unit checks the instruction execution, the program flow, and the data processing

• The CPU sniffer collects, compacts, codes, and buffers signals from the CPU and forwards them to the supervisors

• Each supervisor is composed of a data-path, a sequencer, and a checker

The System Control Unit decides whether the system is in a wrong state and performs the necessary actions

[Figure: main supervisor between the CPU interface and robustnet, containing the CPU checking unit (CPU sniffer; Data, Mode, Instruction execution, and Data address supervisors) and the system control unit]

REMOTE SUPERVISORS

• The memory supervisor can store ECC codes and share them among multiple memories

• Peripheral supervisors implement a hardware verification component

• The bus supervisors monitor sources and sinks of the bus and perform data integrity checks

[Figure: remote supervisors (Memory, Peripheral, Bus, and Custom Supervisors) attached to robustnet]
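The slides do not say which ECC the memory supervisor uses. As an illustration of what storing an ECC code alongside data buys, the sketch below implements a Hamming(7,4) single-error-correcting code, chosen here only as a small, well-known example; the function names are hypothetical.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i+1)."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("expected a 4-bit value")
    bits = [0] * 8  # positions 1..7 are used, index 0 is ignored
    # Data bits go to the non-power-of-two positions 3, 5, 6, 7.
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    # Parity bits at positions 1, 2, 4 cover the positions sharing that bit.
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Return (nibble, corrected_position); position 0 means no error seen."""
    if not 0 <= codeword < 128:
        raise ValueError("expected a 7-bit codeword")
    bits = [0] + [(codeword >> (pos - 1)) & 1 for pos in range(1, 8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome] ^= 1  # a single flipped bit sits at position `syndrome`
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    corrupted = stored ^ (1 << 4)       # flip the bit at position 5
    print(hamming74_decode(corrupted))  # recovers 0b1011 and reports position 5
```

This code corrects any single bit flip per codeword but cannot distinguish a double flip from a single one; memories typically add one more parity bit for that purpose.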

METHODOLOGY FLOW

The DESIGN is supported by automatic tools through:

• Safety Requirements Specification

• Failure Modes and Effects Analysis

• Fault Injection

• SFF/DC reports

SFF: Safe Failure Fraction

DC: Diagnostic Coverage
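How a fault injection campaign might feed the SFF/DC reports can be sketched as follows. Each injected fault is assumed to be classified as safe, dangerous and detected, or dangerous and undetected; the labels, the function name `campaign_metrics`, and the equal weighting of all faults are assumptions of this sketch (real analyses usually weight failure modes by their failure rates).

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Outcome labels for one injected fault (hypothetical names).
SAFE = "safe"
DANGEROUS_DETECTED = "dangerous_detected"
DANGEROUS_UNDETECTED = "dangerous_undetected"


def campaign_metrics(outcomes: Iterable[str]) -> Dict[str, Optional[float]]:
    """Compute DC and SFF from classified fault injection outcomes."""
    counts = Counter(outcomes)
    unknown = set(counts) - {SAFE, DANGEROUS_DETECTED, DANGEROUS_UNDETECTED}
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    safe = counts[SAFE]
    detected = counts[DANGEROUS_DETECTED]
    dangerous = detected + counts[DANGEROUS_UNDETECTED]
    total = safe + dangerous
    if total == 0:
        raise ValueError("no faults were injected")
    return {
        # DC: share of dangerous faults caught by the diagnostics.
        "DC": detected / dangerous if dangerous else None,
        # SFF: safe plus detected dangerous faults over all faults.
        "SFF": (safe + detected) / total,
    }


if __name__ == "__main__":
    run = [SAFE] * 60 + [DANGEROUS_DETECTED] * 36 + [DANGEROUS_UNDETECTED] * 4
    print(campaign_metrics(run))
```

With 60 safe, 36 detected, and 4 undetected faults the sketch reports a DC of 0.9 and an SFF of 0.96; the SFF is higher because safe faults count in its favor.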


RESULTS

• PHILIPS SJA2510 FlexRay microcontroller

• <30% HW overhead for CPU protection

• <10% HW overhead for memory protection

• A greater level of optimization can be reached if the configuration is more application specific

From [4].

CONCLUSIONS

OLD TREND

• HW is unreliable by definition

• HW is stupid: in case of failure it cannot tell what is going on

• HW/SW redundancy is the only way to guarantee the availability of safety-critical systems

NEW TREND

• Design-for-Uncertainty must become a new paradigm

• FMEA down to the gate level should become a de facto standard

• New architectures should embed methods to detect and control errors

The proposed platform-based solution aims at reducing the HW and SW costs needed to implement fault-robust MCUs in adherence with IEC 61508 SIL 3. This is achieved by implementing an optimized HW CPU fault detection, by providing dedicated HW to replace, support, or supplement SW tests, and by distributing robustness to the whole SoC. The proposed approach is scalable, flexible, portable, and reusable by design.

BIBLIOGRAPHY


1. P. Harrod, Dependable Processor Design, TSS presentation, 2012

2. C. Bolchini, Dependable Systems, course slides, 2012

3. M. Bellotti, R. Mariani, How future automotive functional safety requirements will impact microprocessors design, Microelectronics Reliability, 2010

4. R. Mariani, P. Fuhrmann, B. Vittorelli, Fault-robust microcontrollers for automotive applications, IEEE International On-Line Testing Symposium (IOLTS), 2006

5. M. Baleani, A. Ferrari, L. Mangeruca, A. Sangiovanni-Vincentelli, M. Peri, S. Pezzini, Fault-Tolerant Platforms for Automotive Safety-Critical Applications, International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), 2003

6. J.C. Laprie, Dependable Computing: Concepts, Limits, Challenges, IEEE International Symposium on Fault-Tolerant Computing, 1995