Introduction Platform for the rapid prototyping of dependable ES Case Study Conclusions

Rapid Prototyping of Radiation-Tolerant Embedded Systems on FPGA

F. Restrepo-Calle¹, A. Martínez-Álvarez¹, F. R. Palomo², H. Guzmán-Miranda², M. A. Aguirre², S. Cuenca-Asensi¹

¹ Computer Technology Department, University of Alicante, Spain
² Department of Electrical Engineering, University of Sevilla, Spain

FPL 2010, August 31 - September 2, 2010, Milano, Italy




Outline

1 Introduction
- Problem
- Solutions for Embedded Systems
- Objectives
- Fault Model and Terminology

2 Platform for the rapid prototyping of dependable ES
- Platform for fault-tolerant co-design
- Software Hardening Environment
- Reliability Evaluation Tool

3 Case Study
- Picoblaze
- Software-based technique: SWIFT-R
- Hardware-based technique: TMR
- Prototypes evaluation

4 Conclusions and Future Work


Problem

Miniaturization of electronic components:
- Advantage: increased microprocessor performance
- Disadvantage: circuits become more prone to transient faults [Baumann, 2005]

Transient faults (soft errors):
- Induced by radiation, through the ionization caused by an incident charged particle
- Do not cause permanent damage
- Can alter signal transfers or stored values, provoking errors in the system
- Occur in space, in the atmosphere, and even at ground level [Baumann, 2002]

There is increasing concern about fault mitigation in mission-critical, security, and safety-critical systems.


Solutions for Embedded Systems

Hardware-based techniques

Usual solution: hardware redundancy
- Low-level structures: ECC, parity bits, TMR
- More complex components: co-processors [Mahmood and McCluskey, 1988], functional units [Austin, 1999], . . .
- Exploiting redundancy in multi-thread/multi-core architectures [Gomaa et al., 2003, Mukherjee et al., 2002]

Advantage: highly effective solution
Disadvantage: very costly!
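To make the ECC entry above concrete, here is a minimal sketch of a textbook Hamming(7,4) code, which corrects any single bit flip (SEU) in a 7-bit codeword. It is a generic example chosen for illustration, not a structure taken from the presented platform, and the function names are hypothetical.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (hypothetical helper).

    Bit k of the result holds codeword position k+1; the layout is
    p1 p2 d1 p3 d2 d3 d4 (positions 1..7).
    """
    d1, d2, d3, d4 = ((nibble >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d1, p3, d2, d3, d4]
    return sum(bit << k for k, bit in enumerate(bits))


def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = [(code >> k) & 1 for k in range(7)]
    # The XOR of the positions of all set bits is 0 for a valid codeword;
    # after a single flip it equals the position of the flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)


# Every nibble survives any single-bit upset in its codeword.
for value in range(16):
    word = hamming74_encode(value)
    assert all(hamming74_decode(word ^ (1 << k)) == value for k in range(7))
```

The price is visible in the sizes: 3 check bits per 4 data bits, which is the kind of hardware cost the slide refers to.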


Solutions for Embedded Systems

Software-based techniques

In recent years, several proposals based on redundant software have appeared:
- Some examples: EDDI [Oh et al., 2002b], CFCSS [Oh et al., 2002a], SWIFT [Reis et al., 2005b], ARBT [Rebaudengo et al., 2001], . . .
- Advantage: low cost with acceptable reliability
- Disadvantages: increased code size and execution time
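The duplication idea behind techniques such as EDDI can be sketched as follows: every computation is repeated on a shadow copy, and both copies are compared just before a result leaves the processor. This is a simplified, detection-only illustration written in Python rather than at the instruction level where those techniques operate; all names, including the `upset_step` test hook, are hypothetical. The doubled work is the code-size and execution-time overhead listed above.

```python
class SoftErrorDetected(Exception):
    """Raised when a value and its duplicate disagree (hypothetical)."""


def checked_store(memory: dict, addr: int, value: int, shadow: int) -> None:
    # Compare master and shadow copies right before the value is written out.
    if value != shadow:
        raise SoftErrorDetected(f"store to 0x{addr:02X}: {value} != {shadow}")
    memory[addr] = value


def duplicated_sum(data, memory: dict, upset_step=None) -> int:
    """Sum 8-bit values with every operation duplicated.

    upset_step (hypothetical test hook): index at which a bit flip is
    injected into the shadow accumulator to emulate an SEU.
    """
    acc, acc_shadow = 0, 0
    for step, x in enumerate(data):
        acc = (acc + x) & 0xFF                # original instruction
        acc_shadow = (acc_shadow + x) & 0xFF  # duplicated instruction
        if step == upset_step:
            acc_shadow ^= 0x01
    checked_store(memory, 0x10, acc, acc_shadow)
    return acc
```

Calling `duplicated_sum([1, 2, 3], mem)` stores 6 at address 0x10, while passing `upset_step=1` raises `SoftErrorDetected` instead of writing a corrupted value.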


Solutions for Embedded Systems

Hybrid Hardware/Software approaches

- In many cases the optimal solution is an intermediate point that combines software and hardware protection approaches: HW/SW fault-tolerant co-design
- Examples: [Bernardi et al., 2006, Reis et al., 2005a]
- Suitable tools are needed to easily explore the design space and find the best trade-off between design constraints and reliability requirements
- FPGAs are increasingly used to prototype ASICs as part of ASIC verification methodologies


Objectives

In this paper, we present:

A rapid prototyping approach for radiation-tolerant embedded systems design
- FPGAs are used as the development and verification platform to produce HW/SW systems that best meet the design and dependability constraints
- Mitigation techniques are applied at a high abstraction level, so the final deployment platform can be either an ASIC or an FPGA
- The approach is supported by a hardening platform made up of:
  - Software Development Environment: to implement, automatically apply, and evaluate software-only fault-tolerant techniques
  - FT-Unshades: an FPGA-based fault emulation tool to assess several reliability metrics


Fault Model and Terminology

Fault Model: Single Event Upset — SEU [Rebaudengo et al., 2001]

- Only one bit-flip occurs in a storage cell during program execution
- Widely used because it matches the real fault behavior
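Under this fault model a single SEU is one XOR with a one-hot mask applied to one storage cell. The sketch below flips one randomly chosen bit of one register in a modelled register file; the 16 x 8-bit size is an assumption chosen to resemble a small 8-bit soft processor, and `inject_seu` is a hypothetical name.

```python
import random


def inject_seu(registers, width=8, rng=None):
    """Flip exactly one randomly chosen bit in a list of register values.

    Hypothetical helper; returns (register_index, bit_index) so the
    injected fault can be logged.
    """
    rng = rng or random.Random()
    reg = rng.randrange(len(registers))
    bit = rng.randrange(width)
    registers[reg] ^= 1 << bit  # the single bit-flip
    return reg, bit


regs = [0] * 16  # assumed 16 x 8-bit register file
reg, bit = inject_seu(regs, rng=random.Random(2010))
assert regs[reg] == 1 << bit and sum(regs) == 1 << bit
```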

Faults classification [Mukherjee et al., 2002]

According to their effect on program behavior:
- unACE: fault in an unACE bit, i.e., the program finishes and produces the expected results
- SDC (Silent Data Corruption): the program finishes with incorrect results
- Hang: abnormal program termination or infinite loop
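Given a fault-free (golden) run, the three classes can be decided for each faulty run as in this sketch. Detecting an infinite loop through a cycle budget (watchdog) is an assumption of the sketch rather than a detail given here, and the names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    UNACE = "unACE"
    SDC = "SDC"
    HANG = "Hang"


def classify(terminated_normally: bool, result, expected) -> Outcome:
    """Classify one faulty run against the golden result (hypothetical helper).

    terminated_normally is False for abnormal termination or when the run
    exceeded its assumed watchdog budget (treated as an infinite loop).
    """
    if not terminated_normally:
        return Outcome.HANG
    return Outcome.UNACE if result == expected else Outcome.SDC
```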


Platform for fault-tolerant co-design

HW/SW Fault-Tolerant Co-design

Figure: the designer combines SW design tools, HW design tools, and reliability evaluation tools, guided by design constraints and reliability requirements.


Platform for fault-tolerant co-design

HW/SW Fault-Tolerant Co-design - Proposed tools

Figure: the same co-design flow with the proposed tools. Our tools: the Software Hardening Environment and FT-Unshades. Third-party tools: HW design tools. The designer is guided by design constraints and reliability requirements.


Software Hardening Environment

General scheme

Figure: original source code for architectures 1 to n is translated by the compiler front-ends into a Generic Instruction Flow (GIF). The Generic Hardening Core (GH-Core), made up of the Hardener and the Simulator, turns the GIF into a Hardened Generic Instruction Flow (HGIF), from which the compiler back-ends for architectures 1 to n emit the hardened source code.


Software Hardening Environment

Advantages

Main advantages

Based on a generic architecture, which permits:
- handling multiple microprocessors
- providing a uniform hardening core
- re-targeting the output to any supported microprocessor

Automatic code transformation based on rules (at assembly level)
Conceived to implement a wide suite of techniques


Software Hardening Environment

Generic Architecture

Three main topics:
- Generic Instruction: Address, Mnemonic, Generic Operator List, Affected Generic Flag List, Instruction Type, Tool message
- Memory Management: identification of the memory map (and memory sections); update of the memory map when code is inserted (dilation, displacement, reallocation)
- Control Flow Graph, for example:
  Node 1: {I1, I2, I3, I4, I5}
  Node 2: {I6, I7, I8}
  Node 3: {I9, I10}
  Node 4: {I11, I12, I13}
  Node 5: {I14}
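The control-flow-graph nodes above are basic blocks. As a minimal sketch, assuming a generic instruction record with the fields listed on this slide (the Python field names and the instruction-type strings are hypothetical), the nodes can be found with the classic leader rule: the first instruction, every branch target, and every instruction that follows a branch start a new node. This is the textbook algorithm, not necessarily the one implemented in the tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical instruction types; only these end a basic block.
BRANCH_TYPES = {"jump", "cond_jump", "call", "return"}


@dataclass
class GenericInstruction:
    address: int
    mnemonic: str
    operands: List[str] = field(default_factory=list)  # generic operator list
    flags: List[str] = field(default_factory=list)     # affected generic flags
    itype: str = "alu"                                  # instruction type
    target: Optional[int] = None                        # branch target, if any
    message: str = ""                                   # tool message


def basic_blocks(program: List[GenericInstruction]) -> List[List[GenericInstruction]]:
    """Split an address-ordered program into CFG nodes (basic blocks)."""
    if not program:
        return []
    addresses = {ins.address for ins in program}
    leaders = {program[0].address}  # first instruction starts a block
    for i, ins in enumerate(program):
        if ins.itype in BRANCH_TYPES:
            if ins.target in addresses:
                leaders.add(ins.target)              # branch target starts a block
            if i + 1 < len(program):
                leaders.add(program[i + 1].address)  # so does the fall-through
    blocks, current = [], []
    for ins in program:
        if ins.address in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    blocks.append(current)
    return blocks


prog = [
    GenericInstruction(0, "LOAD"),
    GenericInstruction(1, "COMPARE"),
    GenericInstruction(2, "JUMP", itype="cond_jump", target=5),
    GenericInstruction(3, "ADD"),
    GenericInstruction(4, "JUMP", itype="jump", target=1),
    GenericInstruction(5, "STORE"),
]
print([[i.address for i in b] for b in basic_blocks(prog)])  # [[0], [1, 2], [3, 4], [5]]
```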


Software Hardening Environment

Generic Architecture

Sphere of Replication — SoR [Reinhardt and Mukherjee, 2000]

Logic domain of redundant execution

Instruction classification (for hardening): inSoR and outSoR

We apply this concept flexibly: a SoR with flexible frontiers makes it easier to implement selective software protection.

Figure: inSoR instructions (input port, read from memory, load a value) bring data into the SoR; outSoR instructions (output port, write into memory) take data out of it.
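Using PicoBlaze-style mnemonics (INPUT/OUTPUT for ports, FETCH/STORE for scratchpad memory, LOAD with a constant for loading a value), the classification in the figure can be sketched as below. The mnemonic mapping and the `memory_in_sor` switch, which models one possible "flexible frontier", are illustrative assumptions.

```python
def sor_class(mnemonic: str, source_is_constant: bool = False,
              memory_in_sor: bool = False) -> str:
    """Classify an instruction relative to the Sphere of Replication.

    Hypothetical helper. memory_in_sor moves the SoR frontier: if memory
    is assumed to be protected by other means (e.g. ECC), memory accesses
    stay inside the SoR and need no voting or triplication.
    """
    m = mnemonic.upper()
    memory_read = m == "FETCH" and not memory_in_sor
    memory_write = m == "STORE" and not memory_in_sor
    if m == "INPUT" or memory_read or (m == "LOAD" and source_is_constant):
        return "inSoR"
    if m == "OUTPUT" or memory_write:
        return "outSoR"
    return "internal"
```

Selective protection then amounts to choosing the frontier: with `memory_in_sor=True`, `STORE` is treated as an ordinary internal instruction.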


Software Hardening Environment

Generic Architecture: Control Flow Analysis

Nodes and subnodes: every node is subdivided into subnodes after each outSoR instruction.

Example: Node 1 = {I1, I2, I3: STORE, I4, I5} is split into Subnode 1 = {I1, I2, I3: STORE} and Subnode 2 = {I4, I5}.
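The subdivision rule can be sketched in a few lines: walk the node and close the current subnode immediately after each outSoR instruction. The instruction tuples and the mnemonics for I1, I2, I4, and I5 are hypothetical placeholders (the slide leaves them blank).

```python
def split_into_subnodes(node, is_out_sor):
    """Cut a CFG node after every outSoR instruction (hypothetical helper)."""
    subnodes, current = [], []
    for ins in node:
        current.append(ins)
        if is_out_sor(ins):
            subnodes.append(current)  # the outSoR instruction closes the subnode
            current = []
    if current:
        subnodes.append(current)
    return subnodes


node1 = [("I1", "ADD"), ("I2", "SUB"), ("I3", "STORE"), ("I4", "ADD"), ("I5", "XOR")]
parts = split_into_subnodes(node1, lambda ins: ins[1] in {"STORE", "OUTPUT"})
print([[name for name, _ in p] for p in parts])  # [['I1', 'I2', 'I3'], ['I4', 'I5']]
```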


Software Hardening Environment

Generic Hardening Core

Generic Hardening Core: Hardener

Two components: the Hardener and the Instruction Set Simulator

Hardener: a tool for the design of software-based techniques
- API of hardening routines
- Flexible and easy to extend

It applies these techniques automatically:
- Receives a GIF and produces the Hardened GIF (HGIF)
- User control from the command line
- Options: method, mcpu, redundancy level, replication times, voter, . . .


Software Hardening Environment

Generic Hardening Core

Generic Hardening Core: Instruction Set Simulator (ISS)

- Simulates the execution of the GIF
- Checks the functionality of the original and hardened code, using custom pragmas that state the expected results
- Outputs useful information (code and execution time overheads, program characterization, . . . )
- Simulates SEU faults for a preliminary reliability evaluation
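The ISS workflow (run a program, check it against expected results, count executed instructions, optionally inject an SEU) can be illustrated with a deliberately tiny interpreter. The mini-ISA, the `JNZ` instruction, and the `EXPECTED` dictionary standing in for the custom pragmas are all hypothetical; the real ISS operates on the GIF.

```python
def run(program, regs, max_steps=1000, flip=None):
    """Execute a toy program on a copy of an 8-bit register file.

    program: list of (op, a, b) tuples in a hypothetical mini-ISA.
    flip: optional (step, reg, bit); the bit is flipped just before the
    instruction with that step number executes (an emulated SEU).
    Returns (registers, steps, finished_within_budget).
    """
    regs = list(regs)
    pc, steps = 0, 0
    while pc < len(program):
        if steps >= max_steps:
            return regs, steps, False  # watchdog: treated as a hang
        if flip is not None and flip[0] == steps:
            regs[flip[1]] ^= 1 << flip[2]
        op, a, b = program[pc]
        pc += 1
        if op == "LOAD":    # ra <- constant b
            regs[a] = b & 0xFF
        elif op == "ADD":   # ra <- ra + rb
            regs[a] = (regs[a] + regs[b]) & 0xFF
        elif op == "SUB":   # ra <- ra - rb
            regs[a] = (regs[a] - regs[b]) & 0xFF
        elif op == "JNZ":   # if ra != 0: pc <- b
            if regs[a] != 0:
                pc = b
        else:
            raise ValueError(f"unknown op {op}")
        steps += 1
    return regs, steps, True


# 3 * 4 by repeated addition; EXPECTED plays the role of a result pragma.
MULT = [("LOAD", 0, 0), ("LOAD", 1, 3), ("LOAD", 2, 4), ("LOAD", 3, 1),
        ("ADD", 0, 1), ("SUB", 2, 3), ("JNZ", 2, 4)]
EXPECTED = {0: 12}

regs, steps, ok = run(MULT, [0] * 16)
assert ok and all(regs[r] == v for r, v in EXPECTED.items())
print(steps)  # 16 executed instructions
```

Running the hardened version of the same program through `run` and dividing the two step counts gives the execution-time overhead; passing `flip=(4, 1, 0)` corrupts the multiplicand and produces an SDC (result 8 instead of 12).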


Reliability Evaluation Tool

Reliability Evaluation Tool: FT-Unshades

SEU emulation tool: FT-Unshades [Guzman-Miranda et al., 2009, Napoles et al., 2007]
- FPGA-based platform for reliability evaluation
- Emulates bit-flips in the real implementation of the system by means of partial reconfiguration
- Smart Table: an FT-Unshades extension for the study of microprocessor architectures

Figure: (a) Smart Table; (b) FT-Unshades. Blocks shown: target MUT, counters, smart controller, and comparator (COMP), with the inputs and outputs of the MUT.


Reliability Evaluation Tool

Reliability Evaluation Tool: FT-Unshades

FT-Unshades is used by the European Space Agency (ESA).


Picoblaze

Case study: Picoblaze

Compiler front-end and back-end for Picoblaze

Picoblaze:
- 8-bit soft microprocessor widely used in FPGA-based systems
- Strong restrictions on program memory size and performance

An RTL version of Picoblaze was developed


Picoblaze

Benchmark suite:
- Bubble sort (bub)
- Scalar division (div)
- Fibonacci (fib)
- Greatest common divisor (gcd)
- Matrix addition (madd)
- Scalar multiplication (mult)
- Matrix multiplication (mmult)
- Exponentiation (pow)


Software-based technique: SWIFT-R

SWIFT-R

Software protection: SWIFT-R [Reis et al., 2007]

A TMR-based method aimed at recovering from faults in the data section:
- Build the control flow graph
- Triplicate data after inSoR instructions
- Triplicate instructions using the redundant data
- Insert majority voters and recovery procedures before outSoR instructions and before conditional branches
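The steps above can be sketched as a source-to-source pass over a toy three-operand instruction list. This illustrates the SWIFT-R idea under simplifying assumptions and is not the Hardener's implementation: registers are strings, replicas are named with primes, `VOTE` is a hypothetical pseudo-instruction standing for the majority voter plus recovery, control-flow-graph construction is omitted, address operands are assumed to be constants, and the mnemonic sets are hypothetical.

```python
IN_SOR = {"INPUT", "FETCH", "LOAD"}
OUT_SOR = {"OUTPUT", "STORE"}
COND_BRANCH = {"JNZ"}  # hypothetical: branch if register != 0


def copies(operand):
    """Return the three replicas of a register, or a constant unchanged."""
    if isinstance(operand, str):  # registers are strings, e.g. "s0"
        return [operand, operand + "'", operand + "''"]
    return [operand] * 3


def swift_r(program):
    """Apply SWIFT-R-style triplication to a list of (op, dst, src) tuples."""
    hardened = []
    for op, dst, src in program:
        if op in OUT_SOR or op in COND_BRANCH:
            # Majority-vote the register leaving the SoR (or deciding a branch)
            # and write the winner back to all three copies (recovery).
            hardened.append(("VOTE", dst, None))
            hardened.append((op, dst, src))
        elif op in IN_SOR:
            # Data entering the SoR is loaded once and then triplicated.
            hardened.append((op, dst, src))
            d0, d1, d2 = copies(dst)
            hardened += [("MOV", d1, d0), ("MOV", d2, d0)]
        else:
            # Internal instruction: execute it once per replica set.
            for d, s in zip(copies(dst), copies(src)):
                hardened.append((op, d, s))
    return hardened


for ins in swift_r([("INPUT", "s0", 1), ("ADD", "s0", "s1"), ("STORE", "s0", 0x20)]):
    print(ins)
```

The three-instruction input grows to eight instructions, which hints at why the code and execution time overheads reported next are substantial.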


Software-based technique: SWIFT-R

Code and Execution Time Overheads

Figure: normalized code overhead and execution time overhead of SWIFT-R (y-axis from 1.0 to 3.5) for each benchmark (bub, div, fib, gcd, madd, mmult, mult, pow) and their geometric mean (GeoMean).
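Each bar is an overhead normalized to the unhardened program (hardened value divided by original value), and GeoMean summarizes the benchmarks with a geometric mean, which is the usual average for ratios. A minimal sketch of that arithmetic follows; the helper names are hypothetical and the input numbers are placeholders, not data from the chart.

```python
import math


def normalized_overhead(hardened: float, original: float) -> float:
    """Hardened metric (code size or executed cycles) over the original one."""
    return hardened / original


def geometric_mean(values) -> float:
    """n-th root of the product of the values, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Placeholder ratios, not measured data: 2.0x on one program, 8.0x on another.
print(geometric_mean([2.0, 8.0]))  # approximately 4.0
```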


Hardware-based technique: TMR

TMR: Triple Modular Redundancy

Hardware protection: TMR

The fault-tolerant co-design strategy was complemented by incrementally hardening the microprocessor resources. Five microprocessor versions were developed:
- P0: non-hardened RTL Picoblaze
- P1: hardware redundancy for the Program Counter (PC), Flags, and Stack Pointer (SP)
- P2: all pipeline registers protected
- P3: hardware redundancy for PC, Flags, SP, and pipeline
- P4: fully protected
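At the behavioural level, each register set hardened in P1 to P4 can be pictured as three copies feeding a bitwise majority voter. The model below is a sketch under that assumption with a hypothetical class name; whether the RTL versions include voter feedback (the `refresh` method) is not stated here, so treat it as an optional variant.

```python
class TMRRegister:
    """Behavioural model of an n-bit register triplicated with a majority voter."""

    def __init__(self, value: int = 0, width: int = 8):
        self.mask = (1 << width) - 1
        self.copies = [value & self.mask] * 3

    def write(self, value: int) -> None:
        self.copies = [value & self.mask] * 3

    def read(self) -> int:
        # Bitwise 2-out-of-3 majority: masks any upset confined to one copy.
        a, b, c = self.copies
        return (a & b) | (a & c) | (b & c)

    def refresh(self) -> None:
        # Optional voter feedback: rewrite the voted value into every copy so
        # that upsets do not accumulate across clock cycles.
        self.write(self.read())

    def upset(self, copy: int, bit: int) -> None:
        self.copies[copy] ^= 1 << bit  # emulated SEU in one flip-flop


pc = TMRRegister(0x5A)
pc.upset(1, 3)
assert pc.read() == 0x5A  # the single upset is masked
```

The model also shows the hardware cost: every protected bit needs three flip-flops plus voting logic, which is what the normalized hardware cost chart later in the evaluation measures.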


Prototypes evaluation

Experimental setup

Reliability evaluation: experimental setup for FT-Unshades
- For each prototype: a fault injection campaign with selective attacks against the microprocessor register sets: register file, PC, Flags, SP, and pipeline
- For each of these register sets, 5,000 SEUs were injected (one per execution)
- Each bit-flip occurs in a randomly selected clock cycle
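The structure of such a campaign is a simple loop, sketched below. `run_with_seu` is a hypothetical hook standing in for one FT-Unshades (or ISS) execution, and `n_cycles`, the length of the workload in clock cycles, is an assumed parameter.

```python
import random
from collections import Counter

REGISTER_SETS = ["register file", "PC", "Flags", "SP", "pipeline"]


def campaign(run_with_seu, targets=REGISTER_SETS, n_faults=5000,
             n_cycles=1000, seed=0):
    """Inject n_faults SEUs per target, one per execution, each in a random cycle.

    run_with_seu(target, cycle, rng) is a hypothetical hook that performs one
    execution with a single bit-flip in `target` at `cycle` and returns its
    classification ("unACE", "SDC" or "Hang").
    Returns {target: {class: percentage}}.
    """
    rng = random.Random(seed)
    results = {}
    for target in targets:
        counts = Counter()
        for _ in range(n_faults):
            cycle = rng.randrange(n_cycles)  # randomly selected clock cycle
            counts[run_with_seu(target, cycle, rng)] += 1
        results[target] = {cls: 100.0 * n / n_faults for cls, n in counts.items()}
    return results
```

For a dry run of the harness, a stub such as `lambda target, cycle, rng: "unACE"` reports 100% unACE for every register set.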


Prototypes evaluation

Reliability evaluation using FT-Unshades

Figure: fault classification percentages (unACE, SDC, Hang; y-axis from 70% to 100%) for every test program (bub, div, fib, gcd, madd, mmult, mult, pow, and the average), non-hardened (O) and SWIFT-R (H), running on each processor version (P0 to P3).


Prototypes evaluation

Reliability vs Hardware Costs

Figure: normalized hardware cost and percentage of unACE faults per microprocessor approach (P0 to P4). Axes: unACE faults percentage [%] (88.0 to 100.0) and normalized hardware cost (1.00 to 3.00). Series: normalized Xilinx primitives cost, normalized flip-flops/latches cost, normalized RAMs cost, % unACE faults for non-hardened programs, and % unACE faults for SWIFT-R programs.


Conclusions and Future Work


- We have presented a rapid prototyping approach for the design of radiation-tolerant embedded systems using FPGAs
- This approach is supported by a flexible hardening platform, which makes it easy to represent several trade-offs among design constraints, reliability, performance, and cost
- The rapid prototyping strategy allows designers to easily explore the design space between hardware-only and software-only fault-tolerance techniques
- As a case study, several fault-tolerant prototypes based on an RTL implementation of PicoBlaze have been developed and evaluated
- The infrastructure will be extended to support 32-bit microprocessors, such as MicroBlaze and Leon3


Conclusions and Future Work

Thank you for your attention!

September 1st 2010, FPL 2010, Milano, Italy

References

Austin, T. (1999). DIVA: A reliable substrate for deep submicron microarchitecture design. In 32nd Annual International Symposium on Microarchitecture (MICRO-32), pages 196–207. Haifa, Israel, Nov 16-18, 1999.

Baumann, R. (2002). Soft errors in commercial semiconductor technology: Overview and scaling trends. IEEE 2002 Reliability Physics Tutorial Notes, Reliability Fundamentals, page 121.

Baumann, R. (2005). Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans. on Device and Materials Reliability, 5(3):305–316.

Bernardi, P., Bolzani, L., Rebaudengo, M., Reorda, M., Vargas, F., and Violante, M. (2006). A new hybrid fault detection technique for systems-on-a-chip. IEEE Transactions on Computers, 55(2):185–198.

Gomaa, M., Scarbrough, C., Vijaykumar, T., and Pomeranz, I. (2003). Transient-fault recovery for chip multiprocessors. IEEE MICRO, 23(6):76–83.

Guzman-Miranda, H., Aguirre, M., and Tombs, J. (2009). Noninvasive fault classification, robustness and recovery time measurement in microprocessor-type architectures subjected to radiation-induced errors. IEEE Transactions on Instrumentation and Measurement, 58(5).

Mahmood, A. and McCluskey, E. (1988). Concurrent error detection using watchdog processors - a survey. IEEE Transactions on Computers, 37(2):160–174.

Mukherjee, S., Kontz, M., and Reinhardt, S. (2002). Detailed design and evaluation of Redundant Multithreading alternatives. In 29th Annual International Symposium on Computer Architecture, pages 99–110. Anchorage, AK, May 25-29, 2002.

Napoles, J., Guzman, H., Aguirre, M., Tombs, J., Munoz, F., Baena, V., Torralba, A., and Franquelo, L. (2007). Radiation environment emulation for VLSI designs: A low cost platform based on Xilinx FPGAs. In IEEE International Symposium on Industrial Electronics, ISIE 2007.

Oh, N., Shirvani, P., and McCluskey, E. J. (2002a). Control-flow checking by software signatures. IEEE Transactions on Reliability, 51(1).

Oh, N., Shirvani, P. P., and McCluskey, E. J. (2002b). Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability, 51(1).

Rebaudengo, M., Reorda, M. S., Violante, M., and Torchiano, M. (2001). A source-to-source compiler for generating dependable software. First IEEE International Workshop on Source Code Analysis and Manipulation, Proceedings, pages 33–42.

Reinhardt, S. and Mukherjee, S. (2000). Transient fault detection via simultaneous multithreading. In 27th International Symposium on Computer Architecture, pages 25–36. Vancouver, Canada, Jun 12-14, 2000.

Reis, G., Chang, J., Vachharajani, N., Mukherjee, S., Rangan, R., and August, D. (2005a). Design and evaluation of hybrid fault-detection systems. In 32nd International Symposium on Computer Architecture, Proceedings, pages 148–159. Madison, WI, Jun 04-08, 2005.

Reis, G. A., Chang, J., and August, D. I. (2007). Automatic instruction-level software-only recovery. IEEE Micro, 27(1):36–47.

Reis, G. A., Chang, J., Vachharajani, N., Rangan, R., and August, D. I. (2005b). SWIFT: software implemented fault tolerance. CGO 2005: Int Symposium on Code Generation and Optimization, pages 243–254.