Introduction Platform for the rapid prototyping of dependable ES Case Study Conclusions
Rapid Prototyping of Radiation-Tolerant Embedded Systems on FPGA
F.Restrepo-Calle1 A.Martínez-Álvarez1 F.R.Palomo2
H.Guzmán-Miranda2 M.A.Aguirre2 S.Cuenca-Asensi1
1 Computer Technology Department, University of Alicante, Spain
2 Department of Electrical Engineering, University of Sevilla, Spain
FPL 2010, August 31 - September 2, 2010
Milano, Italy
Outline
1 Introduction
  Problem
  Solutions for Embedded Systems
  Objectives
  Fault Model and Terminology
2 Platform for the rapid prototyping of dependable ES
  Platform for fault-tolerant co-design
  Software Hardening Environment
  Reliability Evaluation Tool
3 Case Study
  Picoblaze
  Software-based technique: SWIFT-R
  Hardware-based technique: TMR
  Prototypes evaluation
4 Conclusions and Future Work
Problem
Miniaturization of electronic components
- Advantage: increased microprocessor performance
- Disadvantage: components are more prone to transient faults [Baumann, 2005]
Transient faults (soft errors)
- Induced by radiation, through the ionization caused by an incident charged particle
- Do not cause permanent damage
- Can alter signal transfers or stored values, provoking errors in the system
- Occur in space, in the atmosphere, and even at ground level [Baumann, 2002]
Increasing concern about fault mitigation in mission-critical, security, and safety systems
Solutions for Embedded Systems
Hardware-based techniques
Usual solution: hardware redundancy
- Low-level structures: ECC, parity bits, TMR
- More complex components: co-processors [Mahmood and McCluskey, 1988], functional units [Austin, 1999], ...
- Exploiting redundancy in multi-thread/multi-core architectures [Gomaa et al., 2003, Mukherjee et al., 2002]
Advantage: highly effective solution
Disadvantage: very costly!
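To make the TMR idea above concrete, the bitwise 2-of-3 majority voter at its heart can be modelled in a few lines. This is a minimal software sketch of the voting logic only, not a hardware description; the function name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a value.

    Each output bit is 1 when at least two of the three input bits are 1,
    so a single corrupted copy is outvoted by the other two.
    """
    return (a & b) | (a & c) | (b & c)


# A single bit-flip in one copy (bit 3 of the second copy) is masked.
value = 0b1010_0110
corrupted = value ^ (1 << 3)
print(bin(tmr_vote(value, corrupted, value)))  # prints 0b10100110
```

The same expression, replicated per bit, is what a hardware voter computes; its cost (three copies plus voting logic) is the reason the slide lists hardware redundancy as very costly.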
Solutions for Embedded Systems
Software-based techniques
Several proposals based on redundant software in recent years
- Some examples: EDDI [Oh et al., 2002b], CFCSS [Oh et al., 2002a], SWIFT [Reis et al., 2005b], ARBT [Rebaudengo et al., 2001], ...
- Advantage: low cost with acceptable reliability
- Disadvantages: increased code size and execution time
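The duplication idea behind techniques such as EDDI can be illustrated at a high level: the computation is performed twice on separate variables and the two results are compared before the value leaves the processor. The sketch below is a Python illustration under that assumption, not the instruction-level transformation those papers describe; all names (`SoftErrorDetected`, `checked_store`, `scaled_sum`) are hypothetical.

```python
class SoftErrorDetected(Exception):
    """Raised when the master and shadow copies of a value disagree."""


def checked_store(memory: dict, addr: int, master: int, shadow: int) -> None:
    # Compare the duplicated results just before the value leaves the
    # processor; EDDI-style schemes place such checks before stores.
    if master != shadow:
        raise SoftErrorDetected(f"mismatch before store to {addr}: {master} != {shadow}")
    memory[addr] = master


def scaled_sum(xs, k, memory, addr, flip_shadow_bit=None):
    """Compute sum(k * x) twice, on separate variables, then store it checked."""
    acc, acc_shadow = 0, 0
    for x in xs:
        acc += k * x         # master instruction stream
        acc_shadow += k * x  # duplicated instruction stream
    if flip_shadow_bit is not None:
        acc_shadow ^= 1 << flip_shadow_bit  # simulated SEU in the shadow copy
    checked_store(memory, addr, acc, acc_shadow)
```

Duplication only detects the error; recovering from it needs a third copy and a vote, which is what TMR-based software techniques add at the price of further code and time overhead.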
Solutions for Embedded Systems
Hybrid Hardware/Software approaches
In many cases the optimal solution is an intermediate point that combines software and hardware protection approaches: HW/SW fault-tolerant co-design
- Examples: [Bernardi et al., 2006, Reis et al., 2005a]
- Suitable tools are needed to easily explore the design space and find the best trade-off between design constraints and reliability requirements
- Growing use of FPGAs to prototype ASICs as part of an ASIC verification methodology
Objectives
So, in this paper, we present...
A rapid prototyping approach for radiation-tolerant embedded systems design
- FPGAs are used as the development and verification platform to produce HW/SW systems that best meet the design and dependability constraints
- Mitigation techniques are applied at a high abstraction level, so the final deployment platform can be either an ASIC or an FPGA
- Supported by a hardening platform made up of:
  - Software Development Environment: to implement, automatically apply, and evaluate software-only fault-tolerant techniques
  - FT-Unshades: FPGA-based fault emulation tool to assess several reliability metrics
Fault Model and Terminology
Fault model: Single Event Upset (SEU) [Rebaudengo et al., 2001]
- Only one bit-flip occurs in a storage cell during program execution
- Widely used because it matches real fault behavior
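Under the SEU model above, a fault is fully described by which storage cell flips. A minimal sketch, assuming the target is a register file of equally wide registers (for example sixteen 8-bit registers, as in Picoblaze); the function name `inject_seu` is hypothetical.

```python
import random


def inject_seu(register_file: list[int], width: int, rng: random.Random) -> tuple[int, int]:
    """Flip exactly one randomly chosen bit in one randomly chosen register.

    Models the single event upset (SEU) fault model: a single bit-flip in a
    single storage cell. Returns (register index, bit position) of the upset.
    """
    reg = rng.randrange(len(register_file))
    bit = rng.randrange(width)
    register_file[reg] ^= 1 << bit  # XOR toggles exactly that bit
    return reg, bit
```

Passing an explicit `random.Random` instance keeps a campaign reproducible: the same seed yields the same sequence of upsets.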
Fault classification [Mukherjee et al., 2002]
According to their effect on program behavior:
- unACE: fault in an unACE bit (one not required for Architecturally Correct Execution), i.e., the program finishes and produces the expected results
- SDC: Silent Data Corruption, i.e., the program finishes with incorrect results
- Hang: abnormal program termination or infinite loop
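The three classes above can be assigned mechanically after each faulty run by comparing it against a fault-free (golden) run. A minimal sketch, assuming the caller reports whether the run finished within a cycle budget (a timeout standing in for infinite-loop detection); `Outcome` and `classify` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    UNACE = "unACE"  # finished with the expected results
    SDC = "SDC"      # finished with incorrect results (silent data corruption)
    HANG = "Hang"    # abnormal termination or infinite loop


def classify(finished: bool, result, expected) -> Outcome:
    """Classify one faulty run by its effect on program behavior.

    `finished` is assumed False when the run terminated abnormally or
    exceeded its cycle budget; `expected` is the golden run's result.
    """
    if not finished:
        return Outcome.HANG
    return Outcome.UNACE if result == expected else Outcome.SDC
```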
Platform for fault-tolerant co-design
HW/SW Fault-Tolerant Co-design
[Figure: HW/SW fault-tolerant co-design. The designer works with SW design tools, HW design tools, and reliability evaluation tools, driven by design constraints and reliability requirements]
Platform for fault-tolerant co-design
HW/SW Fault-Tolerant Co-design - Proposed tools
[Figure: proposed tools. Our tools: Software Hardening Environment and FT-Unshades; third-party tools: HW design tools. The designer uses them driven by design constraints and reliability requirements]
Software Hardening Environment
General scheme
[Figure: general scheme. Original source code (Arch. 1 ... Arch. n) → compiler front-ends → Generic Instruction Flow (GIF) → Generic Hardening Core (GH-Core: Hardener and Simulator) → Hardened Generic Instruction Flow (HGIF) → compiler back-ends → hardened source code (Arch. 1 ... Arch. n)]
Software Hardening Environment
Advantages
Main advantages: being based on a Generic Architecture makes it possible...
- to handle multiple microprocessors
- to provide a uniform hardening core
- to re-target the output to any supported microprocessor
Automatic code transformation based on rules (on assembler code)
Conceived to implement a wide suite of techniques
Software Hardening Environment
Generic Architecture
Three main topics: Generic Instruction, Memory Management, and Control Flow Graph
Generic Instruction fields: Address, Mnemonic, Generic Operator List, Affected Generic Flag List, Instruction Type, Tool message
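The generic instruction fields listed above map naturally onto a record type. A minimal sketch, assuming simple Python types for each field; the class names and the `InstructionType` categories are hypothetical and only illustrate the shape of such a record, not the tool's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class InstructionType(Enum):
    # Hypothetical coarse categories, for illustration only.
    ARITHMETIC = auto()
    LOGIC = auto()
    MEMORY = auto()
    IO = auto()
    BRANCH = auto()
    OTHER = auto()


@dataclass
class GenericInstruction:
    address: int                                             # position in program memory
    mnemonic: str                                            # e.g. "ADD"
    operators: list[str] = field(default_factory=list)       # generic operator list
    affected_flags: list[str] = field(default_factory=list)  # affected generic flag list
    itype: InstructionType = InstructionType.OTHER           # instruction type
    tool_message: str = ""                                   # annotation left by the tools
```

For example, a Picoblaze-style `ADD s0, s1` would become `GenericInstruction(0x10, "ADD", ["s0", "s1"], ["C", "Z"], InstructionType.ARITHMETIC)`, recording that it affects the carry and zero flags.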
Memory Management: identification of the memory map (and memory sections); update of the memory map when code is inserted: Dilation, Displacement, Reallocation
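Inserting hardening code shifts everything after the insertion point, so both section boundaries and branch targets must be updated. The sketch below shows one plausible version of that bookkeeping; it does not claim to reproduce how the tool defines dilation, displacement, and reallocation. Assumptions: a program is a list of `(mnemonic, target)` pairs indexed by address, inserted instructions already carry final addresses, and `Section` and `insert_code` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    start: int
    size: int


def insert_code(program, sections, at, new_instrs):
    """Insert `new_instrs` before address `at` and update the memory map.

    Branch targets beyond `at` move with their instructions; a branch to
    `at` itself now lands on the first inserted instruction (useful when,
    e.g., a voter is inserted in front of a store).
    """
    k = len(new_instrs)
    fixed = [(m, t + k if t is not None and t > at else t) for m, t in program]
    result = fixed[:at] + list(new_instrs) + fixed[at:]  # later code shifts by k
    for s in sections:
        if s.start <= at < s.start + s.size:
            s.size += k   # the section receiving the code grows
        elif s.start >= at:
            s.start += k  # sections after the insertion point move
    return result
```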
Control Flow Graph:
[Figure: example control flow graph with five nodes]
Node 1: {I1, I2, I3, I4, I5}
Node 2: {I6, I7, I8}
Node 3: {I9, I10}
Node 4: {I11, I12, I13}
Node 5: {I14}
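Nodes like these are basic blocks, and the classic "leaders" rule is one standard way to compute them: the first instruction, every branch target, and every instruction following a branch start a new node. A minimal sketch, assuming instructions are `(mnemonic, target)` pairs and that `JUMP`, `CALL`, and `RETURN` are the control-transfer mnemonics; it builds the nodes only, since the edges follow from branch targets and fall-through.

```python
BRANCHES = {"JUMP", "CALL", "RETURN"}  # assumed control-transfer mnemonics


def basic_blocks(instrs):
    """Split a linear instruction list into basic blocks (CFG nodes).

    Returns a list of blocks, each a list of instruction indices.
    """
    n = len(instrs)
    leaders = {0} if n else set()
    for i, (mnemonic, target) in enumerate(instrs):
        if mnemonic in BRANCHES:
            if target is not None and 0 <= target < n:
                leaders.add(target)  # a branch target starts a node
            if i + 1 < n:
                leaders.add(i + 1)   # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [n]
    return [list(range(s, e)) for s, e in zip(starts, ends)]
```

With 14 instructions where I5 jumps to I11 and I8 jumps to I14, this yields blocks of sizes 5, 3, 2, 3, 1, matching the node sizes in the example above.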
Software Hardening Environment
Generic Architecture
Sphere of Replication (SoR) [Reinhardt and Mukherjee, 2000]
- Logical domain of redundant execution
- Instruction classification (for hardening): inSoR and outSoR
- We have applied this concept in a flexible way: an SoR with flexible frontiers facilitates the implementation of selective software protection
inSoR instructions: input port, read from memory, load a value
outSoR instructions: output port, write into memory
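The classification above can be expressed as a lookup on the mnemonic. A minimal sketch, assuming the slide's categories map onto Picoblaze mnemonics as INPUT, FETCH, and LOAD (inSoR) and OUTPUT and STORE (outSoR); that mapping and the label "internal" are assumptions. Making the two sets parameters models the flexible frontiers: moving an instruction out of a set changes what is treated as crossing the SoR boundary.

```python
# Assumed mapping of the slide's categories onto Picoblaze mnemonics.
IN_SOR = frozenset({"INPUT", "FETCH", "LOAD"})  # input port, read from memory, load a value
OUT_SOR = frozenset({"OUTPUT", "STORE"})        # output port, write into memory


def sor_class(mnemonic: str, in_sor=IN_SOR, out_sor=OUT_SOR) -> str:
    """Classify an instruction relative to a (configurable) sphere of replication.

    Classification is by mnemonic only; e.g. a register-to-register LOAD is
    still treated as inSoR in this simplified sketch.
    """
    m = mnemonic.upper()
    if m in in_sor:
        return "inSoR"
    if m in out_sor:
        return "outSoR"
    return "internal"
```

For instance, if the scratchpad memory is assumed protected by other means, calling `sor_class("STORE", out_sor={"OUTPUT"})` treats stores as internal, so no voter would be placed before them.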
Software Hardening Environment
Generic Architecture: Control Flow Analysis
Nodes and subnodes: every node is subdivided into subnodes after each outSoR instruction
[Figure: Node 1 = {I1, I2, I3: STORE, I4, I5} is split after the STORE (an outSoR instruction) into Subnode 1 = {I1, I2, I3: STORE} and Subnode 2 = {I4, I5}]
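The subdivision rule above closes a subnode immediately after every outSoR instruction. A minimal sketch, assuming a node is a list of assembly lines whose first token is the mnemonic and that OUTPUT and STORE are the outSoR mnemonics; `split_into_subnodes` is a hypothetical name.

```python
def split_into_subnodes(node, out_sor=frozenset({"OUTPUT", "STORE"})):
    """Split a CFG node after each outSoR instruction.

    `node` is a list of non-empty assembly lines; the outSoR instruction
    stays at the end of the subnode it closes.
    """
    subnodes, current = [], []
    for instr in node:
        current.append(instr)
        if instr.split()[0].upper() in out_sor:
            subnodes.append(current)  # close the subnode after the outSoR
            current = []
    if current:
        subnodes.append(current)      # trailing instructions, if any
    return subnodes
```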
Software Hardening Environment
Generic Hardening Core
Generic Hardening Core: Hardener
Two components: Hardener and Instruction Set Simulator
Hardener: tool for the design of software-based techniques
- API of hardening routines
- Flexible and easy to extend
Allows these techniques to be applied automatically:
- Receives a GIF and produces the Hardened GIF (HGIF)
- User control from the command line
- Options: method, mcpu, redundancy level, replication times, voter, ...
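One common way to make a set of hardening routines "flexible and easy to extend" is a registry keyed by technique name, so a new technique is added without touching the driver. The sketch below assumes that design; `ROUTINES`, `routine`, `harden`, and the toy `replicate` technique are all hypothetical, with `method` and `times` loosely mirroring the slide's "method" and "replication times" options.

```python
from typing import Callable

# Registry of hardening routines, keyed by technique name (hypothetical API).
ROUTINES: dict[str, Callable[[list[str], dict], list[str]]] = {}


def routine(name: str):
    """Decorator that registers a hardening routine under `name`."""
    def register(fn):
        ROUTINES[name] = fn
        return fn
    return register


@routine("replicate")
def replicate(gif: list[str], options: dict) -> list[str]:
    # Toy technique: repeat every instruction `times` times. A real technique
    # would rename registers and insert checks; this only shows the plumbing.
    times = options.get("times", 2)
    return [instr for instr in gif for _ in range(times)]


def harden(gif: list[str], method: str, **options) -> list[str]:
    """Apply the routine selected by `method` to a GIF and return the HGIF."""
    try:
        fn = ROUTINES[method]
    except KeyError:
        raise ValueError(f"unknown hardening method: {method}") from None
    return fn(gif, options)
```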
Software Hardening Environment
Generic Hardening Core
Generic Hardening Core: ISS
Instruction Set Simulator (ISS)
- Simulates the execution of the GIF
- Checks the functionality of the original and hardened code against the expected results, given as custom pragmas
- Outputs useful information (code and execution time overheads, program characterization, ...)
- Simulates SEU faults for a preliminary evaluation of reliability
Reliability Evaluation Tool
Reliability Evaluation Tool: FT-Unshades
SEU emulation tool: FT-Unshades [Guzman-Miranda et al., 2009, Napoles et al., 2007]
- FPGA-based platform for reliability evaluation
- Emulates bit-flips in the real implementation of the system by means of partial reconfiguration
- Smart Table: FT-Unshades extension for the study of microprocessor architectures
[Figure: (a) Smart Table; (b) FT-Unshades. Blocks shown include the target MUT, two counters, the smart controller, a comparator (COMP), and the inputs and outputs]
Reliability Evaluation Tool
Reliability Evaluation Tool: FT-Unshades
... is used by the European Space Agency (ESA)
Picoblaze
Case study: Picoblaze
- Compiler front-end and back-end for Picoblaze
- Picoblaze: 8-bit soft microprocessor widely used in FPGA-based systems, with strong restrictions on program memory size and performance
- An RTL version of Picoblaze was developed
Picoblaze
Benchmark suite
- Bubble sort (bub)
- Scalar division (div)
- Fibonacci (fib)
- Greatest common divisor (gcd)
- Matrix addition (madd)
- Scalar multiplication (mult)
- Matrix multiplication (mmult)
- Exponentiation (pow)
Software-based technique: SWIFT-R
SWIFT-R
Software protection: SWIFT-R [Reis et al., 2007]
- TMR-based method aimed at recovering from faults in the data section
- Build the control flow graph
- Triplicate data after inSoR instructions
- Triplicate instructions, using the redundant data
- Insert majority voters and recovery procedures before outSoR instructions and before conditional branches
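The transformation steps above can be sketched on a toy three-address IR. This is an illustration of the SWIFT-R idea under simplifying assumptions, not the tool's implementation: each instruction is `(op, dst, srcs)`, copies of register `r` are named `r_1` and `r_2`, flags are not modelled, every internal instruction has a destination register, and a hypothetical `vote r` instruction writes the majority of `r`, `r_1`, `r_2` back into all three (voting plus recovery).

```python
IN_SOR = {"input", "load"}    # data entering the sphere of replication
OUT_SOR = {"output", "store"} # data leaving it
BRANCH = {"branch"}           # conditional branches


def replica(reg: str, k: int) -> str:
    return f"{reg}_{k}"


def swift_r(program):
    """Toy SWIFT-R-style transformation (triplication plus voting)."""
    out = []
    for op, dst, srcs in program:
        if op in OUT_SOR or op in BRANCH:
            # Vote (and repair) every source before it leaves the SoR
            # or decides a conditional branch.
            for r in srcs:
                out.append(("vote", r, (r, replica(r, 1), replica(r, 2))))
            out.append((op, dst, srcs))
        elif op in IN_SOR:
            # Triplicate data right after it enters the SoR.
            out.append((op, dst, srcs))
            out.append(("mov", replica(dst, 1), (dst,)))
            out.append(("mov", replica(dst, 2), (dst,)))
        else:
            # Internal instruction: execute it once per copy of the data.
            out.append((op, dst, srcs))
            for k in (1, 2):
                out.append((op, replica(dst, k), tuple(replica(r, k) for r in srcs)))
    return out
```

Even this toy shows where the code and execution time overheads come from: a four-instruction program reading two inputs, adding them, and writing the sum grows to eleven instructions.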
Software-based technique: SWIFT-R
Code and Execution Time Overheads
[Figure: SWIFT-R code overhead and execution time overhead, normalized (y-axis from 1.0 to 3.5), for bub, div, fib, gcd, madd, mmult, mult, pow, and their geometric mean (GeoMean)]
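The slide does not define the plotted quantities. Assuming each normalized overhead is the ratio between the hardened and the original program, and that GeoMean is the geometric mean over the n = 8 benchmarks, for benchmark i with code size S_i and execution time T_i:

```latex
\mathrm{CO}_i = \frac{S_i^{\mathrm{hard}}}{S_i^{\mathrm{orig}}}, \qquad
\mathrm{ETO}_i = \frac{T_i^{\mathrm{hard}}}{T_i^{\mathrm{orig}}}, \qquad
\mathrm{GeoMean}(x) = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}, \quad n = 8
```

The geometric mean is a common choice for averaging ratios because the mean of the inverse ratios is exactly the inverse of the mean, so the summary does not depend on which version is taken as the reference.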
Hardware-based technique: TMR
TMR: Triple Modular Redundancy
Hardware protection: TMR
- The fault-tolerant co-design strategy was complemented by incrementally hardening the microprocessor resources
- Five microprocessor versions were developed:
  - P0: non-hardened RTL Picoblaze
  - P1: hardware redundancy for the Program Counter (PC), Flags, and Stack Pointer (SP)
  - P2: all pipeline registers protected
  - P3: hardware redundancy for PC, Flags, SP, and pipeline
  - P4: fully protected
Prototypes evaluation
Experimental setup
Reliability evaluation: experimental setup for FT-Unshades
- For each prototype: a fault injection campaign with selective attacks against the microprocessor register sets: register file, PC, Flags, SP, and pipeline
- For each of these register sets, 5,000 SEUs were injected (one per execution)
- Each bit-flip occurs in a randomly selected clock cycle
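The campaign structure above (one SEU per execution, random clock cycle, results tallied per class) can be sketched independently of the emulator. Assumptions: the caller supplies a hypothetical `run_with_fault(cycle, bit)` callback that runs the program once with the given bit of the attacked register set flipped at the given cycle and returns "unACE", "SDC", or "Hang"; `width` is the total number of bits in that register set. This models the procedure only; it does not describe FT-Unshades internals.

```python
import random
from collections import Counter


def run_campaign(run_with_fault, n_cycles, width, n_injections=5000, seed=1):
    """Emulated SEU campaign against one register set.

    Injects one bit-flip per execution at a random clock cycle and a random
    bit of the attacked set, and returns the percentage of each outcome.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_injections):
        cycle = rng.randrange(n_cycles)  # randomly selected clock cycle
        bit = rng.randrange(width)       # randomly selected bit of the set
        counts[run_with_fault(cycle, bit)] += 1
    return {k: 100.0 * v / n_injections for k, v in counts.items()}
```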
Prototypes evaluation
Reliability evaluation using FT-Unshades
[Figure: stacked unACE, SDC, and Hang percentages (y-axis from 70% to 100%) for bub, div, fib, gcd, madd, mmult, mult, pow, and their average]
Fault classification percentages for every test program, non-hardened (O) and SWIFT-R (H), running on each processor version (P0 to P3)
Prototypes evaluation
Reliability vs Hardware Costs
[Figure: x-axis: microprocessor approaches P0 to P4; axes: unACE faults percentage [%] (88 to 100) and normalized hardware cost (1.00 to 3.00). Series: normalized Xilinx primitives cost, normalized flip-flops/latches cost, normalized RAMs cost, % unACE faults for non-hardened programs, % unACE faults for SWIFT-R programs]
Normalized hardware cost and percentage of unACE faults per microprocessor
Conclusions and Future Work
Conclusions and Future Work
- We have presented a rapid prototyping approach for the design of radiation-tolerant embedded systems using FPGAs
- This approach is supported by a flexible hardening platform, which facilitates the analysis of several trade-offs among design constraints, reliability, performance, and costs
- The rapid prototyping strategy allows designers to easily explore the design space between hardware-only and software-only fault-tolerance techniques
- As a case study, several fault-tolerant prototypes based on an RTL implementation of PicoBlaze have been developed and evaluated
- The infrastructure will be extended to support 32-bit microprocessors, such as Microblaze and Leon3
Conclusions and Future Work
Thank you for your attention!
September 1st 2010, FPL 2010, Milano, Italy
References
Austin, T. (1999). DIVA: A reliable substrate for deep submicron microarchitecture design. In 32nd Annual International Symposium on Microarchitecture (MICRO-32), pages 196–207. Haifa, Israel, Nov 16-18, 1999.
Baumann, R. (2002). Soft errors in commercial semiconductor technology: Overview and scaling trends. IEEE 2002 Reliability Physics Tutorial Notes, Reliability Fundamentals, page 121.
Baumann, R. (2005). Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans. on Device and Materials Reliability, 5(3):305–316.
Bernardi, P., Bolzani, L., Rebaudengo, M., Reorda, M., Vargas, F., and Violante, M. (2006). A new hybrid fault detection technique for systems-on-a-chip. IEEE Transactions on Computers, 55(2):185–198.
Gomaa, M., Scarbrough, C., Vijaykumar, T., and Pomeranz, I. (2003). Transient-fault recovery for chip multiprocessors. IEEE MICRO, 23(6):76–83.
Guzman-Miranda, H., Aguirre, M., and Tombs, J. (2009). Noninvasive fault classification, robustness and recovery time measurement in microprocessor-type architectures subjected to radiation-induced errors. IEEE Transactions on Instrumentation and Measurement, 58(5).
Mahmood, A. and McCluskey, E. (1988). Concurrent error detection using watchdog processors: A survey. IEEE Transactions on Computers, 37(2):160–174.
Mukherjee, S., Kontz, M., and Reinhardt, S. (2002). Detailed design and evaluation of Redundant Multithreading alternatives. In 29th Annual International Symposium on Computer Architecture, pages 99–110. Anchorage, AK, May 25-29, 2002.
Napoles, J., Guzman, H., Aguirre, M., Tombs, J., Munoz, F., Baena, V., Torralba, A., and Franquelo, L. (2007). Radiation environment emulation for VLSI designs: A low cost platform based on Xilinx FPGAs. In IEEE International Symposium on Industrial Electronics (ISIE 2007).
Oh, N., Shirvani, P., and McCluskey, E. J. (2002a). Control-flow checking by software signatures. IEEE Transactions on Reliability, 51(1).
Oh, N., Shirvani, P. P., and McCluskey, E. J. (2002b). Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability, 51(1).
Rebaudengo, M., Reorda, M. S., Violante, M., and Torchiano, M. (2001). A source-to-source compiler for generating dependable software. First IEEE International Workshop on Source Code Analysis and Manipulation, Proceedings, pages 33–42.
Reinhardt, S. and Mukherjee, S. (2000). Transient fault detection via simultaneous multithreading. In 27th International Symposium on Computer Architecture, pages 25–36. Vancouver, Canada, Jun 12-14, 2000.
Reis, G., Chang, J., Vachharajani, N., Mukherjee, S., Rangan, R., and August, D. (2005a). Design and evaluation of hybrid fault-detection systems. In 32nd International Symposium on Computer Architecture, Proceedings, pages 148–159. Madison, WI, Jun 04-08, 2005.
Reis, G. A., Chang, J., and August, D. I. (2007). Automatic instruction-level software-only recovery. IEEE Micro, 27(1):36–47.
Reis, G. A., Chang, J., Vachharajani, N., Rangan, R., and August, D. I. (2005b). SWIFT: Software implemented fault tolerance. CGO 2005: Int. Symposium on Code Generation and Optimization, pages 243–254.