HPEC 2012 Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing Quinn Martin Alan George


Page 1

HPEC 2012

Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing

Quinn Martin, Alan George

Page 2

SOAP

  • Background
      • FPGAs and Radiation in Space
      • Traditional Scrubbing Methods
  • SOAP Approach
      • Mission Parameters
      • Markov Models
  • Mission Case Studies
      • Results
      • Conclusions

Page 3

FPGAs

  • Field-Programmable Gate Arrays (FPGAs)
      • Implement custom digital logic hardware with a fabric of logic resources and interconnect
      • Lookup tables (LUTs) implement combinational logic
      • User flip-flops (FFs) implement sequential logic
      • Switch and connection boxes route among resources
  • Many are reconfigurable
      • Allows update of routing and logic state
      • Partial reconfiguration can update a partition of the device
      • E.g., Virtex from Xilinx and Stratix from Altera

Page 4

Reconfigurable FPGAs in Space

  • Advantages
      • Very high performance/power ratio
      • Reconfigurable (fully and partially)
          • Adaptable to changing environments and mission requirements
          • Can update design after launch
  • Disadvantages
      • Relatively difficult to design/test applications
      • Configuration memory vulnerable to radiation
          • Upsets can change application processor architecture in unpredictable ways
          • Must repair upsets via configuration scrubbing

Page 5

Radiation Effects on FPGAs

  • Single-event Effects (SEE)
      • Single-event Latchup (SEL) – Causes current spike that may damage device
      • Single-event Upset (SEU) – Changes state of bit(s), e.g., from logic ‘0’ to ‘1’
          • Can be single-bit upset (SBU) or multi-bit upset (MBU)
      • Single-event Functional Interrupt (SEFI) – Like SEU, but affecting critical device resource
  • Total Ionizing Dose
      • Degrades performance over time, leading to eventual device failure

Page 6

Xilinx V-5/V-6 Configuration

  • Programmed via SelectMAP interface
      • Runtime configuration interface
      • Also allows readback of existing configuration
      • 32 bits per configuration word
      • Parallel bus width of 8, 16, or 32 bits
      • Max clock frequency 100 MHz
  • Configuration memory arranged in frames
      • Minimum unit of access to config. memory
      • Virtex-5 – 41 words per frame
      • Virtex-6 – 81 words per frame
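The figures on this slide are enough for a rough estimate of how long one frame takes to cross the configuration bus. The sketch below is only that: it assumes one bus transfer per clock cycle and ignores command overhead and memory latency, which the slide does not quantify. The function name `frame_transfer_time` is illustrative, not from the presentation.

```python
WORD_BITS = 32  # bits per configuration word (from the slide)
FRAME_WORDS = {"Virtex-5": 41, "Virtex-6": 81}  # words per frame (from the slide)


def frame_transfer_time(device, bus_width_bits, cclk_hz):
    """Ideal time in seconds to move one frame across the configuration bus.

    Assumption: one bus transfer per clock cycle, no command overhead,
    no wait states from the external memory.
    """
    frame_bits = FRAME_WORDS[device] * WORD_BITS
    cycles = frame_bits / bus_width_bits
    return cycles / cclk_hz


# Sweep the bus widths listed on the slide at the maximum clock frequency.
for device in FRAME_WORDS:
    for width in (8, 16, 32):
        t = frame_transfer_time(device, width, 100e6)
        print(f"{device}, {width}-bit bus at 100 MHz: {t * 1e6:.2f} us per frame")
```

Because the frame is the minimum unit of access, this per-frame time is a lower bound on how quickly any single upset could be rewritten under these idealized assumptions.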

Page 7

FPGA Scrubbing

  • FPGA Configuration Scrubbing
      • Quickly repairs SEUs before accumulation
          • Accumulation defeats redundancy strategies (e.g., TMR)
      • Fast repair can prevent SEUs from manifesting as errors
  • Can be decomposed into basic scrubbing techniques
      • Correction techniques repair upsets
      • Detection techniques discover and locate upsets

Page 8

FPGA Scrubbing Techniques

  • Correction Techniques
      • Golden Copy – Repairs configuration based on known “golden” copy (e.g., in rad-hard PROM)
      • Frame ECC – Repairs based on per-frame error syndrome code stored on-chip
  • Detection Techniques
      • Frame ECC – Detects based on per-frame SECDED Hamming code
      • CRC-32 – Detects using device-wide CRC-32
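As a rough illustration of device-wide CRC detection, the sketch below computes a checksum over a made-up configuration image, flips one bit to emulate an SEU, and compares checksums. It uses Python's `zlib.crc32` purely for convenience; the polynomial and readback path used by the actual device are not stated on the slide and may differ. The image contents and the helper names are hypothetical.

```python
import zlib


def crc32_of(config: bytes) -> int:
    """CRC-32 over the whole configuration image (zlib polynomial, for illustration)."""
    return zlib.crc32(config) & 0xFFFFFFFF


def flip_bit(config: bytes, bit_index: int) -> bytes:
    """Return a copy of the image with one bit inverted, emulating a single-bit upset."""
    corrupted = bytearray(config)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical image: 4 frames of 41 words, 4 bytes per word, arbitrary contents.
golden = bytes((i * 37) % 256 for i in range(4 * 41 * 4))
golden_crc = crc32_of(golden)

upset = flip_bit(golden, 1234)
detected = crc32_of(upset) != golden_crc
print("upset detected:", detected)
```

A single device-wide checksum of this kind signals that some upset exists but does not by itself identify the affected frame, which is one reason it is paired with other techniques in the strategies that follow.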

Page 9

FPGA Scrubbing Strategies

  • Scrubbing Strategies
      • Any combination of detection and correction techniques with controller to implement algorithm
      • Blind Scrubbing – Golden copy correction only
      • Readback Scrubbing – Some detection technique used

Page 10

FPGA Scrubbing Strategies

Page 11

SOAP Approach

  • Scrubbing Optimization via Availability Prediction (SOAP)
      • Uses system availability as primary metric for scrubbing efficacy
      • Models scrubbing strategies as Markov diagrams
      • Varies free parameters to find optimal scrubbing system
          • Environmental parameters λ and α (orbits)
          • System parameters B and fCCLK (memory and pin constraints)
          • Scrubbing parameters μ and γ (device configuration capability)

Page 12

SOAP Approach

Page 13

Environmental Parameters

  • λ – SEU rates for devices in various orbits of interest
      • Calculated per-bit and per-device using CREME96
  • α – Correction factors for single-bit and multi-bit upsets (SBU/MBU)
      • From beam tests on Virtex-5 devices

Page 14

System Parameters

  • Factors chosen by the system designer based on available memories, power budget, etc.
  • Affect scrubbing detection and correction rates (see equations on next slide)
  • B – Configuration bus width in bits
  • fCCLK – Configuration clock speed in Hz

Page 15

Scrubbing Parameters

  • μ – Repair rate for scrubbing technique (per second)
  • γ – Detection rate for scrubbing technique (per second)
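The equations relating these rates to B and fCCLK did not survive in this transcript. One plausible first-order relation, assumed here and not taken from the slides, is that a scrubber streaming the whole configuration memory completes B · fCCLK / N passes per second, where N is the number of configuration bits. The sketch below computes that pass rate; `CONFIG_BITS` is a hypothetical device size, and treating the result as μ or γ is an assumption.

```python
def full_pass_rate(config_bits, bus_width_bits, cclk_hz):
    """Passes per second over the whole configuration memory.

    First-order assumption (not from the slides): the scrubber streams the
    full configuration once per pass at one bus transfer per clock cycle.
    """
    seconds_per_pass = config_bits / (bus_width_bits * cclk_hz)
    return 1.0 / seconds_per_pass


# Hypothetical device size, chosen only to make the numbers concrete.
CONFIG_BITS = 40_000_000

rate_slow = full_pass_rate(CONFIG_BITS, bus_width_bits=8, cclk_hz=33e6)
rate_fast = full_pass_rate(CONFIG_BITS, bus_width_bits=32, cclk_hz=100e6)
print(f"8-bit bus at 33 MHz:   {rate_slow:.2f} passes/s")
print(f"32-bit bus at 100 MHz: {rate_fast:.2f} passes/s")
```

Under this assumption the rate scales linearly with both bus width and clock speed, which is consistent with the slides' statement that B and fCCLK affect the detection and correction rates.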

Page 16

Markov Algorithm Models

  • Blind
      • No detection
  • Built-in CRC-32
      • Basic detection
  • Frame ECC with CRC-32
      • CRC acts as “safety net” for upsets undetected by Frame ECC
  • Frame ECC with CRC-32 and Essential Bits (EB)
      • Only scrubs errors that may be critical
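The Markov diagrams themselves are not reproduced in this transcript. As a minimal sketch of how availability falls out of such a model, the code below solves two deliberately simplified chains in closed form: a two-state chain for a scheme without detection and a three-state cycle for a scheme with detection. These are textbook simplifications, not the authors' models, and the numeric rates are hypothetical.

```python
def availability_blind(lam, mu):
    """Two-state chain: Up --lam--> Upset --mu--> Up. Returns steady-state P(Up)."""
    return mu / (lam + mu)


def availability_readback(lam, gamma, mu):
    """Three-state cycle: Up --lam--> Undetected --gamma--> Detected --mu--> Up.

    In a cyclic chain the steady-state probability of each state is
    proportional to its mean sojourn time (1/rate), so P(Up) is the
    fraction of each cycle spent in Up.
    """
    t_up, t_undetected, t_detected = 1.0 / lam, 1.0 / gamma, 1.0 / mu
    return t_up / (t_up + t_undetected + t_detected)


# Hypothetical rates in events per second, for illustration only.
lam, gamma, mu = 1e-5, 5.0, 5.0
print(f"blind:    {availability_blind(lam, mu):.9f}")
print(f"readback: {availability_readback(lam, gamma, mu):.9f}")
```

In this simplified picture, adding a detection stage lengthens the time an upset persists, so detection only pays off when it brings something else, such as a faster targeted repair or the ability to ignore non-critical bits.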

Page 17

Blind Scrubbing

Page 18

Readback CRC-32 Scrubbing

Page 19

CRC-32 w/ Frame ECC Scrubbing

Page 20

Case Study

  • Applies SOAP method to hypothetical systems with realistic parameters
  • Devices
      • Xilinx Virtex-5
      • Xilinx Virtex-6
  • Orbits
      • ISS low Earth orbit (LEO)
      • Molniya highly elliptical orbit (HEO)
  • 8-bit SelectMAP bus at 33 MHz
      • Accounts for access speed of slow rad-hard PROM

Page 21

Case Study

  • Two mission types
      • Non-upset-critical (non-UC) – System continues to run upon detection and correction of upset
          • Only count critical upsets as system “unavailable”
      • Upset-critical (UC) – System requires reset upon detection of upset to ensure state integrity
          • Requires detection
          • All detected upsets render system unavailable for reset period
          • Will benefit from essential bits mask used in detection

Page 22

Non-UC Results

  • Continuous blind scrubbing offers highest availability
  • CRC-32 offers similar availability with low implementation complexity
  • Frame ECC suffers because triple-bit upsets (TBUs) can be falsely corrected, resulting in further errors

Page 23

UC Results

Page 24

UC Results

Page 25

Results

  • Frame ECC with CRC-32 and Essential Bits mask offers highest availability
      • Roughly one extra nine over other methods
      • Xilinx-provided soft-error mitigation (SEM) core implements similar strategy
  • Other strategies still competitive
      • Complex state machine or software and additional memory required for Frame ECC/EB
      • Model does not account for vulnerability associated with internal scrubbing
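For readers unfamiliar with the "nines" shorthand, the sketch below converts an availability figure to its number of nines, so that "one extra nine" corresponds to a tenfold reduction in unavailability. The two availabilities used are hypothetical and are not results from the case study.

```python
import math


def nines(availability):
    """Number of nines in an availability figure, e.g. 0.999 gives about 3."""
    return -math.log10(1.0 - availability)


# Hypothetical availabilities, not results from the case study.
baseline, improved = 0.9999, 0.99999
print(f"baseline: {nines(baseline):.2f} nines")
print(f"improved: {nines(improved):.2f} nines")
print(f"gain:     {nines(improved) - nines(baseline):.2f} nines")
```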

Page 26

Conclusions

  • Predicts availability for various FPGA scrubbing strategies on real and hypothetical platforms
  • Uses analytical models rather than experimentation
  • Markov availability modeling with parametric approach
  • Allows optimization of scrubbing strategy during design phase
  • In case study, blind scrubbing best for non-UC and Frame ECC with EB mask best for UC