
P173/MAPLD 2005 Swift1 Upset Susceptibility and Design Mitigation of PowerPC405 Processors Embedded in Virtex II-Pro FPGAs


Authors

Gary Swift, Jet Propulsion Laboratory/California Institute of Technology

Gregory Allen, Jet Propulsion Laboratory/California Institute of Technology

Jeffrey George, The Aerospace Corporation


Sana Rezgui, Xilinx Corporation

Carl Carmichael, Xilinx Corporation

Fayez Chayab, MDRobotics

Abstract

We show recent results for the upset susceptibility of the registers and caches in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively, these upsets can dominate the system error rate.

We consider several techniques for implementing various levels of redundancy to reduce system errors, including single-, dual- and triple-chip options. We conclude that the dual-chip option may often be the best choice and warrants further study.

Background - Reconfigurable FPGA Upsets

The basic building blocks are soft to upset [Ref. 1].

[Figure: Configuration Cells and Block RAM, XQR2VP40. Cross section (cm²/bit, log scale, 1E-10 to 1E-7) versus LET (MeV per mg/cm², 0 to 40). Series: Config, BRAM.]
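As a reading aid for the cross-section plot, here is a minimal sketch of how a per-bit cross section converts into an expected upset count. It relies only on the usual definition (cross section = upsets per unit fluence). The function name and all numeric values are hypothetical illustrations, not measurements from this work.

```python
def expected_upsets(cross_section_cm2_per_bit: float,
                    fluence_per_cm2: float,
                    n_bits: int) -> float:
    """Expected upsets = cross section (cm^2/bit) * fluence (particles/cm^2) * bits."""
    return cross_section_cm2_per_bit * fluence_per_cm2 * n_bits


# Hypothetical values chosen only to illustrate the arithmetic.
sigma = 1e-8       # cm^2/bit, a value inside the plotted axis range
fluence = 1e6      # particles/cm^2 delivered during an exposure
bits = 1_000_000   # number of storage bits exposed

print(expected_upsets(sigma, fluence, bits))
```

The result scales linearly with each of the three inputs, which is why the plots are normalized per bit.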

Background - Upset Mitigation

Critical applications require design-level upset mitigation

• Design Triplication

– The use of TMR (triple modular redundancy) in a design allows correct function through triplicated majority voters even when a configuration element is upset.

– The extra design effort is now largely automated by new software (TMRtool).

• Active Configuration Scrubbing

– Upsets in the configuration must not be allowed to accumulate or TMR will “break”.

– Scrubbing uses some resources, but can be implemented so that it is transparent to system operation.
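The interplay between majority voting and scrubbing can be sketched in a few lines. This is a toy software model, not the TMRtool output or the actual scrubber. The names `majority_vote`, `scrub` and `GOLDEN` are hypothetical, and a 4-bit integer stands in for a triplicated configuration element.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


def scrub(golden: int) -> list[int]:
    """Rewrite all three copies from a known-good reference value."""
    return [golden, golden, golden]


GOLDEN = 0b1011  # hypothetical known-good value

copies = scrub(GOLDEN)
copies[1] ^= 0b0100                                  # one upset in one copy
single_upset_ok = majority_vote(*copies) == GOLDEN   # outvoted by the other two

copies[0] ^= 0b0100                                  # second upset, same bit, no scrub in between
accumulated_ok = majority_vote(*copies) == GOLDEN    # the two bad copies now win the vote

copies = scrub(GOLDEN)                               # scrubbing clears accumulated upsets
copies[0] ^= 0b0100
scrubbed_ok = majority_vote(*copies) == GOLDEN

print(single_upset_ok, accumulated_ok, scrubbed_ok)  # True False True
```

The middle case is the "break" condition: once two copies carry the same upset, the voter faithfully outputs the wrong value.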

Embedded “Hard-Core” Processor(s) Upset

PowerPC 405 cores in Virtex II-Pro family FPGAs offer unprecedented computational power inside an FPGA, but include additional upsetable storage elements

[Figure: General Purpose Registers, XQR2VP40 embedded PPC405 core. Cross section (cm²/bit, log scale, 1E-10 to 1E-6) versus LET (MeV per mg/cm², 0 to 40). Series: Ones, Zeros.]

Processor Upsets – Data Cache

Processor caches are very important features for increased performance; however, upsets in the caches can lead to system errors.

[Figure: Data Cache, XQR2VP40 embedded PPC405 core. Cross section (cm²/bit, log scale, 1E-10 to 1E-7) versus LET (MeV per mg/cm², 0 to 40). Series: Ones, Zeros.]

Processor Upset Mitigation

The “obvious” solution of implementing TMR with three processor cores is not available as a single-chip option because the maximum number of processors per FPGA is currently two.

Tradeoffs between upset robustness and system complexity, possibly spanning multiple FPGAs, must be considered.

One-Chip Solution

Running two processors in lockstep is conceptually simple, especially as they can reside in a single FPGA. A fast TMR-ed comparison block is required to contain errors and keep them from propagating into the rest of the system. A processor upset appears to the comparison block as a disagreement, which requires that both processors be stopped within the current clock cycle. Both must then be forced to roll back to a known good software “bookmark” or, alternatively, to reboot.

Flow Chart: One-Chip Solution

• Single instruction executed (in lockstep)

• Compare processor outputs

• Error detected?

– N: continue with the next instruction

– Y: stop execution, initialize processor reboot, then execute reboot and/or resynchronization processes
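The one-chip lockstep loop can be sketched as a small software model. This is an illustration under stated assumptions, not the flight design. The `Cpu` class, the accumulate-only "instruction", the bookmark interval of four instructions and the `upset_at` injection parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    acc: int = 0

    def step(self, value: int) -> int:
        """Execute one toy instruction: add value to the accumulator."""
        self.acc += value
        return self.acc


def run_lockstep(program, upset_at=None, bookmark_every=4):
    """Run two CPUs in lockstep; on any disagreement roll both back to the bookmark."""
    cpu_a, cpu_b = Cpu(), Cpu()
    bookmark_pc, bookmark_acc = 0, 0
    rollbacks = 0
    pc = 0
    while pc < len(program):
        out_a = cpu_a.step(program[pc])
        out_b = cpu_b.step(program[pc])
        if upset_at == pc:
            out_b ^= 1            # inject a single bit flip into CPU B
            cpu_b.acc = out_b
            upset_at = None       # the upset happens only once
        if out_a != out_b:
            # The comparator cannot tell which CPU is wrong, so both are
            # stopped and restored to the last known-good bookmark.
            cpu_a.acc = cpu_b.acc = bookmark_acc
            pc = bookmark_pc
            rollbacks += 1
            continue
        pc += 1
        if pc % bookmark_every == 0:
            bookmark_pc, bookmark_acc = pc, cpu_a.acc
    return cpu_a.acc, rollbacks


print(run_lockstep([1, 2, 3, 4, 5, 6]))              # (21, 0)
print(run_lockstep([1, 2, 3, 4, 5, 6], upset_at=5))  # (21, 1)
```

The final result is the same in both runs, but the second run pays for it with a rollback, which is the outage cost of a detect-only scheme.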

Advantages

• Contained in one chip

– No chip-to-chip interconnects (minimal latency and propagation delay)

– Lower power consumption

– Less board area

– No chip-to-chip synchronization

• Technology is more developed and tested [See Reference 2]

Disadvantages

• More system outages

– Reboot or rollback on every error

– Not suitable for some critical real-time applications

• Twice as many errors as on a single processor, but at least they are detected

Note: Requires an extra device, either a watchdog timer or an external configuration scrubber

Two-Chip Solution

With four processors in lockstep (necessitating two chips), a solution as robust as full TMR is possible. In this scheme, a pair of processors that get into a disagreement due to an upset will be stopped while the system runs without interruption on the processor pair that is in agreement. Correct internal state information is available in the working pair and can be used to re-synchronize the stopped pair, preferably soon. Thus, it is possible to re-synchronize almost transparently and rapidly get back to full four-processor lockstep operation with minimal intrusion. As a side effect of using two separate FPGAs, additional robustness is possible by adding cross-strapped configuration control.

Flow Chart: Two-Chip Solution

• Power up configuration (both FPGAs from the same ROM)

• Parallel internal error checking

• Error detected?

– N: continue parallel internal error checking

– Y: processors with disagreement halt

• Wait for an opportunity to reconfigure

• Healthy FPGA takes over and initiates a full or partial reconfiguration of the upset FPGA

• Resynchronization arbitrator synchronizes processors to appropriate location
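The two-chip scheme can be modeled in the same toy style: two lockstep pairs, with the disagreeing pair halted and later re-synchronized from the agreeing pair. Everything here is a hypothetical sketch. The chip names, the accumulate-only instruction, the `upset` parameter and the fixed `resync_every` interval (standing in for "an opportunity to reconfigure") are assumptions, and FPGA reconfiguration itself is not modeled.

```python
def run_two_chip(program, upset=None, resync_every=4):
    """Run two lockstep pairs; a disagreeing pair halts until the next resync point.

    upset is an optional (pc, chip_name) tuple that flips one bit in the
    second processor of that chip at that instruction.
    """
    chips = {name: [{"acc": 0}, {"acc": 0}] for name in ("A", "B")}
    halted = set()
    resyncs = 0
    outputs = []
    for pc, value in enumerate(program):
        good = None
        for name, pair in chips.items():
            if name in halted:
                continue                    # halted pair does not execute
            for cpu in pair:
                cpu["acc"] += value
            if upset == (pc, name):
                pair[1]["acc"] ^= 1         # inject a single bit flip
            if pair[0]["acc"] != pair[1]["acc"]:
                halted.add(name)            # processors with disagreement halt
            else:
                good = name
        if good is None:
            raise RuntimeError("both pairs disagree: reboot required")
        outputs.append(chips[good][0]["acc"])   # system keeps running on the good pair
        if halted and (pc + 1) % resync_every == 0:
            for name in halted:                 # copy state from the working pair
                for cpu in chips[name]:
                    cpu["acc"] = chips[good][0]["acc"]
            halted.clear()
            resyncs += 1
    return outputs, resyncs


print(run_two_chip([1, 2, 3, 4, 5, 6], upset=(1, "B")))
# ([1, 3, 6, 10, 15, 21], 1)
```

Unlike the one-chip model, the output stream is uninterrupted: the upset costs one re-synchronization but no rollback.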

Advantages

• Reboots rare; a reboot requires simultaneous errors in two separate processors

• Processor upsets are transparently handled without system outage until convenient re-synchronization opportunities

• Enhanced robustness – outages lowered to less than the SEFI rate of ~1 in 80 years per device

• Allows added configuration robustness

– Chips check each other (not self-checking)

– Eliminates need for external watchdog timer

Disadvantages

• Complicated

– Inter-chip communication/synchronization

– Transparent reboot/resynchronization of both processors in chip with error

• Twice the power consumption

• In-beam testing is not yet done (although planned for the near future)

Three-Chip Solution

The three-chip implementation (also known as the “virtual FPGA” solution [Ref. 3]) takes the responsibility of error detection out of the hands of the upsetable FPGAs by adding a Radiation-Hardened ASIC. Note that only one processor per FPGA is needed. The ASIC handles stopping error propagation and re-synchronizing an upset processor. Additionally, the ASIC can be used for configuration control of all three FPGAs.

Flow Chart: Three-Chip Solution

• Configure all three FPGAs

• Processors execute a cycle in lockstep

• Error is detected?

– N: processors execute the next cycle in lockstep

– Y: re-synchronize state of device with upset, then resume lockstep execution
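The voting and re-synchronization role assigned to the ASIC can be sketched as a single function over the three processor states. This is a software illustration only. The function name `vote_and_resync` and the use of plain integers as processor state are hypothetical, and the real controller would operate on hardware outputs every cycle.

```python
from collections import Counter


def vote_and_resync(states):
    """Vote three processor states and repair the odd one out.

    Returns (voted_value, index_of_upset_processor_or_None, repaired_states).
    """
    voted, count = Counter(states).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one processor is upset")
    # Find the processor (if any) whose state disagrees with the majority.
    upset = next((i for i, s in enumerate(states) if s != voted), None)
    # Re-synchronize by loading the voted state into every processor.
    return voted, upset, [voted] * len(states)


print(vote_and_resync([7, 7, 5]))  # (7, 2, [7, 7, 7])
print(vote_and_resync([4, 4, 4]))  # (4, None, [4, 4, 4])
```

Because the voter identifies which processor disagrees, only that one needs to be reloaded while the other two continue, which is what keeps the upset transparent to the system.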

Advantages

• Maximum robustness to upsets

• Only three processors in lockstep (but in 3 chips)

• More fabric available for other functions

• No system outages; errors and SEFIs are handled transparently

• Most implementation details are confined to the ASIC and don’t affect the IP in the FPGAs significantly

Disadvantages

• Complex ASIC development for controller to vote outputs and re-load/re-sync upset processor

• ASIC development cost (currently funded though)

• Board area

Conclusions

• Both two-chip and three-chip solutions have about the same robustness, power consumption, and system complication, but handle upsets better than the one-chip solution.

• The two- vs. three-chip decision mostly boils down to the familiar FPGA vs. ASIC debate

• Three-chip solution may use less power than the two-chip. (Is the ASIC’s power consumption less than that of one processor core?)

• At present, the JPL-preferred approach is the two-chip implementation achieving maximum flexibility and near maximum robustness to upsets.

References

• [1] J. George et al., “Initial Single-Event Effects Testing and Mitigation in the Xilinx Virtex II-Pro FPGA,” Paper 211, MAPLD 2005.

• [2] M. Wang and G. Bolotin, “SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA,” Paper D110, MAPLD 2004, http://klabs.org/mapld04/presentations/session_d/1_d110_wang_s.ppt

• [3] J. Lyke and B. Marty, Virtual Field Programmable Gate Array Triple Modular Redundant Cell Design, Air Force Research Laboratory: Space Vehicles Directorate, AFRL-VS-PS-TR-2004-1093, April 28, 2004.