
ED4I: Error Detection by Diverse Data and Duplicated Instructions

Greg Bronevetsky


ED4I Background

• A code transformation system developed at the Stanford Center for Reliable Computing.

• Authors: Nahmsuk Oh, Subhasish Mitra, Edward J. McCluskey

• ED4I allows us to run a program on two slightly different inputs and still be able to compare results at the end.


Motivation

• The simplest way to detect Byzantine Faults is to run the same program on multiple processors and compare results.

• ED4I is Byzantine Fault detection for uniprocessors.

• Must take into account both temporary and permanent faults.


Definitions

• Temporary Faults – any fault that temporarily affects a processor, long enough to execute several instructions.
  • Ex: Radiation hitting wires, frayed wires.

• Permanent Faults – a fault that affects a processor for a long period of time.
  • Ex: Spilling Coke on the chip, cut wires.


Problem Statement

• We can detect Byzantine Failures by running each program or procedure twice and comparing the results.

• However, this does not guard against permanent faults since the results of both runs will be the same.

• Need to make the two runs different so that the same fault will affect the results differently.

• Overhead = 100%.


Key Idea

• Let's feed two different sets of data into the program and then compare the results.

• Key Insight:
  • If the program only uses arithmetic operations, we can alter the input by multiplying all input numbers by a constant.

• Then the modified output will be the (real output) * (the constant).

• Thus, you can verify that the two computations succeeded AND the two computations will be affected by errors differently.
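
A minimal sketch of this idea, assuming a toy program that only sums its inputs and a hypothetical permanent fault that sticks the adder's lowest output bit at 1. The names `faulty_add` and `program` and the chosen inputs are illustrative assumptions, not taken from the ED4I authors:

```python
def faulty_add(a, b):
    """Adder with a hypothetical permanent fault: output bit 0 is stuck at 1."""
    return (a + b) | 1


def program(values):
    """Toy arithmetic-only program: sum the inputs using the faulty adder."""
    total = 0
    for v in values:
        total = faulty_add(total, v)
    return total


def plain_duplication_detects(values):
    """Run the same program twice on the same data and compare the results."""
    return program(values) != program(values)


def diverse_duplication_detects(values, k):
    """Run once on the inputs and once on k * inputs, then check out' == k * out."""
    original = program(values)
    diverse = program([k * v for v in values])
    return diverse != k * original


inputs = [2, 4, 6]
print(plain_duplication_detects(inputs))        # both runs give the same wrong sum
print(diverse_duplication_detects(inputs, -2))  # the scaled run disagrees
```

With identical data both runs return the same corrupted sum, so the comparison passes. With scaled data the fault perturbs the two sums differently and the check fails.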


New Program

• If we alter the input to the program, we must alter the program to work with this modified input.

• The transformation is given the constant k (called the “diversity factor”) and it creates the “k-factor diverse program”.

• The new program will have the same control flow graph as the old program, but all the variables will be k-multiples of the original ones.


Transformations

• If k<0, branches flip directions (> ↔ <, ≥ ↔ ≤)

• All constants in code get multiplied by k.

• Addition and Subtraction of variables unchanged.

• Multiplication: $v_1 \cdot v_2 \cdots v_n \rightarrow (v_1 \cdot v_2 \cdots v_n) / k^{n-1}$

• Division: $v_1 / v_2 \rightarrow (v_1 / v_2) \cdot k$
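
The rules above can be sketched on a small integer program. The functions `original` and `diverse` below are hypothetical examples written for illustration, not output of the actual ED4I transformation tool:

```python
def original(x, y):
    """Original integer program (assumes y != 0)."""
    p = x * y
    s = p + 5
    if s > 20:
        return s - x
    return s // y


def diverse(xk, yk, k):
    """k-factor diverse version: every variable holds k times its original value."""
    p = (xk * yk) // k      # product of n = 2 variables is divided by k**(n-1)
    s = p + 5 * k           # constants are multiplied by k
    limit = 20 * k
    # The comparison flips direction when k is negative.
    taken = s < limit if k < 0 else s > limit
    if taken:
        return s - xk       # addition and subtraction are unchanged
    return (s // yk) * k    # a quotient is multiplied by k


# On fault-free hardware the diverse result is exactly k times the original.
print(original(3, 4), diverse(-6, -8, -2))
```

Both versions take the same branch for corresponding inputs, which is what lets the two results be compared at the end.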


Fault Detection Probability

• For functional unit $h_i$ (such as the adder), fault $f$ and diversity factor $k$:

• $X_i$ = the set of inputs to $h_i$

• $E_i$ = subset of $X_i$ containing the inputs that will result in erroneous output due to the fault.

• $E'_i$ = subset of $E_i$ that will escape detection

• $C_i(k)$ = Probability of catching an error in $h_i$.

$$C_i(k) = \sum_f P(f) \, \frac{|E_i| - |E'_i|}{|X_i|}$$


Data Integrity Probability

• For functional unit $h_i$, fault $f$ and diversity factor $k$:

• $X_i$ = the set of inputs to $h_i$

• $E_i$ = subset of $X_i$ containing the inputs that will result in erroneous output due to the fault.

• $E'_i$ = subset of $E_i$ that will escape detection

• $D_i(k)$ = Probability of missing no errors in $h_i$.

$$D_i(k) = \sum_f P(f) \left( 1 - \frac{|E'_i|}{|X_i|} \right)$$
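
Both quantities can be estimated by brute-force enumeration. The sketch below does this for a bus with one line stuck at 0 or 1. It rests on several assumptions that are not in the slides: values are two's complement, every single stuck-at fault is equally likely, $X_i$ is limited to inputs x for which k·x still fits on the bus, and an error "escapes" when the diverse transfer equals k times the corrupted original. The helper names are hypothetical.

```python
def through_bus(value, bits, line, stuck):
    """Send a signed value over a bus whose wire `line` is stuck at `stuck`."""
    raw = value & ((1 << bits) - 1)
    raw = raw | (1 << line) if stuck else raw & ~(1 << line)
    # Reinterpret the raw bit pattern as a two's complement number.
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def coverage(k, bits=12):
    """Return (C, D) for diversity factor k over all single stuck-at faults."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    inputs = [x for x in range(lo, hi + 1) if lo <= k * x <= hi]
    faults = [(line, stuck) for line in range(bits) for stuck in (0, 1)]
    p_f = 1.0 / len(faults)  # assumption: all faults equally likely
    c = d = 0.0
    for line, stuck in faults:
        erroneous = escaped = 0
        for x in inputs:
            y = through_bus(x, bits, line, stuck)
            if y == x:
                continue  # the fault did not corrupt this input
            erroneous += 1
            if through_bus(k * x, bits, line, stuck) == k * y:
                escaped += 1  # both runs agree on a wrong value
        c += p_f * (erroneous - escaped) / len(inputs)
        d += p_f * (1 - escaped / len(inputs))
    return c, d


for k in (-2, 2, 3):
    print(k, coverage(k))
```

Setting k = 1 reproduces plain duplication: every error escapes, so the detection probability drops to zero.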


Choosing the value of k

• For some functional units we can derive $C_i(k)$ and $D_i(k)$ analytically for each k.

• This is too hard in general so we resort to trying out a range of k's empirically to determine $C_i(k)$ and $D_i(k)$.


Bus Signal Line

• Bus wire stuck at either 0 or 1.

• Derived results for a 12-bit bus:


Adder

• Experimental results for a 12-bit ripple carry adder:

• Experimental results for a 12-bit carry look-ahead adder:


Multiplier & Divider

• Experimental Results for
  • 12-bit array multiplier
  • 8-bit Wallace Tree multiplier
  • SRT divider


Shifter

• Experimental Results for 16-bit multiplexer-based shifter:


Using Benchmarks to pick k

• Need to determine how much each functional unit is used in the average program.

• Add, sub, mult and shift use the obvious functional units.

• “memory access” uses the memory bus

• “branch” uses a carry-lookahead adder


Benchmarked Data Integrity

• Calculated Data Integrity = $D_i(k)$ given above usage statistics. (high $D_i(k)$ top priority)

• Highlighted columns provide the best data integrity for each benchmark.


Benchmarked Detection Probability

• Calculated Detection Probability = $C_i(k)$ given above usage statistics.

• Highlighted columns provide the best detection probability for each benchmark.


Optimum k

• Optimum k selected:
  • Must maximize the Data Integrity = $D_i(k)$.
  • Given maximum $D_i(k)$, maximize $C_i(k)$.

• For each program, should get an estimate of how it uses the different functional units and pick k accordingly.
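
The selection rule can be sketched as a lexicographic maximum. This is only an illustration: combining per-unit values as a usage-weighted sum is an assumption, `pick_k` is a hypothetical helper, and every number in the tables is a made-up placeholder rather than a measured result.

```python
def pick_k(usage, d_table, c_table):
    """Pick k with the highest weighted data integrity, breaking ties on detection.

    usage:   {unit: fraction of instructions that use the unit}
    d_table: {unit: {k: data integrity of the unit for that k}}
    c_table: {unit: {k: detection probability of the unit for that k}}
    """
    def weighted(table, k):
        return sum(usage[unit] * table[unit][k] for unit in usage)

    candidates = next(iter(d_table.values())).keys()
    # Tuples compare element by element: data integrity first, then detection.
    return max(candidates, key=lambda k: (weighted(d_table, k), weighted(c_table, k)))


# Placeholder numbers, for illustration only.
usage = {"adder": 0.6, "bus": 0.4}
d_table = {"adder": {-2: 0.99, 3: 0.99}, "bus": {-2: 0.98, 3: 0.98}}
c_table = {"adder": {-2: 0.5, 3: 0.6}, "bus": {-2: 0.7, 3: 0.6}}
print(pick_k(usage, d_table, c_table))
```

In practice exact ties on a floating-point data integrity value are rare, so a tolerance on the first criterion may be needed.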


Dealing with Overflow

• By multiplying all variables by k, we may cause them to overflow.
  • Can scale variables up to next largest type.
  • Scale down variables by dividing by k. Must only check higher order bits when comparing new results to results of original program.

• Can use compile-time range checking to determine vulnerability to overflow and pick k accordingly
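
A rough sketch of such a range check, assuming the value range of a variable is already known (for example from static analysis) and that variables are signed two's complement integers. The helpers `fits` and `smallest_width` are hypothetical:

```python
def fits(lo, hi, k, bits):
    """True if every value in [lo, hi] still fits a signed `bits`-bit integer after scaling by k."""
    type_min, type_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scaled = (k * lo, k * hi)
    return type_min <= min(scaled) and max(scaled) <= type_max


def smallest_width(lo, hi, k, widths=(8, 16, 32, 64)):
    """Smallest integer width that holds the scaled range, or None if none does."""
    for bits in widths:
        if fits(lo, hi, k, bits):
            return bits
    return None


# A variable known to stay in [-100, 100] overflows 8 bits once scaled by -2.
print(fits(-100, 100, -2, 8), smallest_width(-100, 100, -2))
```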


Floating Point Numbers

• Above technique fails for floating point numbers.

• IEEE 754 format: $(-1)^s \cdot m \cdot 2^b, \quad 1 \le m < 2$
  • k = -2 will only change the sign bit and some bits in the exponent.
  • Solution: pick separate k's for the exponent and the mantissa and run the program once with each k.

• Overhead = 200%.
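
The effect of k = -2 on the bit fields can be checked directly. The sketch below uses Python's `struct` module to split a single-precision value into sign, exponent, and fraction fields; the helper `fields` and the sample value 1.7 are illustrative choices:

```python
import struct


def fields(x):
    """Return (sign, biased exponent, fraction) of x encoded as IEEE 754 single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


x = 1.7
sign, exponent, fraction = fields(x)
sign_k, exponent_k, fraction_k = fields(-2 * x)

print(sign, exponent, hex(fraction))
print(sign_k, exponent_k, hex(fraction_k))
```

The fraction field is identical in both encodings, so a fault in a mantissa bit would corrupt both runs in the same way and go unnoticed.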


Picking k for the mantissa

• To find errors in mantissa, pick k to be 3/2.
  • A stuck-at-1 fault:
  • In original program, variable x's value corrupted to:

$$x_e = (-1)^s \cdot (m + e) \cdot 2^b, \quad 1 \le (m + e) < 2$$

  • In transformed program,

$$x' = \tfrac{3}{2} x = (-1)^s \cdot \tfrac{3}{2} m \cdot 2^b$$

Since $1 \le m < 2$, we have $\tfrac{3}{2} \le \tfrac{3}{2} m < 3$.

However, the mantissa must be <2, so if $\tfrac{3}{2} m \ge 2$, the mantissa is right shifted by 1 and normalized.


Transformed variables

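• So now, the value in transformed program is:

$$x' = \begin{cases} (-1)^s \cdot \left(\tfrac{3}{2} m\right) \cdot 2^{b}, & \text{if } \tfrac{3}{2} m < 2 \\ (-1)^s \cdot \left(\tfrac{3}{4} m\right) \cdot 2^{b+1}, & \text{if } 2 \le \tfrac{3}{2} m < 3 \end{cases}$$

• Value in original program is:

$$x = (-1)^s \cdot m \cdot 2^b, \quad 1 \le m < 2$$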
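
The two cases can be reproduced with exact rational arithmetic. This is a small sketch assuming the mantissa is modelled as a `Fraction` in [1, 2) and ignoring rounding; `scale_mantissa` is a hypothetical helper:

```python
from fractions import Fraction

K = Fraction(3, 2)  # diversity factor used for the mantissa


def scale_mantissa(m, b):
    """Multiply the normalized value m * 2**b (1 <= m < 2) by 3/2 and renormalize."""
    m = Fraction(m) * K
    if m >= 2:
        # Shift the mantissa right by one bit and bump the exponent.
        return m / 2, b + 1
    return m, b


print(scale_mantissa(Fraction(5, 4), 0))  # 3/2 * m stays below 2
print(scale_mantissa(Fraction(3, 2), 0))  # 3/2 * m reaches 2, so it is renormalized
```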


Fault Detection in Mantissa

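• If there is a stuck-at-1 fault
  • Value in transformed program:

$$x'_e = \begin{cases} (-1)^s \cdot \left(\tfrac{3}{2} m + e\right) \cdot 2^{b}, & \text{if } \tfrac{3}{2} m < 2 \\ (-1)^s \cdot \left(\tfrac{3}{4} m + e\right) \cdot 2^{b+1}, & \text{if } 2 \le \tfrac{3}{2} m < 3 \end{cases}$$

  • Value in original program * k (for checking):

$$\tfrac{3}{2} x_e = \begin{cases} (-1)^s \cdot \tfrac{3}{2} (m + e) \cdot 2^{b}, & \text{if } \tfrac{3}{2} (m + e) < 2 \\ (-1)^s \cdot \tfrac{3}{4} (m + e) \cdot 2^{b+1}, & \text{if } 2 \le \tfrac{3}{2} (m + e) < 3 \end{cases}$$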
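
A small simulation of this check, under simplifying assumptions that are not in the slides: the mantissa has 23 fraction bits, the fault forces one fraction bit of the stored mantissa to 1, the sign is positive, and the sample mantissas need few enough bits that scaling by 3/2 is exact. The helpers `store` and `detected` are hypothetical:

```python
from fractions import Fraction

K = Fraction(3, 2)
FRACTION_BITS = 23  # fraction width of IEEE 754 single precision


def store(m, stuck_bit):
    """Write mantissa m through a register whose fraction bit `stuck_bit` is stuck at 1."""
    raw = int(m * 2**FRACTION_BITS) | (1 << stuck_bit)
    return Fraction(raw, 2**FRACTION_BITS)


def detected(m, b, stuck_bit):
    """Compare the transformed run against k times the (corrupted) original run."""
    # Original run: m * 2**b is stored through the faulty register.
    x_e = store(m, stuck_bit) * Fraction(2) ** b
    # Transformed run: scale by 3/2, renormalize, then store through the same register.
    m_t, b_t = m * K, b
    if m_t >= 2:
        m_t, b_t = m_t / 2, b_t + 1
    x_t = store(m_t, stuck_bit) * Fraction(2) ** b_t
    return x_t != K * x_e


print(detected(Fraction(5, 4), 0, 22))  # fault hits the bit of weight 1/2
print(detected(Fraction(3, 2), 0, 0))   # fault hits the least significant bit
```

In both sample cases the fault adds a different error to each run, so the scaled original no longer matches the transformed value.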


We can detect Mantissa errors!

• Note that the error values for the original and the transformed programs are different!

• We actually use $k = -\tfrac{3}{2}$ in order to flip the sign bit for improved detection capability


k for exponents

• In order to flip all the bits of the exponent, need to transform program to use $k = 2^{1010101010_2}$ and $k = 2^{0101010101_2}$

• If a fault invalidates a bit of the exponent, the fault will be detected by comparing to the exponents of one of the two transformed programs.


Effectiveness for Mantissa

• Effectiveness of $k = -\tfrac{3}{2} \cdot 2^{1010101010_2}$ (for IEEE 754 single precision)


Effectiveness for Exponent

• Effectiveness of $k = -\tfrac{3}{2} \cdot 2^{0101010101_2}$ (for IEEE 754 single precision)


Summary

• ED4I effectively detects Byzantine Failures in numerical applications on uniprocessors.

• Purely software solution using Data Diversity.

• Detects permanent and temporary faults.

• Works with fixed-point and floating point numbers.

• Compatible with arithmetic and logical operations (probably with any bitwise logical operation if it can be recast into arithmetic)

• High overhead: 100% or 200%.