Deploying Analytical Redundancy for System Fault Tolerance V. Cortellessa, D. Del Gobbo, A. Mili, M. Shereshevsky, and Z. Zhuang CSEE Dept. West Virginia University - Morgantown FY2001 University Software Initiative for the NASA IV&V Facility - Fairmont WV


Deploying Analytical Redundancy for System Fault Tolerance

V. Cortellessa, D. Del Gobbo, A. Mili, M. Shereshevsky, and Z. Zhuang
CSEE Dept. West Virginia University - Morgantown

FY2001 University Software Initiative for the NASA IV&V Facility - Fairmont WV


Outline

• Characterizing Redundancy

• Quantifying Redundancy

• Qualifying Redundancy


CHARACTERIZING REDUNDANCY


Objectives

• To develop a classification of redundancy by identifying the orthogonal dimensions in redundancy
• To analyze physical and analytical redundancy on the basis of the obtained classification
• To answer general questions about redundancy:
– What is redundancy?
– Can we talk about redundancy outside the context of fault tolerance?
– Can we distinguish between intrinsic redundancy and redundancy-by-design?
– Is redundancy a representation issue or a design issue?
– Is physical redundancy an extreme case of redundancy?


Definition of Redundancy

• From IEEE Dictionary
– duplication of elements for the purpose of enhancing system reliability
– presence of auxiliary components in a system for the purpose of preventing or recovering from failures
– the existence of more than one means for performing a given function
– pertaining to characters that do not contribute to the information content
– Log (# symbols) - average information content per symbol
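The last definition above can be illustrated with a short sketch. It assumes base-2 logarithms, estimates symbol probabilities from the message itself, and defaults the alphabet to the symbols actually observed; the function name `redundancy_bits` and its parameters are hypothetical, not taken from the slides.

```python
import math
from collections import Counter

def redundancy_bits(message, alphabet_size=None):
    """Return log2(#symbols) minus the empirical entropy per symbol, in bits."""
    counts = Counter(message)
    total = len(message)
    # Empirical average information content per symbol (Shannon entropy).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Assumption: if no alphabet size is given, use the distinct symbols seen.
    n_symbols = alphabet_size if alphabet_size is not None else len(counts)
    return math.log2(n_symbols) - entropy

print(redundancy_bits("abcd"))  # equiprobable symbols: no redundancy
print(redundancy_bits("aaab"))  # skewed frequencies: positive redundancy
```

Under these assumptions an equiprobable message yields zero, and any skew in symbol frequencies yields a positive value.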


Definition of Redundancy: Functional vs. State Redundancy

• State redundancy
– system state [x0, x1, … xn] (implementation dependent)
• Functional redundancy
– System level requirements R={(u,y)| …}
– Subsystem/component level requirements R={(xi, xj)|…} (implementation dependent)


Content Redundancy: English language sentence (Shannon)

• No redundancy
– symbols are independent and equiprobable
• First-level redundancy
– symbols are independent but with frequency of English text
– digram structure as in English text
– trigram structure as in English text
• Word redundancy
– words are independent but with frequency of English text
– word transition probability is that of English text


Content Redundancy: Physical system

• Rigid body in free fall (p, v, a, F, M)
• No redundancy
– quantities are independent and each uniformly distributed
• Local redundancy (quantities are still independent)
– each quantity is assigned a probability distribution
– relationship among each quantity at different time instants
• System redundancy
– instantaneous dependency between different quantities
– temporal dependency between different quantities


Representation Redundancy: Parity-bit

• Information in order to be processed needs to be represented in some suitable manner

• The parity-bit in serial communication allows detecting non-admissible strings of bits.

• Admissibility of the string of bits is independent of the information content
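A minimal sketch of the parity-bit idea follows. It assumes even parity and represents a word as a list of 0/1 integers; the function names are hypothetical. Note that admissibility is checked without interpreting the payload, which is the point made in the last bullet.

```python
def add_even_parity(bits):
    """Append a parity bit so that the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def is_admissible(word):
    """A received word is admissible iff it contains an even number of 1s."""
    return sum(word) % 2 == 0

word = add_even_parity([1, 0, 1, 1, 0, 0, 1])
corrupted = word.copy()
corrupted[2] ^= 1  # flip a single bit, as a transmission fault might

print(is_admissible(word), is_admissible(corrupted))  # True False
```

A single flipped bit is detected; an even number of flipped bits would go unnoticed, which is a known limit of a single parity bit.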


Temporal/Sequential Redundancy

• Some applications are characterized by a sequential introduction of data
• Shannon’s example
– first-order redundancy is a single-step redundancy
– following orders of redundancy are multiple-step
• Physical system example
– F(ti) = M(ti)a(ti) is single-step (instantaneous) redundancy
– v(t2) = [p(t2)-p(t1)]/(t2-t1) is multiple-step (temporal) redundancy
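The two physical relations above can be turned into consistency checks on sensor readings. The sketch below is an illustration only: the function names, the tolerance, and the sample values are assumptions, and the velocity relation is used exactly as written on the slide (a finite difference over two samples).

```python
def single_step_residual(F, M, a):
    """Instantaneous check: F(t_i) - M(t_i) * a(t_i), using one time instant."""
    return F - M * a

def multi_step_residual(v2, p1, p2, t1, t2):
    """Temporal check: v(t2) - [p(t2) - p(t1)] / (t2 - t1), using two instants."""
    return v2 - (p2 - p1) / (t2 - t1)

def consistent(residual, tol=1e-6):
    """Readings agree with the relation if the residual is within tolerance."""
    return abs(residual) <= tol

# Hypothetical readings: mass 2.0, acceleration -9.81, force -19.62.
print(consistent(single_step_residual(F=-19.62, M=2.0, a=-9.81)))
# Position moves from 0.0 to 1.0 in 0.5 time units; a velocity reading of 3.5 disagrees.
print(consistent(multi_step_residual(v2=2.0, p1=0.0, p2=1.0, t1=0.0, t2=0.5)))
print(consistent(multi_step_residual(v2=3.5, p1=0.0, p2=1.0, t1=0.0, t2=0.5)))
```

The single-step check needs only the current sample, while the multiple-step check needs stored history, which is the distinction the slide draws.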


Analytical Redundancy

• System/Subsystem/component level functional redundancy
• State redundancy
• Content redundancy
• Representation redundancy
• Single/multiple-step redundancy


Physical Redundancy

• Component level functional redundancy
• State redundancy
• Content redundancy
• Representation redundancy
• Single-step redundancy (deterministic asset)


QUANTIFYING REDUNDANCY


Objectives

• To quantify the amount of redundancy by means of a numeric function
• To characterize analytical vs. physical redundancy by means of this function
• To characterize fault tolerance capabilities (e.g., detection, identification, etc.) by means of this function
• To use this function to support decision making in tradeoffs between redundancy and fault tolerance capability


Redundancy as the ability to choose among representations

X : system state

P : set of all the “possible” system states

C : set of all the “correct” system states

Prob ( X ∈ C | X ∈ P )

The corresponding conditional entropy is a suitable metric of “how fully the potential domain is being exploited” (or, conversely, how sparsely populated it is), i.e. how much redundancy the system shows in terms of unused possible states


Redundancy as logical relation among state variables

• State made up of two variables (or aggregates of variables), say X and Y

• P(X|Y) : to what extent the value of Y determines the value of X

• H(X|Y) : Amount of uncertainty that remains about X if we know Y

H(X|Y) = H(X,Y) – H(Y)
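The identity H(X|Y) = H(X,Y) – H(Y) can be checked numerically. This sketch assumes discrete variables, base-2 logarithms, and a joint distribution given as a dictionary keyed by (x, y) pairs; the helper names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y) for a joint distribution {(x, y): probability}."""
    marginal_y = defaultdict(float)
    for (_, y), p in joint.items():
        marginal_y[y] += p
    return entropy(joint) - entropy(marginal_y)

# Y fully determines X: no uncertainty about X remains once Y is known.
determined = {(0, 0): 0.5, (1, 1): 0.5}
# X and Y are independent fair bits: knowing Y tells nothing about X.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(conditional_entropy(determined))
print(conditional_entropy(independent))
```

The first case is the fully redundant extreme (X adds nothing beyond Y); the second is the non-redundant extreme.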


A simple example

a : system variable

α : vector of readings of a

[Diagram: a → SYSTEM → α]

Hypothesis: there is redundancy only if α uniquely determines a

H(a | α) = 0 ( = H(a , α) – H(α) )

a = f(α)

∀a : P(f⁻¹(a)) = P(a)


This property holds:

H(a) ≤ H(α)

and the distance depends on the injectivity of f (e.g., a one-to-one mapping gives H(a) = H(α))

Again we may consider, as a measure of redundancy:

ρ(α) = H(α) - H(a) ( = H(α | a) )

i.e., how fully the potential domain of values is being exploited.
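As a numerical illustration of this measure (entropy of the readings vector minus entropy of a), the sketch below assumes a binary variable a read by three sensors, with a recovered from the readings by majority vote. The fault probabilities, the voting function, and all names are assumptions made for the example, not values from the slides.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def redundancy(p_readings, f):
    """H(readings) - H(a), where P(a) is the probability of the preimage of a under f."""
    p_a = defaultdict(float)
    for readings, p in p_readings.items():
        p_a[f(readings)] += p
    return entropy(p_readings) - entropy(p_a)

def majority(readings):
    """Hypothetical f: recover a binary value from three readings by majority vote."""
    return int(sum(readings) >= 2)

# Fault-free triplication: readings and a are in one-to-one correspondence.
fault_free = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# Assumed fault hypothesis: at most one reading is flipped (each flip has weight 0.05).
with_faults = {
    (0, 0, 0): 0.35, (1, 0, 0): 0.05, (0, 1, 0): 0.05, (0, 0, 1): 0.05,
    (1, 1, 1): 0.35, (0, 1, 1): 0.05, (1, 0, 1): 0.05, (1, 1, 0): 0.05,
}

print(redundancy(fault_free, majority))
print(redundancy(with_faults, majority))
```

With a one-to-one mapping the measure is zero; once the fault hypothesis admits extra reading vectors that still map to the same a, the measure becomes positive.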


ρ(α) = H(α) - H(a)

We deliberately omit a as a parameter of ρ because:

• P(a) comes from the intrinsic system operational profile (there is no control over it)

while

• P(α) is the result of design choices and fault hypotheses (its value can be controlled by design)


QUALIFYING REDUNDANCY


Objectives

• Whereas the previous section quantifies redundancy, this section qualifies it. The same amount of redundancy may or may not be useful, depending on functional properties

• Whereas in quantifying redundancy we need to distinguish between correct and representable (possible) states, in this section we will distinguish between:
– Correct states
– Maskable states
– Recoverable states
– Representable states


Notation

• s0 : system initial state
• milestone : breaking point between past and future behavior of the system
• π : relation that describes the past behavior
• φ : relation that describes the future behavior
• σ : system requirements


[Diagram: s0, π(s0), milestone, σ(s0)]

s is a correct state: (s0,s) ∈ π


[Diagram: s0, π(s0), maskable states, milestone, σ(s0)]

s is a maskable state: (s0,s) ∈ K(σ, φ)


[Diagram: s0, π(s0), maskable states, recovery r, milestone, σ(s0)]

s is a recoverable state: ∃ r : π’ r ⊒ K(σ, φ)


Question

For what π’ and K does this equation have a solution?

Analogy: for what a, b does the equation ax=b have a solution?
Answer: a ≠ 0

∃ r : π’ r ⊒ K(σ, φ)


Answer: conditions for existence of r

- C1 - K L ⊆ π’ L

- C2 - (K L ∩ π’)^ K must be a total relation

In practice, we look for the smallest π’ s.t. C1 and C2 hold (i.e., the relation that maps initial to recoverable states only)

- C1 - K L = π’ L

- C2 - π’^ K must be a total relation


A sufficient condition for C2

If the domain partition determined by K is preserved by π’ then condition C2 holds:

π’ π’^ ⊆ K K^ ⇒ π’^ K is a total relation

A simple example

K = { (s,s’) | s’ = s mod 6}

π’1 = { (s,s’) | s’ = s mod 12}
Only produces recoverable states; recovery: s’ = s mod 6

π’2 = { (s,s’) | s’ = (s+5) mod 18}
Only produces recoverable states; recovery: s’ = (s+1) mod 6

π’3 = { (s,s’) | s’ = s mod 10}
It does not produce recoverable states
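The three candidate relations in the example (s mod 12, (s+5) mod 18, s mod 10) can be checked mechanically against K (s mod 6). The sketch below makes simplifying assumptions: every relation is treated as a deterministic function, states are the integers 0 to 179, and "recoverable" means that some function r satisfies r(candidate(s)) = K(s) for every s. The name `find_recovery` is hypothetical.

```python
def find_recovery(candidate, K, states):
    """Return a dict r with r[candidate(s)] == K(s) for all s, or None if none exists."""
    r = {}
    for s in states:
        image, target = candidate(s), K(s)
        # Two initial states that collide under the candidate must agree under K.
        if r.setdefault(image, target) != target:
            return None
    return r

K = lambda s: s % 6
states = range(180)

r1 = find_recovery(lambda s: s % 12, K, states)
r2 = find_recovery(lambda s: (s + 5) % 18, K, states)
r3 = find_recovery(lambda s: s % 10, K, states)

print(r1 == {i: i % 6 for i in range(12)})        # recovery s' = s mod 6
print(r2 == {j: (j + 1) % 6 for j in range(18)})  # recovery s' = (s+1) mod 6
print(r3 is None)                                 # s = 0 and s = 10 collide but differ under K
```

The first two candidates succeed because their moduli (12 and 18) are multiples of 6, so states that collide under the candidate also collide under K; with modulus 10 that property fails and no recovery function exists.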


Conclusions and Future Work

• We have developed a framework for reasoning about redundancy

• It includes: Classification/Quantification/Qualification

• Future work
– Refining/reorganizing classification
– Evaluate quantification
– Validate qualification