
CS 851 Fall 2002 1

2 Oct 2002

Design Paradigms

Logical Error in Design
Design Change for the Worse
Success Masking Error

What I’ll present

• A brief overview of each design failure paradigm
• Possibly related software system examples
  – Osprey V-22 failure
  – The Therac 25 incident
  – The Ariane 5 flight failure
• Discussion


Logical Error in Design

Logical Error in Design

• Error in fundamental assumptions
  – Particularly with respect to a failure mode
• Leading to flawed models and flawed analysis
• Illustration using Galileo’s analysis of cantilever beams
  – Error in the assumption of uniform stress
  – However, the error is masked in the analysis of relative strength
  – The correct assumption is that the stress varies uniformly (linearly) about the neutral bending axis ‘A’

[Figure: cantilever beam with labels W, S, L, b, h, and neutral axis A]
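The masking effect in the cantilever illustration can be made concrete with a short numeric sketch. It assumes the standard textbook comparison: a uniform-stress analysis gives a breaking end load of W = S·b·h²/(2L), while a stress distribution varying linearly about the neutral axis gives W = S·b·h²/(6L). Reading S as the limiting tensile stress, and the sample dimensions below, are assumptions made for illustration and do not come from the slides.

```python
def galileo_breaking_load(S, b, h, L):
    """End load predicted by assuming uniform tensile stress S over the section."""
    return S * b * h**2 / (2 * L)


def elastic_breaking_load(S, b, h, L):
    """End load predicted with stress varying linearly about the neutral axis."""
    return S * b * h**2 / (6 * L)


# Hypothetical beams: same material and length, the second is twice as deep.
S = 10.0e6   # limiting stress in Pa (illustrative value)
L = 2.0      # beam length in m (illustrative value)
beam_a = {"b": 0.1, "h": 0.1}
beam_b = {"b": 0.1, "h": 0.2}

# Relative strength of beam B versus beam A under each analysis.
relative_galileo = (galileo_breaking_load(S, L=L, **beam_b)
                    / galileo_breaking_load(S, L=L, **beam_a))
relative_elastic = (elastic_breaking_load(S, L=L, **beam_b)
                    / elastic_breaking_load(S, L=L, **beam_a))

# Absolute overestimate of the flawed analysis for a single beam.
absolute_overestimate = (galileo_breaking_load(S, L=L, **beam_a)
                         / elastic_breaking_load(S, L=L, **beam_a))

print("relative strength (uniform-stress analysis):", relative_galileo)
print("relative strength (linear-stress analysis): ", relative_elastic)
print("absolute overestimate factor:               ", absolute_overestimate)
```

Both analyses agree on how much stronger one beam is than another, because the constant factor cancels in the ratio; only the absolute breaking load is wrong.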

Logical Error in Design

• The error may not have been found since
  – Structures that might have been designed using the flawed analysis would have employed rather high safety factors
  – As long as successful results are obtained, there is greater confidence in the validity of the assumption
  – The credibility of Galileo as a scientific genius
  – Human tendency to discount errors in fundamental assumptions when the assumptions are made with authority
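How a high safety factor can hide a flawed analysis can be sketched in a few lines. The sketch assumes the capacity was overestimated by a constant factor of three (the textbook ratio between a uniform-stress and a linear-stress cantilever analysis); the function name and the nominal factors tried are hypothetical.

```python
def true_safety_factor(nominal_factor, overestimate=3.0):
    """Margin actually present when capacity was overestimated by `overestimate`."""
    return nominal_factor / overestimate


for nominal in (2.0, 3.0, 6.0):
    actual = true_safety_factor(nominal)
    # A true margin above 1 means the structure stands and the error stays hidden.
    status = "stands, error stays hidden" if actual > 1.0 else "at or past failure"
    print(f"nominal factor {nominal}: true margin {actual:.2f} ({status})")
```

Any nominal factor comfortably above the overestimate leaves a real margin greater than one, so every successful structure appears to confirm the flawed assumption.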

Logical Error in Design

• Hartford Civic Center roof collapse
  – Fundamental assumptions about the frame were oversimplified
  – Overconfidence in the results of the software model led to neglecting some alarming observations
• De Havilland Comet crashes
  – Flawed fundamental assumption that metal fatigue would not be a determining factor in aircraft structural integrity
  – Pressurized fuselage failed from metal fatigue

Tying it into Software

• Therac 25 accidents
• Initially flawed fundamental assumption that software cannot fail
  – Lack of verification of software
  – Lack of independent review of design and code
  – Design error in the software itself
• Design error in removing traditional electromechanical interlocks
• Unrealistic assumptions about software in the safety analysis
• Any others?

Observations/ Discussion

• The engineer is responsible for looking critically at every aspect of the design and questioning the most fundamental assumptions
• Are there fundamental assumptions that we must question when designing software for critical systems?
  – Assumptions about the process employed?
  – Assumptions during analysis?
• Are there any software systems where a large safety factor is employed, possibly masking a fundamental error?

Design Change for the Worse

• Cracked marble column
  – Failure prevention mechanism caused the failure
• Fractured hulls of ships
  – Fractures similar to the cracked marble column observed
  – Welding made adjacent steel brittle

Design Change for the Worse

• Failure of suspended walkways in the Kansas City Hyatt Regency

[Figure: Original Design and Modified Design]
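A back-of-the-envelope sketch of why the modification mattered, based on the commonly cited account of the collapse: in the original design a single continuous rod passed through the upper walkway's beam, so the connection under that beam carried only the upper walkway's load; in the modified design the lower walkway hung from the upper walkway's beam, so the same connection carried both loads. The load value and function names are hypothetical.

```python
def upper_connection_load_original(p_upper, p_lower):
    """Continuous rod: the lower walkway's load stays in the rod and
    bypasses the upper beam connection."""
    return p_upper


def upper_connection_load_modified(p_upper, p_lower):
    """Split rods: the lower walkway hangs from the upper beam, so the
    upper connection carries both walkways."""
    return p_upper + p_lower


P = 90.0  # load per hanger from one walkway, in kN (hypothetical value)

original = upper_connection_load_original(P, P)
modified = upper_connection_load_modified(P, P)
print("original design, load on upper connection:", original)
print("modified design, load on upper connection:", modified)
print("increase factor:", modified / original)
```

With equal walkway loads the change doubles the demand on a connection whose detailing was otherwise left as it was, which is the paradigm of a design change for the worse.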

Design Change for the Worse

• The Challenger explosion
• Booster rockets modeled after Titan III rockets
• Second O-ring added for “more reliable” design
• Faulty seal, which was vulnerable at temperatures below 51°

Software Example

• Osprey V-22 tilt-rotor aircraft
• PFCS Reset switch to reset software to a pre-determined state
• Software was the failure mitigation mechanism
• Hydraulic failure coupled with software failure

Not really an example of design change for the worse, but an example of the failure mitigation mechanism causing a catastrophe; more a design flaw.

Observations/ Lessons to be learnt

• Additions/ modifications to design should be carefully evaluated with respect to original design goals
• Failure mitigation techniques should be carefully examined, lest their failure lead an otherwise healthy system to fail catastrophically
• New techniques can introduce new failure modes or trigger existing/ latent failure modes
• Petroski states, “The idea of adding a third support should have been followed by deliberate consideration of whether the newly supported column could break in a different way”.
  – FMEA (failure mode and effects analysis)


Tying it into Software

Discussion

• Is the paradigm applicable to the practice of “patching” software when bugs are found after the product is released?
• Patches can introduce new bugs and new ways for the software to fail. Is FMEA possible/ feasible?
• Any examples of software design changes for the worse?

Success Masking Error

• Success syndrome: basking in the glory of previous successes
• Overconfidence in design with different parameters because of previous success of the same design
• Dee Bridge failure
  – Iron girders supported by trusses
  – Cast iron was chosen
  – Designer (Stephenson) had designed truss bridges before and used a similar design
  – Increased the span length as compared to earlier designs
  – Neglected additional structural needs for extra length
  – Wrought iron trusses were employed that had not been relied upon before for carrying tensile load

Success Masking Error

• Neglected the effect of scaling the design
• Five inches of ballast added above wooden decking with no consideration that increased load would decrease the safety factor
• Collapsed shortly thereafter
• Torsional buckling instability
  – A previously latent failure mode became dominant with a reuse of design without careful consideration of new parameters

Success Masking Error

• Tacoma Narrows Bridge, a.k.a. “Galloping Gertie”
  – Confidence in the success of suspension bridges
  – Lowering the safety factor for Tacoma Narrows Bridge
• Stainless steel fuel rods used in fast-neutron reactor environments
• Design borders on being commonplace and “daring”

A Software Example

• Ariane 5 failure
  – Reuse of Ariane 4 code without changes
  – A function which was not required was left operational because the code was unchanged
  – Software design error left a variable unprotected
  – Designers reused code because of confidence in their code on the basis of the Ariane 4 success
  – Testing and analysis of the module that failed were discounted, again because of the success of the module in Ariane 4
  – Also had a fundamental flaw in the assumption that the software is correct until shown to be at fault
  – Reflects the paradigm of logical error in design
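The "unprotected variable" point can be illustrated with a small sketch. It follows the widely reported account of the failure, in which a 64-bit floating-point value was converted to a 16-bit signed integer without a range check that was handled. The flight software was not written in Python; the function names and the sample readings below are hypothetical and only stand in for "a value that fit under the old flight profile" and "a value that no longer fits".

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Mimic a checked conversion with no handler: out-of-range input raises."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


def to_int16_protected(value: float) -> int:
    """One possible protection: saturate at the representable limits."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


old_profile_reading = 20_000.0   # hypothetical value, within range
new_profile_reading = 50_000.0   # hypothetical value, out of range

# Under the old operating envelope the unprotected conversion always succeeds,
# which is how past success can mask the latent error.
print("old profile:", to_int16_unprotected(old_profile_reading))

try:
    to_int16_unprotected(new_profile_reading)
except OverflowError as exc:
    print("new profile, unprotected conversion failed:", exc)

print("new profile, protected conversion:", to_int16_protected(new_profile_reading))
```

Saturation is only one of several possible protections; whether it is the right one depends on how the converted value is used downstream.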

Observations/ Discussion

• The designer should properly anticipate future use and behavior of the design to avoid being surprised
• In the context of successes, risk tends to be minimized or ignored
• Reuse of software might be a category which fits into this paradigm
• A conclusion: It is worthwhile to study the failure paradigms in other engineering disciplines to anticipate failure sources in software engineering