CS 851 Fall 2002 1
2 Oct 2002
Design Paradigms
Logical Error in Design
Design Change for the Worse
Success Masking Error
What I’ll present
• A brief overview of each design failure paradigm
• Possibly related software system examples
– Osprey V-22 failure
– The Therac 25 incident
– The Ariane 5 flight failure
• Discussion
Logical Error in Design
• Error in fundamental assumptions
– Particularly with respect to a failure mode
• Leading to flawed models and flawed analysis
• Illustration using Galileo’s analysis of cantilever beams
– Error in the assumption of uniform stress
– However, error is masked in analysis of relative strength
– The correct assumption is that the stress is uniformly varying about the neutral bending axis ‘A’
[Figure: cantilever beam, labeled W, S, L, b, h, with neutral bending axis A]
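The Galileo illustration can be made concrete with a short calculation. This is a sketch under assumptions not stated on the slide: a rectangular cross-section of breadth $b$ and depth $h$, a cantilever of length $L$ carrying an end load $W$, and $S$ read as the limiting tensile stress of the material. Under Galileo's assumption the stress $S$ acts uniformly over the whole section and the beam pivots about its lower edge at the wall; under the corrected assumption the stress varies linearly about the neutral axis A at mid-depth and reaches $S$ only at the outermost fibre.

```latex
\begin{align*}
\text{Uniform stress (Galileo):}\quad
  W_{\mathrm{G}}\,L &= (S\,b\,h)\cdot\frac{h}{2}
  &&\Longrightarrow\quad W_{\mathrm{G}} = \frac{S\,b\,h^{2}}{2L} \\[4pt]
\text{Linearly varying stress:}\quad
  W_{\mathrm{C}}\,L &= \frac{S\,b\,h^{2}}{6}
  &&\Longrightarrow\quad W_{\mathrm{C}} = \frac{S\,b\,h^{2}}{6L} \\[4pt]
\text{Absolute strength:}\quad
  \frac{W_{\mathrm{G}}}{W_{\mathrm{C}}} &= 3 \\[4pt]
\text{Relative strength (either model):}\quad
  \frac{W(b_1,h_1,L_1)}{W(b_2,h_2,L_2)} &= \frac{b_1 h_1^{2}/L_1}{b_2 h_2^{2}/L_2}
\end{align*}
```

Under these assumptions the absolute prediction is too high by a factor of three, yet any comparison between two beams of the same material comes out the same in both models, which is how the error stays masked in an analysis of relative strength.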
Logical Error in Design
• The error may have gone undetected because
– Structures designed using the flawed analysis would have employed rather high safety factors
– As long as successful results are obtained, confidence in the validity of the assumption grows
– Galileo had credibility as a scientific genius
– People tend to discount errors in fundamental assumptions when the assumptions are made with authority
Logical Error in Design
• Hartford Civic Center roof collapse
– Fundamental assumptions about the frame were oversimplified
– Overconfidence in the results of the software model led to neglecting some alarming observations
• De Havilland Comet crashes
– Flawed fundamental assumption that metal fatigue would not be a determining factor in aircraft structural integrity
– Pressurized fuselage failed from metal fatigue
Tying it into Software
• Therac 25 accidents
• Initially flawed fundamental assumption that software cannot fail
– Lack of verification of software
– Lack of independent review of design and code
– Design error in the software itself
• Design error in removing traditional electromechanical interlocks
• Unrealistic assumptions about software in the safety analysis
• Any others?
Observations/Discussion
• The engineer is responsible for looking critically at every aspect of the design and questioning the most fundamental assumptions
• Are there fundamental assumptions that we must question when designing software for critical systems?
– Assumptions about the process employed?
– Assumptions during analysis?
• Are there any software systems where a large safety factor is employed, possibly masking a fundamental error?
Design Change for the Worse
• Cracked marble column
– Failure prevention mechanism caused the failure
• Fractured hulls of ships
– Fractures similar to the cracked marble column observed
– Welding made adjacent steel brittle
Design Change for the Worse
• Failure of suspended walkways in the Kansas City Hyatt Regency
[Figure: Original Design and Modified Design]
Design Change for the Worse
• The Challenger explosion
• Booster rockets modeled after Titan III rockets
• Second O-ring added for “more reliable” design
• Faulty seal, which was vulnerable at temperatures below 51°F
Software Example
• Osprey V-22 tilt-rotor aircraft
• PFCS Reset switch to reset software to a predetermined state
• Software was the failure mitigation mechanism
• Hydraulic failure coupled with software failure
Not really an example of design change for the worse, but an example of the failure mitigation mechanism causing a catastrophe. It is more a design flaw.
Observations/Lessons to be learnt
• Additions/modifications to design should be carefully evaluated with respect to original design goals
• Failure mitigation techniques should be carefully examined, lest their failure lead an otherwise healthy system to fail catastrophically
• New techniques can introduce new failure modes or trigger existing/latent failure modes
• Petroski states, “The idea of adding a third support should have been followed by deliberate consideration of whether the newly supported column could break in a different way.” Compare FMEA (failure mode and effects analysis).
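The FMEA question raised by Petroski's remark can be sketched as a small ranking exercise. The sketch below is illustrative only: it uses the risk priority number (severity × occurrence × detection, each scored 1 to 10) that is common in FMEA practice, and the `FailureMode` class, the `rank_by_rpn` helper, and every score are hypothetical rather than taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet (scores run from 1 to 10)."""
    description: str
    severity: int    # 10 = catastrophic effect
    occurrence: int  # 10 = almost certain to occur
    detection: int   # 10 = almost impossible to detect before failure

    def rpn(self) -> int:
        """Risk priority number: severity * occurrence * detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"score {score} is outside 1..10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical worksheet for the marble column after a third support is added.
modes = [
    FailureMode("column sags and cracks between the two end supports", 7, 2, 3),
    FailureMode("column breaks over the added middle support", 7, 4, 8),
]

for mode in rank_by_rpn(modes):
    print(mode.rpn(), mode.description)
```

With these made-up scores the failure mode introduced by the modification ranks above the one the modification was meant to prevent, which is the situation the slide warns about.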
Discussion
• Is the paradigm applicable to the practice of “patching” software when bugs are found after the product is released?
• Patches can introduce new bugs and new ways for the software to fail. Is FMEA possible/feasible?
• Any examples of software design changes for the worse?
Success Masking Error
• Success syndrome - basking in the glory of previous successes
• Overconfidence in design with different parameters because of previous success of the same design
• Dee Bridge failure
– Iron girders supported by trusses
– Cast iron was chosen
– Designer (Stephenson) had designed truss bridges before and used a similar design
– Increased the span length as compared to earlier designs
– Neglected additional structural needs for extra length
– Wrought iron trusses employed, had not been relied upon before for carrying tensile load
Success Masking Error
• Neglected the effect of scaling the design
• Five inches of ballast added above wooden decking with no consideration that increased load would decrease the safety factor
• Collapsed shortly thereafter
• Torsional buckling instability – A previously latent failure mode became dominant with a reuse of design without careful consideration of new parameters
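The two points on this slide, added load eating into the safety factor and a latent failure mode becoming the governing one, can be illustrated with a toy calculation. Everything here is an assumption made for illustration: the `safety_factor` helper, the mode names, and all numbers are hypothetical and are not Dee Bridge data.

```python
def safety_factor(capacities, load):
    """Return (factor, governing_mode) for a dict of per-mode load capacities.

    The governing mode is the one with the smallest capacity, so the safety
    factor is that capacity divided by the applied load.
    """
    if load <= 0:
        raise ValueError("load must be positive")
    if not capacities:
        raise ValueError("at least one failure mode is required")
    governing = min(capacities, key=capacities.get)
    return capacities[governing] / load, governing


# Hypothetical earlier design: short span, bending governs with ample margin.
earlier = safety_factor({"bending": 120.0, "torsional_buckling": 300.0}, load=40.0)

# Hypothetical reuse: the longer span lowers the buckling capacity,
# and the added ballast raises the load from 40 to 55.
reused = safety_factor({"bending": 110.0, "torsional_buckling": 60.0}, load=40.0 + 15.0)

print(earlier)
print(reused)
```

In this made-up case the reused design still has a safety factor above one, but the margin has shrunk sharply and the mode that sets it is no longer the one the earlier successes had exercised.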
Success Masking Error
• Tacoma Narrows Bridge, a.k.a. “Galloping Gertie”
• Confidence in the success of suspension bridges
• Lowering the safety factor for Tacoma Narrows Bridge
• Stainless steel fuel rods used in fast-neutron reactor environments
• Design borders on being commonplace and “daring”
A Software Example
• Ariane 5 failure
– Reuse of Ariane 4 code without changes
– A function which was not required was left operational because the code was unchanged
– Software design error left a variable unprotected
– Designers reused code because of confidence in their code on the basis of the Ariane 4 success
– Testing and analysis of the module that failed was discounted, again because of the success of the module in Ariane 4
– Also had a fundamental flaw in the assumption that the software is correct till shown to be at fault
– Reflects the paradigm of logical error in design
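The “unprotected variable” item can be sketched in code. As an assumption beyond the slide, the sketch takes the widely reported form of the fault, a floating-point value converted to a 16-bit signed integer with no range protection. The function names and the choice of saturation as the protective policy are hypothetical, and this is not the Ariane flight code, which was written in Ada.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Convert with no protection: an out-of-range value raises an error.

    This models a checked conversion whose error nobody handles. Values that
    were always in range on the old vehicle never exposed the problem.
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> int:
    """Convert with protection: out-of-range values saturate at the limits.

    Saturation is only one possible policy and is an assumption of this sketch.
    """
    if math.isnan(value):
        raise ValueError("cannot convert NaN")
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)


# Hypothetical readings: the first is in range, the second is not.
for reading in (1200.5, 40000.0):
    print(reading, "->", to_int16_protected(reading))
```

The same input range that made the unprotected conversion safe on the earlier vehicle is an unstated assumption, and reuse under new flight parameters is what invalidates it.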
Observations/Discussion
• The designer should properly anticipate future use and behavior of the design to avoid being surprised
• In the context of successes, risk tends to be minimized or ignored
• Reuse of software might be a category which fits into this paradigm
• A conclusion: it is worthwhile to study the failure paradigms in other engineering disciplines to anticipate failure sources in software engineering