The Validation Attitude
Bob Colwell
April 2010

Attitude

I could talk about techniques, tools, FV (formal verification),
environments, algorithms, machinery, languages, suites, training…

but I think attitude is more important than any of those

4/4/07 Bob Colwell

No Perfect Designs

Nothing is perfect; everything has bugs
– Shortcomings, compromises, defects, design errata, gaffes, goofs, fumbles, errors, boneheaded mistakes, bobbles, bungles, boo-boos
– But not all bugs are equal!

Can’t test to saturation: schedule matters too

Why is everything always so darned buggy?
– Software… need I say more…
– Why didn’t Titanic’s watertight compartments extend high enough?
– Why did the Ford Pinto have its gas tank in back?
– Why did Challenger fly with leaky O-rings?
– Why did torpedoes not explode in WWII?

Entropy has a preferred direction
– Only a genius could paint the Mona Lisa, but any small child could destroy it quickly
– 1000 ways to do things wrong, 1 or 2 that work
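To put a rough number on “can’t test to saturation,” here is a back-of-the-envelope sketch. The 64-bit adder and the rate of one billion test vectors per second are hypothetical choices, not figures from the talk; exhaustively checking every pair of 64-bit operands means 2^128 cases.

```python
# Back-of-the-envelope: how long would exhaustive testing take?
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to apply every possible input vector of `input_bits` bits
    exactly once at a fixed test rate. Counts combinational inputs only;
    any internal state would multiply the number of cases further."""
    cases = 2 ** input_bits
    return cases / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical example: a 64-bit adder has two 64-bit operands (128 input bits),
    # tested at a hypothetical rate of one billion vectors per second.
    years = exhaustive_test_years(128, 1e9)
    print(f"{years:.2e} years")
```

With these assumed numbers the result is on the order of 10^22 years, so validation has to choose its tests rather than enumerate them.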

Prescription: SW visualization, tools to localize bugs, diagnose problems, and instrument behavior

Accidents Are Inevitable
– It’s the nature of engineering to push designs to the edge of failure (schedule, reliability, thermals, materials, tools, judgment of unknowns)
– P(accident) = ε, for ε ≠ 0
– The world rewards this behavior
   Cool new features + first to market are often preferred to dependability
   Other markets (life support) make (or should make) this trade-off differently!
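The point of P(accident) = ε, ε ≠ 0 sharpens once exposure accumulates. A minimal sketch, assuming (an idealization the talk does not state) that each use is an independent trial with the same accident probability ε: the chance of at least one accident in n uses is 1 − (1 − ε)^n, which climbs toward 1 as n grows. The value of ε below is hypothetical.

```python
def p_at_least_one(eps: float, n: int) -> float:
    """Probability of at least one accident in n independent exposures,
    each with accident probability eps (idealized independence assumption)."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability in [0, 1]")
    return 1.0 - (1.0 - eps) ** n


if __name__ == "__main__":
    eps = 1e-6  # hypothetical per-use accident probability
    for n in (1_000, 1_000_000, 10_000_000):
        print(f"n={n:>10,}: P(at least one accident) = {p_at_least_one(eps, n):.4f}")
```

With this hypothetical ε, the probability rises from about 0.001 after a thousand uses to about 0.63 after a million and essentially 1 after ten million: any ε ≠ 0 eventually bites.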

Isn’t that just Murphy’s Law?

Close. But Murphy is not quite right.

1. #near-misses >> #disasters
2. Competent design/test finds simple errors
3. Complex sequences & unlikely event cascades survive to production

Failures Getting Worse

Mechanical things usually fail predictably, due to physics
– Wings bend, bridges groan, engines rattle, knees ache
– By contrast, computer-based things fail “all over the place”

Helpful Engineering Attitude:
1. Nature does not want your engineered system to work; it will actively work against you
2. Your design will do only what you’ve constrained it to do, and only as long as it has to
3. Watch out for… normalization of deviance (Challenger O-rings, Apollo 1 fire)

The Steely-Eyed Missile Validator

Apollo 12 (2nd try to land on the moon), launched 11/14/69
– 36 seconds after liftoff, the spacecraft was struck by lightning => power surge
– All telemetry went haywire; the book said to abort
– Both the spacecraft commander and Mission Control were furiously considering that option
– But John Aaron was on shift, and thought he’d seen this malfunction before

During testing 1 year earlier, Aaron had watched a test go off into the weeds
– Aaron took it on himself to investigate; that led him to the obscure SCE (Signal Conditioning Equipment) subsystem

In the critical “abort or not” few seconds, with lives on the line, Aaron made one of the most famous calls in NASA history
– “Flight, try SCE to ‘Aux’”
– Neither Flight nor commander Pete Conrad knew what that even meant, but Alan Bean tried it
– Telemetry came right back, vaulting Aaron into validation stardom

He could have blown off the earlier test, but he didn’t
His inner validator wanted to know “what just happened?”

Isaac Asimov once said the 3 most important words in science are “What was THAT?”

Complexity Implies Surprises
…and surprises are bad

Chaos effects in complex µPs (microprocessors)
– Decomposability is a fundamental tenet of complex system design
– Butterfly wings ruin decomposability
– “Improve design, get slower performance” is not at all uncommon

We must stop designing large systems as though small ones simply scale up
– Lesson from comm engineers: assume errors

Thinking about validation

The ability to think in analogies is the highest form of intelligence
– IQ tests like “a:b :: c:d”
– Hofstadter’s book: numerical sequences

Analogies may illuminate a subject in a way that direct introspection cannot
– They drive our minds to their creative limits

Listen to Your Inner Validator

0, 1, 2, …?

You knew it wouldn’t be 3, didn’t you?
– You sensed something’s not quite as it seems

Answer: 0, 1, 2, 720!, …
= 0, 1, 2, 6!!, …
= 0, 1!, 2!!, 3!!!, …
(each “!” applies the factorial again, so 6!! here means (6!)!, not the double factorial)

That was the voice of your inner validator you were hearing

D. Hofstadter, Fluid Concepts and Creative Analogies
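A quick way to check the answer is to compute it. This is a minimal sketch reading the sequence as term k = k with k factorial signs, i.e. the factorial applied k times (repeated factorial, not the double factorial); the helper names `iterated_factorial` and `hofstadter_term` are hypothetical.

```python
import math


def iterated_factorial(n: int, times: int) -> int:
    """Apply the factorial `times` times: iterated_factorial(3, 3) == ((3!)!)!.
    This is repeated factorial, not the double factorial n(n-2)(n-4)..."""
    result = n
    for _ in range(times):
        result = math.factorial(result)
    return result


def hofstadter_term(k: int) -> int:
    """k-th term (0-based) of 0, 1!, 2!!, 3!!!, ...: k followed by k factorial signs.
    Do not try k = 4: 4!!!! = ((24!)!)! is far too large to compute."""
    return iterated_factorial(k, k)


if __name__ == "__main__":
    for k in range(3):
        print(k, hofstadter_term(k))  # prints 0 0, then 1 1, then 2 2: looks like counting
    print(hofstadter_term(3) == math.factorial(720))  # True: 3!!! = (6!)! = 720!
```

The first three terms look like plain counting, which is exactly why the fourth one is a surprise.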

Lesson: Trust Nothing

Hyatt Regency hotel, Kansas City, Missouri, 1981
– Catwalks hung on rods
– 40’ threaded rods with nuts halfway up
– Killed 114, injured over 200

What Happened?

Spec was marginal
– 40’ threaded rods were “too hard,” so the contractor changed them to 2 × 20’ rods

No simulation, no test

Who goofed? Engineer, contractor, inspector… everyone
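The slide leaves implicit why swapping one 40’ rod for two 20’ rods mattered. A simplified statics sketch: assume each of the two stacked walkways (upper on the 4th floor, lower on the 2nd) imposes the same hypothetical load W, and ignore rod weight and dynamic crowd effects. With one continuous rod, the nut under the upper walkway’s box beam held only the upper walkway; with two rods, the lower walkway hung from the upper walkway’s box beam, so that connection held both. The function name is hypothetical.

```python
def upper_connection_load(walkway_load: float, split_rods: bool) -> float:
    """Load on the nut/box-beam connection under the upper walkway.
    Simplified statics: ignores rod self-weight and dynamic crowd loading."""
    if split_rods:
        # As built: the lower walkway's rod hangs from the upper walkway's box beam,
        # so the upper connection carries both walkways.
        return 2 * walkway_load
    # As designed: one continuous rod; the lower walkway's load passes up the rod
    # to the ceiling, and the upper nut supports only its own walkway.
    return walkway_load


if __name__ == "__main__":
    W = 1.0  # hypothetical, normalized per-walkway load
    print(upper_connection_load(W, split_rods=False))  # 1.0
    print(upper_connection_load(W, split_rods=True))   # 2.0
```

Under these simplifications, a change that looked like a fabrication convenience doubled the load on a connection that was already marginal.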

Therac-25

Medical linear accelerator for radiation therapy
– Electron and X-ray modes

Six massive-overdose accidents, several fatal, from poor system/SW design
– And blind, naïve faith in computers!

Question Everything

Test assumptions as well as design
– If assumptions are broken, design surely is too
– Try to “catch the field goals”

Fight Urge to Relax Requirements

Challenger
– Not OK to slip design assumptions (launch temp, # of unburnt O-rings) to suit desires

Airbus
– Blaming the pilot is not a reasonable explanation; the pilot is part of the system design

Runway “incursions” up 71% since ’93
– Near-misses are trying to tell us something

Diane Vaughan, The Challenger Launch Decision, University of Chicago Press, 1996; Nancy Leveson, Safeware, Addison-Wesley, 1995

If You Didn’t Test It, It Doesn’t Work

Mir: fire extinguishers bolted to wall
– Still had strong metal launch straps
– Had never been needed before, so never tested
– Discovered with a roaring fire several feet away

Complexity Makes Everything Worse

Some things must be complicated to do their job
– Our brains, for example

But complex sequences are the root of most disasters
– Challenger, Bhopal, Chernobyl, FDIV, Exxon Valdez

Where does complexity come from? Why does it keep increasing? Where are the limits?
– Pentium 4
– “In the small” vs. “in the large” design (micros vs. comm systems)

What to do? Vigilance, testing, awareness… we are all validators

What To Do

Get the spec right
Design for correctness, but…
– design knowing perfection is unattainable
Users are part of the system
Formal methods
Pre-production testing and validation
Post-production testing and verification
Education of the public

Roles

Engineers must stand their ground
– There are always doubts and incomplete data; don’t let ’em use those against you

Judgment is crucially needed: YOURS
– Remember the Challenger: “My God, Thiokol, when do you want me to launch? Next April?”
– Be careful with “data”: “Risk assessment data is like a captured spy; if you torture it long enough, it will tell you anything you want to know…” (Wm. Ruckelshaus)
– Crushing, conflicting demands are the norm
   Design must push the envelope without ceding responsibility
   Validation establishes whether they’ve pushed it too far
   Management must beware overriding technical judgment
   The public must understand the limits of the human design process

All players must value the roles of others!

(Figure: engineer, mgt, HR)

Roles cont.

Management
– wants to assume a product is safe
– knows nothing’s ever perfect; there comes a time to “shoot the engineers” or they’ll never stop tinkering

Validators
– want to prove a product is safe
– assume it is not, by default
– are the only informed arbiters of when a product is ready

Don’t fall for “might as well sign, we’re shipping anyway”!!

Future Directions: Public Expectations

Andy Grove’s FDIV epiphany

Paradoxically, the more high tech the product, the more the public expects of it

Users caused Chernobyl and TMI by going “off book,” but prevented many other disasters with real-time creativity… the lessons are subtle

It takes exquisite understanding & judgment to distinguish accidents and reasonable risk-taking from bonehead errors or incompetence

This is what a jury must do.

How?

Can’t keep trending this way

Future of Validation

Multiple Culture Changes Needed

The public needs to stop expecting perfection

Design teams must explicitly limit complexity and avoid auto-scale-up assumptions

Companies must mature past the point of viewing validation as an unpleasant overhead
– Does your company have “Validation Fellows”?

Validation is a profession of its own. Cultivate the Validation Attitude!

The End