HAB Software Woes John Graham-Cumming September 2012 Or “My capsule didn’t crash but my software did”


My talk from the UKHAS 2012 conference about problems in HAB software.


Page 1: HAB Software Woes

HAB Software Woes
John Graham-Cumming
September 2012

Or “My capsule didn’t crash but my software did”

Page 2: HAB Software Woes

Background

> 30 years of programming experience
One HAB flight
◦ GAGA-1

http://blog.jgc.org/2011/04/gaga-1-flight.html

https://github.com/jgrahamc/gaga

Page 3: HAB Software Woes

Where’s your flight’s complexity?

Example: GAGA-1
◦ One balloon, parachute, polystyrene box
◦ Many metres of cord attached with knots
◦ An off-the-shelf camera
◦ 2,836 lines of code
◦ Common to see defect rates of 2 to 4 per KLOC
◦ So GAGA-1 likely has 5 to 10 errors in it
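The estimate on this slide is simple arithmetic, sketched below. The function name `expected_defects` is made up for illustration; the line count and the defect rates are the ones quoted above, and the slide rounds the result to "5 to 10".

```python
def expected_defects(lines_of_code, defects_per_kloc):
    """Estimate a defect count from code size and a rate per 1,000 lines."""
    return lines_of_code / 1000.0 * defects_per_kloc


low = expected_defects(2836, 2)   # about 5.7
high = expected_defects(2836, 4)  # about 11.3
print("Expect roughly %.1f to %.1f defects" % (low, high))
```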

Page 4: HAB Software Woes

Real Stuff Seen on HAB flights

Complete computer crash
Altitude going negative
Latitude and longitude garbled
Cutdown triggered in back of car
Long periods of no transmission
Not setting the GPS up before launch
Not turning the camera on
Running out of camera disk space
Altitude jumping around rhythmically

Page 5: HAB Software Woes

The Curse and Joy of Determinism

Computers do what you tell them to
◦ Precisely what you tell them to
◦ Not what you think you told them to do
A Curse
◦ Will do things you don’t expect
◦ Will process bogus input without complaint
The Joy
◦ Easy to test that it does what’s expected

Page 6: HAB Software Woes

HAB Is A Harsh Environment

Cold
Vibration
Stuff breaks in flight

Software needs to be able to cope with failing hardware

Very important to think about failure modes

YOUR CODE IS ON ITS OWN OUT THERE

Page 7: HAB Software Woes

Deadly Sins

The “It works!” Fallacy
The Last Minute Change
Being Far Too Clever
Overlooking Odd Behaviour
Copying Other People’s Code
Assuming Finding A Bug Solves The Problem

Page 8: HAB Software Woes

The “It works!” Fallacy

If you’re an inexperienced (and sometimes experienced) programmer…
◦ You hack some code together
◦ It works once
◦ You assume it will always work
Only solution to this is
◦ Testing
◦ Paranoia

Page 9: HAB Software Woes

The Last Minute Change

Never, ever change anything in code at the last minute, no matter how simple.
Example: HABE 1
◦ Complete camera failure
◦ Maximum integer size in uBASIC on CHDK is 999,999
◦ Last minute change of integer from 600,000 to 1,000,000 caused total failure

Page 10: HAB Software Woes

Being Far Too Clever

Example: GAGA-1
◦ Entered the wrong value of 2 * pi in code to do GPS position conversion from radians to degrees
◦ Caught before flight because I verified the location of my own back garden
◦ Note to self: 2 * pi != 6.2818.

https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/gps.cpp#L113
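A minimal sketch of how a hand-typed constant can be avoided and checked. This is not the GAGA-1 code: the function name `radians_to_degrees` and the sample latitude of 51.5 degrees are illustrative assumptions. Only the mistyped value 6.2818 comes from the slide.

```python
import math

WRONG_TWO_PI = 6.2818  # the mistyped constant quoted on the slide


def radians_to_degrees(radians):
    """Convert radians to degrees using the library constant, not a typed one."""
    return radians * 360.0 / (2 * math.pi)


# Sanity checks against angles that are easy to verify by hand
assert abs(radians_to_degrees(math.pi) - 180.0) < 1e-9
assert abs(radians_to_degrees(math.pi / 2) - 90.0) < 1e-9

# Effect of the typo on an arbitrary latitude of 51.5 degrees:
# about 0.011 degrees, which is on the order of a kilometre on the ground
error_degrees = abs(51.5 * (2 * math.pi) / WRONG_TWO_PI - 51.5)
```

An error that small is easy to miss in casual testing, which is why checking against a known location, as described above, is a useful habit.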

Page 11: HAB Software Woes

Overlooking Odd Behaviour

Example: GAGA-1
◦ In tests RTTY output was fine some of the time, garbled at other times
◦ Turned out to be interrupts from the GPS messing up the RTTY timing
◦ Solution: disable GPS serial interface while sending RTTY string

ALWAYS BE HONEST WITH YOURSELF ABOUT YOUR CODE

EXPECT THE SPANISH INQUISITION!

https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/tsip.cpp#L229

Page 12: HAB Software Woes

Copying Other People’s Code

Don’t do this: you have no idea what you are copying or who they copied it from
Better practice is to look at other people’s code and…
◦ Write your own version
◦ That you understand
◦ That you are able to test
◦ Example: GAGA-1
Read lots of people’s RTTY code, wrote my own

https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/rtty.cpp

Page 13: HAB Software Woes

APRS Tracker using copied code

If the altitude in metres contained an 8 or a 9 the altitude reported would be wrong

http://sharon.esrac.ele.tue.nl/users/pe1rxq/aprstracker/aprstracker.html

Page 14: HAB Software Woes

Assuming Finding The Bug Solves The Problem

Just because you’ve found A bug doesn’t mean it was THE bug
Lots of research in computer science shows bugs tend to cluster
Example: CLOUD1, CLOUD2
◦ Three bugs in printing latitude, longitude and altitude
◦ One fixed on CLOUD1, …

Page 15: HAB Software Woes

“The One Thing I Didn’t Test”

http://ukhas.org.uk/guides:common_coding_errors_payload_testing

Page 16: HAB Software Woes

Common problems with uC

Lack of floating point support
Small integers
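As an illustration of the "small integers" problem, here is a sketch of what happens when a value outgrows a 16-bit signed integer (the size of `int` on 8-bit AVR Arduino boards). The helper `as_int16` is hypothetical and only simulates the wrap-around in Python; the talk does not say that any particular flight failed in exactly this way.

```python
def as_int16(value):
    """Wrap a Python int the way a 16-bit signed variable would store it."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


altitude_m = 10000
altitude_ft = altitude_m * 33 // 10   # 33000, using a rough factor of 3.3

print(as_int16(altitude_ft))          # -32536
```

A 10 km altitude is unremarkable for a balloon, yet expressed in feet it no longer fits, and the stored value goes negative.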

Page 17: HAB Software Woes

You might never be a great programmer…

… but you can be a paranoid tester!

Page 18: HAB Software Woes

Good Things To Do

No infinite loops
Self-Checking
Unexpected Error Handling
Handle Exceptions
Simulation
Simplify, Simplify, Simplify
Unit Test
Write Log Files

Page 19: HAB Software Woes

No Infinite Loops

Never sit in a loop waiting forever
Example: ATLAS 3

    while (1) {
      // Make sure data is available to read
      if (Serial.available()) {
        b = Serial.read();

        if(bytePos == 8){
          navmode = b;
          return true;
        }

        bytePos++;
      }
      // Timeout if no valid response in 3 seconds
      if (millis() - startTime > 3000) {
        navmode = 0;
        return false;
      }
    }
    }  // closes the enclosing function (its start is not shown on the slide)

https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L211
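The same idea as a Python sketch: every wait has a deadline, and the caller is told when it expired. The names `read_reply` and `read_byte` are assumptions for illustration; `read_byte` stands in for whatever returns the next byte from a serial port, or `None` when nothing is waiting.

```python
import time


def read_reply(read_byte, reply_len, timeout_s=3.0, clock=time.monotonic):
    """Collect reply_len bytes, or give up after timeout_s seconds.

    read_byte is a hypothetical callable that returns an int (0-255) when
    a byte is available and None otherwise.
    """
    deadline = clock() + timeout_s
    reply = bytearray()
    while len(reply) < reply_len:
        if clock() > deadline:
            return None  # Timed out: the caller must handle "no answer"
        b = read_byte()
        if b is not None:
            reply.append(b)
    return bytes(reply)
```

Passing the clock in as a parameter is a design choice that lets the timeout path be tested on a desk without waiting for real seconds to pass.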

Page 20: HAB Software Woes

Self-Checking

    -- Now enter a self-check of the manual mode settings

    log( "Self-check started" )

    assert_prop( 49, -32764, "Not in manual mode" )
    assert_prop(  5,      0, "AF Assist Beam should be Off" )
    assert_prop(  6,      0, "Focus Mode should be Normal" )
    assert_prop(  8,      0, "AiAF Mode should be On" )
    assert_prop( 21,      0, "Auto Rotate should be Off" )
    assert_prop( 29,      0, "Bracket Mode should be None" )
    assert_prop( 57,      0, "Picture Mode should be Superfine" )
    assert_prop( 66,      0, "Date Stamp should be Off" )
    assert_prop( 95,      0, "Digital Zoom should be None" )
    assert_prop( 102,     0, "Drive Mode should be Single" )
    assert_prop( 133,     0, "Manual Focus Mode should be Off" )
    assert_prop( 143,     2, "Flash Mode should be Off" )
    assert_prop( 149,   100, "ISO Mode should be 100" )
    assert_prop( 218,     0, "Picture Size should be L" )
    assert_prop( 268,     0, "White Balance Mode should be Auto" )
    assert_gt( get_time("Y"), 2009, "Unexpected year" )
    assert_gt( get_time("h"), 6, "Hour appears too early" )
    assert_lt( get_time("h"), 20, "Hour appears too late" )
    assert_gt( get_vbatt(), 3000, "Batteries seem low" )
    assert_gt( get_jpg_count(), ns, "Insufficient card space" )

https://github.com/jgrahamc/gaga/blob/master/gaga-1/camera/gaga-1.lua#L96
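A rough Python sketch of the same self-check pattern. It does not reproduce how `assert_prop` and friends are implemented in the Lua script; `run_self_check` and the sample checks are invented for illustration. The idea is to run every check, log each failure, and report the full list rather than stopping at the first problem.

```python
def run_self_check(checks, log=print):
    """Run (description, predicate) pairs; return descriptions that failed."""
    failures = []
    for description, predicate in checks:
        try:
            ok = predicate()
        except Exception as exc:
            # A check that blows up counts as a failed check
            ok = False
            description = "%s (check raised %r)" % (description, exc)
        if not ok:
            log("SELF-CHECK FAILED: " + description)
            failures.append(description)
    return failures


# Hypothetical readings standing in for real sensor and clock queries
battery_mv = 3100
hour = 3
problems = run_self_check([
    ("Batteries seem low", lambda: battery_mv > 3000),
    ("Hour appears too early", lambda: hour > 6),
])
```

Here `problems` ends up holding the one failed description, which a launch checklist could refuse to proceed on.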

Page 21: HAB Software Woes

Self-Checking

Example: ATLAS 3
Makes sure uBlox GPS will work at high altitude; fixes it if not

    // Every tenth pass, check the GPS navigation mode
    if((count % 10) == 0) {
      digitalWrite(6, LOW);
      checkNAV();
      delay(1000);
      // Not in the expected mode: send the GPS configuration again
      if(navmode != 6){
        setupGPS();
        delay(1000);
      }
      checkNAV();
      delay(1000);
      digitalWrite(6, HIGH);
    }

https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L342

Page 22: HAB Software Woes

Unexpected Error Handling

    def temperature():
        t = at.cmd( 'AT#TEMPMON=1' )

        # Command returns something like:
        #
        # #TEMPMEAS: 0,28
        #
        # OK
        #
        # So split on whitespace first to isolate the field 0,28
        # and then split on comma to get the temperature

        w = t.split()
        if len(w) < 2:
            logger.log( "Temperature read returned %s" % t )
            return -1000
        m = w[1].split(',')
        if len(m) != 2:
            logger.log( "Temperature read returned %s" % t )
            return -1000
        else:
            return int(m[1])

https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/util.py
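A self-contained variation on the parsing step, written so it can be tested without the modem attached. `parse_temperature` is a made-up name, and returning `None` instead of -1000 is a choice made for this sketch. It also guards the final `int()` conversion, which is an extra precaution and not something the slide's code does.

```python
def parse_temperature(reply):
    """Extract the temperature from a reply like '#TEMPMEAS: 0,28'.

    Returns an int, or None if the reply does not have the expected shape.
    """
    fields = reply.split()
    if len(fields) < 2:
        return None
    parts = fields[1].split(",")
    if len(parts) != 2:
        return None
    try:
        return int(parts[1])
    except ValueError:
        return None  # e.g. '#TEMPMEAS: 0,xx'
```

Separating the parsing from the hardware call means garbage input can be thrown at it on the ground, long before it arrives in flight.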

Page 23: HAB Software Woes

Handle Exceptions

If your language can generate exceptions then you’d better handle them!
Example: GAGA-1
◦ Recovery computer used Python
◦ Exception could have killed it
◦ Global exception handler

Bonus: What’s wrong with that code?

    except:
        logger.log( "Caught exception in main loop: %s" % sys.exc_info()[1] )

https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/gaga-1.py#L144
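One possible reading of the bonus question, offered as a guess and not as the answer given in the talk: the handler itself calls code that can fail. The sketch below guards the logging call too. `run_main_loop`, `step` and `log` are hypothetical names, and the loop is bounded only so that it can be tested.

```python
import traceback


def run_main_loop(step, log, iterations):
    """Call step() repeatedly; a failure in one pass must not stop the next."""
    completed = 0
    for _ in range(iterations):
        try:
            step()
        except Exception:
            # Exception (not a bare except) leaves KeyboardInterrupt alone
            try:
                log("Caught exception in main loop: %s" % traceback.format_exc())
            except Exception:
                pass  # Logging failed as well; keep flying regardless
        completed += 1
    return completed
```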

Page 24: HAB Software Woes

Simulation

Simulate a flight
Example: UKHAS wiki has example of using a PC as a fake GPS
Example: GAGA-1
◦ To test the embedded Telit module wrote modules that faked the entire Telit Python interface.

http://www.ukhas.org.uk/guides:common_coding_errors_payload_testing

https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/GPS.py

https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/MDM.py
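A small sketch of the simulation idea. It is unrelated to the GAGA-1 fake Telit modules; `simulated_flight` and its default rates and burst altitude are invented numbers chosen only to produce a plausible-looking profile that flight code can be fed on the bench.

```python
def simulated_flight(ascent_rate=5.0, burst_altitude=30000.0,
                     descent_rate=10.0, step_s=60):
    """Yield (seconds, altitude_m) samples for a made-up flight profile."""
    t, alt = 0, 0.0
    # Climb at a constant rate until the (hypothetical) burst altitude
    while alt < burst_altitude:
        yield t, alt
        t += step_s
        alt += ascent_rate * step_s
    alt = burst_altitude
    # Descend at a constant rate until the ground
    while alt > 0:
        yield t, alt
        t += step_s
        alt -= descent_rate * step_s
    yield t, 0.0


samples = list(simulated_flight())
peak = max(alt for _, alt in samples)
```

Each sample can then be passed to the code under test (telemetry formatting, cutdown logic, landing detection) to see how it behaves across a whole flight, including the values it never sees on the ground.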

Page 25: HAB Software Woes

Simplify, Simplify, Simplify

Make your code as simple as possible
Never have ‘duplicated’ or ‘copy and paste’ code
Break it up into small functions that you understand
Make sure you understand the limitations of the functions you call

Page 26: HAB Software Woes

Unit Test

Break your program up into small, separate functions
Write tests that call each function and make sure it does what you expect.
Lots of ways to do this
◦ Use something like cpptest
◦ ArduinoUnit
◦ Write your own test program

Page 27: HAB Software Woes

Unit Test Example

In the bad APRS program
Turn metres to feet code into a separate function: int m_to_f(int m)

    assertEquals(m_to_f(1000),3300)
    assertEquals(m_to_f(2000),6600)
    assertEquals(m_to_f(3000),9900)
    assertEquals(m_to_f(4000),13200)
    assertEquals(m_to_f(5000),16500)
    assertEquals(m_to_f(6000),19800)
    assertEquals(m_to_f(7000),23100)
    assertEquals(m_to_f(8000),26400)
    assertEquals(m_to_f(9000),29700)
    assertEquals(m_to_f(10000),33000)
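A runnable Python sketch of the "write your own test program" option. The body of `m_to_f` is an assumption: the slide only gives the signature, and the factor of 3.3 is inferred from the expected values listed above (the exact conversion is about 3.28084 feet per metre).

```python
def m_to_f(metres):
    """Convert metres to feet with integer arithmetic.

    The factor 3.3 is inferred from the slide's expected values, not
    taken from the APRS tracker source.
    """
    return metres * 33 // 10


# Expected values copied from the slide, including the 8000 and 9000 cases
EXPECTED = [
    (1000, 3300), (2000, 6600), (3000, 9900), (4000, 13200),
    (5000, 16500), (6000, 19800), (7000, 23100), (8000, 26400),
    (9000, 29700), (10000, 33000),
]


def test_m_to_f():
    for metres, feet in EXPECTED:
        assert m_to_f(metres) == feet, "m_to_f(%d)" % metres


test_m_to_f()
```

Writing the expected values out by hand, not computing them with the same formula, is what gives a table like this a chance of catching a digit-specific bug.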

Page 28: HAB Software Woes

Write Log Files

Write detailed log files to non-volatile memory for post flight debugging

Data sent via RTTY or APRS is limited

Log exceptions and errors in detail

Make sure you have a timestamp
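A minimal sketch of timestamped logging to a file, assuming a Python environment with a normal filesystem (an embedded interpreter may offer less). The name `log_line` and the timestamp format are choices made for this example.

```python
import os
import time


def log_line(path, message, now=time.time):
    """Append one UTC-timestamped line to the log and force it to storage."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now()))
    with open(path, "a") as f:
        f.write("%s %s\n" % (stamp, message))
        f.flush()
        os.fsync(f.fileno())  # Ask the OS to write through to the device
    return stamp
```

Opening, writing and syncing on every line is slow, but it means a power loss or crash costs at most the line being written, which is usually the trade a flight log wants.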

Page 29: HAB Software Woes

Perform system testing

Test your entire system before flight
◦ Put your tracker in the garden
◦ Get a GPS lock
◦ Listen to the RTTY on your radio
◦ Look at the decoded RTTY on your computer
◦ Test uploaded data on the tracker*
◦ *I didn’t do that step; on the day, people had to fix the tracker for me.