California Institute of Technology
Formalized Pilot Study of Safety-Critical Software Anomalies
Dr. Robyn Lutz and Carmen Mikulski
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.
OSMA Software Assurance Symposium
September 4-6, 2002
Topics
• Overview
• Results
  – Quantitative analysis
  – Evolution of requirements
  – Pattern identification & unexpected patterns
• Work-in-progress
• Benefits
Overview
• Goal: To reduce the number of safety-critical software anomalies that occur during flight by providing a quantitative analysis of previous anomalies as a foundation for process improvement.
• Approach: Analyzed anomaly data using an adaptation of the Orthogonal Defect Classification (ODC) method
  – Developed at IBM; widely used by industry
  – Quantitative approach
  – Used here to detect patterns in anomaly data
  – More information at http://www.research.ibm.com/softeng
• Evaluated ODC for NASA use via a Formalized Pilot Study [Glass, 97]
Overview: Status
• Year 3 of planned 3-year study: Plan → Design → Conduct → Evaluate → Use
• FY’03 extension proposed to extend ODC work to pre-launch and transition to projects (Deep Impact, contractor-developed software, Mars Exploration Rover testing)
• Adapted ODC categories to operational spacecraft software at JPL:
  – Activity: what was taking place when the anomaly occurred?
  – Trigger: what was the catalyst?
  – Target: what was fixed?
  – Type: what kind of fix was done?
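The four adapted categories can be pictured as a small record type. A minimal sketch, assuming Python; the class name, field names, and the validation shown are illustrative, not the JPL database schema:

```python
from dataclasses import dataclass

# Allowed Activity values in the adapted classification (illustrative set).
ACTIVITIES = {"Flight Operations", "System Test", "Unknown"}

@dataclass(frozen=True)
class IsaClassification:
    """ODC-style classification of one Incident Surprise Anomaly (ISA)."""
    activity: str  # what was taking place when the anomaly occurred
    trigger: str   # what was the catalyst
    target: str    # what was fixed
    type: str      # what kind of fix was done

    def __post_init__(self):
        # Light validation against the adapted Activity set.
        if self.activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {self.activity!r}")

# Example: the damaged-solar-array anomaly discussed later in the deck.
isa = IsaClassification("Flight Operations", "Hardware Failure",
                        "Flight Software", "Function/Algorithm")
```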
Results: ODC Adaptation
Activities and their Triggers:
  – System Test: Software Configuration; Hardware Configuration; Start/Restart, Shutdown; Command Sequence Test; Inspection/Review; Unknown
  – Flight Operations: Data Access/Delivery; Hardware Failure; Normal Activity; Recovery; Special Procedure; Unknown

Targets and their Types:
  – Flight Software / Ground Software: Function/Algorithm; Interfaces; Assignment/Initialization; Timing; Flight Rule; Build/Package Install; Dependency; Packaging Scripts; Unknown
  – Ground Resources: Resource Conflict
  – Info. Development: Documentation; Procedures
  – Hardware: Hardware
  – None/Unknown: Nothing Fixed; Unknown
Adapted ODC classification to post-launch spacecraft Incident Surprise Anomalies (ISAs)
Results: Quantitative Analysis
• Analyzed 189 Incident/Surprise/Anomaly reports (ISAs) of highest criticality from 7 spacecraft
  – Cassini, Deep Space 1, Galileo, Mars Climate Orbiter, Mars Global Surveyor, Mars Polar Lander, Stardust
• Data flow: institutional defect database → Access database of data of interest → Excel spreadsheet with ODC categories → pivot tables with multiple views of the data
• Frequency counts of Activity, Trigger, Target, Type, Trigger within Activity, Type within Target, etc.
• User-selectable representation of results
• User-selectable sets of spacecraft for comparison
• Provides rapid quantification of data
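The "Trigger within Activity" frequency count can be sketched with a pivot table. A minimal sketch, assuming pandas; the rows below are toy stand-ins, not the actual 189-report dataset:

```python
import pandas as pd

# Toy stand-in for the ODC-classified spreadsheet (illustrative rows only).
isas = pd.DataFrame([
    {"activity": "Flight Operations", "trigger": "Data Access/Delivery"},
    {"activity": "Flight Operations", "trigger": "Hardware Failure"},
    {"activity": "Flight Operations", "trigger": "Data Access/Delivery"},
    {"activity": "System Test",       "trigger": "Software Configuration"},
])

# Count of Trigger grouped by Activity, as in the study's pivot tables.
counts = isas.pivot_table(index="trigger", columns="activity",
                          aggfunc="size", fill_value=0)
print(counts)
```

Swapping the `index`/`columns` arguments, or filtering `isas` by a spacecraft column, gives the other user-selectable views mentioned above.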
Results: Quantitative Analysis
[Figure: Distribution of Triggers within Activity. Bar chart (count of Trigger, 0–80, all projects) over Triggers: Data Access/Delivery, Hardware Failure, Normal Activity, Recovery, Special Procedure, Cmd Seq Test, Hardware Configuration, Inspection/Review, Software Configuration, Start/Restart/Shutdown, Unknown; grouped by Activity: Flight Operations, System Test, Unknown.]
Results: Quantitative Analysis
[Figures: (1) Trigger vs. Target. Counts (0–30) of Triggers (Data Access/Delivery, Hardware Failure, Normal Activity, Recovery, Special Procedure) against Targets (Flight Software, Ground Resources, Ground Software, Hardware, Information Development, None/Unknown). (2) Ground/Flight S/W vs. Type within Activity. Counts (0–16) for Flight Software and Ground Software by Type (Timing, Interfaces, Function/Algorithm, Assignment/Initialization) within Activity (System Test, Flight Operations).]
Results: Evolution of Requirements
• Anomalies sometimes result in changes to software requirements
• Finding:
  – Change to handle a rare event or scenario (software adds fault tolerance)
  – Change to compensate for a hardware failure or limitations (software adds robustness)
• Contradicts the assumption that “what breaks is what gets fixed”
Example: A damaged solar array panel cannot deploy as planned
  – Activity = Flight Operations (occurred during flight)
  – Trigger = Hardware Failure (solar array panel in incorrect position: a broken piece rotated & prevented latching)
  – Target = Flight Software (fixed via changes to flight software)
  – Type = Function/Algorithm (added a solar-array-powered hold capability to the s/w)
Results: Pattern Identification
• Sample Question: What is the typical signature of a post-launch critical software anomaly?
• Finding:
  – Activity = Flight Operations
  – Trigger = Data Access/Delivery
  – Target = Information Development
  – Type = Procedures
• Example: Star Scanner anomaly
  – Activity = occurred during flight
  – Trigger = star scanner telemetry froze
  – Target = fix was a new description of star calibration
  – Type = procedure written
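Once every ISA is reduced to its (Activity, Trigger, Target, Type) tuple, the "typical signature" is simply the most frequent tuple. A minimal sketch, assuming Python; the sample tuples are illustrative, not the study's data:

```python
from collections import Counter

# Each ISA reduced to its 4-tuple classification (illustrative sample).
classified = [
    ("Flight Operations", "Data Access/Delivery",
     "Information Development", "Procedures"),
    ("Flight Operations", "Data Access/Delivery",
     "Information Development", "Procedures"),
    ("System Test", "Software Configuration",
     "Ground Software", "Interfaces"),
]

# The typical signature is the most common full classification.
signature, count = Counter(classified).most_common(1)[0]
```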
Results: Unexpected Patterns
Examples of unexpected ISA patterns, each with a process recommendation and a spacecraft example:

• Finding: 22% of critical ISAs had ground software as Target (fix)
  – Process recommendation: Software QA for ground software
  – Example (from spacecraft): Unable to process multiple submissions; fixed code.
• Finding: 23% of critical ISAs had procedures as Type
  – Process recommendation: Assemble a checklist of needed procedures for future projects
  – Example: Not in inertial mode during star calibration; additions made to checklist to prevent recurrence.
• Finding: Of these, 41% had data access/delivery as Trigger
  – Process recommendation: Better communication of changes and updates to operations
  – Example: Multiple queries for spacecraft engineering and monitor data failed; notification to operators of problems was streamlined.
• Finding: 34% of critical ISAs involving system test had software configuration as Trigger (cause); 24% had hardware configuration as Trigger
  – Process recommendation: Additional end-to-end configuration testing
  – Example: OPS personnel did not have a green command system for the uplink of two trajectory-correction command files; the problems resulted from a firewall configuration change.
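Percentages of the kind reported above fall out of a simple frequency count over one classification dimension. A minimal sketch, assuming Python; the target values below are illustrative, not the real distribution:

```python
from collections import Counter

# Illustrative Target values for a handful of critical ISAs.
targets = ["Ground Software", "Flight Software", "Ground Software",
           "Information Development", "Hardware", "Flight Software",
           "Ground Software", "None/Unknown", "Flight Software"]

# Percentage of ISAs per Target, the form in which patterns such as
# "22% had ground software as Target" were reported.
pct = {t: round(100 * n / len(targets))
       for t, n in Counter(targets).items()}
```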
Work-In-Progress
• Assembling process recommendations tied to specific findings and unexpected patterns; in review by projects
• Working to incorporate ODC classifications into next-generation problem-failure reporting database (to support automation & visualization)
• Disseminating results:
  – Invited presentations to the JPL Software Quality Improvement task, to JPL Mission Assurance Managers, and to MER; informal briefings to other flight projects
  – Presented at the Assurance Technology Conference (B. Sigal); included in a talk at Metrics 2002 (A. Nikora); presented at the 2001 IFIP WG 2.9 Workshop on Requirements Engineering
  – Papers in the 5th IEEE Int’l Symposium on Requirements Engineering and The Journal of Systems and Software (to appear)
Work-In-Progress
• Collaborating with Mars Exploration Rover to experimentally extend the ODC approach to pre-launch software problem/failure testing reports
  – Adjusted ODC classifications to testing phases (build, integration, acceptance)
  – Delivered experimental ODC analysis of 155 Problem/Failure Reports to MER
  – Feedback from the Project has been noteworthy
  – Results can support tracking trends and progress:
    • Graphical summaries
    • Comparisons of testing phases
  – Results can provide a better understanding of typical problem signatures
Benefits
• User selects preferred representation (e.g., 3-D bar graphs) and set of projects to view
• Mines historical and current databases of anomaly and problem reports so that lessons feed forward into future projects
• Uses metrics information to identify and focus on problem areas
• Provides quantitative foundation for process improvement
• Equips us with a methodology to continue to learn as projects and processes evolve