March 14, 2016 Sam Siewert SE420 - Software Quality Assurance Lecture 9 – Negative Testing, Defect Tracking and Root-Cause Analysis http://dilbert.com/strip/2010-08-21

SE420 - Software Quality Assurancemercury.pr.erau.edu/~siewerts/se420/documents/... · Data Driven CPU Loading Root-Cause on Pathfinder was a Combination of Issues 1. Software Re-




Reminders

Assignment #4
Remaining Assignments [Top Down / Bottom-Up]
  – #5 – Design, Module Unit Tests and Regression Suite
  – #6 – Complete Code, Refine and Run all V&V Tests and Deliver
Track Bugs with Bugzilla - http://prclab.pr.erau.edu/
Import your Project Code into GitHub - https://github.com/


Negative Testing, Defect Tracking and Root-Cause Analysis


http://www.nasa.gov/pdf/65776main_noaa_np_mishap.pdf, http://en.wikipedia.org/wiki/NOAA-19


Integration and Test

Integrate Software Modules [units] and Hardware Components into Sub-systems
Test Focus on Interfaces [Function, Message, Shared Memory, Hardware], Protocols, and Interoperability of Modules


Test Types – Goals Today

Positive Tests
  – Functional Software Interface Tests
      Functions calling Functions – API
      Message Passing – Local Message Queues, Network, Client-Server
      Shared Memory – Synchronization, Buffers
  – Hardware Interface Tests
      Drivers and Device Interfaces
      Firmware [ROM Code, Run out of Reset]
Negative Tests
  – Software Interface Faults
  – Hardware Interface Fault Injection
Bug Tracking, Defect Rate, How to Use for Project and SQA Management
Root-Cause-Analysis [RCA] Wrap-Up – JPL Mars Pathfinder Story
Diagnostics [Built-in Self-Test]
Unit Interoperability
  – Sub-system Resource Testing – Memory, CPU, I/O, Storage, Power
  – Protocols – Message Acknowledgement, Command/Response, Background Commands, Peer-to-Peer, etc.
Performance Tests – Profiles and Traces
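As a minimal sketch of the positive and negative software interface tests listed above, the C fragment below exercises a hypothetical message-slot function. The names `slot_write` and `SLOT_SIZE`, and the choice of errno-style return codes, are assumptions made for illustration and are not part of the course code. The positive test checks that valid input arrives intact; the negative test injects interface faults (NULL pointers, an oversized message) and checks that each is rejected without touching the slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 16  /* hypothetical fixed message-slot size in bytes */

/* Hypothetical unit under test: copy a message into a fixed-size slot.
 * Returns 0 on success or a negative errno value when the request is rejected. */
int slot_write(char *slot, const char *msg, size_t len)
{
    if (slot == NULL || msg == NULL)
        return -EINVAL;       /* NULL pointer passed across the interface */
    if (len > SLOT_SIZE)
        return -EMSGSIZE;     /* message too large for the slot */
    memcpy(slot, msg, len);
    return 0;
}

/* Positive test: valid input succeeds and the data arrives intact.
 * Returns 1 on pass, 0 on fail. */
int test_positive(void)
{
    char slot[SLOT_SIZE] = {0};
    return slot_write(slot, "ping", 5) == 0 && strcmp(slot, "ping") == 0;
}

/* Negative test: each injected fault must return the documented error
 * and must leave the slot unmodified. Returns 1 on pass, 0 on fail. */
int test_negative(void)
{
    char slot[SLOT_SIZE] = {0};
    char big[SLOT_SIZE + 1] = {0};
    int ok = 1;

    ok &= slot_write(NULL, "ping", 5) == -EINVAL;
    ok &= slot_write(slot, NULL, 5) == -EINVAL;
    ok &= slot_write(slot, big, sizeof big) == -EMSGSIZE;
    ok &= slot[0] == '\0';    /* nothing was written by the rejected calls */
    return ok;
}
```

A test driver would call `test_positive()` and `test_negative()` and report any result other than 1 as a failure.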


Outline for Every Integration Test

1. Check out Specific Source Code Test Configuration
   – CMVC Tools, Git
   – Collection of Modules [Units] Tagged by Revision Control – OR Current
2. Build and Link Modules (*.o) and Libraries (*.a) into Sub-system to Test
3. Load / Install Sub-system Code onto Test Hardware Platform of Known Configuration
   – Record key hardware configuration parameters
   – E.g. for I/O HW config - lspci, lsusb
   – General config - hwinfo
   – Linux OS kernel build config - uname -a
   – cat /proc/meminfo
   – cat /proc/cpuinfo
4. Run Integrated Test(s) [with Gcov, Lcov, Gprof]
5. Review of Expected Syslogs, Output to Terminal, for Each Feature
6. Review Performance Profiles
7. Track Bugs, Anomalies, and Disposition as Defects
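Step 3 above asks for the platform configuration to be recorded with each test run. One way to do that from inside a C test harness, sketched here under the assumption of a Linux test platform, is to call POSIX `uname()` (the same information `uname -a` prints) and copy the head of files such as /proc/meminfo into the test log. The function names `record_kernel` and `record_file_head` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Write the kernel identification to the test log.
 * Returns 0 on success, -1 if uname() fails. */
int record_kernel(FILE *out)
{
    struct utsname u;

    assert(out != NULL);
    if (uname(&u) != 0)
        return -1;
    fprintf(out, "kernel: %s %s %s %s %s\n",
            u.sysname, u.nodename, u.release, u.version, u.machine);
    return 0;
}

/* Copy up to max_reads fgets() reads of a text file (for example
 * /proc/meminfo) to the test log. A line longer than the buffer is
 * counted more than once. Returns the number of reads copied, or -1
 * if the file cannot be opened. */
int record_file_head(FILE *out, const char *path, int max_reads)
{
    char line[1024];
    int n = 0;
    FILE *in;

    assert(out != NULL && path != NULL);
    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    while (n < max_reads && fgets(line, sizeof line, in) != NULL) {
        fputs(line, out);
        n++;
    }
    fclose(in);
    return n;
}
```

A harness might call `record_kernel(log)` followed by `record_file_head(log, "/proc/meminfo", 5)` before step 4, so that every bug report carries the configuration it was found on.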


Bug Open/Close Rates and Readiness

Controversy – Bug Counts, Closure and Prediction of Phase Transition Readiness
  – E.g. Unit to I&T to System Test to Acceptance Test to Shipment
  – Can Be Inaccurate due to Unsatisfactory Testing or Lack of Criteria
  – Guideline for Project Management [Compared to Guessing!]
  – Not all Reported Bugs Become Defects [Test Case Errors, Human Error]


http://www.testandverification.com/DVClub/24_Jan_2011/Greg_Smith.pdf

[Figure labels: Test Case Coverage [E.g. Code Path Coverage]; Bug Counts [Reported, Not Verified as Defect]]
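To make the open/close-rate idea concrete, the sketch below computes the open-bug backlog from weekly counts and applies one possible readiness rule. The rule itself (closures keep pace with new reports over a trailing window, and the backlog is under a threshold) is an assumption chosen for illustration; the slides do not prescribe specific criteria, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bugs still open after `weeks` weeks, given per-week opened and closed counts. */
int open_backlog(const int *opened, const int *closed, size_t weeks)
{
    int backlog = 0;

    for (size_t i = 0; i < weeks; i++)
        backlog += opened[i] - closed[i];
    return backlog;
}

/* Hypothetical phase-transition rule: in each of the last `window` weeks
 * closures met or exceeded new reports, and the backlog is at most max_open.
 * Returns 1 if the rule is satisfied, 0 otherwise. */
int ready_for_next_phase(const int *opened, const int *closed,
                         size_t weeks, size_t window, int max_open)
{
    if (window == 0 || window > weeks)
        return 0;
    for (size_t i = weeks - window; i < weeks; i++) {
        if (closed[i] < opened[i])
            return 0;         /* reports still outpacing fixes */
    }
    return open_backlog(opened, closed, weeks) <= max_open;
}
```

With opened counts of 12, 9, 5, 2 and closed counts of 4, 8, 7, 5, the backlog after four weeks is 4, and the rule passes for a two-week window with a threshold of 5. As the slide cautions, such a result is only as good as the testing that produced the counts.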


Root-Cause Analysis

Field Issue - Anomaly, Reported Bug, Data Corruption, …
  – Software Defect?
  – Hardware Reliability
  – User Error
Reproducibility
  – Capture Conditions via Logging
  – Recreate Scenario in SQA / QA Lab
Trace to Root-Cause
  – Assert
  – Analysis Triggers
  – Propose Fixes
  – Apply and Regression Test
  – Release Maintenance Patch
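The "Capture Conditions via Logging" and "Assert" items above can be combined: check an invariant, and when it fails, log the state needed to reproduce the problem before stopping. The sketch below assumes a hypothetical `struct queue_state` and uses the standard `syslog()` call; the invariant and the field names are illustrative only.

```c
#include <assert.h>
#include <syslog.h>

/* Hypothetical state of a circular message queue. */
struct queue_state {
    unsigned head;
    unsigned tail;
    unsigned depth;
    unsigned capacity;
};

/* Check the queue invariants. On violation, capture the full state in
 * syslog so the scenario can be recreated in the lab.
 * Returns 1 if the invariants hold, 0 otherwise. */
int queue_check(const struct queue_state *q, const char *where)
{
    if (q->capacity > 0 && q->depth <= q->capacity &&
        q->head < q->capacity && q->tail < q->capacity)
        return 1;

    syslog(LOG_ERR,
           "queue invariant violated at %s: head=%u tail=%u depth=%u capacity=%u",
           where, q->head, q->tail, q->depth, q->capacity);
    return 0;
}
```

In a debug build, a caller could write `assert(queue_check(&q, __func__));` so that the state is logged first and the assertion then halts at the point of failure.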


Case Study – Mars Pathfinder Story

JPL Mission Flow to Mars, Landing on July 4th, 1997
Pathfinder Rolling Resets on Final Approach to Mars Capture Orbit
VxWorks RTOS Used
Reproduction of Anomaly on the Ground
Root-Cause Analysis
Proposed Fix


Note on Data Driven Algorithms and CPU Loading

Real-Time Algorithms Ideally have Fixed Computational Demands per Request
  – Provide Predictable Response, Enables Accurate Rate-Monotonic Analysis
  – Rate Monotonic Theory Requires Known C, T, D Inputs [CPU Time Required, Request Period, Deadline Relative to Request Time]
Computer Vision and Image Processing Depends on Data from Instrument Observation
  – Parsing Scene for Linear Segments [Edges]
  – Finding Elliptical or Circular Objects [Craters, Holes, etc.]
  – Number of Features Found and Processed will Vary!
  – Optical Navigation – Making an Impact: AI Group at JPL

[Figures: Hough Linear Example; Hough Circular Example]
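The effect of data-driven CPU demand on a Rate Monotonic feasibility check can be sketched as follows. The slides do not name a specific test, so this example swaps in the hyperbolic bound (the product of (C/T + 1) over all services must not exceed 2), a known sufficient test for rate-monotonic scheduling when each deadline equals its period. The linear cost model in `vision_cost_ms` and all names and numbers are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One periodic service: C = CPU time per request, T = request period
 * (same time units). The deadline D is assumed equal to T. */
struct service {
    double C;
    double T;
};

/* Total CPU utilization: the sum of C/T over all services. */
double utilization(const struct service *s, size_t n)
{
    double u = 0.0;

    for (size_t i = 0; i < n; i++)
        u += s[i].C / s[i].T;
    return u;
}

/* Hyperbolic bound: sufficient (not necessary) test for rate-monotonic
 * feasibility. Returns 1 if the product of (C/T + 1) is at most 2. */
int rm_hyperbolic_ok(const struct service *s, size_t n)
{
    double product = 1.0;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].T <= 0.0)
            return 0;         /* a period must be positive */
        product *= s[i].C / s[i].T + 1.0;
    }
    return product <= 2.0;
}

/* Hypothetical data-driven cost: a fixed base plus a cost per feature found. */
double vision_cost_ms(double base_ms, double per_feature_ms, unsigned features)
{
    return base_ms + per_feature_ms * features;
}
```

With two fixed services (C=1, T=10 and C=2, T=20) plus a vision service with T=50, a scene with 10 features (C=7) passes the test, while a scene with 70 features (C=37) fails it even though total utilization is still below 1. The feasibility answer depends on the data, which is the point of the slide.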


Discussion …

List of Theories for Root Cause [Good List, From OS, General Engineering Judgement]
Suggestions for Teamwork [Good Approaches – Brainstorm, Gather all Cognizant Engineers into One Room – JPL, Wind River, RAD6000]
Scenario and Anomaly [Rolling Reset on Approach]
Reproduction on Ground System
Software Re-Use and Lack of Default to Inversion Safe MUTEX in POSIX Pipes, Triggered due to Meteorological Increased CPU Loading for Landing Sites, Root-Cause
Ground Verification and Uplink to Enable Inversion Safe Option for Hidden MUTEX
Mission Saved and Quite Successful!
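The "inversion safe" option discussed above corresponds to priority inheritance on a mutex. The flight software ran on VxWorks, whose code is not shown in the slides; as an analogous sketch only, the POSIX threads API exposes the same idea through `pthread_mutexattr_setprotocol()` with `PTHREAD_PRIO_INHERIT`. The helper names below are hypothetical, and some platforms may report that the protocol is unsupported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialize *m as a priority-inheritance ("inversion safe") mutex.
 * Returns 0 on success or a POSIX error number such as ENOTSUP. */
int init_inversion_safe_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    /* Without this call the default protocol is used (PTHREAD_PRIO_NONE in
     * POSIX), so a low-priority holder is not boosted while it blocks a
     * high-priority waiter. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Turn the result into a log message, so a missing inversion-safe
 * option is visible during integration test rather than in flight. */
const char *describe_init(int rc)
{
    if (rc == 0)
        return "inversion-safe mutex ready";
    if (rc == ENOTSUP)
        return "priority inheritance not supported on this platform";
    return "mutex initialization failed";
}
```

The design point matches the slide: the safe behavior has to be requested explicitly, so a library that creates its mutex with default attributes hides the risk from the application that re-uses it.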


Diagnostic Tests

Primarily Hardware Tests, Driven by Software
Could be OS test, E.g. During Boot of System
  – CPU
  – I/O
  – Network
  – Memory test
  – File system test
  – OS Services
Memory Test
  – Simple – Walking 1’s, Address Bus Test, Pattern Tests all Read-after-Write to Address
  – Advanced – ECC, SoC Drawer Paper

[Figure: E.g. Linux Boot-up Process for Centos 6.x]
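The simple memory tests named above can be sketched in a few lines of C. This is an illustration run against an ordinary buffer; a real diagnostic would point at the physical region under test, typically from firmware or a driver. The function names and the address-dependent pattern are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walking 1's test of the data path at one address: write each single-bit
 * pattern and read it back. Returns 0 on pass, else the first failing pattern. */
uint32_t walking_ones(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Pattern test over a region of n words: write a value that depends on the
 * word index, then read every word back. Writing the whole region before
 * reading lets address aliasing show up as a mismatch.
 * Returns n on pass, else the index of the first failing word. */
size_t pattern_test(volatile uint32_t *base, size_t n, uint32_t pattern)
{
    for (size_t i = 0; i < n; i++)
        base[i] = pattern ^ (uint32_t)i;
    for (size_t i = 0; i < n; i++) {
        if (base[i] != (pattern ^ (uint32_t)i))
            return i;
    }
    return n;
}
```

The `volatile` qualifier keeps the compiler from optimizing away the read-after-write, which is the whole test.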


BIST – Built-in Self Tests

SW Driven and Controlled Diagnostics [Firmware]
Key to Hardware Verification
Cooperative Hardware and Firmware Mode
Make Available for Root-Cause Analysis Post-Ship or During I&T and System Testing
E.g. Dell Laptops – LCD BIST
Disk Drive Test-Unit Ready – sg_turs, T10 TUR


Performance Tests

Profiling
  – Gprof – Open source tool [similar to Gcov, but for Profiling]
  – Vtune – Commercial Tool from Intel
  – Logic Analyzer and HP’s SPA (Statistical Performance Analysis)
Tracing
  – E.g. Timestamps output to syslog
Statistics
  – top, htop
  – iostat
  – memstat
Workloads
  – Iometer
  – stress
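The tracing item above, timestamps output to syslog, might look like the following sketch. It brackets a unit of work with `clock_gettime(CLOCK_MONOTONIC, ...)` and logs the elapsed time. The helper names and the `count_to` workload are hypothetical stand-ins for a real service call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <syslog.h>
#include <time.h>

/* Elapsed time between two timestamps, in milliseconds. */
double elapsed_ms(const struct timespec *start, const struct timespec *stop)
{
    return (stop->tv_sec - start->tv_sec) * 1000.0 +
           (stop->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Run work(arg), log its duration to syslog, and return it in milliseconds.
 * CLOCK_MONOTONIC is used so wall-clock adjustments do not distort the trace. */
double trace_call(const char *label, void (*work)(void *), void *arg)
{
    struct timespec t0, t1;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    ms = elapsed_ms(&t0, &t1);
    syslog(LOG_INFO, "trace %s: %.3f ms", label, ms);
    return ms;
}

/* Hypothetical workload: count down from the long value pointed to by arg. */
void count_to(void *arg)
{
    volatile long n = *(long *)arg;

    while (n > 0)
        n--;
}
```

Unlike a statistical profile, a trace like this records each individual request, which helps when the concern is a rare long response rather than the average.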


Performance - Sysprof

What is Using CPU on my System Rather than Profile of an Application
  – Sub-System [Service]


Gprof

Simple -pg compile option
Run the program, then run gprof on gmon.out to get the analysis

%make
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidtest.c
raidtest.c: In function 'main':
raidtest.c:99: warning: format '%d' expects type 'int', but argument 2 has type 'long unsigned int'
raidtest.c:68: warning: unused variable 'aveRate'
raidtest.c:68: warning: unused variable 'totalRate'
raidtest.c:66: warning: unused variable 'rc'
raidtest.c:212: warning: control reaches end of non-void function
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidlib.c
cc -O3 -Wall -pg -msse3 -malign-double -g -o raidtest raidtest.o raidlib.o
%./raidtest
Will default to 1000 iterations
Architecture validation:
sizeof(unsigned long long)=8
RAID Operations Performance Test
Test Done in 453 microsecs for 1000 iterations
2207505.518764 RAID ops computed per second
%ls
Makefile    gmon.out   raidlib.h  raidlib64.c  raidtest    raidtest.o
Makefile64  raidlib.c  raidlib.o  raidlib64.h  raidtest.c  raidtest64
%gprof raidtest gmon.out > raidtest_analysis.txt


Gprof Analysis

1 million iterations of RAID test XOR and Rebuild

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 82.13      1.54     1.54                             main
 15.47      1.83     0.29  2000001   145.38   145.38  xorLBA
  2.67      1.88     0.05  2000001    25.07    25.07  rebuildLBA

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone. …

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, …

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, …

name       the name of the function. …

RAID Operations Performance Test
Test Done in 206417 microsecs for 1000000 iterations
4844562.221135 RAID ops computed per second
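The source of `xorLBA` and `rebuildLBA` in raidlib.c is not shown in the slides. As a stand-in only, the sketch below shows the kind of work such functions typically do: compute an XOR parity block over a stripe of data blocks, then rebuild one lost block from the parity and the survivors. The names `xor_parity`, `rebuild_block`, and `BLOCK_BYTES`, and the flat stripe layout, are assumptions and should not be read as the course library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 512  /* hypothetical block size */

/* Compute the XOR parity of ndata consecutive blocks stored in stripe. */
void xor_parity(const unsigned char *stripe, size_t ndata, unsigned char *parity)
{
    memset(parity, 0, BLOCK_BYTES);
    for (size_t b = 0; b < ndata; b++) {
        for (size_t i = 0; i < BLOCK_BYTES; i++)
            parity[i] ^= stripe[b * BLOCK_BYTES + i];
    }
}

/* Rebuild block number `lost` into out by XORing the parity block with
 * every surviving data block. The lost block in stripe is never read. */
void rebuild_block(const unsigned char *stripe, size_t ndata, size_t lost,
                   const unsigned char *parity, unsigned char *out)
{
    memcpy(out, parity, BLOCK_BYTES);
    for (size_t b = 0; b < ndata; b++) {
        if (b == lost)
            continue;
        for (size_t i = 0; i < BLOCK_BYTES; i++)
            out[i] ^= stripe[b * BLOCK_BYTES + i];
    }
}
```

Calling a pair of functions like these in a loop from `main`, in a program compiled with `-pg`, should produce a flat profile with the same structure as the one above, with one row per function.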


Call Graph Profile from Gprof

Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 0.53% of 1.88 seconds

index % time    self  children    called           name
                                                       <spontaneous>
[1]    100.0    1.54    0.34                       main [1]
                0.29    0.00 2000001/2000001           xorLBA [2]
                0.05    0.00 2000001/2000001           rebuildLBA [3]
-----------------------------------------------
                0.29    0.00 2000001/2000001           main [1]
[2]     15.4    0.29    0.00 2000001               xorLBA [2]
-----------------------------------------------
                0.05    0.00 2000001/2000001           main [1]
[3]      2.7    0.05    0.00 2000001               rebuildLBA [3]
-----------------------------------------------

This table describes the call tree of the program, and was sorted by
the total amount of time spent in each function and its children…

% time    This is the percentage of the `total' time that was spent
          in this function and its children…

self      This is the total amount of time spent in this function.

children  This is the total amount of time propagated into this
          function by its children.

called    This is the number of times the function was called…


Discussion and Q&A

I&T is to Verify and Validate Sub-systems from Integrated SW Units and HW Components, in a Configuration
  – Unit Tests Precede
  – Integrate and Configure
  – Function/Feature Positive Tests
  – Negative Testing [Fault Injection]
  – Interoperability Testing
  – Diagnostics, Root-Cause, and Bug Tracking Critical New Aspects
  – Performance Testing [of Integrated and Configured Sub-systems]
  – Determine Readiness for Final Integration and Entry to System Testing
  – Provides Regression Test Cases for System Test
Precedes System Test, Where Sub-systems are …
  – Fully Integrated
  – Configured Similar to Deployment [Perhaps Not Exact – E.g. Spacecraft in Thermal-Vac Testing]
  – Stimulated with Tests Replicating Operations
