© 2010 Carnegie Mellon University
Approach and Challenges in Qualification of Mission-critical Software-reliant Systems
Software Engineering InstituteCarnegie Mellon UniversityPittsburgh, PA 15213
Peter H FeilerJune 15, 2010
2
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
OutlineSources of unreliability in software-reliant systemsAn architecture-centric approach to software-reliant system engineeringA framework for reliability validation & improvement
3
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
We Rely on Embedded Software Systems for Safe & Reliable System Operation
4
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Software Problems not just in Aircraft
How do you upgrade washing machine software?
5
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
State of Current Practice
Reliance on engineering process• Safety culture and build-then-test practice• Primarily text-based requirements documentation & traceability
Separation of system engineering & software engineering• Leads to requirements gaps for software
Reliability engineering treats hardware & software reliability the same• Reliability engineering has its roots in hardware
– focuses on physical fault history of slowly evolving systems – assumes independence of events
• Software is often assumed to be perfectible– software faults are due to design errors and are often systemic– rapid SW evolution and changing context result in reliability degradation
over timeFalse assumption that reliable and secure systems are safe systems
6
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Late Discovery of System-Level Problems
5x
SoftwareArchitectural
Design
SystemDesign
ComponentSoftwareDesign
CodeDevelopment
UnitTest
SystemTest
Integration Test
Acceptance Test
RequirementsEngineering
110x
Where faults are introducedWhere faults are found
The estimated nominal cost for fault removal
20.5%
1x
20%, 16%
10%, 50.5%
0%, 9% 40x
70%, 3.5% 16x
Sources:
NIST Planning report 02-3, The Economic Impacts of Inadequate Infrastructure for Software Testing, May 2002.
D. Galin, Software Quality Assurance: From Theory to Implementation, Pearson/Addison-Wesley (2004)
B.W. Boehm, Software Engineering Economics, Prentice Hall (1981)
60% of errors in fault management software
Requirements & system interaction errors
Software components largely meet requirements
80% late error discovery at high
repair cost
80% late error discovery at high
repair cost
80% late error discovery at high rework &
recertification cost
80% of accidents due to operator errorHigh recertification cost of design error corrections
leads to 75% of operator time spent in work-arounds
7
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
New Levels of System Interaction Complexity & Mismatched Assumptions
System Engineer
Control Engineer
Application Developer
System User
SystemUnder Control
ControlSystem
ComputePlatform
RuntimeArchitecture
ApplicationSoftware
Embedded SW System Engineer
Physical Plant CharacteristicsLag, proximity
Model recalibration
Data StreamsUnstable control & inconsistent state
due to jitter and loss
Data RepresentationAriane 4/5: 16-bit data Air Canada: gal vs. l
Concurrency Race conditions crash applications
designed for single-core on multi-cores
Distribution & RedundancyLoss of redundancy & other
hazards due to HW Virtualization
Operator ErrorDriver lockout
Software runtime system impacts safety-critical software & system properties
Human
8
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Modeling Pitfall: Low Confidence due to Multiple “Truths”
Fault model
System implementation
Inconsistency of independently maintained analysis models
Lack of confidence that implementation is consistent with model
Security modelTiming model Redundancy modelPower budget
System evolving over time
Aircraft Industry Experience
9
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
OutlineSources of unreliability in software-reliant systemsAn architecture-centric approach to software-reliant system engineeringA framework for reliability validation & improvement
10
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
SAE Architecture Analysis & Design Language (AADL) Captures Architecture for Analysis
The System
Computer SystemHardware & OS
Physical platformAircraft
ControlGuidance
Deployed on
Physical connectivity
AADL focuses on software runtime architecture and its interaction with physical systems & computer systemsIncludes execution & communication, timing, behavior, error, partition (incl. ARINC653), data modeling, deployment semantics
Embedded Application SoftwareFlight control & Mission The Software
Focus on software runtime architecture
The Computer System
11
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
AADL: The LanguagePrecise execution semantics for components• Thread, process, data, subprogram, system, processor, memory, bus, device,
virtual processor, virtual bus
Continuous control & event response processing• Data and event flow, synchronous call/return, shared access• End-to-End flow specifications
Operational modes & fault tolerant configurations• Modes & mode transition
Modeling of large-scale systems• Component variants, layered system modeling, packaging, abstract,
prototype, parameterized templates, arrays of components and connection patterns
Accommodation of diverse analysis needs• Extension mechanism, standardized extensions
12
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Single Truth through Consistency Across Analysis Models
SecurityIntrusionIntegrity
Confidentiality
Safety & Reliability
MTBFFMEAHazard
analysis
Real-timePerformance
Execution time/Deadline
Deadlock/starvationLatency
ResourceConsumption
BandwidthCPU time
Power consumption
Data precision/accuracyTemporal
correctnessConfidence
Data Quality
Embedded System Architecture Model
In SAE AADL
Achieved by Single Source Annotated Architecture Model with Well-defined Semantics
Auto-generation of analytical models &
implementation
Interaction Behavior
Operational modes
Protocols
13
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
AADL Standard Suite & Industry Initiatives
SAE AADLStandard2004/2009
Automotive
OSATE ToolsetSEI
STOOD ElliDiss
AADL Meta Model & XMIJune 2006
AADL Error Annex Standard
June 2006
Avionics
Aerospace
TOPCASEDOpen Source EmbeddedSystems Tool Framework
28 partners €20+M 2005-2009
ITEA SPICESModel-Driven Embedded
Systems Engineering15 partners €16M 2006-2009
AVSI SAVIAnalysis-based System Validation12+ partners $80+M 2008-2014
EC ASSERTProof-based Satellite
Architectures ESA + 30 partners€15M 2004-2007
OMG MARTE2005-2009
EAST ADLConsortium
AutoSAR
IST ARTIST2Embedded SystemsCenter of Excellence
2007-2011
OpenGroupReal-Time ForumEU + US partners
AADL UML MARTE Profile
2009
AADL BehaviorAnnex2009
AADL ARINC653Annex2009
AADL Data Modeling Annex
2009
Medial devices
Autonomous systems
AADL Ada/C Code Generation
Annex Standard2006/2009
www.aadl.infowiki.sei.cmu.edu/aadl
14
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
The JPL Mission Data System
A reference architecture autonomous space systems• To be instantiated for different applications
An embedded systems architecture• Consists of physical system, computing hardware, application software
A control systems architecture• Feedback loops in application architecture
• Feedback loops in data management architecture
A multi-layered architecture• From low-level control loops to goal-oriented planning and plan execution
15
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Mission Data System (MDS) Architecture*
* M. Bennett, R. Knight, R. Rasmussen, M. Ingham, “State-Based Models for Planning and Execution, 2006-08-11.
Closed loopGoal-Directed Explicit modelsSeparation of ConcernsIntegral Fault Protection
16
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
AADL Model of the MDS Reference Architecture
Textual & Graphical Representations
Excerpt from the Textual Specification:process implementation MDSControlSystem.basic
subcomponents
GoalPlanner: thread group ControlSoftware::GoalPlanner;
GoalExecutive: thread group ControlSoftware::GoalExecutive;
GoalMonitor: thread group ControlSoftware::XGoalMonitor;
StateEstimation: thread group ControlSoftware::estimator;
StateControl: thread group ControlSoftware::controller;
OperatorConsole: thread group ControlSoftware::OperatorConsole;
Focus on Information Flow
17
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Reference Architecture Instantiation
Instantiation of reference architecture
through refinement of AADL model
System Under Control
System Under Control
Heater 1
Switch 1Actuator
Switch 1command
Temperaturemeasurement
TemperatureSensor
Switch 2Actuator
Heater 2
Switch 2command
PS2
PS1
+
-
Camera
Deployment on different
computing hardware platforms
18
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Temperature Control Loop Latency & Jitter
flow path
Use of immediate & delayed connections to achieve deterministic sampling
19
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Latency Contributors
System & control engineering view:• Processing latency, sampling latency, physical signal latency
Embedded software systems engineering view:• Processor speed, scheduling protocol, sampled & queued message
processing, resource contention, communication protocol & physical communication delays, legacy shared variable communication, rate group optimization, partitioned architecture, deployment changes
20
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
MDS Mission Planning & Plan Execution
State Analysis framework for mission modeling (CalTech/JPL)Represent planning & plan execution tasksRepresent goal-based fault management
See alsoApplication of a Safety-Driven Design Methodology to An Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nicholas Dulac, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. IEEE Aerospace Conference, Big Sky, Montana, March 2008.
21
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Workload Analysis of Goal Network
Goal network interpretation & analysis in terms of modes• Starting set: tasks with no predecessor• Active set between synchronization points• Generate System Operation Mode (SOM) sets• Perform mode specific scheduling analysis
22
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
OutlineSources of unreliability in software-reliant systemsAn architecture-centric approach to software-reliant system engineeringA framework for reliability validation & improvement
23 Approved for public release; distribution unlimited. Review completed by the AMRDEC Public Affairs Office 20 Apr 2010; FN4596.
WIP: SEI Reliability Validation Framework
SEI researching approach to establish an industry standard practice of reliability validation and qualification for software-reliant mission-critical systems by:
Establishing an engineering framework for reliability validation & improvement
• Integrates state-of-the-art technologies including formalized functional and non-functional requirements
• System-focused safety analysis • Architecture-centric model-based
analysis• Assurance cases • Formalized static analysis to
complement current build-then-test practice
Demonstrating its feasibility of reliability validation via model-based architecture analysisProposing a set of metrics for cost-effective reliability evaluation & improvement
In collaboration with AVSI System Architecture Virtual Integration (SAVI) Initiative (Thursday talk)
24
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Reliability Validation & Improvement Framework
Model Repository
Mission Requirements
FunctionBehavior
Performance
Safety-criticality Requirements
ReliabilitySafety
Security
Incremental System Requirements Validation and Verification
Software AssuranceJustified confidence that mission & safety-
criticality requirements (claims) are metEvidence through reviews, analysis, testing,
and validated assumptions
From System Requirements to Software RequirementsFormalized requirementsFocus on safety-criticality
requirements
Architecture Model
Component Models
System Implementation
Static AnalysisFormal methods to complement testing
End-to-end V&V of mission and safety-criticality
requirements
Resource & Performance
Analysis
Reliability & Safety Analysis
Mode & Interaction Behavior Analysis
Architecture-centric Model-based EngineeringArchitecture model with well-defined semantics
(AADL)Consistency across analysis dimension
25
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Confidence through Argument and EvidenceAn assurance case requires claims, evidence, and an argument linking evidence to claims:• Claim
• Assertion that a design or implementation meets a requirement
• Evidence– Results of observing, analyzing, testing,
simulating, and estimating the properties of a system
• Argument– Explanation of how the decomposition of claims
and available evidence demonstrates compliance with mission & safety-criticality requirements
• Confidence through credible evidence and effective verification methods– Effectiveness of reviews, testing, analysis,
simulation– Reuse of claim & evidence patterns
Claim
Sub-claim 1 Sub-claim 2
Sub-claim 3 Sub-claim 4
Evidence Evidence Evidence
Arg
umen
t
Roots in Safety Cases (Kelly)UK Defense Std 00-56: Safety MgntRequirements for Defense Systems
Compositional certification (Rushby)Composability, compositionality,
emergence of propertiesPartitioning as key to compositional
reasoningAutomation demonstrated in System Verification
Manager project by B. Krogh et.al. (MoBIES)
26
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Environment
Requirements for Operational Systems
System
Behavior
State
Resources
Input Output
Constraints/Controls
Operational system process: a set of behaviors by execution of functions to transform input into output utilizing state, respecting constraints/controls, requiring resources, to meet a defined mission in a given environment. [Association Française d'Ingénierie Système (AFIS)]
Requirement specification consists of:
Mission requirements that specify required behavior under nominal conditions for a given time (functionality, behavior, and performance)
Safety-criticality requirements that specify the ability to operate under anomalous conditions (reliability, safety, and security). Behavior
Performance
Mission Requirements
Function
Safety
Security
Safety-criticality Requirements
Reliability
Requirements and ArchitectureTraceable decomposition of system requirements to system & software
components to assure coverage
Formalized specification of error propagation and fault management
via AADL Error Model Annex
27
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Archetype-based Fault & Hazard Identification Application interaction architecture patterns• Feedback control system• Data, event, message, command streams• State-based interaction protocols• Multi-tier service layers
Fault management architecture patterns• Redundancy• Monitoring & recovery• Partitions
Pre-analyzed architecture patterns enable analysis of potentially high-
risk safety-criticality areas
Application as well as fault management architecture patterns
have fault & hazard potential
Fault & hazard types in common architecture patterns as starting point for FHA, FTA, FMEA, root cause analysis, and IV&VExample: Partitions limit error propagation
to input/output errors [Rushby]
Example: Potential hazard areas in feedback loop [Leveson]
28
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
SoftwareArchitectural
Design
SystemDesign
ComponentSoftwareDesign
CodeDevelopment
UnitTest
SystemTest
Integration Test
Acceptance Test
Top-Level Verification Items
High-levelAADL Model
DetailedAADL Model
Specify Model-Code Interfaces
Predictive validation
Sensitivity analysis for uncertaintyRequirementsEngineering
Continuous V&V through Virtual Integration
→ generation of test cases← updating models with actual data
Verification Evidence
Confidence in implementation
Analysis-based validation & verification to support and complement testing.
29
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Metrics for Cost-effective Reliability Improvement
Metrics for identifying high-risk areas: unreliability contributors• High complexity architecture interaction patterns & root cause problem areas• Faults, hazards, assumption coverage & their impact potential• Independently maintained overlapping models• New faults & hazards due to technology insertion and paradigm shifts
Metrics for identifying effective V&V: reliability contributors• Mission & safety-criticality requirement decomposition & coverage• Effectiveness of reviews, analysis, testing, … for a given claim• Effectiveness of fault management & residual fault monitoring• Evidence of assumption validation & end-to-end V&V
Maximize risk reduction through cost-effective reliability validation• Focus on high risk areas first – taking into account criticality & impact severity• Take into account cost & effectiveness of V&V methods and fault management
30
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Summary
Software-reliant system engineering and qualification requires a paradigm shift from build-then-test to model-based engineering
Architecture-centric modeling with strong semantics as found in the SAE AADL standard is key to analysis and validation of system-level mission and safety-critical properties throughout development life cycle
Qualification requires assurance through justified confidence in evidence that mission and safety-criticality requirements are met
Aerospace industry is advancing a virtual system architecture integration approach centered around the AADL Meta model & semantics and a model repository & model bus concept
31
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
Peter H Feiler
32
S5 Challenges & Approach to Software-reliant System QualificationJune 15, 2010© 2010 Carnegie Mellon University
NO WARRANTYTHIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING
INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS" BASIS. CARNEGIE MELLONUNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED ORIMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OFFITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTSOBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOESNOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROMPATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
Use of any trademarks in this presentation is not intended in any way to infringe onthe rights of the trademark holder.
This Presentation may be reproduced in its entirety, without modification, and freelydistributed in written or electronic form without requesting formal permission. Permission isrequired for any other use. Requests for permission should be directed to the SoftwareEngineering Institute at [email protected].
This work was created in the performance of Federal Government ContractNumber FA8721-05-C-0003 with Carnegie Mellon University for the operation of the SoftwareEngineering Institute, a federally funded research and development center. The Governmentof the United States has a royalty-free government-purpose license to use, duplicate, ordisclose the work, in whole or in part and in any manner, and to have or permit others to doso, for government purposes pursuant to the copyright license under the clause at 252.227-7013.