© 2015 Carnegie Mellon University
Layered Assurance Workshop 2015, Dec 8, 2015
Distribution Statement A: Approved for Public Release; Distribution is Unlimited
Virtual Integration and Incremental Assurance of Critical Systems
Peter H. Feiler
Copyright 2015 Carnegie Mellon University
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at [email protected].
Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
DM-0003120
Challenges and Four Pillar Strategy for Critical Software Systems
Virtual System Integration
Software Hazards and Vulnerabilities
Incremental Lifecycle Assurance
Agenda
We Rely on Software for Safe Aircraft Operation
Embedded software systems introduce a new class of problems not addressed by traditional system safety analysis
Software Problems not just in Aircraft
How do you upgrade washing machine software?
How do you prevent your engine from cheating?
High Fault Leakage Drives Major Increase in System Cost
[Figure: development lifecycle phases (Requirements Engineering, System Design, Software Architectural Design, Component Software Design, Code Development, Unit Test, Integration Test, System Test, Acceptance Test), annotated with where faults are introduced, where faults are found, and the estimated nominal cost for fault removal. Percentages shown: 70%, 3.5%; 20%, 16%; 10%, 50.5%; 0%, 9%; 20.5%. Cost factors shown: 1x, 5x, 20x, 80x, 300-1000x.]
Sources:
NIST Planning report 02-3, The Economic Impacts of Inadequate Infrastructure for Software Testing, May 2002.
D. Galin, Software Quality Assurance: From Theory to Implementation, Pearson/Addison-Wesley (2004)
B.W. Boehm, Software Engineering Economics, Prentice Hall (1981)
70% requirements & system interaction errors
80% late error discovery at high repair cost
Aircraft industry has reached limits of affordability due to exponential growth in SW size and complexity.
Major cost savings through rework avoidance by early discovery and correction
A $10k architecture phase correction saves $3M
Software as % of total system cost: 1997: 45% → 2010: 66% → 2024: 88%
Post-unit test software rework cost 50% of total system cost and growing
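The $10k versus $3M claim is consistent with the lower end of the 300-1000x late-removal cost factor quoted on this slide. A minimal sketch of that arithmetic, assuming the $10k figure is the 1x baseline cost of an architecture-phase correction; the function name is hypothetical:

```python
def late_removal_cost(baseline_cost, cost_factor):
    """Cost of removing a fault late, given its early-phase (1x) removal cost."""
    return baseline_cost * cost_factor

early = 10_000                         # architecture-phase correction, assumed to be the 1x baseline
low = late_removal_cost(early, 300)    # lower end of the 300-1000x range
high = late_removal_cost(early, 1000)  # upper end of the range
print(f"late removal: ${low:,} to ${high:,}")
```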
Mismatched Assumptions in System Interactions
System Engineer, Control Engineer
System Under Control
Control System
Physical Plant Characteristics: lag, proximity
Operator Error: automation & human actions
System User/Environment
Hazards: impact of system failures
Application Developer
Compute Platform
Runtime Architecture
Application Software
Embedded SW System Engineer
Data Stream Characteristics
Latency jitter affects control behavior
Potential event loss
Measurement units, value range; Boolean/Integer abstraction
Air Canada, Ariane, 7500 Boolean variable architecture
Concurrency, Communication
iTunes crashes on dual-cores
Distribution & Redundancy: virtualization, load balancing, mode confusion
Hardware Engineer
Embedded software system as major source of hazards
Why do system level failures still occur despite fault tolerance techniques being deployed in systems?
Model-based Engineering Pitfalls
The system
System models
System implementation
Inconsistency between independently developed analytical models
Confidence that model reflects implementation
This aircraft industry experience has led to the System Architecture Virtual Integration (SAVI) initiative
Awareness of Requirement Quality
Textual requirement quality statistics
• Current requirement engineering practice relies on stakeholder traceability and document reviews, resulting in a high rate of requirement change
NIST Study
System to SW requirements gap [Boehm 2006]
How do we trace low level SW requirements against system requirements?
When StartUpComplete is TRUE in both FADECs and SlowStartupComplete is FALSE, the FADECStartupSW shall set SlowStartupInComplete to TRUE
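Read literally, the requirement above is a guarded assignment over Boolean signals. A minimal sketch, assuming a two-FADEC system; the signal names come from the requirement text, while the function name and parameter layout are hypothetical:

```python
def fadec_startup_sw(startup_complete_a, startup_complete_b,
                     slow_startup_complete, slow_startup_incomplete):
    """Return the new value of SlowStartupInComplete per the textual requirement."""
    if startup_complete_a and startup_complete_b and not slow_startup_complete:
        return True
    # The requirement is silent about every other case, so the value is left unchanged here.
    return slow_startup_incomplete

print(fadec_startup_sw(True, True, False, False))
```

Writing the rule out makes one gap visible: the text does not say when, or whether, the flag is ever reset, which is the kind of question that is hard to trace back to a system-level requirement.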
Assurance & Qualification Improvement Strategy
2010 SEI Study for AMRDEC Aviation Engineering Directorate
Assurance: Sufficient evidence that a system implementation meets system requirements
Architecture-centric Virtual System Integration
Model Repository
Architecture Model
Component Models
System Implementation
Resource, Timing & Performance Analysis
Reliability, Safety, Security Analysis
Operational & failure modes
Static Analysis & Compositional Verification
System configuration
Early Problem Discovery through Virtual System Integration & Analysis
Incremental Assurance Plans & Cases throughout Life Cycle
Mission Requirements: Function, Behavior, Performance
Survivability Requirements: Reliability, Safety, Security
Architecture-led Requirement Specification
Improved Assurance through Better Requirements & Automated Verification
Assure the System
Improved Cost, Time and Quality
70% Defect Introduction
80% Post Unit Test Discovery
Reduced Cost and Time through Early Discovery
Improved Quality through Better Requirements & Evidence
SAE Architecture Analysis & Design Language (AADL) to the Rescue
Distributed Computer Platform
Physical system
Command & Control
Deployed on
Physical interface
AADL Addresses Increasing Interaction Complexity and Mismatched Assumptions
Task & Communication Architecture
SW Design Architecture
SAE AADL Standard Suite (AS-5506 series)
Core AADL language standard (V2.1-Sep 2012, V1-Nov 2004)
• Strongly typed language with well-defined semantics
• Textual and graphical notation
• Standardized XMI interchange format
Standardized AADL Extensions
Error Model language for safety, reliability, security analysis
ARINC653 extension for partitioned architectures
Behavior Specification Language for modes and interaction behavior
Data Modeling extension for interfacing with data models (UML, ASN.1, …)
Guidance for runtime executive generation
AADL Extensions in Progress
Requirements Definition and Assurance Language
Synchronous System Specification Language
Hybrid System Specification Language
System Constraint Specification Language
AADL: The Language
Precise execution semantics for components
• Thread, process, data, subprogram, system, processor, memory, bus, device, virtual processor, virtual bus
Continuous control & event response processing
• Data and event flow, call/return, shared access
• End-to-End flow specifications
Operational modes & fault tolerant configurations
• Modes & mode transition
Modeling of large-scale systems
• Component variants, layered system modeling, packaging, abstract, prototype, parameterized templates, arrays of components, connection patterns
Accommodation of diverse analysis needs
• Extension mechanism, standardized extensions
System Level Problem Areas
Time-sensitive Data Stream Assumptions
• Stream miss rates, mismatched data representation, latency jitter & age
Partitions as Isolation Regions
• Competing demands by security and safety
• Space and time partitioning for processors and networks
• Isolation not guaranteed due to undocumented resource sharing
Virtualization of Resources
• Logical vs. physical redundancy
• Virtualization of time
Inconsistent System States & Interactions
• Modal systems with modal components
• Concurrency & redundancy management
• Application level interaction protocols
Resource guarantees
• Performance impedance mismatches
• Unmanaged system resources
Operational and failure modes
Interaction behavior specification
Dynamic reconfiguration
Fault detection, isolation, recovery
End-to-end latency analysis
Port connection consistency
Process and virtual processor to model partitioned architectures
Resource allocation & deployment configurations
Resource budget analysis & scheduling analysis
Virtual processors & buses
Multiple time domains
Codified in Virtual Upgrade Validation method
Well-defined Execution Semantics
SAE AADL
Focus on Architecture Abstraction
• Thread execution
• Communication timing
• Operational modes & architecture reconfiguration
AADL Execution & Communication Model
Synchronous Languages
Ravenscar Computational Model
Clocks & Timers
OMG MARTE
Focus on implementation
• Timers to trigger task execution
• Send/receive operations
• Behavioral states and transitions
Partitioned Run-Time Architecture
Strong Partitioning
• Timing Protection
• OS Call Restrictions
• Memory Protection
Interoperability/Portability
• Tailored Runtime Executive
• Standard RTOS API
• Application Components
Real-Time Operating System
Embedded Hardware Target
AADL Runtime System
Application Software Components
AADL Model
Auto-generated task & communication code from AADL model
Application SW inserted into runtime system
Runtime system performs communication for app
Runtime system dispatches app code
AADL ARINC653 Annex aligned with latest ARINC653 Standard
Virtual processor and virtual bus concepts
Analysis of Virtually Integrated Software Systems
Security
• Intrusion
• Integrity
• Confidentiality
Safety & Reliability
• MTBF
• FMEA
• Hazard analysis
Real-time Performance
• Execution time/Deadline
• Deadlock/starvation
• Latency
Resource Consumption
• Bandwidth
• CPU time
• Power consumption
Data Quality
• Data precision/accuracy
• Temporal correctness
• Confidence
Architecture Model
Single Annotated Architecture Model Addresses Impact Across Operational Quality Attributes
Auto-generated analytical models
Change of Encryption from 128 bit to 256 bit → Higher CPU demand → Increased latency → Affects temporal correctness → Potential new hazard
Incremental Multi-Tier Assurance in SAVI
Multi-tier system & software architecture (in AADL)
Incremental end-to-end verification of system properties
Aircraft: (Tier 0)
Aircraft system: (Tier 1)
Engine, Landing Gear, Cockpit, …
Weight, Electrical, Fuel, Hydraulics, …
LRU/IMA System: (Tier 2)
Hardware platform, software partitions
Power, MIPS, RAM capacity & budgets
End-to-end flow latency
Subcontracted software subsystem: (Tier 3)
Tasks, periods, execution time
Software allocation, schedulability
Generated executables
System & SW Engineering:
Mechatronics: Actuator & Wings
Safety Analysis (FHA, FMEA)
Reliability Analysis (MTTF)
OEM & Subcontractor:
Subsystem proposal validation
Functional integration consistency
Data bus protocol mappings
Proof of Concept Demonstration and Transition by Aerospace industry initiative
• Architecture-centric model-based software and system engineering
• Architecture-centric model-based acquisition and development process
• Multi notation, multi team model repository & standardized model interchange
Repeated Virtual Integration Analyses:
Power/weight
MIPS/RAM, Scheduling
End-to-end latency
Network bandwidth
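A repeated virtual integration analysis such as the MIPS/RAM check amounts to rolling up subcomponent budgets against the capacity of the tier above. A minimal sketch, assuming hypothetical partition names and numbers (none are taken from the SAVI demonstration):

```python
from dataclasses import dataclass

@dataclass
class Budget:
    name: str
    mips: float
    ram_kb: float

def check_budgets(capacity, budgets):
    """Return, per resource, by how much the summed budgets exceed the capacity."""
    total_mips = sum(b.mips for b in budgets)
    total_ram = sum(b.ram_kb for b in budgets)
    over = {}
    if total_mips > capacity.mips:
        over["mips"] = total_mips - capacity.mips
    if total_ram > capacity.ram_kb:
        over["ram_kb"] = total_ram - capacity.ram_kb
    return over

# Hypothetical Tier 2 platform and its software partitions.
platform = Budget("LRU", mips=1000.0, ram_kb=4096.0)
partitions = [
    Budget("nav", 400.0, 1024.0),
    Budget("display", 500.0, 2048.0),
    Budget("health", 200.0, 512.0),
]
print(check_budgets(platform, partitions))
```

Because the check only needs budgets, it can run on a Tier 2 model before any Tier 3 subsystem exists, and be repeated as subcontractors replace budgets with measured actuals.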
Latency Sensitivity in Control Systems
System Engineer, Control Engineer
System Under Control
Control System
Operational Environment
Common latency data from system engineering
• Processing latency
• Sampling latency
• Physical signal latency
Impact of Scheduler Choice on Controller Stability
A. Cervin, Lund U., CCACSD 2006
Software-Based Latency Contributors
Execution time variation: algorithm, use of cache
Processor speed
Resource contention
Preemption
Legacy & shared variable communication
Rate group optimization
Protocol specific communication delay
Partitioned architecture
Migration of functionality
Fault tolerance strategy
Incremental Latency Analysis
Latency analysis throughout life cycle
• Functional & system architecture: latency budgets
• Task & communication architecture: processing, sampling, transfer
• Platform architecture: partitions, protocols, computer hardware
Latency contributors
• Systems: processing, sampling, queuing latency
• Connections: protocol overhead, physical transfer, sampling
• Partitions: sampling, window schedule
Trade studies
• Best-case & worst-case, latency jitter
• Mid-frame and frame-delayed communication
• Synchronous and asynchronous systems
• Partition end and major frame output policy
• Empty & full queue
• Data set processing
Top-down & bottom-up
• Latency budgets & rate, size, time based actuals
Utilizes end-to-end flows
Incremental refinement
Interprets deployment bindings
Operational mode specific analysis
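The contributor categories on this slide can be combined into a best-case and worst-case bound for an end-to-end flow. A minimal sketch, assuming that a sampling step adds between zero and one full period and that every other contributor adds its own minimum and maximum; the flow and its numbers are hypothetical, and this is not the implementation of any particular latency analysis tool:

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    name: str
    kind: str      # "processing", "transfer", or "sampling"
    min_ms: float  # best-case contribution (unused for sampling)
    max_ms: float  # worst-case contribution (the period, for sampling)

def end_to_end_latency(flow):
    """Sum best- and worst-case contributions along an end-to-end flow."""
    best = worst = 0.0
    for c in flow:
        if c.kind == "sampling":
            # Data may arrive just before the sample point (no delay)
            # or just after it (one full period of delay).
            worst += c.max_ms
        else:
            best += c.min_ms
            worst += c.max_ms
    return best, worst

def check_budget(flow, budget_ms):
    """Compare the worst-case actual against a top-down latency budget."""
    best, worst = end_to_end_latency(flow)
    return {"best_ms": best, "worst_ms": worst,
            "jitter_ms": worst - best, "within_budget": worst <= budget_ms}

flow = [
    Contributor("sensor", "processing", 1.0, 2.0),
    Contributor("bus", "transfer", 0.5, 1.0),
    Contributor("controller dispatch", "sampling", 0.0, 20.0),
    Contributor("controller", "processing", 3.0, 5.0),
    Contributor("actuator", "processing", 1.0, 2.0),
]
print(check_budget(flow, budget_ms=25.0))
```

In this made-up flow the single sampling step dominates both the worst case and the jitter, which is the kind of result a trade study between mid-frame and frame-delayed communication would examine.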
Detailed Latency Analysis Reports
Virtual Integration and Assurance: Impact on FVL, July 14, 2015
Finding Problems Early
Issue: Contractor could not assess integration risk early enough.
Action: 6 Week Virtual Integration identified 20 major issues.
Result: Adjusted CDR Schedule to remediate.
• Prevented 12 month delay in a 2 year project.
The current method would not have identified the issues until 3 months before delivery
Image: http://www.army.mil/
System Architecture Virtual Integration (SAVI) 2008-Proof of concept with AADL led to ten year commitment
SAVI ROI Study (2009/10) $2B savings on $10B aircraft through 33% early detection
International Commercial Aircraft Industry Consortium
2014/15 Virtual Integration Shadow led to early discovery of 85+ potential integration issues
Led to acceleration of adoption by JMR contractors and inclusion in RFP for FY16/17 projects
Architecture-centric Virtual Integration Practice (ACVIP)
SAE AADL Standard & AADL Workbench: Research Transition Platform
2004 2016
Army and other Government Shadow Projects
Common Avionics Architecture System
Apache Block III ATAM
CH47F Health Monitor
JPL Mission Data System
DARPAMetaHACME
AADL Error Model
US & European Research Initiatives
European Commission SLIM/FIACRE
DARPA META
DARPA HACMS Security
Other Standards and Regulatory Guidance
OMG MARTE
Embedded Systems
ARINC653 Partitions
Regulatory Guidance: NRC, FDA, UL
Avionics Network Standards
System Safety Practice Standards
System Architecture Virtual Integration (SAVI) Software & Systems Engineering
AADL
Software & System Co-engineering
Requirements
Assurance
Multi-team
Safety
JMR TD: ACVIP Shadow Projects
Future Vertical Lift
Virtual System Integration System Assurance
Architecture-centric Acquisition
Towards an Architecture-Centric Virtual Integration Practice (ACVIP)
Safety Practice in Development Process Context
Labor-intensive
Early in system engineering
Rarely repeated due to cost
Focus on System Engineering Largely Ignores Software as Hazard Source
Leveson (MIT) Socio-technical Control Framework based on Rasmussen (NASA) model of risk management
Multiple hazard contributors in development and operational context
Safety analysis automation through virtual system integration allows for use at all layers and throughout life cycle.
AADL Error Model Scope and Purpose
System safety process uses many individual methods and analyses, e.g.
• hazard analysis
• failure modes and effects analysis
• fault trees
• Markov processes
Related analyses are also useful for other purposes, e.g.
• maintainability
• availability
• integrity
Goal: a general facility for modeling fault/error/failure behaviors that can be used for several modeling abstractions and analyses.
SAE ARP 4761 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment
Annotated architecture model permits checking for consistency and completeness between these abstractions.
System
Component
Subsystem
Capture FMEA model
Capture hazards
Capture risk mitigation architecture
Error Model V2: Abstraction and Refinement
Four levels of abstraction:
Error Propagation Paths
• Focus on fault interaction with other components
• Probabilistic error sources, sinks, paths and transformations
• Fault Propagation and Transformation Calculus (FPTC) from York U.
• Focus on fault behavior of components
• Probabilistic typed error events, error states, propagations
• Voting logic, error detection, recovery, repair
• Focus on fault behavior in terms of subcomponent fault behaviors
• Composite error behavior state logic maps states of parts into (abstracted) states of composite
• Types of malfunctions and propagations
• Common fault ontology
• User definable types
Fault Propagation Taxonomy and Guide Words
Key Taxonomy Terms
Service: Omission/Commission
Timing: Early/late arrival
Value: Out of range/incorrect value
Stream: High/low/variable rate
Replication: Asymmetric value error
Concurrency: Race condition
Authentication/authorization
Guide Words
Missing sensor reading/command
Loss of power
Early/late command, feedback delay
Inaccurate measures
Command value out of range
Commanded volume too high
Consistency in Error Propagation
[Figure: Component A and Component B, each bound to Processor, Memory, and Bus (NoResource), connected from A's port P2 to B's port P1. Error types shown on the ports: NoData, BadData, LateData.]
Mismatched fault propagation and containment assumptions
Discovery of unhandled error propagations.
[Table: contract vs. assumption matrix pairing Component A's outgoing propagation on P2 (Propagated, Not propagated, Unspecified) with Component B's incoming propagation on P1 (Propagated, Not propagated, Unspecified).]
Present and absent outgoing and incoming error propagations
Error sources, paths, and sinks (FPTC)
Connections as error sources
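The consistency check on this slide compares what a sender declares it can propagate with what the receiver declares it expects. A minimal sketch, assuming each port declares a set of propagated and a set of explicitly not-propagated error types; the data structures, the function name, and the assignment of types to ports are hypothetical, and only the error type names (NoData, BadData, LateData) come from the slide:

```python
def check_connection(outgoing, incoming):
    """Compare a sender's outgoing error propagations with a receiver's incoming ones.

    Each argument is a dict with 'propagated' and 'not_propagated' sets of
    error type names. A type that appears in neither set is unspecified.
    """
    findings = []
    for err in sorted(outgoing["propagated"]):
        if err in incoming["not_propagated"]:
            findings.append((err, "mismatch: receiver assumes it never arrives"))
        elif err not in incoming["propagated"]:
            findings.append((err, "unhandled: receiver does not specify it"))
    for err in sorted(incoming["propagated"]):
        if err not in outgoing["propagated"] and err not in outgoing["not_propagated"]:
            findings.append((err, "unspecified: sender is silent about it"))
    return findings

# Hypothetical declarations for Component A port P2 and Component B port P1.
a_p2 = {"propagated": {"NoData", "BadData", "LateData"}, "not_propagated": set()}
b_p1 = {"propagated": {"NoData"}, "not_propagated": {"BadData"}}

for err, finding in check_connection(a_p2, b_p1):
    print(err, "->", finding)
```

Distinguishing "not propagated" from "unspecified" is the point of the sketch: an explicit containment claim can be contradicted, while silence can only be flagged as incomplete.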
Software Induced Flight Safety Issue
[Figure: error model of Auto Pilot, Flight Mgmt System (FMS), FMS Processor, FMS Power, and EGI. States shown: Operational, Failed. Flows: Airspeed Data, Actuator Cmd. Propagations: NoData, NoService, Stall.]
Anticipated: No EGI data
Anticipated: NoService
Anticipated: No Stall Propagation
Original Preliminary System Safety Analysis (PSSA): system engineering activity with focus on failing components.
Unhandled Hazard Discovery through Virtual Integration
[Figure: error model of Auto Pilot, Flight Mgmt System (FMS), FMS Processor, FMS Power, and EGI, with the EGI refined into EGI HW and EGI Logic. States shown: Operational, Failed, Corrupted. Flows: Airspeed Data, Actuator Cmd. Propagations: NoData, NoService, CorruptedData, Stall.]
Vibration causes data corruption through touching boards
Corrupted data shows airspeed of 2000 knots
Response to corrupted airspeed causes stall
Virtual integration of architecture fault models recording SIL test observations detects unhandled fault.
Automation of SAE ARP4761 System Safety Assessment Practice
& EMV2
FHA: Spreadsheet. Uses error sources
FTA: CAFTA, OpenFTA. Uses composite error behavior
Markov Chain: PRISM. Uses error flows & behavior
FMEA: Spreadsheet. Uses error flows & propagations
RBD/DD: OSATE plugin. Uses composite error behavior
Automated FMEA Experience
Failure Modes and Effects Analyses are rigorous and comprehensive reliability and safety design evaluations
• Required by industry standards and Government policies
• When performed manually are usually done once due to cost and schedule
• If automated allows for
  • multiple iterations from conceptual to detailed design
  • Tradeoff studies and evaluation of alternatives
  • Early identification of potential problems
Largest analysis of satellite to date consists of 26,000 failure modes
• Includes detailed model of satellite bus
• 20 states perform failure mode
• Longest failure mode sequences have 25 transitions (i.e., 25 effects)
Myron Hecht, Aerospace Corp.
Safety Analysis for JPL, member of DO-178C committee
Secure Mathematically-Assured Composition of Control Models
Technical Approach
• Develop a complete, formal architecture model for UAVs that provides robustness against cyber attack
• Develop compositional verification tools driven from the architecture model for combining formal evidence from multiple sources, components, and subsystems
• Develop synthesis tools to generate flight software for UAVs directly from the architecture model, verified components, and verified operating system
Accomplishments
• Created AADL model of vehicle hardware & software architecture
• Identified system-level requirements to be verified based on input from Red Team evaluations
• Developed Resolute analysis tool for capturing and evaluating assurance case arguments linked to AADL model
• Developed example assurance cases for two security requirements
• Developed synthesis tool for auto-generation of configuration data and glue code for OS and platform hardware
ARCHITECTURE-CENTRIC PROOF
TA4 – Research Integration and Formal Methods Workbench
Rockwell Collins and University of Minnesota
Key Problem
Many vulnerabilities occur at component interfaces. How can we use formal methods to detect these vulnerabilities and build provably secure systems?
Contract-based Compositional Verification
16 months into the project
Draper Labs could not hack into the system in 6 weeks
Had access to source code
Increasing Proactive Architecture-focused Security Research
Research Technologies Originally Focused on Safety
Incremental Lifecycle Assurance Objectives
Measurably improve critical system assurance through
• Better requirement coverage and managed uncertainty
• Incremental analytical verification throughout lifecycle
• Focus on high payoff areas
Requirements & Architecture Design Constraints
We have effectively specified a partial system architecture
Textual Requirements for a Patient Therapy System
Importance of understanding system boundary
U Minnesota Study
Same Requirements Mapped to an Architecture Model
Awareness of Requirement Change Uncertainty
Managed awareness of requirement uncertainty reduces requirement changes by 50%
• 80% of requirement changes from development team
• Expert assessment of change uncertainty
• Focus on high uncertainty and high importance areas
• Engineer for inherent variability
Rolls Royce Study
[Figure: tiers of RS and VA elements linked by Design & Req Refinement and Compositional Verification, with a Requirement Coverage dimension.]
Contract-based Compositional Verification
Incremental assurance through virtual system integration for early discovery
Early Discovery leads to Rework Reduction
Priority focused architecture design exploration for high payoff
Timing (H), Performance (M), Safety (H), Security (L), Reliability (L), Modifiability (L), Portability (M), Configurability (M)
Three Dimensions of Incremental Assurance
Auto-generated Assurance Cases
Three Dimensions of Requirement Coverage
Guarantees
Assumptions
Implementation constraints
Invariants
Exceptional conditions
System interactions, state, behavior
Design & operational quality attributes
System Under Control
Behavior
Actuator, Sensor
State
Control System
Behavior
Output, Input
State
Value errors
Timing errors
Rate errors
Concurrency errors
Replication errors
Sequence errors
Omission errors
Commission errors
Authentication errors
Authorization errors
Fault Propagation Ontology
Fault impact & contributors
Leveson STAMP pattern
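Automated Incremental Assurance Workbench
[Figure: starting from Stakeholder Goals, requirements, models, and verification plans are shown per tier (Tier 0: Req, Model, Ver Plan; Tier 1: Req+1, Model+1, Ver Plan; Tier 2: Req+2, Model+2, Model+2', Ver Plan) down to Code, along an Abstraction Level axis from High Abstraction to Low Level, Close to Implementation, together with an Assurance Case.]
Identify Assurance Hotspots throughout Lifecycle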
Integration of Concepts/Meta models
[Figure labels: Goals, Requirements; GORE/KAOS; RDAL; Claims & Arguments; Assurance Results; SACM; Resolute, Agree; Verification Activities; Verification Methods; Architecture specification; Architecture instance; AADL; Hazards; Obstacles; EMV2; SVM]
Assurance Automation and Metrics
Assurance Automation
• Compositional verification plans, priority-focused assurance case configurations
• Multi-valued verification results (success, fail, timeout, error)
Measurement based assurance hotspot identification throughout lifecycle
• Requirement coverage measures
• Weighted claims, verification methods & results
• Priority measures: Change uncertainty, safety/security risk and design assurance levels
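The idea of weighted claims with multi-valued verification results can be illustrated with a small sketch. It is not the workbench's actual metric: the `Claim` record, the weights, and the `weighted_success` and `hotspots` functions are hypothetical names chosen for this example, and the weighting scheme is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    """The four verification outcomes named on the slide."""
    SUCCESS = "success"
    FAIL = "fail"
    TIMEOUT = "timeout"
    ERROR = "error"


@dataclass
class Claim:
    name: str
    weight: float   # hypothetical priority weight (e.g. from risk or uncertainty)
    results: list   # list of Result, one per verification activity


def success_count(claim):
    """Number of verification activities of this claim that succeeded."""
    return sum(1 for r in claim.results if r is Result.SUCCESS)


def weighted_success(claims):
    """Weighted fraction of all verification activities that succeeded."""
    total = sum(c.weight * len(c.results) for c in claims)
    if total == 0:
        return 0.0
    return sum(c.weight * success_count(c) for c in claims) / total


def hotspots(claims):
    """Claims with at least one non-success result, highest weight first."""
    flagged = [c for c in claims if success_count(c) < len(c.results)]
    return sorted(flagged, key=lambda c: c.weight, reverse=True)
```

Treating fail, timeout, and error alike is a simplification made here; a real measure might report them separately, since a timeout says less about the design than a failed check.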
Time-sensitive Auto-brake Mode Confusion
SAVI case study
• Wheel braking system from SAE AIR6110 & real accidents
• Can software related hazards be identified analytically?
Auto-brake mode selection by push button
• Three buttons for three modes
Event sampling in asynchronous system setting
• Dual channel COM/MON architecture
• Each COM, MON unit samples separately
• Button push close to sampling rate results in asymmetric value error
• Repeated button push does not correct problem
• Operational workaround (1 second push) is not foolproof
Avoidable complexity design issue
• Concept mismatch: desired state as flag that is sampled
• Better solution: State communication by multi-position switch
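The asymmetric value error can be reproduced with a toy model of two units sampling the same button on independent clocks. This is a sketch under assumptions, not the SAVI model: the sampling period, phase offsets, and the function names `sees_push` and `channel_views` are all hypothetical, and times are integer milliseconds.

```python
def sees_push(push_start, push_len, period, phase):
    """True if a unit sampling at phase + k*period (k integer) takes at
    least one sample inside [push_start, push_start + push_len]."""
    # Integer ceiling division: index of the first sample at or after push_start.
    k = -(-(push_start - phase) // period)
    return phase + k * period <= push_start + push_len


def channel_views(push_start, push_len, period, com_phase, mon_phase):
    """What COM and MON each observe for one button push."""
    com = sees_push(push_start, push_len, period, com_phase)
    mon = sees_push(push_start, push_len, period, mon_phase)
    return com, mon


# Hypothetical numbers: both units sample every 50 ms, MON lags COM by 20 ms.
short_push = channel_views(push_start=5, push_len=30, period=50,
                           com_phase=0, mon_phase=20)
long_push = channel_views(push_start=5, push_len=50, period=50,
                          com_phase=0, mon_phase=20)
print(short_push, long_push)
```

In this model a push shorter than the sampling period can be seen by one unit and missed by the other, while a push at least one period long is seen by both. A multi-position switch avoids the issue because each unit reads the state itself rather than a transient event.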
Diagnostics and Redesign Assurance
Stepper motor (SM) controls a valve
• Commanded to achieve a specified valve position
• Fixed position range mapped into units of SM steps
• New target positions can arrive at any time
• SM immediately responds to the new desired position
Safety hazard due to software design
• Execution time variation results in missed steps
• Leads to misaligned stepper motor position and control system states
• Sensor feedback not granular enough to detect individual step misses
Software modeled and verified in SCADE
Full reliance on SCADE of SM & all functionality
Problems with missing steps not detected
Two Customer Proposed Solutions
Sending of data at 12 ms offset from dispatch
Buffering of command by SM interface
No analytical confidence that the problem will be addressed
Software tests did not discover the issue
Time sensitive systems are hard to test for.
We utilized execution and communication timing semantics of AADL model, Error Model annex as diagnostic tool, and auto-generated assurance case from verification evidence
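How execution time variation can lead to missed steps is easier to see in a toy simulation. The sketch below is not the AADL or Error Model analysis used in the study. It assumes a motor interface that holds one step-increment command at a time and a controller that adds every commanded increment to its believed position; the function name, arrival times, and step rate are hypothetical.

```python
def run(commands, steps_per_ms):
    """Simulate one-slot command overwriting.

    commands: list of (arrival_time_ms, step_increment), in time order.
    Returns (believed_position, actual_position) after the last command
    has been allowed to complete.
    """
    believed = 0   # position the control software thinks the motor has
    actual = 0     # position the motor really has
    pending = 0    # signed steps left from the command in progress
    last_t = 0
    for t, delta in commands:
        # The motor works on the pending command until the new one arrives.
        done = min(abs(pending), (t - last_t) * steps_per_ms)
        actual += done if pending > 0 else -done
        # The new command overwrites whatever was left of the old one.
        pending = delta
        believed += delta
        last_t = t
    actual += pending   # the final command runs to completion
    return believed, actual


on_time = run([(0, 10), (10, 10), (20, 10)], steps_per_ms=1)
jittered = run([(0, 10), (10, 10), (18, 10)], steps_per_ms=1)
print(on_time, jittered)
```

With the third command arriving 2 ms early, two steps of the second command are never executed, so the believed and actual positions diverge. Coarse sensor feedback would not reveal a difference this small.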
Analysis Results and Solution
Architecture Fault Model Analysis
• Fault impact analysis identifies multiple sources of missed steps
  • Early arrival of step increment commands
  • Step increment command rate mismatch
  • Transient message corruption or loss
• Understanding of error cause
  • Guaranteed delivery and synchronous system assumptions
  • Quantifying too early condition
• Complexity in architecture design
  • Impact of avoidable complexity
  • Communication of state change vs. state
[Figure: assurance case diagrams for the SM_PCS; node labels and texts follow]
Cx1.1a: Accurately: to the nearest digital step corresponding to the analog % open requested by the ECS. T is determined by the application using the SM.
C1.1: The SM_PCS accurately positions the SM to the position requested by the ECS as soon as possible but no longer than T ms.
A1.1a: ECS requests % open for SM via direct memory. SM_PCS runs in a frame of length F. SM is commanded to a specific position.
C3.1: The likelihood of a SM malfunction is low enough for the application
Ev7.1: Test results showing sufficient SM reliability
IR4.2: If there is evidence of sufficient SM reliability then the likelihood of malfunction is tolerable for the application
R2.1: Unless there is a malfunction in the SM
UM6.1: Unless the SM doesn't conform
R4.1: Unless the likelihood of malfunction was incorrectly determined
Ev5.1: Data sheet for the SM showing sufficient reliability
R3.3: Unless the SM_PCS takes longer than F to command a new position requested by the ECS
Ev6.2: Test results consistent with the data sheet claim
C4.4: At the start of each frame the SM_PCS issues a command to the SM
R3.2: Unless the SM is incapable of reaching any requested position within time T-F
R2.2: Unless the SM doesn't reach any requested position within time T-F
J2.2a: The request from the ECS can come in up to F before the SM_PCS issues its command
Ev4.3: Analysis of data sheet for SM shows that it can move to arbitrary positions within time T-F
UM5.2: Unless the SM does not conform to the data sheet
R2.3: Unless the SM_PCS doesn't issue the appropriate commands to the SM
C3.4: The SM_PCS asks the SM to move to the location most recently specified by the ECS
Ev5.3: SM_PCS code review showing the correct calculation
R4.6: Unless the transmitted command is corrupted
Ev9.1: Test results showing the link has sufficient reliability
R4.5: Unless the SM_PCS does not correctly calculate the position specified by the ECS
C5.4: The communications link between the SM_PCS and the SM is of sufficient reliability for the application
R6.6: Unless the reliability of the link was incorrectly determined
IR2.4: If there is no SM malfunction, it is capable of moving fast enough, and the SM_PCS issues the appropriate command, then the SM will accurately be positioned to the location requested by the ECS within time T
Ev7.6: Data sheet and tests for the link showing sufficient reliability
IR6.7: If there is evidence of sufficient link reliability then the likelihood of corrupted data is tolerable for the application
UM6.5: Unless the code does not execute as specified
UC5.5: Conditions in the execution environment interfere with correct execution
Cx2.3a: An appropriate command is a SM position calculated from the percentage requested by the ECS
UM8.1: Unless the link doesn't conform
IR4.7: If the calculated position is correct and is not corrupted during transmission, then the SM_PCS commands the SM to move to the correct location
UM7.4: Hardware failure prevents correct execution
Ev7.2: SM_PCS Unit Test showing expected behavior
UM6.3: Unless the reviewed code is not the current code
UM6.4: Unless the code review overlooks an error
UM7.3: The code is compiled/assembled incorrectly
IR5.4: If the SM initializes to the fully closed position, then the SM_PCS is able to know the location of the SM at startup
UM7.3: Unless the reviewed design/code is not the current design/code
R5.9: Unless the code does not implement the calculation as specified
R3.9: The number of steps between the current position and the desired position is computed incorrectly
J2.2a: The request from the ECS can come in up to F before the SM_PCS issues its command
Ev3.2: Data sheet for the SM showing that it can move to an arbitrary position within time T-F
R2.4: Unless the SM_PCS doesn't issue the appropriate commands to the SM
R3.4: The SM_PCS doesn't know the position of the SM at the beginning of each frame
C4.7: At the start of each frame the SM_PCS issues a command to move the SM min(|CurPos - DesPos|, S) steps in the appropriate direction
R2.3: Unless the SM_PCS does not know the true position of the SM
Cx1.1a: Accurately: to the nearest digital step corresponding to the analog % open requested by the ECS
IR5.8: If S steps can be completed in a frame, the SM_PCS task runs once per frame, and the interarrival time of commands is F, then the SM completely executes all of them
UM7.10: Unless the code review overlooks an error
UM7.4: Unless the review overlooks an error
Ev8.8: Test results showing the link has sufficient reliability
R5.7: Unless the SM_PCS sends a new command to the SM that overwrites an incompletely executed previous command
IR5.11: If a code review and test results show that the calculation is made correctly then SM is moved the correct number of steps in the correct direction
C4.4: The SM_PCS knows that the SM is initialized to the fully closed position at startup
Ev6.7: SM_PCS code review showing the correct calculation
R3.7: When desired position is S or fewer steps away, the SM_PCS asks for more or fewer steps than are needed
UM9.2: Unless the implemented clock does not conform
Ev7.8: AADL model showing minimum inter-arrival time >= F ms
R3.10: A corrupted command value is received by the SM
C3.1: The likelihood of a SM malfunction is low enough for the application
R3.8: The SM_PCS asks for movement away from the desired position
Ev8.4: Schedule showing SM_PCS is scheduled at a F rate
UM4.3: Unless the SM does not conform to the data sheet
R6.8: The code is compiled/assembled incorrectly
Ev6.3: Design review and code review of SM_PCS showing that it keeps accurate track of all commands issued
UM8.6: Unless the model fails to capture information affecting inter-arrival times
C4.8: The communications link between the SM_PCS and the SM is of sufficient reliability for the application
Cx2.3a: i.e., the CurPos "known" by the SM_PCS is not consistent with reality
C7.5: The maximum number of steps (S) the SM_PCS will ask the SM to move in any frame can be completed in F
R3.6: When desired position is more than S steps away, the SM_PCS asks for fewer than S steps
R5.5: Unless there is no evidence that the SM_PCS keeps track
Cx7.8a: AADL model takes into account task preemption and overhead
R5.3: Unless the SM does not initialize itself to the fully closed position
Ev8.1: Test results showing that the SM initializes to the fully closed position
IR4.2: If there is sufficient SM reliability, then the likelihood of malfunction is tolerable for the application
UM6.1: Unless the SM doesn't conform to the data sheet
IR5.6: If a design review shows that the SM_PCS keeps track of issued commands, then there are grounds to believe that the SM_PCS keeps track of issued commands
R4.1: Unless the likelihood of a malfunction was incorrectly determined
Ev9.1: SM datasheet and calculations showing that S steps can be completed in F ms
R6.6: The minimum inter-arrival time between SM commands is less than F
R6.5: The time between scheduled SM_PCS task initiations is less than F
R5.12: Unless the reliability of the link was incorrectly determined
R8.3: Unless S steps cannot be completed in F
R7.6: The task is scheduled at a rate different from F
C1.1: The SM_PCS accurately positions the SM to the position requested by the ECS as soon as possible but no longer than T ms.
Ev8.7: SM_PCS Unit Test showing expected behavior
IR2.5: If the SM doesn't malfunction and is fast enough when optimally controlled, and the SM_PCS can keep track of where the SM is and issue proper commands, then the SM will be positioned properly within time T
Ev5.2: Test results confirming the data sheet claim
Ev6.11: Data sheet and tests for the link showing sufficient reliability
Ev7.1: Test results showing SM reliability conforms to the data sheet
R2.1: Unless there is a malfunction in the SM
IR5.13: If there is evidence of sufficient link reliability, then the likelihood of corrupted data is tolerable for the application
R5.10: Unless the code does not execute as specified
UM7.9: Unless the reviewed code is not the current code
UC6.10: Conditions in the execution environment interfere with correct execution
Cx2.4a: An appropriate command moves toward the desired position as quickly as possible
Ev6.2: Data sheet for the SM showing that it initializes to the fully closed position
IR8.2: If calculations show that S steps can be completed in a frame, then S steps can actually be completed in a frame
R3.3: The SM_PCS doesn't know the position of the SM when it starts up
C4.6: The SM completely executes all commands sent to it by the SM_PCS
R7.7: The hardware clock drifts or is otherwise inaccurate
A1.1a: ECS requests % open for SM via direct memory. SM_PCS runs in a frame of length F. SM is commanded via number of steps to move in the frame and direction. Maximum number of steps commanded in a frame is a precalculated constant S.
R6.9: Hardware failure prevents correct execution
R6.4: The SM takes longer than F to execute a command
C4.5: The SM_PCS computes the position of the SM based on its current computed position and the commands it sends to the SM
UM7.2: Unless the SM doesn't conform
UM7.11: Unless the link does not conform
R2.2: Unless the SM is incapable of reaching the requested position within time T-F no matter how commanded
Ev10.1: Test results showing that the clock is accurate enough for the application
Ev5.1: Data sheet for the SM showing sufficient reliability
R3.5: The SM_PCS asks for more than S steps
Ev8.5: Clock data sheet showing sufficient accuracy for the application
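Claims C4.7 and C4.5 in the assurance case describe a per-frame command of min(|CurPos - DesPos|, S) steps toward the desired position, with the SM_PCS tracking its own computed position. A minimal sketch of that rule follows. The function names and the example values are hypothetical, and the sketch assumes, as C4.6 claims, that the SM fully executes each command within the frame.

```python
def frame_command(cur_pos, des_pos, max_steps):
    """Signed step count for one frame: min(|CurPos - DesPos|, S) steps,
    positive or negative depending on the direction of DesPos."""
    distance = des_pos - cur_pos
    steps = min(abs(distance), max_steps)
    return steps if distance >= 0 else -steps


def run_frames(start_pos, targets, max_steps):
    """Apply one command per frame and track the computed position.

    targets: the most recent desired position seen at the start of each frame.
    Returns the computed position after each frame.
    """
    pos = start_pos   # assumed 0 when the SM initializes fully closed
    trace = []
    for des in targets:
        pos += frame_command(pos, des, max_steps)
        trace.append(pos)
    return trace


# Hypothetical run with S = 4: the target changes from 10 to 2 partway through.
print(run_frames(0, [10, 10, 10, 2, 2], max_steps=4))
```

Because no command ever asks for more than S steps, a new target arriving mid-move only changes the next frame's command and does not overwrite steps in progress.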
Agenda
Challenges in Software Reliant Systems
Four Pillar Improvement Strategy
Virtual System Integration
Incremental Lifecycle Assurance
Conclusion
Virtual system integration and analysis lead to early detection, error leakage reduction, and cost reduction
Incremental lifecycle assurance leads to reduced risk and increased confidence
Opportunity to make (re)certification more affordable through automation and priority-focused design and verification
References
AADL Website www.aadl.info and AADL Wiki www.aadl.info/wiki
Blog entries and podcasts on AADL at www.sei.cmu.edu
AADL Book in SEI Series of Addison-Wesley http://www.informit.com/store/product.aspx?isbn=0321888944
On AADL and Model-based Engineering http://www.sei.cmu.edu/library/assets/ResearchandTechnology_AADLandMBE.pdf
On an architecture-centric virtual integration practice and SAVI http://www.sei.cmu.edu/architecture/research/model-based-engineering/virtual_system_integration.cfm
On a four pillar improvement strategy for software system verification and qualification http://blog.sei.cmu.edu/post.cfm/improving-safety-critical-systems-with-a-reliability-validation-improvement-framework
Webinars on system verification https://www.csiac.org/event/architecture-centric-virtual-integration-strategy-safety-critical-system-verification and on architecture trade studies with AADL https://www.webcaster4.com/Webcast/Page/139/5357
Contact Information
Peter H. Feiler
Principal Researcher
RTSS
Telephone: +1 412-268-7790
Email: [email protected]
U.S. Mail
Software Engineering Institute
Customer Relations
4500 Fifth Avenue
Pittsburgh, PA 15213-2612
USA
Web
Wiki.sei.cmu.edu/aadl
www.aadl.info
Customer Relations
Email: [email protected]
SEI Phone: +1 412-268-5800
SEI Fax: +1 412-268-6257