Upload
molly-hodges
View
221
Download
2
Tags:
Embed Size (px)
Citation preview
Principles for Safe and Automated Middleware Specializations for Distributed
Real-time Embedded Systems
Department of Electrical Engineering & Computer Science
Vanderbilt University, Nashville, TN, USA
Ph.D. Dissertation DefenseApril 2, 2012
Akshay Dabholkar [email protected]
www.dre.vanderbilt.edu/~aky
Research supported by NSF CAREER CNS# 0845789, Vanderbilt Discovery
http://www.dre.vanderbilt.edu/~aky/docs/Dissertation.pdf
2
Presentation Road Map
Motivation Overview of Solution Approach: Automated
Middleware Specialization Process Research Area Focus Safe Middleware
Adaptation for Real-Time Fault-Tolerance Research Contributions Concluding Remarks
Context: Distributed Real-time Embedded (DRE) Systems
Large-scale, system-of-systems Operation in resource-
constrained environments• Memory-constraints• Low Processor speeds, Power
availability• Component/Process/Processor failures
Stringent and simultaneous QoS demands• Efficient Resource Utilization• Timeliness• High Reliability
Examples• Intelligent Transportation Systems (ITS)• Inventory Control Systems• NASA’s Magnetospheric Multi-scale
mission (MMS)
3(Images courtesy: Google)
Overcoming Variability in DRE System Domain Concerns
Group Failover
Semantics
Maximize Throughpu
t
Resource Constrain
ed
Real Time Updates
Reconfigurable Conveyor Belt System
Intelligent Transportation System
Direct Application
Generalization
Development via General-purpose Middleware
• Feature-rich• Satisfies wide
range of DRE systems
• Uses extensible frameworks
• CORBA, .NET, J2EE, etc.Performance Requirements Reliability Requirements
Impediments to Using General-purpose Middleware General-purpose middleware
supports a wide range of DRE applications
However, individual DRE applications have streamlined requirements
Antagonistic Design Forces • Excessive features due to
wide applicability• Unnecessary overhead due
to high flexibility and configurability
• Moreover, focus is on horizontal decomposition into layers• Incurs time and space
overhead due to rigid layered processing
• Application concerns are tangled across middleware modularization boundaries
Preferred Approach to Overcome Generality in Middleware
Real-Time Fault
Tolerance
Maximize Throughpu
t
Resource Constrain
ed
Reconfigurable Conveyor Belt System
Intelligent Transportation System
Direct Application
Generalization
Specialization
Specific Application
Real Time Updates
Performance Requirements Reliability Requirements
Doesn’t mean develop
middleware from scratch
What is Middleware Specialization?
Resolves the tension between Generality and Specificity Creates specialized forms of middleware for each system by
Pruning away unnecessary features based on application requirements Augmenting application-specificity by embedding their semantics Optimizing performance by moving away from the rigid layered
processing by creating specialized processing paths Adapting at runtime to enable safe failure mitigation in real-time
Customized Middleware Stack Standards-based, General-
purpose, Layered Middleware Architecture
Specialization
Container
ClientOBJREF
in argsoperation()out args +
return
IDLSTUBS
ORBINTERFACE
IDLSKEL
Object Adapter
ORB CORE GIOP/IIOP/ESIOPS
Component(Servant)
Se
rvices
ProtocolInterface
ComponentInterface
ServicesInterface
DII
DSI
Taxonomy of Middleware Specialization Techniques
8
Dimensions can be combined to synthesize new specialization techniques
Overlapping dimensions share concepts e.g. MDE/AOP includes both feature pruning & augmentation and can be used for customization as well as tuning
Serves as a guideline for synthesis of tools for design, V&V, analysis of specializations
Three dimensional Taxonomy of Middleware
Specializations
How?
What?
When?
Taxonomy of specializations developed based on literature survey
Assessment of Taxonomy of Middleware Specialization Techniques
Group of techniques that perform a common function
9
DSMLs, Feature Diagrams, IDL used to capture application concerns (e.g., Bypass)
MDE, AOP used in generating rules for specialization (e.g., Bypass)
Pre-postulated, Just-in-time (e.g., Caesar, AFM, AspectOpenORB, JAsCO, PROSE, Abacus)
MDE (e.g., Modelware), Reflection (e.g., AspectOpenORB) used to deduce specialization
context
Augment, Prune middleware sources (e.g., Bypass, CIDE, AHEAD, FOCUS)
DSML, IDL used to map concerns to code artifacts (e.g., AHEAD, Bypass )
Taxonomy-induced Middleware Specialization Lifecycle
Taxonomy gives rise to a specialization lifecycle
10
DSMLs, Feature Diagrams, IDL used to capture application concerns (e.g., Bypass)
MDE, AOP used in generating rules for specialization (e.g., Bypass)
Pre-postulated, Just-in-time (e.g., Caesar, AFM, AspectOpenORB, JAsCO, PROSE, Abacus)
MDE (e.g., Modelware), Reflection (e.g., AspectOpenORB) used to deduce specialization
context
Transformation
Deduction
Specification
Generation
SpecializationLifecycle
Inference
Adaptation
Augment, Prune middleware sources (e.g., Bypass, CIDE, AHEAD, FOCUS)
DSML, IDL used to map concerns to code artifacts (e.g., AHEAD, Bypass )
Taxonomy-induced Middleware Specialization Lifecycle
Must deal with variability over the specialization lifecycle Multiple steps involved
11
Specify and Reason about the desired application features
Generate the middleware transformation rules that realize the specializations
Adapt middleware safely and predictably to changes in runtime conditions
Detect the context from the application models that drives specialization opportunities
Transformation
Deduction
Specification
Generation
SpecializationLifecycle
Inference
Adaptation
Transform middleware into specialized forms
Infer the specializations applicable to the context and actual middleware features desired
Ruling out Manual Middleware Specializations
Cumbersome to implement Not repeatable and reusable Lack proper structure or
process Difficult to
• maintain as middleware sources evolve over time
• guarantee their correctness• extend to other middleware
technologies
12
XML Schema Specialization
Rules
Foo (){ ….. ……. //hook …}
Middleware Developer
Application Devloper
Foo (){ ….. ……. …….. …}
Rule selection
Pre-processed Middleware
1
2
34
Specialized middleware codeE
volu
tio
n o
f s
pe
cial
iza
tio
ns
Proprietary, one-off solutions are insufficient
and expensive
FOCUS: A. Krishna et al., “Context-Specific Middleware Specialization Techniques …”,
EuroSys 2006
How to:• Devise an automated reusable, systematic, correct and maintainable
middleware specialization process ?
13
Presentation Road Map
Motivation Overview of Solution Approach: Automated
Middleware Specialization Process Research Area Focus Safe Middleware
Adaptation for Real-Time Fault-Tolerance Research Contributions Concluding Remarks
14
Research Synopsis
Feature Oriented Reverse Engineering based Middleware Specializations (FORMS)
• Coarse-grained Feature Pruning• Feature-oriented deduction of desired features• Prunes middleware sources using a novel reverse-
engineering algorithm• Provides Build Specialization
Generative Middleware Specializations (GeMS)• Fine-grained Feature Pruning• Automatically Deduces the specialization context• Identifies of specialization points by extending
FORMS• Generates source-to-source transformation
algorithms Generative Aspects for Fault-Tolerance
(GrAFT)• Fine-grained Feature Augmentation• Weaves reliability concerns in system artifacts• Provides model-to-text, model-to-code
transformations Safe Middleware Adaptation for Real-Time
Fault-Tolerance (SafeMAT)• Fine-grained middleware adaptation to
failures while maintaining safety, predictability and improving resource utilizations within the hard real-time constraints
Transformation
Deduction
Specification
Generation
SpecializationLifecycle
Inference
Adaptation
15
Research Synopsis
Feature Oriented Reverse Engineering based Middleware Specializations (FORMS)
• Coarse-grained Feature Pruning• Feature-oriented deduction of desired features• Prunes middleware sources using a novel reverse-
engineering algorithm• Provides Build Specialization
Generative Middleware Specializations (GeMS)• Fine-grained Feature Pruning• Automatically Deduces the specialization context• Identifies of specialization points by extending
FORMS• Generates source-to-source transformation
algorithms Generative Aspects for Fault-Tolerance
(GrAFT)• Fine-grained Feature Augmentation• Weaves reliability concerns in system artifacts• Provides model-to-text, model-to-code
transformations Safe Middleware Adaptation for Real-Time
Fault-Tolerance (SafeMAT)• Fine-grained middleware adaptation to
failures while maintaining safety, predictability and improving resource utilizations within the hard real-time constraints
Transformation
Deduction
Specification
Generation
SpecializationLifecycle
Inference
Adaptation
Challenge 1: Horizontal v/s Vertical Middleware Decomposition Middleware is traditionally decomposed
along the horizontal dimension into layers
However, applications only use a subset of features within each middleware layer
Application domains expect vertical decomposition along domain concerns
Moreover, application domain concerns are tangled across middleware modularization boundaries
It becomes hard to specify the desired features for middleware specialization and decompositionHow to:
• Reason about the middleware features desired by the application/application family?
• Modularize of middleware along domain concerns (i.e., Vertical Decomposition) without refactoring the middleware code?
17
Resolution 1: Feature Oriented Reverse Engineering based Middleware Specializations (FORMS)
FORMS: Automated Inference of desired Middleware Features
• Utilizes a lookup table that provides PIM-to-PSM mapping
FORMS: Closure Computation• Coarsely Prunes the middleware by finding the
feature modules using a novel closure computation algorithm by recursively inspecting source code dependencies
FORMS: Feature-Oriented Requirements Reasoning
• Utilizes a Feature Oriented Decision Tree
FORMS: Build Specialization• Specialize/Prune the middleware build
configurations and generate binaries using MPC perl scripts
Adaptation
Deduction
Transformation
Generation
Inference
Specification
Specialization Lifecycle
Resolution 1: Feature Oriented Reverse Engineering based Middleware Specializations (FORMS)
Original ACE (Adaptive Communication Environment) middleware • 1,388 PSM source files• 436 features• 2,456 KB static footprint
Specialized ACE middleware• ~500 PSM source files
64% reduction• ~ 100-175 features
60-76% reduction• ~ 1,500 KB footprint
41% reduction
RR
EagerRead-Write
ThreadLocking
Concurrent
Client-Server
Reactive
Callback
Asynchronous
Solution Approach: Facilitate Vertical Decomposition (along domain concerns) of features within a horizontally decomposed middleware without
refactoring the middleware code
System Type?
Server
RRFIFO
On-demandEagerRead-WriteScoped
ProcessThreadM.E.LockingPollingCallback
ReactiveIterativeConcurrentSynchronousAsynchronous
P2PClient-Server
Priority?
Strategy?Strategy?
Mechanism?Synchronization?Notification?
Request Handling?Data Delivery? Connection Mgmt?
Acceptor-Connector Reactive
Adaptation
Specification
Challenge 2: Fine-grained Pruning Middleware Specialization
Node ANode B
1 Collocated Components
2 Redundant Request Creation
2
2
3
Resolution of the same dispatch
4 Redundant de-marshaling checks
5Component Generality
3
How to:• Deduce the specialization context from application
invariants?• Infer the set of specializations from the context?• Identify the specialization points within the middleware
code?• Generate the specializations to improve developer
productivity?• Transform the middleware sources by executing
specializations?
Generation
SpecializationLifecycle
Deduction
Inference
Transformation
Specilization ContextsSpecilization ContextsSpecilization ContextsSpecilization ContextsSpecilization Contexts
Key Insight
GeMS: Automated Deduction of Specialization Context
• Interpreters that parse application models for application invariants that provide the context to drive specializations
Resolution 2: Generative Middleware Specializations (GeMS)
GeMS: Automated Inference of Specializations• Utilizes a lookup table for specializations that apply
GeMS: Specialization Transformation Generator• Finely Prunes the middleware sources by utilizing
design pattern optimization algorithms with the aid of a source inspection engine to determine the specialization points
GeMS: FOCUS Source code-level transformations
• Execute the generated transformations using FOCUS perl scripts
Specification
Adaptation
Generation
SpecializationLifecycle
Deduction
Inference
Transformation
Resolution 2: Generative Middleware Specializations (GeMS)1. Specification of desired
application features2. Deduction of
Specialization Context3. Inference
a. Map Application Invariants to Specializations from the catalog
b. Determine the middleware features from mappings
4. Generate Transformations through Algorithms
5. Transform middleware sources and build files into specialized forms
6. Compile to generate specialized middleware binaries
GeMS’s code generator substantially reduces middleware developer efforts
Adaptation
Resolution 2: Performance Metrics Evaluation of FORMS+GeMS Cumulative benefits of applying both FORMS and GeMS
when applied to The ACE ORB (TAO) specifically the ORB and POA frameworks
Static footprint is the size of compiled shared middleware library
Dynamic footprint is the combined average size of runtime executables of BasicSP application components each of which is running on a specialized middleware (TAO) version
10-15% savings if applied on individual specialization basis Substantial footprint reductions are mainly a result of
applying the FORMS closure computations Runtime performance improvements are mainly due to
GeMS framework optimizations
22
Specification
Generation
SpecializationLifecycle
Deduction
Inference
Transformation
Challenge 3: Augmentation of Application-Specific Semantics
Missing application-specific semantics (run-time middleware)• E.g., Group failover is DRE-specific &
often not provided as first class support out-of-the-box
However• Application-level solutions lose
transparency & reusability• It is costly to modify the middleware
manually
Therefore, automatic middleware instrumentation required to augment application-specificity
23
How to:1. Augment application-specific additional semantics in general-
purpose middleware retroactively?2. Automate the augmentation to improve productivity & reduce
cost?
Backup Fail Over Unit (FOU)
Primary Distributed Processing Unit
(DPU)A B C
A’ B’ C’
Inference
Adaptation
Resolution 3: Generative Aspects for Fault-Tolerance (GrAFT)
24
GrAFT: Aspect C++ Generator • Generates the application-specific semantics in
form of aspect code for fault handling and masking to enable transparent group failover
Deduction
Specification
Transformation
Generation
SpecializationLifecycle
GrAFT: Modeling Environment and Transformations
• Fine Grained Feature Augmentation• Provides model-to-text, model-to-code
transformations
GrAFT: Source code-level transformations• Finely Augments these application-specific
semantics in system artifacts by weaving in the generated aspects with application and client stubs
Resolution 3: Generative Aspects for Fault-Tolerance (GrAFT)
Specify Application-specific semantics of Group Failover
Parse application models and determine the components that require group failover semantics
Generate the fault detection, masking, and failover code through exception handling mechanisms
Weave in AspectC++ code in the generated code in the respective component stubs
25
Reconfigurable Conveyor Belt
System
GrAFT’s code generator completely eliminates middleware developer efforts
Relevant PublicationsFORMS Publications
1. FORMS: Feature-Oriented Reverse Engineering-based Middleware Specialization for Product-Lines, JSW 2011
2. Middleware Specialization for Product-lines using Feature Oriented Reverse Engineering, ITNG 2010
3. Developing and Evaluating a Taxonomy of Modularization Techniques for Middleware Specialization, ACoM 2008
4. Towards a Holistic Approach for Integrating Middleware with Software Product Lines Research, McGPLE 2008
First Author26
Second Author
GeMS Publications1. GeMS: An Automated Middleware
Specialization Process for Distributed Real-time and Embedded Systems, Elsevier-JSA 2012 (in submission)
2. A Generative Middleware Specialization Process for Distributed Real-time and Embedded Systems, ISORC 2011
3. Architecture-Driven Context-Specific Middleware Specializations for Distributed Real-time and Embedded Systems, LCTES-WIP 2010
4. An Approach to Middleware Specialization for Cyber Physical Systems, WCPS 2009
GrAFT Publications1. Fault-tolerance for Component-based Systems – An Automated Middleware
Specialization Approach, ISORC 20092. CQML: Aspect-oriented Modeling for Modularizing & Weaving QoS Concerns in
Component-based Systems, ECBS 20093. Towards A QoS Modeling & Modularization Framework for Component-based
Systems, AQuSerM 20084. MoPED: A Model-based Provisioning Engine for Dependability in Component-
based Distributed Real-time Embedded Systems, ECBS 2011
27
Presentation Road Map
Motivation Overview of Solution Approach: Automated
Middleware Specialization Process Research Area Focus Safe Middleware
Adaptation for Real-Time Fault-Tolerance Research Contributions Concluding Remarks
Motivation: Addressing Failures in System-of-Systems
Safety-critical applications such as in avionics, automotive, industrial automation domains• Composed of system-of-systems• Must handle variety of failures stemming from the
composition• Certified to be schedulable and guaranteed to meet
stringent QoS
However, individual subsystems have• Closed nature• Static execution schedules• Resource-constrained• Often over-provisioned in terms of allocated time
and required capacity of resources to guarantee predictability in worst-case scenarios
How to handle failures in system-of-systems in the context of• Closed and over-provisioned, individual
subsystems• Stringent real-time QoS assurance
28
Challenge 4: How to Safely and Predictably Adapt to Failures?
Fault tolerance solutions need additional resources, but• No additional resources are available
due to over-provisioning• Rigid execution schedules severely
constrain the extent of runtime failure adaptability in real-time
• Cannot compromise on system safety• Redesigning and reimplementing the
individual subsystems is not an option due to economic forces
Since system-of-systems are being formed, we need to think holistically about how to handle faults within this concept.
29
Primary Distributed Processing Unit
(DPU)A B C
Backup Fail Over Unit (FOU)
A1 B1 C1
Backup Fail Over Unit (FOU)
A2 B2 C2
?
Resource-Constrained X Over-Provisioned
Challenge 4: How to Safely and Predictably Adapt to Failures? Side Effects of Over-provisioning
Most of the time resources remain under-utilized Large amount of processor utilization and time
slack within each allocated task quantum that can be better leveraged for adaptive fault management
Therefore it is important to, • Identify availability of unused resources at runtime• Do No Harm Provision fast and resource-aware
failure adaptation to ensure safety and predictability while obeying real-time constraints
Key Insight Existence of significant slack in over-provisioned individual subsystems
30
Primary Distributed Processing Unit
(DPU)A B C
Backup Fail Over Unit (FOU)
A1 B1 C1
Backup Fail Over Unit (FOU)
A2 B2 C2
?
How to:1. Identify the opportunities for slack in the DRE execution schedule2. Design safe and predictable dynamic failure adaptation3. Validate system safety in the context of DRE system fault tolerance
31
Related Research: Middleware Adaptation TechniquesCategory Related Research (Middleware Adaptation Techniques)
Adaptive Passive Replication Systems
S. Pertet et. al., Proactive Recovery in Distributed CORBA Applications, in Proceedings of the IEEE International Conference on Dependable Systems & Networks (DSN 2004), Italy, 2004P. Katsaros et. al., Optimal Object State Transfer – Recovery Policies for Fault-tolerant Distributed Systems, in Proceedings of the IEEE International Conference on Dependable Systems & Networks (DSN 2004), Italy, 2004Z. Cai et. al., Utility-driven Proactive Management of Availability in Enterprise-scale Information Flows, In Proceedings of the ACM/IFIP/USENIX Middleware Conference (Middleware 2006), Melbourne, Australia, November 2006L. Froihofer et. al., Middleware Support for Adaptive Dependability, In Proceedings of the ACM/IFIP/USENIX Middleware Conference (Middleware 2007), Newport Beach, CA, November 2007
Load-Aware Adaptations of Fault-tolerance Configurations
T. Dumitras et. al., Fault-tolerant Middleware & the Magical 1%, In Proceedings of the ACM/IFIP/USENIX Middleware Conference (Middleware 2005), Grenoble, France, November 2005O. Marin et. al., DARX: A Framework for the Fault-tolerant Support of Agent Software, In Proceedings of the IEEE International Symposium on Software Reliability Engineering (ISSRE 2003), Denver, CO, November 2003S. Krishnamurthy et. al., An Adaptive Quality of Service Aware Middleware for Replicated Services, in IEEE Transactions on Parallel & Distributed Systems (IEEE TPDS), 2003
Runtime adaptations to reduce failure recovery times
Change of replication
styles, reduced degree of
active replication
32
Related Research: Real-time Fault-Tolerant Middleware & Software Health Management
Category Related Research (Real-time Fault-Tolerant Middleware)
Real-time Fault-tolerant Systems
D. Powell et. al., Distributed Fault-tolerance: Lessons from Delta-4, In IEEE MICRO, 1994K. H. Kim et. al., The PSTR/SNS Scheme for Real-time Fault-tolerance Via Active Object Replication & Network Surveillance, In IEEE Transactions on Knowledge & Data Engineering (IEEE TKDE), 2000S. Krishnamurthy et. al., Dynamic Replica Selection Algorithm for Tolerating Timing Faults, In the IEEE International Conference on Dependable Systems & Networks (DSN 2001), 2001H. Zou et. al., A Real-time Primary Backup Replication Service, in IEEE Transactions on Parallel & Distributed Systems (IEEE TPDS), 1999
Schedulability analysis to schedule
backups in case primary replica fails,
faster processing
times
Category Related Research (Software Health Management)
Software Health Management
A. Dubey, et. al., A deliberative reasoner for model-based software health management, In the International Conference on Autonomic and Autonomous Systems, 2012, A. Srivastava et. al., The case for software health management, In the IEEE International Conference on Space Mission Challenges for Information Technology (SMCIT), 2011
Detect , Diagnose and Reason only
known failures with
predefined failover
strategies
Adaptive Fault Tolerance (AFT) approaches improve overall resource utilizations, however• Mostly applied to soft real-time applications• Require additional resources consuming
precious time from the real-time schedule• Excessively dynamic
Related Research: Middleware Composition Techniques
33
Category Related Research (Middleware Composition)
CORBA-based Fault-tolerant Middleware Systems
P. Felber et. al., Experiences, Approaches, & Challenges in Building Fault-tolerant CORBA Systems, in IEEE Transactions on Computers, May 2004T. Bennani et. al., Implementing Simple Replication Protocols Using CORBA Portable Interceptors & Java Serialization, in Proceedings of the IEEE International Conference on Dependable Systems & Networks (DSN 2004), Italy, 2004P. Narasimhan et. al., MEAD: Support for Real-time Fault-tolerant CORBA, in Concurrency & Computation: Practice & Experience, 2005
QoS-specific Middleware Customizations
Wolf et al., “Supporting Component-based Failover Units in Middleware for Distributed Real-time and Embedded Systems”, Elsevier JSA 2010Wang et al., “Total Quality of Service Provisioning in Middleware and Applications”, Elsevier JMM 2003Balasubramanian et al., “Evaluating Techniques for Dynamic Component Updating”, DOA 2005
Middleware building blocks
for fault-tolerant systems
Focus only on composing one QoS at a time
Software Health Management (SHM) approaches ensure safe and predictable adaptations, however• Apply to only errors in components
implementations known a priori• Support only predefined failover strategies• Are resource agnostic
Resolution 4: Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT)
Safe Middleware Adaptation for Real-Time Fault-Tolerance (SafeMAT)
• Fine-grained middleware adaptation to failures while maintaining safety, predictability and improving resource utilizations within the hard real-time constraints
How do we safely adapt middleware to runtime failures while maintaining predictability in
real-time?
Deduction
Inference
Transformation
Generation
Specification
Adaptation
SpecializationLifecycle
SafeMAT: Platform Assumptions Build upon and leverage ACM (ARINC Component Model) Middleware - an
emulation of the avionics ARINC-653 specification for time and space partitioning in safety-critical real-time operating systems
Hierarchical fixed priority preemptive task model
35
• Specifies the platform in terms of modules (processors) that are composed of one or more partitions (processes) allocated as tasks
• Each partition has one or more components allocated as sub-tasks
• Each partition has dedicated execution time and memory space allocated and executes at highest priority and can only be preempted when it is allocated time quantum expires.
SafeMAT: System Model and Fault Handling
Component Execution States• Active – all ports are operational• Semi-Active – required ports operational,
provided ports disabled• Inactive – none of the ports are operational
Fail-Stop failures Semi-Active replication due to hard
real-time constraints and to avoid state synchronization overhead• One primary replica – active state –
handles all client requests• Multiple backup replicas – semi-active state
– only process client’s requests; not produce any output.
Failure Granularity• Component• Component Group• Partition (process)• Module (processor)
36
Two primary sources of failure for each component port • Logical Failure – component failure due to
internal software, concurrency & environmental faults, latent bugs in the developer code
• Critical Failure – process/processor failures, undetected component failures
Failover Strategies• Logical Failure – failover to alternate
backup replica only• Critical Failure – failover to identical or
alternate backup replicas
Replica Placement • Identical Replica – Always deploy to
different partition than primary• Alternate Replica – Can be deployed within
same partition
SafeMAT: Key Requirements Fine-grained resource monitoring capability
required that provides real-time utilization while not imposing significant overhead on the system to enable failure adaptation in real-time while utilizing available slack
Dynamic failure adaptation should -• Tolerate different failure types and granularities• Achieve better resource utilization• Safely and Predictably achieve failure recovery
To reduce the extent of recovery required, dynamic failure adaptation should be• Be Fast & Lightweight• Obey hard real-time constraints (predictability)• Flexible (account for failure type, granularity, replica
placements)
37
Primary Distributed Processing Unit
(DPU)A B C
Backup Fail Over Unit (FOU)
A1 B1 C1
Backup Fail Over Unit (FOU)
A2 B2 C2
?
ACM Middleware: Architecture
System Level • Module Scheduler• Alarm Aggregator• Diagnoser• Deliberative Reasoner (DR)
Module Level • Partition Creator• Partition Scheduler • Module Initializer
Partition Level• CLHM• Partition Initializer• Components
38A. Dubey, G. Karsai, and N. Mahadevan, “A component
model for hardreal-time systems: CCM with ARINC-653,” SPE 2011
SafeMAT: Architecture
System Level • Module Scheduler• System Resource Manager (sRM)• Failure Handler• Alarm Aggregator• Diagnoser• Resource Aware Deliberative Reasoner
(RADaR)
Module Level • Partition Scheduler • Module Resource Monitor (mRM)• Failure Handler
Partition Manager Level• Partition Launcher• Partition Resource Monitor in compute
mode (pRMc)• Failure Handler
Partition Level• CLHM• Partition Resource Monitor in notify
mode (pRMn)• Components
39
SafeMAT: Safe and Fast Failure Detection and Isolation Partition Manager
• Handles execution and failure management of each partition • Detects and Isolates the impact of failed partitions• Relieves partition recovery responsibility from the Module Manager• Enhances safety by preventing failed partitions from impacting the real-time
execution • Enables quick recovery by coordinating with RADaR
40
• Prevents partition resynchronization upon restart
• Instructs the dependent partitions to reread the object references if facet side is restarted
• Enables monitoring of partition resource utilizations through pRMc
SafeMAT: Distributed Resource Monitor (DRM)
Framework Components• Single System Resource Monitor (sRM)• Multiple Module Resource Monitors (mRM)• Multiple Partition Resource Monitors in
either compute (pRMc) or notify (pRMn) modes
Resource Liveness Monitoring• Auxiliary to failure handlers that monitor
exit statuses of partitions and their managers - dual monitoring capability
• Periodically collects liveness statuses from each of its own monitors to determine partition, partition manager, module failures
41
Configuration• Ability to monitor CPU utilizations at various
granularities of processor, process, component group, component and thread
• Can operate in reactive (on-demand) or periodic (collect history) modes
• Can report utilizations of only specific entities RADaR is interested in
Discovering Resource Allocations• Dynamically discover the exact runtime
allocations of threads to components, components to partitions, and partitions to modules to enable fast monitoring
Monitors Utilization and Liveness of distributed resources
SafeMAT: Enabling Hierarchical Failure Adaptation (HFA)
Component Failure Type• Logical• Critical
Various Failure Granularities due to Hierarchically Scheduled Real-time System• Component• Component Group (e.g. Subsystem)• Partition• Module
Primary-Backup Deployment Topology• Only alternate backup replicas on same partition as
primary• Identical backup replicas always on different partition• Both identical & alternate backup replicas on different
partition as primary• Same module • Different module
42
Adapt failover targets based upon failure type, granularity and backup replica placement
Capable of handling simultaneous module, partition, logical and critical component failures
SafeMAT: The Hierarchical Failover Adaptation (HFA) Algorithm Invoked whenever any of the
DRM and/or the SLHM frameworks detect a failure
To provide quick and efficient failover• sRM proactively pre-computes the
sorted list of least utilized backups • Sends the sorted list to the RADaR
piggybacked with the failed primaries
Hands over control to the SLHM which decides when to initiate failover that depends upon• # failures system can withstand• Time for system to stabilize (usually
at least a Hyperperiod long)
Ability to • Intelligently mitigate simultaneous
failures in an hierarchical fashion• Choose failover target not just
based on current utilization but also based on historical averages 43
SafeMAT: Empirical Evaluation
Inertial Measuring Unit (IMU)• 4 subsystem types • 7 primaries, 4 backups• 55 components• All secondary subsystems
are semi-actively replicated
ADIRU subsystem can withstand 2 accelerometer component failures
GPS and ADIRU subsystem run at 0.1 Hz and 1 Hz respecitvely
PFC fetches GPS data at 0.1 Hz
Display fetches PFC data at 1 Hz 44
A. Dubey, N. Mahadevan, and G. Karsai, “The inertial measurement
unit example: A software health management case study,” ISIS Tech. Rep., Vanderbilt University, 02/2012
SafeMAT: Performance Metric Evaluation (PME) – Runtime Utilization Overhead
We executed the IMU system for 100 iterations for faulty scenarios for both ACM-SHM and SafeMAT
We artificially introduced failures at 15, 20, 30, 35 iterations in the GPS Processor, Accelerometers 6, 5 and 4 respectively such that the values outputted by them are exceedingly high
SafeMAT added only 2-6% utilization overhead on the top of ACM-SHM
Do No Harm SafeMAT added negligible runtime utilization overhead thereby not overloading the system while performing better failure recovery within the available utilization slack 45
SafeMAT: Performance Metric Evaluation (PME) – Runtime Failover Overhead
Impact of Replica Placement We used the Boeing’s BasicSP
scenario to demonstrate the impact of replica placement
We altered the deployments of the backup replicas in three ways - • Same Partition• Different Partition, Same Module• Different Partition, Different Module
4646
SafeMAT: Performance Metric Evaluation (PME) – Runtime Failover Overhead
Impact of Replica Placement We used the Boeing’s BasicSP
scenario to demonstrate the impact of replica placement
We altered the deployments of the backup replicas in three ways -• Same Partition• Different Partition, Same Module• Different Partition, Different Module
SafeMAT roughly added 63-70%
4747
work over ACM-SHM when recovering only one component at a time
SafeMAT: Performance Metric Evaluation (PME) – Runtime Failover Overhead
Impact of Recovery Group Size We tested for different subsystems
within the IMU and BasicSP scenarios SafeMAT added only 9-15% runtime
failover overhead over groups of components
Recovery Times are dependent upon• Size of component group• Component deployments within the failover
group• Amount of network communication within the
DRM
Costs are amortized for large group sizes than individual components
No missed deadlines, application jitter was unaffected
Do No Harm SafeMAT added negligible runtime failover overhead thereby maintaining the predictability of the overall system 48
49
Presentation Road Map
Motivation Overview of Solution Approach: Automated
Middleware Specialization Process Research Area Focus Safe Middleware
Adaptation for Real-Time Fault-Tolerance Research Contributions Concluding Remarks
Doctoral Dissertation ContributionsPrinciples for Safe and Automated Middleware Specializations for
Distributed Real-time Embedded Systems
Focus Area Challenge Approach Contribution
Contemporary Middleware
Specializations
• Systematic & Automated Specialization Process
• DSML that validates QoS configuration & generates implementation artifacts
Specialization Taxonomy & Process Lifecycle
Feature Oriented Requirements Reasoning and Specializations
• Deduction of Middleware Requirements
• Automate Build Specialization
• Decision Tree-based reasoning and inference of middleware features
• Reverse Engineering to specialize Middleware build
FORMS• Coarse Grained
Feature pruning & Footprint reduction
Generative Middleware
Specializations
• Automatically Deduce Invariants
• Automatically Realize Specializations
• Autonomic determination of specializations
• Automatic generation of specializations
GeMS• Fine Grained
Throughput Latency & Improvement
Augmenting Domain
Specificity
• Transparently provision Reliability
• Automated generation of fault handling & masking aspects
GrAFT• Fine Grained
Feature augmentation
Safe Specializations
• Safely and predictably provision Reliability
• Predictably adapt to different failure types, granularity and deployment in real-time constraints
SafeMAT• Fine Grained
Resource-Aware Adaptations
50
Conference Publications4. Akshay Dabholkar, Abhishek Dubey and Aniruddha Gokhale (Oct 2012) Reliable
Distributed Real-time and Embedded Systems Through Safe Middleware Adaptation (In submission to) 31st International Symposium on Reliable Distributed Systems (SRDS 2012), Irvine, California, USA
5. Akshay Dabholkar, and Aniruddha Gokhale (March 2011) A Generative Middleware Specialization Process for Distributed Real-time and Embedded Systems Proceedings of the 14th IEEE International Symposium on Object/Component/Service-oriented Real-time Distributed Computing (ISORC 2011), Newport Beach, CA, USA
6. Akshay Dabholkar, and Aniruddha Gokhale (April 2010) Middleware Specialization for Product-lines using Feature Oriented Reverse Engineering Proceedings of the 7th International Conference on Information Technology : New Generations (ITNG 2010), Las Vegas, NV, USA.
Summary of Publications & Presentations
51
Journal Publications1. Akshay Dabholkar, Abhishek Dubey and Aniruddha Gokhale (2012) SafeMAT: Safe
Middleware Adaptation for Predictable Fault-Tolerant Distributed Real-time and Embedded Systems (In submission)
2. Akshay Dabholkar, and Aniruddha Gokhale (2012). AutoGeMS: An Automated and Generative Middleware Specializations Process for Distributed Real-time and Embedded Systems, (Submitted to) Elsevier Journal of Software Architecture (JSA 2012).
3. Akshay Dabholkar, and Aniruddha Gokhale.(April 2011). FORMS: Feature-Oriented Reverse Engineering-based Middleware Specialization for Product-Lines, Journal of Software Special Issue on Middleware and Network Application (JSW 2011), Vol.6, No.4
First Author
Conference Publications (cont.)7. Sumant Tambe, Akshay Dabholkar, and Aniruddha Gokhale (April 2011) MoPED: A
Model-based Provisioning Engine for Dependability in Component-based Distributed Real-time Embedded Systems. Proceedings of the 18th IEEE International Conference and Workshops on the Engineering of Computer Based Systems (ECBS 2011), Las Vegas, NV, USA
8. Sumant Tambe,Akshay Dabholkar, and Aniruddha Gokhale (April 2009). CQML: Aspect-oriented Modeling for Modularizing and Weaving QoS Concerns in Component-based Systems, Proceedings of the 16th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS 2009), San Francisco, CA, USA.
9. Sumant Tambe, Akshay Dabholkar, and Aniruddha Gokhale (March 2009). Fault-tolerance for Component-based Systems – An Automated Middleware Specialization Approach, Proceedings of The 12th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC 2009), Tokyo, Japan.
10.Nilabja Roy, Akshay Dabholkar, Natham Hamm, Larry Dowdy, and Douglas Schmidt (Sep 2008). Modeling Software Contention using Colored Petri Nets Proceedings of the 16th Annual Meeting of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008), Baltimore, MD, USA.
52Second Author
Technical Reports11.Sumant Tambe,Akshay Dabholkar, Amogh Kavimandan, and Aniruddha Gokhale
(June 2007) A Platform Independent Component QoS Modeling Language for Distributed
Real-time and Embedded Systems. Technical Report ISIS-07-809, Institute for Software
Integrated Systems, Vanderbilt University, Nashville, TN, USA.
Summary of Publications & Presentations
First Author Second Author
Workshop Publications12.Akshay Dabholkar, and Aniruddha Gokhale (March 2011). Safe Specialization of
the LwCCM Container for Simultaneous Provisioning of Multiple QoS, Proceedings of OMG’s Workshop on Real-time, Embedded and Enterprise-Scale Time-Critical Systems (OMG RTWS 2011), Washington DC, USA.
13.Akshay Dabholkar, and Aniruddha Gokhale (June 2009). An Approach to Middleware Specialization for Cyber Physical Systems, Proceedings of The 2nd International Workshop on Cyber-Physical Systems (WCPS 2009), Co-located with ICDCS 2009 pp. 73–79 Montreal, Quebec, Canada.
14.Akshay Dabholkar, and Aniruddha Gokhale (Oct 2008). Developing and Evaluating a Taxonomy of Modularization Techniques for Middleware Specialization Proceedings of the 2nd OOPSLA Workshop on Assessment of Contemporary Modularization Techniques (ACoM), Nashville, TN, USA.
15.Aniruddha Gokhale, Akshay Dabholkar, and Sumant Tambe (Oct 2008). Towards a Holistic Approach for Integrating Middleware with Software Product Lines Research Proceedings of the GPCE Workshop on Modularization, Composition and Generative Techniques in Product Line Engineering, (McGPLE), Nashville, TN, USA.
16.Sumant Tambe, Akshay Dabholkar, and Aniruddha Gokhale, & Amogh Kavimandan (Sep 2008). CQML: A QoS Modeling and Modularization Framework for Component-based Systems, Proceedings of the 3rd EDOC Workshop Advances in Quality of Service Management, (AQuSerM), München, Germany.
53
Summary of Publications & Presentations
Poster Publications17. Akshay Dabholkar, and Aniruddha Gokhale (April 2010). Architecture-Driven Context-
Specific Middleware Specializations for Distributed Real-time and Embedded Systems Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES-WIP-PS), Stockholm, Sweden
18. Akshay Dabholkar, Sumant Tambe and Aniruddha Gokhale (April 2009). An Systematic Approach to Middleware Specialization for Cyber Physical Systems Published in the Proceedings of the Cyber Physical Systems Week 2009 San Francisco CA, USA.
19. Akshay Dabholkar, and Aniruddha Gokhale (July 2008). Towards Employing End-to-End Middleware Specialization Techniques Proceedings of OMG’s Annual Real-time and Embedded Systems workshop (OMG RTWS) Washington, DC, USA.
20. Joe Hoffert, Akshay Dabholkar, Aniruddha Gokhale, & Douglas Schmidt (March 2007). Enhancing Security in Ultra-Large Scale (ULS) Systems using Domain-specific Modeling. Spring 2007 Conference for Team for Research in Ubiquitous Secure Technology (TRUST), Berkeley, CA.
54First Author Second Author
Summary of Publications & Presentations
Lack of common reasoning vocabulary and systematic specialization process• Middleware Specialization Taxonomy and Lifecycle
Process
Forward Engineering though systematic and elegant does not vertically decompose middleware implementations along domain concerns Feature Oriented Reverse Engineering based
Middleware Specializations (FORMS)
Generative techniques based on source code analysis offer a promising approach for automating the specialization process• Generative Middleware Specializations (GeMS)• Generative Aspects for Fault-Tolerance (GrAFT)
Middleware Adaptation to runtime failures needs to be safe and predictable in order to be viable for hard real-time systems• Safe Middleware Adaptations for Real-Time Fault
Tolerance (SafeMAT)
Concluding Remarks